CN110349586B - Telecommunication fraud detection method and device - Google Patents

Telecommunication fraud detection method and device

Info

Publication number
CN110349586B
Authority
CN
China
Prior art keywords
voice
user
telecommunication
call
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910667382.6A
Other languages
Chinese (zh)
Other versions
CN110349586A (en)
Inventor
吴桐
郑康锋
武斌
伍淳华
刘羽飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN201910667382.6A priority Critical patent/CN110349586B/en
Publication of CN110349586A publication Critical patent/CN110349586A/en
Application granted granted Critical
Publication of CN110349586B publication Critical patent/CN110349586B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/04: Training, enrolment or model building
    • G10L17/06: Decision making techniques; Pattern matching strategies
    • G10L17/14: Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques characterised by the type of extracted parameters
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M2203/00: Aspects of automatic or semi-automatic exchanges
    • H04M2203/60: Aspects of automatic or semi-automatic exchanges related to security aspects in telephonic communication systems
    • H04M2203/6027: Fraud preventions

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention provides a telecommunication fraud detection method and a telecommunication fraud detection device, wherein the method comprises the following steps: in the telecommunication communication process of a user, obtaining the communication voice of the user; extracting tone-related acoustic features of the call voice of the user; generating input data according to the extracted acoustic features related to the tone; and inputting the generated input data into a telecom fraud classifier obtained through pre-training for classification, so as to obtain a telecom fraud detection result. By the scheme, the real-time performance of telecommunication fraud detection can be improved.

Description

Telecommunication fraud detection method and device
Technical Field
The invention relates to the technical field of network security, in particular to a telecommunication fraud detection method and device.
Background
At present, research on general detection techniques for social engineering falls mainly into two areas: human-based detection and technology-based detection. Human-based detection mechanisms aim to improve the subject's self-protection awareness and to provide models that allow the subject to perform self-detection. This has some value for security education, but such detection is unreliable and its effectiveness varies widely across users. Research on technology-based detection methods aims to provide detection mechanisms that can be implemented in code and that detect the attack process from features of social engineering attacks, but this research remains scattered and unsystematic.
Telecommunication fraud is a typical form of social engineering attack. On the basis of interactive conversation, and through media such as telephone calls and short messages, the attacker deceives the attacked party into handing over money or personal information. Detection techniques for telecommunication fraud can likewise be divided into two categories: one works from the subscriber's Call Detail Record (CDR), extracting call-related features such as the number of calls, call duration and geographical distribution to identify whether a number is used for telecommunication fraud calls; the other works from the bank side, detecting whether the remittance behaviour of a bank account is abnormal in order to judge whether the account is involved in fraud.
Although these telecommunication fraud detection methods can identify telecommunication fraud to some extent, they still have problems such as poor real-time performance and low application value. They therefore do not achieve the goal of detecting complex telecommunication fraud, nor do they establish a protective barrier for potentially attacked users.
Disclosure of Invention
In view of this, the present invention provides a method and an apparatus for detecting telecommunication fraud, so as to improve the real-time performance of telecommunication fraud detection.
In order to achieve the purpose, the invention is realized by adopting the following scheme:
according to an aspect of an embodiment of the present invention, there is provided a telecommunication fraud detection method, including: in the telecommunication communication process of a user, obtaining the communication voice of the user; extracting tone-related acoustic features of the call voice of the user; generating input data according to the extracted acoustic features related to the tone; and inputting the generated input data into a telecom fraud classifier obtained through pre-training for classification, so as to obtain a telecom fraud detection result.
In some embodiments, generating input data from the extracted tone-related acoustic features comprises: converting the extracted tone-related acoustic features into a first feature vector according to a speech acoustic feature list; and performing dimensionality reduction on the first feature vector to obtain a second feature vector as the input data.
In some embodiments, the telecommunication fraud detection method further comprises: and outputting a prompt message according to the telecom fraud detection result.
In some embodiments, before inputting the generated input data to a pre-trained telecom fraud classifier for classification, the telecom fraud detection method further includes: training an initial classifier model to obtain the telecom fraud classifier, wherein the initial classifier model is a hidden Markov model or a support vector machine model.
In some embodiments, performing a dimension reduction process on the first feature vector to obtain a second feature vector includes: and performing dimensionality reduction on the first feature vector by using a principal component analysis method or a linear discriminant analysis method to obtain a second feature vector.
In some embodiments, before the user's call voice is acquired during the user's telecommunication call, the method for detecting telecommunication fraud further includes: and acquiring the user authorization for acquiring the call voice on the platform where the user telecommunication call is positioned.
In some embodiments, the speech acoustic feature list comprises: prosodic features, structural features, personalized speech acoustic features and non-personalized speech emotion features. The prosodic features include: fundamental-frequency-related features, energy-related features and duration-related features. The structural features include: time structure, amplitude structure, fundamental frequency structure, formant structure and MFCC coefficients. The personalized speech acoustic features include: time structure, amplitude structure, fundamental frequency structure, formant structure, MFCC coefficients and Mel spectrum energy dynamic coefficients. The non-personalized speech emotion features include: time structure, amplitude structure, fundamental frequency structure, formant structure, and MFCC coefficients.
According to another aspect of the embodiments of the present invention, there is also provided a telecommunication fraud detection apparatus, including:
the call voice acquisition unit is used for acquiring the call voice of the user in the telecommunication call process of the user;
the acoustic feature extraction unit is used for extracting tone-related acoustic features of the call voice of the user;
an input data generation unit for generating input data according to the extracted speech related acoustic features;
and the detection result generation unit is used for inputting the generated input data into a telecom fraud classifier obtained by training in advance for classification to obtain a telecom fraud detection result.
According to yet another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of the above embodiments when executing the program.
According to still another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium on which a computer program is stored, which when executed by a processor, implements the steps of the method of the above-described embodiments.
The telecommunication fraud detection method, apparatus, electronic device and computer-readable storage medium can identify, during the user's telecommunication call, whether the current call is telecommunication fraud. This improves the real-time performance, and hence the value, of telecommunication fraud detection and avoids the lag of only identifying telecommunication fraud after one or more calls have ended or after a loss has been suffered.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can derive other drawings from them without creative effort. In the drawings:
FIG. 1 is a flow chart of a telecommunication fraud detection method of an embodiment of the present invention;
FIG. 2 is a block diagram of a telecommunication fraud detection method of a specific embodiment of the present invention;
FIG. 3 is a block diagram of a method of speech detection according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a telecommunication fraud detection apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
Through a user terminal device (e.g., a mobile phone, a fixed-line phone, a tablet computer or a personal computer), a user can make a voice call with another terminal. For example, through a telecommunication platform (e.g., China Mobile or China Unicom), a user can receive an incoming call request and, once the call is connected, hold a voice call with another terminal; through a social software platform (e.g., WeChat), a user may accept a friend request from a stranger, the stranger may then initiate a voice call request, and once the voice call is connected the user can hold a voice call with the stranger.
While a user holds a voice call with another party through his or her terminal device, the user may come under a telecommunication fraud attack. Existing telecommunication fraud detection techniques mainly identify whether a previous voice call was a telecommunication fraud call only after the call has ended, after multiple calls have taken place, or after property has been lost. The prior art therefore cannot identify telecommunication fraud during the voice call itself, nor prevent the user from being defrauded or suffering property loss in time.
In order to solve the above problems, the present invention provides a telecommunication fraud detection method that realizes real-time detection of telecommunication fraud and improves the value of telecommunication fraud detection. Embodiments of the invention are described in detail below.
FIG. 1 is a flow chart of a telecommunication fraud detection method according to an embodiment of the present invention, as shown in FIG. 1, the telecommunication fraud detection method of some embodiments may include the following steps S110 to S140.
Step S110: and acquiring the call voice of the user in the telecommunication call process of the user.
The device on which the user telecommunication call is based may be a user terminal device, e.g. a mobile phone, a fixed telephone, a tablet computer, a personal computer, etc. The platform on which the user telecommunication call is based may be telecommunications (e.g., china mobile, china unicom), social software (e.g., wechat), etc.
Here, "user" mainly refers to the person at the user terminal device end. Acquiring the user's call voice during the telecommunication call may specifically be triggered when the user terminal device is detected to be in a voice call state; performing this step does not require that a real person is actually present as the "user".
In a specific embodiment, after the user has granted authorization, a recording plug-in is loaded during the user's telecommunication call to obtain the call voice, and the ownership of each voice segment is determined by judging whether the voice is being input (recorded) or played back.
In a specific embodiment, when a telecommunication call of the user is detected, the call voice of both parties may first be obtained and the user's own call voice then extracted from it; alternatively, when the telecommunication call is detected, the user's call voice may be detected and acquired directly.
Since a user terminal is usually owned by one user or a small number of users, who repeatedly use it for voice calls, the voice characteristics or voiceprint characteristics of the one or more users who have made voice calls through the terminal can be recorded during historical voice calls. Based on these recorded voice or voiceprint characteristics, the user's call voice can be distinguished from the call voice of both parties.
If the user's call voice is present in the two parties' call voice, the required voice data can be obtained and the steps after step S110 executed. If there is no sound, or the user's call voice is absent from the two parties' call voice, the required voice data cannot be obtained; in this case the two parties' call voice can continue to be acquired and checked for the user's voice data, and the steps after step S110 are not executed until the user's call voice is obtained. The call voice may be handled in units of one voice segment per party; for example, if the user speaks a voice segment A and the other party then speaks a voice segment B, voice segment A may be treated as one unit of voice.
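As an illustrative sketch only (the disclosure does not prescribe a specific voiceprint method or toolkit; the MFCC-mean embedding, the librosa usage and the 0.85 threshold below are assumptions), separating the user's own segments from the two-party call audio by comparison against a stored voiceprint could look like this:

```python
import numpy as np
import librosa

def voiceprint_embedding(audio, sr, n_mfcc=12):
    # Crude voiceprint: the mean MFCC vector of a speech segment (an assumption, not the patented method).
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

def is_user_segment(segment, sr, user_voiceprint, threshold=0.85):
    # True if the segment's embedding is close enough (cosine similarity) to the stored user voiceprint.
    emb = voiceprint_embedding(segment, sr)
    cos = np.dot(emb, user_voiceprint) / (np.linalg.norm(emb) * np.linalg.norm(user_voiceprint) + 1e-9)
    return cos >= threshold  # hypothetical threshold

# Usage: keep only the user's own utterances from the two-party call audio.
# user_segments = [seg for seg in segments if is_user_segment(seg, sr, stored_voiceprint)]
```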
In some embodiments, the user's call voice is obtained only after the user's authorization has been obtained. For example, before step S110, the method shown in FIG. 1 may further include step S150: obtaining, on the platform where the user's telecommunication call takes place, the user's authorization to acquire the call voice. For example, before preparing to obtain the user's call voice, the user may be prompted to grant authorization on the user terminal or on the social software side to allow his or her call voice to be acquired during the voice call. The user's personal privacy is thus better respected.
Step S120: and extracting tone-related acoustic features of the call voice of the user.
One or more tone-related acoustic features may be extracted from the data of the user's call voice. The tone-related acoustic features may include one or more of the following: pitch-related features, such as the pitch frequency, its mean, variation range, rate of change, mean square error, standard deviation, maximum (the maximum of the pitch contour), average rate of change, and the 1/4, 1/3, 2/3 and 3/4 quantiles of the pitch frequency and of its rate of change; energy-related features, such as the short-term average energy, short-term energy rate of change, short-term average energy amplitude, average rate of change of the energy amplitude, and short-term maximum energy amplitude; duration-related features, such as the speech rate, the short-term average zero-crossing rate, and the silent-portion time ratio; formant-related features, namely, for each of the first, second and third formant frequencies, the maximum, mean, dynamic variation range, average rate of change, mean square error, and the 1/4, 1/3, 2/3 and 3/4 quantiles of the frequency and of its average rate of change; and cepstral features, namely the 12th-order MFCC (Mel-frequency cepstral coefficient) coefficients, the first-order difference MFCC coefficients, and the second-order difference MFCC coefficients.
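For illustration, a handful of the listed features (pitch statistics, short-term energy, zero-crossing rate, and 12th-order MFCCs with their first-order differences) could be extracted as in the following sketch; librosa is assumed as the audio toolkit, which the disclosure does not specify.

```python
import numpy as np
import librosa

def extract_tone_features(voice, sr):
    # A small subset of the tone-related acoustic features listed above; a sketch only.
    f0 = librosa.yin(voice, fmin=50, fmax=500, sr=sr)       # pitch (fundamental frequency) contour
    rms = librosa.feature.rms(y=voice)[0]                   # short-term energy amplitude
    zcr = librosa.feature.zero_crossing_rate(voice)[0]      # short-term average zero-crossing rate
    mfcc = librosa.feature.mfcc(y=voice, sr=sr, n_mfcc=12)  # 12th-order MFCC coefficients
    d_mfcc = librosa.feature.delta(mfcc)                    # first-order difference MFCC coefficients
    return {
        "pitch_mean": float(f0.mean()),
        "pitch_range": float(f0.max() - f0.min()),
        "pitch_std": float(f0.std()),
        "pitch_q25": float(np.quantile(f0, 0.25)),
        "energy_mean": float(rms.mean()),
        "energy_max": float(rms.max()),
        "zcr_mean": float(zcr.mean()),
        "mfcc_mean": mfcc.mean(axis=1),
        "d_mfcc_mean": d_mfcc.mean(axis=1),
    }
```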
The user's call voice may be classified, or its category evaluated, according to these different tone-related acoustic features.
Step S130: generating input data according to the extracted tone-related acoustic features.
In some embodiments, the extracted tone-related acoustic features may be used directly as the input data of the telecom fraud classifier; in other embodiments, the extracted tone-related acoustic features may first be converted into feature vectors, which are then used as the input data of the telecom fraud classifier.
Illustratively, step S130 may specifically include the steps of: S131, converting the extracted tone-related acoustic features into a first feature vector according to a speech acoustic feature list; and S132, performing dimensionality reduction on the first feature vector to obtain a second feature vector as the input data.
In step S131, for example, assuming that the number of extracted features x_j is n, where 1 ≤ j ≤ n, the dimension of the first feature vector is n, and the first feature vector can be represented as X_i = [x_1, x_2, ..., x_n].
In step S131, all the tone-related acoustic features in the speech acoustic feature list may be classified, and each classification may correspond to a number. Accordingly, each extracted tone-related acoustic feature can be converted into its corresponding number, and the converted numbers yield the feature vector corresponding to the extracted tone-related acoustic features, i.e., the first feature vector. In some embodiments, if the dimension of the first feature vector is not too large, the first feature vector may be used directly as the input data of the telecom fraud classifier.
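A minimal sketch of step S131, under the assumption of a hypothetical feature ordering (the full numbering of the speech acoustic feature list is not reproduced here), arranging the extracted features into the first feature vector X_i = [x_1, ..., x_n]:

```python
import numpy as np

# Hypothetical fixed ordering; the disclosure only requires that training and detection use the same list.
FEATURE_LIST = ["pitch_mean", "pitch_range", "pitch_std", "pitch_q25",
                "energy_mean", "energy_max", "zcr_mean"]

def to_first_feature_vector(feats):
    # Scalar features in list order, followed by the flattened MFCC statistics.
    scalars = np.asarray([feats[name] for name in FEATURE_LIST], dtype=float)
    vectors = [np.ravel(feats[key]) for key in ("mfcc_mean", "d_mfcc_mean") if key in feats]
    return np.concatenate([scalars] + vectors)
```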
The speech acoustic feature list may include: prosodic features, structural features, personalized speech acoustic features, and non-personalized speech emotion features. The prosodic features may include: fundamental-frequency-related features, energy-related features and duration-related features. The structural features may include: time structure, amplitude structure, fundamental frequency structure, formant structure, and MFCC coefficients. The personalized speech acoustic features may include: time structure, amplitude structure, fundamental frequency structure, formant structure, MFCC coefficients and Mel spectrum energy dynamic coefficients. The non-personalized speech emotion features may include: time structure, amplitude structure, fundamental frequency structure, formant structure, and MFCC coefficients.
Among the above tone-related acoustic features: the pitch frequency mean, pitch frequency variation range, pitch frequency rate of change and pitch frequency mean square error may belong to the fundamental-frequency-related features in the prosodic feature category; the short-term average energy, short-term energy rate of change, short-term average energy amplitude, energy amplitude rate of change and short-term maximum energy amplitude may belong to the energy-related features in the prosodic feature category; the speech rate and the short-term average zero-crossing rate may belong to the duration-related features in the prosodic feature category; the short-term average zero-crossing rate and the unvoiced-portion time ratio may belong to the time structure in the structural feature category; the short-term average energy, short-term energy rate of change, short-term average energy amplitude, average rate of change of the energy amplitude and short-term maximum energy amplitude may belong to the amplitude structure in the structural feature category; the maximum of the pitch contour, the mean of the fundamental frequency over the whole contour, the average rate of change of the fundamental frequency, the mean square error of the fundamental frequency, and the 1/3 and 1/4 quantiles of the pitch frequency and of the pitch variation may belong to the fundamental frequency structure in the structural feature category; the maximum, mean, dynamic variation range, average rate of change, mean square error and 1/3 and 1/4 quantiles of the three formant frequencies may belong to the formant structure in the structural feature category; the 12th-order MFCC coefficients, first-order difference MFCC coefficients and second-order difference MFCC coefficients may belong to the MFCC coefficients in the structural feature category; the short-term average zero-crossing rate may belong to the time structure in the personalized speech acoustic feature category; the short-term average energy, short-term average energy amplitude and short-term maximum energy amplitude may belong to the amplitude structure in the personalized speech acoustic feature category; the maximum of the pitch contour, the mean of the fundamental frequency over the whole contour, its variation range, and the 1/4, 3/4, 1/3 and 2/3 quantiles of the pitch frequency may belong to the fundamental frequency structure in the personalized speech acoustic feature category; the first, second and third formant frequencies, together with their maxima, means, dynamic variation ranges and 1/4, 3/4, 1/3 and 2/3 quantiles, may belong to the formant structure in the personalized speech acoustic feature category; the 12th-order MFCC coefficients may belong to the MFCC coefficients in the personalized speech acoustic feature category; the spectral energy dynamics over 12 equally spaced frequency bands may belong to the Mel spectrum energy dynamic coefficients in the personalized speech acoustic feature category; the ratio of unvoiced-portion time to voiced-portion time may belong to the time structure in the non-personalized speech emotion feature category; the average rate of change of the short-term energy and the average rate of change of the energy amplitude may belong to the amplitude structure in the non-personalized speech emotion feature category; the average rate of change and the standard deviation of the fundamental frequency, and the 1/4, 3/4, 1/3 and 2/3 quantiles of the rate of change of the fundamental frequency, may belong to the fundamental frequency structure in the non-personalized speech emotion feature category; the 1/4, 3/4, 1/3 and 2/3 quantiles of the average rate of change of the first, second and third formant frequencies may belong to the formant structure in the non-personalized speech emotion feature category; and the 12th-order first-order difference MFCC coefficients may belong to the MFCC coefficients in the non-personalized speech emotion feature category.
In step S132, different features may be correlated with one another, and strongly correlated features increase the dimension of the feature vector. Dimensionality reduction may therefore be performed according to the correlation between different components of the first feature vector, or between different tone-related acoustic features, so as to reduce the correlation between features and thus the dimension of the feature vector. This reduces the demands that high-dimensional feature vectors place on the processor, prevents overfitting during classifier training, speeds up telecommunication fraud detection, and further improves its real-time performance.
The dimension reduction may use an existing dimension reduction method. Illustratively, step S132 may more specifically include: performing dimensionality reduction on the first feature vector using a Principal Component Analysis (PCA) method or a Linear Discriminant Analysis (LDA) method to obtain the second feature vector. In other embodiments, other methods may be used to reduce the dimension of the first feature vector.
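A sketch of step S132 using scikit-learn's PCA and LDA; the library, the retained dimensionality of 20, and the stand-in data in place of real first feature vectors are all assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 60))         # stand-in for 60-dimensional first feature vectors
y_train = rng.integers(0, 2, size=200)       # 1 = telecom fraud sample, 0 = normal sample

pca = PCA(n_components=20)                   # retained dimensionality is an assumption
X_pca = pca.fit_transform(X_train)           # unsupervised reduction (PCA)

lda = LinearDiscriminantAnalysis()
X_lda = lda.fit_transform(X_train, y_train)  # supervised alternative (LDA); two classes give a 1-D projection
```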
Step S140: and inputting the generated input data into a telecom fraud classifier obtained through pre-training for classification, so as to obtain a telecom fraud detection result.
The telecom fraud detection result may be the probability that the voice call is telecommunication fraud. This probability may be output directly by the telecom fraud classifier; alternatively, the classifier may output the probability that the call belongs to each category of tone-related acoustic features, and the probability that the call is telecommunication fraud is then computed from these probabilities according to certain judgment rules (the probability that a voice call exhibiting a given category of tone features is telecommunication fraud can be determined from experience).
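As a sketch of the second option, assuming hypothetical tone-feature categories and empirically chosen weights (none of which are specified by the disclosure), a judgment rule could combine per-category probabilities into a single fraud probability:

```python
# Hypothetical categories and weights chosen from experience; illustrative values only.
CATEGORY_FRAUD_WEIGHT = {"calm": 0.05, "anxious": 0.7, "fearful": 0.9, "urgent_compliant": 0.8}

def fraud_probability(category_probs):
    # Weighted combination of per-category probabilities into one telecom fraud probability.
    return sum(category_probs.get(c, 0.0) * w for c, w in CATEGORY_FRAUD_WEIGHT.items())

# Example: fraud_probability({"calm": 0.1, "anxious": 0.3, "fearful": 0.5, "urgent_compliant": 0.1}) == 0.745
```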
The telecom fraud classifier may be obtained by training on user-related voice data. For example, before step S140, the method shown in FIG. 1 may further include step S160: training an initial classifier model to obtain the telecom fraud classifier.
Specifically, user voice data may be collected in advance, user speech acoustic features extracted as in step S130, and a training sample set established; the training sample set is then input into the initial classifier model for training to obtain the telecom fraud classifier. The training sample set may comprise two parts, data and labels: the data are samples of user speech acoustic features, and the label indicates whether a sample corresponds to telecommunication fraud; a telecom fraud sample may be labelled 1, and otherwise 0.
The initial classifier model may be a Hidden Markov Model (HMM) or a Support Vector Machine (SVM) model. In other embodiments, the telecom fraud classifier may be trained from other classifier models. Whichever classifier model is used, the training samples and the input data described above should generally conform to the same standard or form: for example, both are the raw feature data, or both are feature vectors converted according to the same standard (e.g., according to the speech acoustic feature list above). Likewise, if the training samples are dimension-reduced feature vectors, the input data should also be dimension-reduced feature vectors. The speech used for the training samples can be obtained from a number of experiments on subjects. The training samples may include the speech features and their corresponding categories, or the feature vectors converted from the speech features according to the speech acoustic feature list. Specifically, for example, a telecom call voice data set can be acquired and the telecommunication-fraud-related voices marked, to be used as the input of the classifier model for training the telecom fraud detection model.
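A training sketch for step S160 using an SVM (one of the two models named above); scikit-learn and the stand-in data are assumptions, and probability=True is used so that the trained classifier can output the fraud probability needed in step S140.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 60))               # stand-in first feature vectors of training call voices
y = rng.integers(0, 2, size=300)             # label: 1 = telecom fraud sample, 0 = otherwise

# Scaling and PCA stand in for the feature dimension reduction applied to the training set.
clf = make_pipeline(StandardScaler(), PCA(n_components=20), SVC(probability=True))
clf.fit(X, y)

fraud_prob = clf.predict_proba(X[:1])[0, 1]  # probability that this call voice is telecom fraud
```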
Through the telecommunication fraud detection method of the above embodiments, whether a telecommunication call is telecommunication fraud can be identified while the user's call is in progress. This improves the real-time performance, and hence the value, of telecommunication fraud detection and avoids the lag of only identifying telecommunication fraud after one or more calls or after a loss.
In further embodiments, the telecommunication fraud detection method of the above embodiments may further comprise taking a corresponding action against telecommunication fraud. Exemplarily, the method shown in FIG. 1 may further include step S170: outputting a prompt message according to the telecom fraud detection result. In step S170, the prompt message may take various forms, e.g. a short message or an audible and visual alarm. In a specific embodiment, different prompt messages can be sent according to the probability or degree of telecommunication fraud indicated by the detection result for the current voice call; for example, when the telecommunication fraud probability is small, a prompt such as "please stay alert to risks such as transferring money or revealing passwords" can be given, and when the probability is large, a prompt such as "do not transfer money or reveal passwords" can be given.
As another example, the method shown in FIG. 1 may further include step S180: cutting off the user's telecommunication call when the telecommunication fraud probability obtained from the detection result is greater than a set value. In step S180, when the detection result is a probability, the set value may be a set probability value, e.g. 80%. When the telecommunication fraud probability exceeds the set value, the user's telecommunication call can be cut off to protect the user's safety. Preferably, cutting off the user's telecommunication call has been authorized in advance by the user.
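A response sketch covering steps S170 and S180; the 0.3 and 0.8 thresholds and the message wording are illustrative assumptions, not values fixed by the disclosure.

```python
def respond(fraud_prob, cutoff=0.8, warn=0.3):
    # Map the telecom fraud probability to a prompt message, or to disconnection above the set value.
    if fraud_prob > cutoff:
        return "disconnect", "High telecom fraud probability: the call is being cut off (user pre-authorized)."
    if fraud_prob > warn:
        return "warn", "Do not transfer money or reveal passwords during this call."
    return "notice", "Please stay alert to requests involving money transfers or passwords."
```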
In the above embodiments, a timely response is made when the user's telecommunication call is detected to be a telecommunication fraud call, so that the user can be prevented from suffering property loss while unaware, unperceiving or acting irrationally.
In order that those skilled in the art will better understand the present invention, embodiments of the present invention will be described below with reference to specific examples.
This telecommunication fraud detection method based on the subject's tone characteristics identifies the voice of the attacked party (the subject) during telecommunication fraud and extracts the subject's tone characteristics to judge whether the subject is suffering telecommunication fraud. Drawing on a large body of literature and case analysis, the tone and rhythm of the subject's speech during telecommunication fraud, guided by the attack process, exhibit regular and characteristic changes, and these changes are reflected in certain speech acoustic features. The method summarizes the subject's tone characteristics, trains a telecommunication fraud classification model, analyses the subject's tone characteristics in real time during the voice call, detects telecommunication fraud, and provides real-time protection for the attacked party.
FIG. 2 is a block diagram of the telecommunication fraud detection method of a specific embodiment of the present invention. Referring to FIG. 2, in some embodiments the method is divided into three parts: (1) voice acquisition; (2) voice detection; and (3) response. The voice acquisition part obtains the user's call voice during the telephone call and monitors the communication between the user to be protected and the other party. The voice detection part trains a detection model in advance; during operation it converts the voice acquired by the voice acquisition module into a feature vector according to the feature list and inputs it into the classification model to obtain a detection result. The response part analyses the detection result and reminds and warns the user according to the user's relevant information, thereby reducing the user's risk of loss.
(1) Voice acquisition:
and obtaining the self conversation voice of the user in the communication process between the user to be detected and the opposite side by using user authorization, social platform authorization and other modes. The scheme is not limited by a communication platform, and communication voice of communication modes such as telecommunication communication, social software network communication and the like can be obtained after authorization is obtained.
(2) Voice detection:
the method comprises four sub-specific parts of feature extraction, feature dimension reduction, model training and detection classification, and is shown in FIG. 3. The following sub-modules describe the specific working principle of the part:
Feature extraction module: the raw call data are input into this module, and feature extraction can be carried out by means such as spectral analysis and differentiation, extracting the relevant acoustic features that characterize the tone and rhythm of the speech. The list of extractable speech acoustic features is large and can be organized according to different classification methods, as shown in Table 1.
Table 1. Tone-related acoustic features and their classification
(Table 1 is reproduced as an image in the original publication; its content corresponds to the feature categories and assignments described above.)
Related acoustic features are selected for extraction according to the specific implementation requirements.
A feature dimension reduction module: the feature vector produced by direct feature extraction has high dimensionality, and the different features are strongly correlated. When the data dimensionality is far larger than the number of samples, overfitting is easily caused, and high-dimensional feature vectors place high demands on the processor, making real-time detection and analysis during the call difficult. A feature dimension reduction method therefore needs to be adopted, such as Principal Component Analysis (PCA) or Linear Discriminant Analysis (LDA).
A model training module: a pre-trained model is used as the classification model. The speech vectors after feature dimension reduction are input into the classifier for classification, the parameters of the model are trained and optimized through experiments, and the optimal parameters are finally obtained and stored for classification during operation. Classifier models that can be selected include Hidden Markov Models (HMM) and Support Vector Machine (SVM) models.
A detection and classification module: this module performs real-time detection during operation. The captured voice data undergo the same operations as the training set, including feature extraction and feature dimension reduction, and the resulting vector is then input into the model output by the model training module to obtain the detection result.
(3) In response:
and carrying out corresponding processing according to the detection result and the user setting, and reminding and warning the user. When telecommunication fraud is detected, the user is prompted by messages according to the degree, the user is reminded of not making actions such as transferring accounts and revealing passwords, and under the condition of strong authorization of the user, if the possibility of telecommunication fraud reaches a threshold value, the user can be interrupted, so that the safety of the user is guaranteed.
The implementation of the invention is described below with a concrete example. In this embodiment, the basic work is to acquire a telecom call voice data set, mark the telecommunication-fraud-related voices, and use them as the input of the classifier model to train the telecom fraud detection model. In a specific implementation, telecommunication is used as the platform: the user's call voice is obtained in real time during the call, the call voice of each utterance undergoes basic processing and feature extraction according to the feature set, feature dimension reduction is then performed, and the result is converted into a feature vector that the model can recognize, which serves as the input of the detection model. The detection model outputs its detection result; if the result indicates telecommunication fraud, the user is reminded that he or she may be suffering telecommunication fraud, reminded not to perform sensitive operations such as transferring money or revealing passwords, and prompted to end the call in time. The detection process loops continuously along with the user's conversation until the call is completed. The system consists mainly of three parts of physical equipment: a call voice collection end, a call voice detection end, and a user terminal processing end. (1) Call voice collection end: obtains the audio of the user's call voice during the call, using user terminal authorization or social software authorization, and passes it to the call voice detection end. (2) Call voice detection end: performs telecommunication fraud detection and analysis on the user's voice. (3) User terminal processing end: according to the detection result of the call voice detection end, if the tone-related acoustic features of the user's voice indicate that the user is likely to be suffering telecommunication fraud, a corresponding response is made at the user terminal according to the probability, prompting the user to pay attention to his or her safety and reducing the risk of loss.
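Putting the pieces together, a per-utterance detection loop for this example could look like the following sketch; it reuses the hypothetical helpers from the earlier sketches (is_user_segment, extract_tone_features, to_first_feature_vector, respond) and assumes reducer and clf are the dimension reducer and classifier fitted during training.

```python
def monitor_call(utterance_stream, sr, clf, reducer, stored_voiceprint):
    # Loops with the conversation: one detection per user utterance, until the call ends or is cut off.
    for segment in utterance_stream:                   # one speech segment per speaking turn
        if not is_user_segment(segment, sr, stored_voiceprint):
            continue                                   # only the user's own voice is analysed
        feats = extract_tone_features(segment, sr)
        x1 = to_first_feature_vector(feats)            # first feature vector (feature extraction)
        x2 = reducer.transform(x1.reshape(1, -1))      # second feature vector (feature dimension reduction)
        prob = clf.predict_proba(x2)[0, 1]             # telecom fraud probability (detection classification)
        action, message = respond(prob)                # response according to the probability
        yield action, message
        if action == "disconnect":
            break
```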
In the above embodiments, drawing on extensive literature and case analysis, the method extracts the tone-related acoustic features of the user's voice during the call on the basis of speech recognition, converts the call voice into a feature vector, uses it as the input of the telecom fraud classifier, and selects a suitable classifier model for classification to identify whether the current call is a telecommunication fraud call, thereby providing the user with real-time protection against telecommunication fraud.
Based on the same inventive concept as the telecommunication fraud detection method shown in FIG. 1, an embodiment of the invention also provides a telecommunication fraud detection apparatus, as described in the following embodiments. Since the principle by which the telecommunication fraud detection apparatus solves the problem is similar to that of the telecommunication fraud detection method, the implementation of the apparatus can refer to the implementation of the method, and repeated details are not repeated.
FIG. 4 is a schematic structural diagram of a telecommunication fraud detection apparatus according to an embodiment of the present invention. As shown in FIG. 4, the telecommunication fraud detection apparatus of some embodiments may comprise:
a call voice acquiring unit 210, configured to acquire a call voice of a user during a telecommunication call of the user;
an acoustic feature extraction unit 220, configured to extract a mood-related acoustic feature of a call voice of a user;
an input data generating unit 230 for generating input data from the extracted speech-related acoustic features;
the detection result generating unit 240 is configured to input the generated input data to a telecom fraud classifier obtained through pre-training for classification, so as to obtain a telecom fraud detection result.
In some embodiments, the input data generating unit 230 may include:
the feature vector conversion module is used for converting the extracted voice related acoustic features into first feature vectors according to the voice acoustic feature list;
and the dimension reduction processing module is used for carrying out dimension reduction processing on the first feature vector to obtain a second feature vector as input data.
In some embodiments, the telecommunication fraud detection apparatus shown in FIG. 4 may further include a message prompting unit, which may be connected to the detection result generating unit 240 and is used for outputting a prompt message according to the telecom fraud detection result.
In some embodiments, the telecommunication fraud detection apparatus shown in FIG. 4 may further include a classifier training unit, which may be connected upstream of the input of the detection result generating unit 240 and is used for training an initial classifier model to obtain the telecom fraud classifier, wherein the initial classifier model is a hidden Markov model or a support vector machine model.
In some embodiments, the dimension reduction processing module may include: and the dimension reduction analysis module is used for performing dimension reduction processing on the first feature vector by utilizing a principal component analysis method or a linear discriminant analysis method to obtain a second feature vector.
In some embodiments, the telecommunication fraud detection apparatus shown in FIG. 4 may further include a user authorization unit, which may be connected to the call voice acquiring unit 210 and is used for acquiring, on the platform where the user's telecommunication call takes place, the user's authorization to acquire the call voice.
In some embodiments, the list of speech acoustic features may include: prosodic features, structural features, personalized speech acoustic features, and non-personalized speech emotional features. The prosodic features may include: fundamental frequency related features, energy related features and time-length related features; the construction features may include: time structure, amplitude structure, fundamental frequency structure, formant structure and MFCC coefficient; the personalized speech acoustic features may include: time structure, amplitude structure, fundamental frequency structure, formant structure, MFCC coefficient and Mel spectrum energy dynamic coefficient; the non-personalized speech emotion characteristics can comprise: time structure, amplitude structure, fundamental frequency structure, formant structure, and MFCC coefficients.
The embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and when the processor executes the computer program, the steps of the method described in the above embodiment are implemented. The telecommunication fraud detection apparatus and the computer device may be user terminal devices, such as mobile phones, fixed phones, tablet computers, personal computers, and the like.
The embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and when the processor executes the computer program, the steps of the method described in the above embodiment are implemented.
Embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method described in the above embodiments.
In summary, the telecommunication fraud detection method, apparatus and computer-readable storage medium of the embodiments of the present invention can identify, during the user's telecommunication call, whether the current call is telecommunication fraud. This improves the real-time performance, and hence the value, of telecommunication fraud detection and avoids the lag of only identifying telecommunication fraud after one or more calls or after a loss.
In the description herein, reference to the description of the terms "one embodiment," "a particular embodiment," "some embodiments," "for example," "an example," "a particular example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. The sequence of steps involved in the various embodiments is provided to schematically illustrate the practice of the invention, and the sequence of steps is not limited and can be suitably adjusted as desired.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (6)

1. A telecommunications fraud detection method, comprising:
acquiring user authorization for acquiring call voice on a platform where a user telecommunication call is located, and acquiring user authorization for cutting off the user telecommunication call under the condition that the telecommunication fraud probability is greater than a set value;
in the telecommunication conversation process of a user, according to the recorded voice characteristics or voiceprint characteristics of one or more users who have voice conversation through terminal equipment in the historical voice conversation process, distinguishing the conversation voice of the user from the telecommunication conversation voice, and thus obtaining the conversation voice of the user;
extracting tone-related acoustic features of the call voice of the user;
generating input data according to the extracted acoustic features related to the tone;
inputting the generated input data into a telecom fraud classifier obtained through pre-training for classification to obtain a telecom fraud detection result; the telecommunication fraud detection result is the probability that the telecommunication conversation of the user is telecommunication fraud;
outputting a prompt message according to the telecom fraud detection result, wherein different telecom fraud probabilities obtained from the telecom fraud detection result correspond to different output prompt messages, and cutting off the telecommunication call of the user under the condition that the telecom fraud probability obtained according to the telecom fraud detection result is greater than the set value;
the generating of the input data according to the extracted acoustic features related to the tone comprises:
converting the extracted voice related acoustic features into first feature vectors according to the voice acoustic feature list;
performing dimensionality reduction on the first feature vector to obtain a second feature vector which is used as input data;
the list of speech acoustic features comprises: the method comprises the steps of performing prosodic feature, construction feature, personalized voice acoustic feature and non-personalized voice emotional feature;
the prosodic features include: fundamental frequency related features, energy related features and time-length related features; the construction features include: time structure, amplitude structure, fundamental frequency structure, formant structure and MFCC coefficient; the personalized speech acoustic features include: time structure, amplitude structure, fundamental frequency structure, formant structure, MFCC coefficient and Mel spectrum energy dynamic coefficient; the non-personalized speech emotion characteristics comprise: time structure, amplitude structure, fundamental frequency structure, formant structure, and MFCC coefficients.
2. The telecommunication fraud detection method of claim 1, further comprising, before inputting the generated input data into the pre-trained telecommunication fraud classifier for classification:
training an initial classifier model to obtain the telecommunication fraud classifier, wherein the initial classifier model is a hidden Markov model or a support vector machine model.
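A possible sketch of this training step, taking the support-vector-machine option of claim 2 (the hidden Markov model option is not shown). The labelled corpus X_second / y, the RBF kernel and the 80/20 split are illustrative assumptions.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def train_fraud_classifier(X_second: np.ndarray, y: np.ndarray) -> SVC:
    """Fit an SVM on dimension-reduced feature vectors with 0/1 fraud labels."""
    X_tr, X_te, y_tr, y_te = train_test_split(X_second, y, test_size=0.2, random_state=0)
    clf = SVC(kernel="rbf", probability=True)   # probability=True enables predict_proba later
    clf.fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))
    return clf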
3. The telecommunication fraud detection method of claim 1, wherein performing dimensionality reduction on the first feature vector to obtain the second feature vector comprises:
performing dimensionality reduction on the first feature vector by using a principal component analysis method or a linear discriminant analysis method to obtain the second feature vector.
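A brief sketch of claim 3's dimensionality reduction, assuming scikit-learn: principal component analysis when no labels are supplied, linear discriminant analysis otherwise. The 95% variance target and the single LDA component are illustrative choices, not values taken from the patent.

from typing import Optional

import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def fit_reducer(X_first: np.ndarray, y: Optional[np.ndarray] = None):
    """Fit PCA when no labels are given, otherwise LDA (supervised)."""
    if y is None:
        reducer = PCA(n_components=0.95)                       # keep 95% of the variance
        reducer.fit(X_first)
    else:
        reducer = LinearDiscriminantAnalysis(n_components=1)   # binary task: at most 1 component
        reducer.fit(X_first, y)
    return reducer

# Usage: second_vectors = fit_reducer(X_train_first, y_train).transform(X_train_first)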
4. A telecommunications fraud detection apparatus, comprising:
a user authorization unit, connected with the call voice acquisition unit, for acquiring, on the platform where the user's telecommunication call takes place, the user's authorization to acquire the call voice and the user's authorization to cut off the user's telecommunication call in the case that the telecommunication fraud probability is greater than a set value;
a call voice acquisition unit, for, during the user's telecommunication call, separating the user's call voice from the telecommunication call voice according to the voice features or voiceprint features, recorded during historical voice calls, of one or more users who have made voice calls through the terminal equipment, so as to obtain the user's call voice;
an acoustic feature extraction unit, for extracting tone-related acoustic features of the user's call voice;
an input data generation unit, for generating input data according to the extracted tone-related acoustic features;
a detection result generation unit, for inputting the generated input data into a pre-trained telecommunication fraud classifier for classification to obtain a telecommunication fraud detection result; the telecommunication fraud detection result is the probability that the user's telecommunication call is telecommunication fraud;
a message prompting unit, connected with the detection result generation unit, for outputting a prompt message according to the telecommunication fraud detection result, wherein different telecommunication fraud probabilities obtained from the telecommunication fraud detection result correspond to different output prompt messages, and for cutting off the user's telecommunication call in the case that the telecommunication fraud probability obtained from the telecommunication fraud detection result is greater than the set value;
wherein the generating of the input data according to the extracted tone-related acoustic features comprises:
converting the extracted tone-related acoustic features into a first feature vector according to a speech acoustic feature list;
performing dimensionality reduction on the first feature vector to obtain a second feature vector, which is used as the input data;
the speech acoustic feature list comprises: prosodic features, construction features, personalized speech acoustic features and non-personalized speech emotion features;
the prosodic features include: fundamental-frequency-related features, energy-related features and duration-related features; the construction features include: time structure, amplitude structure, fundamental frequency structure, formant structure and MFCC coefficients; the personalized speech acoustic features include: time structure, amplitude structure, fundamental frequency structure, formant structure, MFCC coefficients and Mel-spectrum energy dynamic coefficients; the non-personalized speech emotion features include: time structure, amplitude structure, fundamental frequency structure, formant structure and MFCC coefficients.
5. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 3 are implemented when the processor executes the program.
6. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 3.
CN201910667382.6A 2019-07-23 2019-07-23 Telecommunication fraud detection method and device Active CN110349586B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910667382.6A CN110349586B (en) 2019-07-23 2019-07-23 Telecommunication fraud detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910667382.6A CN110349586B (en) 2019-07-23 2019-07-23 Telecommunication fraud detection method and device

Publications (2)

Publication Number Publication Date
CN110349586A CN110349586A (en) 2019-10-18
CN110349586B true CN110349586B (en) 2022-05-13

Family

ID=68179935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910667382.6A Active CN110349586B (en) 2019-07-23 2019-07-23 Telecommunication fraud detection method and device

Country Status (1)

Country Link
CN (1) CN110349586B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112420074A (en) * 2020-11-18 2021-02-26 麦格纳(太仓)汽车科技有限公司 Method for diagnosing abnormal sound of motor of automobile rearview mirror
CN112822329B (en) * 2020-12-29 2022-04-26 太原脉倜什移动互联科技有限公司 Call recording management method, electronic device and computer readable storage medium
CN113012684B (en) * 2021-03-04 2022-05-31 电子科技大学 Synthesized voice detection method based on voice segmentation
CN113257250A (en) * 2021-05-11 2021-08-13 歌尔股份有限公司 Fraud behavior detection method, device and storage medium
CN113314103B (en) * 2021-05-31 2023-03-03 中国工商银行股份有限公司 Illegal information identification method and device based on real-time speech emotion analysis


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105869630B (en) * 2016-06-27 2019-08-02 上海交通大学 Speaker's voice spoofing attack detection method and system based on deep learning
US10810510B2 (en) * 2017-02-17 2020-10-20 International Business Machines Corporation Conversation and context aware fraud and abuse prevention agent
CN107919137A (en) * 2017-10-25 2018-04-17 平安普惠企业管理有限公司 The long-range measures and procedures for the examination and approval, device, equipment and readable storage medium storing program for executing

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109243492A (en) * 2018-10-28 2019-01-18 国家计算机网络与信息安全管理中心 A kind of speech emotion recognition system and recognition methods
CN109688273A (en) * 2019-03-04 2019-04-26 上海卓易科技股份有限公司 A kind of based reminding method, device, terminal and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Survey of Speech Emotion Feature Extraction and Its Dimensionality Reduction Methods; Liu Zhentao et al.; Chinese Journal of Computers (计算机学报); 2018-12-31; Vol. 41, No. 12; pp. 2833-2851 *

Also Published As

Publication number Publication date
CN110349586A (en) 2019-10-18

Similar Documents

Publication Publication Date Title
CN110349586B (en) Telecommunication fraud detection method and device
US10142461B2 (en) Multi-party conversation analyzer and logger
US9659564B2 (en) Speaker verification based on acoustic behavioral characteristics of the speaker
US8825479B2 (en) System and method for recognizing emotional state from a speech signal
US9621713B1 (en) Identical conversation detection method and apparatus
Aloufi et al. Emotionless: Privacy-preserving speech analysis for voice assistants
US20110082874A1 (en) Multi-party conversation analyzer & logger
US20120290297A1 (en) Speaker Liveness Detection
CN104538043A (en) Real-time emotion reminder for call
CN102324232A (en) Method for recognizing sound-groove and system based on gauss hybrid models
US20200184979A1 (en) Systems and methods to determine that a speaker is human using a signal to the speaker
KR101795593B1 (en) Device and method for protecting phone counselor
CN108010513B (en) Voice processing method and device
CN108694958A (en) A kind of security alarm method and device
Gallardo Human and automatic speaker recognition over telecommunication channels
Abdallah et al. Text-independent speaker identification using hidden Markov model
US20210118464A1 (en) Method and apparatus for emotion recognition from speech
CN109473102A (en) A kind of robot secretary intelligent meeting recording method and system
CN110517697A (en) Prompt tone intelligence cutting-off device for interactive voice response
JP6268916B2 (en) Abnormal conversation detection apparatus, abnormal conversation detection method, and abnormal conversation detection computer program
Chang et al. Voice phishing detection technique based on minimum classification error method incorporating codec parameters
Reynolds et al. Automatic speaker recognition
KR20110079161A (en) Method and apparatus for verifying speaker in mobile terminal
RU2802533C1 (en) Method and system for analysis of voice calls for social engineering detection and prevention using voice bot activation
Turner Security and privacy in speaker recognition systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant