WO2019037205A1 - Voice fraud identification method, apparatus, terminal device and storage medium - Google Patents

Voice fraud identification method, apparatus, terminal device and storage medium

Info

Publication number
WO2019037205A1
WO2019037205A1 PCT/CN2017/104891 CN2017104891W WO2019037205A1 WO 2019037205 A1 WO2019037205 A1 WO 2019037205A1 CN 2017104891 W CN2017104891 W CN 2017104891W WO 2019037205 A1 WO2019037205 A1 WO 2019037205A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
feature
voice
lie
verification
Prior art date
Application number
PCT/CN2017/104891
Other languages
English (en)
French (fr)
Inventor
梁浩
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019037205A1

Classifications

    • G - PHYSICS
      • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L17/00 - Speaker identification or verification techniques
            • G10L17/26 - Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
            • G10L17/04 - Training, enrolment or model building
            • G10L17/16 - Hidden Markov models [HMM]
            • G10L17/18 - Artificial neural networks; Connectionist approaches
    • H - ELECTRICITY
      • H04 - ELECTRIC COMMUNICATION TECHNIQUE
        • H04M - TELEPHONIC COMMUNICATION
          • H04M2203/00 - Aspects of automatic or semi-automatic exchanges
            • H04M2203/60 - Aspects of automatic or semi-automatic exchanges related to security aspects in telephonic communication systems
              • H04M2203/6027 - Fraud preventions
              • H04M2203/6045 - Identity confirmation

Definitions

  • The present application relates to the field of voice processing, and in particular, to a voice fraud identification method, apparatus, terminal device, and storage medium.
  • Anti-fraud services are used to identify malicious users committing fraud, so as to address the fraud threats encountered in payment, lending, wealth management, risk control, and other business links and to reduce losses.
  • An anti-fraud service is a service that identifies fraudulent acts such as transaction fraud, online fraud, telephone fraud, and stolen-card misuse.
  • At present, financial institutions employ quality inspectors to monitor and review the content of calls between service personnel and customers, and determine whether a customer is committing fraud by judging whether the customer is lying, thereby achieving the anti-fraud purpose.
  • However, manually inspecting the content of customer calls to identify fraud is inefficient, requires professional quality inspectors, and incurs high labor costs.
  • The embodiments of the present application provide a voice fraud identification method, device, terminal device, and storage medium, so as to solve the problem of low efficiency and high labor cost when fraud is identified by manual quality inspection.
  • An embodiment of the present application provides a voice fraud identification method, including: acquiring voice information to be tested; performing feature extraction on the voice information to be tested to acquire voice features; performing identity verification on the voice features by using an identity confirmation model to acquire identity verification information; performing lie verification on the voice features by using a lie monitoring model to acquire lie verification information; and acquiring a fraud risk assessment result based on the identity verification information and the lie verification information.
  • a voice fraud identification apparatus including:
  • the voice acquisition module to be tested is used to obtain voice information to be tested
  • a voice feature acquiring module configured to perform feature extraction on the voice information to be tested, and acquire a voice feature
  • An authentication obtaining module configured to perform identity verification on the voice feature by using an identity confirmation model, and obtain identity verification information
  • a lie verification obtaining module configured to perform lie verification on the voice feature by using a lie monitoring model, and obtain lie verification information
  • the fraud risk assessment module is configured to obtain a fraud risk assessment result based on the identity verification information and the lie verification information.
  • An embodiment of the present application provides a terminal device, including a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer readable instructions: acquiring voice information to be tested; performing feature extraction on the voice information to be tested to acquire voice features; performing identity verification on the voice features by using an identity confirmation model to acquire identity verification information; performing lie verification on the voice features by using a lie monitoring model to acquire lie verification information; and acquiring a fraud risk assessment result based on the identity verification information and the lie verification information.
  • An embodiment of the present application provides a computer readable storage medium storing computer readable instructions, where the computer readable instructions, when executed by a processor, implement the following steps: acquiring voice information to be tested; performing feature extraction on the voice information to be tested to acquire voice features; performing identity verification on the voice features by using an identity confirmation model to acquire identity verification information; performing lie verification on the voice features by using a lie monitoring model to acquire lie verification information; and acquiring a fraud risk assessment result based on the identity verification information and the lie verification information.
  • In the voice fraud identification method, device, terminal device, and storage medium provided by the embodiments of the present application, feature extraction is first performed on the voice information to be tested to obtain voice features; the voice features are then verified by the identity confirmation model and the lie monitoring model respectively, and the fraud risk assessment result is obtained based on the identity verification information and the lie verification information.
  • In this way, the voice information to be tested can be intelligently identified to obtain the fraud risk assessment result; the process is efficient and requires no manual intervention, which helps to save labor costs.
  • FIG. 1 is a flowchart of a voice fraud identification method in Embodiment 1 of the present application.
  • FIG. 2 is a specific schematic diagram of step S30 of FIG. 1.
  • FIG. 3 is a specific schematic diagram of step S31 of FIG. 2.
  • FIG. 4 is a specific schematic diagram of step S34 of FIG. 2.
  • FIG. 5 is a specific schematic diagram of step S40 of FIG. 1.
  • FIG. 6 is a specific schematic diagram of step S50 of FIG. 1.
  • FIG. 7 is another flowchart of the voice fraud identification method in Embodiment 1 of the present application.
  • FIG. 8 is a schematic diagram of a voice fraud recognition apparatus in Embodiment 2 of the present application.
  • FIG. 9 is a schematic diagram of a terminal device in Embodiment 4 of the present application.
  • Fig. 1 is a flow chart showing a method of voice fraud recognition in this embodiment.
  • The voice fraud identification method is applied to a terminal device of a financial institution such as a bank, securities firm, insurance company, or P2P lender, or of another institution that needs to perform voice fraud identification, and is configured to intelligently recognize a speaker's voice information to be tested so as to determine whether the speaker is committing fraud.
  • the voice fraud identification method includes the following steps:
  • the voice information to be tested is the voice information of the speaker collected by the terminal device.
  • the voice information to be tested may be voice information in wav, mp3 or other format. It can be understood that each voice information to be tested is associated with a user ID, which is an identifier of a speaker for uniquely identifying the voice information to be tested.
  • The agent or other staff member guides the speaker, according to preset questions, to reply with identity information related to the speaker, so that the voice information to be tested includes the speaker's identity information.
  • Alternatively, a robot recording is used to guide the speaker to reply with identity information related to the speaker, so that the voice information to be tested includes the speaker's identity information.
  • the identity information includes, but is not limited to, information related to the user such as name, age, ID number, contact number, address, and work unit in the embodiment.
  • S20 Feature extraction of the voice information to be measured, and acquiring voice features.
  • Because the identity confirmation model in step S30 and the lie monitoring model in step S40 both process voice features rather than the raw voice information, feature extraction needs to be performed in advance to obtain the voice features used by the identity confirmation model and the lie monitoring model.
  • Speech features include, but are not limited to, prosodic features, phonological features, spectral features, lexical features, and voiceprint features.
  • The prosodic feature, also known as the supra-sound-quality feature or the suprasegmental feature, refers to changes in pitch, duration, and sound intensity in speech, apart from the sound quality features.
  • the prosodic features include, but are not limited to, the pitch frequency, the pronunciation duration, the pronunciation amplitude, and the pronunciation rate in the present embodiment.
  • Sound quality features include, but are not limited to, formants F1-F3, band energy distribution, harmonic signal to noise ratio, and short-term energy jitter in this embodiment.
  • Spectral features, also known as vibration spectrum features, are obtained by decomposing a complex oscillation into resonant waveforms of different amplitudes and frequencies and arranging the amplitudes of these resonant waveforms by frequency.
  • the spectral features are combined with prosodic features and sound quality features to improve the anti-noise effect of the characteristic parameters.
  • In this embodiment, the spectral features are Mel-Frequency Cepstral Coefficients (MFCC), which can reflect the auditory characteristics of the human ear.
  • the vocabulary feature is a part of speech feature for embodying words in the speech data to be tested, including but not limited to positive words and negative words in the embodiment.
  • the part-of-speech feature is combined with other phonetic features to facilitate the recognition of the speaker's emotion corresponding to the speech data to be tested.
  • The voiceprint feature (i.e., the i-vector feature) is a speaker-related feature that, combined with other phonetic features, can more effectively improve recognition accuracy in the speech recognition process.
  • The feature extraction on the voice information to be tested includes pre-emphasis, framing, windowing, endpoint detection, fast Fourier transform, Mel filter bank processing, and discrete cosine transform, so as to obtain the voice features.
  • Pre-emphasis passes the signal through a first-order high-pass filter, y(n) = x(n) - μ·x(n-1), where the value of μ is between 0.9 and 1.0; 0.96 is typically used.
  • The purpose of pre-emphasis is to boost the high-frequency part and flatten the spectrum of the signal, so that the spectrum can be obtained with the same signal-to-noise ratio over the entire band from low frequency to high frequency, highlighting the high-frequency formants.
  • Framing collects N sample points into one observation unit, called a frame.
  • The value of N is typically 256 or 512, covering about 20-30 ms of speech.
  • To avoid excessive variation between two adjacent frames, adjacent frames overlap; the overlapping area contains M sampling points, where M is usually about 1/2 or 1/3 of N. This process is called framing.
  • Windowing multiplies each frame by a Hamming window. Since the amplitude-frequency characteristic of the Hamming window has large side-lobe attenuation, windowing increases the continuity between the left and right ends of each frame; the framing and windowing processes convert the non-stationary speech signal into short-term stationary signals.
  • Endpoint detection is mainly used to distinguish between speech and noise and to extract valid speech parts.
  • the energy value is calculated, and the voice part and the noise part are distinguished according to the energy value, and an effective voice part is extracted therefrom.
  • The fast Fourier transform converts the time-domain signal into a frequency-domain energy spectrum for analysis. Since the characteristics of a signal are usually difficult to see in the time domain, the signal is converted into an energy distribution in the frequency domain for observation, and different energy distributions represent the characteristics of different speech. Therefore, after multiplication by the Hamming window, a fast Fourier transform is performed on each frame of the signal to obtain the spectrum (i.e., energy spectrum) of each frame.
  • The Mel filter bank is used to smooth the spectrum and eliminate the effect of harmonics, which highlights the formant characteristics of the speech and reduces the amount of calculation. The logarithmic energy of each triangular filter output in the Mel filter bank is then calculated as s(m) = ln( Σ_k |X(k)|² · H_m(k) ), m = 1, 2, ..., M, where M is the number of triangular filters, X(k) is the frame spectrum, and H_m(k) is the frequency response of the m-th triangular filter.
  • A discrete cosine transform is performed on the logarithmic energies output from the Mel filter bank to obtain the Mel Frequency Cepstrum Coefficients (MFCC).
  • The discrete cosine transform (DCT) is calculated as C(n) = Σ_{m=1}^{M} s(m) · cos( πn(m - 0.5)/M ), n = 1, 2, ..., L, where M is the number of triangular filters and L is the order of the MFCC coefficients, usually taken as 12-16. The logarithmic energies are substituted into the discrete cosine transform to obtain the L-order Mel-scale cepstrum parameters, and the voice features are obtained based on the Mel cepstrum coefficients; specifically, the voice features may be a voice feature sequence.
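  • The following is a minimal Python/NumPy sketch of the feature extraction pipeline described above (pre-emphasis, framing, Hamming windowing, FFT, Mel filter bank, and DCT; endpoint detection is omitted for brevity). The sample rate, frame length, overlap, filter count, and MFCC order are illustrative assumptions, not values fixed by this application.

        import numpy as np
        from scipy.fftpack import dct  # discrete cosine transform

        def mfcc(signal, sample_rate=16000, frame_len=512, frame_shift=256,
                 n_filters=26, n_ceps=13, mu=0.96):
            # Pre-emphasis: y(n) = x(n) - mu * x(n-1)
            emphasized = np.append(signal[0], signal[1:] - mu * signal[:-1])

            # Framing: N samples per frame with roughly 1/2 overlap
            n_frames = 1 + max(0, (len(emphasized) - frame_len) // frame_shift)
            frames = np.stack([emphasized[i * frame_shift: i * frame_shift + frame_len]
                               for i in range(n_frames)])

            # Windowing with a Hamming window
            frames = frames * np.hamming(frame_len)

            # FFT -> per-frame power spectrum (energy distribution)
            mag = np.abs(np.fft.rfft(frames, frame_len))
            power = (mag ** 2) / frame_len

            # Mel filter bank: triangular filters evenly spaced on the Mel scale
            low_mel, high_mel = 0.0, 2595 * np.log10(1 + (sample_rate / 2) / 700)
            mel_points = np.linspace(low_mel, high_mel, n_filters + 2)
            hz_points = 700 * (10 ** (mel_points / 2595) - 1)
            bins = np.floor((frame_len + 1) * hz_points / sample_rate).astype(int)
            fbank = np.zeros((n_filters, frame_len // 2 + 1))
            for m in range(1, n_filters + 1):
                left, center, right = bins[m - 1], bins[m], bins[m + 1]
                for k in range(left, center):
                    fbank[m - 1, k] = (k - left) / max(center - left, 1)
                for k in range(center, right):
                    fbank[m - 1, k] = (right - k) / max(right - center, 1)

            # Logarithmic energy of each filter output, then DCT to get the MFCCs
            filter_energy = np.dot(power, fbank.T)
            filter_energy = np.where(filter_energy == 0, np.finfo(float).eps, filter_energy)
            log_energy = np.log(filter_energy)
            return dct(log_energy, type=2, axis=1, norm='ortho')[:, :n_ceps]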
  • the identity verification model is used to authenticate the voice feature to obtain identity verification information.
  • the identity confirmation model is a model pre-trained in the organization for identity verification.
  • the identity confirmation model includes a pre-set user information repository in which user information associated with the user ID is stored.
  • The voice information to be tested acquired by the terminal device includes the identity information associated with the user ID; the user information base is then queried based on the user ID to obtain the corresponding standard identity information, and the identification identity information is compared with the standard identity information for identity verification, so as to obtain the identity verification information.
  • the standard identity information is identity information stored by the user in the user information base, and the standard identity information is associated with the user ID.
  • step S30 the identity verification model is used to perform identity verification on the voice feature, and the identity verification information is obtained, which specifically includes the following steps:
  • S31 Perform speech recognition on the speech feature by using a speech recognition model to obtain target text information.
  • the speech recognition model includes pre-trained acoustic models and language models.
  • the acoustic model is used to process the correspondence between the speech features and the words, that is, the relationship for processing which word corresponds to each of the tones.
  • the language model is used to deal with the correspondence between words and words, that is, how to combine to form a reasonable sentence output.
  • In step S31, the speech recognition model is used to perform speech recognition on the speech features, and acquiring the target text information specifically includes the following steps:
  • S311 The speech feature is identified by a single phoneme training model to obtain a single phoneme feature sequence.
  • the monophone training model is a model for converting a speech feature sequence into a phoneme feature sequence.
  • the voice feature acquired by performing feature extraction on the voice information to be measured in step S20 is specifically a voice feature sequence.
  • the monophone training model is a model that is pre-trained by the system and stored in the database for direct invocation when in use. Since the training process of the single phoneme training model is based on the phoneme level training, the main consideration is the maximum posterior probability of each frame in the sentence, which can effectively improve the accuracy of voice fraud recognition. It can be understood that the single phoneme training model is the first link using acoustic model recognition, which can convert the frame level based recognition into the phoneme level based recognition, and improve the recognition accuracy.
  • the monophone training model is specifically a monophonic hybrid Gaussian Model-Hidden Markov Model (hereinafter referred to as a monophone GMM-HMM model).
  • The Hidden Markov Model (HMM) is a doubly stochastic process: a hidden Markov chain with a certain number of states together with a set of observable random functions. It is a state-level training model.
  • the training process of the monophone GMM-HMM model includes the initial iteration and the multiple iteration process. Through the initial iterative training and the multiple iteration training, the trained monophone GMM-HMM model can more accurately identify the monophone feature sequence.
  • In the initial iteration of the monophone GMM-HMM model, a rough calculation is performed on a small number of speech feature sequences to obtain the mean and variance, and the initial monophone GMM-HMM model is obtained. Then, based on the initial monophone GMM-HMM model, each frame of the speech feature sequence is labeled with its corresponding initial monophone; that is, each word corresponding to the speech feature sequence is replaced with its phoneme expression through the pronunciation dictionary to obtain the initial monophone annotation. Since each word is expanded only into its own pronunciation, it is called a monophone (i.e., a single phoneme).
  • each iteration needs to train the extracted speech feature sequence and the initial monophone annotation obtained in the previous iteration to obtain the target single phoneme GMM-HMM model. Then, the ground truth is used to identify the correct pronunciation of each word, save as the target single phoneme label corresponding to the next iteration, and perform alignment processing according to the start and end time of the phoneme to obtain the target single phoneme feature.
  • Using the aligned data as the text data trained by the acoustic model is beneficial to ensure the accuracy of subsequent speech recognition.
  • The multiple iterations generally require 20-30 iterations, which avoids excessively many iterations leading to a long training time, and also avoids too few iterations affecting the accuracy of the obtained monophone feature sequence.
  • a single phoneme feature sequence is obtained based on all target monophone features to perform acoustic model training based on the phoneme feature sequence, thereby improving the accuracy of speech fraud recognition.
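  • As an illustrative sketch only (the application does not name a toolkit), a GMM-HMM of the kind described above could be fitted to MFCC feature sequences with the third-party hmmlearn library; the state count, mixture count, and toy data below are assumptions made for this example, not details of the application.

        import numpy as np
        from hmmlearn.hmm import GMMHMM  # assumed toolkit choice for this sketch

        def train_monophone_model(feature_sequences):
            # feature_sequences: list of (n_frames, n_mfcc) arrays for one phoneme.
            X = np.vstack(feature_sequences)
            lengths = [len(seq) for seq in feature_sequences]
            # 3 emitting states and 4 Gaussian mixtures are illustrative values.
            model = GMMHMM(n_components=3, n_mix=4, covariance_type="diag", n_iter=20)
            model.fit(X, lengths)   # EM iterations over the pooled frames
            return model

        def score_phoneme(models, features):
            # Pick the phoneme whose model gives the highest log-likelihood.
            return max(models, key=lambda label: models[label].score(features))

        # Toy usage with random "MFCC" frames standing in for aligned segments.
        rng = np.random.default_rng(0)
        segments = [rng.normal(size=(30, 13)) for _ in range(5)]
        models = {"a": train_monophone_model(segments)}
        print(score_phoneme(models, segments[0]))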
  • S312: The monophone feature sequence is identified by using a triphone training model to obtain a triphone feature sequence.
  • the triphone training model is a model for converting a monophone feature sequence into a triphone feature sequence.
  • the single phoneme feature sequence outputted in step S311 is identified, and the triphone feature sequence is obtained, so that the acquired triphone feature sequence fully considers the context phoneme feature, thereby further improving the accuracy of the speech fraud recognition. It is avoided that the single phoneme feature sequence acquired in step S311 does not consider its context phoneme feature, resulting in a problem of low recognition accuracy.
  • the triphone training model is the second link of acoustic model recognition, which can fully consider the context phoneme in the phoneme recognition process to improve the recognition accuracy.
  • Specifically, the triphone training model is a triphone mixture Gaussian Model-Hidden Markov Model (hereinafter referred to as the triphone GMM-HMM model). That is, the triphone GMM-HMM model is used to identify the monophone feature sequence and obtain the triphone feature sequence, so that the acquired triphone feature sequence, combined with its context phoneme features, helps improve the accuracy of speech fraud recognition.
  • the training process of the triphone GMM-HMM model includes the initial iteration and the multiple iteration process. Through the initial iteration and the multiple iteration training, the trained triphone GMM-HMM model can accurately identify the triphone feature sequence.
  • In the initial iteration of the triphone GMM-HMM model, the initial triphone annotation is obtained by adding the left and right context phonemes to each target monophone feature in the monophone feature sequence. Then, the obtained initial triphone annotation is input into the target monophone GMM-HMM model acquired in the iterative process of step S311 to obtain the initial triphone GMM-HMM model, so that the initial triphone GMM-HMM model can be trained based on triphones to improve the accuracy of the training.
  • the decision tree algorithm is used to cluster the initial triphones with similar pronunciations in the initial triphone GMM-HMM model to obtain the clustered triphone GMM-HMM model to improve the efficiency and accuracy of speech fraud recognition.
  • the initial triphone labeling with similar pronunciations obtained by the initial triphone GMM-HMM model is clustered, and each clustering result is called a Senone.
  • A Senone is originally a three-state HMM, so each HMM requires at least three frames to express. In this embodiment, each HMM can be expressed with one frame by considering only the first frame (i.e., the first state) of each phoneme and setting the remaining states to null, so that one HMM can represent a, ab, or abb.
  • the updated monophone feature sequences obtained by the triphone GMM-HMM model are used for acoustic model training to increase the accuracy of speech fraud recognition.
  • each iteration needs to train the extracted speech feature sequence and the initial triphone annotation obtained in the previous iteration to obtain the target triphone model. Then, the ground truth is used to identify the correct pronunciation of each word, save as the target triphone annotation corresponding to the next iteration, and perform alignment processing according to the start and end time of the phoneme to obtain the target triphone feature.
  • Using the aligned data as the text data trained by the acoustic model is beneficial to ensure the accuracy of subsequent speech recognition.
  • The multiple iterations generally require 20-30 iterations, which avoids excessively many iterations leading to a long training time, and also avoids too few iterations affecting the accuracy of the acquired triphone feature sequence.
  • a triphone feature sequence is obtained based on all target triphone features to perform acoustic model training based on the phoneme feature sequence, thereby improving the accuracy of speech fraud recognition.
  • S313: The triphone feature sequence is identified by using a long short-term memory (LSTM) recurrent neural network model to obtain initial text information.
  • The long short-term memory (LSTM) network is a time-recurrent neural network model suitable for processing and predicting important events with relatively long intervals and delays in a time series.
  • Because the LSTM model has time memory units, it is used here to process speech information.
  • the LSTM model structure has three layers, each layer contains 1024 neurons, and its output is a Softmax (regression model) for classifying and outputting the corresponding word pronunciation.
  • Softmax is a classification function commonly used in neural networks. It maps the outputs of multiple neurons into the interval [0, 1], which can be interpreted as probabilities, and is simple and convenient to calculate, so that multi-class outputs can be produced. It can be understood that the LSTM model is the last link of the acoustic model recognition, and its recognition process is simple, convenient, and highly accurate.
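  • A minimal PyTorch sketch of a three-layer LSTM with 1024 units per layer and a softmax output, as described above, is shown below; PyTorch itself, the input feature dimension, and the output vocabulary size are assumptions made for illustration rather than details given in this application.

        import torch
        import torch.nn as nn

        class LstmAcousticModel(nn.Module):
            # 3 stacked LSTM layers, 1024 neurons each, softmax output per frame.
            def __init__(self, feat_dim=43, num_classes=2000):  # dims are illustrative
                super().__init__()
                self.lstm = nn.LSTM(input_size=feat_dim, hidden_size=1024,
                                    num_layers=3, batch_first=True)
                self.out = nn.Linear(1024, num_classes)

            def forward(self, x):                # x: (batch, frames, feat_dim)
                h, _ = self.lstm(x)
                return torch.softmax(self.out(h), dim=-1)  # per-frame class probabilities

        model = LstmAcousticModel()
        probs = model(torch.randn(2, 50, 43))    # -> (2, 50, 2000); each row sums to 1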
  • In this embodiment, word-level sequence training is integrated into the phone-level LSTM model to achieve fusion training of the two and ensure the fitting effect.
  • To integrate word-level sequence training into the phone-level LSTM model, constraints such as the cross-entropy training criterion, the L2-norm training criterion, and the Leaky HMM (leaky bucket-hidden Markov model) training criterion are required, so as to achieve fusion training of the two and obtain the target acoustic model.
  • the cross-entropy training criterion is a regular training criterion in neural network model training.
  • the L2-norm training criterion is an additional constraint to integrate word-level sequence training into the phone-level LSTM model to achieve fusion training between the two.
  • The L2-norm training criterion is as follows: J(θ) = L(θ) + λ·Ω(θ), where L(θ) is the difference between the output of the neural network nodes and the ground truth; the smaller this error, the better the trained target acoustic model fits the training speech signal. At the same time, in order to prevent over-fitting, so that the target acoustic model obtained by training expresses any test data well, the regular term Ω(θ) needs to be added. In the L2-norm training criterion, the regular term is expressed as Ω(θ) = ||θ||², i.e., the sum of the squares of the model parameters, and λ is the regularization coefficient.
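  • In training code, the criterion J(θ) = L(θ) + λ·Ω(θ) described above can be expressed, for example, as a cross-entropy data term plus an L2 penalty on the network weights; the λ value below is only an illustrative assumption.

        import torch

        def regularized_loss(logits, targets, parameters, lam=1e-4):
            # L(theta): cross-entropy between network outputs and ground-truth labels.
            data_loss = torch.nn.functional.cross_entropy(logits, targets)
            # Omega(theta): L2-norm regular term, the sum of squared weights.
            l2_term = sum((p ** 2).sum() for p in parameters)
            return data_loss + lam * l2_term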
  • the Leaky HMM training guidelines are additional constraints for incorporating word-level sequence training into the phone-level LSTM model.
  • The Leaky HMM training criterion is a new neural network training criterion that allows the single-state HMM constructed in this embodiment to emulate an LSTM acoustic model built on a normal three-state HMM.
  • The traditional three-state HMM has at least three transition probabilities, whereas the HMM used in this embodiment is single-state.
  • In the Leaky HMM training criterion, the transition probability of this single state is continuously updated, so that word-level sequence training is integrated into the phone-level LSTM model.
  • S314 Identify the initial text information by using a language model, and obtain target text information.
  • steps S311-S313 are processes for identifying a voice feature by using an acoustic model to obtain initial text information, and the initial text information is mainly represented by a correspondence between a voice feature and a word, and does not consider a word-to-word relationship.
  • the initial text information is identified by using a language model, so that the acquired target text information not only takes into account the correspondence between the speech features and the words, but also considers the correspondence between words and words.
  • The language model is specifically the language model toolkit SRILM. SRILM is used to build and apply statistical language models, mainly for speech recognition, statistical tagging and segmentation, and machine translation, and runs on UNIX and Windows platforms.
  • S32 Perform keyword extraction on the target text information to obtain identification information.
  • The identification identity information is the speaker identity information obtained by extracting keywords from the target text information formed from the voice information to be tested. Because the speaker is guided to reply with information related to his or her identity during the collection of the voice information to be tested, the identification identity information obtained by keyword extraction from the target text information includes the speaker identity information.
  • the speaker identity information includes, but is not limited to, information related to the user such as name, age, ID number, contact number, address, and work unit acquired during the voice information collection process to be tested.
  • the identity confirmation model further includes a preset keyword library for storing a preset question keyword that guides the speaker to reply to the speaker-related identity information.
  • Each speaker has a corresponding keyword library, and each keyword library is associated with a user ID, which is an identifier for uniquely identifying the speaker's keyword library.
  • the preset question keyword has a one-to-one correspondence with the speaker's reply.
  • A text preprocessing algorithm is first used to preprocess the target text information; the text preprocessing algorithm includes at least one of traditional/simplified Chinese conversion, case unification, Chinese word segmentation, and stop-word removal.
  • Chinese Word Segmentation refers to the division of a sequence of Chinese characters into a single word.
  • Stop words are words or characters that are automatically filtered out when processing natural language data, such as English characters, numbers, numeric strings, punctuation and other symbols, and single Chinese characters with extremely high frequency of use.
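  • A minimal sketch of this preprocessing step (case unification, Chinese word segmentation, and stop-word removal) is given below; the third-party jieba segmenter and the tiny stop-word list are assumptions for illustration, not tools named by this application.

        import jieba  # third-party Chinese word segmentation library, assumed here

        STOP_WORDS = {"的", "了", "和", "是"}  # illustrative stop-word list

        def preprocess(text):
            text = text.lower()                  # unify case for any Latin characters
            tokens = jieba.lcut(text)            # Chinese word segmentation
            return [t for t in tokens if t.strip() and t not in STOP_WORDS]

        print(preprocess("我的名字是张三"))       # e.g. ['我', '名字', '张三']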
  • Question keyword matching is then performed on the preprocessed target text information based on the preset question keywords in the keyword library; that is, the preset keywords are searched for in the text, and the target text information of the speaker's reply corresponding to a successfully matched question keyword is taken as the identification identity information.
  • In this embodiment, the KMP (Knuth-Morris-Pratt) algorithm is used for matching. The KMP algorithm is an improved string matching algorithm; its key idea is to use the information from failed matches to minimize the number of comparisons between the pattern string and the main string, thereby achieving fast matching.
  • the KMP algorithm is selected for keyword extraction, which saves time and improves the efficiency of voice fraud recognition.
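  • A compact implementation of the KMP matching idea (reusing information from failed matches via a prefix table) might look like the following sketch; the sample text and pattern are illustrative only.

        def kmp_search(text, pattern):
            """Return the index of the first occurrence of pattern in text, or -1."""
            if not pattern:
                return 0
            # Prefix table: length of the longest proper prefix that is also a suffix.
            fail = [0] * len(pattern)
            k = 0
            for i in range(1, len(pattern)):
                while k and pattern[i] != pattern[k]:
                    k = fail[k - 1]
                if pattern[i] == pattern[k]:
                    k += 1
                fail[i] = k
            # Scan the text, never re-examining already matched pattern characters.
            k = 0
            for i, ch in enumerate(text):
                while k and ch != pattern[k]:
                    k = fail[k - 1]
                if ch == pattern[k]:
                    k += 1
                if k == len(pattern):
                    return i - k + 1
            return -1

        print(kmp_search("请问您的身份证号码是多少", "身份证号码"))  # -> 4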
  • In another embodiment, the keyword extraction may also use a Garbage-Hidden Markov Model (Garbage-HMM model).
  • The Garbage-Hidden Markov Model is a common model for keyword recognition.
  • the process of keyword extraction is mainly to identify the keyword to obtain the target keyword information, that is, the identification identity information.
  • The Hidden Markov Model is a common method for speaker-independent keyword recognition in continuous speech. Speaker-independent speech recognition is not a recognition technology aimed at a designated speaker; a garbage model is used to "absorb" non-keywords. It can be understood that keyword recognition can regard training as a combination of keywords and non-keywords, that is, the training speech is divided into two parts: keywords and non-keywords.
  • Each keyword corresponds to a keyword model
  • each non-keyword corresponds to a non-keyword model.
  • Non-keywords are represented by M garbage models (Garbage)
  • keywords are represented by N keyword models.
  • The training process of the garbage-hidden Markov model includes: acquiring training speech, extracting features of the training speech to obtain training speech feature sequences, and then training the initial keyword model and the initial garbage model respectively based on the acquired training speech feature sequences to obtain the target keyword model and the target garbage model; based on the target keyword model and the target garbage model, the global hidden Markov model (i.e., the garbage-hidden Markov model) is obtained.
  • The speech features acquired in step S20 are decoded by using the global hidden Markov model to obtain a hidden state sequence.
  • The Viterbi algorithm is used to find the best state path. If the best state path contains a subsequence such that each state in the subsequence corresponds to a state in a certain keyword model, then the speech feature sequence corresponding to that subsequence is regarded as the initial keyword information to be identified.
  • The initial keyword information is then identified by using the language model to obtain the target keyword information, that is, the identification identity information.
  • The Viterbi algorithm is a dynamic programming algorithm generally used for sequence decoding. Understandably, each point in the sequence has a state, and the purpose of the Viterbi algorithm is to find the state of each point so that the decoding result of the whole sequence is globally optimal. Using the Viterbi algorithm to find the hidden state sequence is efficient and reduces computational complexity.
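  • The following is a generic Viterbi decoding sketch for an HMM, showing how a globally optimal hidden state path is recovered by dynamic programming; the tiny transition and emission tables are illustrative only and are not parameters from this application.

        import numpy as np

        def viterbi(obs, start_p, trans_p, emit_p):
            """Return the most probable hidden state path for an observation sequence."""
            n_states, T = len(start_p), len(obs)
            delta = np.zeros((T, n_states))           # best path probability per state
            psi = np.zeros((T, n_states), dtype=int)  # back-pointers
            delta[0] = start_p * emit_p[:, obs[0]]
            for t in range(1, T):
                for j in range(n_states):
                    scores = delta[t - 1] * trans_p[:, j]
                    psi[t, j] = np.argmax(scores)
                    delta[t, j] = scores[psi[t, j]] * emit_p[j, obs[t]]
            # Backtrack from the best final state.
            path = [int(np.argmax(delta[-1]))]
            for t in range(T - 1, 0, -1):
                path.append(int(psi[t, path[-1]]))
            return path[::-1]

        # Toy 2-state example (e.g. state 0 = garbage/non-keyword, state 1 = keyword).
        start = np.array([0.6, 0.4])
        trans = np.array([[0.7, 0.3], [0.4, 0.6]])
        emit = np.array([[0.9, 0.1], [0.2, 0.8]])
        print(viterbi([0, 1, 1], start, trans, emit))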
  • In this embodiment, the keyword extraction algorithm identifies the voice features acquired in step S20 and extracts the keyword information directly through the garbage-HMM model, without first recognizing the entire voice features into full text, thereby saving extraction time and making voice fraud recognition more efficient.
  • Standard identity information associated with the user ID is stored in advance in the user information base.
  • After the terminal device of the organization obtains the voice information to be tested associated with the user ID, the user information database may be queried based on the user ID to obtain the corresponding standard identity information.
  • Specifically, the user information database may be a MySQL database, and a query statement may be used, with the user ID as the query field, to obtain the standard identity information corresponding to the user ID.
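  • As a sketch (using SQLite in place of the MySQL database mentioned above, and an assumed table layout with illustrative data), the standard identity information can be retrieved with a query that uses the user ID as the query field.

        import sqlite3  # stand-in for the MySQL user information database

        def get_standard_identity(conn, user_id):
            # Assumed table: user_info(user_id, name, id_number, phone, address)
            cur = conn.execute(
                "SELECT name, id_number, phone, address FROM user_info WHERE user_id = ?",
                (user_id,))
            return cur.fetchone()  # None if the user ID is not in the information base

        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE user_info (user_id TEXT, name TEXT, id_number TEXT,"
                     " phone TEXT, address TEXT)")
        conn.execute("INSERT INTO user_info VALUES ('u001', '张三', '110101...',"
                     " '138...', '深圳市')")
        print(get_standard_identity(conn, "u001"))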
  • S34 Acquire identity verification information based on identifying identity information and standard identity information.
  • the identification identity information is compared with the standard identity information, and it is determined whether the identification identity information and the standard identity information correspond to the same speaker, so as to output corresponding identity verification information.
  • If the identification identity information and the standard identity information correspond to the same speaker, the acquired identity verification information is low fraud risk information; correspondingly, if the identification identity information and the standard identity information do not correspond to the same speaker, the acquired identity verification information is high fraud risk information.
  • In other embodiments, the identity verification information may also be output as a probability value that the identification identity information and the standard identity information correspond to the same speaker.
  • step S34 based on the identification identity information and the standard identity information, obtaining the identity verification information specifically includes the following steps:
  • S341 Calculate the identity similarity between the identification identity information and the standard identity information.
  • Specifically, the identification identity information may be compared with the standard identity information obtained from the user information base, the number of identical items may be divided by the total number of items of identity information, and the resulting ratio is taken as the identity similarity.
  • Alternatively, the Euclidean distance between the identification identity information and the standard identity information can be calculated to obtain the corresponding identity similarity.
  • The Euclidean distance, also known as the Euclidean metric, refers to the true distance between two points in m-dimensional space, or the natural length of a vector (that is, the distance from the point to the origin).
  • Specifically, the identification identity information can be represented by a vector a = (Xi1, Xi2, ..., Xin) and the standard identity information by a vector b = (Xj1, Xj2, ..., Xjn); the Euclidean distance between the two is d(a, b) = sqrt( Σ_{k=1}^{n} (Xik - Xjk)² ), and a smaller distance corresponds to a higher identity similarity.
  • S342 Compare the identity similarity with a preset similarity threshold to obtain identity verification information.
  • The preset similarity threshold is a threshold preset for evaluating whether the two pieces of identity information correspond to the same speaker.
  • the authentication information is the result of the verification of the authentication.
  • the authentication information may include low fraud risk information and high fraud risk information, and may also include other information.
  • For example, the preset similarity threshold may be set to 0.5; that is, if the identity similarity acquired in step S341 is greater than 0.5, the acquired identity verification information is low fraud risk information; otherwise, if the identity similarity acquired in step S341 is not greater than 0.5, the acquired identity verification information is high fraud risk information.
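  • A minimal sketch of steps S341-S342, computing the ratio-based identity similarity and comparing it with the preset threshold of 0.5, is shown below; the field names and sample values are illustrative assumptions.

        def identity_similarity(recognized, standard):
            # Ratio of identical fields to the total number of identity fields.
            matched = sum(1 for k in standard if recognized.get(k) == standard[k])
            return matched / len(standard)

        def verify_identity(recognized, standard, threshold=0.5):
            sim = identity_similarity(recognized, standard)
            return "low fraud risk" if sim > threshold else "high fraud risk"

        recognized = {"name": "张三", "id_number": "110101...", "phone": "139..."}
        standard   = {"name": "张三", "id_number": "110101...", "phone": "138..."}
        print(verify_identity(recognized, standard))  # 2/3 > 0.5 -> "low fraud risk"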
  • The lie monitoring model is used to perform lie verification on the voice features and obtain the lie verification information.
  • the lie monitoring model is a pre-trained model for lie verification in the organization.
  • The lie monitoring model includes a preset lie speech library, in which preset lie detection questions for the relevant services and the corresponding lie speech features (i.e., the lie standard features in this embodiment) are stored.
  • the lie standard features include, but are not limited to, standard features such as speech frequency, utterance duration, amplitude variation, and tone quality features, including, but not limited to, formants and short-term energy jitter.
  • Based on the lie speech library, lie verification can be performed to obtain the lie verification information.
  • In step S40, the lie monitoring model is used to perform lie verification on the voice features, and obtaining the lie verification information specifically includes the following steps:
  • The feature similarity can be calculated by using the Euclidean distance; that is, the voice feature is taken as an n-dimensional vector a = (Xi1, Xi2, ..., Xin) and the standard feature as an n-dimensional vector b = (Xj1, Xj2, ..., Xjn), and the Euclidean distance between the two is d(a, b) = sqrt( Σ_{k=1}^{n} (Xik - Xjk)² ); the smaller the distance, the higher the feature similarity.
  • The standard verification information refers to the verification information corresponding to each standard feature in the lie speech library; the standard verification information may be output in the form of high fraud risk information and low fraud risk information, or in the form of a fraud risk probability.
  • The target feature refers to the standard feature corresponding to the maximum value among the at least two feature similarities obtained in step S41, and the standard verification information corresponding to the target feature is used as the lie verification information.
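  • A sketch of the lie verification lookup follows: the voice feature vector is compared with every standard feature in the lie speech library by Euclidean distance, and the standard verification information of the most similar entry is returned. The library contents and the distance-to-similarity mapping are illustrative assumptions.

        import numpy as np

        def lie_verification(voice_feature, lie_library):
            # lie_library: list of (standard_feature_vector, standard_verification_info)
            best_info, best_similarity = None, -np.inf
            for standard_feature, info in lie_library:
                distance = np.linalg.norm(voice_feature - standard_feature)
                # One possible mapping from distance to similarity (not specified here):
                similarity = 1.0 / (1.0 + distance)
                if similarity > best_similarity:
                    best_info, best_similarity = info, similarity
            return best_info, best_similarity

        library = [(np.array([0.2, 0.5, 0.1]), "high fraud risk"),
                   (np.array([0.9, 0.1, 0.4]), "low fraud risk")]
        print(lie_verification(np.array([0.25, 0.45, 0.15]), library))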
  • The execution order of step S30 and step S40 is not limited; they need not be performed sequentially.
  • S50 Acquire fraud risk assessment results based on authentication information and lie verification information.
  • the dual verification of the identity verification model and the lie verification model makes the obtained fraud risk assessment result more accurate, and can more accurately make the fraud risk assessment judgment and reduce the fraud risk.
  • step S50 based on the identity verification information and the lie verification information, obtaining the fraud risk assessment result specifically includes the following steps:
  • S51 Normalize the authentication information and the lie verification information, and obtain the identity verification standard value and the lie verification standard value.
  • Data normalization scales the data into a small specific interval, removing the unit limitation of the data and converting it into a dimensionless pure value, which is convenient for comparing and weighting indicators of different units or magnitudes.
  • the authentication information and the lie verification information are respectively standardized by using min-max normalization to obtain the identity verification standard value and the lie verification standard value.
  • Min-max normalization, also called deviation normalization, refers to linearly transforming the original data with a conversion function so that the result falls into a preset interval. The conversion function is x' = (x - min) / (max - min) × N, where min is the minimum value of the sample data, max is the maximum value of the sample data, and N is the size of the preset interval. If N is 1, the result of min-max normalization falls within the range [0, 1]; if N is 10, the result falls within the range [0, 10].
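  • A minimal sketch of the min-max normalization described above, mapping a value into the preset interval [0, N], is shown below; the sample values are illustrative.

        def min_max_normalize(x, samples, N=1.0):
            """Map x into [0, N] using the minimum and maximum of the sample data."""
            lo, hi = min(samples), max(samples)
            if hi == lo:                     # degenerate case: all samples identical
                return 0.0
            return (x - lo) / (hi - lo) * N

        print(min_max_normalize(0.7, [0.2, 0.7, 1.2]))         # -> 0.5
        print(min_max_normalize(0.7, [0.2, 0.7, 1.2], N=10))   # -> 5.0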
  • S52 Multiply the authentication standard value and the lie verification standard value by the risk weight respectively, and obtain the authentication risk value and the lie verification risk value.
  • the risk weighting coefficient is preset to obtain the authentication risk value and the lie verification risk value.
  • For example, the risk weighting coefficient of the identity verification may be set to 0.6 and the risk weighting coefficient of the lie verification to 0.4; the identity verification standard value and the lie verification standard value obtained in step S51 are then respectively multiplied by their risk weighting coefficients to obtain the identity verification risk value and the lie verification risk value.
  • S53 Calculate the sum of the authentication risk value and the lie verification risk value, and obtain the fraud risk assessment result.
  • Specifically, the identity verification risk value and the lie verification risk value obtained in step S52 are added to obtain the fraud risk assessment result, and the fraud risk assessment result may be sent to the call center in real time to assist in making the risk assessment judgment.
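  • The combination in steps S52-S53 then reduces to a weighted sum, as in the sketch below; the 0.6/0.4 weights come from the example above, and the standard values are assumed to already lie in [0, 1].

        def fraud_risk_score(id_standard_value, lie_standard_value,
                             id_weight=0.6, lie_weight=0.4):
            # Multiply each standardized value by its risk weight and sum the results.
            id_risk = id_weight * id_standard_value
            lie_risk = lie_weight * lie_standard_value
            return id_risk + lie_risk

        # e.g. standardized identity value 0.2 (low risk), lie value 0.8 (high risk)
        print(fraud_risk_score(0.2, 0.8))   # -> 0.44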
  • In this embodiment, a weighting algorithm is used to weight the identity verification information and the lie verification information to obtain the fraud risk assessment result.
  • In the voice fraud identification method provided by this embodiment, feature extraction is performed on the voice information to be tested to obtain the voice features; the identity confirmation model and the lie monitoring model are respectively used to verify the voice features; and the fraud risk assessment result is then obtained based on the identity verification information and the lie verification information.
  • The voice fraud identification method can intelligently identify the voice information to be tested to obtain the fraud risk assessment result; its processing efficiency and accuracy are high and no manual intervention is required, which helps to save labor costs.
  • the voice fraud identification method specifically includes the following steps:
  • S10' Acquire the voice information to be tested collected by the call center in real time.
  • The call center may be integrated in the terminal device of a financial institution or another institution that needs to perform voice fraud identification, or may be connected through a network to the terminal device of a financial institution or another institution that needs voice fraud identification, so that the voice information to be tested collected by the call center in real time is sent to the terminal device, and the terminal device performs fraud detection on the obtained voice information to be tested.
  • the call center is connected to the client terminal to enable the agent to talk with the customer.
  • the call center is a terminal that performs human-computer interaction with an agent in the organization.
  • the client terminal is a terminal that performs human-computer interaction with the client.
  • the client in this embodiment is the speaker of the voice information to be tested, and the terminal is a phone or a mobile phone.
  • The call center is provided with a recording module, and the recording module is configured to record the call collected by the call center in real time to obtain the voice information to be tested, and to send the voice information to be tested to the terminal device.
  • S30': The identity confirmation model is used to perform identity verification on the voice features to obtain the identity verification information.
  • S40': The lie monitoring model is used to perform lie verification on the voice features to obtain the lie verification information.
  • S50' Obtain a fraud risk assessment result based on the authentication information and the lie verification information.
  • the steps S20'-S50' are the same as the implementation of the steps S20-S50 in the above specific embodiment. To avoid repetition, details are not described herein.
  • S60' The fraud risk assessment result is sent to the call center in real time.
  • Specifically, the fraud risk assessment result obtained in step S50' is fed back to the call center in real time, so as to assist the agent in the call center in making a fraud risk assessment judgment on the client during the call with the client.
  • the voice fraud identification method adopts an artificial intelligence recognition method, and the processing efficiency is high, and the process does not need to be equipped with professional quality inspection personnel for sampling inspection, which can save labor costs and reduce fraud risk.
  • In the voice fraud identification method provided by this embodiment, the voice information to be tested collected by the call center in real time is obtained, and feature extraction is performed on the voice information to be tested to obtain the voice features; the identity confirmation model and the lie monitoring model are then respectively used to verify the voice features, the fraud risk assessment result is obtained based on the identity verification information and the lie verification information, and the fraud risk assessment result is sent to the call center in real time.
  • The voice fraud identification method can intelligently identify the voice collected in real time to obtain the fraud risk assessment result, send the fraud risk assessment result to the call center in real time, and support making the fraud risk assessment judgment based on that result; its processing efficiency is high, its real-time performance is strong, and it is highly flexible and requires no manual intervention, which helps to save labor costs and reduce the risk of fraud.
  • Fig. 8 is a block diagram showing the principle of the voice fraud recognition apparatus corresponding to the voice fraud identification method in the first embodiment.
  • The voice fraud identification device includes a to-be-tested voice acquisition module 10, a voice feature acquisition module 20, an identity verification acquisition module 30, a lie verification acquisition module 40, a fraud risk assessment module 50, and an evaluation result sending module 60. These modules correspond one-to-one with steps S10-S50 or steps S10'-S60' in Embodiment 1; to avoid redundancy, this embodiment is not described in detail.
  • the voice acquisition module 10 is configured to acquire voice information to be tested.
  • the voice feature acquiring module 20 is configured to perform feature extraction on the voice information to be measured, and acquire voice features.
  • the authentication obtaining module 30 is configured to perform identity verification on the voice feature by using an identity confirmation model to obtain identity verification information.
  • the lie verification obtaining module 40 is configured to perform lie verification on the voice feature by using the lie monitoring model to obtain lie verification information.
  • the fraud risk assessment module 50 is configured to obtain a fraud risk assessment result based on the authentication information and the lie verification information.
  • the identity verification module 30 includes a target character acquisition unit 31, an identification identity acquisition unit 32, a standard identity acquisition unit 33, and an identity verification acquisition unit 34.
  • the target text obtaining unit 31 is configured to perform speech recognition on the speech feature by using a speech recognition model to acquire target text information.
  • the identification identity obtaining unit 32 is configured to perform keyword extraction on the target text information to obtain the identification identity information.
  • the standard identity obtaining unit 33 is configured to obtain standard identity information corresponding to the user ID from the user information base.
  • the authentication obtaining unit 34 is configured to obtain the identity verification information based on the identification identity information and the standard identity information.
  • The target text acquisition unit 31 includes a monophone feature acquisition sub-unit 311, a triphone feature acquisition sub-unit 312, an initial text acquisition sub-unit 313, and a target text acquisition sub-unit 314.
  • the monophone feature acquisition sub-unit 311 is configured to identify a speech feature by using a single phoneme training model to obtain a single phoneme feature sequence.
  • the triphone feature acquisition sub-unit 312 is configured to identify the monophone feature sequence by using the triphone training model to obtain the triphone feature sequence.
  • the initial character acquisition sub-unit 313 is configured to identify the triphone feature sequence by using the long-short recursive neural network model to obtain initial text information.
  • the target text obtaining subunit 314 is configured to identify the initial text information by using a language model, and obtain target text information.
  • the identity verification acquisition unit 34 includes an identity similarity acquisition sub-unit 341 and an identity verification information acquisition sub-unit 342.
  • the identity similarity obtaining sub-unit 341 is configured to calculate identity similarity between the identification identity information and the standard identity information.
  • the authentication information obtaining sub-unit 342 is configured to compare the identity similarity with the preset similarity threshold to obtain the identity verification information.
  • the lie verification acquisition module 40 includes a feature similarity acquisition unit 41 and a lie verification acquisition unit 42.
  • the feature similarity obtaining unit 41 is configured to compare the voice feature with all the standard features in the lie speech library, and calculate the feature similarity between the voice feature and each standard feature.
  • The lie verification obtaining unit 42 is configured to select the standard feature corresponding to the maximum feature similarity as the target feature, and to use the standard verification information corresponding to the target feature as the lie verification information.
  • the fraud risk assessment module 50 includes a standard value acquisition unit 51, a risk value acquisition unit 52, and a fraud risk result acquisition unit 53.
  • the standard value obtaining unit 51 is configured to perform normalization processing on the identity verification information and the lie verification information, and obtain the identity verification standard value and the lie verification standard value.
  • The risk value obtaining unit 52 is configured to multiply the identity verification standard value and the lie verification standard value respectively by their risk weighting coefficients to obtain the identity verification risk value and the lie verification risk value.
  • the fraud risk result obtaining unit 53 is configured to calculate a sum of the identity verification risk value and the lie verification risk value, and obtain the fraud risk assessment result.
  • the to-be-tested voice acquisition module 10 is configured to acquire the voice information to be tested collected by the call center in real time.
  • the evaluation result sending module 60 is configured to send the fraud risk assessment result to the call center in real time.
  • This embodiment provides a computer readable storage medium having computer readable instructions stored thereon, and when the computer readable instructions are executed by a processor, the voice fraud identification method in Embodiment 1 is implemented; to avoid repetition, details are not described herein again.
  • Alternatively, when the computer readable instructions are executed by the processor, the functions of the modules/units in the voice fraud identification device in Embodiment 2 are implemented; to avoid repetition, details are not described herein again.
  • FIG. 9 is a schematic diagram of a terminal device according to an embodiment of the present application.
  • the terminal device 90 of this embodiment includes a processor 91, a memory 92, and computer readable instructions 93 stored in the memory 92 and operable on the processor 91.
  • the processor 91 implements the steps of the voice fraud recognition method in the above-described Embodiment 1 when the computer readable instructions 93 are executed, such as steps S10 to S50 shown in Fig. 1, or steps S10' to S60' shown in Fig. 7.
  • Alternatively, when the processor 91 executes the computer readable instructions 93, the functions of the modules/units in the voice fraud recognition apparatus in Embodiment 2 are implemented, for example, the functions of the to-be-tested voice acquisition module 10, the voice feature acquisition module 20, the identity verification acquisition module 30, the lie verification acquisition module 40, the fraud risk assessment module 50, and the evaluation result sending module 60 shown in FIG. 8.
  • computer readable instructions 93 may be partitioned into one or more modules/units, one or more modules/units being stored in memory 92 and executed by processor 91 to complete the application.
  • The one or more modules/units may be a series of computer readable instruction segments capable of performing particular functions, and the instruction segments are used to describe the execution of the computer readable instructions 93 in the terminal device 90.
  • For example, the computer readable instructions 93 may be divided into the to-be-tested voice acquisition module 10, the voice feature acquisition module 20, the identity verification acquisition module 30, the lie verification acquisition module 40, the fraud risk assessment module 50, and the evaluation result sending module 60 in Embodiment 2.
  • the function of each module is as described in Embodiment 2, and details are not described herein.
  • the terminal device 90 can be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • The terminal device may include, but is not limited to, the processor 91 and the memory 92. It will be understood by those skilled in the art that FIG. 9 is merely an example of the terminal device 90 and does not constitute a limitation on the terminal device 90; the terminal device may include more or fewer components than those illustrated, or combine certain components, or have different components.
  • For example, the terminal device may further include input and output devices, network access devices, buses, and the like.
  • The processor 91 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general purpose processor may be a microprocessor or the processor or any conventional processor or the like.
  • the memory 92 may be an internal storage unit of the terminal device 90, such as a hard disk or a memory of the terminal device 90.
  • the memory 92 may also be an external storage device of the terminal device 90, such as a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, or a flash card provided on the terminal device 90.
  • the memory 92 may also include both an internal storage unit of the terminal device 90 and an external storage device.
  • Memory 92 is used to store computer readable instructions as well as other programs and data required by the terminal device.
  • the memory 92 can also be used to temporarily store data that has been output or is about to be output.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated modules/units, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer readable storage medium.
  • based on such understanding, all or part of the processes in the foregoing method embodiments of the present application may also be implemented by computer readable instructions instructing related hardware, and the computer readable instructions may be stored in a computer readable storage medium.
  • the computer readable instructions when executed by a processor, may implement the steps of the various method embodiments described above.
  • the computer readable instructions comprise computer readable instruction code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like.
  • the computer readable medium may include any entity or apparatus capable of carrying the computer readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Telephonic Communication Services (AREA)

Abstract

一种语音欺诈识别方法、装置、终端设备(90)及存储介质。该语音欺诈识别方法包括:获取待测语音信息(S10);对所述待测语音信息进行特征提取,获取语音特征(S20);采用身份确认模型对所述语音特征进行身份验证,获取身份验证信息(S30);采用谎言监控模型对所述语音特征进行谎言验证,获取谎言验证信息(S40);基于所述身份验证信息和所述谎言验证信息,获取欺诈风险评估结果(S50)。该语音欺诈识别方法进行语音欺诈识别时,具有效率高、准确率高且人工成本低的优点。

Description

语音欺诈识别方法、装置、终端设备及存储介质
本专利申请以2017年8月24日提交的申请号为2017107343010,名称为“语音欺诈识别方法、装置、终端设备及存储介质”的中国发明专利申请为基础,并要求其优先权。
技术领域
本申请涉及语音处理领域,尤其涉及一种语音欺诈识别方法、装置、终端设备及存储介质。
背景技术
在银行、证券、保险、P2P等金融机构中采用反欺诈服务识别恶意用户的欺诈行为,以解决在支付、借贷、理财、风控等业务环节遇到的欺诈威胁,达到降低损失的目标。其中,反欺诈服务是对包含交易诈骗,网络诈骗,电话诈骗,盗卡盗号等欺诈行为进行识别的一项服务。当前金融机构通过配备质检人员对服务人员与客户之间的通话内容进行监控识别,通过客户是否说谎以确定客户是否正在进行欺诈行为,以起到反欺诈目的。这种人工质检客户的通话内容以识别客户是否在进行欺诈作为的方式,处理过程效率低,且需配备专业的质检人员,人工成本高。
发明内容
本申请实施例提供一种语音欺诈识别方法、装置、终端设备及存储介质,以解决当前采用人工质检方式识别欺诈行为所存在的效率低且人工成本高的问题。
第一方面,本申请实施例提供一种语音欺诈识别方法,包括:
获取待测语音信息;
对所述待测语音信息进行特征提取,获取语音特征;
采用身份确认模型对所述语音特征进行身份验证,获取身份验证信息;
采用谎言监控模型对所述语音特征进行谎言验证,获取谎言验证信息;
基于所述身份验证信息和所述谎言验证信息,获取欺诈风险评估结果。
第二方面,本申请实施例提供一种语音欺诈识别装置,包括:
待测语音获取模块,用于获取待测语音信息;
语音特征获取模块,用于对所述待测语音信息进行特征提取,获取语音特征;
身份验证获取模块,用于采用身份确认模型对所述语音特征进行身份验证,获取身份验证信息;
谎言验证获取模块,用于采用谎言监控模型对所述语音特征进行谎言验证,获取谎言验证信息;
欺诈风险评估模块,用于基于所述身份验证信息和所述谎言验证信息,获取欺诈风险评估结果。
第三方面,本申请实施例提供一种终端设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,所述处理器执行所述计算机可读指令时实现如下步骤:
获取待测语音信息;
对所述待测语音信息进行特征提取,获取语音特征;
采用身份确认模型对所述语音特征进行身份验证,获取身份验证信息;
采用谎言监控模型对所述语音特征进行谎言验证,获取谎言验证信息;
基于所述身份验证信息和所述谎言验证信息,获取欺诈风险评估结果。
第四方面,本申请实施例提供一种计算机可读存储介质,所述计算机可读存储介质存储有计算机可读指令,所述计算机可读指令被处理器执行时实现如下步骤:
获取待测语音信息;
对所述待测语音信息进行特征提取,获取语音特征;
采用身份确认模型对所述语音特征进行身份验证,获取身份验证信息;
采用谎言监控模型对所述语音特征进行谎言验证,获取谎言验证信息;
基于所述身份验证信息和所述谎言验证信息,获取欺诈风险评估结果。
本申请实施例提供的语音欺诈识别方法、装置、终端设备及存储介质中,通过对待测语音信息进行特征提取,以获取语音特征;再采用身份验证模型和谎言验证模型分别对语音特征进行验证,然后基于身份验证信息和谎言验证信息得到欺诈风险评估结果。该语音欺诈识别方法、装置、终端设备及存储介质中,可实现待测语音信息进行智能识别,以获取欺诈风险评估结果,其过程处理效率高,且无需人工干涉,有利于节省人工成本。
附图说明
为了更清楚地说明本申请实施例的技术方案,下面将对本申请实施例的描述中所需要 使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例1中语音欺诈识别方法的一流程图。
图2是图1中步骤S30的一具体示意图。
图3是图2中步骤S31的一具体示意图。
图4是图2中步骤S34的一具体示意图。
图5是图1中步骤S40的一具体示意图。
图6是图1中步骤S50的一具体示意图。
图7是本申请实施例1中语音欺诈识别方法的另一流程图。
图8是本申请实施例2中语音欺诈识别装置的一示意图。
图9是本申请实施例4中终端设备的一示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
实施例1
图1示出本实施例中语音欺诈识别方法的流程图。该语音欺诈识别方法应用在银行、证券、保险、P2P等金融机构或者需要进行语音欺诈识别的其他机构的终端设备中,用于实现对说话人的待测语音信息进行智能识别,以识别说话人是否在进行欺诈行为。如图1所示,该语音欺诈识别方法包括如下步骤:
S10:获取待测语音信息。
其中,待测语音信息是终端设备采集到的说话人的语音信息。该待测语音信息可以是wav、mp3或其他格式的语音信息。可以理解地,每一待测语音信息与一用户ID关联,该用户ID是用于唯一识别待测语音信息的说话人的标识。在机构内的终端设备采集并获取待测语音信息过程中,由坐席人员或其他工作人员按预设问题引导说话人回复与说话人相关的身份信息,以使待测语音信息中包括说话人身份信息。或者,在机构内的终端设备采集并获取待测语音信息过程中,采用机器人录音引导说话人回复与说话人相关的身份信 息,以使待测语音信息中包括说话人身份信息。该身份信息包括但不限于本实施例中的姓名、年龄、身份证号、联系电话、地址和工作单位等与用户相关的信息。
S20:对待测语音信息进行特征提取,获取语音特征。
由于步骤S30中的身份确认模型和步骤S40中谎言监控模型均是对语音特征进行处理而不是直接对待测语音信息进行处理,因此,需预先对待测语音信息进行特征提取,以获取可在身份确认模型和谎言监控模型中使用的语音特征。
语音特征包括但不限于韵律特征、音质特征、频谱特征、词汇特征和声纹特征。其中,韵律特征,又叫超音质特征或者超音段特征,是指语音中除音质特征之外的音高、音长和音强方面的变化。该韵律特征包括但不限于本实施例中的基音频率、发音持续时间、发音振幅和发音语速。音质特征包括但不限于本实施例中的共振峰F1-F3、频带能量分布、谐波信噪比和短时能量抖动。频谱特征,又称振动谱特征,是指将复杂振荡分解为振幅不同和频率不同的谐振荡,这些谐振荡的幅值按频率排列形成的图形。频谱特征与韵律特征和音质特征相融合,以提高特征参数的抗噪声效果。本实施例中,频谱特征采用能够反映人耳听觉特性的梅尔频率倒谱系数(Mel-Frequency Cepstral Coefficients,以下简称MFCC)。词汇特征是用于体现待测语音数据中用词的词性特征,包括但不限于本实施例中的积极词和消极词。词性特征与其他语音特征结合,有利于识别待测语音数据对应的说话人的情绪。声纹特征(即i-vector特征)是与说话人相关的特征,其与其他语音特征结合,在语音识别过程中可更有效提高识别的准确率。
具体地,对待测语音信息进行特征提取具体包括对待测语音信息预加重、分帧、加窗、端点检测、快速傅里叶变换、梅尔滤波器组和离散余弦变换获取等特征提取过程,以获取语音特征。
其中，预加重处理其实是将语音信号通过一个高通滤波器：H(z) = 1 − μz⁻¹
式中μ值介于0.9-1.0之间,我们通常取0.96。预加重的目的是提升高频部分,使信号的频谱变得平坦,保持在低频到高频的整个频带中,能用同样的信噪比求频谱,突出高频的共振峰。
分帧是将N个采样点集合成一个观测单位,称为帧。通常情况下N的值为256或512,涵盖的时间约为20-30ms左右。为避免相邻两帧的变化过大,通过使两相邻帧之间有一段重叠区域,此重叠区域包含了M个取样点,通常M的值约为N的1/2或1/3,此过程称为分帧。
加窗是每一帧乘以汉明窗(即HammingWindow),由于汉明窗的幅频特性是旁瓣衰减 较大,通过加窗处理,可增加帧左端和帧右端的连续性;即通过分帧和加窗处理,可将非平稳语音信号转变为短时平稳信号。设分帧后的信号为S(n),n=0,1…,N-1,N为帧的大小,乘以汉明窗的信号S'(n)=S(n)×W(n),其中,W(n)形式如下:
W(n) = (1 − a) − a·cos( 2πn / (N − 1) )，0 ≤ n ≤ N − 1
不同的a值会产生不同的汉明窗,一般情况下a取0.46。
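The pre-emphasis, framing and windowing steps above can be illustrated with a short Python sketch. This is a minimal illustration only, assuming a 16 kHz sampling rate, 25 ms frames with a 10 ms shift, and the values μ = 0.96 and a = 0.46 quoted above; it is not the patented implementation.

```python
# Minimal sketch (not the patented implementation): pre-emphasis with mu = 0.96,
# 25 ms frames with a 10 ms shift, Hamming windowing with a = 0.46.
import numpy as np

def preemphasize(signal, mu=0.96):
    """High-pass filter H(z) = 1 - mu * z^-1: boosts the high-frequency part."""
    return np.append(signal[0], signal[1:] - mu * signal[:-1])

def frame_and_window(signal, sample_rate, frame_len=0.025, frame_shift=0.010, a=0.46):
    """Cut the signal into overlapping frames and multiply each frame by a Hamming window."""
    n = int(round(frame_len * sample_rate))        # samples per frame (e.g. 256 or 512)
    step = int(round(frame_shift * sample_rate))   # overlap keeps adjacent frames similar
    num_frames = max(1, 1 + (len(signal) - n) // step)
    window = (1 - a) - a * np.cos(2 * np.pi * np.arange(n) / (n - 1))   # W(n)
    frames = np.stack([signal[i * step:i * step + n] for i in range(num_frames)])
    return frames * window

sr = 16000
speech = np.random.randn(sr)                       # stand-in for one second of test speech
frames = frame_and_window(preemphasize(speech), sr)
print(frames.shape)                                # (number of frames, samples per frame)
```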
端点检测主要用于区分语音和噪声,并提取有效的语音部分。在端点检测过程中,通过分帧、加窗处理后,计算出其能量值,根据能量值区分语音部分和噪声部分,从中提取有效的语音部分。
快速傅里叶变换用于将时域信号转换为频域能量谱分析。由于信号在时域上的变换通常很难看出信号的特性,所以通常将它转换为频域上的能量分布来观察,不同的能量分布,就能代表不同语音的特性。所以在乘上汉明窗后,每帧信号还需进行快速傅里叶变换以得到在频谱上的能量分布。对分帧加窗后的各帧信号进行快速傅里叶变换得到各帧频谱(即能量谱)。
梅尔滤波器组是指将快速傅里叶变换输出的能量谱通过一组Mel(梅尔)尺度的三角滤波器组,定义一个有M个滤波器的滤波器组,采用的滤波器为三角滤波器,中心频率为f(m),m=1,2,...,M。M通常取22-26。梅尔滤波器组用于对频谱进行平滑化,并起消除滤波作用,可以突出语音的共振峰特征,可降低运算量。然后计算梅尔滤波器组中每个三角滤波器输出的对数能量
s(m) = ln( Σ_{k=0}^{N−1} |X_a(k)|²·H_m(k) )，0 ≤ m ≤ M
其中,M是三角滤波器的个数。
对梅尔滤波器组输出的对数能量进行离散余弦变换(DCT),得到梅尔倒谱系数(Mel Frequency Cepstrum Coefficient,以下简称MFCC)。具体地,离散余弦变换(DCT)的计算公式如下:
C(n) = Σ_{m=0}^{M−1} s(m)·cos( πn(m + 0.5) / M )，n = 1, 2, …, L
其中,M是三角滤波器的个数,L是MFCC系数的阶数,通常取12-16,将上述对数能量带入离散余弦变换,即可求出L阶的Mel-scale Cepstrum参数,基于梅尔倒谱系数获取语音特征,具体地,该语音特征可为语音特征序列。
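A hedged continuation of the same sketch covers the remaining steps: FFT power spectrum, triangular Mel filter bank, log energies s(m), and DCT keeping the first L coefficients. The filter-bank construction and the parameter choices (n_fft = 512, 24 filters, 13 coefficients) follow common textbook practice and are assumptions, not values fixed by the application.

```python
# Hedged continuation: FFT power spectrum -> triangular Mel filter bank -> log energies
# s(m) -> DCT, keeping the first n_ceps coefficients. Parameter choices are assumptions.
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_from_frames(frames, sample_rate, n_fft=512, n_filters=24, n_ceps=13):
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft          # energy spectrum per frame
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):                                 # triangular filters
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[m - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    log_energy = np.log(np.maximum(power @ fbank.T, 1e-10))           # s(m) for each frame
    return dct(log_energy, type=2, axis=1, norm="ortho")[:, :n_ceps]  # L-order MFCC features

frames = np.random.randn(98, 400)              # windowed frames, e.g. from the previous sketch
print(mfcc_from_frames(frames, 16000).shape)   # (98, 13)
```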
S30:采用身份确认模型对语音特征进行身份验证,获取身份验证信息。
其中,身份确认模型是机构内预先训练好用于进行身份验证的模型。该身份确认模型包括预先设置的用户信息库,用户信息库中存储与用户ID相关联的用户信息。本实施例 中,终端设备获取到的待测语音信息包含与用户ID相关联的身份信息,再基于用户ID查询用户信息库获取对应的标准身份信息,将识别身份信息与标准身份信息进行比较,即可实现身份验证,以获取身份验证信息。其中,标准身份信息是用户存储在用户信息库中的身份信息,该标准身份信息与用户ID相关联。
在一具体实施方式中,如图2所示,步骤S30中,采用身份确认模型对语音特征进行身份验证,获取身份验证信息,具体包括如下步骤:
S31:采用语音识别模型对语音特征进行语音识别,获取目标文字信息。
语音识别模型包括预先训练好的声学模型和语言模型。其中,声学模型用于处理语音特征与字之间的对应关系,即用于处理每个音对应哪个字的关系。语言模型用于处理字与字之间的对应关系,即怎样组合形成一合理句子输出。
具体地,如图3所示,步骤S31中,采用语音识别模型对语音特征进行语音识别,获取目标文字信息具体包括如下步骤:
S311:采用单音素训练模型对语音特征进行识别,获取单音素特征序列。
其中,单音素训练模型是用于将语音特征序列转换成音素特征序列的模型。可以理解地,步骤S20中对待测语音信息进行特征提取所获取的语音特征具体为语音特征序列。该单音素训练模型是系统预先训练好并存储在数据库中,以便使用时直接调用的模型。由于单音素训练模型的训练过程是基于音素级别的训练,主要考虑的是语句中每帧的最大后验概率,可有效提高语音欺诈识别的准确率。可以理解地,单音素训练模型是采用声学模型识别的第一个环节,可将基于帧级别的识别转换成基于音素级别的识别,提高识别的准确率。
本实施例中,单音素训练模型具体为单音素混合高斯模型-隐马尔科夫模型(monophone Mixture Gaussian Model-Hidden Markov Model,以下简称单音素GMM-HMM模型)。其中,隐马尔科夫模型(Hidden Markov Model,以下简称HMM模型)是一个双重随机过程,是具有一定状态数的隐马尔可夫链和显示随机函数集,是基于状态级别的训练模型。
单音素GMM-HMM模型的训练过程包括初次迭代和多次迭代过程,通过初始迭代训练和多次迭代训练,使得训练出的单音素GMM-HMM模型可更准确地识别单音素特征序列。在单音素GMM-HMM模型的初次迭代过程中,通过对少量的语音特征序列进行粗略计算,以获取其均值和方差,进而获取初始单音素GMM-HMM模型。然后基于初始单音素GMM-HMM模型对语音特征序列所对应的初始单音素的每一帧进行标注,即将语音特征序列中的每一语音特 征对应的词通过发音词典替换为音素表达以获取初始单音素标注。由于只针对每一词发音,因此称为monophone(即单音素)。
在单音素GMM-HMM模型的多次迭代过程中,每次迭代均需将提取到的语音特征序列和上一次迭代中获取到的初始单音素标注进行训练,获取目标单音素GMM-HMM模型。然后,对照文本标注(ground truth),以识别每个词的正确发音,保存为下一次迭代对应的目标单音素标注,并按照音素的起止时间进行对齐处理,获取目标单音素特征。将对齐后的数据作为声学模型训练的文本数据,有利于保障后续语音识别的准确性。本实施例中,多次迭代一般需要进行20-30次迭代,既可避免迭代次数过多,导致训练时间过长;又可避免迭代次数过短,影响获取单音素特征序列的准确率。最后,基于所有目标单音素特征获取单音素特征序列,以便基于该音素特征序列进行声学模型训练,从而提高语音欺诈识别的准确率。
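As a loose illustration of the monophone GMM-HMM idea (EM training over aligned frames, then scoring a new segment), the toy sketch below uses the hmmlearn library, which is not named in the application; the state and mixture counts and the random data are assumptions only.

```python
# Toy illustration only: the application does not name a library; hmmlearn, the state and
# mixture counts and the random data are assumptions. It mirrors the iterate-and-retrain
# idea: fit one monophone GMM-HMM on aligned MFCC frames, then score a new segment.
import numpy as np
from hmmlearn.hmm import GMMHMM

rng = np.random.default_rng(0)
train_frames = rng.normal(size=(300, 13))       # stand-in MFCC frames aligned to one phoneme
lengths = [100, 100, 100]                       # three training segments

phone_model = GMMHMM(n_components=3, n_mix=2,   # 3 HMM states, 2 Gaussians per state
                     covariance_type="diag", n_iter=25, random_state=0)
phone_model.fit(train_frames, lengths)          # EM iterations (cf. the 20-30 passes above)

segment = rng.normal(size=(40, 13))
print(phone_model.score(segment))               # log-likelihood of the segment under this phone
```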
S312:采用三音素训练模型对单音素特征序列进行识别,获取三音素特征序列。
其中,三音素训练模型是用于将单音素特征序列转换成三音素特征序列的模型。通过采用三音素训练模型,对步骤S311输出的单音素特征序列进行识别,获取三音素特征序列,使获取到的三音素特征序列充分考虑其上下文音素特征,进一步提高语音欺诈识别的准确率,以避免步骤S311中获取的单音素特征序列未考虑其上下文音素特征而导致识别准确率低的问题。可以理解地,三音素训练模型是采用声学模型识别的第二个环节,可在音素识别过程中充分考虑上下文音素,以提高识别的准确率。
本实施例中，三音素训练模型具体为三音素混合高斯模型-隐马尔科夫模型（triphone Mixture Gaussian Model-Hidden Markov Model，以下简称三音素GMM-HMM模型）。即采用三音素GMM-HMM模型对单音素特征序列进行识别，获取三音素特征序列，以使获取到的三音素特征序列结合其上下文音素特征，有利于提高语音欺诈识别的准确率。
三音素GMM-HMM模型的训练过程包括初次迭代和多次迭代过程,通过初始迭代和多次迭代训练,使得训练出的三音素GMM-HMM模型可准确地识别出三音素特征序列。在三音素GMM-HMM模型的初次迭代过程,通过将单音素特征序列的少量目标单音素特征的每个音素加上其上下文,以获取初始三音素标注。再将获取的初始三音素标注输入步骤S311的后续迭代过程中获取到的目标单音素GMM-HMM模型中,以获取初始三音素GMM-HMM模型,以使初始三音素GMM-HMM模型可基于三音素进行训练,提高训练的准确率。然后采用决策树算法将获取到的初始三音素GMM-HMM模型中发音相近的初始三音素标注聚成一类,以获取聚类三音素GMM-HMM模型,以提高语音欺诈识别的效率和准确率。具体地,采用决策树算 法将初始三音素GMM-HMM模型获取的发音相近的初始三音素标注聚类,每个聚类结果称为一个Senone。本实施例中,Senone是一个三状态的HMM,每个HMM可以被最少3帧来表达。每个HMM可以采用1帧来表达,只考虑每个音素的第一帧(即第一个状态),而将其余状态设置为空,可用一个HMM代表a或ab或abb。采用三音素GMM-HMM模型获取到的更新的单音素特征序列进行声学模型训练,增加语音欺诈识别的准确率。
在三音素GMM-HMM模型的多次迭代过程中,每次迭代均需将提取到的语音特征序列和上一次迭代中获取到的初始三音素标注进行训练,获取到目标三音素模型。然后,对照文本标注(ground truth),以识别每个词的正确发音,保存为下一次迭代对应的目标三音素标注,并按照音素的起止时间进行对齐处理,获取目标三音素特征。将对齐后的数据作为声学模型训练的文本数据,有利于保障后续语音识别的准确性。本实施例中,多次迭代一般需要进行20-30次迭代,既可避免迭代次数过多,导致训练时间过长;又可避免迭代次数过短,影响获取三音素特征序列的准确率。最后,基于所有目标三音素特征获取三音素特征序列,以便基于该音素特征序列进行声学模型训练,从而提高语音欺诈识别的准确率。
S313:采用长短时递归神经网络模型对三音素特征序列进行识别,获取初始文字信息。
长短时递归神经网络模型(long-short term memory,以下简称LSTM)是一种时间递归神经网络模型,适合于处理和预测时间序列中间隔和延迟相对较长的重要事件。LSTM模型具有时间记忆单元因而用来处理语音信息,LSTM模型结构有三层,每层含1024个神经元,它的输出是一个Softmax(回归模型),用于分类输出对应的字的发音。Softmax(回归模型)是一种常用于神经网络的分类函数,它将多个神经元的输出,映射到[0,1]区间内,可以理解成概率,计算起来简单方便,从而来进行多分类输出。可以理解地,长短时递归神经网络模型是采用声学模型识别的最后一个环节,识别过程简单方便且准确率高。
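The three-layer, 1024-unit LSTM with a softmax output described above can be sketched as follows. PyTorch, the input feature dimension, and the senone count are assumptions made for illustration only.

```python
# Minimal sketch of the acoustic-model topology described above (three LSTM layers of
# 1024 units feeding a softmax over senones); PyTorch and the senone count are assumptions.
import torch
import torch.nn as nn

class LSTMAcousticModel(nn.Module):
    def __init__(self, n_features=13, n_senones=3000):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=1024,
                            num_layers=3, batch_first=True)
        self.output = nn.Linear(1024, n_senones)    # per-frame senone scores

    def forward(self, frames):                       # frames: (batch, time, n_features)
        hidden, _ = self.lstm(frames)
        return torch.log_softmax(self.output(hidden), dim=-1)

model = LSTMAcousticModel()
log_probs = model(torch.randn(2, 200, 13))           # two utterances of 200 frames each
print(log_probs.shape)                                # torch.Size([2, 200, 3000])
```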
具体地,为了将词级别(word-level)的序列训练融入到音素级别(phone-level)的LSTM模型中,需采用cross-entropy训练准则、L2-norm训练准则和Leaky HMM训练准则等约束条件实现两者的融合训练,以获取目标声学模型。通过采用cross-entropy训练准则(即交叉熵训练准则)、L2-norm训练准则(L2范数训练准则)和Leaky HMM训练准则(即漏桶-隐马尔科夫模型训练准则)等准则,将词级别(word-level)的序列训练融入到音素级别(phone-level)的LSTM模型中,实现两者的融合训练,保证其拟合效果。
其中,cross-entropy训练准则是神经网络模型训练中常规的训练准则。该 cross-entropy训练准则如下:
cost = −[ y·ln a + (1 − y)·ln(1 − a) ]
其中,a是每个神经网络节点的输出,y是标注比对样本,x是每个神经网络节点的输入;当a=y时cost=0。
L2-norm训练准则是为了将词级别(word-level)的序列训练融入到音素级别(phone-level)的LSTM模型而额外增加的约束条件,以实现两者的融合训练。该L2-norm训练准则如下:
cost = L(a, y) + λΩ(w)
其中,L(·)为神经网络节点的输出与文本标注(ground truth)对比误差,该误差越小越能保证训练后的目标声学模型越拟合训练语音信号。同时,为了防止过拟合现象,使得训练得到的目标声学模型在任意的测试数据也具有良好的表达效果,需加入正则项λΩ(cost),在L2-norm训练准则中,正则项表达为
Ω(w) = ‖w‖₂² = Σ_i w_i²
Leaky HMM训练准则是为了将词级别(word-level)的序列训练融入到音素级别(phone-level)的LSTM模型而额外增加的约束条件。Leaky HMM训练准则是一种新的神经网络训练准则,用于匹配本实施例中构建的单状态HMM来进行正常三状态的HMM的LSTM声学模型。传统三状态的HMM至少具有三个转移概率,而本实施例中采用的HMM是单状态的,为实现a->b状态的转移,设置其转移概率如下:P=leakyHMM系数×b状态的转移概率,其中leakyHMM系数可设为0.1,b状态的初始转移概率为0.5,在目标声学模型训练过程,不断更新b状态的转移概率,以实现将词级别(word-level)的序列训练融入到音素级别(phone-level)的LSTM模型。
S314:采用语言模型对初始文字信息进行识别,获取目标文字信息。
本实施例中,步骤S311-S313是采用声学模型对语音特征进行识别,获取初始文字信息的过程,该初始文字信息主要体现为语音特征与字之间的对应关系,没有考虑字与字之间的对应关系。因此,步骤S314中需采用语言模型对初始文字信息进行识别,以使获取的目标文字信息不仅考虑到语音特征与字之间的对应关系,还考虑到字与字之间的对应关系。本实施例中,语言模型具体为语言模型工具Srilm。Srilm用来构建和应用统计语言模型,主要用于语音识别,统计标注和切分,以及机器翻译,可运行在UNIX及Windows平台上。
S32:对目标文字信息进行关键词提取,获取识别身份信息。
其中,识别身份信息是从待测语音信息形成的目标文字信息进行关键词提取,获取的说话人身份信息。由于待测语音信息采集过程中需引导说话人回复与其身份信息相关的信 息,从而使其获取的目标文字信息提取关键词获取的识别身份信息包括说话人身份信息。该说话人身份信息包括但不限于在待测语音信息采集过程中获取的姓名、年龄、身份证号、联系电话、地址和工作单位等与用户相关的信息。
在一具体实施方式中,身份确认模型还包括预先设置的关键词库,用于存储引导说话人回复与说话人相关身份信息的预设问题关键词。其中,每一说话人都有一个与其对应的关键词库,每一关键词库与用户ID相关联,该用户ID是用于唯一识别说话人的关键词库的标识。可以理解地,预设问题关键词与说话人的回复一一对应。本实施例中,采用文本预处理算法对目标文字信息进行预处理,文本预处理算法包括繁简体统一、大小写统一、中文分词和停用词去除中的至少一种。中文分词(Chinese Word Segmentation)指的是将一个汉字序列切分成一个一个单独的词。停用词(Stop Words)是指在处理自然语言数据时会自动过滤掉的某些字或词,如英文字符、数字、数字字符、标识符号及使用频率特高的单汉字等。最后,基于关键词库中的预设问题关键词对预处理后的目标文字信息进行问题关键词匹配,即在一段文本中找出预设关键词,匹配成功的问题关键词所对应的说话人答复的目标文字信息即为识别身份信息。
关键词匹配所选用的算法是克努特——莫里斯——普拉特算法(Knuth-Morris-Pratt,简称KMP),KMP算法是一种改进的字符串匹配算法,KMP算法的关键是利用匹配失败后的信息,尽量减少模式串与主串的匹配次数以达到快速匹配的目的。本实施例中,选用KMP算法进行关键词提取,节省时间,提高语音欺诈识别的效率。
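A self-contained sketch of KMP matching of a preset question keyword against the recognized text is shown below; the keyword and the sample sentence are made-up examples, not entries of any real keyword library.

```python
# Self-contained KMP sketch for locating a preset question keyword inside the recognized
# text, as described above; the keyword and text are made-up examples.
def kmp_search(text, pattern):
    """Return the start index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    # Failure table: length of the longest proper prefix that is also a suffix.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1      # keyword matched; the reply that follows is the identity info
    return -1

print(kmp_search("请问您的身份证号是多少", "身份证号"))   # 4
```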
在另一具体实施方式中,关键词提取也可选用垃圾-隐马尔科夫模型(Garbage-Hidden Markov Model,简称垃圾-HMM模型)。垃圾-隐马尔科夫模型是一种用于关键词识别的常用模型。本实施例中,关键词提取的过程主要是对关键词进行识别得到目标关键词信息即识别身份信息。其中,隐马尔科夫模型(Hidden Markov Model,以下简称HMM)是用于连续语音识别非特定人关键词识别的常用方法,非特定人语音识别是不用针对指定说话人的识别技术,利用垃圾模型来“吸收”非关键词。可以理解地,关键词识别可将训练看作是关键词和非关键词的组合,即将训练语音分为关键词和非关键词两部分。每个关键词对应一个关键词模型,每一个非关键词对应一个非关键词模型。非关键词由M个垃圾模型(Garbage)来表示,关键词由N个关键词模型来表示。垃圾-隐马尔科夫模型训练过程包括:获取训练语音,对训练语音进行特征提取,获取训练语音特征序列,然后基于获取的训练语音特征序列分别对初始关键词模型和初始垃圾模型进行训练,获取目标关键词模型和目标垃圾模型,基于目标关键词模型和目标垃圾模型,获取全局隐马尔科夫模型即 垃圾-隐马尔科夫模型。再对步骤S20获取到的语音特征采用全局隐马尔科夫模型进行训练,以获取隐含状态序列。最后,采用Viterbi(即维特比)算法找出最佳状态路径,如果最佳状态路径中含有一个子序列使得子序列中的每个状态都对应某个关键词模型中的状态,则认为该子序列对应的语音特征序列是要识别的初始关键词信息。采用语言模型对初始关键词信息进行识别得到目标关键词信息即识别身份信息。对于HMM而言,其中一个重要的任务就是要找出最有可能产生其观测序列的隐含状态序列。其中,Viterbi算法是一种动态规划算法,一般用于序列的译码。可以理解地,序列中每一个点有一个状态,Viterbi算法的目的是要找到每一个点的状态,使得这个序列的译码结果全局较优。采用Viterbi算法找出隐含状态序列,效率高,减少计算的复杂度。本实施例中,采用关键词提取算法对步骤S20获取到的语音特征进行识别,无需识别整个语音特征,获取文字信息,再通过垃圾-HMM模型从文字信息中直接提取关键词信息,节省提取时间,使得语音欺诈识别的效率更高。
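For the keyword/garbage-HMM decoding step, the following numpy sketch shows a generic Viterbi search over per-frame log scores; the transition and emission values are toy numbers, not trained keyword or garbage model parameters.

```python
# Hedged numpy sketch of Viterbi decoding for the keyword/garbage HMM idea above:
# the transition/emission tables are invented toy numbers, not trained model values.
import numpy as np

def viterbi(log_start, log_trans, log_emit):
    """Return the most likely state path given per-frame log emission scores."""
    n_frames, n_states = log_emit.shape
    score = log_start + log_emit[0]
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        cand = score[:, None] + log_trans                 # score of moving state i -> j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(n_states)] + log_emit[t]
    path = [int(np.argmax(score))]
    for t in range(n_frames - 1, 0, -1):                  # backtrack the best path
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# States 0-1: garbage models; states 2-4: one keyword model (toy layout).
rng = np.random.default_rng(1)
log_trans = np.log(np.full((5, 5), 0.2))
path = viterbi(np.log(np.full(5, 0.2)), log_trans, rng.normal(size=(30, 5)))
print(path)   # a sub-sequence passing through states 2-4 suggests the keyword was spoken
```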
S33:从用户信息库中获取与用户ID相对应的标准身份信息。
具体地,用户信息库中预先存储与用户ID的标准身份信息。在机构的终端设备获取到与用户ID关联的待测语音信息时,可基于该用户ID查询用户信息库,以获取对应的标准身份信息。本实施例中,用户信息库可以为MySQL数据库,可采用查询语音,以用户ID为查询字段查询获取与用户ID相对应的标准身份信息。
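Retrieving the standard identity information keyed on the user ID might look like the sketch below. The pymysql driver, the connection settings, and the table/column names (user_info, user_id, name, id_number, phone, address, employer) are all assumptions for illustration, not names fixed by the application.

```python
# Hedged sketch of the user-information lookup: table and column names are assumptions,
# and the connection settings are placeholders.
import pymysql

def fetch_standard_identity(user_id):
    conn = pymysql.connect(host="localhost", user="reader", password="***",
                           database="user_info_db", charset="utf8mb4")
    try:
        with conn.cursor(pymysql.cursors.DictCursor) as cur:
            # Query keyed on the user ID associated with the speech under test.
            cur.execute(
                "SELECT name, id_number, phone, address, employer "
                "FROM user_info WHERE user_id = %s", (user_id,))
            return cur.fetchone()
    finally:
        conn.close()
```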
S34:基于识别身份信息与标准身份信息,获取身份验证信息。
具体地,将识别身份信息与标准身份信息进行对比,判断识别身份信息与标准身份信息是否对应同一说话人,以输出相应的身份验证信息。本实施例中,若识别身份信息与标准身份信息对应同一说话人,则获取的身份验证信息为低欺诈风险信息;相应地,若识别身份信息与标准身份信息不对应同一说话人,则获取的身份验证信息为高欺诈风险信息。或者,本实施例中输出的身份验证信息可以输出识别身份信息与标准身份信息对应同一说话人的概率值。
在一具体实施方式中,如图4所示,步骤S34中,基于识别身份信息与标准身份信息,获取身份验证信息具体包括如下步骤:
S341:计算识别身份信息和标准身份信息的身份相似度。
在一具体实施方式中,可将识别身份信息与在用户信息库中获取到的标准身份信息进行身份信息比对,将识别身份信息与标准身份信息相同的数量除以进行识别身份信息和标准身份信息的总数量,将获取到的比值作为身份相似度。
在另一具体实施方式中,可通过计算识别身份信息和标准身份信息的欧氏距离,以获取对应的身份相似度。其中,欧氏距离(euclidean metric,又称欧几里得度量)是指在m维空间中两个点之间的真实距离,或者向量的自然长度(即该点到原点的距离)。任意两个n维向量a(Xi1,Xi2,...,Xin)与b(Xj1,Xj2,...,Xjn)的欧氏距离
d(a, b) = √( Σ_{k=1}^{n} (X_{ik} − X_{jk})² )
其中,识别身份信息可用向量a(Xi1,Xi2,...,Xin)表示,标准身份信息可用向量b(Xj1,Xj2,...,Xjn)来表示。
S342:将身份相似度与预设相似阈值进行比较,获取身份验证信息。
其中,预设相似阈值是预先设置用于评价两个身份信息对应同一说话人需要达到的相似度。身份验证信息是进行身份验证的验证结果。身份验证信息可以包括低欺诈风险信息和高欺诈风险信息,也可以包括其他信息。本实施例中,该预设相似阈值可设置为0.5,即若步骤S341中获取的身份相似度大于0.5,则获取的身份验证信息为低欺诈风险信息;反之,若步骤S341中获取的身份相似度不大于0.5,则获取的身份验证信息为高欺诈风险信息。
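Steps S341-S342 can be illustrated with a field-agreement version of the identity similarity and the 0.5 threshold mentioned above; the field names and sample values below are invented.

```python
# Small sketch of steps S341-S342: share of matching fields as the identity similarity,
# compared against the preset threshold of 0.5; field names and values are invented.
def identity_similarity(recognized, standard):
    fields = list(standard)
    matches = sum(1 for f in fields if recognized.get(f) == standard[f])
    return matches / len(fields)

def verify_identity(recognized, standard, threshold=0.5):
    if identity_similarity(recognized, standard) > threshold:
        return "low fraud risk"        # same speaker as the stored standard identity
    return "high fraud risk"

standard = {"name": "张三", "phone": "13800000000", "employer": "示例公司"}
recognized = {"name": "张三", "phone": "13800000000", "employer": "未知"}
print(verify_identity(recognized, standard))    # 2/3 ≈ 0.67 > 0.5 -> low fraud risk
```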
S40:采用谎言监控模型对语音特征进行谎言验证,获取谎言验证信息。
其中,谎言监控模型是机构内预先训练好用于谎言验证的模型,该谎言监控模型包括预先设置的谎言信息库,谎言语音库中存储预设的测谎问题以及相关业务的谎言语音特征(即本实施例中的谎言标准特征)。该谎言标准特征包括但不限于语音频率、发音时长、幅度变化和音质特征等标准特征,其中,音质特征包括但不限于共振峰和短时能量抖动。本实施例中,通过计算步骤S20获取的语音特征与谎言信息库中的谎言标准特征的特征相似度,即可实现谎言验证,以获取谎言验证模型。
在一具体实施方式中,如图5所示,步骤S40中,采用谎言监控模型对语音特征进行谎言验证,获取谎言验证信息具体包括如下步骤:
S41:将语音特征与谎言语音库中所有的标准特征进行对比,计算语音特征与每一标准特征的特征相似度。
其中,特征相似度可采用欧氏距离来计算,即将语音特征作为n维向量a(Xi1,Xi2,...,Xin),并将标准特征作为n维向量b(Xj1,Xj2,...,Xjn),则两者的欧氏距离
d(a, b) = √( Σ_{k=1}^{n} (X_{ik} − X_{jk})² )
S42:选取最相似的特征相似度对应的标准特征作为目标特征,并将目标特征对应的 标准验证信息作为谎言验证信息。
其中,标准验证信息是指谎言语音库中每一标准特征对应的验证信息,该标准验证信息可采用高欺诈风险信息和低欺诈风险信息这种形式输出;也可采用欺诈风险概率这种量化风险的形式输出。具体地,最相似的特征相似度的选取过程是指从步骤S41计算获取到至少两个语音特征与标准特征的特征相似度中,选取至少两个特征相似度中的最大值所对应的标准特征作为目标特征,再将目标特征所对应的谎言验证信息作为谎言验证信息。
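Steps S41-S42 amount to a nearest-neighbor lookup in the lie speech library by Euclidean distance, reusing the verification label of the closest standard feature; the library entries in the sketch below are invented purely for illustration.

```python
# Hedged sketch of steps S41-S42: compare the extracted feature vector against every
# standard feature in the lie library by Euclidean distance and reuse the verification
# label of the closest one; the library entries here are invented toy values.
import numpy as np

lie_library = [
    {"feature": np.array([0.8, 0.3, 0.5]), "verification": "high fraud risk"},
    {"feature": np.array([0.2, 0.7, 0.1]), "verification": "low fraud risk"},
]

def lie_verification(speech_feature):
    distances = [np.linalg.norm(speech_feature - e["feature"]) for e in lie_library]
    best = int(np.argmin(distances))     # smallest distance = most similar standard feature
    return lie_library[best]["verification"]

print(lie_verification(np.array([0.75, 0.35, 0.45])))   # -> "high fraud risk"
```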
可以理解地,步骤S30和步骤S40的执行顺序没有先后之分。
S50:基于身份验证信息和谎言验证信息,获取欺诈风险评估结果。
本实施例中,采用身份验证模型和谎言验证模型的双重验证使得获取到的欺诈风险评估结果更加准确,并能更精准的做出欺诈风险评估判断,降低欺诈风险。
在一具体实施方式中,如图6所示,步骤S50中,基于身份验证信息和谎言验证信息,获取欺诈风险评估结果具体包括如下步骤:
S51:对身份验证信息和谎言验证信息进行标准化处理,获取身份验证标准值和谎言验证标准值。
其中,数据标准化(normalization)是将数据按比例缩放,使之落入一个小的特定区间,用于去除数据的单位限制,将其转化为无量级的纯数值,便于不同单位或量级的指标能够进行比较和加权运算处理。本实施例中,采用min-max标准化(Min-max normalization)分别对身份验证信息和谎言验证信息进行标准化处理,以获取身份验证标准值和谎言验证标准值。其中,min-max标准化(Min-max normalization)也称为离差标准化,是指采用转换函数对原始数据进行线性变换,使结果落到预设区间的过程,其中,转换函数
x* = N × (x − min) / (max − min)
min为样本数据的最小值,max为样本数据的最大值,N为预设区间的区间大小。若N为1,则采用min-max标准化处理后的结果落在[0,1]这个区间范围内;若N为10,则采用min-max标准化处理后的结果落在[0,10]这个区间范围内。
S52:将身份验证标准值和谎言验证标准值分别乘以风险权重,获取身份验证风险值和谎言验证风险值。
其中,风险权重的系数是预先设置用于获取身份验证风险值和谎言验证风险值。本实施例中,可将身份验证的风险权重系数设定为0.6,谎言验证的风险权重设定为0.4,再将步骤S51中获取到的身份验证标准值和谎言验证标准值分别乘以风险权重系数,以获取身份验证风险值和谎言验证风险值。
S53:计算身份验证风险值和谎言验证风险值的和,获取欺诈风险评估结果。
将步骤S52中的身份验证风险值和谎言验证风险值做加法运算,得到欺诈风险评估结果,再将欺诈风险评估结果实时发送给呼叫中心,辅助做出风险评估的判断。
即本实施例的步骤S52和S53中,采用加权运算算法对身份验证信息和谎言验证信息进行加权处理,获取欺诈风险评估结果。加权运算算法如下:Pi=Σviwi,其中,Pi为身份验证风险值或者谎言验证风险值,Vi为身份验证信息或者谎言验证信息中每一标准特征数据的值,Wi是每一种标准特征数据的权重系数。
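Steps S51-S53 (min-max normalization followed by the 0.6/0.4 weighted sum) can be sketched as below; the raw score ranges are assumptions, since the application does not fix them.

```python
# Hedged sketch of steps S51-S53: min-max normalization of the two verification scores
# followed by the 0.6 / 0.4 weighted sum; the raw score ranges are assumptions.
def min_max(x, lo, hi, n=1):
    return n * (x - lo) / (hi - lo)

def fraud_risk(identity_score, lie_score,
               identity_range=(0.0, 1.0), lie_range=(0.0, 1.0),
               w_identity=0.6, w_lie=0.4):
    identity_std = min_max(identity_score, *identity_range)
    lie_std = min_max(lie_score, *lie_range)
    # Weighted sum P = sum(v_i * w_i) gives the fraud risk assessment result.
    return w_identity * identity_std + w_lie * lie_std

print(fraud_risk(0.2, 0.9))   # 0.6*0.2 + 0.4*0.9 = 0.48
```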
本实施例中的语音欺诈识别方法中,通过对待测语音信息进行特征提取,以获取语音特征;再采用身份验证模型和谎言验证模型分别对语音特征进行验证,然后基于身份验证信息和谎言验证信息得到欺诈风险评估结果。该语音欺诈识别方法,可实现待测语音信息进行智能识别,以获取欺诈风险评估结果,其过程处理效率高、准确率高且无需人工干涉,有利于节省人工成本。
在一具体实施方式中,如图7所示,该语音欺诈识别方法具体包括如下步骤:
S10’:获取呼叫中心实时采集的待测语音信息。
该呼叫中心可以集成在金融机构或者需要进行语音欺诈识别的其他机构的终端设备上,也可以通过网络与金融机构或者需要进行语音欺诈识别的其他机构的终端设备通信相连,以将呼叫中心实时采集到的待测语音信息发送给终端设备,以便于终端设备对获取到的待测语音信息进行欺诈识别。该呼叫中心与客户终端通话相连,以实现坐席人员与客户进行通话。其中,该呼叫中心是与机构内的坐席人员进行人机交互的终端。客户终端是与客户进行人机交互的终端,本实施例中的客户是待测语音信息的说话人,而终端是电话或手机。具体地,呼叫中心上设有录音模块,该录音模块用于对呼叫中心实时采集到的待测语音信息进行录音,以获取该待测语音信息,并将待测语音信息发送给客户终端。
S20’:对待测语音信息进行特征提取,获取语音特征。
S30’:采用身份确认模型对语音特征进行身份验证,获取身份验证信息。
S40’:采用谎言监控模型对语音特征进行谎言验证,获取谎言验证信息。
S50’:基于身份验证信息和谎言验证信息,获取欺诈风险评估结果。
该具体实施方式中,步骤S20’-S50’与上述具体实施方式中步骤S20-S50的实施过程相同,为避免重复,在此不一一赘述。
S60’:将欺诈风险评估结果实时发送给呼叫中心。
本实施例中,将步骤S50获取到的欺诈风险结果实时反馈给呼叫中心,以辅助机构内呼叫中心的坐席人员对客户做出欺诈风险评估判断,使得坐席人员在与客户进行通话过程 中,即可起到反欺诈目的,避免因待测语音信息对应的说话人的欺诈行为造成损失。而且,该语音欺诈识别方法采用人工智能识别方式,处理效率高,且其过程无需配备专业的质检人员进行抽检,可节省人工成本,降低欺诈风险。
该具体实施方式所提供的语音欺诈识别方法中,获取呼叫中心实时采集的待测语音信息,再通过对待测语音信息进行特征提取,以获取语音特征;再采用身份验证模型和谎言验证模型分别对语音特征进行验证,然后基于身份验证信息和谎言验证信息得到欺诈风险评估结果,并将该欺诈风险评估结果实时发送给呼叫中心。该语音欺诈识别方法,可实现对实时采集的语音进行智能识别以获取欺诈风险结果,并能将该欺诈风险结果实时发送给呼叫中心,基于欺诈风险评估结果做出欺诈风险评估判断,其过程处理效率高,实时性强,灵活性高且无需人工干涉,有利于节省人工成本,降低欺诈风险。
应理解,上述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
实施例2
图8示出与实施例1中语音欺诈识别方法一一对应的语音欺诈识别装置的原理框图。如图8所示,该语音欺诈识别装置包括待测语音获取模块10、语音特征获取模块20、身份验证获取模块30、谎言验证获取模块40、欺诈风险评估模块50和评估结果发送模块60。其中,待测语音获取模块10、语音特征获取模块20、身份验证获取模块30、谎言验证获取模块40、欺诈风险评估模块50和评估结果发送模块60的实现功能与实施例1中语音欺诈识别方法对应的步骤S10-S60或者步骤S10’-S60’一一对应,为避免赘述,本实施例不一一详述。
待测语音获取模块10,用于获取待测语音信息。
语音特征获取模块20,用于对待测语音信息进行特征提取,获取语音特征。
身份验证获取模块30,用于采用身份确认模型对语音特征进行身份验证,获取身份验证信息。
谎言验证获取模块40,用于采用谎言监控模型对语音特征进行谎言验证,获取谎言验证信息。
欺诈风险评估模块50,用于基于身份验证信息和谎言验证信息,获取欺诈风险评估结果。
优选地，身份验证获取模块30包括目标文字获取单元31、识别身份获取单元32、标准身份获取单元33和身份验证获取单元34。
目标文字获取单元31,用于采用语音识别模型对语音特征进行语音识别,获取目标文字信息。
识别身份获取单元32,用于对目标文字信息进行关键词提取,获取识别身份信息。
标准身份获取单元33,用于从用户信息库中获取与用户ID相对应的标准身份信息。
身份验证获取单元34,用于基于识别身份信息与标准身份信息,获取身份验证信息。
优选地，目标文字获取单元31包括单音素特征获取子单元311、三音素特征获取子单元312、初始文字获取子单元313和目标文字获取子单元314。
单音素特征获取子单元311,用于采用单音素训练模型对语音特征进行识别,获取单音素特征序列。
三音素特征获取子单元312,用于采用三音素训练模型对单音素特征序列进行识别,获取三音素特征序列。
初始文字获取子单元313,用于采用长短时递归神经网络模型对三音素特征序列进行识别,获取初始文字信息。
目标文字获取子单元314,用于采用语言模型对初始文字信息进行识别,获取目标文字信息。
优选地,身份验证获取单元34包括身份相似度获取子单元341和身份验证信息获取子单元342。
身份相似度获取子单元341,用于计算识别身份信息和标准身份信息的身份相似度。
身份验证信息获取子单元342,用于将身份相似度与预设相似阈值进行比较,获取身份验证信息。
优选地,谎言验证获取模块40包括特征相似度获取单元41和谎言验证获取单元42。
特征相似度获取单元41,用于将语音特征与谎言语音库中所有的标准特征进行对比,计算语音特征与每一标准特征的特征相似度。
谎言验证获取单元42,用于选取最相似的特征相似度对应的标准特征作为目标特征,并将目标特征对应的标准验证信息作为谎言验证信息。
优选地,欺诈风险评估模块50包括标准值获取单元51、风险值获取单元52和欺诈风险结果获取单元53。
标准值获取单元51,用于对身份验证信息和谎言验证信息进行标准化处理,获取身份验证标准值和谎言验证标准值。
风险值获取单元52,用于将身份验证标准值和谎言验证标准值分别乘以风险权重, 获取身份验证风险值和谎言验证风险值。
欺诈风险结果获取单元53,用于计算身份验证风险值和谎言验证风险值的和,获取欺诈风险评估结果。
优选地,待测语音获取模块10,用于获取呼叫中心实时采集的所述待测语音信息。
评估结果发送模块60,用于将欺诈风险评估结果实时发送给呼叫中心。
实施例3
本实施例提供一计算机可读存储介质,该计算机可读存储介质上存储有计算机可读指令,该计算机可读指令被处理器执行时实现实施例1中语音欺诈识别方法,为避免重复,这里不再赘述。或者,该计算机可读指令被处理器执行时实现实施例2中语音欺诈识别中各模块/单元的功能,为避免重复,这里不再赘述。
实施例4
图9是本申请一实施例提供的终端设备的示意图。如图9所示,该实施例的终端设备90包括:处理器91、存储器92以及存储在存储器92中并可在处理器91上运行的计算机可读指令93。处理器91执行计算机可读指令93时实现上述实施例1中语音欺诈识别方法的步骤,例如图1所示的步骤S10至S50,或者,如图7所示的步骤S10’至S60’。或者,处理器91执行计算机可读指令93时实现上述实施例2中语音欺诈识别装置中各模块/单元的功能,例如图8所示的待测语音获取模块10、语音特征获取模块20、身份验证获取模块30、谎言验证获取模块40、欺诈风险评估模块50和评估结果发送模块60等模块的功能。
示例性的,计算机可读指令93可以被分割成一个或多个模块/单元,一个或者多个模块/单元被存储在存储器92中,并由处理器91执行,以完成本申请。一个或多个模块/单元可以是能够完成特定功能的一系列计算机可读指令指令段,该指令段用于描述计算机可读指令93在终端设备90中的执行过程。例如,计算机可读指令93可以被分割成实施例2中的待测语音获取模块10、语音特征获取模块20、身份验证获取模块30、谎言验证获取模块40、欺诈风险评估模块50和评估结果发送模块60,各模块具体功能如实施例2所述,在此不一一赘述。
终端设备90可以是桌上型计算机、笔记本、掌上电脑及云端服务器等计算设备。终端设备可包括,但不仅限于,处理器91、存储器92。本领域技术人员可以理解,图9仅仅是终端设备90的示例,并不构成对终端设备90的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件,例如终端设备还可以包括输入输出设备、网 络接入设备、总线等。
所称处理器91可以是中央处理单元(Central Processing Unit,CPU),还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
存储器92可以是终端设备90的内部存储单元,例如终端设备90的硬盘或内存。存储器92也可以是终端设备90的外部存储设备,例如终端设备90上配备的插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。进一步地,存储器92还可以既包括终端设备90的内部存储单元也包括外部存储设备。存储器92用于存储计算机可读指令以及终端设备所需的其他程序和数据。存储器92还可以用于暂时地存储已经输出或者将要输出的数据。
所属领域的技术人员可以清楚地了解到,为了描述的方便和简洁,仅以上述各功能单元、模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能单元、模块完成,即将所述装置的内部结构划分成不同的功能单元或模块,以完成以上描述的全部或者部分功能。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的模块/单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本申请实现上述实施例方法中的全部或部分流程，也可以通过计算机可读指令来指令相关的硬件来完成，所述的计算机可读指令可存储于一计算机可读存储介质中，该计算机可读指令在被处理器执行时，可实现上述各个方法实施例的步骤。其中，所述计算机可读指令包括计算机可读指令代码，所述计算机可读指令代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述计算机可读介质可以包括：能够携带所述计算机可读指令代码的任何实体或装置、记录介质、U盘、移动硬盘、磁碟、光盘、计算机存储器、只读存储器（ROM，Read-Only Memory）、随机存取存储器（RAM，Random Access Memory）、电载波信号、电信信号以及软件分发介质等。需要说明的是，所述计算机可读介质包含的内容可以根据司法管辖区内立法和专利实践的要求进行适当的增减，例如在某些司法管辖区，根据立法和专利实践，计算机可读介质不包括电载波信号和电信信号。
以上所述实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围,均应包含在本申请的保护范围之内。

Claims (22)

  1. 一种语音欺诈识别方法,其特征在于,包括:
    获取待测语音信息;
    对所述待测语音信息进行特征提取,获取语音特征;
    采用身份确认模型对所述语音特征进行身份验证,获取身份验证信息;
    采用谎言监控模型对所述语音特征进行谎言验证,获取谎言验证信息;
    基于所述身份验证信息和所述谎言验证信息,获取欺诈风险评估结果。
  2. 根据权利要求1所述的语音欺诈识别方法,其特征在于,所述待测语音信息与用户ID关联;
    所述采用身份确认模型对所述语音特征进行身份验证,获取身份验证信息,包括:
    采用语音识别模型对所述语音特征进行语音识别,获取目标文字信息;
    对所述目标文字信息进行关键词提取,获取识别身份信息;
    从用户信息库中获取与所述用户ID相对应的标准身份信息;
    基于所述识别身份信息与所述标准身份信息,获取所述身份验证信息。
  3. 根据权利要求2所述的语音欺诈识别方法,其特征在于,所述采用语音识别模型对所述语音特征进行语音识别,获取目标文字信息,包括:
    采用单音素训练模型对所述语音特征进行识别,获取单音素特征序列;
    采用三音素训练模型对所述单音素特征序列进行识别,获取三音素特征序列;
    采用长短时递归神经网络模型对所述三音素特征序列进行识别,获取初始文字信息;
    采用语言模型对所述初始文字信息进行识别,获取所述目标文字信息。
  4. 根据权利要求2所述的语音欺诈识别方法,其特征在于,所述基于所述识别身份信息与所述标准身份信息,获取所述身份验证信息,包括:
    计算所述识别身份信息和所述标准身份信息的身份相似度;
    将所述身份相似度与预设相似阈值进行比较,获取所述身份验证信息。
  5. 根据权利要求1所述的语音欺诈识别方法,其特征在于,所述采用谎言监控模型对所述语音特征进行谎言验证,获取谎言验证信息,包括:
    将所述语音特征与谎言语音库中所有的标准特征进行对比,计算所述语音特征与每一所述标准特征的特征相似度;
    选取最相似的所述特征相似度对应的标准特征作为目标特征,并将所述目标特征对应 的标准验证信息作为所述谎言验证信息。
  6. 根据权利要求1所述的语音欺诈识别方法,其特征在于,所述基于所述身份验证信息和所述谎言验证信息,获取欺诈风险评估结果,包括:
    对所述身份验证信息和所述谎言验证信息进行标准化处理,获取身份验证标准值和谎言验证标准值;
    将所述身份验证标准值和所述谎言验证标准值分别乘以风险权重,获取身份验证风险值和谎言验证风险值;
    计算所述身份验证风险值和所述谎言验证风险值的和,获取所述欺诈风险评估结果。
  7. 根据权利要求1所述的语音欺诈识别方法,其特征在于,所述获取待测语音信息,包括:获取呼叫中心实时采集的所述待测语音信息;
    所述语音欺诈识别方法还包括:
    将所述欺诈风险评估结果实时发送给所述呼叫中心。
  8. 一种语音欺诈识别装置,其特征在于,包括:
    待测语音获取模块,用于获取待测语音信息;
    语音特征获取模块,用于对所述待测语音信息进行特征提取,获取语音特征;
    身份验证获取模块,用于采用身份确认模型对所述语音特征进行身份验证,获取身份验证信息;
    谎言验证获取模块,用于采用谎言监控模型对所述语音特征进行谎言验证,获取谎言验证信息;
    欺诈风险评估模块,用于基于所述身份验证信息和所述谎言验证信息,获取欺诈风险评估结果。
  9. 一种终端设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,其特征在于,所述处理器执行所述计算机可读指令时实现如下步骤:
    获取待测语音信息;
    对所述待测语音信息进行特征提取,获取语音特征;
    采用身份确认模型对所述语音特征进行身份验证,获取身份验证信息;
    采用谎言监控模型对所述语音特征进行谎言验证,获取谎言验证信息;
    基于所述身份验证信息和所述谎言验证信息,获取欺诈风险评估结果。
  10. 根据权利要求9所述的终端设备,其特征在于,所述待测语音信息与用户ID关联;
    所述采用身份确认模型对所述语音特征进行身份验证,获取身份验证信息,包括:
    采用语音识别模型对所述语音特征进行语音识别,获取目标文字信息;
    对所述目标文字信息进行关键词提取,获取识别身份信息;
    从用户信息库中获取与所述用户ID相对应的标准身份信息;
    基于所述识别身份信息与所述标准身份信息,获取所述身份验证信息。
  11. 根据权利要求10所述的终端设备,其特征在于,所述采用语音识别模型对所述语音特征进行语音识别,获取目标文字信息,包括:
    采用单音素训练模型对所述语音特征进行识别,获取单音素特征序列;
    采用三音素训练模型对所述单音素特征序列进行识别,获取三音素特征序列;
    采用长短时递归神经网络模型对所述三音素特征序列进行识别,获取初始文字信息;
    采用语言模型对所述初始文字信息进行识别,获取所述目标文字信息。
  12. 根据权利要求10所述的终端设备,其特征在于,所述基于所述识别身份信息与所述标准身份信息,获取所述身份验证信息,包括:
    计算所述识别身份信息和所述标准身份信息的身份相似度;
    将所述身份相似度与预设相似阈值进行比较,获取所述身份验证信息。
  13. 根据权利要求9所述的终端设备,其特征在于,所述采用谎言监控模型对所述语音特征进行谎言验证,获取谎言验证信息,包括:
    将所述语音特征与谎言语音库中所有的标准特征进行对比,计算所述语音特征与每一所述标准特征的特征相似度;
    选取最相似的所述特征相似度对应的标准特征作为目标特征,并将所述目标特征对应的标准验证信息作为所述谎言验证信息。
  14. 根据权利要求9所述的终端设备,其特征在于,所述基于所述身份验证信息和所述谎言验证信息,获取欺诈风险评估结果,包括:
    对所述身份验证信息和所述谎言验证信息进行标准化处理,获取身份验证标准值和谎言验证标准值;
    将所述身份验证标准值和所述谎言验证标准值分别乘以风险权重,获取身份验证风险值和谎言验证风险值;
    计算所述身份验证风险值和所述谎言验证风险值的和,获取所述欺诈风险评估结果。
  15. 根据权利要求9所述的终端设备,其特征在于,所述获取待测语音信息,包括:获取呼叫中心实时采集的所述待测语音信息;
    所述语音欺诈识别方法还包括:
    将所述欺诈风险评估结果实时发送给所述呼叫中心。
  16. 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机可读指令,其特征在于,所述计算机可读指令被处理器执行时实现如下步骤:
    获取待测语音信息;
    对所述待测语音信息进行特征提取,获取语音特征;
    采用身份确认模型对所述语音特征进行身份验证,获取身份验证信息;
    采用谎言监控模型对所述语音特征进行谎言验证,获取谎言验证信息;
    基于所述身份验证信息和所述谎言验证信息,获取欺诈风险评估结果。
  17. 根据权利要求16所述的计算机可读存储介质,其特征在于,所述待测语音信息与用户ID关联;
    所述采用身份确认模型对所述语音特征进行身份验证,获取身份验证信息,包括:
    采用语音识别模型对所述语音特征进行语音识别,获取目标文字信息;
    对所述目标文字信息进行关键词提取,获取识别身份信息;
    从用户信息库中获取与所述用户ID相对应的标准身份信息;
    基于所述识别身份信息与所述标准身份信息,获取所述身份验证信息。
  18. 根据权利要求17所述的计算机可读存储介质,其特征在于,所述采用语音识别模型对所述语音特征进行语音识别,获取目标文字信息,包括:采用单音素训练模型对所述语音特征进行识别,获取单音素特征序列;
    采用三音素训练模型对所述单音素特征序列进行识别,获取三音素特征序列;
    采用长短时递归神经网络模型对所述三音素特征序列进行识别,获取初始文字信息;
    采用语言模型对所述初始文字信息进行识别,获取所述目标文字信息。
  19. 根据权利要求17所述的计算机可读存储介质,其特征在于,所述基于所述识别身份信息与所述标准身份信息,获取所述身份验证信息,包括:
    计算所述识别身份信息和所述标准身份信息的身份相似度;
    将所述身份相似度与预设相似阈值进行比较,获取所述身份验证信息。
  20. 根据权利要求17所述的计算机可读存储介质,其特征在于,所述采用谎言监控模型对所述语音特征进行谎言验证,获取谎言验证信息,包括:
    将所述语音特征与谎言语音库中所有的标准特征进行对比,计算所述语音特征与每一所述标准特征的特征相似度;
    选取最相似的所述特征相似度对应的标准特征作为目标特征,并将所述目标特征对应的标准验证信息作为所述谎言验证信息。
  21. 根据权利要求16所述的计算机可读存储介质,其特征在于,所述基于所述身份验证信息和所述谎言验证信息,获取欺诈风险评估结果,包括:
    对所述身份验证信息和所述谎言验证信息进行标准化处理,获取身份验证标准值和谎言验证标准值;
    将所述身份验证标准值和所述谎言验证标准值分别乘以风险权重,获取身份验证风险值和谎言验证风险值;
    计算所述身份验证风险值和所述谎言验证风险值的和,获取所述欺诈风险评估结果。
  22. 根据权利要求16所述的计算机可读存储介质,其特征在于,所述获取待测语音信息,包括:获取呼叫中心实时采集的所述待测语音信息;
    所述语音欺诈识别方法还包括：
    将所述欺诈风险评估结果实时发送给所述呼叫中心。
PCT/CN2017/104891 2017-08-24 2017-09-30 语音欺诈识别方法、装置、终端设备及存储介质 WO2019037205A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710734301.0 2017-08-24
CN201710734301.0A CN107680602A (zh) 2017-08-24 2017-08-24 语音欺诈识别方法、装置、终端设备及存储介质

Publications (1)

Publication Number Publication Date
WO2019037205A1 true WO2019037205A1 (zh) 2019-02-28

Family

ID=61134821

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/104891 WO2019037205A1 (zh) 2017-08-24 2017-09-30 语音欺诈识别方法、装置、终端设备及存储介质

Country Status (2)

Country Link
CN (1) CN107680602A (zh)
WO (1) WO2019037205A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109905282A (zh) * 2019-04-09 2019-06-18 国家计算机网络与信息安全管理中心 基于lstm的诈骗电话预测方法及预测系统
CN112329438A (zh) * 2020-10-27 2021-02-05 中科极限元(杭州)智能科技股份有限公司 基于域对抗训练的自动谎言检测方法及系统

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108492196B (zh) * 2018-03-08 2020-11-10 平安医疗健康管理股份有限公司 通过数据分析推断医疗保险违规行为的风控方法
CN108416592B (zh) * 2018-03-19 2022-08-05 成都信达智胜科技有限公司 一种高速语音识别方法
CN108564940B (zh) * 2018-03-20 2020-04-28 平安科技(深圳)有限公司 语音识别方法、服务器及计算机可读存储介质
CN110797008B (zh) * 2018-07-16 2024-03-29 阿里巴巴集团控股有限公司 一种远场语音识别方法、语音识别模型训练方法和服务器
US10692490B2 (en) * 2018-07-31 2020-06-23 Cirrus Logic, Inc. Detection of replay attack
CN109471953A (zh) * 2018-10-11 2019-03-15 平安科技(深圳)有限公司 一种语音数据检索方法及终端设备
CN109543516A (zh) * 2018-10-16 2019-03-29 深圳壹账通智能科技有限公司 签约意向判断方法、装置、计算机设备和存储介质
CN109451182B (zh) * 2018-10-19 2021-08-13 北京邮电大学 一种诈骗电话的检测方法和装置
CN109493882A (zh) * 2018-11-04 2019-03-19 国家计算机网络与信息安全管理中心 一种诈骗电话语音自动标注系统及方法
CN109344232B (zh) * 2018-11-13 2024-03-15 平安科技(深圳)有限公司 一种舆情信息检索方法及终端设备
CN111292739B (zh) * 2018-12-10 2023-03-31 珠海格力电器股份有限公司 一种语音控制方法、装置、存储介质及空调
CN109657181B (zh) * 2018-12-13 2024-05-14 平安科技(深圳)有限公司 互联网信息链式存储方法、装置、计算机设备及存储介质
CN111798857A (zh) * 2019-04-08 2020-10-20 北京嘀嘀无限科技发展有限公司 一种信息识别方法、装置、电子设备及存储介质
CN110136727B (zh) * 2019-04-16 2024-04-16 平安科技(深圳)有限公司 基于说话内容的说话者身份识别方法、装置及存储介质
CN110033778B (zh) * 2019-05-07 2021-07-23 苏州市职业大学 一种说谎状态实时识别修正系统
CN111862946B (zh) * 2019-05-17 2024-04-19 北京嘀嘀无限科技发展有限公司 一种订单处理方法、装置、电子设备及存储介质
CN110111796B (zh) * 2019-06-24 2021-09-17 秒针信息技术有限公司 识别身份的方法及装置
CN110362999B (zh) * 2019-06-25 2023-04-18 创新先进技术有限公司 用于检测账户使用异常的方法及装置
CN110491368B (zh) * 2019-07-23 2023-06-16 平安科技(深圳)有限公司 基于方言背景的语音识别方法、装置、计算机设备和存储介质
CN110570199B (zh) * 2019-07-24 2022-10-11 中国科学院信息工程研究所 一种基于用户输入行为的用户身份检测方法及系统
CN110738998A (zh) * 2019-09-11 2020-01-31 深圳壹账通智能科技有限公司 基于语音的个人信用评估方法、装置、终端及存储介质
CN112784038A (zh) * 2019-10-23 2021-05-11 阿里巴巴集团控股有限公司 信息的识别方法、系统、计算设备及存储介质
CN110751553A (zh) * 2019-10-24 2020-02-04 深圳前海微众银行股份有限公司 潜在风险对象的识别方法、装置、终端设备及存储介质
CN113112992B (zh) * 2019-12-24 2022-09-16 中国移动通信集团有限公司 一种语音识别方法、装置、存储介质和服务器
CN111429918A (zh) * 2020-03-26 2020-07-17 云知声智能科技股份有限公司 一种基于声纹识别和意图分析的访电话诈骗方法和系统
CN111601000B (zh) * 2020-05-14 2022-03-08 支付宝(杭州)信息技术有限公司 通信网络诈骗的识别方法、装置和电子设备
CN111816203A (zh) * 2020-06-22 2020-10-23 天津大学 基于音素级分析抑制音素影响的合成语音检测方法
CN114067834B (zh) * 2020-07-30 2024-08-09 中国移动通信集团有限公司 一种不良前导音识别方法、装置、存储介质和计算机设备
CN112216270B (zh) * 2020-10-09 2024-02-06 携程计算机技术(上海)有限公司 语音音素的识别方法及系统、电子设备及存储介质
CN112331230B (zh) * 2020-11-17 2024-07-05 平安科技(深圳)有限公司 一种欺诈行为识别方法、装置、计算机设备及存储介质
CN112466056B (zh) * 2020-12-01 2022-04-05 上海旷日网络科技有限公司 一种基于语音识别的自助柜取件系统及方法
CN112669881B (zh) * 2020-12-25 2023-02-28 北京融讯科创技术有限公司 一种语音检测方法、装置、终端及存储介质
CN112800272A (zh) * 2021-01-18 2021-05-14 德联易控科技(北京)有限公司 识别保险理赔欺诈行为的方法及装置
CN113808603B (zh) * 2021-09-29 2023-07-07 恒安嘉新(北京)科技股份公司 一种音频篡改检测方法、装置、服务器和存储介质
CN114512144B (zh) * 2022-01-28 2024-05-17 中国人民公安大学 一种识别恶意语音信息的方法、装置、介质和设备
CN114648994A (zh) * 2022-02-23 2022-06-21 厦门快商通科技股份有限公司 一种声纹鉴定比对推荐方法、装置、电子设备及存储介质
CN117291615B (zh) * 2023-11-27 2024-02-06 成都乐超人科技有限公司 基于网络支付下克服反欺诈的可视化对比分析方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060248019A1 (en) * 2005-04-21 2006-11-02 Anthony Rajakumar Method and system to detect fraud using voice data
CN102737634A (zh) * 2012-05-29 2012-10-17 百度在线网络技术(北京)有限公司 一种基于语音的认证方法及装置
CN103078828A (zh) * 2011-10-25 2013-05-01 上海博路信息技术有限公司 一种云模式的语音鉴权系统
CN103731832A (zh) * 2013-12-26 2014-04-16 黄伟 防电话、短信诈骗的系统和方法
CN103971700A (zh) * 2013-08-01 2014-08-06 哈尔滨理工大学 语音监控方法及装置

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101697514B (zh) * 2009-10-22 2016-08-24 中兴通讯股份有限公司 一种身份验证的方法及系统
CN102104676A (zh) * 2009-12-21 2011-06-22 深圳富泰宏精密工业有限公司 具测谎功能的无线通信装置及其测谎方法
CN103313249B (zh) * 2013-05-07 2017-05-10 百度在线网络技术(北京)有限公司 用于终端的提醒方法、系统和服务器
CN105991593B (zh) * 2015-02-15 2019-08-30 阿里巴巴集团控股有限公司 一种识别用户风险的方法及装置
CN106921495A (zh) * 2015-12-24 2017-07-04 阿里巴巴集团控股有限公司 一种验证用户身份方法及装置
CN105701704A (zh) * 2015-12-31 2016-06-22 先花信息技术(北京)有限公司 用户可信度社交网络数据的处理方法
CN105575404A (zh) * 2016-01-25 2016-05-11 薛明博 一种基于语音识别的心理检测方法及系统
CN106157135A (zh) * 2016-07-14 2016-11-23 微额速达(上海)金融信息服务有限公司 基于声纹识别性别年龄的防欺诈系统及方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060248019A1 (en) * 2005-04-21 2006-11-02 Anthony Rajakumar Method and system to detect fraud using voice data
CN103078828A (zh) * 2011-10-25 2013-05-01 上海博路信息技术有限公司 一种云模式的语音鉴权系统
CN102737634A (zh) * 2012-05-29 2012-10-17 百度在线网络技术(北京)有限公司 一种基于语音的认证方法及装置
CN103971700A (zh) * 2013-08-01 2014-08-06 哈尔滨理工大学 语音监控方法及装置
CN103731832A (zh) * 2013-12-26 2014-04-16 黄伟 防电话、短信诈骗的系统和方法

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109905282A (zh) * 2019-04-09 2019-06-18 国家计算机网络与信息安全管理中心 基于lstm的诈骗电话预测方法及预测系统
CN112329438A (zh) * 2020-10-27 2021-02-05 中科极限元(杭州)智能科技股份有限公司 基于域对抗训练的自动谎言检测方法及系统
CN112329438B (zh) * 2020-10-27 2024-03-08 中科极限元(杭州)智能科技股份有限公司 基于域对抗训练的自动谎言检测方法及系统

Also Published As

Publication number Publication date
CN107680602A (zh) 2018-02-09

Similar Documents

Publication Publication Date Title
WO2019037205A1 (zh) 语音欺诈识别方法、装置、终端设备及存储介质
CN107680582B (zh) 声学模型训练方法、语音识别方法、装置、设备及介质
Tirumala et al. Speaker identification features extraction methods: A systematic review
CN112259106B (zh) 声纹识别方法、装置、存储介质及计算机设备
CN109087648B (zh) 柜台语音监控方法、装置、计算机设备及存储介质
Sarangi et al. Optimization of data-driven filterbank for automatic speaker verification
Kinnunen et al. An overview of text-independent speaker recognition: From features to supervectors
Zhan et al. Vocal tract length normalization for large vocabulary continuous speech recognition
Kumar et al. Design of an automatic speaker recognition system using MFCC, vector quantization and LBG algorithm
Das et al. Speaker verification from short utterance perspective: a review
Koolagudi et al. Dravidian language classification from speech signal using spectral and prosodic features
Karthikeyan Adaptive boosted random forest-support vector machine based classification scheme for speaker identification
Velayuthapandian et al. A focus module-based lightweight end-to-end CNN framework for voiceprint recognition
Goh et al. Robust computer voice recognition using improved MFCC algorithm
Nijhawan et al. Speaker recognition using support vector machine
Ranjan et al. Text-dependent multilingual speaker identification for indian languages using artificial neural network
Kinnunen Optimizing spectral feature based text-independent speaker recognition
Chandra Keyword spotting system for Tamil isolated words using Multidimensional MFCC and DTW algorithm
Nagakrishnan et al. Generic speech based person authentication system with genuine and spoofed utterances: different feature sets and models
Jawarkar et al. Effect of nonlinear compression function on the performance of the speaker identification system under noisy conditions
Selvan et al. Speaker recognition system for security applications
Messerle et al. Accuracy of feature extraction approaches in the task of recognition and classification of isolated words in speech
Balpande et al. Speaker recognition based on mel-frequency cepstral coefficients and vector quantization
Abdiche et al. Text-independent speaker identification using mel-frequency energy coefficients and convolutional neural networks
Bhable et al. Automatic speech recognition (ASR) of isolated words in Hindi low resource language

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17922209

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25/09/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 17922209

Country of ref document: EP

Kind code of ref document: A1