CN109493882A

CN109493882A - Automatic labeling system and method for fraudulent call voice

Info

Publication number: CN109493882A
Application number: CN201811304612.4A
Authority: CN
Inventors: 张震; 李鹏; 黄远; 高圣翔; 杜裕琴; 倪江帆
Original assignee: Xun Feizhi Metamessage Science And Technology Ltd; National Computer Network and Information Security Management Center
Current assignee: Xun Feizhi Metamessage Science And Technology Ltd; National Computer Network and Information Security Management Center
Priority date: 2018-11-04
Filing date: 2018-11-04
Publication date: 2019-03-19

Abstract

The present invention discloses an automatic labeling system for fraudulent call voice, comprising a basic dimension labeling module, a voiceprint labeling module, and a continuous speech recognition labeling module, wherein the output of the basic dimension labeling module is connected to the input of the voiceprint labeling module and to the input of the continuous speech recognition labeling module. The invention also proposes an automatic labeling method for fraudulent call voice, comprising a basic dimension labeling step, a continuous speech recognition step, and a voiceprint labeling step. Using intelligent speech technology, the invention automatically processes input voice data, automatically analyzes and recognizes it, and pre-labels it with tags; combined with manual confirmation, this enables effective labeling and management of the target data dimension labels. Call voice data is thereby used effectively, the application and performance of intelligent speech technology in telephone fraud scenarios are optimized, and data desensitization and encrypted transmission are ensured during labeling.

Description

Automatic labeling system and method for fraudulent call voice

Technical field

The present invention relates to an automatic labeling system and method for fraudulent call voice, and belongs to the technical field of fraud early warning.

Background art

As a contactless crime, telephone fraud is highly flexible, variable, and adversarial. A single, fixed technical approach can hardly cope with the changing forms of complex fraud schemes. Countering telecom fraud therefore requires using artificial intelligence to break through existing bottlenecks, mining the various fraud types and patterns in depth, building the ability to quickly discover and give early warning of fraud patterns, achieving comprehensive discovery of fraud forms, supporting upgrades to the design of fraudulent call control, and improving the ability to manage fraudulent calls.

At present, early warning and prevention of telecom fraud in China mainly follows three technical routes: early warning based on signaling data, fraudulent call early warning based on recording template matching, and early warning of fraudulent calls made by natural persons based on intelligent speech technology.

Early warning based on signaling data relies mainly on the call detail records of call data, and outputs warnings for calls with abnormal calling behavior by analyzing those records. Fraudulent call early warning based on recording template matching and natural-person fraudulent call early warning based on intelligent speech technology rely mainly on call voice data; by analyzing and processing this data, they quickly discover harmful synthesized-voice calls and harmful natural-person calls. Given the call behavior characteristics of current fraudulent calls and the existing early warning and identification technology, effectively improving the early warning capability for fraudulent calls in telecom networks requires a sufficient amount of real fraudulent call data. Labeling this data provides the multi-dimensional attribute labels needed to train models for intelligent speech technology.

However, there is currently no standardized data labeling system to guide labeling work. For example, label dimensions are unclear and incomplete, there are no desensitization or encryption measures for sensitive data, the degree of labeling automation is low, and labeling results lack an effective means of verification. A standardized data labeling method is therefore needed to guide data labeling work, so as to effectively improve the model training and iterative optimization capability of the related technologies and help the fight against telecom fraud develop better.

Summary of the invention

In order to make effective use of call voice data, optimize the application and performance of intelligent speech technology in telephone fraud scenarios, and ensure data desensitization and encrypted transmission during labeling, the present invention proposes an automatic labeling system and method for fraudulent call voice. It mainly studies fine-grained labeling of speech corpora and automatic corpus labeling technology and, by building a labeling system, ultimately implements the labeling of data label dimensions and the associated data management.

To solve the above technical problem, the present invention provides an automatic labeling system for fraudulent call voice, characterized in that it comprises a basic dimension labeling module, a voiceprint labeling module, and a continuous speech recognition labeling module, the output of the basic dimension labeling module being connected to the input of the voiceprint labeling module and to the input of the continuous speech recognition labeling module.

As a preferred embodiment, the basic dimension labeling module comprises a basic dimension speech preprocessing module, a language identification module, a basic dimension falsetto identification module, and a male/female voice identification module. The basic dimension speech preprocessing module is connected to the language identification module, the language identification module is connected to the basic dimension falsetto identification module, and the basic dimension falsetto identification module is connected to the male/female voice identification module.

As a preferred embodiment, the basic dimension speech preprocessing module addresses the application needs of natural speech recognition and analysis, fraud text discovery, and in-depth analysis of voice content by providing voice activity detection, color ring back tone (CRBT) detection, invalid sound detection, and valid speech detection for telephone channel voice data. The language identification module extracts the core features of the call voice and performs model comparison and score decision to provide language identification for the call voice. The basic dimension falsetto identification module quickly and accurately identifies synthesized-voice template data. The male/female voice identification module, applied alongside the analysis of fraudulent call characteristics, identifies the gender of the speaker from the spectral differences between male and female voices that result from differences in the physiological vocal organs.
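The patent does not say how the spectral difference between male and female voices is measured. One common proxy is the fundamental frequency (pitch), so the following is only a minimal sketch under that assumption: pitch is estimated from the autocorrelation peak and compared with a threshold. The function names, the 60-400 Hz search range, and the 165 Hz threshold are illustrative assumptions, not details from the patent.

```python
import math

def estimate_f0(samples, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) from the strongest autocorrelation lag."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_score = None, 0.0
    for lag in range(lag_min, min(lag_max, len(samples) - 1) + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag is not None else None

def classify_gender(samples, sample_rate, threshold_hz=165.0):
    """Return 'male', 'female', or 'unknown' (threshold is an assumed value)."""
    f0 = estimate_f0(samples, sample_rate)
    if f0 is None:
        return "unknown"
    return "male" if f0 < threshold_hz else "female"
```

A production system would more likely use a trained classifier over spectral features; the threshold rule here only illustrates the idea of deciding gender from a frequency-domain difference.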

As a preferred embodiment, the continuous speech recognition labeling module comprises a continuous speech preprocessing module and a continuous speech recognition module, the output of the former being connected to the input of the latter. After receiving the input target voice, the continuous speech preprocessing module cuts it into speech segments according to the energy distribution in the voice; the segments form the data set subsequently fed to the continuous speech recognition module. The continuous speech recognition module provides the underlying continuous speech recognition engine, which processes the content of each input speech segment and outputs the corresponding text.
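Cutting by energy distribution can be illustrated with a short sketch. It assumes a simple scheme that the patent does not spell out: the signal is split into fixed-length frames, each frame's mean squared amplitude is compared with a threshold, and consecutive high-energy frames are merged into one segment. The name `split_by_energy` and all parameter values are hypothetical.

```python
def split_by_energy(samples, frame_len=160, threshold=0.01, min_frames=2):
    """Return (start, end) sample-index pairs for runs of high-energy frames."""
    segments = []
    start = None  # index of the first frame of the current run, if any
    n_frames = len(samples) // frame_len
    for k in range(n_frames):
        frame = samples[k * frame_len:(k + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy >= threshold:
            if start is None:
                start = k
        elif start is not None:
            # The run ended at frame k; keep it only if it is long enough.
            if k - start >= min_frames:
                segments.append((start * frame_len, k * frame_len))
            start = None
    # Close a run that reaches the end of the signal.
    if start is not None and n_frames - start >= min_frames:
        segments.append((start * frame_len, n_frames * frame_len))
    return segments
```

Each returned pair delimits one speech segment that could then be passed to the recognition engine.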

As a preferred embodiment, the voiceprint labeling module comprises a voiceprint speech preprocessing module, a voiceprint clustering module, and a voiceprint falsetto identification module. The output of the voiceprint speech preprocessing module is connected to the input of the voiceprint clustering module, and the output of the voiceprint clustering module is connected to the input of the voiceprint falsetto identification module. The input of the voiceprint speech preprocessing module receives the voice data that the basic dimension labeling module has confirmed as fraud voice. After fraud voice is input, the engine of the voiceprint speech preprocessing module performs speaker separation according to the speaker information contained in the fraud voice, filters out invalid speech, and applies speech enhancement to the valid speech content. The voiceprint falsetto identification module judges, for each identified speaker's speech segments, whether the voice is synthesized, so that synthesized-voice data and real speaker voice can be distinguished quickly and effectively.

As a preferred embodiment, the voiceprint clustering module comprises a validity detection module, an automatic voiceprint registration module, and a voiceprint identity comparison module. The validity detection module selects from the candidate data the voice data whose speech length meets the requirement, and then uses ring-tone removal, speech detection, and speech quality detection to filter out of the candidate voice the voice data that meets the automatic registration standard. The automatic voiceprint registration module uses automatic labeling technology to register the voiceprint corresponding to the voice data that has passed validity detection. The voiceprint identity comparison module compares the newly registered voiceprint with the registered historical voiceprint library: if the similarity is greater than a threshold, the current voiceprint in the historical library is considered unchanged and the original voiceprint feature is updated with the newly registered voiceprint; otherwise, voiceprint change detection is performed.
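The identity comparison rule can be sketched as follows. The patent does not name a similarity measure or a threshold value, so this sketch assumes voiceprints are fixed-length feature vectors compared by cosine similarity; `compare_and_update`, the `speaker_id` key, and the 0.8 threshold are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def compare_and_update(library, speaker_id, new_print, threshold=0.8):
    """Compare a newly registered voiceprint with the historical library.

    library: dict mapping an assumed speaker key to its stored voiceprint.
    Returns 'registered', 'updated', or 'needs_change_detection'.
    """
    old = library.get(speaker_id)
    if old is None:
        library[speaker_id] = list(new_print)
        return "registered"
    if cosine_similarity(old, new_print) > threshold:
        # Voiceprint considered unchanged: refresh the stored feature.
        library[speaker_id] = list(new_print)
        return "updated"
    # Below the threshold: hand over to voiceprint change detection.
    return "needs_change_detection"
```

The sketch leaves change detection itself unimplemented, since the patent gives no detail on that step.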

The present invention also proposes an automatic labeling method for fraudulent call voice, comprising the following steps:

Step SS1: basic dimension labeling step;

Step SS2: continuous speech recognition step;

Step SS3: voiceprint labeling step.

As a preferred embodiment, the basic dimension labeling step specifically comprises:

Step SS11: basic dimension speech preprocessing step, which specifically comprises: addressing the application needs of natural speech recognition and analysis, fraud text discovery, and in-depth analysis of voice content by providing voice activity detection, color ring back tone (CRBT) detection, invalid sound detection, and valid speech detection for telephone channel voice data, ensuring that no such invalid data remains in the data passed to subsequent labeling and improving data utilization;

Step SS12: language identification step, which specifically comprises: extracting the core features of the call voice and performing model comparison and score decision to provide language identification for the call voice. Language identification often serves as a front-end processing technique for speech recognition and other related applications. The language identification engine can recognize which language is being used, such as English, French, or German, or which language of China's ethnic groups, such as Chinese (Han), Tibetan, Uyghur, or Mongolian;

Step SS13: basic dimension falsetto identification step, which specifically comprises: quickly and accurately identifying synthesized-voice template data;

Step SS14: male/female voice identification step, which specifically comprises: in telephone fraud scenarios, large numbers of calls are mass-dialed using harmful synthesized-voice templates; based on the analysis of fraudulent call characteristics, male/female voice identification technology is applied alongside, identifying the gender of the speaker from the spectral differences between male and female voices that result from differences in the physiological vocal organs.
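The "model comparison and score decision" of step SS12 might look like the following sketch. It assumes that an upstream engine has already compared the call's features against one model per language and produced a score for each; the `identify_language` name, the score scale, and the `min_score` rejection threshold are illustrative assumptions rather than details from the patent.

```python
def identify_language(scores, min_score=0.5):
    """Pick the best-scoring language, or None if no score is high enough.

    scores: dict mapping a language code to its model-comparison score.
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # Reject the decision when even the best model matches poorly.
    return best if scores[best] >= min_score else None
```

For example, `identify_language({"zh": 0.9, "en": 0.3})` selects `"zh"`, while a call whose scores all fall below `min_score` is left without a language label for manual review.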

As a preferred embodiment: in fraudulent call early warning, intent understanding is one of the core technical routes and enables fast, accurate early warning of scripted fraudulent calls of the same type. Intent understanding is built on transcribed content: it carries out fraudulent call early warning mainly by analyzing the topics of the transcribed content. The continuous speech recognition step used by the present invention therefore specifically comprises:

Step SS21: continuous speech preprocessing step, which specifically comprises: after the input target voice is received, cutting it into speech segments according to the energy distribution in the voice, the segments forming the data set subsequently fed to the continuous speech recognition module;

Step SS22: continuous speech recognition step, which specifically comprises: when transcribed content is labeled entirely by hand, neither efficiency nor accuracy can be guaranteed; to effectively improve labeling efficiency and accuracy, an underlying continuous speech recognition engine is provided, which processes the content of each input speech segment and outputs the corresponding text.

As a preferred embodiment: in telephone fraud scenarios, fraudsters usually hide their identity by continually changing or concealing their numbers so as not to expose themselves. To counter this, the voiceprints of fraudsters can be collected and a fraud voiceprint library continually accumulated, supporting early warning of fraudsters' calls by voiceprint. The voiceprint labeling step specifically comprises:

Step SS31: voiceprint speech preprocessing step, which specifically comprises: after fraud voice is input, the engine performs speaker separation according to the speaker information contained in the fraud voice, filters out invalid speech, and applies speech enhancement to the valid speech content;

Step SS32: voiceprint clustering step, which specifically comprises a validity detection step, an automatic voiceprint registration step, and a voiceprint identity comparison step. The validity detection step specifically comprises: selecting from the candidate data the voice data whose speech length meets the requirement, and then using ring-tone removal, speech detection, and speech quality detection to filter out of the candidate voice the voice data that meets the automatic registration standard. The automatic voiceprint registration step specifically comprises: using automatic labeling technology to register the voiceprint corresponding to the voice data that has passed validity detection. The voiceprint identity comparison step specifically comprises: comparing the newly registered voiceprint with the registered historical voiceprint library; if the similarity is greater than a threshold, the current voiceprint in the historical library is considered unchanged and the original voiceprint feature is updated with the newly registered voiceprint; otherwise, voiceprint change detection is performed;

Step SS33: voiceprint falsetto identification step, which specifically comprises: judging, for each identified speaker's speech segments, whether the voice is synthesized, so that synthesized-voice data and real speaker voice can be distinguished quickly and effectively.

Beneficial effects of the invention: the invention mainly uses intelligent speech technology to automatically process input voice data, automatically analyze and recognize it, and pre-label it with tags; combined with manual confirmation, this enables effective labeling and management of the target data dimension labels. The data flow of the whole system is as follows. After data is input, it first passes through the basic dimension module, where back-end speech preprocessing, language identification, falsetto identification, and male/female voice identification produce the label information for one complete voice recording; a human reviewer can then check the result against the system's pre-labeling. Data that is labeled and confirmed as fraud is fed to the continuous speech recognition and voiceprint identification modules, which label the corresponding transcribed text and accumulate fraud voiceprints.
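The data flow described above can be sketched as a small routing function. This is only an illustration of the ordering (basic dimension labels, then manual review, then transcription and voiceprint labeling for confirmed fraud); the `CallRecord` type, the field names, and the four callables passed in are hypothetical stand-ins for the patent's modules.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CallRecord:
    audio_id: str
    labels: dict = field(default_factory=dict)  # basic dimension pre-labels
    is_fraud: Optional[bool] = None             # set by manual review
    transcript: Optional[str] = None
    voiceprint_status: Optional[str] = None

def run_pipeline(record, basic_labeler, reviewer, transcriber, voiceprint_labeler):
    """Route one recording through the three labeling stages."""
    # Stage 1: basic dimension pre-labeling (language, falsetto, gender, ...).
    record.labels = basic_labeler(record.audio_id)
    # Manual confirmation of the pre-labeled result.
    record.is_fraud = reviewer(record.labels)
    # Stages 2 and 3 run only on data confirmed as fraud.
    if record.is_fraud:
        record.transcript = transcriber(record.audio_id)
        record.voiceprint_status = voiceprint_labeler(record.audio_id)
    return record
```

Passing the stages in as callables keeps the sketch independent of any particular recognition engine.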

Brief description of the drawings

Fig. 1 is an overall structural block diagram of the automatic labeling system for fraudulent call voice of the present invention.

Fig. 2 is a structural block diagram of the basic dimension labeling module of the present invention.

Fig. 3 is a structural block diagram of the continuous speech recognition labeling module of the present invention.

Fig. 4 is a structural block diagram of the voiceprint labeling module of the present invention.

Specific embodiment

The invention is further described below with reference to the accompanying drawings. The following embodiments serve only to illustrate the technical solution of the invention more clearly and are not intended to limit its scope of protection.

Fig. 1 shows the overall structural block diagram of the automatic labeling system for fraudulent call voice of the present invention. The invention provides an automatic labeling system for fraudulent call voice, characterized in that it comprises a basic dimension labeling module, a voiceprint labeling module, and a continuous speech recognition labeling module, the output of the basic dimension labeling module being connected to the input of the voiceprint labeling module and to the input of the continuous speech recognition labeling module.

Fig. 2 shows the structural block diagram of the basic dimension labeling module of the present invention. As a preferred embodiment, the basic dimension labeling module comprises a basic dimension speech preprocessing module, a language identification module, a basic dimension falsetto identification module, and a male/female voice identification module. The basic dimension speech preprocessing module is connected to the language identification module, the language identification module is connected to the basic dimension falsetto identification module, and the basic dimension falsetto identification module is connected to the male/female voice identification module.

Fig. 3 is a structural block diagram of the continuous speech recognition labeling module of the present invention. As a preferred embodiment, the continuous speech recognition labeling module comprises a continuous speech preprocessing module and a continuous speech recognition module, the output of the former being connected to the input of the latter. After receiving the input target voice, the continuous speech preprocessing module cuts it into speech segments according to the energy distribution in the voice; the segments form the data set subsequently fed to the continuous speech recognition module. The continuous speech recognition module provides the underlying continuous speech recognition engine, which processes the content of each input speech segment and outputs the corresponding text.

Fig. 4 is a structural block diagram of the voiceprint labeling module of the present invention. As a preferred embodiment, the voiceprint labeling module comprises a voiceprint speech preprocessing module, a voiceprint clustering module, and a voiceprint falsetto identification module. The output of the voiceprint speech preprocessing module is connected to the input of the voiceprint clustering module, and the output of the voiceprint clustering module is connected to the input of the voiceprint falsetto identification module. The input of the voiceprint speech preprocessing module receives the voice data that the basic dimension labeling module has confirmed as fraud voice. After fraud voice is input, the engine of the voiceprint speech preprocessing module performs speaker separation according to the speaker information contained in the fraud voice, filters out invalid speech, and applies speech enhancement to the valid speech content. The voiceprint falsetto identification module judges, for each identified speaker's speech segments, whether the voice is synthesized, so that synthesized-voice data and real speaker voice can be distinguished quickly and effectively.

Step SS1: basic dimension labeling step;

Step SS2: continuous speech recognition step;

Step SS3: voiceprint labeling step.

Step SS11: basic dimension speech preprocessing step, which specifically comprises: addressing the application needs of natural speech recognition and analysis, fraud text discovery, and in-depth analysis of voice content by providing voice activity detection, color ring back tone (CRBT) detection, invalid sound detection, and valid speech detection for telephone channel voice data;

Step SS12: language identification step, which specifically comprises: extracting the core features of the call voice and performing model comparison and score decision to provide language identification for the call voice;

Step SS14: male/female voice identification step, which specifically comprises: based on the analysis of fraudulent call characteristics, applying male/female voice identification technology alongside, and identifying the gender of the speaker from the spectral differences between male and female voices that result from differences in the physiological vocal organs.

As a preferred embodiment, the continuous speech recognition step specifically comprises:

Step SS22: continuous speech recognition step, which specifically comprises: providing an underlying continuous speech recognition engine, which processes the content of each input speech segment and outputs the corresponding text.

As a preferred embodiment, the voiceprint labeling step specifically comprises:

The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the technical principle of the invention, and such improvements and modifications shall also fall within the scope of protection of the invention.

Claims

1. An automatic labeling system for fraudulent call voice, characterized in that it comprises a basic dimension labeling module, a voiceprint labeling module, and a continuous speech recognition labeling module, the output of the basic dimension labeling module being connected to the input of the voiceprint labeling module and to the input of the continuous speech recognition labeling module.

2. The automatic labeling system for fraudulent call voice according to claim 1, characterized in that the basic dimension labeling module comprises a basic dimension speech preprocessing module, a language identification module, a basic dimension falsetto identification module, and a male/female voice identification module; the basic dimension speech preprocessing module is connected to the language identification module, the language identification module is connected to the basic dimension falsetto identification module, and the basic dimension falsetto identification module is connected to the male/female voice identification module.

3. The automatic labeling system for fraudulent call voice according to claim 2, characterized in that the basic dimension speech preprocessing module addresses the application needs of natural speech recognition and analysis, fraud text discovery, and in-depth analysis of voice content by providing voice activity detection, color ring back tone (CRBT) detection, invalid sound detection, and valid speech detection for telephone channel voice data; the language identification module extracts the core features of the call voice and performs model comparison and score decision to provide language identification for the call voice; the basic dimension falsetto identification module quickly and accurately identifies synthesized-voice template data; and the male/female voice identification module, applied alongside the analysis of fraudulent call characteristics, identifies the gender of the speaker from the spectral differences between male and female voices that result from differences in the physiological vocal organs.

4. The automatic labeling system for fraudulent call voice according to claim 1, characterized in that the continuous speech recognition labeling module comprises a continuous speech preprocessing module and a continuous speech recognition module, the output of the continuous speech preprocessing module being connected to the input of the continuous speech recognition module; after receiving the input target voice, the continuous speech preprocessing module cuts it into speech segments according to the energy distribution in the voice, the segments forming the data set subsequently fed to the continuous speech recognition module; and the continuous speech recognition module provides the underlying continuous speech recognition engine, which processes the content of each input speech segment and outputs the corresponding text.

5. The automatic labeling system for fraudulent call voice according to claim 1, characterized in that the voiceprint labeling module comprises a voiceprint speech preprocessing module, a voiceprint clustering module, and a voiceprint falsetto identification module; the output of the voiceprint speech preprocessing module is connected to the input of the voiceprint clustering module, and the output of the voiceprint clustering module is connected to the input of the voiceprint falsetto identification module; the input of the voiceprint speech preprocessing module receives the voice data that the basic dimension labeling module has confirmed as fraud voice; after fraud voice is input, the engine of the voiceprint speech preprocessing module performs speaker separation according to the speaker information contained in the fraud voice, filters out invalid speech, and applies speech enhancement to the valid speech content; and the voiceprint falsetto identification module judges, for each identified speaker's speech segments, whether the voice is synthesized, so that synthesized-voice data and real speaker voice can be distinguished quickly and effectively.

6. The automatic labeling system for fraudulent call voice according to claim 5, characterized in that the voiceprint clustering module comprises a validity detection module, an automatic voiceprint registration module, and a voiceprint identity comparison module; the validity detection module selects from the candidate data the voice data whose speech length meets the requirement, and then uses ring-tone removal, speech detection, and speech quality detection to filter out of the candidate voice the voice data that meets the automatic registration standard; the automatic voiceprint registration module uses automatic labeling technology to register the voiceprint corresponding to the voice data that has passed validity detection; and the voiceprint identity comparison module compares the newly registered voiceprint with the registered historical voiceprint library: if the similarity is greater than a threshold, the current voiceprint in the historical library is considered unchanged and the original voiceprint feature is updated with the newly registered voiceprint; otherwise, voiceprint change detection is performed.

7. An automatic labeling method for fraudulent call voice based on the system of claim 1, characterized in that it comprises the following steps:

Step SS1: basic dimension labeling step;

Step SS2: continuous speech recognition step;

Step SS3: voiceprint labeling step.

8. The automatic labeling method for fraudulent call voice according to claim 7, characterized in that the basic dimension labeling step specifically comprises:

Step SS11: basic dimension speech preprocessing step, which specifically comprises: addressing the application needs of natural speech recognition and analysis, fraud text discovery, and in-depth analysis of voice content by providing voice activity detection, color ring back tone (CRBT) detection, invalid sound detection, and valid speech detection for telephone channel voice data;

Step SS12: language identification step, which specifically comprises: extracting the core features of the call voice and performing model comparison and score decision to provide language identification for the call voice;

Step SS13: basic dimension falsetto identification step, which specifically comprises: quickly and accurately identifying synthesized-voice template data;

Step SS14: male/female voice identification step, which specifically comprises: based on the analysis of fraudulent call characteristics, applying male/female voice identification technology alongside, and identifying the gender of the speaker from the spectral differences between male and female voices that result from differences in the physiological vocal organs.

9. The automatic labeling method for fraudulent call voice according to claim 7, characterized in that the continuous speech recognition step specifically comprises:

Step SS21: continuous speech preprocessing step, which specifically comprises: after the input target voice is received, cutting it into speech segments according to the energy distribution in the voice, the segments forming the data set subsequently fed to the continuous speech recognition module;

Step SS22: continuous speech recognition step, which specifically comprises: providing an underlying continuous speech recognition engine, which processes the content of each input speech segment and outputs the corresponding text.

10. The automatic labeling method for fraudulent call voice according to claim 7, characterized in that the voiceprint labeling step specifically comprises:

Step SS31: voiceprint speech preprocessing step, which specifically comprises: after fraud voice is input, the engine performs speaker separation according to the speaker information contained in the fraud voice, filters out invalid speech, and applies speech enhancement to the valid speech content;

Step SS32: voiceprint clustering step, which specifically comprises a validity detection step, an automatic voiceprint registration step, and a voiceprint identity comparison step; the validity detection step specifically comprises: selecting from the candidate data the voice data whose speech length meets the requirement, and then using ring-tone removal, speech detection, and speech quality detection to filter out of the candidate voice the voice data that meets the automatic registration standard; the automatic voiceprint registration step specifically comprises: using automatic labeling technology to register the voiceprint corresponding to the voice data that has passed validity detection; the voiceprint identity comparison step specifically comprises: comparing the newly registered voiceprint with the registered historical voiceprint library; if the similarity is greater than a threshold, the current voiceprint in the historical library is considered unchanged and the original voiceprint feature is updated with the newly registered voiceprint; otherwise, voiceprint change detection is performed;

Step SS33: voiceprint falsetto identification step, which specifically comprises: judging, for each identified speaker's speech segments, whether the voice is synthesized, so that synthesized-voice data and real speaker voice can be distinguished quickly and effectively.