CN102760436B

CN102760436B - Voice lexicon screening method

Info

Publication number: CN102760436B
Application number: CN201210281686.7A
Authority: CN
Inventors: 白晓东; 李天印; 强锋刚; 薛万疆
Original assignee: KAIFENG Co OF HENAN TOBACCO Co
Current assignee: KAIFENG Co OF HENAN TOBACCO Co
Priority date: 2012-08-09
Filing date: 2012-08-09
Publication date: 2014-06-11
Anticipated expiration: 2032-08-09
Also published as: CN102760436A

Abstract

The invention relates to a voice lexicon screening method that effectively addresses the need to monitor the content of voice files, including monitoring based on keyword information. In the technical scheme, a keyword lexicon for enterprise telephone order management is constructed. Batch automatic speech recognition, voice file content detection, and grouped-keyword comparison screening are then applied to screen, automatically extract, and intelligently analyze batches of telephone recording files according to groups of keywords. Speech recognition converts batches of speech to text by computer. Voice file content detection locates voice files that contain keywords from the lexicon, and these files are filtered by marking, information extraction, and display. In this way, specific or sensitive information in enterprise telephone order recordings is captured automatically and analyzed and presented intelligently.

Description

Voice lexicon screening method

Technical field

The present invention relates to multi-speaker automatic speech recognition, continuous speech recognition, speaker-independent speech recognition, intelligent voice lexicon screening, voice file content detection, and related technologies, and in particular to a voice lexicon screening method.

Background art

(1) History and current state of speech recognition research

Abroad, speech recognition research can be traced back to the Audrey system built at AT&T Bell Laboratories in the 1950s, the first speech recognition system able to recognize the ten English digits. Substantial progress, and the treatment of speech recognition as a major research topic, came only in the late 1960s and early 1970s. This was partly because advances in computer technology made the required hardware and software feasible. More importantly, linear predictive coding (LPC) and dynamic time warping (DTW) effectively solved the problems of extracting features from speech signals and of matching sequences of unequal length. Speech recognition in this period was mainly based on template matching, and research was confined to speaker-dependent, small-vocabulary, isolated-word recognition. Speaker-dependent isolated-word systems based on linear prediction cepstral coefficients and DTW were realized. At the same time, vector quantization (VQ) and hidden Markov model (HMM) theory were proposed.

As applications expanded, the constraints of small vocabularies, speaker dependence, and isolated words had to be relaxed, which raised many new problems. First, a larger vocabulary makes it difficult to select and build templates. Second, continuous speech has no clear boundaries between phonemes, syllables, and words, and every pronunciation unit is subject to coarticulation, which is strongly affected by context. Third, in speaker-independent recognition the acoustic features of the same words differ greatly between speakers, and even the same speaker saying the same words differs considerably at different times and in different physiological or psychological states. Fourth, the speech to be recognized contains background noise and other interference. The original template matching methods were therefore no longer applicable. The major breakthrough in laboratory research came in the late 1980s, when researchers finally overcame the three obstacles of large vocabulary, continuous speech, and speaker independence and, for the first time, integrated all three into a single system. A typical example is the Sphinx system of Carnegie Mellon University (CMU), the first high-performance, speaker-independent, large-vocabulary continuous speech recognition system. In this period speech recognition research deepened further; its notable feature was the successful application of HMMs and artificial neural networks (ANNs). The widespread use of HMMs is owed to the efforts of scientists such as Rabiner at AT&T Bell Laboratories, who turned the originally obscure, purely mathematical HMM into an engineering tool that more researchers could understand, so that statistical methods became the mainstream of speech recognition technology.

Statistical methods shifted researchers' attention from the microscopic to the macroscopic: instead of deliberately pursuing ever finer phonetic features, they aim to build the best speech recognition system from the standpoint of ensemble averages (statistics). For acoustic modeling, the HMM, a speech sequence model based on Markov chains, effectively handles the short-time stationary but long-time varying nature of speech signals, and sentence models for continuous speech can be composed from a set of basic modeling units, giving high modeling accuracy and flexibility. At the language level, N-gram statistical models, built by counting word co-occurrence probabilities in large real-world corpora, are used to resolve the ambiguity that confusable phonemes and homophones cause in recognition. Artificial neural networks and language processing mechanisms based on grammar rules have also been applied to speech recognition.

In the early 1990s, many large, well-known companies such as IBM, Apple, AT&T, and NTT invested heavily in practical speech recognition systems. Speech recognition has a clear evaluation criterion, namely recognition accuracy, and accuracy improved steadily in laboratory research in the mid-to-late 1990s. Representative systems include IBM's ViaVoice, Dragon Systems' NaturallySpeaking, Nuance's Nuance Voice Platform, Microsoft's Whisper, and Sun's VoiceTone. IBM developed the Chinese ViaVoice speech recognition system in 1997 and, the following year, ViaVoice'98, which can recognize regional accents such as Shanghainese, Cantonese, and Sichuanese. It has a basic vocabulary of 32,000 words, expandable to 65,000 words, includes common office terms, and has an error-correction mechanism; its average recognition rate reaches 95%. The system is particularly accurate on news speech and is a representative Chinese continuous speech recognition system.

Speech recognition research in China began in the 1950s but has developed rapidly in recent years, and research is gradually moving from the laboratory into practical use. Since the National 863 Program was launched in 1987, its expert group on intelligent computers has set up dedicated speech recognition projects, renewed every two years. China's speech recognition research is broadly on a par with research abroad, has its own characteristics and strengths, and has reached an internationally advanced level in Chinese speech recognition. Institutions with speech recognition laboratories include the Institute of Automation and the Institute of Acoustics of the Chinese Academy of Sciences, Tsinghua University, Peking University, Harbin Institute of Technology, Shanghai Jiao Tong University, the University of Science and Technology of China, Beijing University of Posts and Telecommunications, and Huazhong University of Science and Technology. The most representative are the Department of Electronic Engineering of Tsinghua University and the National Laboratory of Pattern Recognition of the Institute of Automation, Chinese Academy of Sciences.

The speech technology and ASIC research group of Tsinghua University's Department of Electronic Engineering developed a speaker-independent continuous speech recognition system for Chinese digit strings with an accuracy of 94.8% on variable-length digit strings and 96.8% on fixed-length digit strings. With a 5% rejection rate, the recognition rate reaches 96.9% (variable-length) and 98.7% (fixed-length), among the best results reported internationally at the time, and its performance approaches a practical level. The group's 5,000-word speaker-independent continuous speech recognition system for postal parcel inspection reaches a recognition rate of 98.73% and a top-three recognition rate of 99.96%; it recognizes both Mandarin and Sichuanese and meets practical requirements. In 2002, the Institute of Automation of the Chinese Academy of Sciences and its affiliated company Pattek released the "Tianyu" series of Chinese speech products, PattekASR, for different computing platforms and applications, ending the monopoly that foreign companies had held on Chinese speech recognition products since 1998.

(2) Current application of voice file content monitoring technology in China

Monitoring voice file content in the traditional, manual way faces great difficulties. Because human hearing cannot follow speech played back at increased speed, a person monitoring voice files manually can only play them at normal speed, so faithfully processing a voice file of about 6 hours takes 6 to 8 hours. About 6 hours of voice data therefore needs one person's full working day (at 8 working hours per person per day), and further staff are needed for data entry, checking, correction, and the final report, so at least 1 to 2 people are required in total. With the traditional approach, monitoring 6 hours of call recordings per person for 20 people would require more than 20 staff. Manpower on this scale is hard for any monitoring organization to bear and is impractical. Therefore, although many industries in China recognize the importance of voice file content monitoring, it remains very difficult to implement without effective monitoring technology and methods.

Compared with manual monitoring, automatic processing by computer is fast, efficient, and low-cost; when batches of voice files must be monitored, automatic monitoring and processing of voice file content is indispensable. In recent years, fast computer audio matching has been used in China to monitor radio and television advertisements, and computer language identification has been used to monitor shortwave broadcasts automatically. These technologies, however, cannot meet the requirements of monitoring voice file content, in particular monitoring based on keyword information. Improving and innovating speech recognition technology is therefore an urgent problem.

Summary of the invention

In view of the above and to overcome the defects of the prior art, the object of the present invention is to provide a voice lexicon screening method that effectively meets the requirements of voice file content monitoring, in particular monitoring based on keyword information, which existing technologies cannot satisfy.

The technical solution of the present invention is as follows. A keyword lexicon for enterprise telephone order management is built, and batch automatic speech recognition, voice file content detection, and grouped-keyword comparison screening are applied to screen batches of telephone recording files by groups of keywords, extract them automatically, and analyze them intelligently. With the voice lexicon and screening method built according to the invention, speech recognition converts batches of speech to text by computer, and voice file content detection identifies voice files that contain keywords from the lexicon; the information in such files is then filtered by marking, information extraction, display, and similar methods. In this way, specific or sensitive information in enterprise telephone order recording files is captured automatically, analyzed, and displayed intelligently. The specific steps are as follows:

Step 1. Building and maintaining the keyword lexicon

According to the enterprise's requirements for telephone order management, an Oracle database system is used to build the keyword lexicon and a keyword lexicon management module. The lexicon contains major categories such as service quality of telephone order staff, standard phrases, and prohibited service phrases. Each category contains a number of keywords, and each keyword record stores its category, whether it is selected, the date it was selected, its number of occurrences, and the number of its corresponding HMM. The keyword lexicon management module is written in Java and provides keyword entry, modification, saving, deletion, and query, as well as keyword selection, import, and export. With this module, administrators enter, edit, delete, and select telephone order service keywords as needed, and can select a specific group of keywords for speech recognition screening.
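The record layout and management operations described above can be illustrated with a minimal in-memory sketch. It assumes nothing about the actual Oracle schema or the Java module: the names `Keyword` and `KeywordLexicon`, the field names, and the category strings are hypothetical, chosen only to mirror the attributes listed in the text (category, selected flag, selection date, occurrence count, HMM number).

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional


@dataclass
class Keyword:
    """One lexicon entry; fields mirror the attributes in the text (names are hypothetical)."""
    text: str
    category: str                     # e.g. "service quality", "standard phrase", "prohibited phrase"
    selected: bool = False
    selected_on: Optional[date] = None
    occurrences: int = 0
    hmm_id: Optional[int] = None      # number of the HMM trained for this keyword


class KeywordLexicon:
    """Hypothetical in-memory stand-in for the keyword lexicon and its management module."""

    def __init__(self) -> None:
        self._entries: Dict[str, Keyword] = {}

    def add(self, kw: Keyword) -> None:
        if kw.text in self._entries:
            raise ValueError(f"duplicate keyword: {kw.text}")
        self._entries[kw.text] = kw

    def delete(self, text: str) -> None:
        self._entries.pop(text, None)

    def query(self, fragment: str) -> List[Keyword]:
        """Return entries whose text contains the given fragment."""
        return [kw for kw in self._entries.values() if fragment in kw.text]

    def select(self, texts: Iterable[str], on: Optional[date] = None) -> None:
        """Mark a group of keywords as selected for the next screening run."""
        day = on or date.today()
        for text in texts:
            kw = self._entries[text]
            kw.selected, kw.selected_on = True, day

    def clear_selection(self) -> None:
        for kw in self._entries.values():
            kw.selected, kw.selected_on = False, None

    def selected_group(self) -> List[Keyword]:
        return [kw for kw in self._entries.values() if kw.selected]


# Usage: build a small lexicon and select one group of prohibited phrases.
lexicon = KeywordLexicon()
lexicon.add(Keyword("don't know", "prohibited phrase", hmm_id=1))
lexicon.add(Keyword("not clear", "prohibited phrase", hmm_id=2))
lexicon.add(Keyword("thank you", "standard phrase", hmm_id=3))
lexicon.select(["don't know", "not clear"], on=date(2012, 7, 2))  # arbitrary example date
group = [kw.text for kw in lexicon.selected_group()]
```

Keeping the selection flag on each record, rather than in a separate list, matches the text's description of "whether selected" and "selection date" as keyword attributes.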

Step 2. Modeling the speech recognition system

A hidden Markov model (HMM) is used to build the acoustic model of the speech recognition system, and a hidden Markov trigram model is used to build the language model of the Chinese speech recognition system. The modeling comprises three parts. First, the HMM forward-backward algorithm is applied to the keyword evaluation problem: a corresponding HMM is generated for each keyword in the lexicon, each observation sequence consists of the speech of one keyword, and a keyword is recognized by evaluating the HMMs and selecting the one most likely to have generated the observation sequence. Second, the HMM Viterbi algorithm is applied to the speech recognition decoding problem, that is, how to divide a Chinese sentence into its proper constituent words: the segmentation of the sentence is treated as the hidden state and the sentence itself as the given observable states, so that an HMM can find the most probable correct segmentation, solving the difficult problem of correctly recognizing Chinese words and continuous Chinese sentences. Third, the HMM Baum-Welch algorithm and the Reversed Viterbi algorithm are applied to address recognition speed, accuracy, system self-adaptation, collection and organization of a monolingual Chinese corpus, model selection, training, smoothing, and compression in the speech recognition process.
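For the first part (keyword evaluation), only the forward pass of the forward-backward algorithm is needed to compute P(O | λ) for each keyword model. The sketch below is an illustration under stated assumptions, not the patented implementation: it uses discrete-observation HMMs whose symbols are assumed to be vector-quantized acoustic frames, and every parameter value is a made-up toy number. The names `DiscreteHMM`, `forward_log_likelihood`, and `recognize_keyword` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class DiscreteHMM:
    """Toy discrete HMM for one keyword (hypothetical; symbols stand for VQ codebook indices)."""
    pi: List[float]        # pi[i]: initial probability of state i
    a: List[List[float]]   # a[i][j]: transition probability from state i to state j
    b: List[List[float]]   # b[i][k]: probability of emitting symbol k in state i


def forward_log_likelihood(hmm: DiscreteHMM, obs: Sequence[int]) -> float:
    """Scaled forward algorithm: returns log P(obs | hmm)."""
    n = len(hmm.pi)
    log_lik = 0.0
    alpha: List[float] = []
    for t, symbol in enumerate(obs):
        if t == 0:
            alpha = [hmm.pi[i] * hmm.b[i][symbol] for i in range(n)]
        else:
            alpha = [sum(alpha[i] * hmm.a[i][j] for i in range(n)) * hmm.b[j][symbol]
                     for j in range(n)]
        scale = sum(alpha)          # P(o_t | o_0..o_{t-1}), or P(o_0) when t == 0
        if scale == 0.0:
            return float("-inf")    # sequence impossible under this model
        alpha = [x / scale for x in alpha]
        log_lik += math.log(scale)
    return log_lik


def recognize_keyword(models: Dict[str, DiscreteHMM],
                      obs: Sequence[int]) -> Tuple[str, Dict[str, float]]:
    """Evaluate every keyword HMM and pick the one most likely to have produced obs."""
    scores = {kw: forward_log_likelihood(m, obs) for kw, m in models.items()}
    return max(scores, key=scores.get), scores


# Usage with two left-to-right, two-state keyword models (toy parameters).
models = {
    "don't know": DiscreteHMM(pi=[1.0, 0.0], a=[[0.6, 0.4], [0.0, 1.0]],
                              b=[[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]),
    "not clear": DiscreteHMM(pi=[1.0, 0.0], a=[[0.5, 0.5], [0.0, 1.0]],
                             b=[[0.1, 0.8, 0.1], [0.6, 0.2, 0.2]]),
}
best, scores = recognize_keyword(models, [0, 0, 1, 1, 2])
```

Normalizing alpha at every frame and summing the logarithms of the scale factors keeps long utterances from underflowing, while giving the same likelihood as the unscaled recursion.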

Step 3. Speech recognition processing and analysis

A speech recognition processing and analysis module is written in Java. It provides voice file import, keyword import, keyword comparison and extraction, counting and saving of keyword occurrences, export of specific voice files, and communication with a device containing the embedded speaker-independent speech recognition chip SR160X. Using this module, the enterprise's assessment managers select keywords from the lexicon, and the device containing the SR160X chip performs a batch, keyword-based comparison of the telephone order recordings of designated telephone order staff: recordings that match the designated keywords are selected, the number of times each designated keyword occurs is recorded, and the matching recordings are exported to a designated folder.
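In the text, keyword comparison is performed by a device built around the SR160X chip, whose interface is not described. As a software-only stand-in, and assuming the recordings have already been converted to text by batch speech recognition, the comparison, counting, and export functions could be sketched as follows. The function names, the `{path: transcript}` input format, the file names, and the folder-copy export are all assumptions for illustration.

```python
import shutil
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable


def count_keywords(transcript: str, keywords: Iterable[str]) -> Counter:
    """Count non-overlapping occurrences of each keyword found in one transcript."""
    return Counter({kw: transcript.count(kw) for kw in keywords if kw in transcript})


def screen_recordings(transcripts: Dict[str, str],
                      keywords: Iterable[str]) -> Dict[str, Counter]:
    """Keep only recordings whose recognized text contains at least one keyword."""
    keywords = list(keywords)  # allow a one-shot iterator to be reused per recording
    hits = {}
    for path, text in transcripts.items():
        counts = count_keywords(text, keywords)
        if counts:
            hits[path] = counts
    return hits


def export_matches(hits: Dict[str, Counter], out_dir: str) -> None:
    """Copy every matching recording into a designated folder."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    for path in hits:
        shutil.copy2(path, target / Path(path).name)


# Hypothetical recognizer output for three recordings.
transcripts = {
    "op01_0702_a.wav": "I don't know, I really don't know",
    "op01_0702_b.wav": "Sure, I will place the order for you",
    "op02_0703_a.wav": "That is not clear to me either",
}
hits = screen_recordings(transcripts, ["don't know", "not clear"])
```

Calling `export_matches(hits, "matched")` would then copy the two matching recordings, assuming the files exist on disk. Plain substring matching is the simplest choice here; matching on the recognizer's word segmentation would avoid false hits inside longer words.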

Step 4. Intelligent processing of analysis results

An information processing and analysis module is written in Java. It provides statistical analysis by staff member and by keyword, voice file export and monitoring, and analysis of occurrence frequency by time period. The data obtained by the screening in Step 3 are extracted from the Oracle database tables, and the work performance of the telephone order staff who made the recordings is analyzed statistically in the information processing and analysis interface. The analysis covers assessment information such as whether telephone order service quality is acceptable, whether standard service phrases meet requirements, and whether prohibited service phrases occur more often than allowed. The module supports querying, storing, and printing analysis results and saving snapshots, so that managers can conveniently manage and use the results.
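The per-staff, per-keyword, and per-time-period statistics amount to grouping the saved screening results by different keys. The sketch below assumes each Step 3 result row records staff ID, keyword, count, and recording time; this row layout, the names `KeywordHit`, `totals_by`, and `exceeds_limit`, the assessment limit, and the sample rows are hypothetical, since the actual Oracle tables are not described.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Hashable, Iterable


@dataclass
class KeywordHit:
    """One saved screening result row (hypothetical layout of the Step 3 output)."""
    operator: str          # telephone order staff ID
    keyword: str
    count: int             # occurrences of the keyword in one recording
    recorded_at: datetime


def totals_by(hits: Iterable[KeywordHit],
              key: Callable[[KeywordHit], Hashable]) -> Dict[Hashable, int]:
    """Sum keyword occurrences grouped by an arbitrary key (staff, keyword, hour, ...)."""
    totals: Dict[Hashable, int] = defaultdict(int)
    for hit in hits:
        totals[key(hit)] += hit.count
    return dict(totals)


def exceeds_limit(totals: Dict[Hashable, int], limit: int) -> Dict[Hashable, bool]:
    """Flag groups whose prohibited-phrase count is above a (hypothetical) assessment limit."""
    return {group: n > limit for group, n in totals.items()}


# Made-up sample rows.
hits = [
    KeywordHit("No.1", "don't know", 2, datetime(2012, 7, 2, 9, 15)),
    KeywordHit("No.1", "not clear", 1, datetime(2012, 7, 2, 14, 5)),
    KeywordHit("No.2", "don't know", 1, datetime(2012, 7, 3, 9, 40)),
]
by_staff = totals_by(hits, lambda h: h.operator)
by_keyword = totals_by(hits, lambda h: h.keyword)
by_hour = totals_by(hits, lambda h: h.recorded_at.hour)
flags = exceeds_limit(by_staff, limit=2)
```

Passing the grouping key as a function lets one routine serve all three analyses; in a database-backed system the same grouping would more naturally be a `GROUP BY` query.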

The present invention effectively solves the problem of monitoring voice file content, including keyword-based content monitoring, performs well in use, and is an innovation in speech recognition technology.

Brief description of the drawings

Fig. 1 is a flowchart of the operating steps of the present invention.

Embodiment

Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawing.

Step 1. Building and maintaining the keyword lexicon

According to the enterprise's requirements for telephone order management, an Oracle database system is used to build the keyword lexicon and its tables. The lexicon contains major categories such as service quality of telephone order staff, standard phrases, and prohibited service phrases. Each category contains a number of keywords, and each keyword record stores its category, whether it is selected, the date it was selected, its number of occurrences, and the number of its corresponding HMM. A keyword lexicon maintenance module is written in Java and provides keyword entry, modification, saving, deletion, and query, as well as keyword selection, import, and export. With this module, administrators enter, edit, delete, and select telephone order service keywords as needed, and can select a specific group of keywords for speech recognition screening.

Step 2. Modeling the speech recognition system

A hidden Markov model (HMM) is used to build the acoustic model of the speech recognition system, and a hidden Markov trigram model is used to build the language model of the Chinese speech recognition system. The modeling comprises three parts. First, the HMM forward-backward algorithm is applied to the keyword evaluation problem: a corresponding HMM is generated for each keyword in the lexicon, each observation sequence consists of the speech of one keyword, and a keyword is recognized by evaluating the HMMs and selecting the one most likely to have generated the observation sequence. Second, the HMM Viterbi algorithm is applied to the speech recognition decoding problem, that is, how to divide a Chinese sentence into its proper constituent words. For example, should the sentence "发展中国家" ("developing country") be divided as "发展-中-国家" (develop / in / country), as "发展-中国-家" (develop / China / home), or as "发展中-国家" (developing / country)? Treating the segmentation of the sentence as the hidden state and the sentence itself as the given observable states, an HMM can find the most probable correct segmentation, solving the difficult problem of correctly recognizing Chinese words and continuous Chinese sentences. Third, the HMM Baum-Welch algorithm and the Reversed Viterbi algorithm are applied to address recognition speed, accuracy, system self-adaptation, collection and organization of a monolingual Chinese corpus, model selection, training, smoothing, and compression in the speech recognition process.
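The segmentation example above can be made concrete with a character-level HMM decoded by the Viterbi algorithm. The sketch below uses the common B/M/E/S tagging scheme (Begin, Middle, or End of a multi-character word, or Single-character word) as hidden states and the characters as observations; this tagging scheme is a substitute chosen for illustration, since the text does not say how its hidden states are defined. All probabilities are made-up toy values, and `viterbi`, `to_words`, `START`, `TRANS`, and `EMIT` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence

STATES = "BMES"
# Toy parameters (made up for illustration). Missing entries mean probability 0.
START = {"B": 0.6, "S": 0.4}
TRANS = {
    "B": {"M": 0.3, "E": 0.7},
    "M": {"M": 0.3, "E": 0.7},
    "E": {"B": 0.6, "S": 0.4},
    "S": {"B": 0.6, "S": 0.4},
}
EMIT = {
    "B": {"发": 0.4, "中": 0.3, "国": 0.3},
    "M": {"展": 0.5, "中": 0.5},
    "E": {"展": 0.3, "中": 0.4, "国": 0.2, "家": 0.3},
    "S": {"中": 0.1, "家": 0.2},
}
UNSEEN = 1e-6  # floor probability for a character not listed under a state


def log_p(p: float) -> float:
    return math.log(p) if p > 0 else float("-inf")


def viterbi(chars: Sequence[str]) -> List[str]:
    """Return the most probable B/M/E/S tag sequence for the characters."""
    score: Dict[str, float] = {
        s: log_p(START.get(s, 0.0)) + log_p(EMIT[s].get(chars[0], UNSEEN)) for s in STATES
    }
    backptrs: List[Dict[str, str]] = []
    for ch in chars[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for s, using log probabilities.
            prev = max(STATES, key=lambda p: score[p] + log_p(TRANS[p].get(s, 0.0)))
            new_score[s] = (score[prev] + log_p(TRANS[prev].get(s, 0.0))
                            + log_p(EMIT[s].get(ch, UNSEEN)))
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # A sentence must end at a word boundary (tag E or S).
    state = max("ES", key=lambda s: score[s])
    tags = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        tags.append(state)
    return tags[::-1]


def to_words(chars: str, tags: Sequence[str]) -> List[str]:
    """Cut the sentence after every E or S tag."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in "ES":
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


sentence = "发展中国家"
tags = viterbi(sentence)
words = to_words(sentence, tags)
```

With these toy numbers the decoder prefers 发展中 / 国家, but the outcome depends entirely on the parameters, which in practice would be estimated from a segmented corpus (for example by supervised counting or Baum-Welch re-estimation).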

Step 3. Speech recognition processing and analysis

A speech recognition processing and analysis module is written in Java. It provides voice file import, keyword import, keyword comparison and extraction, counting and saving of keyword occurrences, export of specific voice files, and communication with a device containing the embedded speaker-independent speech recognition chip SR160X. Using this module, the enterprise's assessment managers select keywords from the lexicon, and the device containing the SR160X chip performs a batch, keyword-based comparison of the telephone order recordings of designated telephone order staff: recordings that match the designated keywords are selected, the number of times each designated keyword occurs is recorded, and the matching recordings are exported to a designated folder.

Step 4. Intelligent processing of analysis results

An information processing and analysis module is written in Java. It provides statistical analysis by staff member and by keyword, voice file export and monitoring, and analysis of occurrence frequency by time period. The data obtained by the screening in Step 3 are extracted from the Oracle database tables, and the work performance of the telephone order staff who made the recordings is analyzed statistically in the information processing and analysis interface. The analysis covers assessment information such as whether telephone order service quality is acceptable, whether standard service phrases meet requirements, and whether prohibited service phrases occur more often than allowed. The module supports querying, storing, and printing analysis results and saving snapshots, so that managers can conveniently manage and use the results.

In Fig. 1, the required recording files are extracted from the recording file library, and the enterprise telephone order recordings are processed by speech recognition. Keywords (for example, prohibited service phrases) are then extracted from the keyword lexicon and screened against the recognized recordings. Matches that meet the conditions are marked, extracted, and displayed; otherwise, another keyword is extracted from the lexicon and the screening is repeated.

Application example:

For example, the method is applied to analyze, for telephone order staff within a given range of staff numbers, how many times and how frequently the two prohibited service phrases "don't know" and "not clear" occur in all of their telephone recordings within a certain period, and then to rank these staff in ascending order of how frequently they use prohibited phrases. The procedure is as follows:

1. Select the keywords with the keyword lexicon management module built by this method. On any computer that can connect to the keyword database server, run the keyword lexicon management module interface, set the query condition to "don't know" or "not clear", and select these two phrases in the query results.

2. Compare, recognize, and statistically analyze the keywords with the speech recognition processing and analysis module built by this method. On a PC, open the management interface of the speech recognition processing and analysis module installed on the application server, and select the staff range (for example, telephone order staff No. 1 to No. 10) and the time range (for example, July 2 to July 6). Tick the box in front of "Apply selected keywords" to confirm that the selected keywords are used for comparison and recognition, then click "Compare and recognize". The system module automatically compares the recordings in the specified period against the two keywords "don't know" and "not clear" according to the specified conditions. Clicking "Save comparison results" in the comparison results saves the records of recordings containing the two keywords, the number of times each keyword occurs, frequency counts, and similar information.

3. Analyze the comparison results with the information processing and analysis module built by this method. On a PC, select the statistical conditions that the user's analysis requires. The occurrences of the two prohibited service phrases "don't know" and "not clear" in the telephone order service recordings of the selected staff are sorted in ascending or descending order by occurrence count, proportion, and similar data, and recorded in a designated database table for users to query and print.
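The ranking in this application example can be sketched as a filter over saved comparison results followed by a sort. This is only an illustration: the row layout, the names `SavedResult` and `rank_operators`, and the sample data are hypothetical, and because the source gives no year for the July 2 to July 6 period, 2012 is assumed here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Tuple


@dataclass
class SavedResult:
    """One row of saved comparison results (hypothetical layout)."""
    operator_no: int
    keyword: str
    count: int
    day: date


def rank_operators(rows: Iterable[SavedResult], first_no: int, last_no: int,
                   start: date, end: date, keywords: Tuple[str, ...],
                   descending: bool = False) -> List[Tuple[int, int]]:
    """Total prohibited-phrase occurrences per operator in a staff and date range, then sort."""
    # Every operator in the range appears, even with zero occurrences.
    totals = {no: 0 for no in range(first_no, last_no + 1)}
    for r in rows:
        if first_no <= r.operator_no <= last_no and start <= r.day <= end and r.keyword in keywords:
            totals[r.operator_no] += r.count
    # sorted() is stable, so ties keep ascending operator-number order.
    return sorted(totals.items(), key=lambda item: item[1], reverse=descending)


rows = [
    SavedResult(1, "don't know", 3, date(2012, 7, 2)),
    SavedResult(2, "not clear", 1, date(2012, 7, 3)),
    SavedResult(2, "don't know", 1, date(2012, 7, 5)),
    SavedResult(3, "don't know", 5, date(2012, 7, 9)),   # outside the date range
    SavedResult(11, "not clear", 4, date(2012, 7, 4)),   # outside the staff range
]
ranking = rank_operators(rows, 1, 10, date(2012, 7, 2), date(2012, 7, 6),
                         ("don't know", "not clear"))
```

Staff with no prohibited phrases in the period appear at the top of the ascending ranking with a count of zero, rather than being omitted.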

The present invention takes into account the complexity of factors such as the recording environment, accent, and speaking rate of voice files. Specific content or sensitive information is generally characterized by a group of keywords, so to monitor the content of batches of voice files, a keyword lexicon consisting of a large number of characteristic keywords is built. When voice file content is detected automatically, one or more groups of keywords in the system's core lexicon are invoked directly according to specified rules, so that the computer system can automatically and accurately locate specific content or sensitive information. This effectively solves the problem of voice file content monitoring, including keyword-based monitoring, is an innovation in speech recognition technology, and has good economic and social benefits.

Claims

1. A voice lexicon screening method, characterized by comprising the following steps:

Step 1, building and maintaining the keyword lexicon: according to the enterprise's requirements for telephone order management, an Oracle database system is used to build the keyword lexicon and a keyword lexicon management module; the lexicon contains several major categories, namely service quality of telephone order staff, standard phrases, and prohibited service phrases; each category contains a number of keywords, and each keyword record stores its category, whether it is selected, the date it was selected, its number of occurrences, and the number of its corresponding HMM; the keyword lexicon management module is written in Java and provides keyword entry, modification, saving, deletion, and query, as well as keyword selection, import, and export; with this module, administrators enter, edit, delete, and select telephone order service keywords as needed, and can select a group of keywords for speech recognition screening;

Step 2, modeling the speech recognition system: a hidden Markov model (HMM) is used to build the acoustic model of the speech recognition system, and a hidden Markov trigram model is used to build the language model of the Chinese speech recognition system, the modeling comprising the following three parts: first, the HMM forward-backward algorithm is applied to the keyword evaluation problem, a corresponding HMM being generated for each keyword in the keyword lexicon, each observation sequence consisting of the speech of one keyword, and a keyword being recognized by evaluating the HMMs and selecting the HMM most likely to have generated the observation sequence; second, the HMM Viterbi algorithm is applied to the speech recognition decoding problem of how to divide a Chinese sentence into its proper constituents, the segmentation of the Chinese sentence being treated as the hidden state and the sentence as the given observable states, so that the most probable correct segmentation is found by building an HMM, solving the difficult problem of correctly recognizing Chinese words and continuous Chinese sentences; third, the HMM Baum-Welch algorithm and the Reversed Viterbi algorithm are applied to address recognition speed, accuracy, system self-adaptation, collection and organization of a monolingual Chinese corpus, model selection, training, smoothing, and compression in the speech recognition process;

Step 3, speech recognition processing and analysis: a speech recognition processing and analysis module is written in Java, providing voice file import, keyword import, keyword comparison and extraction, counting and saving of keyword occurrences, export of specific voice files, and communication with a device containing the embedded speaker-independent speech recognition chip SR160X; using this module, the enterprise's assessment managers select keywords from the keyword lexicon, and the device containing the embedded speaker-independent speech recognition chip SR160X performs a batch keyword-based comparison of the telephone order recordings of designated telephone order staff, selects the recordings that match the designated keywords, records the number of times the designated keywords occur, and exports the corresponding recordings to a designated folder;

Step 4, intelligent processing of analysis results: an information processing and analysis module is written in Java, providing statistical analysis by staff member and by keyword, voice file export and monitoring, and analysis of occurrence frequency by time period; the data obtained by the screening in Step 3 are extracted from the Oracle database tables, and the work performance of the telephone order staff who made the recordings is analyzed statistically in the information processing and analysis interface, covering assessment information on whether telephone order service quality is acceptable, whether standard service phrases meet requirements, and whether prohibited service phrases occur more often than allowed; the module supports querying, storing, and printing analysis results and saving snapshots, so that managers can conveniently manage and use the results.

2. The voice lexicon screening method according to claim 1, characterized in that the keyword lexicon management module performs keyword selection by running the keyword lexicon management module interface on any computer that can connect to the keyword database server, setting the query condition in the interface to "don't know" or "not clear", and selecting these two phrases in the query results.

3. The voice lexicon screening method according to claim 1, characterized in that the speech recognition processing and analysis module performs keyword comparison, recognition, and statistical analysis as follows: on a PC, the management interface of the speech recognition processing and analysis module installed on the application server is opened; a range of telephone order staff is selected; the box in front of "Apply selected keywords" is ticked to confirm that the selected keywords are used for comparison and recognition; "Compare and recognize" is clicked, whereupon the system module automatically compares the recordings in the specified period against the two keywords "don't know" and "not clear" according to the specified conditions; and clicking "Save comparison results" in the comparison results saves the records of recordings containing the two keywords, the number of keyword occurrences, and frequency counts.

4. The voice lexicon screening method according to claim 1, characterized in that the information processing and analysis module analyzes the comparison results as follows: on a PC, statistical conditions are selected according to the user's analysis needs; the occurrences of the two prohibited service phrases "don't know" and "not clear" in the telephone order service recordings of the selected range of telephone order staff are sorted in ascending or descending order by occurrence count and proportion, and recorded in a designated database table for users to query and print.