CN104464423A - Calibration optimization method and system for speaking test evaluation

Info

Publication number: CN104464423A
Application number: CN201410798611.5A
Authority: CN
Inventors: 何春江; 赵乾; 胡阳; 宋铁
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2014-12-19
Filing date: 2014-12-19
Publication date: 2015-03-25

Abstract

The invention discloses a calibration optimization method and system for speaking test evaluation. The method includes the steps of selecting part of all voice data corresponding to question types to be calibrated to serve as calibration voice data, conducting manual annotation by calibration specialists, obtaining the manual annotation result of the manually-annotated calibration voice data, conducting voice recognition on the calibration voice data through a voice recognition system, extracting evaluation characteristics of different characteristic types corresponding to evaluation standards of the question types to be calibrated from the voice recognition result of the calibration voice data, and training and optimizing existing marking models corresponding to the question types to be calibrated through the combination of the evaluation characteristics and the manual annotation result of the calibration voice data so as to obtain new marking models. Due to the fact that the existing marking models are optimized through the manual annotation result, the new marking models can be matched with the marking standards of existing speaking tests as much as possible, and the better marking performance can be presented when the speaking test evaluation is conducted through the new marking models obtained through the method.

Description

Calibration optimization method and system for speaking test evaluation

Technical field

The present invention relates to the field of speech processing technology, and in particular to a calibration optimization method and system for speaking test evaluation.

Background art

With the development and growing maturity of speech recognition technology, intelligent speech evaluation technology, which draws on several disciplines such as speech technology, natural language understanding, artificial intelligence, data mining, and machine learning, is widely used in application scenarios such as computer-aided instruction, automatic scoring of speaking tests, and individual pronunciation learning. In large-scale speaking tests of all kinds in particular, intelligent speech evaluation technology plays a significant role in reducing examination cost, improving scoring efficiency, reducing subjective differences between scorers, and ensuring fairness, and it is progressively replacing manual oral evaluation. For example, the nationwide PSC, the oral English examination in Jiangsu, and the oral English test of the Guangdong college entrance examination all use intelligent speech evaluation technology in place of manual scoring for large-scale automatic scoring.

Existing speaking test evaluation methods recognize the content of the speech data of an examinee's answers with a general-purpose speech recognition system, and then produce an evaluation result for the recognized content based on a general-purpose knowledge base and scoring model. As shown in Fig. 1, such a method comprises the following steps:

Step 1: receive the speech data of the examinee's answers and the corresponding examination paper.

Step 2: use a general-purpose speech recognition system to obtain the speech recognition result corresponding to each basic speech unit in the speech data. Specifically, on the basis of the acoustic model and the language model, the decoder of the speech recognition system uses the examination paper input in step 1 to generate a state network space better suited to the question type, and decodes the speech data to output the speech recognition result with maximum probability.

Step 3: for speaking test question types with a unique reference answer, such as read-aloud questions, extract evaluation features of the different feature types related to the scoring standards (such as pronunciation accuracy, fluency, integrity, grammar, and semantics) directly from the speech recognition result. For speaking test question types whose reference answer is not unique, such as question-and-answer items, extract those evaluation features from the speech recognition result based on a knowledge base that contains, for example, answer key points and reference answers.

Step 4: apply the pre-trained general-purpose scoring model to the evaluation features of the different feature types extracted in step 3, and map them to a corresponding score through a preset linear or nonlinear machine learning algorithm.
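The mapping in step 4 can be illustrated with a minimal sketch. The feature names, weights, bias, and the logistic squashing function below are assumptions chosen for illustration; the source only states that a preset linear or nonlinear machine learning algorithm maps the features to a score.

```python
import math

# Hypothetical feature names and weights; a real scoring model is trained in advance.
WEIGHTS = {"pronunciation": 0.4, "fluency": 0.3, "integrity": 0.3}
BIAS = 0.0


def linear_score(features, weights=WEIGHTS, bias=BIAS):
    """Linear mapping: weighted sum of the evaluation features plus a bias."""
    return bias + sum(weights[name] * features[name] for name in weights)


def nonlinear_score(features, max_score=100.0):
    """Nonlinear mapping: squash the linear output into (0, max_score) with a logistic curve."""
    z = linear_score(features)
    return max_score / (1.0 + math.exp(-z))
```

Either function takes one dictionary of evaluation features per answer; which mapping is appropriate depends on the scoring standards of the test.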

It can be seen that the scoring accuracy of speaking test evaluation technology depends on the recognition performance of the speech recognition system and on how well the scoring model matches the scoring standards; for evaluations that require a knowledge base, it also depends on how well the knowledge base covers the question content of the current speaking test. In large-scale examinations, regions differ in speakers' pronunciation characteristics, in equipment such as microphones, and in recording environments, and they also differ in examination papers, in the scoring standards of each question type, and in the subjective scoring of schools and educational institutions. Existing speaking test evaluation methods that use a general-purpose scoring model and knowledge base therefore struggle to achieve optimal scoring in speaking tests of different regions. The speech recognition system shows good recognition performance only when the speech data used to train the acoustic model is consistent with the examinees' answers in microphone channel, ambient noise conditions, and so on, and when the language model covers the language information of all examination questions; and scoring accuracy can be guaranteed only when the scoring model also fully conforms to the scoring standards of the examination. In practice, however, recording environment, question content, and scoring standards are all beyond control. In oral English examinations, for example, the pronunciation of most examinees in large cities is closer to standard English pronunciation than that of examinees in towns and rural areas, and the microphones and other equipment in large-city examination halls are more advanced and perform better than those in towns and rural areas. In addition, the examination papers and scoring standards set by large cities usually differ considerably from those of oral English examinations in towns and rural areas. As a result, a general-purpose scoring model and knowledge base can hardly show good scoring performance across speaking tests of different regions and different papers.

From the above, existing speaking test evaluation methods suffer from poor generality, which shows in the following respects:

1. When the examinees' pronunciation characteristics, the recording equipment channel, or the ambient noise level are inconsistent with those used in acoustic model training, the speech recognition system adapts poorly to the speech and decoding quality is poor;

2. When the language information in the language model does not cover, or does not emphasize, the question content of the current speaking test, the recognition performance of the recognition system is poor;

3. When the knowledge base does not cover the question content and examination points, or the scoring model does not match the scoring standards of the current speaking test, scoring performance is very poor.

Summary of the invention

Embodiments of the invention address the problem that existing speaking test evaluation methods show very poor scoring performance when the general-purpose scoring model does not match the scoring standards of the current speaking test, and propose a calibration optimization method and system for speaking test evaluation based on manual calibration.

To achieve the above object, the technical solution adopted by the present invention is a calibration optimization method for speaking test evaluation, comprising:

Receiving a speaking test question type of the current speaking test, and taking the speaking test question type as the question type to be calibrated;

Selecting part of all the speech data of examinees' answers corresponding to the question type to be calibrated as calibration speech data, so that the calibration speech data is manually annotated by calibration experts;

Obtaining the manual annotation result of the calibration speech data produced by the manual annotation;

Performing speech recognition on the calibration speech data with a speech recognition system to obtain the speech recognition result of the calibration speech data;

Extracting, from the speech recognition result of the calibration speech data, evaluation features of the different feature types corresponding to the scoring standards of the question type to be calibrated;

Training and optimizing the original scoring model corresponding to the question type to be calibrated by combining the evaluation features and the manual annotation result of the calibration speech data, to obtain a new scoring model corresponding to the question type to be calibrated.

Preferably, the method further comprises:

Optimizing the original knowledge base corresponding to the question type to be calibrated with the manual annotation result of the calibration speech data, to obtain a new knowledge base corresponding to the question type to be calibrated;

The extracting, from the speech recognition result of the calibration speech data, of evaluation features of the different feature types corresponding to the scoring standards of the question type to be calibrated comprises:

Extracting, based on the new knowledge base, the evaluation features of the different feature types corresponding to the scoring standards of the question type to be calibrated from the speech recognition result of the calibration speech data.

Preferably, the optimizing of the original knowledge base corresponding to the question type to be calibrated with the manual annotation result of the calibration speech data comprises at least one of:

Training a personalized language model with the manual annotation result of the calibration speech data and adding it to the original knowledge base; extracting answer key points from the manual annotation result of the calibration speech data and adding them to the original knowledge base; and selecting, from the manual annotation result of the calibration speech data, the manual transcriptions whose manual score is higher than a set score and adding them to the original knowledge base as reference answers.

Preferably, the method further comprises:

Selecting at least part of all the speech data of examinees' answers corresponding to the question type to be calibrated as base speech data;

Training and optimizing at least one of the acoustic model and the language model of the original speech recognition system with the base speech data, to obtain a new speech recognition system;

The performing of speech recognition on the calibration speech data with a speech recognition system comprises:

Performing speech recognition on the calibration speech data with the new speech recognition system.

Preferably, the training and optimizing of the acoustic model of the original speech recognition system with the base speech data comprises:

Performing speech recognition on the base speech data with the original speech recognition system to obtain the speech recognition result of the base speech data;

Extracting data features of the speech recognition result of the base speech data;

Selecting, as qualified corpus, the speech recognition results of the base speech data whose data features meet set requirements;

Training and optimizing the acoustic model of the original speech recognition system with the qualified corpus.
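The selection of qualified corpus described above can be sketched as a simple filter. The source does not say which data features or thresholds are used; the recognition confidence and utterance duration below, the `Recognition` record, and the threshold values are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Recognition:
    """Hypothetical record for one recognized utterance of the base speech data."""
    utterance_id: str
    text: str
    confidence: float  # recognition confidence in [0, 1] (assumed data feature)
    duration_s: float  # utterance duration in seconds (assumed data feature)


def select_qualified(results, min_confidence=0.9, min_duration_s=1.0):
    """Keep only recognition results whose data features meet the set requirements."""
    return [
        r for r in results
        if r.confidence >= min_confidence and r.duration_s >= min_duration_s
    ]
```

The utterances that pass the filter, paired with their recognized text, would then serve as adaptation data for the acoustic model.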

Preferably, the training and optimizing of the language model of the original speech recognition system with the base speech data comprises:

Selecting, from the base speech data, sentences that contain the answer key points of the question type to be calibrated as base sentences;

Training and optimizing the language model of the original speech recognition system with the base sentences.
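The selection of base sentences can likewise be sketched as a filter over text. This sketch assumes that the sentences are available as recognized or transcribed text and that an answer key point counts as present when it occurs as a case-insensitive substring; neither detail is specified in the source.

```python
def select_base_sentences(sentences, answer_points):
    """Return the sentences that contain at least one answer key point."""
    points = [p.lower() for p in answer_points]
    return [s for s in sentences if any(p in s.lower() for p in points)]
```

For example, with the key points "by bus" and "ten minutes", a sentence such as "I go to school by bus" would be kept as a base sentence, while "I do not know" would be dropped.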

To achieve the above object, the technical solution adopted by the present invention is also a calibration optimization system for speaking test evaluation, comprising:

A question type input module, configured to receive a speaking test question type of the current speaking test and take the speaking test question type as the question type to be calibrated;

A calibration speech data selection module, configured to select part of all the speech data of examinees' answers corresponding to the question type to be calibrated as calibration speech data, so that the calibration speech data is manually annotated by calibration experts;

A calibration result acquisition module, configured to obtain the manual annotation result of the calibration speech data produced by the manual annotation;

A recognition result acquisition module, configured to perform speech recognition on the calibration speech data with a speech recognition system to obtain the speech recognition result of the calibration speech data;

A feature extraction module, configured to extract, from the speech recognition result of the calibration speech data, evaluation features of the different feature types corresponding to the scoring standards of the question type to be calibrated; and

A scoring model optimization module, configured to train and optimize the original scoring model corresponding to the question type to be calibrated by combining the evaluation features and the manual annotation result of the calibration speech data, to obtain a new scoring model corresponding to the question type to be calibrated.

Preferably, the system further comprises:

A knowledge base optimization module, configured to optimize the original knowledge base corresponding to the question type to be calibrated with the manual annotation result of the calibration speech data, to obtain a new knowledge base corresponding to the question type to be calibrated;

The feature extraction module is specifically configured to extract, based on the new knowledge base, the evaluation features of the different feature types corresponding to the scoring standards of the question type to be calibrated from the speech recognition result of the calibration speech data.

Preferably, the knowledge base optimization module is specifically configured to obtain the new knowledge base corresponding to the question type to be calibrated in at least one of the following ways: training a personalized language model with the manual annotation result of the calibration speech data and adding it to the original knowledge base; extracting answer key points from the manual annotation result of the calibration speech data and adding them to the original knowledge base; and selecting, from the manual annotation result of the calibration speech data, the manual transcriptions whose manual score is higher than a set score and adding them to the original knowledge base as reference answers.

Preferably, the system further comprises:

A base speech data selection module, configured to select at least part of all the speech data of examinees' answers corresponding to the question type to be calibrated as base speech data;

A speech recognition system optimization module, configured to train and optimize at least one of the acoustic model and the language model of the original speech recognition system with the base speech data, to obtain a new speech recognition system;

The recognition result acquisition module is specifically configured to perform speech recognition on the calibration speech data with the new speech recognition system.

Preferably, the speech recognition system optimization module comprises an acoustic model optimization unit;

The acoustic model optimization unit is configured to perform speech recognition on the base speech data with the original speech recognition system to obtain the speech recognition result of the base speech data; to extract data features of the speech recognition result of the base speech data; to select, as qualified corpus, the speech recognition results of the base speech data whose data features meet set requirements; and to train and optimize the acoustic model of the original speech recognition system with the qualified corpus.

Preferably, the speech recognition system optimization module comprises a language model optimization unit;

The language model optimization unit is configured to select, from the base speech data, sentences that contain the answer key points of the question type to be calibrated as base sentences, and to train and optimize the language model of the original speech recognition system with the base sentences.

The beneficial effect of the present invention is that, in embodiments of the calibration optimization method and system for speaking test evaluation, part of all the speech data of examinees' answers corresponding to the question type to be calibrated is selected and manually annotated by calibration experts, and the manual annotation result is used to optimize the original scoring model corresponding to the question type to be calibrated. The new scoring model can thereby match the scoring standards of the current speaking test as closely as possible, and shows better scoring performance when the optimized new scoring model is used to evaluate the current speaking test.

Brief description of the drawings

Fig. 1 is a flowchart of an embodiment of a speaking test evaluation method;

Fig. 2 is a flowchart of one embodiment of the calibration optimization method for speaking test evaluation according to the present invention;

Fig. 3 is a flowchart of another embodiment of the calibration optimization method for speaking test evaluation according to the present invention;

Fig. 4 is a block diagram of one implementation structure of the calibration optimization system for speaking test evaluation according to the present invention;

Fig. 5 is a block diagram of another implementation structure of the calibration optimization system for speaking test evaluation according to the present invention.

Detailed description of the embodiments

Embodiments of the invention are described in detail below. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the invention, and are not to be construed as limiting it.

To solve the problem that existing speaking test evaluation methods show very poor scoring performance when the general-purpose scoring model does not match the scoring standards of the current speaking test, the present invention provides a calibration optimization method for speaking test evaluation. As shown in Fig. 2, the method comprises the following steps:

Step S1: receive a speaking test question type of the current speaking test, and take this speaking test question type as the question type to be calibrated.

Because the technical problem to be solved by the method of the invention is that a general-purpose knowledge base and a general-purpose scoring model show different scoring performance in speaking tests of regions that differ in recording environment, question content, scoring standards, and so on, the current speaking test here is to be understood as a speaking test held within a region that is determined from actual conditions such as recording environment, question content, and scoring standards and that is suited to evaluation with one and the same scoring model.

Step S2: select part of all the speech data of examinees' answers corresponding to the question type to be calibrated as calibration speech data, so that the calibration speech data is manually annotated by calibration experts. Each selected speech recording is treated as one independent item of calibration speech data. For example, a predetermined percentage (for example 2% to 5%) of all the speech data of examinees' answers corresponding to the question type to be calibrated is picked at random, in equal proportion for male and female examinees, as calibration speech data.

The manual annotation includes, for example, manual transcription, manual scoring according to the scoring standards of the current speaking test, pronunciation accuracy annotation of sentences, words, and phonetic symbols, annotation of the difficulty level of spoken-language problems, annotation of recording quality, manual expansion of answer content, and so on. The "examinees" here are the examinees who take part in the current speaking test.
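The random selection in step S2 can be sketched as stratified sampling. The sketch assumes that "equal proportion" means drawing the same fraction from the male and the female recordings, and that each recording carries a gender label of "M" or "F"; the record layout and the fixed random seed are illustrative only.

```python
import random


def pick_calibration_set(recordings, fraction=0.03, seed=0):
    """Draw the same fraction of recordings at random from each gender group.

    recordings: list of (recording_id, gender) pairs, gender being "M" or "F".
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    chosen = []
    for gender in ("M", "F"):
        pool = [rid for rid, g in recordings if g == gender]
        k = min(len(pool), max(1, round(len(pool) * fraction))) if pool else 0
        chosen.extend(rng.sample(pool, k))
    return chosen
```

With 10,000 answers and a fraction between 0.02 and 0.05, this yields roughly 200 to 500 recordings for the calibration experts to annotate.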

Step S3: obtain the manual annotation result of the calibration speech data produced by the calibration experts' manual annotation.

Step S4: perform speech recognition on the calibration speech data with a speech recognition system to obtain the speech recognition result of the calibration speech data.

The speech recognition result includes, for example, the speech segment (speech boundaries), speech content, and recognition confidence corresponding to each basic speech unit of the calibration speech data. The basic speech unit can be a syllable, a phoneme, and so on.

The decoder of the speech recognition system decodes the calibration speech data based on an acoustic model and a language model to obtain the speech recognition result of the calibration speech data. The acoustic model is, for example, an acoustic model based on MFCC (Mel-Frequency Cepstral Coefficients) features, an acoustic model based on PLP (Perceptual Linear Predictive) features, an HMM-GMM (Hidden Markov Model-Gaussian Mixture Model) acoustic model, or a neural network acoustic model based on a DBN (Deep Belief Network). The decoding uses, for example, Viterbi search or A* search to decode the calibration speech data.

Specifically, for calibration speech data without text annotation, such as question-and-answer items, continuous speech recognition can be used to obtain the text and the basic speech unit sequence corresponding to the calibration speech data, together with the speech recognition result of each basic speech unit. For calibration speech data with a standard answer, such as read-aloud questions, speech alignment is used to obtain the time boundaries of the speech segment corresponding to each basic speech unit.

Step S5: extract, from the speech recognition result of the calibration speech data, evaluation features of the different feature types corresponding to the scoring standards of the question type to be calibrated. For speaking test question types whose reference answer is not unique, such as question-and-answer items, this step specifically extracts those evaluation features from the speech recognition result of the calibration speech data based on the knowledge base. Depending on the scoring standards, the feature types may include, for example, one or more of the following: integrity features, pronunciation accuracy features, fluency features, prosodic features, grammar features, semantic features, and so on, wherein:

The integrity feature describes the textual completeness, with respect to the answer network, of the basic speech unit sequence corresponding to the speech segment sequence. In embodiments of the invention, the basic speech unit sequence can be matched against the answer network to obtain an optimal path, and the degree of match between the optimal path and the basic speech unit sequence is used as the integrity feature.

It should be noted that the form of the answer network can differ between question types. For read-aloud questions, for example, the reference answer is the word sequence of the question text, whereas for semi-open question types such as question-and-answer items, the reference answer usually consists of fixed answer key points and other auxiliary connecting words. Moreover, because the answer is not fixed, it can often be expressed in many ways, so the corresponding answer network usually consists of several reference answers, in the form of multiple answer clauses or a lattice of reference answers.

Of course, when the reference answer is not unique, a weighted answer network can also be built from the probability of occurrence of each reference answer, and a corresponding weighted matching rate chosen to compute the degree of match between the optimal path and the basic speech unit sequence, with the degree of match for each basic speech unit used as the integrity feature.

Further, in the answer network of a semi-open question type, the fixed answer key points matter far more than the connecting words for judging whether the answer is correct. To reflect the importance of the answer key points for answer completeness, different weights can be assigned to answer key points and connecting words, the optimal path of the basic speech unit sequence searched in the weighted answer network, and the cumulative score of the optimal path used as the degree of match.
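The weighted matching described above can be sketched as follows. The sketch simplifies the answer network to a list of alternative reference answers, each a sequence of (word, weight) pairs, and uses a weighted longest-common-subsequence search as the path search. Dividing the cumulative score by the total weight of the reference answer is an added assumption so that scores from reference answers of different lengths are comparable; the source uses the cumulative score itself.

```python
def weighted_match(recognized, reference):
    """Cumulative weight of the best in-order alignment of the recognized
    words with one reference answer given as (word, weight) pairs."""
    n, m = len(recognized), len(reference)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            word, weight = reference[j - 1]
            best = max(dp[i - 1][j], dp[i][j - 1])
            if recognized[i - 1] == word:
                best = max(best, dp[i - 1][j - 1] + weight)
            dp[i][j] = best
    return dp[n][m]


def integrity_feature(recognized, answer_network):
    """Best normalized match over all reference answers in the network."""
    return max(
        weighted_match(recognized, ref) / sum(w for _, w in ref)
        for ref in answer_network
    )
```

Giving answer key points such as "school" and "bus" a weight of 3 and connecting words a weight of 1 makes a missing key point cost three times as much as a missing connecting word.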

The pronunciation accuracy feature describes how standard the pronunciation of each speech segment is. Specifically, the similarity between each speech segment and the corresponding acoustic features of the answer network can be computed, and this similarity used as the pronunciation accuracy feature.

The fluency feature describes how fluently the examinee speaks. It includes, but is not limited to, the average speech rate of a sentence (for example the ratio of speech duration to the number of speech units), the average length of uninterrupted speech runs in a sentence, the proportion of effective pauses in a sentence, and so on. In addition, to compensate for differences in speech rate between speakers, phoneme segment features can also be used; normalized over all pronounced parts, they jointly form the fluency feature. Specifically, a discrete probability distribution of the durations of context-independent phonemes can be collected, and the log probability of the normalized duration computed as the segment duration score of the phoneme.
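A minimal sketch of these fluency measures follows. The 0.2 s threshold for an effective pause, the histogram bin width, the probability floor for unseen bins, and the choice to normalize a phoneme duration by dividing it by the utterance's average speech rate are assumptions for illustration.

```python
import math


def speech_rate(total_duration_s, unit_count):
    """Average speech rate as the ratio of speech duration to unit count."""
    return total_duration_s / unit_count


def pause_ratio(pause_durations_s, total_duration_s, min_pause_s=0.2):
    """Share of the utterance taken up by pauses of at least min_pause_s."""
    effective = sum(p for p in pause_durations_s if p >= min_pause_s)
    return effective / total_duration_s


def duration_log_prob(phoneme_duration_s, utterance_rate, histogram,
                      bin_width=0.5, floor=1e-6):
    """Log probability of a rate-normalized phoneme duration under a
    discrete duration distribution given as {bin_index: probability}."""
    normalized = phoneme_duration_s / utterance_rate
    bin_index = int(normalized / bin_width)
    return math.log(histogram.get(bin_index, floor))
```

The histogram would be estimated beforehand from the durations of each context-independent phoneme in reference speech.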

The prosodic feature describes the rhythm of the examinee's pronunciation and includes features such as the rise and fall of pitch. Specifically, the fundamental frequency feature sequence of each speech segment can be extracted, and its dynamic characteristics can then be derived from it, for example by extracting first-order and second-order differences as supplementary prosodic features.
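The first-order and second-order differences can be computed directly from a fundamental frequency contour, as in the sketch below. It assumes one F0 value in Hz per frame with 0 marking unvoiced frames; dropping unvoiced frames before differencing is a simplification, since it treats the voiced frames on either side of a gap as adjacent.

```python
def diff(seq):
    """Differences between consecutive values of a sequence."""
    return [b - a for a, b in zip(seq, seq[1:])]


def prosody_features(f0_hz):
    """F0 contour of the voiced frames with its first and second differences."""
    voiced = [f for f in f0_hz if f > 0]  # assumption: 0 marks an unvoiced frame
    delta = diff(voiced)
    delta_delta = diff(delta)
    return {"f0": voiced, "delta": delta, "delta_delta": delta_delta}
```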

The grammar feature describes the grammatical accuracy of the basic speech unit sequence with respect to the grammar network.

The semantic feature describes the semantic accuracy of the basic speech unit sequence with respect to the semantic network.

Step S6: train and optimize the original scoring model corresponding to the question type to be calibrated by combining the evaluation features and the manual annotation result of each item of calibration speech data, to obtain a new scoring model corresponding to the question type to be calibrated, such that the final score the new scoring model gives from the evaluation features of the calibration speech data is as close as possible to the manual score of that calibration speech data. In this way the new scoring model matches the scoring standards of the question type to be calibrated better than the original scoring model does, and can therefore show better scoring performance.

The manual annotation result used in training and optimizing the scoring model mainly comprises the manual score.
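Step S6 can be illustrated by fine-tuning a linear scoring model toward the manual scores. The linear model form, the squared-error objective, and the stochastic gradient descent update with its learning rate and epoch count are assumptions for illustration; the source does not fix a model type or training algorithm.

```python
def predict(weights, bias, features):
    """Score given by a linear scoring model for one feature vector."""
    return bias + sum(w * x for w, x in zip(weights, features))


def fine_tune(weights, bias, samples, lr=0.01, epochs=500):
    """Start from the original model and move it toward the manual scores.

    samples: list of (feature_vector, manual_score) pairs from the calibration set.
    Returns the updated (weights, bias).
    """
    weights = list(weights)
    for _ in range(epochs):
        for features, manual_score in samples:
            error = predict(weights, bias, features) - manual_score
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def mean_abs_error(weights, bias, samples):
    """Average absolute gap between model scores and manual scores."""
    return sum(abs(predict(weights, bias, f) - y) for f, y in samples) / len(samples)
```

Starting from the original weights rather than from scratch keeps the new model close to the general-purpose one when only a few percent of the answers have manual scores.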

Because the speech data of the examinees' answers in the current speaking test and the calibration experts reflect the actual conditions of the current speaking test, such as recording environment, question content, and scoring standards, selecting part of all the speech data of examinees' answers corresponding to the question type to be calibrated for manual annotation by calibration experts, and using the manual annotation result to optimize the original scoring model corresponding to the question type to be calibrated, allows the optimized new scoring model to match the scoring standards of the current speaking test as closely as possible. If the optimized new scoring model is then used for speaking test evaluation according to the method shown in Fig. 1, the scoring results will better reflect the actual conditions of the current speaking test, such as recording environment, question content, and scoring standards.

Those skilled in the art will readily appreciate that the above embodiment of optimizing the original scoring model achieves better scoring performance than existing speaking test evaluation methods even for speaking test question types whose reference answer is not unique, such as question-and-answer items. To further improve evaluation accuracy for such question types, the method of the invention may, as shown in Fig. 3, further comprise the following step:

Step S4a: optimize the original knowledge base corresponding to the question type to be calibrated with the manual annotation result of the calibration speech data, to obtain a new knowledge base corresponding to the question type to be calibrated.

On this basis, step S5 above specifically becomes: based on the new knowledge base, extract, from the speech recognition result of the calibration speech data, the evaluation features of the different feature types corresponding to the scoring standards of the question type to be calibrated.

The artificial annotation results participating in former knowledge base optimization herein mainly comprises artificial transcription data, sentence, word, carrying a tune property of phonetic symbol mark, spoken tricky question degree mark, voice recording quality annotation, answer content manually expands etc., particularly the artificial transcription data of artificial annotation results.

The new knowledge base comprises, for example, an answer network, a grammar network, a semantic network, a difficulty classification model, a topic-keyword relation model, a word collocation rule tree, and so on.

Because the manual annotation results of the calibration speech data contain the answer points for the question type to be calibrated, the new knowledge base obtained by optimizing with these results covers the question content and examination points of that question type better than the original knowledge base does.

On this basis, optimizing the original knowledge base for the question type to be calibrated using the manual annotation results of the calibration speech data, to obtain the new knowledge base for that question type, may comprise at least one of the following: training a personalized language model on the manual annotation results of the calibration speech data and adding it to the original knowledge base; extracting answer points from the manual annotation results of the calibration speech data and adding them to the original knowledge base; and adding to the original knowledge base, as reference answers, the manual transcriptions whose manual score in the manual annotation results is higher than a set score.

Answer points may be extracted automatically from the manual annotation results of the calibration speech data as follows: the calibration speech data are divided, by manual score from high to low in equal portions, into N subsets (N being a natural number); the frequency of occurrence of each word, phrase, or collocation in each subset is calculated; and in each subset a set percentage (for example 20%) of the words, phrases, or collocations is extracted, preferring those with higher frequency. The extracted items and their frequencies form a data model that becomes part of the new knowledge base.
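The frequency-based extraction described above can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the N subsets are assumed to be of near-equal size, only whitespace-separated words are counted (phrases and collocations are omitted), and the function name `extract_answer_points` and the `(manual_score, transcript)` input format are hypothetical.

```python
from collections import Counter


def extract_answer_points(samples, n_subsets=3, top_fraction=0.2):
    """Return one {word: frequency} dict per score-ranked subset.

    samples: list of (manual_score, transcript) pairs (hypothetical format).
    """
    # Rank transcripts from high to low manual score.
    ranked = sorted(samples, key=lambda s: s[0], reverse=True)

    # Cut the ranking into n_subsets near-equal parts (assumed split rule).
    size, extra = divmod(len(ranked), n_subsets)
    subsets, start = [], 0
    for i in range(n_subsets):
        end = start + size + (1 if i < extra else 0)
        subsets.append(ranked[start:end])
        start = end

    model = []
    for subset in subsets:
        counts = Counter()
        for _, transcript in subset:
            counts.update(transcript.lower().split())
        # Keep the most frequent top_fraction of distinct words, at least one.
        keep = max(1, round(len(counts) * top_fraction)) if counts else 0
        model.append(dict(counts.most_common(keep)))
    return model
```

Each returned dictionary pairs the retained words of one score band with their frequencies, which is the kind of data model the paragraph above describes adding to the knowledge base.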

Because step S4 above uses a speech recognition system to recognize the calibration speech data, the recognition performance of that system on the calibration speech data also affects, to some extent, the accuracy of evaluation feature extraction. To adapt the speech recognition system to the recording environment of the current speaking test, the pronunciation characteristics of the examinees, and so on, the method of the present invention may further comprise steps for optimizing the speech recognition system, specifically:

Step S8: select at least part of all the speech data of examinees' answers to the question type to be calibrated as base speech data; for example, randomly pick a predetermined percentage (such as 5% to 15%) of that speech data, with equal proportions of male and female speakers, as the base speech data. In embodiments that optimize the speech recognition system, the calibration speech data may also be selected from the base speech data, for example by choosing a predetermined number of base speech data items, with equal proportions of male and female speakers, as the calibration speech data.
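The gender-balanced random selection in step S8 might look like the sketch below. The record format (dictionaries with `id` and `gender` keys), the function name, and the fixed random seed are assumptions made for illustration; the text only specifies an equal male-to-female ratio and a predetermined percentage.

```python
import random


def select_base_data(recordings, fraction=0.10, seed=0):
    """Randomly pick about `fraction` of the recordings, half male and half female.

    recordings: list of dicts with hypothetical keys 'id' and 'gender' ('M' or 'F').
    """
    rng = random.Random(seed)
    males = [r for r in recordings if r["gender"] == "M"]
    females = [r for r in recordings if r["gender"] == "F"]

    # Same count for each gender, limited by the smaller group.
    per_gender = min(round(len(recordings) * fraction) // 2, len(males), len(females))
    return rng.sample(males, per_gender) + rng.sample(females, per_gender)
```

The same function could be applied a second time to the base speech data to draw the calibration speech data, as the step allows.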

Step S9: using the base speech data, train and optimize at least one of the acoustic model and the language model of the original speech recognition system to obtain a new speech recognition system. It should be understood that the goal of this training optimization is for the new speech recognition system to show better recognition performance than the original speech recognition system on the speech data of examinees' answers to the question type to be calibrated; the recognition performance metric is, for example, the mean phoneme confidence.

Here, a predetermined number or predetermined percentage of speech data may be selected, from all the speech data of examinees' answers to the question type to be calibrated other than the base speech data, as base validation speech data. The original speech recognition system is used to recognize the base validation speech data, and the base validation speech data whose recognition performance metric meets a set condition (for example, mean phoneme confidence greater than 80%) are taken as the final validation speech data. The new speech recognition system is then used to recognize the final validation speech data. If the recognition performance metric of every final validation speech data item decoded by the new system is higher than that of the same item decoded by the original system, the new speech recognition system shows better recognition performance than the original system on the speech data of examinees' answers to the question type to be calibrated, and training optimization ends; otherwise, training optimization continues.
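The validation procedure above can be sketched as two small checks. The callables `old_decode` and `new_decode` are hypothetical stand-ins for the original and new speech recognition systems and are assumed to return a list of per-phoneme confidences for an utterance; the 0.8 threshold is the example value from the text.

```python
def mean_confidence(phoneme_confidences):
    """Mean phoneme confidence of one decoded utterance."""
    return sum(phoneme_confidences) / len(phoneme_confidences)


def select_final_validation(base_validation, old_decode, threshold=0.8):
    """Keep utterances that the original system decodes with high mean confidence."""
    return [u for u in base_validation
            if mean_confidence(old_decode(u)) > threshold]


def training_can_stop(final_validation, old_decode, new_decode):
    """True only if the new system beats the original on every final validation item."""
    if not final_validation:
        return False  # nothing to compare against, so keep training
    return all(mean_confidence(new_decode(u)) > mean_confidence(old_decode(u))
               for u in final_validation)
```

Returning `False` for an empty validation set is a design choice of this sketch, not something the text specifies.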

In embodiments that use the base speech data to train and optimize the original speech recognition system, recognizing the calibration speech data with a speech recognition system in step S4 above may be: recognizing the calibration speech data with the new speech recognition system.

In step S9 above, training and optimizing the acoustic model of the original speech recognition system using the base speech data may further comprise:

Step S91: recognize the base speech data with the original speech recognition system to obtain the speech recognition results of the base speech data.

Step S92: extract data features from the speech recognition results of the base speech data.

Step S93: select, as the qualified corpus, the speech recognition results of the base speech data whose data features meet a set requirement; for example, select the speech recognition results of the base speech data whose mean phoneme confidence is greater than 80%.

Step S94: use the qualified corpus to train and optimize the acoustic model of the original speech recognition system, for example with an adaptation algorithm based on maximum a posteriori (MAP) estimation.
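The text names MAP adaptation only as an example and does not say which acoustic model parameters are adapted. As one common illustration, the sketch below applies the standard MAP update to the mean of a single Gaussian component: the adapted mean is (tau * prior_mean + sum of posterior-weighted frames) / (tau + total posterior occupancy). The function name, the relevance factor `tau`, and the per-frame posteriors are assumptions of this sketch rather than details from the patent.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """MAP-adapt the mean vector of one Gaussian component.

    prior_mean: mean from the original acoustic model (list of floats).
    frames:     feature vectors from the qualified corpus (list of lists).
    posteriors: occupation probability of this component for each frame.
    tau:        relevance factor; larger values trust the prior more (assumed).
    """
    occupancy = sum(posteriors)
    dim = len(prior_mean)
    # Posterior-weighted sum of the adaptation frames, per dimension.
    weighted_sum = [sum(g * x[d] for g, x in zip(posteriors, frames))
                    for d in range(dim)]
    return [(tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
            for d in range(dim)]
```

With little adaptation data the result stays close to the original mean, and with more data it moves toward the statistics of the qualified corpus, which is why this kind of update suits adapting to one test's recording conditions.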

In step S9 above, training and optimizing the language model of the original speech recognition system using the base speech data may further comprise:

Step S95: select from the base speech data, as base sentences, the sentences that contain the answer points of the question type to be calibrated; for how these answer points are extracted, see the explanation in step S4.

Step S96: use the base sentences to train and optimize the language model of the original speech recognition system.

One way to train and optimize the language model is, for example, to train a personalized language model on the base sentences and then interpolate the personalized language model with the original language model using certain weight coefficients (for example, in a ratio of 0.4 to 0.6) to form a new language model.
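Linear interpolation of the two language models can be illustrated with unigram probabilities. This sketch assumes the 0.4 weight applies to the personalized model and 0.6 to the original model, following the order in which the text lists them; real systems would interpolate n-gram models, and the function name and dictionary representation are hypothetical.

```python
def interpolate_lm(personalized, original, w_personalized=0.4):
    """Linearly interpolate two unigram models given as {word: probability}."""
    w_original = 1.0 - w_personalized
    vocab = set(personalized) | set(original)
    # A word missing from one model contributes probability 0 from that model.
    return {w: w_personalized * personalized.get(w, 0.0)
               + w_original * original.get(w, 0.0)
            for w in vocab}
```

Because the weights sum to 1, the interpolated probabilities still sum to 1 whenever both input models do.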

Corresponding to the calibration optimization method for speaking test evaluation described above, as shown in Figure 4, the calibration optimization system for speaking test evaluation of the present invention comprises a test question type input module 1, a calibration speech data selection module 2, a calibration result acquisition module 3, a recognition result acquisition module 5, a feature extraction module 6, and a scoring model optimization module 7. The test question type input module 1 receives a question type of the current speaking test and takes it as the question type to be calibrated. The calibration speech data selection module 2 selects part of all the speech data of examinees' answers to the question type to be calibrated as calibration speech data, so that calibration experts can annotate the calibration speech data manually. The calibration result acquisition module 3 obtains the manual annotation results of the calibration speech data produced by the manual annotation. The recognition result acquisition module 5 recognizes the calibration speech data with a speech recognition system to obtain the speech recognition results of the calibration speech data. The feature extraction module 6 extracts, from the speech recognition results of the calibration speech data, the evaluation features of the different feature types corresponding to the scoring criteria of the question type to be calibrated. The scoring model optimization module 7 combines the evaluation features and the manual annotation results of the calibration speech data to train and optimize the original scoring model for the question type to be calibrated, obtaining a new scoring model for that question type.

Further, as shown in Figure 5, the system of the present invention may also comprise a knowledge base optimization module 4, which uses the manual annotation results of all the calibration speech data to optimize the original knowledge base for the question type to be calibrated, obtaining a new knowledge base for that question type. On this basis, the feature extraction module 6 above is specifically configured to extract, based on the new knowledge base, the evaluation features of the different feature types corresponding to the scoring criteria of the question type to be calibrated from the speech recognition results of the calibration speech data.

The knowledge base optimization module 4 above is further configured to obtain the new knowledge base for the question type to be calibrated in at least one of the following ways: training a personalized language model on the manual annotation results of the calibration speech data and adding it to the original knowledge base; extracting answer points from the manual annotation results of the calibration speech data and adding them to the original knowledge base; and adding to the original knowledge base, as reference answers, the manual transcriptions whose manual score in the manual annotation results of the calibration speech data is higher than a set score.

The system may further comprise a base speech data selection module (not shown) and a speech recognition system optimization module (not shown). The base speech data selection module selects at least part of all the speech data of examinees' answers to the question type to be calibrated as base speech data. The speech recognition system optimization module uses the base speech data to train and optimize at least one of the acoustic model and the language model of the original speech recognition system, obtaining a new speech recognition system. On the basis of this embodiment, the recognition result acquisition module 5 above is further configured to recognize the calibration speech data with the new speech recognition system.

The speech recognition system optimization module above may further comprise an acoustic model optimization unit. This unit is configured to recognize the base speech data with the original speech recognition system to obtain the speech recognition results of the base speech data; to extract data features from those speech recognition results; to select the base speech data whose data features meet a set requirement as the qualified corpus; and to use the qualified corpus to train and optimize the acoustic model of the original speech recognition system.

The speech recognition system optimization module above may further comprise a language model optimization unit. This unit is configured to select from the base speech data, as base sentences, the sentences that contain the answer points of the question type to be calibrated, and to use the base sentences to train and optimize the language model of the original speech recognition system.

The embodiments in this specification are described progressively; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, see the corresponding parts of the method embodiment. The system embodiment described above is merely illustrative: modules or units described as separate components may or may not be physically separate, and components shown as modules or units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.

The structure, features, and effects of the present invention have been described in detail above with reference to the embodiments shown in the drawings. The foregoing are only preferred embodiments of the present invention, and the scope of implementation is not limited to what the drawings show. Any change made according to the concept of the present invention, or any modification into an equivalent embodiment, that does not go beyond the spirit covered by the specification and drawings shall fall within the protection scope of the present invention.

Claims

1. A calibration optimization method for speaking test evaluation, characterized by comprising:

2. The method according to claim 1, characterized in that the method further comprises:

3. The method according to claim 2, characterized in that optimizing the original knowledge base for the question type to be calibrated using the manual annotation results of the calibration speech data comprises:

4. The method according to claim 1, 2, or 3, characterized in that the method further comprises:

5. The method according to claim 4, characterized in that training and optimizing the acoustic model of the original speech recognition system using the base speech data comprises:

6. The method according to claim 4, characterized in that training and optimizing the language model of the original speech recognition system using the base speech data comprises:

7. A calibration optimization system for speaking test evaluation, characterized by comprising:

8. The system according to claim 7, characterized in that the system further comprises:

9. The system according to claim 8, characterized in that the knowledge base optimization module is specifically configured to obtain the new knowledge base for the question type to be calibrated in at least one of the following ways: training a personalized language model on the manual annotation results of the calibration speech data and adding it to the original knowledge base; extracting answer points from the manual annotation results of the calibration speech data and adding them to the original knowledge base; and selecting, from the manual annotation results of the calibration speech data, the manual transcriptions whose manual score is higher than a set score and adding them to the original knowledge base as reference answers.

10. The system according to claim 7, 8, or 9, characterized in that the system further comprises:

11. The system according to claim 10, characterized in that the speech recognition system optimization module comprises an acoustic model optimization unit;

12. The system according to claim 10, characterized in that the speech recognition system optimization module comprises a language model optimization unit;