CN101739869B - Priori knowledge-based pronunciation evaluation and diagnosis system - Google Patents

Priori knowledge-based pronunciation evaluation and diagnosis system

Info

Publication number
CN101739869B
CN101739869B CN2008102266752A CN200810226675A CN101739869B CN 101739869 B CN101739869 B CN 101739869B CN 2008102266752 A CN2008102266752 A CN 2008102266752A CN 200810226675 A CN200810226675 A CN 200810226675A CN 101739869 B CN101739869 B CN 101739869B
Authority
CN
China
Prior art keywords
priori
pronunciation
phoneme
pronunciation evaluation
evaluation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2008102266752A
Other languages
Chinese (zh)
Other versions
CN101739869A (en)
Inventor
徐波
徐爽
江杰
陈振标
浦剑涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN2008102266752A
Publication of CN101739869A
Application granted
Publication of CN101739869B
Legal status: Active

Abstract

The invention discloses a prior-knowledge-based pronunciation evaluation and diagnosis system comprising a speech preprocessing unit, a pronunciation evaluation unit, an evaluation confirmation and diagnosis unit, a model and prior knowledge base unit, and an evaluation and diagnosis information output unit. Prior knowledge is applied in two ways. First, prior knowledge of the mispronunciations of confusable phoneme pairs is used to correct the traditional posterior probability, and the corrected posterior probability is used for pronunciation evaluation. Second, prior knowledge of the discriminative features of confusable phoneme pairs is used, through a method based on discriminative features and classifiers, to confirm the evaluation result; this yields better evaluation performance and gives the learner diagnostic information at a more basic and finer-grained level, helping the learner correct and improve pronunciation. The system can meet the requirements of high reliability and high accuracy in learning and testing Putonghua, and is an innovative and effective method.

Description

A pronunciation evaluation and diagnosis system based on prior knowledge
Technical field
The present invention relates to the field of computer-assisted language learning and speech processing technology, and in particular to a pronunciation evaluation and diagnosis system based on prior knowledge.
Background art
China's traditional methods of learning and testing Mandarin (Putonghua) currently face a marked conflict between the urgent need to popularize standard spoken Chinese and the shortage of learning and testing resources. Learning Mandarin requires correcting each student's individual problems and requires long periods of interactive practice, and the present teaching staff cannot meet these conditions. Oral testing is an effective means of checking learning outcomes, but the time and labor needed to organize it, unavoidable fairness problems, and the difficulty of providing feedback have become bottlenecks restricting the development of Mandarin oral testing. Computer-assisted language learning and testing is a feasible way to solve these problems. With the development of computer technology and speech recognition technology, computer-based language learning and testing systems have evolved from initially supporting listening, reading and writing practice to automatically scoring a learner's pronunciation, pointing out mispronunciations, and providing diagnostic information based on those mispronunciations. Such systems test the learner's pronunciation comprehensively, help the learner improve pronunciation, and raise learning efficiency. As the core of next-generation computer-assisted language learning and testing systems, automatic pronunciation evaluation and diagnosis technology is therefore receiving increasing attention.
Current automatic pronunciation evaluation and diagnosis techniques are pronunciation evaluation and error detection strategies built on the statistical speech recognition framework. The input speech is first segmented into phonemes. For each resulting phoneme, the log posterior probability, or a simplified form of it, is computed as the pronunciation feature and used to assess pronunciation quality. A corresponding score is given for the learner's pronunciation level, and a single uniform threshold is used for error detection.
These methods face two problems. First, the accuracy of evaluation and diagnosis is not very high, particularly for confusable phoneme pairs that are frequently mispronounced in practice and whose pronunciations are very close. Second, they can only give a score for the learner's pronunciation level and cannot provide diagnostic information with more instructional value. To solve these problems, the present invention builds a pronunciation evaluation and diagnosis system based on prior knowledge that not only scores the learner's pronunciation but also provides finer-grained diagnostic information.
Summary of the invention
(1) Technical problem to be solved
In view of this, the main object of the present invention is to address the shortcomings of existing pronunciation evaluation and diagnosis methods by introducing expert prior knowledge from linguistics and Mandarin teaching, and to provide a pronunciation evaluation and diagnosis system based on prior knowledge that improves the efficiency and effectiveness of language learning and testing.
(2) Technical solution
To achieve the above object, the technical solution adopted by the present invention is as follows:
A pronunciation evaluation and diagnosis system based on prior knowledge, the system comprising:
A speech preprocessing unit, used to preprocess the raw speech input by the learner so as to verify the speech content, divide speech whose content basically matches the standard script into small phoneme-level units, and pass them to the pronunciation evaluation unit for judgment;
A pronunciation evaluation unit, used to perform a preliminary pronunciation quality assessment of the input speech: the traditional posterior probability is corrected using prior knowledge of the mispronunciations of confusable phoneme pairs, pronunciation is evaluated on the basis of the corrected posterior probability, and the computed posterior probability can be converted by a mapping model into a score or grade that measures pronunciation level intuitively;
A pronunciation evaluation confirmation and diagnosis unit, used to take the preliminary evaluation result supplied by the pronunciation evaluation unit, confirm it with a method based on discriminative features and classifiers that draws on prior knowledge of the discriminative features of confusable phoneme pairs, and provide pronunciation diagnostic information from the standpoint of acoustic phonetics;
A model and prior knowledge base unit, used to store the models for phoneme alignment and posterior probability calculation, together with the prior knowledge base; and
An evaluation information and diagnosis information output unit, used to output the pronunciation evaluation score (including the graded scoring result), the location of each mispronunciation and the mispronunciation type, and to give corrective guidance.
In the above solution, the speech preprocessing unit comprises:
An endpoint detection subunit, used to separate speech from non-speech in the signal and determine the start and end points of the speech;
A feature extraction subunit, used to compute the acoustic parameters of the valid speech, perform the feature calculation, and extract the key feature parameters that reflect the characteristics of the signal;
A content verification subunit, used to verify the content of the input speech: if the content of the input pronunciation differs little from the given text, the speech proceeds to pronunciation evaluation and diagnosis; if the content differs greatly from the given text, no further evaluation or diagnosis is performed and the utterance is judged directly to be a pronunciation error by the user;
A phoneme alignment subunit, used to divide the valid input speech into phoneme-level units for subsequent processing.
In the above solution, the key feature parameters that reflect the characteristics of the signal are Mel-frequency cepstral coefficients (MFCC), which reflect the characteristics of human hearing. They comprise a static feature made up of 12 cepstral coefficients plus 1 energy value, together with the first-order and second-order dynamic features of that static feature.
In the above solution, the phoneme alignment subunit uses the Viterbi algorithm to divide the valid input speech into phoneme-level units and thereby align the phonemes.
In the above solution, the pronunciation evaluation unit further adopts a per-phoneme threshold strategy: different phonemes use different thresholds, and a phoneme whose value falls below its threshold is preliminarily judged to be mispronounced and passed to the pronunciation evaluation confirmation and diagnosis unit for confirmation and error diagnosis.
In the above solution, when the pronunciation evaluation confirmation and diagnosis unit uses prior knowledge of the discriminative features of confusable phoneme pairs, it uses, for a specific mispronunciation type, prior knowledge of discriminative features at the acoustic-phonetic level to distinguish correct from incorrect pronunciations, and so detects and diagnoses mispronunciations. Specifically, discriminative features are first extracted according to the prior knowledge. Once the acoustic-phonetic discriminative features are available, two-class classifiers are trained, one per phoneme, each trained on the acoustic-phonetic discriminative features extracted from correctly pronounced samples and mispronounced samples of that phoneme. Because different specific mispronunciations call for different kinds of discriminative features, the kind and dimensionality of the features differ from phoneme to phoneme.
In the above solution, after the pronunciation evaluation confirmation and diagnosis unit has built the two-class classifiers from the discriminative-feature prior knowledge of confusable phoneme pairs, it confirms the preliminary evaluation result output by the pronunciation evaluation unit. A table mapping each phoneme to its discriminative features and classifier is generated in advance from the mispronunciation prior knowledge and the discriminative-feature prior knowledge. The table is consulted to find which features and classifier apply to a given phoneme, and error detection is then carried out. The error detection result serves, on the one hand, as confirmation of whether the preliminary evaluation result is correct; on the other hand, finer-grained diagnostic information can be derived from the discriminative features used for that phoneme.
In the above solution, the model and prior knowledge base unit consists of the models and the prior knowledge base. The models comprise the standard phoneme models and the graded scoring model. The prior knowledge base is the collection of prior knowledge; it supplies the two kinds of prior knowledge described above to the pronunciation evaluation unit and to the pronunciation evaluation confirmation and diagnosis unit, and provides the table that maps phonemes to prior knowledge.
In the above solution, the standard phoneme models are HMMs, used for phoneme alignment and for calculating the posterior probability.
In the above solution, the graded scoring model is a mapping model derived from experts' subjective scores and the objective posterior probability values, and is used to convert a posterior probability value into a score or grade that measures pronunciation quality.
In the above solution, the prior knowledge is obtained by the system in advance, either from a large amount of speech data by data-driven techniques or directly from knowledge summarized by phoneticians and linguists.
(3) Beneficial effects
As the above technical solution shows, the present invention has the following beneficial effects:
The pronunciation evaluation and diagnosis system based on prior knowledge provided by the invention uses prior knowledge flexibly and extensively. Prior knowledge is applied in two ways. First, prior knowledge of the mispronunciations of confusable phoneme pairs is used to correct the traditional posterior probability, and the corrected posterior probability is used for pronunciation evaluation. Second, prior knowledge of the discriminative features of confusable phoneme pairs is used, through a method based on discriminative features and classifiers, to confirm the evaluation result, which yields better evaluation performance and gives the learner diagnostic information at a more basic and finer-grained level. In both applications, the number of prior knowledge rules can be increased or reduced according to actual conditions. Because prior knowledge is introduced, the invention can judge common confusable phonemes accurately, and it provides not only a score for pronunciation level but also diagnostic information with more instructional value. In addition, the per-phoneme threshold strategy for the posterior probability, the method based on discriminative features and classifiers, and the sound workflow of the system as a whole all help pronunciation evaluation and diagnosis run efficiently and accurately.
Description of drawings
Fig. 1 is a schematic diagram of the pronunciation evaluation and diagnosis system based on prior knowledge provided by the invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawing.
Expert prior knowledge is applied in two ways in this system. First, prior knowledge of the mispronunciations of confusable phoneme pairs is used to correct the traditional posterior probability, and the corrected posterior probability is used for pronunciation evaluation. Second, prior knowledge of the discriminative features of confusable phoneme pairs is used, through a method based on discriminative features and classifiers, to confirm the evaluation result. This yields better evaluation performance and gives the learner diagnostic information at a more basic and finer-grained level, helping the learner correct and improve pronunciation.
The pronunciation evaluation and diagnosis system based on prior knowledge proposed by the invention comprises five main units: the speech preprocessing unit, the pronunciation evaluation unit, the pronunciation evaluation confirmation and diagnosis unit, the model and prior knowledge base unit, and the evaluation information and diagnosis information output unit. They are described below.
1. Speech preprocessing unit
The speech preprocessing unit preprocesses the raw speech input by the learner so as to verify the speech content, divides speech whose content basically matches the standard script into small phoneme-level units, and passes them to the pronunciation evaluation unit for judgment.
The speech preprocessing unit consists mainly of four subunits: the endpoint detection subunit, the feature extraction subunit, the content verification subunit and the phoneme alignment subunit. Their functions are as follows:
The endpoint detection subunit separates speech from non-speech in the signal and determines the start and end points of the speech. In real environments, background noise strongly affects an evaluation and diagnosis system, and at a low signal-to-noise ratio the system cannot judge the input speech correctly. Accurately locating the start and end of the valid speech range within the background noise, and discarding the noise that contains no speech, not only improves system performance but also reduces the amount of data to process and therefore the processing time.
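The patent does not say which endpoint detection method is used. As a minimal sketch, assuming a simple frame-energy rule (the function name `detect_endpoints`, the frame length and the decibel threshold are all illustrative choices, not taken from the source), the idea of trimming non-speech at both ends could look like this:

```python
import math

def detect_endpoints(samples, frame_len=160, threshold_db=-40.0):
    """Return (start, end) sample indices of the speech region, or None.

    Assumption: samples are floats in [-1, 1]; a frame counts as speech
    when its mean energy in dB exceeds threshold_db.
    """
    speech_starts = []
    for k in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[k:k + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        level_db = 10.0 * math.log10(energy + 1e-12)  # avoid log(0)
        if level_db > threshold_db:
            speech_starts.append(k)
    if not speech_starts:
        return None
    # First speech frame start to the end of the last speech frame.
    return speech_starts[0], speech_starts[-1] + frame_len
```

A production system would likely add smoothing and noise-floor estimation; this version only shows where the start and end points come from.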
The feature extraction subunit computes the acoustic parameters of the valid speech, performs the feature calculation, and extracts the key feature parameters that reflect the characteristics of the signal, reducing dimensionality and simplifying the subsequent steps. The feature parameters used in this system are Mel-frequency cepstral coefficients (MFCC), which reflect the characteristics of human hearing. They comprise a static feature made up of 12 cepstral coefficients plus 1 energy value, together with the first-order and second-order dynamic features of that static feature.
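With 13 static values per frame and first- and second-order dynamic features, each frame ends up with 39 dimensions. The sketch below assumes the static MFCC frames are already computed and uses the common regression formula for dynamic features; the patent does not state which formula it uses, and the names `deltas` and `stack_features` are hypothetical.

```python
def deltas(frames, window=2):
    """Regression-based dynamic features for a list of equal-length vectors."""
    n = len(frames)
    denom = 2 * sum(w * w for w in range(1, window + 1))
    out = []
    for t in range(n):
        vec = []
        for d in range(len(frames[t])):
            acc = 0.0
            for w in range(1, window + 1):
                nxt = frames[min(t + w, n - 1)][d]  # clamp at the last frame
                prv = frames[max(t - w, 0)][d]      # clamp at the first frame
                acc += w * (nxt - prv)
            vec.append(acc / denom)
        out.append(vec)
    return out

def stack_features(static):
    """Append first- and second-order dynamic features to static frames."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]
```

For 13-dimensional static input, `stack_features` returns 39-dimensional frames.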
The content verification subunit verifies the content of the input speech. If the content of the input pronunciation differs little from the given text, the speech proceeds to pronunciation evaluation and diagnosis. If the content differs greatly from the given text, no further evaluation or diagnosis is performed and the utterance is judged directly to be a pronunciation error by the user.
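The source does not define how "differs little" is measured. One plausible reading, shown here purely as an assumption, compares a recognized phoneme sequence with the reference script by edit distance and rejects the utterance when the error rate is too high. The names `edit_distance` and `content_matches` and the 0.5 limit are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def content_matches(recognized, reference, max_error_rate=0.5):
    """True when the utterance is close enough to the script to be scored."""
    if not reference:
        return False
    return edit_distance(recognized, reference) / len(reference) <= max_error_rate
```

Utterances that fail this check would skip evaluation and be reported directly as errors, as the paragraph above describes.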
The phoneme alignment subunit divides the valid input speech into phoneme-level units for subsequent processing. Phoneme alignment in this system uses the Viterbi algorithm.
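To illustrate Viterbi alignment against a known script, the sketch below makes a strong simplification: each phoneme is a single state that can only repeat or advance to the next phoneme, and the per-frame log-likelihoods are given as input. The patent's system uses HMM phoneme models, so this is a reduced illustration and `force_align` is a hypothetical name.

```python
def force_align(loglik):
    """Align frames to the phonemes of a known script.

    loglik[t][p] is the log-likelihood of frame t under the p-th phoneme.
    Returns one (start_frame, end_frame_exclusive) pair per phoneme.
    Assumes there are at least as many frames as phonemes.
    """
    n_frames, n_phones = len(loglik), len(loglik[0])
    neg_inf = float("-inf")
    score = [[neg_inf] * n_phones for _ in range(n_frames)]
    back = [[0] * n_phones for _ in range(n_frames)]
    score[0][0] = loglik[0][0]  # the path must start in the first phoneme
    for t in range(1, n_frames):
        for p in range(n_phones):
            stay = score[t - 1][p]
            move = score[t - 1][p - 1] if p > 0 else neg_inf
            if move > stay:
                score[t][p], back[t][p] = move + loglik[t][p], p - 1
            else:
                score[t][p], back[t][p] = stay + loglik[t][p], p
    # Trace back from the last phoneme at the last frame.
    path = [n_phones - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    segments, start = [], 0
    for t in range(1, n_frames + 1):
        if t == n_frames or path[t] != path[t - 1]:
            segments.append((start, t))
            start = t
    return segments
```

Each returned segment is the span of frames that the later evaluation steps treat as one phoneme.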
2. Pronunciation evaluation unit
The pronunciation evaluation unit performs a preliminary pronunciation quality assessment of the input speech. It corrects the traditional posterior probability using prior knowledge of the mispronunciations of confusable phoneme pairs and evaluates pronunciation on the basis of the corrected posterior probability. The computed posterior probability can be converted by a mapping model into a score or grade that measures pronunciation level intuitively. The unit also adopts a per-phoneme threshold strategy: different phonemes use different thresholds, and a phoneme whose value falls below its threshold is preliminarily judged to be mispronounced and passed to the pronunciation evaluation confirmation and diagnosis unit for confirmation and error diagnosis.
1) Pronunciation evaluation based on the corrected posterior probability
Long-term research by linguists shows that mispronunciations or defects fall into two classes: one arises from unfamiliarity with the word or with the pronunciation rules, and the other arises from the influence of the learner's mother tongue or dialect. The latter is more regular and is usually the typical mispronunciation that learners make most often, so it deserves fuller attention and feedback. The regularity of such mispronunciations can be introduced into a pronunciation evaluation and diagnosis system as prior knowledge. In the present invention it is used to correct the traditional posterior probability calculation so as to obtain better pronunciation evaluation performance.
For phoneme $q_i$, the traditional posterior probability is defined as:
$$P_i = \frac{1}{T_i} \log\left( \frac{\mathrm{Prob}(o_i \mid q_i)}{\max_{j \in Q} \mathrm{Prob}(o_j \mid q_j)} \right)$$
where $P_i$ is the posterior probability of the corresponding pronunciation data $o_i$ with respect to phoneme $q_i$, $\mathrm{Prob}(o_i \mid q_i)$ is the likelihood of phoneme $q_i$, $T_i$ is the duration of phoneme $q_i$, and $Q$ is the model set.
In general, $Q$ is the set of all phonemes; alternatively, $Q$ is the set of initials when $q_i$ is an initial and the set of finals when $q_i$ is a final.
With the mispronunciation prior knowledge introduced, the posterior probability of $o_i$ with respect to phoneme $q_i$ becomes:
$$P_i = \frac{1}{T_i} \log\left( \frac{\mathrm{Prob}(o_i \mid q_i)}{\max_{j \in Q_i} \mathrm{Prob}(o_j \mid q_j)} \right)$$
where $Q_i$ is the set of models for the common mispronunciation types of phoneme $q_i$.
In essence, this calculation shrinks the search space of the denominator in the posterior probability computation. This not only speeds up the computation but also, by excluding the influence of confusable phoneme models that lie outside the common typical errors, strengthens the ability to detect those common typical errors.
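The difference between the two definitions is only the set over which the denominator's maximum is taken. The sketch below works in the log domain, where the log of the ratio becomes a difference. It assumes that the same segment is scored under every competing model (the usual goodness-of-pronunciation reading) and that the target phoneme itself belongs to the competitor set. The likelihood values and the confusion set for "zh" are made-up illustrations, since Table 1 is not reproduced in this text.

```python
def log_posterior(loglik, target, competitors, duration):
    """Duration-normalised log ratio of the target likelihood to the best competitor.

    loglik maps each phoneme label to the log-likelihood of the segment
    under that phoneme's model (assumed to be computed elsewhere).
    """
    best = max(loglik[q] for q in competitors)
    return (loglik[target] - best) / duration

# Hypothetical log-likelihoods of one 20-frame segment labelled "zh".
segment_loglik = {"zh": -120.0, "z": -118.0, "ch": -125.0, "b": -300.0, "sh": -110.0}

# Traditional form: the maximum runs over the whole model set Q.
traditional = log_posterior(segment_loglik, "zh", segment_loglik.keys(), 20)

# Corrected form: the maximum runs only over Q_i (hypothetical set for "zh").
corrected = log_posterior(segment_loglik, "zh", {"zh", "z"}, 20)
```

Restricting the competitor set means fewer models are scored, and a model outside the set, such as "sh" here, can no longer dominate the denominator.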
Mispronunciation prior knowledge can be obtained in two ways. One is to use directly the basic mispronunciation types summarized by linguists; the other is to derive them statistically from a large amount of speech data using data-driven techniques. The present invention combines the two methods to obtain the final mispronunciation prior knowledge. Part of the mispronunciation prior knowledge used is shown in Table 1, which lists the mispronunciation prior knowledge for confusable phoneme pairs.
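Table 1. Mispronunciation prior knowledge for confusable phoneme pairs (table images GSB00000431872400071 and GSB00000431872400081; contents not reproduced in this text)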
The above gives the corrected posterior probability for a single phoneme. A mapping can then convert the posterior probability value to the scoring scale the system requires, keeping it consistent with subjective assessment. The mapping may be linear or nonlinear: a linear mapping is simpler, while a nonlinear mapping fits reality more closely. After mapping, the system has the learner's score for that phoneme. For an overall evaluation of the learner, the posterior probabilities of the individual phonemes can be normalized over a word or over the whole utterance to give a word-level or utterance-level posterior probability, which is then mapped to obtain the overall evaluation. The normalization can be a simple average or a weighted average of all the phoneme posterior probabilities.
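A minimal sketch of the two kinds of mapping and of the averaging step follows. In the patent the mapping is derived from expert subjective scores; here the parameters (`p_min`, `p_max`, `slope`, `midpoint`) are arbitrary placeholders, the logistic curve is just one possible nonlinear choice, and all function names are hypothetical.

```python
import math

def linear_score(p, p_min=-5.0, p_max=0.0):
    """Map a log posterior linearly onto 0..100, clipping outside the range."""
    frac = (p - p_min) / (p_max - p_min)
    return 100.0 * min(1.0, max(0.0, frac))

def sigmoid_score(p, slope=2.0, midpoint=-2.0):
    """Map a log posterior onto 0..100 with a logistic curve."""
    return 100.0 / (1.0 + math.exp(-slope * (p - midpoint)))

def utterance_posterior(posteriors, weights=None):
    """Plain or weighted mean of phoneme posteriors for a word or utterance."""
    if weights is None:
        weights = [1.0] * len(posteriors)
    return sum(w * p for w, p in zip(weights, posteriors)) / sum(weights)
```

An overall score would then be, for example, `sigmoid_score(utterance_posterior(values))` over the phonemes of the word or utterance.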
2) Per-phoneme threshold strategy
The evaluation method above yields pronunciation scores for phonemes, words and whole utterances: a phoneme with a higher posterior probability receives a higher score, and a phoneme with a lower posterior probability receives a lower score. For phonemes with low scores, a threshold must be set on the posterior probability, and phonemes that fall below it are passed to the next unit for processing so that finer-grained pronunciation diagnostic information can be provided. Traditional methods use a single uniform threshold for error detection. However, the posterior probability distributions of the individual phoneme models differ, and the difference becomes more pronounced once the corrected posterior probability is used. The present invention therefore adopts a per-phoneme threshold strategy in which different phonemes use different thresholds. The thresholds are obtained from training speech.
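The patent states only that thresholds are obtained from training speech. As one hedged possibility, a threshold can be chosen per phoneme to minimize the number of misclassified labelled training examples. `train_threshold` and `flag_mispronounced` are hypothetical names, and the selection criterion is an assumption.

```python
def train_threshold(correct, wrong):
    """Pick the threshold with the fewest errors on labelled posteriors.

    correct / wrong: posterior values of correctly pronounced and
    mispronounced training samples of one phoneme. Values below the
    threshold are flagged as mispronounced.
    """
    candidates = sorted(set(correct) | set(wrong))
    best_thr, best_err = candidates[0], len(correct) + len(wrong) + 1
    for thr in candidates:
        err = sum(p < thr for p in correct) + sum(p >= thr for p in wrong)
        if err < best_err:
            best_thr, best_err = thr, err
    return best_thr

def flag_mispronounced(posteriors, thresholds):
    """Return indices of (phoneme, value) pairs below that phoneme's threshold."""
    return [i for i, (q, p) in enumerate(posteriors) if p < thresholds[q]]
```

Because each phoneme has its own entry in `thresholds`, the same posterior value can pass for one phoneme and be flagged for another.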
3. Pronunciation evaluation confirmation and diagnosis unit
The role of the pronunciation evaluation confirmation and diagnosis unit is to confirm the result of pronunciation evaluation and to diagnose mispronunciations. It takes the phoneme information supplied by the pronunciation evaluation unit (including the preliminary evaluation result), confirms the evaluation result with a method based on discriminative features and classifiers that draws on prior knowledge of the discriminative features of confusable phoneme pairs, and provides pronunciation diagnostic information from the standpoint of acoustic phonetics.
1) Use of discriminative-feature prior knowledge
For a specific mispronunciation type, prior knowledge of discriminative features at the acoustic-phonetic level can distinguish correct from incorrect pronunciations and so support mispronunciation detection and diagnosis. On the one hand, this compensates for the weakness of the posterior probability feature on certain kinds of error, confirms the posterior-probability evaluation result and reduces false alarms. On the other hand, acoustic-phonetic discriminative features are closely tied to the mechanism of articulation and have clear physiological and physical properties, so they can give the learner finer and more detailed feedback and help the learner understand the defective sound.
In this method, discriminative features are first extracted according to the prior knowledge. The discriminative features used by the invention are shown in Table 2, which lists the discriminative-feature prior knowledge for confusable phoneme pairs.
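Table 2. Discriminative-feature prior knowledge for confusable phoneme pairs (table images GSB00000431872400091 and GSB00000431872400101; contents not reproduced in this text)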
Once the acoustic-phonetic discriminative features are available, two-class classifiers can be trained. One classifier is built per phoneme, trained on the acoustic-phonetic discriminative features extracted from correctly pronounced samples and mispronounced samples of that phoneme. Because different specific mispronunciations call for different kinds of discriminative features, the kind and dimensionality of the features differ from phoneme to phoneme.
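The patent does not name the classifier type. As a stand-in, the sketch below uses a nearest-centroid rule, which is about the simplest two-class classifier that can be trained from correct and mispronounced samples. `train_centroids` and `classify` are hypothetical names and the feature vectors are invented.

```python
def train_centroids(correct_samples, wrong_samples):
    """Train a two-class nearest-centroid model for one phoneme."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    return {"correct": mean(correct_samples), "wrong": mean(wrong_samples)}

def classify(model, x):
    """Return 'correct' or 'wrong', whichever centroid is closer to x."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda label: sqdist(model[label]))

# One model per phoneme; feature dimensionality may differ between phonemes.
classifiers = {
    "zh": train_centroids([[1.0, 1.0], [1.2, 0.8]], [[3.0, 3.0], [2.8, 3.2]]),
    "p": train_centroids([[90.0], [80.0]], [[15.0], [20.0]]),
}
```

Any other two-class learner could fill the same role; the per-phoneme dictionary is the part that mirrors the text.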
2) Pronunciation evaluation confirmation and diagnosis
Once the two-class classifiers have been built from the discriminative-feature prior knowledge of confusable phoneme pairs, the preliminary evaluation result output by the pronunciation evaluation unit can be confirmed. From the mispronunciation prior knowledge and the discriminative-feature prior knowledge, the system generates in advance a table that maps each phoneme to its discriminative features and classifier. The table shows which features and classifier apply to a given phoneme, and error detection is carried out accordingly. The error detection result serves, on the one hand, as confirmation of whether the preliminary evaluation result is correct; on the other hand, finer-grained diagnostic information can be derived from the discriminative features used for that phoneme.
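The lookup-then-classify step can be pictured as a dictionary keyed by phoneme. Everything in the table below is hypothetical: the use of voice onset time for "p", the 40 ms cut-off and the advice string are illustrative stand-ins, because the contents of Table 2 are not available in this text.

```python
# Hypothetical table: phoneme -> feature extractor, classifier, advice.
FEATURE_TABLE = {
    "p": {
        "extract": lambda seg: seg["vot_ms"],     # assumed precomputed measurement
        "classify": lambda value: value < 40.0,   # True means mispronounced
        "advice": "aspirate more strongly",
    },
}

def confirm(phoneme, segment, table=FEATURE_TABLE):
    """Confirm a preliminary error flag and attach a diagnosis if available."""
    entry = table.get(phoneme)
    if entry is None:
        # No prior knowledge for this phoneme: keep the preliminary judgement.
        return {"mispronounced": True, "diagnosis": None}
    wrong = entry["classify"](entry["extract"](segment))
    return {"mispronounced": wrong, "diagnosis": entry["advice"] if wrong else None}
```

The returned flag plays the confirmation role, and the advice string plays the diagnostic role described above.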
4. Model and prior knowledge base unit
The model and prior knowledge base unit consists of the models and the prior knowledge base. It stores the models for phoneme alignment and posterior probability calculation, together with the prior knowledge base. The models comprise the standard phoneme models and the graded scoring model. The phoneme models are generally HMMs and are used for phoneme alignment and for calculating the posterior probability. The graded scoring model is a mapping model derived from experts' subjective scores and the objective posterior probability values, and is used to convert a posterior probability value into a score or grade that measures pronunciation quality. The prior knowledge base is the collection of prior knowledge; it supplies the two kinds of prior knowledge described above to the pronunciation evaluation unit and to the pronunciation evaluation confirmation and diagnosis unit, and provides the table that maps phonemes to prior knowledge. The prior knowledge is obtained by the system in advance, either from a large amount of speech data by data-driven techniques or directly from knowledge summarized by phoneticians and linguists.
5. Evaluation information and diagnosis information output unit
The evaluation information and diagnosis information output unit outputs the pronunciation evaluation score (including the graded scoring result), the location of each mispronunciation and the mispronunciation type, and gives corrective guidance. Its output takes varied forms, combining figures, tables, text and speech in a user-friendly interface.
With reference to Fig. 1, the dashed box on the left is the model and prior knowledge base unit, which, as noted above, is obtained by the system in advance. On the right is the evaluation information and diagnosis information output unit, which outputs the system's final result. The middle part contains the three processing units: the speech preprocessing unit, the pronunciation evaluation unit, and the pronunciation evaluation confirmation and diagnosis unit. The units interact as follows:
The system first preprocesses the input speech. Through endpoint detection, feature extraction, content verification and phoneme alignment, the learner's speech is divided into small phoneme-level units and passed to the pronunciation evaluation unit. The pronunciation evaluation unit then computes the corrected posterior probability of each phoneme according to the mispronunciation prior knowledge of confusable phoneme pairs. The computed posterior probability value is, on the one hand, converted by the graded scoring model into an intuitive score or grade and, on the other hand, compared with the threshold for the corresponding phoneme. When the posterior probability is below that threshold, the phoneme is preliminarily judged to be mispronounced. The thresholds are set in advance according to the required system performance. The information for phonemes preliminarily judged to be wrong is then passed to the confirmation and diagnosis unit for further processing. That unit first extracts the acoustic-phonetic discriminative features for the phoneme according to the discriminative-feature prior knowledge, then classifies them and reports whether the phoneme is wrong, where the error lies, and the corresponding corrective advice. Finally, the score or grade output by the pronunciation evaluation unit and the other information output by the confirmation and diagnosis unit are merged in the output unit on the right to give the system's final result. The merging principle is that the information output by the confirmation and diagnosis unit corrects the output of the pronunciation evaluation unit, so as to reduce the system's false-alarm rate for mispronunciations.
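The merging principle can be sketched as a small function in which the confirmation result overrides a preliminary error flag. The record layout and the name `fuse` are assumptions made for illustration; the patent does not specify data structures.

```python
def fuse(preliminary, confirmations):
    """Merge evaluation output with confirmation output.

    preliminary: list of dicts with 'phoneme', 'score' and 'flagged'.
    confirmations: dict mapping a phoneme position to a dict with
    'mispronounced' (bool) and 'diagnosis' (str or None).
    """
    report = []
    for i, item in enumerate(preliminary):
        conf = confirmations.get(i)
        # A flagged phoneme stays flagged only if the classifier agrees,
        # or if no classifier result exists for it.
        wrong = item["flagged"] and (conf is None or conf["mispronounced"])
        report.append({
            "position": i,
            "phoneme": item["phoneme"],
            "score": item["score"],
            "mispronounced": wrong,
            "diagnosis": conf["diagnosis"] if (wrong and conf) else None,
        })
    return report
```

A phoneme flagged by the threshold test but cleared by its classifier is reported as correct, which is how the confirmation step lowers the false-alarm rate.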
The specific embodiments described above further explain the objects, technical solutions and beneficial effects of the invention. It should be understood that they are merely specific embodiments of the invention and do not limit it. Any modification, equivalent substitution or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (11)

1. A pronunciation evaluation and diagnosis system based on prior knowledge, characterized in that the system comprises:
A speech preprocessing unit, configured to preprocess the raw speech input by the learner so as to verify the speech content, to divide speech whose content substantially conforms to the standard script into phoneme-level units, and to pass those units to the pronunciation evaluation unit for judgment;
A pronunciation evaluation unit, configured to perform a preliminary pronunciation quality evaluation on the input speech, wherein the traditional posterior probability is corrected using prior knowledge of the mispronunciations of confusable phoneme pairs, pronunciation evaluation is performed on the basis of the corrected posterior probability, and the computed posterior probability can be converted by a mapping model into an intuitive score or grade measuring pronunciation level;
A pronunciation evaluation confirmation and diagnosis unit, configured to take the preliminary evaluation result from the pronunciation evaluation unit, to confirm that result by a method based on distinctive features and classifiers using prior knowledge of the distinctive features of confusable phoneme pairs, and to provide pronunciation diagnosis information from an acoustic-phonetic perspective;
A model and prior knowledge base unit, configured to store the models used for phoneme alignment and for computing the posterior probability, together with the prior knowledge base; and
An evaluation information and diagnosis information output unit, configured to output the pronunciation evaluation score, including the graded scoring result, the location of each mispronunciation, and the mispronunciation type, and to provide corrective guidance.
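The claim does not give the formula for the corrected posterior probability, so the following is only one plausible reading, stated as an assumption: the target phoneme's likelihood is normalized against the target itself plus its confusable partners, each partner weighted by a prior for that substitution. The function name, the weighting scheme, and the toy values are all hypothetical.

```python
import math
from typing import Dict


def corrected_log_posterior(target: str,
                            log_likelihoods: Dict[str, float],
                            confusion_prior: Dict[str, float]) -> float:
    """Log posterior of `target` against itself and its confusable partners.

    log_likelihoods: phoneme -> log p(O | phoneme) on the aligned segment.
    confusion_prior: partner phoneme -> prior weight that a learner produces
                     it in place of `target` (assumed form of the prior knowledge).
    """
    # Competitor set: the target with weight 1 plus each weighted partner.
    weights = {target: 1.0, **confusion_prior}
    terms = [math.log(w) + log_likelihoods[p] for p, w in weights.items() if w > 0]
    # Log-sum-exp for numerical stability.
    m = max(terms)
    log_denominator = m + math.log(sum(math.exp(t - m) for t in terms))
    return log_likelihoods[target] - log_denominator
```

Phonemes that are acoustically unlike the target and are not listed as confusable partners do not enter the denominator in this sketch, so they cannot lower the score.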
2. The pronunciation evaluation and diagnosis system based on prior knowledge according to claim 1, characterized in that the speech preprocessing unit comprises:
An endpoint detection subunit, configured to distinguish speech from non-speech in the signal and to determine the start point and end point of the speech;
A feature extraction subunit, configured to compute the acoustic parameters of the valid speech and to extract the key feature parameters that reflect the characteristics of the signal;
A content checking subunit, configured to verify the content of the input speech: if the content of the input speech differs little from the given text, subsequent pronunciation evaluation and diagnosis are performed on the speech; if the content differs greatly from the given text, no further pronunciation evaluation or diagnosis is performed and the utterance is directly judged to be a pronunciation error by the user;
A phoneme alignment subunit, configured to divide the valid input speech into phoneme-level units for subsequent processing.
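The claim does not say how the difference between the spoken content and the given text is measured. As a hedged illustration only, the content check can be sketched as an edit-distance comparison between a recognized syllable sequence and the script; the functions `edit_distance` and `content_matches` and the 0.3 error-rate limit are assumptions, not part of the patent.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def content_matches(recognized: Sequence[str],
                    script: Sequence[str],
                    max_error_rate: float = 0.3) -> bool:
    """True when the recognized content is close enough to the script
    for pronunciation evaluation to proceed (hypothetical criterion)."""
    if not script:
        return not recognized
    return edit_distance(recognized, script) / len(script) <= max_error_rate
```

An utterance that fails this check would skip evaluation and be reported directly as an error, as the content checking subunit describes.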
3. The pronunciation evaluation and diagnosis system based on prior knowledge according to claim 2, characterized in that the key feature parameters reflecting the characteristics of the signal are Mel-frequency cepstral coefficients (MFCC), which reflect the characteristics of human hearing, comprising static features consisting of 12 cepstral values plus 1 energy value, together with the first-order and second-order dynamic features of those static features.
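With 13 static values per frame plus their first-order and second-order dynamic features, each frame carries 39 values. A minimal sketch of appending the dynamic features is given below; the regression formula and the window of 2 frames are common conventions assumed here, since the claim does not specify how the dynamic features are computed, and `add_deltas` is a hypothetical name.

```python
import numpy as np


def add_deltas(static: np.ndarray, window: int = 2) -> np.ndarray:
    """Append first- and second-order regression deltas to static features.

    static: array of shape (frames, 13), i.e. 12 cepstra plus 1 energy value.
    Returns an array of shape (frames, 39).
    """
    def delta(feat: np.ndarray) -> np.ndarray:
        n = len(feat)
        # Repeat the edge frames so every frame has a full regression window.
        padded = np.pad(feat, ((window, window), (0, 0)), mode="edge")
        numerator = sum(
            k * (padded[window + k: n + window + k] - padded[window - k: n + window - k])
            for k in range(1, window + 1)
        )
        return numerator / (2 * sum(k * k for k in range(1, window + 1)))

    first = delta(static)
    second = delta(first)
    return np.hstack([static, first, second])
```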
4. The pronunciation evaluation and diagnosis system based on prior knowledge according to claim 2, characterized in that the phoneme alignment subunit uses the Viterbi algorithm to divide the valid input speech into phoneme-level units, thereby aligning the phonemes.
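A simplified sketch of Viterbi forced alignment follows. It assumes one state per script phoneme, no transition probabilities, and a precomputed frame-by-phoneme log-likelihood matrix; a real HMM aligner would use several states per phoneme. The name `force_align` and the matrix layout are hypothetical.

```python
import numpy as np


def force_align(loglik: np.ndarray):
    """Align frames to script phonemes in order, each phoneme getting >= 1 frame.

    loglik[t, j]: log-likelihood of frame t under the j-th phoneme of the script.
    Returns a list of (start, end) frame ranges, end exclusive, one per phoneme.
    """
    T, N = loglik.shape
    if T < N:
        raise ValueError("need at least one frame per phoneme")
    score = np.full((T, N), -np.inf)
    back = np.zeros((T, N), dtype=int)  # 0 = stayed in phoneme, 1 = advanced
    score[0, 0] = loglik[0, 0]
    for t in range(1, T):
        for j in range(min(t + 1, N)):
            stay = score[t - 1, j]
            move = score[t - 1, j - 1] if j > 0 else -np.inf
            back[t, j] = int(move > stay)
            score[t, j] = max(stay, move) + loglik[t, j]
    # Trace back from the last phoneme at the last frame.
    bounds, j, end = [], N - 1, T
    for t in range(T - 1, 0, -1):
        if back[t, j]:
            bounds.append((t, end))  # phoneme j started at frame t
            end, j = t, j - 1
    bounds.append((0, end))
    return bounds[::-1]
```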
5. The pronunciation evaluation and diagnosis system based on prior knowledge according to claim 1, characterized in that the pronunciation evaluation unit further adopts a per-phoneme threshold strategy in which different phonemes use different thresholds; a phoneme whose posterior probability is lower than its threshold is preliminarily judged to be mispronounced and is passed to the pronunciation evaluation confirmation and diagnosis unit for confirmation and error diagnosis.
6. The pronunciation evaluation and diagnosis system based on prior knowledge according to claim 1, characterized in that, when using the prior knowledge of the distinctive features of confusable phoneme pairs, the pronunciation evaluation confirmation and diagnosis unit uses, for each specific mispronunciation type, prior knowledge of acoustic-phonetic distinctive features to discriminate correct pronunciations from incorrect ones, thereby performing mispronunciation detection and diagnosis; specifically, the distinctive features are first extracted according to the prior knowledge, and, once the acoustic-phonetic distinctive features have been obtained, two-class classifiers are trained; one classifier is built for each phoneme, trained on the acoustic-phonetic distinctive features extracted from correctly pronounced samples and mispronounced samples of that phoneme; because different specific mispronunciations involve different kinds of distinctive features, the kind and dimensionality of the distinctive features used differ from phoneme to phoneme.
7. The pronunciation evaluation and diagnosis system based on prior knowledge according to claim 6, characterized in that, after building the two-class classifiers using the distinctive-feature prior knowledge of confusable phoneme pairs, the pronunciation evaluation confirmation and diagnosis unit confirms the preliminary evaluation result output by the pronunciation evaluation unit; a correspondence table relating each phoneme to its distinctive features and classifier is generated in advance from the mispronunciation prior knowledge and the distinctive-feature prior knowledge; the table is consulted to find which features and classifier correspond to a given phoneme, and error detection is performed with them; the result of the error detection serves, on the one hand, as confirmation of whether the preliminary evaluation result is correct and, on the other hand, yields finer-grained diagnosis information according to the distinctive features used for that phoneme.
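The correspondence table of claims 6 and 7 can be pictured as a lookup from phoneme to a feature extractor and a two-class classifier. The sketch below is a toy under stated assumptions: the `Detector` type, the `mean_energy` feature, the decision rule, and the advice string are placeholders invented for illustration, not the features or classifiers used in the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass
class Detector:
    feature_name: str                            # which distinctive feature is used
    extract: Callable[[Sequence[float]], float]  # phoneme segment -> feature value
    is_error: Callable[[float], bool]            # two-class decision on that value
    advice: str                                  # correction hint shown to the learner


def mean_energy(segment: Sequence[float]) -> float:
    """Placeholder feature: mean squared amplitude of the segment."""
    return sum(x * x for x in segment) / len(segment)


# Hypothetical table: one entry per phoneme that has prior knowledge attached.
TABLE: Dict[str, Detector] = {
    "zh": Detector("mean_energy", mean_energy, lambda v: v < 0.25,
                   "placeholder correction hint for zh"),
}


def confirm(phoneme: str, segment: Sequence[float],
            table: Dict[str, Detector] = TABLE) -> Optional[dict]:
    """Look up the phoneme's detector and run it; None if no entry exists."""
    det = table.get(phoneme)
    if det is None:
        return None
    error = det.is_error(det.extract(segment))
    return {"phoneme": phoneme, "error": error, "feature": det.feature_name,
            "advice": det.advice if error else ""}
```

Because each entry names the feature it relies on, a positive detection also tells the learner which aspect of the pronunciation was wrong, which is the finer-grained diagnosis the claim refers to.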
8. The pronunciation evaluation and diagnosis system based on prior knowledge according to claim 1, characterized in that the model and prior knowledge base unit consists of models and a prior knowledge base, wherein the models comprise a standard phoneme model and a graded scoring model, and the prior knowledge base is a collection of prior knowledge that supplies the two aforementioned kinds of prior knowledge to the pronunciation evaluation unit and to the pronunciation evaluation confirmation and diagnosis unit, and provides the correspondence table mapping phonemes to prior knowledge.
9. The pronunciation evaluation and diagnosis system based on prior knowledge according to claim 8, characterized in that the standard phoneme model is a hidden Markov model (HMM), used for phoneme alignment and for computing the posterior probability.
10. The pronunciation evaluation and diagnosis system based on prior knowledge according to claim 8, characterized in that the graded scoring model is a mapping model obtained from experts' subjective scores and objective posterior probability values, and is used to convert a posterior probability value into a score or grade measuring pronunciation quality.
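The claim does not state the form of the mapping model. As an assumption for illustration, the sketch below fits a straight line from posterior values to expert scores by least squares and clips the result to a score range; `fit_score_mapping` and the 0 to 100 range are hypothetical.

```python
import numpy as np


def fit_score_mapping(posteriors, expert_scores):
    """Fit score = slope * posterior + intercept on paired training data.

    Returns a function mapping a posterior value to a clipped score.
    """
    slope, intercept = np.polyfit(posteriors, expert_scores, deg=1)

    def to_score(p: float, lo: float = 0.0, hi: float = 100.0) -> float:
        return float(np.clip(slope * p + intercept, lo, hi))

    return to_score
```

A grade could then be obtained by binning the returned score; the bin edges would also have to come from the expert-labelled data.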
11. The pronunciation evaluation and diagnosis system based on prior knowledge according to claim 8, characterized in that the prior knowledge is acquired by the system in advance, either derived from a large amount of speech data by data-driven techniques or taken directly from knowledge summarized by phoneticians and linguists.
CN2008102266752A 2008-11-19 2008-11-19 Priori knowledge-based pronunciation evaluation and diagnosis system Active CN101739869B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008102266752A CN101739869B (en) 2008-11-19 2008-11-19 Priori knowledge-based pronunciation evaluation and diagnosis system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008102266752A CN101739869B (en) 2008-11-19 2008-11-19 Priori knowledge-based pronunciation evaluation and diagnosis system

Publications (2)

Publication Number Publication Date
CN101739869A CN101739869A (en) 2010-06-16
CN101739869B true CN101739869B (en) 2012-03-28

Family

ID=42463294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008102266752A Active CN101739869B (en) 2008-11-19 2008-11-19 Priori knowledge-based pronunciation evaluation and diagnosis system

Country Status (1)

Country Link
CN (1) CN101739869B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1316083A (en) * 1999-05-13 2001-10-03 奥迪纳特公司 Automated language assessment using speech recognition modeling
JP2006084966A (en) * 2004-09-17 2006-03-30 Advanced Telecommunication Research Institute International Automatic evaluating device of uttered voice and computer program
JP2008040035A (en) * 2006-08-04 2008-02-21 Advanced Telecommunication Research Institute International Pronunciation evaluation apparatus and program
JP2008262120A (en) * 2007-04-13 2008-10-30 Nippon Hoso Kyokai <Nhk> Utterance evaluation device and program
CN101105939A (en) * 2007-09-04 2008-01-16 安徽科大讯飞信息科技股份有限公司 Sonification guiding method
CN101246685A (en) * 2008-03-17 2008-08-20 清华大学 Pronunciation quality evaluation method of computer auxiliary language learning system

Also Published As

Publication number Publication date
CN101739869A (en) 2010-06-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: ANHUI USTC IFLYTEK CO., LTD.

Free format text: FORMER OWNER: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

Effective date: 20120831

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 100080 HAIDIAN, BEIJING TO: 230088 HEFEI, ANHUI PROVINCE

TR01 Transfer of patent right

Effective date of registration: 20120831

Address after: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Patentee after: Anhui USTC iFLYTEK Co., Ltd.

Address before: 100080 Zhongguancun East Road, Beijing, No. 95, No.

Patentee before: Institute of Automation, Chinese Academy of Sciences

C56 Change in the name or address of the patentee

Owner name: IFLYTEK CO., LTD.

Free format text: FORMER NAME: ANHUI USTC IFLYTEK CO., LTD.

CP01 Change in the name or title of a patent holder

Address after: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Patentee after: Iflytek Co., Ltd.

Address before: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Patentee before: Anhui USTC iFLYTEK Co., Ltd.