CN101383103A - Spoken language pronunciation level automatic test method - Google Patents
- Publication number
- CN101383103A (application CN200810168514A)
- Authority
- CN
- China
- Prior art keywords
- segment
- pronunciation
- recording
- model
- standard
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Electrically Operated Instructional Devices (AREA)
Abstract
The invention relates to an automatic test method for spoken pronunciation level, comprising the following steps: step 1, establishing a standard-speaker corpus and associating the saved recording file names with the corresponding test texts; step 2, using the standard-speaker corpus and the associated test texts to train a standard segment model of standard speech; step 3, recording the testee's pronunciation, establishing a Putonghua speech database, saving the recorded test questions, and associating the recording file names with the recorded test questions; step 4, annotating pronunciation errors and the correct initials, finals, and tones; step 5, computing the evaluation parameters of the speech under test to obtain an evaluation result. The invention uses a computer to carry out Putonghua proficiency testing, instruction, and learning, and uses intelligent computer speech processing to evaluate a learner's speech automatically and accurately.
Description
The present application is a divisional of application No. 200610038588.5, filed on February 28, 2006 and entitled "Method for testing Mandarin level and guiding learning using computer". Throughout this document, PSC denotes the Putonghua Proficiency Test.
Technical field
The present invention relates to the field of computer speech signal processing.
Background art
The Putonghua Proficiency Test (PSC) is an important means of promoting Putonghua and an important measure for putting that work on a scientific, standardized, and institutionalized footing. The Law of the People's Republic of China on the Standard Spoken and Written Chinese Language, passed on October 31, 2000 at the 18th meeting of the Standing Committee of the National People's Congress, stipulates that broadcasters, program hosts, film and television actors, teachers, and state employees who use Putonghua as their working language must take the PSC and reach the grade standard set by the state.
The PSC is at present scored entirely by hand. Each test room generally needs 2 to 3 testers, who score the examinee's speech. One test room can test only 30 people in a day, so a PSC session for several thousand people often requires organizing hundreds of testers at short notice. This is time-consuming and laborious, costly, and inefficient. Moreover, because scoring is entirely manual, it is strongly subjective, which raises fairness concerns about the test results to some extent.
Therefore, researching how to apply advanced information technology to the PSC, building on modern computer technology, so as to replace human evaluators wholly or in part and remedy the deficiencies of the traditional PSC method, is significant both for saving manpower and material costs and for improving the fairness and efficiency of the test.
Systematic research on computer-based PSC places high demands on basic resource corpora and on interdisciplinary cooperation, and is therefore quite difficult. For lack of systematic guidance and broad cooperation, computer-based PSC long failed to attract researchers' attention.
Summary of the invention
To address the deficiencies of the prior art, the present invention provides a method of using a computer for the PSC and for guided learning, applying intelligent computer speech processing to evaluate a learner's pronunciation accurately and automatically.
The present invention is achieved by the following technical solution:
An automatic test method for spoken pronunciation level, characterized in that it comprises the following steps:
(1) establishing a phoneme-balanced standard-speaker corpus from recordings of standard speakers' pronunciation, and associating each saved recording file name with its corresponding test text;
(2) using the standard-speaker corpus and its associated text to train a standard segment model of standard speech, the training steps of the standard segment model being:
(a) dividing the speech into frames (frame length: 250 ms, frame shift: 10 ms) and computing Mel-frequency cepstral coefficient (MFCC) parameters frame by frame;
(b) training an MFCC-based hidden Markov model (HMM) for each kind of segment;
using the standard-speaker corpus and its associated text to train a standard tone model of standard speech, the training steps of the standard tone model being:
(a) computing fundamental frequency (F0) parameters frame by frame;
(b) training an F0-based hidden Markov model for each tone;
(3) recording the testee's pronunciation to establish a Putonghua speech database, saving the recorded test questions, and associating each recording file name with its recorded test questions;
(4) annotating mispronunciations and identifying the correct initial, final, and tone;
(5) computing, frame by frame, the MFCC parameters, F0 parameters, and formants of the recorded test questions; using the standard segment model to segment the MFCC parameters of the recording into segments according to the recording's text, obtaining at the same time the segmentation likelihood of each segment relative to the standard model; using the standard tone model to segment the F0 parameters of the recording into segments according to the recording's text, obtaining at the same time the segmentation likelihood of each tone relative to the standard model; computing, from the segmentation, segment-level evaluation parameters such as the duration of each segment, the initial-to-final duration ratio, and the formant sequence of each segment; performing segment recognition and tone recognition on the recording with the standard segment model and the tone model to obtain the recognition results and the corresponding segment-recognition and tone-recognition likelihoods; dividing the segmentation likelihoods of the segments and tones by the recognition likelihoods to obtain the likelihood ratios of the segments and tones; and normalizing the segment durations, the initial-to-final duration ratio, the segment likelihoods, the segment likelihood ratios, the tone likelihoods, and the tone likelihood ratios to obtain the evaluation result for the testee.
The evaluation performance of the computer test system is judged, following the requirements of the PSC syllabus, by the mean absolute score error and the mean grade agreement rate between the computer's results and the results of three human testers. Because human testers also disagree with one another, the mean absolute score error and the mean grade agreement rate among the three human testers serve as the baseline for comparing machine and human performance. According to statistics on the tester scoring database, the mean pairwise absolute score error among the three testers ranges from 1.5 to 2.5 points across data sets, and the grade agreement rate ranges from 0.8 to 0.85. The mean error of the total score given by the computer test system is within 2 points, and its grade agreement rate with the testers exceeds 0.8, so the computer test essentially reaches a level close to that of human testing. The guided-learning part can not only take over some of a language teacher's work, such as correcting errors, giving model readings, and setting targeted exercises, but can also show the physical parameters of the learner's pronunciation more vividly, such as the waveform and spectrum, and compare them with the standard waveform and spectrum.
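The two comparison measures described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the grade labels, and the plain averaging over rater pairs are assumptions made for the example.

```python
from itertools import combinations


def mean_absolute_error(scores_a, scores_b):
    """Mean absolute difference between two lists of total scores."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(scores_a, scores_b)) / len(scores_a)


def grade_agreement_rate(grades_a, grades_b):
    """Fraction of examinees who receive the same grade from both raters."""
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("grade lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(grades_a, grades_b)) / len(grades_a)


def pairwise_mean(metric, raters):
    """Average a metric over every pair of raters (the human baseline)."""
    pairs = list(combinations(raters, 2))
    return sum(metric(a, b) for a, b in pairs) / len(pairs)
```

Applying `pairwise_mean` to the three human testers gives the baseline, and applying the same metric between the computer and each tester gives the figure to compare against it.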
Embodiment
The specific implementation steps are as follows:
1. Establishing the standard-speaker corpus:
1) select a group of suitable standard speakers, balanced by sex and matching the age distribution of the population that takes and studies for the PSC;
2) design the recording text according to the phoneme-balance principle required by the PSC syllabus;
3) arrange for the standard speakers to make the recordings, and associate each saved recording file name with its corresponding text.
2. Establishing the standard speech models, comprising the segment model and the tone model.
Using the standard-speaker corpus and its associated text, train the segment model of standard speech. The model may be a phoneme or syllable model, or a context-dependent phoneme or syllable model. The training steps are:
1) divide the speech into frames (frame length: 250 ms, frame shift: 10 ms) and compute Mel-frequency cepstral coefficient (MFCC) parameters frame by frame;
2) train an MFCC-based hidden Markov model (HMM) for each kind of segment.
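The framing in step 1) can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, the defaults simply repeat the frame length and frame shift given in the text (many speech front ends use frames of roughly 25 ms, so both values are left as parameters), and the mel mapping shown is one commonly used formula rather than one named by the source. The full MFCC computation and the HMM training are not shown.

```python
import math


def frame_signal(samples, sample_rate, frame_ms=250.0, shift_ms=10.0):
    """Split a sample sequence into overlapping frames.

    Only complete frames are returned; a trailing partial frame is dropped.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    if frame_len <= 0 or shift <= 0:
        raise ValueError("frame length and shift must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += shift
    return frames


def hz_to_mel(freq_hz):
    """Map a frequency in Hz to the mel scale (a commonly used variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)
```

Each returned frame would then be passed to an MFCC routine, and the per-frame MFCC vectors become the observation sequence for the segment HMMs.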
Using the standard-speaker corpus and its associated text, train the tone model of standard speech. The model may be a simple model of the four Putonghua tones, or a tone model that depends on the neighboring tones and on the final. The training steps are:
1) compute fundamental frequency (F0) parameters frame by frame;
2) train an F0-based hidden Markov model for each tone.
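The source does not say how F0 is extracted. As one possible illustration of step 1), the sketch below estimates F0 for a single frame by picking the autocorrelation peak; the technique, the function name, and the 60 to 500 Hz search range are assumptions chosen for the example, not details from the patent.

```python
def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=500.0):
    """Estimate F0 in Hz for one frame from its autocorrelation peak.

    Returns 0.0 when no positive autocorrelation peak is found, which this
    sketch treats as an unvoiced or silent frame.
    """
    min_lag = max(1, int(sample_rate / f0_max))
    max_lag = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0
```

Running this over every frame of an utterance yields the F0 contour that would serve as the observation sequence for the tone HMMs.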
3. Collecting the general Putonghua corpus: at the PSC test site, record each examinee's test pronunciation, save the recorded test questions, and associate each recording file name with its recorded test questions.
4. Annotating the general Putonghua corpus: score each live Putonghua recording independently; annotate in detail whether the initial, final, and tone of each character in the recording are pronounced correctly; and for each incorrect initial, final, or tone, identify the corresponding initial, final, or tone.
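One way to hold such an annotation is sketched below. The record layout is hypothetical: it assumes each character stores both the target initial, final, and tone taken from the test text and the values the human annotator marked, which is an interpretation of the annotation step rather than a format given in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyllableAnnotation:
    """Annotation of one character in a recording (hypothetical layout)."""
    character: str
    expected: tuple  # (initial, final, tone) taken from the test text
    marked: tuple    # (initial, final, tone) as marked by the annotator

    def errors(self):
        """Return which of initial, final, and tone were marked as wrong."""
        names = ("initial", "final", "tone")
        return [
            name
            for name, target, heard in zip(names, self.expected, self.marked)
            if target != heard
        ]
```

For example, a character whose target is `("sh", "i", 4)` but which was marked as `("s", "i", 4)` reports an error in the initial only.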
5. Computing the evaluation parameters of the speech under test:
1) compute the MFCC parameters, F0 parameters, and formants of the speech frame by frame;
2) use the standard segment model to segment the MFCC parameters of the speech into segments according to the text corresponding to the speech, obtaining at the same time the segmentation likelihood of each segment relative to the standard model;
3) use the standard tone model to segment the F0 parameters of the speech into segments according to the text corresponding to the speech, obtaining at the same time the segmentation likelihood of each tone relative to the standard model;
4) compute, from the segmentation, segment-level evaluation parameters such as the duration of each segment, the initial-to-final duration ratio, and the formant sequence of each segment;
5) perform segment recognition and tone recognition on the speech with the standard segment model and the tone model, obtaining the recognition results and the corresponding segment-recognition and tone-recognition likelihoods;
6) divide the segmentation likelihoods of the segments and tones by the recognition likelihoods to obtain the likelihood ratios of the segments and tones;
7) normalize the segment durations, the initial-to-final duration ratio, the segment likelihoods, the segment likelihood ratios, the tone likelihoods, and the tone likelihood ratios to obtain, for a given examinee, a group of evaluation parameters or a single set of them. Normalization may be carried out separately for each test component (reading single characters, reading words, reading a short passage, and speaking on an assigned topic), which yields a group, or over all pronunciations together, which yields a single set.
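Steps 6) and 7) can be sketched as follows. The source does not define the normalization, so this example assumes a plain mean over segments; the record keys, the parameter subset, and the function names are likewise hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical subset of the per-segment evaluation parameters.
PARAMETERS = ("duration", "segment_likelihood_ratio", "tone_likelihood_ratio")


def likelihood_ratio(segmentation_likelihood, recognition_likelihood):
    """Segmentation likelihood divided by recognition likelihood (step 6)."""
    if recognition_likelihood <= 0.0:
        raise ValueError("recognition likelihood must be positive")
    return segmentation_likelihood / recognition_likelihood


def normalize(records, by_section=True):
    """Average each parameter per test component, or over all segments.

    Assumption: "normalization" is taken to be a simple mean.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record["section"] if by_section else "all"].append(record)
    return {
        key: {name: mean(r[name] for r in group) for name in PARAMETERS}
        for key, group in groups.items()
    }
```

If the models report log-likelihoods instead of likelihoods, the division in step 6) becomes a subtraction of the two log values.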
Claims (1)
1. An automatic test method for spoken pronunciation level, characterized in that it comprises the following steps:
(1) establishing a phoneme-balanced standard-speaker corpus from recordings of standard speakers' pronunciation, and associating each saved recording file name with its corresponding test text;
(2) using the standard-speaker corpus and its associated text to train a standard segment model of standard speech, the training steps of the standard segment model being:
(a) dividing the speech into frames (frame length: 250 ms, frame shift: 10 ms) and computing Mel-frequency cepstral coefficient (MFCC) parameters frame by frame;
(b) training an MFCC-based hidden Markov model (HMM) for each kind of segment;
using the standard-speaker corpus and its associated text to train a standard tone model of standard speech, the training steps of the standard tone model being:
(a) computing fundamental frequency (F0) parameters frame by frame;
(b) training an F0-based hidden Markov model for each tone;
(3) recording the testee's pronunciation to establish a Putonghua speech database, saving the recorded test questions, and associating each recording file name with its recorded test questions;
(4) annotating mispronunciations and identifying the correct initial, final, and tone;
(5) computing, frame by frame, the MFCC parameters, F0 parameters, and formants of the recorded test questions; using the standard segment model to segment the MFCC parameters of the recording into segments according to the text corresponding to the speech, obtaining at the same time the segmentation likelihood of each segment relative to the standard model; using the standard tone model to segment the F0 parameters of the recording into segments according to the recording's text, obtaining at the same time the segmentation likelihood of each tone relative to the standard model; computing, from the segmentation, segment-level evaluation parameters such as the duration of each segment, the initial-to-final duration ratio, and the formant sequence of each segment; performing segment recognition and tone recognition on the recording with the standard segment model and the tone model to obtain the recognition results and the corresponding segment-recognition and tone-recognition likelihoods; dividing the segmentation likelihoods of the segments and tones by the recognition likelihoods to obtain the likelihood ratios of the segments and tones; and normalizing the segment durations, the initial-to-final duration ratio, the segment likelihoods, the segment likelihood ratios, the tone likelihoods, and the tone likelihood ratios to obtain the evaluation result for the testee.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNA2008101685142A CN101383103A (en) | 2006-02-28 | 2006-02-28 | Spoken language pronunciation level automatic test method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNA2008101685142A CN101383103A (en) | 2006-02-28 | 2006-02-28 | Spoken language pronunciation level automatic test method |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 200610038588 Division CN1815522A (en) | 2006-02-28 | 2006-02-28 | Method for testing mandarin level and guiding learning using computer |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101383103A true CN101383103A (en) | 2009-03-11 |
Family
ID=40462922
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNA2008101685142A Pending CN101383103A (en) | 2006-02-28 | 2006-02-28 | Spoken language pronunciation level automatic test method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101383103A (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2337006A1 (en) | 2009-11-24 | 2011-06-22 | Kai Yu | Speech processing and learning |
CN102163428A (en) * | 2011-01-19 | 2011-08-24 | 无敌科技(西安)有限公司 | Method for judging Chinese pronunciation |
CN102568475A (en) * | 2011-12-31 | 2012-07-11 | 安徽科大讯飞信息科技股份有限公司 | System and method for assessing proficiency in Putonghua |
CN102568475B (en) * | 2011-12-31 | 2014-11-26 | 安徽科大讯飞信息科技股份有限公司 | System and method for assessing proficiency in Putonghua |
CN103366759A (en) * | 2012-03-29 | 2013-10-23 | 北京中传天籁数字技术有限公司 | Speech data evaluation method and speech data evaluation device |
CN105609114A (en) * | 2014-11-25 | 2016-05-25 | 科大讯飞股份有限公司 | Method and device for detecting pronunciation |
CN105609114B (en) * | 2014-11-25 | 2019-11-15 | 科大讯飞股份有限公司 | A kind of pronunciation detection method and device |
CN104505089A (en) * | 2014-12-17 | 2015-04-08 | 福建网龙计算机网络信息技术有限公司 | Method and equipment for oral error correction |
CN104575490A (en) * | 2014-12-30 | 2015-04-29 | 苏州驰声信息科技有限公司 | Spoken language pronunciation detecting and evaluating method based on deep neural network posterior probability algorithm |
CN104575490B (en) * | 2014-12-30 | 2017-11-07 | 苏州驰声信息科技有限公司 | Spoken language pronunciation evaluating method based on deep neural network posterior probability algorithm |
CN104834393A (en) * | 2015-06-04 | 2015-08-12 | 携程计算机技术(上海)有限公司 | Automatic testing device and system |
CN106971743A (en) * | 2016-01-14 | 2017-07-21 | 广州酷狗计算机科技有限公司 | User's singing data treating method and apparatus |
CN106971743B (en) * | 2016-01-14 | 2020-07-24 | 广州酷狗计算机科技有限公司 | User singing data processing method and device |
CN107092606A (en) * | 2016-02-18 | 2017-08-25 | 腾讯科技(深圳)有限公司 | A kind of searching method, device and server |
CN108109633A (en) * | 2017-12-20 | 2018-06-01 | 北京声智科技有限公司 | The System and method for of unattended high in the clouds sound bank acquisition and intellectual product test |
CN111899576A (en) * | 2020-07-23 | 2020-11-06 | 腾讯科技(深圳)有限公司 | Control method and device for pronunciation test application, storage medium and electronic equipment |
CN112967736A (en) * | 2021-03-02 | 2021-06-15 | 厦门快商通科技股份有限公司 | Pronunciation quality detection method, system, mobile terminal and storage medium |
CN113053409A (en) * | 2021-03-12 | 2021-06-29 | 科大讯飞股份有限公司 | Audio evaluation method and device |
CN113053409B (en) * | 2021-03-12 | 2024-04-12 | 科大讯飞股份有限公司 | Audio evaluation method and device |
CN113421467A (en) * | 2021-06-15 | 2021-09-21 | 读书郎教育科技有限公司 | System and method for assisting in learning pinyin spelling and reading |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | C06 | Publication | |
 | PB01 | Publication | |
 | C10 | Entry into substantive examination | |
 | SE01 | Entry into force of request for substantive examination | |
 | C02 | Deemed withdrawal of patent application after publication (patent law 2001) | |
 | WD01 | Invention patent application deemed withdrawn after publication | Open date: 20090311 |