CN101383103A - Spoken language pronunciation level automatic test method - Google Patents

Info

Publication number
CN101383103A
Authority
CN
China
Prior art keywords
segment
pronunciation
recording
model
standard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2008101685142A
Other languages
Chinese (zh)
Inventor
刘庆升
魏思
易中华
吴晓如
王仁华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZHONGKEDA XUNFEI INFORMATION SCIENCE & TECHNOLOGY Co Ltd ANHUI PROV
iFlytek Co Ltd
Original Assignee
ZHONGKEDA XUNFEI INFORMATION SCIENCE & TECHNOLOGY Co Ltd ANHUI PROV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZHONGKEDA XUNFEI INFORMATION SCIENCE & TECHNOLOGY Co Ltd ANHUI PROV
Priority to CNA2008101685142A
Publication of CN101383103A
Legal status: Pending

Landscapes

  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention relates to an automatic test method for spoken pronunciation level, comprising the following steps: step 1, building a corpus of standard-pronunciation speakers and associating the saved recording file names with the corresponding test texts; step 2, using this corpus and the associated test texts to train standard segment models of standard speech; step 3, recording the pronunciation of test takers to build a Putonghua speech database, saving the recorded test items, and associating the recording file names with those items; step 4, annotating pronunciation errors and identifying the correct initials, finals and tones; step 5, computing the evaluation parameters of the speech under test to obtain an evaluation result. The invention uses a computer for Putonghua proficiency testing, instruction and learning, and applies intelligent speech processing to evaluate a learner's speech automatically and accurately.

Description

Spoken language pronunciation level automatic test method
The present application is a divisional application of application No. 200610038588.5, filed on February 28, 2006 and entitled "Method for testing mandarin level and guiding learning using computer".
Technical field
The present invention relates to the field of computer speech signal processing.
Background art
The Putonghua Proficiency Test (PSC) is an important means of promoting Putonghua, and an important measure for putting that work on a scientific, standardized and institutionalized footing. The Law of the People's Republic of China on the Standard Spoken and Written Chinese Language, adopted at the 18th meeting of the Standing Committee of the National People's Congress on October 31, 2000, stipulates that broadcasters, program hosts, film and television actors, teachers and state employees who use Putonghua as their working language must take the PSC and reach the grade standards set by the state.
At present the PSC is scored entirely by hand. Each examination room generally needs 2-3 examiners, who score the examinee's speech. One examination room can test only 30 people a day, and a PSC session for several thousand people often requires organizing a hundred or so examiners at short notice. This is time-consuming, laborious, costly and inefficient. Because scoring is entirely manual, it is also highly subjective, which to some extent raises questions about the fairness of the results.
Therefore, given the development of modern computer technology, research on how to apply advanced information technology to the PSC, replacing human raters in whole or in part and thereby remedying the shortcomings of the traditional method, is of great significance for saving manpower and material costs and for improving the fairness and efficiency of testing.
Research on computer-based PSC systems places high demands on basic resource corpora and on interdisciplinary cooperation, and is therefore difficult. For lack of systematic guidance and broad cooperation, computer-based PSC has long received little attention from researchers.
Summary of the invention
To address the deficiencies of the prior art, the present invention provides a method of using a computer for PSC testing and guided learning, applying intelligent speech processing to evaluate a learner's pronunciation accurately and automatically.
The invention is achieved through the following technical solution:
An automatic test method for spoken pronunciation level, characterized by comprising the following steps:
(1) Build a phoneme-balanced corpus of standard-pronunciation speakers from recordings of their pronunciation, and associate the saved file name of each recording with the corresponding test text;
(2) Using the standard-speaker corpus and its associated text, train the standard segment models of standard speech. The training steps for the standard segment models are:
(a) divide the speech into frames (frame length: 250 ms; frame shift: 10 ms) and compute the Mel-frequency cepstral coefficient (MFCC) parameters frame by frame;
(b) train MFCC-based hidden Markov models (HMMs) for the various segments;
Using the standard-speaker corpus and its associated text, train the standard tone models of standard speech. The training steps for the standard tone models are:
(a) compute the fundamental frequency (F0) parameters of the speech frame by frame;
(b) train F0-based hidden Markov models for the various tones;
(3) Record the test takers' pronunciation to build a Putonghua speech database, save the recorded test items, and associate each recording file name with its test item;
(4) Annotate mispronunciations, and identify the correct initials, finals and tones;
(5) Compute, frame by frame, the MFCC parameters, F0 parameters and formants of the recorded test item; using the standard segment models, segment the MFCC parameters of the recording into segments according to the recording's text, obtaining at the same time the segmentation likelihood of each segment relative to the standard model; using the standard tone models, segment the F0 information of the recording into segments according to the recording's text, obtaining at the same time the segmentation likelihood of each tone relative to the standard model; from the segmentation, compute segment-level evaluation parameters such as the duration of each segment, the initial-to-final duration ratio and the formant sequence of each segment; perform segment recognition and tone recognition on the recording with the standard segment models and tone models, obtaining the recognition results and the corresponding segment recognition likelihoods and tone recognition likelihoods; divide the segmentation likelihood of each segment and tone by the recognition likelihood to obtain the likelihood ratio of the segment and tone; normalize each segment duration, the initial-to-final duration ratio, the segment likelihood, the segment likelihood ratio, the tone likelihood and the tone likelihood ratio to obtain the evaluation result for the test taker.
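As a rough illustration of the likelihood-ratio parameter in step (5), the sketch below assumes that the segmentation pass and the recognition pass each return a natural-log likelihood per segment, so the division described in the text becomes a subtraction in the log domain. The function name and all numeric values are hypothetical and are not taken from the patent.

```python
import math


def likelihood_ratio(aligned_log_likelihood, recognized_log_likelihood):
    """Ratio of segmentation likelihood to recognition likelihood.

    Both inputs are natural-log likelihoods (an assumption of this sketch),
    so dividing the likelihoods equals exponentiating their difference.
    """
    return math.exp(aligned_log_likelihood - recognized_log_likelihood)


# Hypothetical per-segment log-likelihoods:
# (segmentation against the known text, unconstrained recognition).
segments = {
    "zh": (-42.0, -40.0),   # recognition scored higher than the expected unit
    "ang": (-55.0, -55.0),  # both passes gave the same likelihood
}
ratios = {name: likelihood_ratio(a, r) for name, (a, r) in segments.items()}
```

Under this reading, a ratio close to 1 suggests the expected unit is about as likely as the best recognized one, while smaller values hint at a possible mispronunciation.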
The evaluation performance of the computer test system is judged, following the requirements of the PSC syllabus, by the mean absolute score error and the mean grade agreement rate between the computer's results and the results of three human raters. Because human raters also differ among themselves, the mean absolute score error and mean grade agreement rate among the three human raters serve as the baseline for comparing machine and human performance. According to statistics on the rater scoring database, the pairwise mean absolute score error among the three raters ranges from 1.5 to 2.5 points across the different data sets, and the grade agreement rate ranges from 0.8 to 0.85. The mean total-score error of the computer test system is within 2 points, and its grade agreement rate with the raters is above 0.8, so the computer test essentially reaches a level close to that of manual testing. The computer-guided learning part can not only take over part of a language teacher's work, such as error correction, model reading and setting targeted exercises, but can also present the physical parameters of the learner's pronunciation, such as waveform and spectrum, more vividly and compare them with the reference waveform and spectrum.
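The two performance measures described above can be sketched as follows. This is a minimal illustration, not the patent's evaluation code: the function names, the scores and the grade labels are all made up for the example.

```python
def mean_absolute_error(scores_a, scores_b):
    """Mean absolute difference between two equally long score lists."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(scores_a, scores_b)) / len(scores_a)


def agreement_rate(grades_a, grades_b):
    """Fraction of examinees given the same grade by both sources."""
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("grade lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(grades_a, grades_b)) / len(grades_a)


# Made-up scores and grade labels for three examinees and three human raters.
machine_scores = [85.0, 91.5, 78.0]
rater_scores = [
    [86.0, 90.0, 80.0],
    [84.0, 92.5, 77.0],
    [87.0, 91.5, 79.0],
]
machine_grades = ["2B", "1B", "3A"]
rater_grades = [
    ["2B", "2A", "3A"],
    ["2B", "1B", "3A"],
    ["2A", "1B", "3A"],
]

# Average the machine-vs-rater measures over the three raters.
mean_error = sum(
    mean_absolute_error(machine_scores, scores) for scores in rater_scores
) / len(rater_scores)
mean_agreement = sum(
    agreement_rate(machine_grades, grades) for grades in rater_grades
) / len(rater_grades)
```

The same two functions can be applied to each pair of human raters to obtain the human baseline that the machine figures are compared against.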
Embodiment
The specific implementation steps are as follows:
1. Building the standard-speaker corpus:
1) Select a group of suitable standard-pronunciation speakers, divided by sex and matching the age distribution of PSC candidates and learners;
2) Design the recording text according to the phoneme-balance principle required by the PSC syllabus;
3) Have the standard speakers make the recordings, and associate the saved file name of each recording with its corresponding text;
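One simple way to keep each recording's file name associated with its text is a manifest file, sketched below. The CSV layout, the column names, the file names and the speaker identifiers are assumptions chosen for illustration; the patent does not specify a storage format.

```python
import csv
import io


def write_manifest(entries, stream):
    """Write one CSV row per recording: file name, speaker, sex, prompt text."""
    writer = csv.writer(stream)
    writer.writerow(["wav_file", "speaker_id", "sex", "text"])
    for entry in entries:
        writer.writerow(
            [entry["wav_file"], entry["speaker_id"], entry["sex"], entry["text"]]
        )


def load_manifest(stream):
    """Return a dict mapping each recording file name to its prompt text."""
    return {row["wav_file"]: row["text"] for row in csv.DictReader(stream)}


# Hypothetical corpus entries.
entries = [
    {"wav_file": "spk001_0001.wav", "speaker_id": "spk001", "sex": "F", "text": "普通话"},
    {"wav_file": "spk002_0001.wav", "speaker_id": "spk002", "sex": "M", "text": "水平测试"},
]

buffer = io.StringIO(newline="")  # in-memory stand-in for a manifest file
write_manifest(entries, buffer)
buffer.seek(0)
text_by_file = load_manifest(buffer)
```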
2. Building the standard speech models: this comprises building the segment models and the tone models.
Using the standard-speaker corpus and its associated text, train the segment models of standard speech. These may be phoneme or syllable models, or context-dependent phoneme or syllable models. The training steps are:
1) Divide the speech into frames (frame length: 250 ms; frame shift: 10 ms) and compute the Mel-frequency cepstral coefficient (MFCC) parameters frame by frame;
2) Train MFCC-based hidden Markov models (HMMs) for the various segments.
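The framing in step 1) can be sketched as follows. Only the split into overlapping frames is shown, using the frame length and frame shift stated above; the 16 kHz sampling rate, the function name and the dummy signal are assumptions, and MFCC extraction and HMM training would normally be delegated to a speech toolkit.

```python
def frame_signal(samples, sample_rate, frame_length_ms=250.0, frame_shift_ms=10.0):
    """Split a sample sequence into overlapping fixed-length frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(round(sample_rate * frame_length_ms / 1000.0))
    shift = int(round(sample_rate * frame_shift_ms / 1000.0))
    if frame_len <= 0 or shift <= 0:
        raise ValueError("frame length and shift must be at least one sample")
    if len(samples) < frame_len:
        return []
    n_frames = 1 + (len(samples) - frame_len) // shift
    return [samples[i * shift : i * shift + frame_len] for i in range(n_frames)]


# One second of dummy audio at an assumed 16 kHz sampling rate; each sample
# holds its own index so frame offsets are easy to inspect.
sample_rate = 16000
samples = [float(i) for i in range(sample_rate)]
frames = frame_signal(samples, sample_rate)
```

Each frame would then be windowed and converted to an MFCC vector before being passed to HMM training.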
Using the standard-speaker corpus and its associated text, train the tone models of standard speech. These may be simple four-tone models, or tone models that depend on the preceding and following tones and on the final. The training steps are:
1) Compute the fundamental frequency (F0) parameters of the speech frame by frame;
2) Train F0-based hidden Markov models for the various tones.
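The patent does not say how the fundamental frequency is computed. As one common possibility, the sketch below estimates F0 for a single frame from the peak of its autocorrelation; the search range, the voicing threshold of 0.3, the 8 kHz sampling rate and the function name are all assumptions made for illustration.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=500.0, voicing_threshold=0.3):
    """Estimate the F0 of one frame in Hz; return 0.0 if it looks unvoiced."""
    mean = sum(frame) / len(frame)
    centered = [x - mean for x in frame]
    energy = sum(x * x for x in centered)
    if energy == 0.0:
        return 0.0  # silent frame

    min_lag = max(1, int(sample_rate / f0_max))
    max_lag = min(int(sample_rate / f0_min), len(centered) - 1)

    best_lag, best_value = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation of the mean-removed frame at this lag.
        value = sum(
            centered[i] * centered[i + lag] for i in range(len(centered) - lag)
        )
        if value > best_value:
            best_lag, best_value = lag, value

    if best_lag == 0 or best_value / energy < voicing_threshold:
        return 0.0  # no clear periodicity
    return sample_rate / best_lag


# Synthetic test frame: 50 ms of a 200 Hz sine at an assumed 8 kHz sampling rate.
sample_rate = 8000
frame = [math.sin(2.0 * math.pi * 200.0 * n / sample_rate) for n in range(400)]
f0 = estimate_f0(frame, sample_rate)
silence_f0 = estimate_f0([0.0] * 400, sample_rate)
```

Running this per frame yields the F0 contour on which the tone HMMs in step 2) could be trained.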
3. Collecting the general Putonghua corpus: at the PSC test site, record the examinees' test speech, save the recorded test items, and associate each recording file name with its test item.
4. Annotating the general Putonghua corpus: each on-site Putonghua recording is scored independently. The annotation records in detail whether the initial, final and tone of every character in the recording are pronounced correctly, and each incorrect initial, final and tone is labeled with its corresponding initial, final and tone.
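A per-character annotation of this kind might be stored as in the sketch below. The record layout and field names are hypothetical, and the sketch assumes that the label attached to an incorrect unit is the unit the annotator actually heard; the text leaves this open.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class SyllableAnnotation:
    """Annotation of one character: expected units plus any heard deviations."""

    character: str
    initial: str
    final: str
    tone: int
    heard_initial: Optional[str] = None  # None means pronounced as expected
    heard_final: Optional[str] = None
    heard_tone: Optional[int] = None

    def errors(self) -> Dict[str, Tuple[object, object]]:
        """Map each mispronounced unit to an (expected, heard) pair."""
        found: Dict[str, Tuple[object, object]] = {}
        if self.heard_initial is not None and self.heard_initial != self.initial:
            found["initial"] = (self.initial, self.heard_initial)
        if self.heard_final is not None and self.heard_final != self.final:
            found["final"] = (self.final, self.heard_final)
        if self.heard_tone is not None and self.heard_tone != self.tone:
            found["tone"] = (self.tone, self.heard_tone)
        return found

    @property
    def is_correct(self) -> bool:
        return not self.errors()


# Hypothetical annotations: "zh" heard as "z" in the first character.
annotations = [
    SyllableAnnotation("张", "zh", "ang", 1, heard_initial="z"),
    SyllableAnnotation("是", "sh", "i", 4),
]
error_list = [a.errors() for a in annotations]
```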
5. Computing the evaluation parameters of the speech under test:
1) Compute the MFCC parameters, F0 parameters and formants of the speech frame by frame;
2) Using the standard segment models, segment the MFCC parameters of the speech into segments according to the text of the speech, obtaining at the same time the segmentation likelihood of each segment relative to the standard model;
3) Using the standard tone models, segment the F0 parameters of the speech into segments according to the text of the speech, obtaining at the same time the segmentation likelihood of each tone relative to the standard model;
4) From the segmentation, compute segment-level evaluation parameters such as the duration of each segment, the initial-to-final duration ratio and the formant sequence of each segment;
5) Perform segment recognition and tone recognition on the speech with the standard segment models and tone models, obtaining the recognition results and the corresponding segment recognition likelihoods and tone recognition likelihoods;
6) Divide the segmentation likelihood of each segment and tone by the recognition likelihood to obtain the likelihood ratio of the segment and tone;
7) Normalize each segment duration, the initial-to-final duration ratio, the segment likelihood, the segment likelihood ratio, the tone likelihood and the tone likelihood ratio. Normalization can be done separately for each type of test content (characters, words, passage reading, speaking on an assigned topic and so on) or over all pronunciations together, yielding for a given examinee either a group of evaluation parameter sets or, when normalizing over all pronunciations, a single set.
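Steps 4) and 7) can be illustrated with the sketch below. The patent does not define the normalization; here it is assumed to be a plain average of each parameter, taken either per test section or over all syllables. The section names, feature names and values are made up for the example.

```python
from collections import defaultdict
from statistics import mean

FEATURES = ("duration_ratio", "segment_likelihood_ratio", "tone_likelihood_ratio")


def initial_final_duration_ratio(initial_span, final_span):
    """Duration of the initial divided by the duration of the final.

    Each span is a (start_seconds, end_seconds) pair from the segmentation.
    """
    initial_duration = initial_span[1] - initial_span[0]
    final_duration = final_span[1] - final_span[0]
    return initial_duration / final_duration


def aggregate(syllables, by_section=True):
    """Average each feature per test section, or over all syllables."""
    groups = defaultdict(list)
    for syllable in syllables:
        key = syllable["section"] if by_section else "all"
        groups[key].append(syllable)
    return {
        key: {name: mean(item[name] for item in items) for name in FEATURES}
        for key, items in groups.items()
    }


# Made-up per-syllable parameters for one examinee.
syllables = [
    {"section": "characters", "duration_ratio": 0.5,
     "segment_likelihood_ratio": 1.0, "tone_likelihood_ratio": 0.5},
    {"section": "characters", "duration_ratio": 1.5,
     "segment_likelihood_ratio": 0.5, "tone_likelihood_ratio": 1.0},
    {"section": "passage", "duration_ratio": 0.25,
     "segment_likelihood_ratio": 0.75, "tone_likelihood_ratio": 0.75},
]

per_section = aggregate(syllables)                # a group of parameter sets
overall = aggregate(syllables, by_section=False)  # a single parameter set
ratio = initial_final_duration_ratio((0.10, 0.18), (0.18, 0.42))
```

Grouping by section gives one parameter set per type of test content, which matches the "group" case in step 7); pooling everything gives the single-set case.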

Claims (1)

1. An automatic test method for spoken pronunciation level, characterized by comprising the following steps:
(1) Build a phoneme-balanced corpus of standard-pronunciation speakers from recordings of their pronunciation, and associate the saved file name of each recording with the corresponding test text;
(2) Using the standard-speaker corpus and its associated text, train the standard segment models of standard speech. The training steps for the standard segment models are:
(a) divide the speech into frames (frame length: 250 ms; frame shift: 10 ms) and compute the Mel-frequency cepstral coefficient (MFCC) parameters frame by frame;
(b) train MFCC-based hidden Markov models (HMMs) for the various segments;
Using the standard-speaker corpus and its associated text, train the standard tone models of standard speech. The training steps for the standard tone models are:
(a) compute the fundamental frequency (F0) parameters of the speech frame by frame;
(b) train F0-based hidden Markov models for the various tones;
(3) Record the test takers' pronunciation to build a Putonghua speech database, save the recorded test items, and associate each recording file name with its test item;
(4) Annotate mispronunciations, and identify the correct initials, finals and tones;
(5) Compute, frame by frame, the MFCC parameters, F0 parameters and formants of the recorded test item; using the standard segment models, segment the MFCC parameters of the recording into segments according to the text of the speech, obtaining at the same time the segmentation likelihood of each segment relative to the standard model; using the standard tone models, segment the F0 information of the recording into segments according to the recording's text, obtaining at the same time the segmentation likelihood of each tone relative to the standard model; from the segmentation, compute segment-level evaluation parameters such as the duration of each segment, the initial-to-final duration ratio and the formant sequence of each segment; perform segment recognition and tone recognition on the recording with the standard segment models and tone models, obtaining the recognition results and the corresponding segment recognition likelihoods and tone recognition likelihoods; divide the segmentation likelihood of each segment and tone by the recognition likelihood to obtain the likelihood ratio of the segment and tone; normalize each segment duration, the initial-to-final duration ratio, the segment likelihood, the segment likelihood ratio, the tone likelihood and the tone likelihood ratio to obtain the evaluation result for the test taker.
CNA2008101685142A 2006-02-28 2006-02-28 Spoken language pronunciation level automatic test method Pending CN101383103A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2008101685142A CN101383103A (en) 2006-02-28 2006-02-28 Spoken language pronunciation level automatic test method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNA2008101685142A CN101383103A (en) 2006-02-28 2006-02-28 Spoken language pronunciation level automatic test method

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN 200610038588 Division CN1815522A (en) 2006-02-28 2006-02-28 Method for testing mandarin level and guiding learning using computer

Publications (1)

Publication Number Publication Date
CN101383103A true CN101383103A (en) 2009-03-11

Family

ID=40462922

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2008101685142A Pending CN101383103A (en) 2006-02-28 2006-02-28 Spoken language pronunciation level automatic test method

Country Status (1)

Country Link
CN (1) CN101383103A (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2337006A1 (en) 2009-11-24 2011-06-22 Kai Yu Speech processing and learning
CN102163428A (en) * 2011-01-19 2011-08-24 无敌科技(西安)有限公司 Method for judging Chinese pronunciation
CN102568475A (en) * 2011-12-31 2012-07-11 安徽科大讯飞信息科技股份有限公司 System and method for assessing proficiency in Putonghua
CN102568475B (en) * 2011-12-31 2014-11-26 安徽科大讯飞信息科技股份有限公司 System and method for assessing proficiency in Putonghua
CN103366759A (en) * 2012-03-29 2013-10-23 北京中传天籁数字技术有限公司 Speech data evaluation method and speech data evaluation device
CN105609114A (en) * 2014-11-25 2016-05-25 科大讯飞股份有限公司 Method and device for detecting pronunciation
CN105609114B (en) * 2014-11-25 2019-11-15 科大讯飞股份有限公司 A kind of pronunciation detection method and device
CN104505089A (en) * 2014-12-17 2015-04-08 福建网龙计算机网络信息技术有限公司 Method and equipment for oral error correction
CN104575490A (en) * 2014-12-30 2015-04-29 苏州驰声信息科技有限公司 Spoken language pronunciation detecting and evaluating method based on deep neural network posterior probability algorithm
CN104575490B (en) * 2014-12-30 2017-11-07 苏州驰声信息科技有限公司 Spoken language pronunciation evaluating method based on deep neural network posterior probability algorithm
CN104834393A (en) * 2015-06-04 2015-08-12 携程计算机技术(上海)有限公司 Automatic testing device and system
CN106971743A (en) * 2016-01-14 2017-07-21 广州酷狗计算机科技有限公司 User's singing data treating method and apparatus
CN106971743B (en) * 2016-01-14 2020-07-24 广州酷狗计算机科技有限公司 User singing data processing method and device
CN107092606A (en) * 2016-02-18 2017-08-25 腾讯科技(深圳)有限公司 A kind of searching method, device and server
CN108109633A (en) * 2017-12-20 2018-06-01 北京声智科技有限公司 The System and method for of unattended high in the clouds sound bank acquisition and intellectual product test
CN111899576A (en) * 2020-07-23 2020-11-06 腾讯科技(深圳)有限公司 Control method and device for pronunciation test application, storage medium and electronic equipment
CN112967736A (en) * 2021-03-02 2021-06-15 厦门快商通科技股份有限公司 Pronunciation quality detection method, system, mobile terminal and storage medium
CN113053409A (en) * 2021-03-12 2021-06-29 科大讯飞股份有限公司 Audio evaluation method and device
CN113053409B (en) * 2021-03-12 2024-04-12 科大讯飞股份有限公司 Audio evaluation method and device
CN113421467A (en) * 2021-06-15 2021-09-21 读书郎教育科技有限公司 System and method for assisting in learning pinyin spelling and reading

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20090311