CN101383103A

CN101383103A - Spoken language pronunciation level automatic test method

Info

Publication number: CN101383103A
Application number: CNA2008101685142A
Authority: CN
Inventors: 刘庆升; 魏思; 易中华; 吴晓如; 王仁华
Original assignee: ZHONGKEDA XUNFEI INFORMATION SCIENCE & TECHNOLOGY Co Ltd ANHUI PROV
Current assignee: ZHONGKEDA XUNFEI INFORMATION SCIENCE & TECHNOLOGY Co Ltd ANHUI PROV; iFlytek Co Ltd
Priority date: 2006-02-28
Filing date: 2006-02-28
Publication date: 2009-03-11

Abstract

The invention relates to a method for automatically testing spoken pronunciation level, comprising the following steps: step 1, building a corpus of standard speakers' pronunciation and associating the saved recording file names with the corresponding test texts; step 2, using the standard-speaker corpus and the associated test texts to train standard segment models of standard speech; step 3, recording the pronunciation of a testee to build a Putonghua speech database, saving the recorded test questions, and associating the recording file names with the recorded test questions; step 4, annotating pronunciation errors and identifying the correct initials, finals and tones; step 5, computing the evaluation parameters of the speech under test to obtain an evaluation result. The invention uses a computer to conduct Putonghua proficiency testing, instruction and learning, and uses intelligent computer speech processing to evaluate a learner's speech automatically and accurately.

Description

Spoken language pronunciation level automatic test method

The present application is a divisional application of application No. 200610038588.5, filed on February 28, 2006, entitled "Method for using a computer to conduct the Putonghua Proficiency Test (PSC) and guide learning".

Technical field

The present invention relates to the field of computer speech signal processing.

Background technology

The Putonghua Proficiency Test (PSC) is an important means of promoting Putonghua, and an important measure for making that work progressively more scientific, standardized and institutionalized. The Law of the People's Republic of China on the Standard Spoken and Written Chinese Language, passed at the 18th session of the Standing Committee of the National People's Congress on October 31, 2000, stipulates that broadcasters, program hosts, film and television actors, teachers and state employees who use Putonghua as their working language must take the PSC and reach the grade standard set by the state.

At present, PSC work relies entirely on manual scoring. Each Putonghua test room generally needs 2-3 testers, who score the examinee's speech, and one test room can test only 30 people a day. A PSC session for several thousand people usually requires organizing a hundred or more testers at short notice, which is time-consuming, laborious, costly and inefficient. Because all scoring is done manually, it depends heavily on the testers' subjective judgment, which to some extent raises questions about the fairness of the test results.

Therefore, given the development of modern computer technology, studying how to apply advanced information technology to the PSC, so as to replace human evaluators wholly or in part and thereby remedy the shortcomings of the traditional PSC method, is of great significance both for saving manpower, material resources and other costs and for improving the fairness and efficiency of the test.

Research on computer-based PSC systems places very high demands on basic resource databases, interdisciplinary cooperation and similar aspects, and is therefore quite difficult. Precisely because systematic guidance and broad cooperation have been lacking, computer-based PSC has long failed to attract researchers' attention.

Summary of the invention

To address the deficiencies of the prior art, the present invention provides a method of using a computer to conduct the PSC and guide learning, which applies intelligent computer speech processing technology to evaluate a learner's pronunciation accurately and automatically.

The present invention is achieved by the following technical solutions:

An automatic test method for spoken pronunciation level, characterized by comprising the following steps:

(1) building a phoneme-balanced corpus of standard speakers from recordings of the standard speakers' pronunciation, and associating the saved file name of each recording with the corresponding test text;

(2) using the standard-speaker corpus and its associated text to train the standard segment models of standard speech, the training steps of the standard segment models being:

(a) dividing the speech into frames (frame length: 250 ms, frame shift: 10 ms) and computing the Mel-frequency cepstral coefficient (MFCC) parameters of the speech frame by frame;

(b) training a hidden Markov model (HMM) based on the MFCC parameters for each type of segment;

using the standard-speaker corpus and its associated text to train the standard tone models of standard speech, the training steps of the standard tone models being:

(a) computing the fundamental frequency (F0) parameters of the speech frame by frame;

(b) training a hidden Markov model based on the F0 parameters for each type of tone;

(3) recording the testee's pronunciation to build a Putonghua speech database, saving the recorded test questions at the same time, and associating each recording file name with its recorded test question;

(4) annotating mispronunciations and identifying the correct initials, finals and tones;

(5) computing, frame by frame, the MFCC parameters, F0 parameters and formants of the recorded test questions; using the standard segment models and the text corresponding to the recording to segment the MFCC parameters of the recording into segments, obtaining at the same time the segmentation likelihood of each segment relative to the standard model; using the standard tone models and the text corresponding to the recording to segment the F0 parameters of the recording into segments, obtaining at the same time the segmentation likelihood of each tone relative to the standard model; computing, from the segmentation, segment-level evaluation parameters such as the duration of each segment, the initial-to-final duration ratio, and the formant sequence of each segment; performing segment recognition and tone recognition on the recording with the standard segment models and tone models, obtaining the recognition results and the corresponding segment recognition likelihoods and tone recognition likelihoods; dividing the segmentation likelihoods of the segments and tones by the recognition likelihoods to obtain the likelihood ratios of the segments and tones; and normalizing the segment durations, the initial-to-final duration ratio, the segment likelihoods, the segment likelihood ratios, the tone likelihoods and the tone likelihood ratios to obtain the evaluation result for the testee.

The evaluation performance of the computer test system is judged, in accordance with the requirements of the PSC syllabus, by the mean absolute score error and the mean grade agreement rate between the computer's results and the results of three human testers. Because human testers also differ from one another, the mean absolute score error and the mean grade agreement rate among the three human testers serve as the baseline for comparing machine and manual testing performance. Statistics on the testers' scoring database show that the mean pairwise absolute score error among the three testers ranges from 1.5 to 2.5 points across different data sets, and the grade agreement rate ranges from 0.8 to 0.85. The mean total-score error of the computer test system is within 2 points, and its grade agreement rate with the testers is above 0.8, so computer testing performs at a level broadly close to that of manual testing. The computer-guided learning component can not only take over part of a language teacher's work, such as correcting errors, model reading and setting targeted exercises, but can also present the physical parameters of the learner's pronunciation more vividly, such as the waveform and spectrum, and compare them with the standard waveform and spectrum.
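The two agreement measures described above can be sketched as follows. This is a minimal illustration, not the system's actual evaluation code: the function names and the example data layout (one list of scores or grade labels per rater, in the same examinee order) are assumptions made here.

```python
from itertools import combinations


def mean_abs_error(scores_a, scores_b):
    """Mean absolute score difference between two raters over the same examinees."""
    if not scores_a or len(scores_a) != len(scores_b):
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(scores_a, scores_b)) / len(scores_a)


def grade_agreement_rate(grades_a, grades_b):
    """Fraction of examinees to whom two raters assigned the same grade."""
    if not grades_a or len(grades_a) != len(grades_b):
        raise ValueError("grade lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(grades_a, grades_b)) / len(grades_a)


def human_baseline(metric, human_ratings):
    """Average a metric over every pair of human raters (the manual baseline)."""
    pairs = list(combinations(human_ratings, 2))
    return sum(metric(a, b) for a, b in pairs) / len(pairs)


def machine_vs_humans(metric, machine_ratings, human_ratings):
    """Average a metric between the machine and each human rater."""
    return sum(metric(machine_ratings, h) for h in human_ratings) / len(human_ratings)
```

Comparing `machine_vs_humans(mean_abs_error, ...)` against `human_baseline(mean_abs_error, ...)` mirrors the comparison in the text: the machine is judged against the spread that human testers already show among themselves.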

Embodiment

The specific implementation steps are as follows:

1. Building the standard-speaker corpus:

1) selecting a group of suitable standard speakers, divided by sex and matching the age distribution of the population that takes and studies for the PSC;

2) designing the recording text according to the phoneme-balance principle required by the PSC syllabus;

3) arranging for the standard speakers to make the recordings, and associating the saved file name of each recording with the corresponding text;

2. Building the standard speech models, comprising the segment models and the tone models.

Using the standard-speaker corpus and its associated text, the segment models of standard speech are trained; these may be phoneme or syllable models, or context-dependent phoneme or syllable models. The training steps are:

1) dividing the speech into frames (frame length: 250 ms, frame shift: 10 ms) and computing the Mel-frequency cepstral coefficient (MFCC) parameters of the speech frame by frame;

2) training a hidden Markov model (HMM) based on the MFCC parameters for each type of segment.
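The framing in step 1) can be sketched as below. The sketch covers only the division into overlapping frames, using the frame length and frame shift stated in the text; the MFCC computation and HMM training are omitted. The function name and the 16 kHz sample rate in the usage note are assumptions for illustration.

```python
def frame_signal(samples, sample_rate, frame_length_ms=250, frame_shift_ms=10):
    """Split a sequence of samples into overlapping fixed-length frames.

    Trailing samples that do not fill a complete frame are dropped.
    """
    frame_len = int(round(sample_rate * frame_length_ms / 1000))
    frame_shift = int(round(sample_rate * frame_shift_ms / 1000))
    if frame_len <= 0 or frame_shift <= 0:
        raise ValueError("frame length and frame shift must be positive")

    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += frame_shift
    return frames
```

For example, at an assumed sample rate of 16 kHz, each frame holds 4000 samples and consecutive frames start 160 samples apart, so one second of audio yields 76 complete frames.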

Using the standard-speaker corpus and its associated text, the tone models of standard speech are trained; these may be simple four-tone models of Putonghua, or tone models that depend on the preceding and following tones and on the final. The training steps are:

1) computing the fundamental frequency (F0) parameters of the speech frame by frame;

2) training a hidden Markov model based on the F0 parameters for each type of tone.
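The text does not say how the F0 parameters are computed. As one common possibility, and purely as an assumption here, the sketch below estimates F0 for a single frame by picking the strongest autocorrelation peak within a plausible pitch range. The function name, the search range of 60 to 500 Hz, and the convention of returning 0.0 for a silent frame are all illustrative choices.

```python
def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=500.0):
    """Estimate F0 (Hz) of one frame from its autocorrelation peak.

    Returns 0.0 when the frame has no energy or no positive peak is found.
    """
    n = len(frame)
    if n < 2:
        return 0.0
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset
    if sum(v * v for v in x) == 0.0:
        return 0.0

    # Candidate lags (in samples) corresponding to the allowed F0 range.
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(n - 1, int(sample_rate / f0_min))

    best_lag, best_corr = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        corr = sum(x[i] * x[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else 0.0
```

Applying this function to each frame of an utterance gives a frame-by-frame F0 track of the kind the tone models are trained on. A practical system would also need a voicing decision and smoothing, which this sketch leaves out.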

3. Collecting the general Putonghua corpus: at the PSC test site, the examinees' test speech is recorded, the recorded test questions are saved, and each recording file name is associated with its recorded test question.

4. Annotating the general Putonghua corpus: each on-site Putonghua recording is scored independently. The annotation records in detail whether the initial, final and tone of each character in the recording are pronounced correctly, and for each incorrect initial, final or tone, the corresponding initial, final or tone is identified.
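One way to hold such an annotation is a small record per character, sketched below. The class and field names are hypothetical. The sketch also assumes that the value identified for an incorrect component is the sound the annotator actually heard, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SyllableAnnotation:
    """Annotation of one character (syllable) in an examinee recording."""
    character: str
    expected_initial: str   # taken from the test text, e.g. "zh"
    expected_final: str
    expected_tone: int      # 1-4
    # Hypothetical fields: the value noted by the annotator when a component
    # is judged incorrect; None means the component was judged correct.
    noted_initial: Optional[str] = None
    noted_final: Optional[str] = None
    noted_tone: Optional[int] = None

    def errors(self) -> List[str]:
        """Names of the components that were judged incorrect."""
        noted = {
            "initial": self.noted_initial,
            "final": self.noted_final,
            "tone": self.noted_tone,
        }
        return [name for name, value in noted.items() if value is not None]
```

For instance, `SyllableAnnotation("知", "zh", "i", 1, noted_initial="z")` records a character whose initial was judged incorrect while its final and tone were judged correct.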

5. Computing the evaluation parameters of the speech under test:

1) computing the MFCC parameters, F0 parameters and formants of the speech frame by frame;

2) using the standard segment models and the text corresponding to the speech to segment the MFCC parameters of the speech into segments, obtaining at the same time the segmentation likelihood of each segment relative to the standard model;

3) using the standard tone models and the text corresponding to the speech to segment the F0 parameters of the speech into segments, obtaining at the same time the segmentation likelihood of each tone relative to the standard model;

4) computing, from the segmentation, segment-level evaluation parameters such as the duration of each segment, the initial-to-final duration ratio, and the formant sequence of each segment;

5) performing segment recognition and tone recognition on the speech with the standard segment models and tone models, obtaining the recognition results and the corresponding segment recognition likelihoods and tone recognition likelihoods;

6) dividing the segmentation likelihoods of the segments and tones by the recognition likelihoods to obtain the likelihood ratios of the segments and tones;

7) normalizing the segment durations, the initial-to-final duration ratio, the segment likelihoods, the segment likelihood ratios, the tone likelihoods and the tone likelihood ratios (the normalization may be carried out separately for each type of test content, such as character reading, word reading, passage reading and speaking on an assigned topic, or over all of the pronunciations), yielding for a given examinee either one set of evaluation parameters per content type or a single set (when normalizing over all pronunciations).
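Steps 4), 6) and 7) can be sketched as follows, starting from segments that have already been aligned and recognized. Several points are assumptions made for illustration: the `Segment` record and its field names are hypothetical, the likelihoods are taken to be log-likelihoods (so dividing the likelihoods becomes subtracting the logs), and the normalization is taken to be a simple average, since the text does not specify the normalization formula. Tone likelihoods could be handled in the same way and are omitted for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    item_type: str        # test content type, e.g. "character", "word"
    kind: str             # "initial" or "final"
    start: float          # segment start time in seconds
    end: float            # segment end time in seconds
    align_loglik: float   # log-likelihood from text-constrained segmentation
    recog_loglik: float   # log-likelihood from unconstrained recognition

    @property
    def duration(self) -> float:
        return self.end - self.start

    @property
    def log_likelihood_ratio(self) -> float:
        # log(L_align / L_recog) = log L_align - log L_recog
        return self.align_loglik - self.recog_loglik


def initial_final_duration_ratio(segments):
    """Total duration of initials divided by total duration of finals."""
    initial = sum(s.duration for s in segments if s.kind == "initial")
    final = sum(s.duration for s in segments if s.kind == "final")
    return initial / final if final > 0 else float("nan")


def summarize(segments, by_item_type=True):
    """Average the parameters per content type, or over all segments."""
    groups = defaultdict(list)
    for seg in segments:
        groups[seg.item_type if by_item_type else "all"].append(seg)

    summary = {}
    for key, group in groups.items():
        n = len(group)
        summary[key] = {
            "mean_duration": sum(s.duration for s in group) / n,
            "initial_final_ratio": initial_final_duration_ratio(group),
            "mean_align_loglik": sum(s.align_loglik for s in group) / n,
            "mean_log_likelihood_ratio":
                sum(s.log_likelihood_ratio for s in group) / n,
        }
    return summary
```

Calling `summarize(segments)` gives one parameter set per content type, while `summarize(segments, by_item_type=False)` gives a single set over all pronunciations, matching the two normalization options in step 7).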

Claims

1. An automatic test method for spoken pronunciation level, characterized by comprising the following steps:

(a) computing the fundamental frequency (F0) parameters of the speech frame by frame;

(5) computing, frame by frame, the MFCC parameters, F0 parameters and formants of the recorded test questions; using the standard segment models and the text corresponding to the speech to segment the MFCC parameters of the recording into segments, obtaining at the same time the segmentation likelihood of each segment relative to the standard model; using the standard tone models and the text corresponding to the recording to segment the F0 parameters of the recording into segments, obtaining at the same time the segmentation likelihood of each tone relative to the standard model; computing, from the segmentation, segment-level evaluation parameters such as the duration of each segment, the initial-to-final duration ratio, and the formant sequence of each segment; performing segment recognition and tone recognition on the recording with the standard segment models and tone models, obtaining the recognition results and the corresponding segment recognition likelihoods and tone recognition likelihoods; dividing the segmentation likelihoods of the segments and tones by the recognition likelihoods to obtain the likelihood ratios of the segments and tones; and normalizing the segment durations, the initial-to-final duration ratio, the segment likelihoods, the segment likelihood ratios, the tone likelihoods and the tone likelihood ratios to obtain the evaluation result for the testee.