CN101751919B - Spoken Chinese stress automatic detection method - Google Patents

Spoken Chinese stress automatic detection method

Info

Publication number
CN101751919B
Authority
CN
China
Prior art keywords
syllable
fundamental frequency
sentence
average
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2008102388779A
Other languages
Chinese (zh)
Other versions
CN101751919A (en)
Inventor
徐波
朱涛涛
浦剑涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN2008102388779A priority Critical patent/CN101751919B/en
Publication of CN101751919A publication Critical patent/CN101751919A/en
Application granted granted Critical
Publication of CN101751919B publication Critical patent/CN101751919B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to an automatic detection method for stress in spoken Chinese. In the method, speech recognition technology is used to automatically segment and align a speaker's read (or repeated-after-a-recording) speech with the corresponding text content; speech signal processing techniques, combined with basic linguistic theory, are used to extract characteristic parameters from the segmented speech segments; and a machine learning method then classifies the extracted parameters to obtain detection and diagnostic information about the speaker's stressed and unstressed syllables in spoken Chinese. The invention can automatically and effectively detect whether the speaker stresses the correct syllables and can be used in an automatic Putonghua test and evaluation system, helping students better master stressed syllables so that the intended meaning is expressed more clearly. Adding stress detection and diagnosis to Putonghua computer-assisted teaching software helps the speaker's Putonghua sound more native, thereby achieving the goal of efficient communication.

Description

Automatic detection method for stress in spoken Chinese
Technical field
The present invention relates to the fields of speech recognition, automatic Putonghua test and evaluation, and spoken Chinese teaching, and in particular to a practical method for detecting stress in spoken Chinese.
Background technology
In recent years, with the development of speech recognition technology, automatic Putonghua test and evaluation systems have received widespread attention in China. Such systems integrate sophisticated objective evaluation technologies from speech recognition, examination and psychology, and use techniques such as pronunciation assessment and prosody assessment to automatically judge the correctness of the tester's answers, the accuracy of pronunciation and prosody, and the fluency of expression. They provide an evaluation of each aspect of the tester's performance and, on the basis of the combined scores, give an overall estimate of the tester's spoken language proficiency. In automatic speech recognition and automatic Putonghua proficiency evaluation, stress is an important index for measuring speech quality and an indispensable feature of prosody.
From a linguistic point of view, every sentence has its theme and focus. Grammar solves the problem of the formal correctness of a sentence, while stress determines whether the content expressed in a concrete context is conveyed effectively, whether the emotion is distinct, and whether the focus is clear. It can be said that the placement of stress plays a critical role in conveying meaning. From a technical point of view, speech recognition, and in particular large-vocabulary continuous speech recognition, has matured, and building models with data-driven techniques makes it feasible to solve the detection and diagnosis of stress in spoken language.
Summary of the invention
The objective of the invention is to use a computer to locate the stress positions in spoken communication and to apply this to Putonghua teaching software and automatic test and evaluation systems. To this end, a method and device for automatically detecting stress in spoken Chinese are proposed.
To achieve this purpose, the present invention provides a method for detecting stress in spoken Chinese, comprising the following steps:
Step 1: automatically segment and align the speech file with its corresponding standard reading text;
Step 2: use speech processing technology combined with basic linguistic theory to extract fundamental frequency, duration, absolute energy and spectral feature parameters from the segmented speech segments;
Step 3: normalize the extracted feature parameters within the sentence to obtain normalized feature parameters;
Step 4: preprocess the normalized feature parameters to obtain the feature parameters used for classification;
Step 5: use a classifier to classify the preprocessed feature parameters and obtain a machine score for each syllable;
Step 6: set a threshold to detect the stressed or unstressed syllables of interest and output diagnostic information.
Beneficial effects of the invention: the invention uses automatic detection technology to effectively detect where a reader places stress incorrectly. Used in Putonghua computer-assisted teaching software, it can serve as a teaching aid. Applied to an automatic Putonghua test and evaluation system, it helps students better master Chinese stress so that the intended meaning is clearer. In teaching Chinese as a foreign language in particular, we find that the Chinese spoken by some foreigners contains no grammatical errors yet sometimes fails to convey the intended meaning; the reason is an improper grasp of Putonghua stress or tone. Adding stress detection and diagnosis to such computer-assisted teaching software helps foreign learners' Chinese sound more authentic, thereby achieving the goal of efficient communication.
Description of drawings
Figure 1 is a flow block diagram of the automatic stress detection method of the present invention.
Embodiment
The detailed problems involved in the technical solution of the present invention are explained below with reference to the accompanying drawing. It should be noted that the described embodiments are only intended to facilitate understanding of the present invention and do not limit it in any way.
In continuous spoken communication, duration, pitch and energy all vary in complex ways with position in the sentence, and these variations may all influence the distribution of stress. Therefore, we use the duration, pitch, energy and spectral features of the speech to distinguish stressed from unstressed syllables.
The technical essentials of the present invention are: automatically segment and align the speech file with its corresponding standard reading text; use speech processing technology combined with basic linguistic theory to extract fundamental frequency, duration, absolute energy and spectral feature parameters from the segmented speech segments; normalize the extracted feature parameters within the sentence to obtain normalized feature parameters; preprocess the normalized feature parameters to obtain the feature parameters used for classification; use a classifier to classify the preprocessed feature parameters and obtain a machine score for each syllable; and set a threshold to detect the stressed or unstressed syllables of interest and output a diagnosis.
The speech file is forcibly segmented against the standard reading text to obtain the speech segment of each Chinese syllable. The segment information includes the name, start time and end time of each syllable and of each phoneme composing the syllable, from which the duration of each speech segment is obtained; the duration information includes the duration of the whole sentence, the duration of each syllable and phoneme, and the average syllable duration. The fundamental frequency of each speech segment is then extracted to obtain the pitch of each syllable, and the extracted syllable fundamental frequency information is normalized by the average pitch of the whole sentence. The absolute energy of each syllable is computed from the timing information obtained after segmentation and is normalized by the sentence-average absolute energy to obtain the absolute energy feature of each syllable. The segmented speech is analyzed spectrally frame by frame, with a frame length of 25 ms and a frame shift of 10 ms; the speech of each syllable is pre-emphasized frame by frame, a Hamming window is applied, the spectrum of each frame is obtained with an FFT, and the sub-band energies of each frame are computed with band-pass filters. The sub-band energy of each syllable and the sentence-average sub-band energy are obtained from the syllable timing information, and the average sub-band energy of each syllable is divided by the sentence-average sub-band energy to obtain the sentence-normalized sub-band energy feature of each syllable; the sub-band energy features of each syllable are thus computed from the duration information obtained by forced segmentation. Finally, the above feature parameters are fed to the designed classifier to produce a machine score, and the machine score is thresholded to produce the detection result and diagnostic information.
The method mainly consists of the following stages: automatic segmentation and alignment of the speech file, extraction of the features of each syllable, and design of the classifier. The detailed process is as follows:
1. Given the speech file, the acoustic score of every frame of the speech file is computed using the standard reading text, the dictionary and the acoustic model, and a Viterbi search is used to obtain the automatic segmentation and alignment result for each syllable of the speech, namely a one-to-one correspondence between each initial and final in the speech and the initials and finals in the standard reading text, together with the time span of each pronounced syllable and of each phoneme composing it. The segmentation information for each syllable comprises the start time and end time of the syllable and of its initial and final.
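To make the alignment output concrete, the following is a minimal sketch (in Python, purely for illustration; the patent describes its system as implemented in C++) of how the per-syllable segmentation information described above could be represented. The class and field names are assumptions for this example, not the patent's data format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PhoneSegment:
    # One initial (shengmu) or final (yunmu) with its time span in seconds.
    name: str
    start: float
    end: float

@dataclass
class SyllableSegment:
    # One pronounced syllable aligned against the standard reading text.
    name: str
    start: float
    end: float
    phones: List[PhoneSegment] = field(default_factory=list)

    @property
    def duration(self) -> float:
        return self.end - self.start

# Example: the syllable "ming2" split into the initial "m" and the final "ing2".
ming = SyllableSegment(
    name="ming2", start=0.42, end=0.71,
    phones=[PhoneSegment("m", 0.42, 0.50), PhoneSegment("ing2", 0.50, 0.71)],
)
print(ming.duration)  # approximately 0.29 s
```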
2. According to the time span information of each pronounced syllable and its constituent phonemes obtained in step 1, the duration, fundamental frequency, absolute energy and spectral features of each syllable are extracted and normalized within the sentence to serve as features for stress detection, as follows:
2.1) Duration features:
According to linguistic knowledge, duration is a basic acoustic feature of Chinese stress, so we take the duration of each syllable after segmentation. Let the duration of the first syllable be T_1 and the duration of the n-th syllable be T_n; the average duration T_{aver} is defined as

T_{aver} = \frac{1}{N} \sum_{n=1}^{N} T_n

where N is the number of syllables in the speech. The duration feature parameter is obtained by first computing the average duration of a stretch of speech, the average duration being the sum of the durations of all syllables divided by the number of syllables;
2.1.1 The duration of each syllable is divided by the average duration to obtain the average-duration-normalized duration feature of each syllable within the sentence.
2.1.2 The duration of each syllable is divided by the rate of speech (ROS) to obtain the speech-rate-normalized duration feature of each syllable within the sentence. Because pauses may occur within a sentence, ROS is computed as

ROS = \frac{T_N^{end} - T_1^{start} - T_{silence}}{N}

where T_N^{end} is the end time of the last syllable, T_1^{start} is the start time of the first syllable, and T_{silence} is the total duration of silence between the first and the last word, obtained by endpoint detection.
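As an illustration of the duration features in 2.1.1 and 2.1.2, the following Python sketch computes the average-duration-normalized and ROS-normalized duration of each syllable from the segmentation times, under the ROS definition given above. The function name and input layout are assumptions made for the example.

```python
from typing import List, Tuple

def duration_features(
    syllables: List[Tuple[float, float]],  # (start, end) of each syllable, in seconds
    silence_total: float,                  # total silence between first and last syllable
) -> List[Tuple[float, float]]:
    """Return (T_n / T_aver, T_n / ROS) for every syllable in the sentence."""
    n = len(syllables)
    durations = [end - start for start, end in syllables]
    t_aver = sum(durations) / n                                      # average syllable duration
    ros = (syllables[-1][1] - syllables[0][0] - silence_total) / n   # ROS as defined above
    return [(d / t_aver, d / ros) for d in durations]

# Toy example: three syllables with a 0.2 s pause between the second and third.
feats = duration_features([(0.0, 0.3), (0.3, 0.5), (0.7, 1.1)], silence_total=0.2)
print(feats)
```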
2.2) Fundamental frequency features:
A preliminary fundamental frequency is extracted from the speech file frame by frame using the correlation method, and the octave-doubling and octave-halving errors produced during extraction are corrected; spline interpolation is applied at positions where no fundamental frequency exists, yielding the continuous fundamental frequency contour of the whole sentence, with fundamental frequency values ranging from 50 Hz to 500 Hz. Each phoneme after segmentation consists of at least 3 frames, and each frame yields one fundamental frequency value after interpolation. The mean, maximum, minimum, endpoint value and range of the fundamental frequency of each syllable are then computed as the fundamental frequency feature parameters; in these computations we consider only the vowel (final) phoneme segment of each syllable. The details are as follows:
A. The mean fundamental frequency of each syllable is normalized by the whole-sentence mean fundamental frequency, i.e., the mean fundamental frequency of each syllable is divided by the mean fundamental frequency of the whole sentence.
B. The maximum fundamental frequency of each syllable is normalized by the whole-sentence mean fundamental frequency, i.e., the maximum fundamental frequency of each syllable is divided by the mean fundamental frequency of the whole sentence.
C. The minimum fundamental frequency of each syllable is normalized by the whole-sentence mean fundamental frequency, i.e., the minimum fundamental frequency of each syllable is divided by the mean fundamental frequency of the whole sentence.
D. The endpoint fundamental frequency value of each syllable is normalized by the whole-sentence mean fundamental frequency, i.e., the endpoint fundamental frequency value of each syllable is divided by the mean fundamental frequency of the whole sentence.
E. The fundamental frequency range of each syllable is normalized by the whole-sentence mean fundamental frequency, i.e., the maximum fundamental frequency within each syllable minus the minimum is divided by the mean fundamental frequency of the whole sentence.
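The five fundamental frequency features A to E can be illustrated with the short Python sketch below. It assumes the interpolated F0 contours (restricted to the final/vowel frames of each syllable) are already available; the F0 extraction itself (correlation method, octave-error correction, spline interpolation) is not shown, and the function name is an assumption.

```python
from typing import List
import numpy as np

def f0_features(syllable_f0: List[np.ndarray], sentence_f0: np.ndarray) -> np.ndarray:
    """One row per syllable: [mean, max, min, endpoint, range],
    each divided by the whole-sentence mean F0 (features A to E above)."""
    sent_mean = float(np.mean(sentence_f0))
    rows = []
    for f0 in syllable_f0:
        f_max, f_min = float(np.max(f0)), float(np.min(f0))
        rows.append([
            float(np.mean(f0)) / sent_mean,   # A: normalized F0 mean
            f_max / sent_mean,                # B: normalized F0 maximum
            f_min / sent_mean,                # C: normalized F0 minimum
            float(f0[-1]) / sent_mean,        # D: normalized endpoint F0
            (f_max - f_min) / sent_mean,      # E: normalized F0 range
        ])
    return np.asarray(rows)
```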
2.3) Absolute energy features:
The absolute energy of each syllable is computed from the syllable time spans obtained in step 1. The absolute energy E_i of the i-th syllable is computed as

E_i = \log \sum_{n=iStart}^{iEnd} A_n^2

where A_n is the amplitude of the n-th audio sample, E_i is the energy of the i-th syllable, iStart and iEnd are the start and end sample indices of the i-th syllable, and n is an integer between the start and end sample indices. The sentence-average absolute energy E_{ave} is obtained from

E_{ave} = \frac{1}{N} \sum_{i=1}^{N} E_i

where N is the number of syllables in the sentence.
The absolute energy of each syllable is normalized by the sentence-average absolute energy of the speech, i.e., the absolute energy of each syllable is divided by the sentence-average absolute energy.
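A minimal Python sketch of the absolute energy feature defined above, computing E_i = log Σ A_n² per syllable and normalizing by the sentence-average energy; the function signature and input layout are assumptions for illustration.

```python
from typing import List, Tuple
import numpy as np

def energy_features(
    samples: np.ndarray,                      # waveform amplitudes A_n
    syllable_bounds: List[Tuple[int, int]],   # (iStart, iEnd) sample indices per syllable
) -> np.ndarray:
    """Absolute energy per syllable, divided by the sentence-average absolute energy."""
    energies = np.array([
        np.log(np.sum(samples[start:end + 1].astype(np.float64) ** 2))
        for start, end in syllable_bounds     # E_i = log of the summed squared amplitudes
    ])
    return energies / energies.mean()         # E_i / E_ave
```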
2.4) Spectral features:
The speech frames of each syllable are processed frame by frame. After framing, the frame signal s_n is usually pre-emphasized to obtain \hat{s}_n:

\hat{s}_n = s_n - 0.97 s_{n-1}

The pre-emphasis mainly amplifies the high-frequency components of the speech signal s_n to compensate for the attenuation caused by lip radiation; 0.97 is the pre-emphasis factor. Finally, before frequency-domain analysis, each frame is windowed (usually with a Hamming window) to compensate for the spectral leakage caused by truncating the signal:

s'_n = \left\{ 0.54 - 0.46 \cos\left( \frac{2\pi (n-1)}{N-1} \right) \right\} \hat{s}_n

where s'_n is the value of the n-th point after windowing and N is the number of points of the Hamming window. The frequency-domain information of each frame is obtained with an FFT, and band-pass filters are used to compute the sub-band energies of each frame; the sub-band energy of each syllable and the sentence-average sub-band energy are obtained from the syllable timing information, and the average sub-band energy of each syllable is divided by the sentence-average sub-band energy to obtain the sentence-normalized sub-band energy feature of each syllable. Extensive experiments show that frequency-domain sub-band energies are helpful for stress detection: the sub-band energies from 50 Hz to 500 Hz, from 500 Hz to 2200 Hz, from 2200 Hz to 4000 Hz, and from the mid-frequency 4000 Hz up to the high-frequency 8000 Hz work well for stress classification. Therefore band-pass FIR filters are designed for these bands, the sub-band energies for 50 Hz to 500 Hz, 500 Hz to 2200 Hz, 2200 Hz to 4000 Hz and 4000 Hz to 8000 Hz are computed, and each is normalized by the corresponding sentence-average sub-band energy to obtain the sub-band energy features for stress in this speech.
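The sub-band energy computation can be sketched as follows in Python. Note that this sketch sums FFT-bin power within each band rather than applying the band-pass FIR filters described in the text, and the function names are assumptions; it is meant only to illustrate the framing, pre-emphasis, Hamming windowing, FFT and band-grouping steps.

```python
import numpy as np

BANDS_HZ = [(50, 500), (500, 2200), (2200, 4000), (4000, 8000)]

def subband_energies(frame: np.ndarray, fs: int = 16000) -> np.ndarray:
    """Sub-band energies of one frame (pre-emphasis + Hamming window + FFT)."""
    frame = frame.astype(np.float64)
    # Pre-emphasis: s_hat[n] = s[n] - 0.97 * s[n-1]
    emphasized = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])
    # Hamming window: 0.54 - 0.46 * cos(2*pi*(n-1)/(N-1))
    windowed = emphasized * np.hamming(len(emphasized))
    spectrum = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(windowed), d=1.0 / fs)
    return np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in BANDS_HZ])

def frame_signal(signal: np.ndarray, fs: int = 16000,
                 frame_ms: float = 25.0, shift_ms: float = 10.0):
    """Split the waveform into 25 ms frames with a 10 ms shift, as in the description."""
    size, shift = int(fs * frame_ms / 1000), int(fs * shift_ms / 1000)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, shift)]
```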
2.5) Data preprocessing
Preprocessing the data helps speed up the classifier's handling of the data. Two schemes are used here. The first compresses the data into an interval between a lower and an upper bound: for each feature dimension, the maximum F_{max} and minimum F_{min} are found and the data are then normalized as

F_{norm} = lower + \frac{F_{raw} - F_{min}}{F_{max} - F_{min}} (upper - lower)

where F_{norm} is the feature value after normalization, F_{raw} is the feature value before normalization, upper is the upper bound of the normalization interval, and lower is the lower bound.
The other method subtracts the mean and divides by the variance:

F_{norm} = \frac{F_{raw} - AVER}{VAR}

where AVER is the mean of the feature dimension and VAR is the variance of the feature dimension.
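Both preprocessing schemes can be written compactly; the following Python sketch follows the two formulas above (the second scheme divides by the variance, exactly as stated). Function names are assumptions.

```python
import numpy as np

def minmax_scale(features: np.ndarray, lower: float = -1.0, upper: float = 1.0) -> np.ndarray:
    """Scheme 1: compress each feature dimension into the interval [lower, upper]."""
    f_min, f_max = features.min(axis=0), features.max(axis=0)
    return lower + (features - f_min) / (f_max - f_min) * (upper - lower)

def mean_var_scale(features: np.ndarray) -> np.ndarray:
    """Scheme 2: subtract the per-dimension mean and divide by the per-dimension variance."""
    return (features - features.mean(axis=0)) / features.var(axis=0)
```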
3. Stress detection and diagnosis
A classifier (for example a support vector machine, artificial neural network, decision tree, random forest or Gaussian mixture model classifier) is used with a machine learning method to classify the extracted feature parameters, and each syllable to be detected is processed to obtain a machine score. The support vector machine classifier uses a radial basis kernel function. The machine score of a syllable is judged by setting a threshold: scores above the threshold are judged as stressed, and scores below the threshold as unstressed.
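As an illustration of the classification and thresholding step, the sketch below uses scikit-learn's RBF-kernel support vector machine and treats the predicted probability of the stressed class as the machine score; this is not the patent's C++ implementation, and the use of probability output as the score and the 0.5 default threshold are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def train_stress_classifier(features: np.ndarray, labels: np.ndarray) -> SVC:
    """Fit an RBF-kernel SVM on the preprocessed per-syllable feature vectors.
    labels: 1 for a stressed syllable, 0 for an unstressed one."""
    clf = SVC(kernel="rbf", probability=True)
    clf.fit(features, labels)
    return clf

def detect_stress(clf: SVC, features: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Threshold the machine score of each syllable into stressed / unstressed."""
    scores = clf.predict_proba(features)[:, 1]  # probability of the stressed class
    return scores >= threshold                  # True = stressed, False = unstressed
```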
The stress results produced by classification and detection are compared with the stress annotations of the reading text to confirm whether the stressed words are pronounced correctly. The reader is prompted, in text form, whether a word that should be stressed was stressed in the pronunciation and whether a word not marked as stressed was pronounced with stress. Because Chinese is a monosyllabic language, single characters are judged by syllable; a word is converted into a combination of several syllables, which are then detected and diagnosed.
The system embodying the method mainly consists of the above three parts. It was implemented on a PC in C++, with algorithms and programs written to realize the method. The novelty of this technique is: after the speech is automatically segmented, the fundamental frequency, energy, duration and spectral feature parameters are extracted; the extracted feature parameters are given specific normalization and data preprocessing; a classifier is used to classify the preprocessed data and obtain a machine score for each syllable; and by setting a threshold the stressed or unstressed syllables of interest are detected and diagnosed. The threshold can be a fixed value or can be set according to different situations (for example, according to the accent differences of people from different regions).
Example
Chinese is a graceful language with rich and profound connotations. For this stress detection method, we selected some commonly used sentences for experimental study. An example follows:
The reading text:
1) Will (Li Ming) go to the Bird's Nest on Sunday to take part in the Olympic Games?
2) Will Li Ming go to the Bird's Nest (on Sunday) to take part in the Olympic Games?
3) Will Li Ming go to (the Bird's Nest) on Sunday to take part in the Olympic Games?
4) Will Li Ming go to the Bird's Nest on Sunday to take part in (the Olympic Games)?
The four sentences above carry stress marks; the word in parentheses is the one to be stressed. We can see that the same sentence, read with stress on different words, conveys very different themes. The first sentence emphasizes the athlete, Li Ming; the second emphasizes the time, Sunday; the third emphasizes the place, the Bird's Nest; and the fourth emphasizes the worldwide sports event themed on the Olympic Games. If the reader reads the sentence with the stress pattern of "Will Li Ming go to the Bird's Nest (on Sunday) to take part in the Olympic Games?", the system automatically detects that the reader's stress placement does not follow the requirement of the reading text, points out that the stress was not pronounced as required, and prompts the reader that the stress should be placed on the first word above.
We recorded a total of 8000 standard Mandarin recordings from native Beijing speakers, marking the parts to be stressed during recording; the sampling rate of the recordings is 16 kHz. The readers read the marked text aloud or repeated after a recording, strictly following the recording standard to guarantee the accuracy of the recorded data. Stress detection is then performed on the readers' speech with the method of Figure 1. The steps are as follows:
The recorded speech and the standard reading text are automatically segmented and aligned.
Feature parameters are extracted from the segmented speech and preprocessed; the extracted feature parameters include duration parameters, energy parameters, fundamental frequency parameters and spectral parameters, and the feature parameters are normalized and preprocessed.
The preprocessed feature parameters are processed by the classifier so that each syllable obtains a machine score; a threshold is then applied to obtain the detection results for the stressed and unstressed parts of the spoken message, and finally the results are compared with the reading text on which the stressed parts were marked to produce a pronunciation diagnosis report for the stressed parts.
The above are merely embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person familiar with this technology can readily conceive within the technical scope disclosed by the present invention shall be covered by the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. An automatic detection method for stress in spoken Chinese, characterized by comprising:
Step 1: computing the acoustic score of every frame of the speech file using the standard reading text, the dictionary and the acoustic model, and using a Viterbi search to obtain the automatic segmentation and alignment result for each syllable of the speech, namely a one-to-one correspondence between each initial and final in the speech and the initials and finals in the standard reading text, together with the time span of each pronounced syllable and of each phoneme composing it, the segmentation information of each syllable comprising the start time and end time of the syllable and of its initial and final;
Step 2: according to the time span information of each pronounced syllable and its constituent phonemes obtained in step 1, extracting the duration, fundamental frequency, absolute energy and spectral features of each syllable and normalizing them within the sentence;
Step 3: preprocessing the normalized feature parameters to obtain the feature parameters used for classification;
Step 4: using a classifier to classify the preprocessed feature parameters to obtain a machine score for each syllable;
Step 5: setting a threshold to detect the stressed or unstressed syllables of interest and outputting diagnostic information.
2. The spoken Chinese stress detection method according to claim 1, characterized in that extracting the fundamental frequency features of each syllable and normalizing them within the sentence comprises:
extracting the fundamental frequency of the speech file frame by frame using the correlation method, correcting the octave-doubling and octave-halving errors produced during extraction, and applying spline interpolation at positions where no fundamental frequency exists to obtain the continuous fundamental frequency contour of the whole sentence, each phoneme after segmentation consisting of 3 or more frames and each frame yielding one fundamental frequency value after interpolation; then computing the mean, maximum, minimum, endpoint value and range of the fundamental frequency of each syllable as the fundamental frequency feature parameters, the range of speech frames of each syllable used in computing the above fundamental frequency feature parameters being the vowel (final) phoneme segment of that syllable, specifically:
A. normalizing the mean fundamental frequency of each syllable by the whole-sentence mean fundamental frequency, i.e., dividing the mean fundamental frequency of each syllable by the mean fundamental frequency of the whole sentence;
B. normalizing the maximum fundamental frequency of each syllable by the whole-sentence mean fundamental frequency, i.e., dividing the maximum fundamental frequency of each syllable by the mean fundamental frequency of the whole sentence;
C. normalizing the minimum fundamental frequency of each syllable by the whole-sentence mean fundamental frequency, i.e., dividing the minimum fundamental frequency of each syllable by the mean fundamental frequency of the whole sentence;
D. normalizing the endpoint fundamental frequency value of each syllable by the whole-sentence mean fundamental frequency, i.e., dividing the endpoint fundamental frequency value of each syllable by the mean fundamental frequency of the whole sentence;
E. normalizing the fundamental frequency range of each syllable by the whole-sentence mean fundamental frequency, i.e., dividing the difference between the maximum and minimum fundamental frequency within each syllable by the mean fundamental frequency of the whole sentence.
3. The spoken Chinese stress detection method according to claim 1, characterized in that extracting the absolute energy feature of each syllable and normalizing it within the sentence comprises: computing the absolute energy of each syllable and the sentence-average absolute energy from the syllable time spans obtained in step 1, and normalizing the absolute energy of each syllable within the sentence by the sentence-average absolute energy.
4. The spoken Chinese stress detection method according to claim 3, characterized in that the in-sentence normalization method is: dividing the absolute energy of each syllable by the sentence-average absolute energy.
5. The spoken Chinese stress detection method according to claim 1, characterized in that extracting the duration feature of each syllable and normalizing it within the sentence comprises: first computing the average duration of a stretch of speech, the average duration being the sum of the durations of all syllables divided by the number of syllables; then dividing the duration of each syllable by the average duration to obtain the average-duration-normalized duration feature of each syllable within the sentence.
6. The spoken Chinese stress detection method according to claim 1, characterized in that extracting the duration feature of each syllable and normalizing it within the sentence comprises: dividing the duration of each syllable by the rate of speech to obtain the speech-rate-normalized duration feature.
7. The spoken Chinese stress detection method according to claim 1, characterized in that extracting the spectral features of each syllable and normalizing them within the sentence comprises: pre-emphasizing the speech of each syllable frame by frame, applying a Hamming window, obtaining the frequency-domain information of each frame with an FFT, computing the sub-band energies of each frame with band-pass filters, obtaining the sub-band energy of each syllable and the sentence-average sub-band energy from the syllable timing information, and dividing the average sub-band energy of each syllable by the sentence-average sub-band energy to obtain the sentence-normalized sub-band energy feature of each syllable.
8. The spoken Chinese stress detection method according to claim 1, characterized in that the classifier is a support vector machine, an artificial neural network, a decision tree, a random forest or a Gaussian mixture model classifier.
9. The spoken Chinese stress detection method according to claim 8, characterized in that the stress results detected by the classifier are compared with the reading text on which the stressed parts are marked to confirm whether the stressed words are pronounced correctly, and the reader is prompted in text form whether a word or expression that should be stressed was stressed in the pronunciation and whether a word or expression not marked as stressed was pronounced with stress.
CN2008102388779A 2008-12-03 2008-12-03 Spoken Chinese stress automatic detection method Active CN101751919B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008102388779A CN101751919B (en) 2008-12-03 2008-12-03 Spoken Chinese stress automatic detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008102388779A CN101751919B (en) 2008-12-03 2008-12-03 Spoken Chinese stress automatic detection method

Publications (2)

Publication Number Publication Date
CN101751919A CN101751919A (en) 2010-06-23
CN101751919B (en) 2012-05-23

Family

ID=42478790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008102388779A Active CN101751919B (en) 2008-12-03 2008-12-03 Spoken Chinese stress automatic detection method

Country Status (1)

Country Link
CN (1) CN101751919B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI431563B (en) 2010-08-03 2014-03-21 Ind Tech Res Inst Language learning system, language learning method, and computer product thereof
CN102376182B (en) * 2010-08-26 2014-08-27 财团法人工业技术研究院 Language learning system, language learning method and program product thereof
JP6263868B2 (en) * 2013-06-17 2018-01-24 富士通株式会社 Audio processing apparatus, audio processing method, and audio processing program
CN103474072B (en) * 2013-10-11 2016-06-01 福州大学 Utilize the quick anti-noise chirping of birds sound recognition methods of textural characteristics and random forest
CN104575519B (en) * 2013-10-17 2018-12-25 清华大学 The method, apparatus of feature extracting method, device and stress detection
WO2016014970A1 (en) * 2014-07-24 2016-01-28 Harman International Industries, Incorporated Text rule based multi-accent speech recognition with single acoustic model and automatic accent detection
CN109872727B (en) * 2014-12-04 2021-06-08 上海流利说信息技术有限公司 Voice quality evaluation device, method and system
CN107833572A (en) * 2017-11-06 2018-03-23 芋头科技(杭州)有限公司 The phoneme synthesizing method and system that a kind of analog subscriber is spoken
CN109036384B (en) * 2018-09-06 2019-11-15 百度在线网络技术(北京)有限公司 Audio recognition method and device
CN109087627A (en) * 2018-10-16 2018-12-25 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN109754822A (en) * 2019-01-22 2019-05-14 平安科技(深圳)有限公司 The method and apparatus for establishing Alzheimer's disease detection model
CN110060665A (en) * 2019-03-15 2019-07-26 上海拍拍贷金融信息服务有限公司 Word speed detection method and device, readable storage medium storing program for executing
CN109859548A (en) * 2019-04-15 2019-06-07 新乡学院 A kind of university's mandarin teaching aid equipment
CN111583961A (en) * 2020-05-07 2020-08-25 北京一起教育信息咨询有限责任公司 Stress evaluation method and device and electronic equipment
CN111292763B (en) * 2020-05-11 2020-08-18 新东方教育科技集团有限公司 Stress detection method and device, and non-transient storage medium
CA3097328C (en) * 2020-05-11 2021-12-21 Neworiental Education & Technology Group Ltd. Accent detection method and accent detection device, and non-transitory storage medium
CN111312231B (en) * 2020-05-14 2020-09-04 腾讯科技(深圳)有限公司 Audio detection method and device, electronic equipment and readable storage medium
CN111582746B (en) * 2020-05-15 2021-02-23 深圳看齐信息有限公司 Intelligent oral English examination system
CN115148225A (en) * 2021-03-30 2022-10-04 北京猿力未来科技有限公司 Intonation scoring method, intonation scoring system, computing device and storage medium
CN117012178A (en) * 2023-07-31 2023-11-07 支付宝(杭州)信息技术有限公司 Prosody annotation data generation method and device

Also Published As

Publication number Publication date
CN101751919A (en) 2010-06-23

Similar Documents

Publication Publication Date Title
CN101751919B (en) Spoken Chinese stress automatic detection method
CN102354495B (en) Testing method and system of semi-opened spoken language examination questions
CN101739867B (en) Method for scoring interpretation quality by using computer
CN101740024B (en) Method for automatic evaluation of spoken language fluency based on generalized fluency
CN103617799B (en) A kind of English statement pronunciation quality detection method being adapted to mobile device
CN101346758B (en) Emotion recognizer
CN102034475B (en) Method for interactively scoring open short conversation by using computer
CN101551947A (en) Computer system for assisting spoken language learning
CN102376182B (en) Language learning system, language learning method and program product thereof
CN110675854A (en) Chinese and English mixed speech recognition method and device
CN105825852A (en) Oral English reading test scoring method
CN104464423A (en) Calibration optimization method and system for speaking test evaluation
CN102426834B (en) Method for testing rhythm level of spoken English
CN103366735B (en) The mapping method of speech data and device
CN103366759A (en) Speech data evaluation method and speech data evaluation device
Meng et al. Development of automatic speech recognition and synthesis technologies to support Chinese learners of English: The CUHK experience
Kopparapu Non-linguistic analysis of call center conversations
CN109300339A (en) A kind of exercising method and system of Oral English Practice
CN107240394A (en) A kind of dynamic self-adapting speech analysis techniques for man-machine SET method and system
CN110176251B (en) Automatic acoustic data labeling method and device
Koudounas et al. Italic: An italian intent classification dataset
CN109697975B (en) Voice evaluation method and device
Hämäläinen et al. A multimodal educational game for 3-10-year-old children: collecting and automatically recognising european portuguese children’s speech
CN202758611U (en) Speech data evaluation device
Hönig Automatic assessment of prosody in second language learning

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: ANHUI USTC IFLYTEK CO., LTD.

Free format text: FORMER OWNER: RESEARCH INST. OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

Effective date: 20120529

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 100080 HAIDIAN, BEIJING TO: 230088 HEFEI, ANHUI PROVINCE

TR01 Transfer of patent right

Effective date of registration: 20120529

Address after: Wangjiang Road high tech Development Zone Hefei city Anhui province 230088 No. 666

Patentee after: Anhui USTC iFLYTEK Co., Ltd.

Address before: 100080 Zhongguancun East Road, Beijing, No. 95, No.

Patentee before: Institute of Automation, Chinese Academy of Sciences

C56 Change in the name or address of the patentee

Owner name: IFLYTEK CO., LTD.

Free format text: FORMER NAME: ANHUI USTC IFLYTEK CO., LTD.

CP01 Change in the name or title of a patent holder

Address after: Wangjiang Road high tech Development Zone Hefei city Anhui province 230088 No. 666

Patentee after: Iflytek Co., Ltd.

Address before: Wangjiang Road high tech Development Zone Hefei city Anhui province 230088 No. 666

Patentee before: Anhui USTC iFLYTEK Co., Ltd.