CN106531189A - Intelligent spoken language evaluation method - Google Patents
- Publication number
- CN106531189A (application CN201611181451.5A)
- Authority
- CN
- China
- Prior art keywords
- feature
- section
- vowel
- consonant
- spectrum energy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The present invention provides an intelligent spoken language evaluation method. A computer recording device captures the user's spoken voice data, and the user's phonetic features are extracted from that data. The user's phonetic features are then aligned and compared with standard phonetic features: vowels and consonants in the user's speech are compared one-to-one with the corresponding vowels and consonants of the standard pronunciation to form comparison data. The comparison data is scored, and both the data and the score are stored in a database. By comparing their speech against the standard, users can identify which words they pronounce inaccurately. As a result, language study becomes more convenient for learners, the efficiency of foreign-language learning improves, and the user's interest in learning increases.
Description
Technical field
The present invention relates to the field of language communication, and more particularly to an intelligent spoken language evaluation method.
Background technology
With the development of global economic integration, English plays an increasingly important role as an international language. Activities such as business exchange, cultural exchange, and cross-border tourism grow ever more frequent, and more and more people need to learn a foreign language, so improving oral communication skills has become an urgent demand of foreign-language study. How to improve the learning outcomes of a foreign language and better meet the needs of foreign-language learners has become a pressing problem to be solved.
Summary of the invention
In order to overcome the above shortcomings of the prior art, it is an object of the present invention to provide an intelligent spoken language evaluation method, the method including:
S1: acquiring the user's spoken voice data using the recording equipment of a computer, and extracting the user's phonetic features from the voice data;
S2: aligning the user's phonetic features with the standard phonetic features, comparing the vowels and consonants in the user's phonetic features with the corresponding vowels and consonants of the standard phonetic features, and obtaining comparison data;
S3: scoring the comparison data;
S4: storing the comparison data and the scoring result in a database.
Preferably, the method further includes, before step S1: setting a standard read-aloud text and acquiring the standard phonetic features of that text;
segmenting the standard phonetic features in time into n segments, with 20 ms as one time slice;
dividing the standard phonetic features of each time period into static features and dynamic features;
decomposing the spectral energy of the standard phonetic features of each time period to obtain the spectral energy distribution of the vowel segments and that of the consonant segments;
setting the vowel-segment MFCC feature vectors and consonant-segment MFCC feature vectors of the standard phonetic features in each time period;
storing the vowel-segment and consonant-segment MFCC feature vectors of each time period in the database.
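As a concrete illustration of the 20 ms time segmentation and windowing described above, the following sketch splits a signal into consecutive 20 ms slices and applies a Hamming window. The 16 kHz sample rate is an assumption for illustration; the patent does not specify one.

```python
import numpy as np

def frame_signal(signal, sample_rate=16000, frame_ms=20):
    """Split a speech signal into consecutive 20 ms time slices and
    apply a Hamming window to each, as in the segmentation step."""
    frame_len = int(sample_rate * frame_ms / 1000)   # 320 samples at 16 kHz
    n_frames = len(signal) // frame_len              # n, the number of segments
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    # the method allows a rectangular or Hamming window; Hamming shown here
    return frames * np.hamming(frame_len)

# one second of audio at 16 kHz -> 50 frames of 20 ms each
audio = np.random.randn(16000)
frames = frame_signal(audio)
```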
Preferably, step S1 further includes:
S11: segmenting the user voice data in time into n segments, with 20 ms as one time slice, and applying a rectangular or Hamming window to each time period of user voice data to obtain the segmented speech signal Xn, where n is the segment index;
S12: applying a short-time Fourier transform to the segmented speech signal Xn, converting the time-domain signal into the frequency-domain signal Yn, and computing its short-time energy spectrum Qn = |Yn|²;
S13: passing the short-time energy spectrum Qn from the vector space S through the bandpass filters in first-in-first-out order; since the components within each frequency band superpose in their effect on the human ear, the energies within each filter band are summed, giving the output power spectrum x'(k) of the k-th filter;
S14: taking the logarithm of each filter output to obtain the log power spectrum of the band, then applying the inverse discrete cosine transform to obtain M MFCC coefficients, where M typically takes 13-15; the MFCC coefficients are C(m) = Σk log x'(k)·cos(πm(2k−1)/(2K)), m = 1, 2, …, M, where K is the number of filters;
S15: taking the user-speech MFCC features of each time period as static features, then computing their first- and second-order differences to obtain the corresponding dynamic features.
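Steps S12-S15 follow the standard MFCC recipe. The sketch below is a minimal, hedged illustration of that recipe using NumPy only; the FFT size (512), filter count (26), and 16 kHz sample rate are assumptions not stated in the patent.

```python
import numpy as np

def mfcc_frame(frame, sample_rate=16000, n_filters=26, n_coeffs=13):
    """Steps S12-S14 for one windowed frame: power spectrum |Yn|^2,
    mel bandpass filterbank with in-band energies summed to x'(k),
    log power spectrum, then DCT -> M (=13) MFCC coefficients."""
    nfft = 512
    power = np.abs(np.fft.rfft(frame, nfft)) ** 2          # S12: Qn = |Yn|^2
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(0.0, hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for k in range(1, n_filters + 1):                      # S13: triangular filters
        fbank[k - 1, bins[k - 1]:bins[k]] = np.linspace(0, 1, bins[k] - bins[k - 1], endpoint=False)
        fbank[k - 1, bins[k]:bins[k + 1]] = np.linspace(1, 0, bins[k + 1] - bins[k], endpoint=False)
    xk = fbank @ power                                     # in-band energies summed
    log_power = np.log(xk + 1e-10)                         # S14: log power spectrum
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(1, n_coeffs + 1), 2 * n + 1) / (2 * n_filters))
    return dct @ log_power                                 # M MFCC coefficients

def dynamic_features(static):
    """S15: first- and second-order differences of the static MFCC
    features give the dynamic features."""
    d1 = np.diff(static, axis=0, prepend=static[:1])
    d2 = np.diff(d1, axis=0, prepend=d1[:1])
    return d1, d2

coeffs = mfcc_frame(np.hamming(320) * np.random.randn(320))
```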
Preferably, step S1 further includes:
obtaining the spectral energy (fk) of the frequency range of each speech segment, together with the upper frequency limit k1 and lower frequency limit k2 of the segment, and obtaining the spectral energy ratio PNn within the speech segment.
Preferably, step S1 further includes:
if the spectral energy (fk) in a speech segment is ≥ the first threshold and the spectral energy ratio PNn in the segment is ≥ the second threshold, the segment is judged to be a vowel segment; the first threshold takes 0.1-0.5 and the second threshold 60%-85%;
taking the spectral energy of the vowel segment as reference, it is judged whether the zero-crossing rate of the spectral energy before the vowel segment exceeds the third threshold; if so, that spectral energy is concluded to be the consonant segment before the vowel; the third threshold takes 100;
taking the spectral energy of the vowel segment as reference, it is judged whether the zero-crossing rate of the spectral energy after the vowel segment exceeds the third threshold; if so, that spectral energy is judged to be the consonant after the vowel;
if the zero-crossing rate of the spectral energy after the vowel segment exceeds the third threshold and that spectral energy lies in the last frame of the speech segment, it is judged to be a nasal tail consonant.
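The vowel/consonant decision rules above can be sketched as follows. The concrete threshold values (0.3, 70%, 100) are picked from the ranges the text gives, and the per-segment energies, energy ratios, and zero-crossing rates are assumed to be precomputed:

```python
def label_segments(energies, ratios, zcrs, t1=0.3, t2=0.7, t3=100):
    """Decision rules sketched from the text: a segment whose spectral
    energy >= first threshold and energy ratio >= second threshold is a
    vowel segment; a neighbouring segment whose zero-crossing rate
    exceeds the third threshold is the consonant before/after that
    vowel; a high-ZCR final frame after a vowel is the nasal tail
    consonant."""
    labels = ["other"] * len(energies)
    for i, (e, r) in enumerate(zip(energies, ratios)):
        if e >= t1 and r >= t2:
            labels[i] = "vowel"
    for i, lab in enumerate(list(labels)):
        if lab != "vowel":
            continue
        if i > 0 and labels[i - 1] == "other" and zcrs[i - 1] > t3:
            labels[i - 1] = "consonant"                # consonant before the vowel
        if i + 1 < len(labels) and labels[i + 1] == "other" and zcrs[i + 1] > t3:
            # the last frame gets the nasal tail label; earlier frames are consonants
            labels[i + 1] = "tail-consonant" if i + 1 == len(labels) - 1 else "consonant"
    return labels

# three segments: fricative-like, vowel-like, high-ZCR final frame
labels = label_segments([0.05, 0.4, 0.05], [0.2, 0.8, 0.3], [150, 20, 120])
```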
Preferably, step S2 further includes:
setting the vowel-segment MFCC feature vectors and consonant-segment MFCC feature vectors of the user's phonetic features in each time period;
using the DTW algorithm to obtain the minimum-error alignment path and its corresponding DTW distance;
based on the alignment path and the corresponding DTW distance, comparing the vowel-segment MFCC feature vectors of the user's phonetic features with those of the standard phonetic features within the same time period, and likewise comparing the consonant-segment MFCC feature vectors, to obtain the pronunciation difference between the user's phonetic features and the standard phonetic features.
Preferably, step S2 further includes:
setting the vowel-segment standard feature vector of the standard phonetic features in each time period as P1 = [p1(1), p1(2), …, p1(R)], with first-order difference vector PΔ1 = [pΔ1(1), pΔ1(2), …, pΔ1(R)], where R is the vowel speech length of the standard phonetic features; PΔ1(n) = |p1(n) − p1(n−1)|, n = 1, 2, …, R, p1(0) = 0;
setting the consonant-segment standard feature vector of the standard phonetic features in each time period as P′1 = [p′1(1), p′1(2), …, p′1(R)], with first-order difference vector P′Δ1 = [p′Δ1(1), p′Δ1(2), …, p′Δ1(R)], where R is the speech length of the standard phonetic features; P′Δ1(n) = |p′1(n) − p′1(n−1)|, n = 1, 2, …, R, p′1(0) = 0.
Preferably, step S2 further includes:
setting the vowel-segment feature vector of the user's phonetic features in each time period as P2 = [p2(1), p2(2), …, p2(T)], with first-order difference vector PΔ2 = [pΔ2(1), pΔ2(2), …, pΔ2(T)], where T is the length of the speech to be evaluated; PΔ2(n) = |p2(n) − p2(n−1)|, n = 1, 2, …, T, p2(0) = 0;
setting the consonant-segment feature vector of the user's phonetic features in each time period as P′2 = [p′2(1), p′2(2), …, p′2(T)], with first-order difference vector P′Δ2 = [p′Δ2(1), p′Δ2(2), …, p′Δ2(T)], where T is the length of the speech to be evaluated; P′Δ2(n) = |p′2(n) − p′2(n−1)|, n = 1, 2, …, T, p′2(0) = 0;
using the DTW algorithm to obtain the minimum-error alignment path, and comparing the vowel segments and consonant segments within each time period;
the comparison yields the vowel-segment gap dp and its variation gap Δdp, and the consonant-segment gap d′p and its variation gap Δd′p, giving the similarity between the user's phonetic features and the standard phonetic features, i.e.:
dp = |p1(n) − p2(m)|
d′p = |p′1(n) − p′2(m)|
Δdp = |Δp1(n) − Δp2(m)|
Δd′p = |Δp′1(n) − Δp′2(m)|
where Δpi(n) = |pi(n) − pi(n−1)| and Δp′i(n) = |p′i(n) − p′i(n−1)|.
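The minimum-error alignment obtained with the DTW algorithm above can be illustrated with a minimal dynamic-programming sketch over 1-D feature sequences (the patent applies it to MFCC vector sequences; scalars are used here for brevity):

```python
import numpy as np

def dtw(ref, test):
    """Minimum-error DTW alignment between a standard feature sequence
    (length R) and a user feature sequence (length T), returning the
    total distance and the alignment path of (n, m) index pairs."""
    R, T = len(ref), len(test)
    cost = np.full((R + 1, T + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, R + 1):
        for j in range(1, T + 1):
            d = abs(ref[i - 1] - test[j - 1])          # local gap |p1(n) - p2(m)|
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    path, i, j = [], R, T                              # backtrack the best path
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return cost[R, T], path[::-1]

# a repeated value in the user sequence is absorbed by the alignment
dist, path = dtw([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
```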
Preferably, step S3 further includes: the score s is:
s = ω1(ω11s11 + ω12s12 + … + ω1js1j) + ω2(ω21s21 + ω22s22 + … + ω2js2j) + … + ωn(ωn1sn1 + ωn2sn2 + … + ωnjsnj)
where ω1, ω2, …, ωn are the weights of the respective speech segments;
j is the total number of vowel segments plus consonant segments in each speech segment;
ω11, ω12, …, ω1j are the weights of the syllables in the first speech segment;
s11, s12, …, s1j are the scores of the syllables in the first speech segment;
ω21, ω22, …, ω2j are the weights of the syllables in the second speech segment;
s21, s22, …, s2j are the scores of the syllables in the second speech segment;
ωn1, ωn2, …, ωnj are the weights of the syllables in the n-th speech segment;
sn1, sn2, …, snj are the scores of the syllables in the n-th speech segment.
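The weighted score s defined above can be sketched directly; the weight and per-syllable score values below are illustrative inputs, not values from the patent:

```python
def overall_score(segment_weights, syllable_weights, syllable_scores):
    """Step S3: s = sum over segments n of wn * (sum over syllables j
    of wnj * snj), i.e. a weighted sum of per-syllable scores inside
    weighted speech segments."""
    return sum(
        w_n * sum(w * s for w, s in zip(ws, ss))
        for w_n, ws, ss in zip(segment_weights, syllable_weights, syllable_scores)
    )

# two speech segments of equal weight, one syllable each
s = overall_score([0.5, 0.5], [[1.0], [1.0]], [[80.0], [90.0]])
```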
As can be seen from the above technical solution, the present invention has the following advantages:
The intelligent spoken language evaluation method lets the user obtain the same text as the computer and compare readings of it, so the user can learn which words in their speech are pronounced inaccurately compared with the standard pronunciation, and in which words they need to improve and continue studying. This brings convenience to language learners, improves the efficiency of foreign-language learning, and increases the user's interest in learning.
Description of the drawings
Fig. 1 is a flow chart of the intelligent spoken language evaluation method.
Specific embodiments
To make the object, features, and advantages of the invention more apparent and understandable, the technical scheme protected by the present invention is described clearly and completely below with reference to specific embodiments and the accompanying drawing. Obviously, the embodiments disclosed below are only a part of the embodiments of the invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of the embodiments in this patent, without creative work, fall within the scope of protection of this patent.
The present invention provides an intelligent spoken language evaluation method. As shown in Fig. 1, the method uses a standard read-aloud text: the computer first obtains the content of that text and its standard pronunciation. The method of the present invention is implemented on computer hardware together with the corresponding program. The user thus obtains the same text as the computer and the readings are compared, so the user can learn which words in their speech are pronounced inaccurately compared with the standard, and in which words they need to improve and continue studying. This brings convenience to language learners, improves the efficiency of foreign-language learning, and increases the user's interest in learning.
The method includes:
S1: acquiring the user's spoken voice data using the recording equipment of a computer, and extracting the user's phonetic features from the voice data;
S2: aligning the user's phonetic features with the standard phonetic features, comparing the vowels and consonants in the user's phonetic features with the corresponding vowels and consonants of the standard phonetic features, and obtaining comparison data;
S3: scoring the comparison data;
S4: storing the comparison data and the scoring result in a database.
Before step S1, the method further includes: setting a standard read-aloud text and acquiring the standard phonetic features of that text;
segmenting the standard phonetic features in time into n segments, with 20 ms as one time slice;
dividing the standard phonetic features of each time period into static features and dynamic features;
decomposing the spectral energy of the standard phonetic features of each time period to obtain the spectral energy distribution of the vowel segments and that of the consonant segments;
setting the vowel-segment MFCC feature vectors and consonant-segment MFCC feature vectors of the standard phonetic features in each time period;
storing the vowel-segment and consonant-segment MFCC feature vectors of each time period in the database.
Step S1 further includes:
S11: segmenting the user voice data in time into n segments, with 20 ms as one time slice, and applying a rectangular or Hamming window to each time period of user voice data to obtain the segmented speech signal Xn, where n is the segment index;
S12: applying a short-time Fourier transform to the segmented speech signal Xn, converting the time-domain signal into the frequency-domain signal Yn, and computing its short-time energy spectrum Qn = |Yn|²;
S13: passing the short-time energy spectrum Qn from the vector space S through the bandpass filters in first-in-first-out order; since the components within each frequency band superpose in their effect on the human ear, the energies within each filter band are summed, giving the output power spectrum x'(k) of the k-th filter;
S14: taking the logarithm of each filter output to obtain the log power spectrum of the band, then applying the inverse discrete cosine transform to obtain M MFCC coefficients, where M typically takes 13-15; the MFCC coefficients are C(m) = Σk log x'(k)·cos(πm(2k−1)/(2K)), m = 1, 2, …, M, where K is the number of filters;
S15: taking the user-speech MFCC features of each time period as static features, then computing their first- and second-order differences to obtain the corresponding dynamic features.
In the present embodiment, step S1 further includes:
obtaining the spectral energy (fk) of the frequency range of each speech segment, together with the upper frequency limit k1 and lower frequency limit k2 of the segment, and obtaining the spectral energy ratio PNn within the speech segment.
Step S1 further includes:
if the spectral energy (fk) in a speech segment is ≥ the first threshold and the spectral energy ratio PNn in the segment is ≥ the second threshold, the segment is judged to be a vowel segment; the first threshold takes 0.1-0.5 and the second threshold 60%-85%;
taking the spectral energy of the vowel segment as reference, it is judged whether the zero-crossing rate of the spectral energy before the vowel segment exceeds the third threshold; if so, that spectral energy is concluded to be the consonant before the vowel; the third threshold takes 100;
taking the spectral energy of the vowel segment as reference, it is judged whether the zero-crossing rate of the spectral energy after the vowel segment exceeds the third threshold; if so, that spectral energy is judged to be the consonant after the vowel;
if the zero-crossing rate of the spectral energy after the vowel segment exceeds the third threshold and that spectral energy lies in the last frame of the speech segment, it is judged to be a nasal tail consonant.
Each speech segment of the user is decomposed to obtain its vowel segments, its consonant segments, and whether the last frame of the segment carries a nasal tail consonant (a nasal sound).
The vowel segments, consonant segments, and any nasal tail consonant in the last frame of each speech segment of the standard read-aloud text are preset in the computer. The vowel segments, consonant segments, and last-frame nasal tail consonant of each speech segment read by the user are then each compared with the standard phonetic features.
In the present embodiment, step S2 further includes:
setting the vowel-segment MFCC feature vectors and consonant-segment MFCC feature vectors of the user's phonetic features in each time period;
using the DTW algorithm to obtain the minimum-error alignment path and its corresponding DTW distance;
based on the alignment path and the corresponding DTW distance, comparing the vowel-segment MFCC feature vectors of the user's phonetic features with those of the standard phonetic features within the same time period, and likewise comparing the consonant-segment MFCC feature vectors, to obtain the pronunciation difference between the user's phonetic features and the standard phonetic features.
In the present embodiment, step S2 further includes:
setting the vowel-segment standard feature vector of the standard phonetic features in each time period as P1 = [p1(1), p1(2), …, p1(R)], with first-order difference vector PΔ1 = [pΔ1(1), pΔ1(2), …, pΔ1(R)], where R is the vowel speech length of the standard phonetic features; PΔ1(n) = |p1(n) − p1(n−1)|, n = 1, 2, …, R, p1(0) = 0;
setting the consonant-segment standard feature vector of the standard phonetic features in each time period as P′1 = [p′1(1), p′1(2), …, p′1(R)], with first-order difference vector P′Δ1 = [p′Δ1(1), p′Δ1(2), …, p′Δ1(R)], where R is the speech length of the standard phonetic features; P′Δ1(n) = |p′1(n) − p′1(n−1)|, n = 1, 2, …, R, p′1(0) = 0.
Step S2 further includes:
setting the vowel-segment feature vector of the user's phonetic features in each time period as P2 = [p2(1), p2(2), …, p2(T)], with first-order difference vector PΔ2 = [pΔ2(1), pΔ2(2), …, pΔ2(T)], where T is the length of the speech to be evaluated; PΔ2(n) = |p2(n) − p2(n−1)|, n = 1, 2, …, T, p2(0) = 0;
setting the consonant-segment feature vector of the user's phonetic features in each time period as P′2 = [p′2(1), p′2(2), …, p′2(T)], with first-order difference vector P′Δ2 = [p′Δ2(1), p′Δ2(2), …, p′Δ2(T)], where T is the length of the speech to be evaluated; P′Δ2(n) = |p′2(n) − p′2(n−1)|, n = 1, 2, …, T, p′2(0) = 0;
using the DTW algorithm to obtain the minimum-error alignment path, and comparing the vowel segments and consonant segments within each time period;
the comparison yields the vowel-segment gap dp and its variation gap Δdp, and the consonant-segment gap d′p and its variation gap Δd′p, giving the similarity between the user's phonetic features and the standard phonetic features, i.e.:
dp = |p1(n) − p2(m)|
d′p = |p′1(n) − p′2(m)|
Δdp = |Δp1(n) − Δp2(m)|
Δd′p = |Δp′1(n) − Δp′2(m)|
where Δpi(n) = |pi(n) − pi(n−1)| and Δp′i(n) = |p′i(n) − p′i(n−1)|.
Step S3 further includes: the score s is:
s = ω1(ω11s11 + ω12s12 + … + ω1js1j) + ω2(ω21s21 + ω22s22 + … + ω2js2j) + … + ωn(ωn1sn1 + ωn2sn2 + … + ωnjsnj)
where ω1, ω2, …, ωn are the weights of the respective speech segments;
j is the total number of vowel segments plus consonant segments in each speech segment;
ω11, ω12, …, ω1j are the weights of the syllables in the first speech segment;
s11, s12, …, s1j are the scores of the syllables in the first speech segment;
in the first speech segment, s11 is the score of the first syllable whether that syllable is a consonant segment or a vowel segment, s12 likewise corresponds to the second syllable, and so on for each speech segment;
ω21, ω22, …, ω2j are the weights of the syllables in the second speech segment;
s21, s22, …, s2j are the scores of the syllables in the second speech segment;
ωn1, ωn2, …, ωnj are the weights of the syllables in the n-th speech segment;
sn1, sn2, …, snj are the scores of the syllables in the n-th speech segment.
Each weight parameter is obtained through a large number of experiments. It can also be derived from the weight proportions allocated to the speech segments, set according to the importance of each speech segment to the text, or set by the research staff to achieve the best effect after many experiments.
Claims (9)
1. An intelligent spoken language evaluation method, characterized in that the method includes:
S1: acquiring the user's spoken voice data using the recording equipment of a computer, and extracting the user's phonetic features from the voice data;
S2: aligning the user's phonetic features with the standard phonetic features, comparing the vowels and consonants in the user's phonetic features with the corresponding vowels and consonants of the standard phonetic features, and obtaining comparison data;
S3: scoring the comparison data;
S4: storing the comparison data and the scoring result in a database.
2. The intelligent spoken language evaluation method according to claim 1, characterized in that the method includes:
before step S1: setting a standard read-aloud text and acquiring the standard phonetic features of that text;
segmenting the standard phonetic features in time into n segments, with 20 ms as one time slice;
dividing the standard phonetic features of each time period into static features and dynamic features;
decomposing the spectral energy of the standard phonetic features of each time period to obtain the spectral energy distribution of the vowel segments and that of the consonant segments;
setting the vowel-segment MFCC feature vectors and consonant-segment MFCC feature vectors of the standard phonetic features in each time period;
storing the vowel-segment and consonant-segment MFCC feature vectors of each time period in the database.
3. The intelligent spoken language evaluation method according to claim 1, characterized in that step S1 further includes:
S11: segmenting the user voice data in time into n segments, with 20 ms as one time slice, and applying a rectangular or Hamming window to each time period of user voice data to obtain the segmented speech signal Xn, where n is the segment index;
S12: applying a short-time Fourier transform to the segmented speech signal Xn, converting the time-domain signal into the frequency-domain signal Yn, and computing its short-time energy spectrum Qn = |Yn|²;
S13: passing the short-time energy spectrum Qn from the vector space S through the bandpass filters in first-in-first-out order; since the components within each frequency band superpose in their effect on the human ear, the energies within each filter band are summed, giving the output power spectrum x'(k) of the k-th filter;
S14: taking the logarithm of each filter output to obtain the log power spectrum of the band, then applying the inverse discrete cosine transform to obtain M MFCC coefficients, where M typically takes 13-15; the MFCC coefficients are C(m) = Σk log x'(k)·cos(πm(2k−1)/(2K)), m = 1, 2, …, M, where K is the number of filters;
S15: taking the user-speech MFCC features of each time period as static features, then computing their first- and second-order differences to obtain the corresponding dynamic features.
4. The intelligent spoken language evaluation method according to claim 1, characterized in that step S1 further includes:
obtaining the spectral energy (fk) of the frequency range of each speech segment, together with the upper frequency limit k1 and lower frequency limit k2 of the segment, and obtaining the spectral energy ratio PNn within the speech segment.
5. The intelligent spoken language evaluation method according to claim 4, characterized in that step S1 further includes:
if the spectral energy (fk) in a speech segment is ≥ the first threshold and the spectral energy ratio PNn in the segment is ≥ the second threshold, the segment is judged to be a vowel segment; the first threshold takes 0.1-0.5 and the second threshold 60%-85%;
taking the spectral energy of the vowel segment as reference, it is judged whether the zero-crossing rate of the spectral energy before the vowel segment exceeds the third threshold; if so, that spectral energy is concluded to be the consonant segment before the vowel; the third threshold takes 100;
taking the spectral energy of the vowel segment as reference, it is judged whether the zero-crossing rate of the spectral energy after the vowel segment exceeds the third threshold; if so, that spectral energy is judged to be the consonant after the vowel;
if the zero-crossing rate of the spectral energy after the vowel segment exceeds the third threshold and that spectral energy lies in the last frame of the speech segment, it is judged to be a nasal tail consonant.
6. The intelligent spoken language evaluation method according to claim 5, characterized in that step S2 further includes:
setting the vowel-segment MFCC feature vectors and consonant-segment MFCC feature vectors of the user's phonetic features in each time period;
using the DTW algorithm to obtain the minimum-error alignment path and its corresponding DTW distance;
based on the alignment path and the corresponding DTW distance, comparing the vowel-segment MFCC feature vectors of the user's phonetic features with those of the standard phonetic features within the same time period, and likewise comparing the consonant-segment MFCC feature vectors, to obtain the pronunciation difference between the user's phonetic features and the standard phonetic features.
7. The intelligent spoken language evaluation method according to claim 5, characterized in that step S2 further comprises:
setting the vowel-section standard speech feature vector of the standard speech feature in each time period as P1 = [p1(1), p1(2), …, p1(R)], whose first-order difference vector is PΔ1 = [pΔ1(1), pΔ1(2), …, pΔ1(R)] (R is the vowel-section voice length of the standard speech feature), where pΔ1(n) = |p1(n) - p1(n-1)|, n = 1, 2, …, R, and p1(0) = 0;
setting the consonant-section standard speech feature vector of the standard speech feature in each time period as P'1 = [p'1(1), p'1(2), …, p'1(R)], whose first-order difference vector is P'Δ1 = [p'Δ1(1), p'Δ1(2), …, p'Δ1(R)] (R is the voice length of the standard speech feature), where p'Δ1(n) = |p'1(n) - p'1(n-1)|, n = 1, 2, …, R, and p'1(0) = 0.
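The first-order difference vectors defined above reduce to an absolute-difference recurrence with p(0) = 0; a minimal sketch:

```python
def first_order_difference(p):
    """First-order difference vector of claim 7: pΔ(n) = |p(n) - p(n-1)|, with p(0) = 0."""
    prev = 0.0
    out = []
    for x in p:
        out.append(abs(x - prev))
        prev = x
    return out

print(first_order_difference([3.0, 5.0, 2.0]))  # [3.0, 2.0, 3.0]
```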
8. The intelligent spoken language evaluation method according to claim 7, characterized in that step S2 further comprises:
setting the vowel-section feature vector of the user speech feature in each time period as P2 = [p2(1), p2(2), …, p2(T)], whose first-order difference vector is PΔ2 = [pΔ2(1), pΔ2(2), …, pΔ2(T)] (T is the length of the voice to be evaluated), where pΔ2(n) = |p2(n) - p2(n-1)|, n = 1, 2, …, T, and p2(0) = 0;
setting the consonant-section feature vector of the user speech feature in each time period as P'2 = [p'2(1), p'2(2), …, p'2(T)], whose first-order difference vector is P'Δ2 = [p'Δ2(1), p'Δ2(2), …, p'Δ2(T)] (T is the length of the voice to be evaluated), where p'Δ2(n) = |p'2(n) - p'2(n-1)|, n = 1, 2, …, T, and p'2(0) = 0;
using the DTW algorithm to obtain a minimum-error alignment path, and comparing the vowel sections and the consonant sections within each time period along that path;
the comparison yields the gap dp of the vowel section and the gap Δdp of its variation, as well as the gap d'p of the consonant section and the gap Δd'p of its variation, from which the similarity between the user speech feature and the standard speech feature is obtained, namely:
dp = |p1(n) - p2(m)|
d'p = |p'1(n) - p'2(m)|
Δdp = |Δp1(n) - Δp2(m)|
Δd'p = |Δp'1(n) - Δp'2(m)|
where Δpi(n) = |pi(n) - pi(n-1)|,
Δp'i(n) = |p'i(n) - p'i(n-1)|.
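For one aligned index pair (n, m) on the DTW path, the gap and the gap of the variation can be computed directly from the two feature vectors. The helper below is illustrative only; it follows the claim's 1-based indexing convention with p(0) = 0.

```python
def frame_gaps(p1, p2, n, m):
    """Gaps of claim 8 for one aligned pair (n, m).

    p1 -- standard (reference) feature vector for the vowel or consonant section
    p2 -- user feature vector
    Returns (d_p, delta_d_p): the feature gap and the gap of the variation.
    """
    def val(p, k):  # p(k) with the convention p(0) = 0
        return p[k - 1] if k >= 1 else 0.0

    d_p = abs(val(p1, n) - val(p2, m))
    dp1 = abs(val(p1, n) - val(p1, n - 1))   # Δp1(n)
    dp2 = abs(val(p2, m) - val(p2, m - 1))   # Δp2(m)
    delta_d_p = abs(dp1 - dp2)
    return d_p, delta_d_p

print(frame_gaps([1.0, 4.0], [1.0, 3.0], 2, 2))  # (1.0, 1.0)
```

The same helper applies unchanged to the consonant-section vectors P'1 and P'2, yielding d'p and Δd'p.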
9. The intelligent spoken language evaluation method according to claim 1, characterized in that step S3 further comprises: the score s is:
s = ω1(ω11s11 + ω12s12 + … + ω1js1j) + ω2(ω21s21 + ω22s22 + … + ω2js2j)
+ … + ωn(ωn1sn1 + ωn2sn2 + … + ωnjsnj)
where ω1, ω2, …, ωn respectively represent the weights of the voice segments;
j represents the total number of vowel sections plus consonant sections in each voice segment;
ω11, ω12, …, ω1j respectively represent the weights of the syllables in the first voice segment;
s11, s12, …, s1j respectively represent the scores of the syllables in the first voice segment;
ω21, ω22, …, ω2j respectively represent the weights of the syllables in the second voice segment;
s21, s22, …, s2j respectively represent the scores of the syllables in the second voice segment;
ωn1, ωn2, …, ωnj respectively represent the weights of the syllables in the n-th voice segment;
sn1, sn2, …, snj respectively represent the scores of the syllables in the n-th voice segment.
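The doubly weighted sum above is straightforward to compute once per-syllable scores are available. The sketch below uses made-up weights and scores purely for illustration:

```python
def overall_score(segment_weights, syllable_weights, syllable_scores):
    """Weighted score of claim 9: s = Σ_i ωi * Σ_k ωik * sik.

    segment_weights  -- [ω1, …, ωn], one weight per voice segment
    syllable_weights -- [[ωi1, …, ωij], …], per-segment syllable weights
    syllable_scores  -- [[si1, …, sij], …], per-segment syllable scores
    """
    return sum(
        w_seg * sum(w * s for w, s in zip(ws, ss))
        for w_seg, ws, ss in zip(segment_weights, syllable_weights, syllable_scores)
    )

# Two voice segments with two syllables each (illustrative values)
s = overall_score([0.6, 0.4],
                  [[0.5, 0.5], [0.7, 0.3]],
                  [[80, 90], [70, 100]])
print(s)  # ≈ 82.6
```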
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611181451.5A CN106531189A (en) | 2016-12-20 | 2016-12-20 | Intelligent spoken language evaluation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106531189A true CN106531189A (en) | 2017-03-22 |
Family
ID=58340401
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611181451.5A Pending CN106531189A (en) | 2016-12-20 | 2016-12-20 | Intelligent spoken language evaluation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106531189A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20040073291A (en) * | 2004-01-08 | 2004-08-19 | 정보통신연구진흥원 | appraisal system of foreign language pronunciation and method thereof |
CN101727903A (en) * | 2008-10-29 | 2010-06-09 | 中国科学院自动化研究所 | Pronunciation quality assessment and error detection method based on fusion of multiple characteristics and multiple systems |
CN101996635A (en) * | 2010-08-30 | 2011-03-30 | 清华大学 | English pronunciation quality evaluation method based on accent highlight degree |
Non-Patent Citations (1)
Title |
---|
ZHOU XIAOLAN: "Speech in computer-aided Putonghua proficiency testing", Rural Economy and Science-Technology * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109727608A (en) * | 2017-10-25 | 2019-05-07 | 香港中文大学深圳研究院 | A kind of ill voice appraisal procedure based on Chinese speech |
CN107767862A (en) * | 2017-11-06 | 2018-03-06 | 深圳市领芯者科技有限公司 | Voice data processing method, system and storage medium |
CN108470476A (en) * | 2018-05-15 | 2018-08-31 | 黄淮学院 | A kind of pronunciation of English matching correcting system |
CN108470476B (en) * | 2018-05-15 | 2020-06-30 | 黄淮学院 | English pronunciation matching correction system |
CN109300484A (en) * | 2018-09-13 | 2019-02-01 | 广州酷狗计算机科技有限公司 | Audio alignment schemes, device, computer equipment and readable storage medium storing program for executing |
CN109300484B (en) * | 2018-09-13 | 2021-07-02 | 广州酷狗计算机科技有限公司 | Audio alignment method and device, computer equipment and readable storage medium |
CN109300474A (en) * | 2018-09-14 | 2019-02-01 | 北京网众共创科技有限公司 | A kind of audio signal processing method and device |
CN109300474B (en) * | 2018-09-14 | 2022-04-26 | 北京网众共创科技有限公司 | Voice signal processing method and device |
CN110825244A (en) * | 2019-11-06 | 2020-02-21 | 王一峰 | Modern Shanghai input method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106531189A (en) | Intelligent spoken language evaluation method | |
CN103065626B (en) | Automatic grading method and automatic grading equipment for read questions in test of spoken English | |
CN106782609A (en) | A kind of spoken comparison method | |
CN110457432A (en) | Interview methods of marking, device, equipment and storage medium | |
CN103617799A (en) | Method for detecting English statement pronunciation quality suitable for mobile device | |
US11282511B2 (en) | System and method for automatic speech analysis | |
CN106847260A (en) | A kind of Oral English Practice automatic scoring method of feature based fusion | |
CN103366735B (en) | The mapping method of speech data and device | |
CN102723077B (en) | Method and device for voice synthesis for Chinese teaching | |
CN107886968A (en) | Speech evaluating method and system | |
Hirson et al. | Glottal fry and voice disguise: a case study in forensic phonetics | |
Sabu et al. | Automatic Assessment of Children's L2 Reading for Accuracy and Fluency. | |
CN112767961B (en) | Accent correction method based on cloud computing | |
CN111210845B (en) | Pathological voice detection device based on improved autocorrelation characteristics | |
Dumpala et al. | Analysis of the Effect of Speech-Laugh on Speaker Recognition System. | |
Luo et al. | Investigation of the effects of automatic scoring technology on human raters' performances in L2 speech proficiency assessment | |
Li et al. | English sentence pronunciation evaluation using rhythm and intonation | |
Liu | Application of speech recognition technology in pronunciation correction of college oral English teaching | |
CN103021226B (en) | Voice evaluating method and device based on pronunciation rhythms | |
Duan et al. | An English pronunciation and intonation evaluation method based on the DTW algorithm | |
Jambi et al. | An Empirical Performance Analysis of the Speak Correct Computerized Interface | |
Yu | Evaluation of English Pronunciation Quality Based on Decision Tree Algorithm | |
Pakhomov et al. | Forced-alignment and edit-distance scoring for vocabulary tutoring applications | |
CN101546553A (en) | Objective examination method of flat-tongue sound and cacuminal in standard Chinese | |
Bolanos et al. | Automatic assessment of oral reading fluency for Spanish speaking ELs |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20170322 |