CN106782609A - A spoken-language comparison method - Google Patents

A spoken-language comparison method

Info

Publication number
CN106782609A
CN106782609A (application CN201710003810.6A)
Authority
CN
China
Prior art keywords
feature
section
spectrum energy
user
vowel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710003810.6A
Other languages
Chinese (zh)
Inventor
杨白宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Publication of CN106782609A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 - Teaching not covered by other main groups of this subclass
    • G09B19/06 - Foreign languages

Abstract

The present invention provides a spoken-language comparison method: a standard text is set, the standard-pronunciation features of the standard text are obtained, and the standard-pronunciation features are stored in a database; the user reads the standard text aloud, the user's speech data is acquired, and the user's speech features are extracted from the speech data; the user's speech features are aligned with the standard-pronunciation features and compared against them; the user's speech features and the comparison result are stored in the database. The user can thus see which words in his or her speech are pronounced inaccurately relative to the standard pronunciation. This makes language study more convenient for the learner, improves the efficiency of foreign-language learning, and increases the learner's interest.

Description

A spoken-language comparison method
Technical field
The present invention relates to the field of language communication, and in particular to a spoken-language comparison method.
Background technology
Speech is the acoustic expression of language and a means by which people communicate information. Enabling people to produce, transmit, store and retrieve language information more efficiently promotes the development of society.
As China's reform and opening-up and its cooperation with other countries deepen, activities such as commercial exchange, cultural exchange and cross-border tourism become increasingly frequent, and more and more people need to learn a foreign language. A common problem for foreign-language learners is inaccurate pronunciation: given a text, a learner cannot tell where his or her own pronunciation differs from the standard pronunciation. This causes considerable difficulty for the learner and reduces the efficiency of foreign-language learning.
Summary of the invention
To overcome the above deficiency of the prior art, an object of the present invention is to provide a spoken-language comparison method. The method includes:
S1: set a standard text, obtain the standard-pronunciation features of the standard text, and store the standard-pronunciation features in a database;
S2: have the user read the standard text aloud, acquire the user's speech data, and extract the user's speech features from the speech data;
S3: align the user's speech features with the standard-pronunciation features and compare the two;
S4: store the user's speech features and the comparison result in the database.
Preferably, step S2 further includes:
S21: segment the user's speech data in time into n segments of 20 ms each, and apply a rectangular or Hamming window to each segment to obtain the windowed speech signal Xn, where n is the segment index;
S22: apply a short-time Fourier transform to each windowed signal Xn, converting the time-domain signal into the frequency-domain signal Yn, and compute its short-time energy spectrum Qn = |Yn|²;
S23: pass the short-time energy spectrum Qn from the vector space S through a bank of band-pass filters in first-in-first-out order; because the components within each band act on the human ear additively, the energies within each filter band are summed, giving the output power spectrum x'(k) of the k-th filter;
S24: take the logarithm of each filter output to obtain the log power spectrum of each band, then apply an inverse discrete cosine transform to obtain M MFCC coefficients, where M is typically 13-15; the MFCC coefficients are given by the formula in claim 2;
S25: take the resulting MFCC features as static features, then compute their first- and second-order differences to obtain the corresponding dynamic features.
Preferably, step S2 further includes:
obtaining the spectral energy h(fk) at each frequency of each speech segment and, given the upper frequency limit k1 and lower frequency limit k2 of the band of interest within the segment, obtaining the spectral energy ratio PNn of the segment as PNn = (Σk=k1..k2 h(fk) / Σk h(fk)) × 100%.
Preferably, step S3 further includes:
if the spectral energy h(fk) of a speech segment is not less than a first threshold and the spectral energy ratio PNn of the segment is not less than a second threshold, the segment is judged to be a vowel segment; the first threshold is 0.1-0.5 and the second threshold is 60%-85%;
taking the spectral energy of the vowel segment as the reference, if the zero-crossing rate of the frame preceding the vowel segment exceeds a third threshold, that frame is judged to be a consonant preceding the vowel; the third threshold is 100;
taking the spectral energy of the vowel segment as the reference, if the zero-crossing rate of the frame following the vowel segment exceeds the third threshold, that frame is judged to be a consonant following the vowel;
if the zero-crossing rate of the frame following the vowel segment exceeds the third threshold and that frame is the last frame of the speech segment, it is judged to be a nasal tail consonant (nasal coda).
Preferably, step S1 further includes:
segmenting the standard-pronunciation features in time into n segments of 20 ms each;
dividing the standard-pronunciation features of each time segment into static features and dynamic features;
decomposing the spectral energy of the standard-pronunciation features of each time segment to obtain the spectral energy distribution of its vowel segments and the spectral energy distribution of its consonant segments;
setting the vowel-segment MFCC feature vectors and the consonant-segment MFCC feature vectors of the standard-pronunciation features within each time segment.
Preferably, step S3 further includes:
setting the vowel-segment MFCC feature vectors and the consonant-segment MFCC feature vectors of the user's speech features within each time segment;
using the DTW algorithm, obtaining the alignment path with the minimum error and the corresponding DTW distance;
based on that alignment path and DTW distance, comparing the vowel-segment MFCC feature vectors of the user's speech features with the vowel-segment MFCC feature vectors of the standard pronunciation within the same time segment, and comparing the consonant-segment MFCC feature vectors of the user's speech features with the consonant-segment MFCC feature vectors of the standard pronunciation within the same time segment, thereby obtaining the pronunciation difference between the user's speech features and the standard-pronunciation features.
Preferably, step S1 further includes:
setting the vowel-segment standard speech feature vector of the standard pronunciation within each time segment as P1 = [p1(1), p1(2), …, p1(R)], with first-order difference vector PΔ1 = [pΔ1(1), pΔ1(2), …, pΔ1(R)], where R is the vowel-segment speech length of the standard-pronunciation features, pΔ1(n) = |p1(n) − p1(n−1)|, n = 1, 2, …, R, and p1(0) = 0;
setting the consonant-segment standard speech feature vector of the standard pronunciation within each time segment as P'1 = [p'1(1), p'1(2), …, p'1(R)], with first-order difference vector P'Δ1 = [p'Δ1(1), p'Δ1(2), …, p'Δ1(R)], where R is the consonant-segment speech length of the standard-pronunciation features, p'Δ1(n) = |p'1(n) − p'1(n−1)|, n = 1, 2, …, R, and p'1(0) = 0.
Preferably, step S3 further includes:
setting the vowel-segment feature vector of the user's speech features within each time segment as P2 = [p2(1), p2(2), …, p2(T)], with first-order difference vector PΔ2 = [pΔ2(1), pΔ2(2), …, pΔ2(T)], where T is the length of the speech to be evaluated, pΔ2(n) = |p2(n) − p2(n−1)|, n = 1, 2, …, T, and p2(0) = 0;
setting the consonant-segment feature vector of the user's speech features within each time segment as P'2 = [p'2(1), p'2(2), …, p'2(T)], with first-order difference vector P'Δ2 = [p'Δ2(1), p'Δ2(2), …, p'Δ2(T)], where T is the length of the speech to be evaluated, p'Δ2(n) = |p'2(n) − p'2(n−1)|, n = 1, 2, …, T, and p'2(0) = 0;
using the DTW algorithm, obtaining the alignment path with the minimum error, and comparing the vowel segments and consonant segments within each time segment along that path;
computing the vowel-segment gap dp and its variation gap Δdp, and the consonant-segment gap d'p and its variation gap Δd'p, to obtain the similarity between the user's speech features and the standard-pronunciation features, namely:
dp = |p1(n) − p2(m)|
d'p = |p'1(n) − p'2(m)|
Δdp = |Δp1(n) − Δp2(m)|
Δd'p = |Δp'1(n) − Δp'2(m)|
where Δpi(n) = |pi(n) − pi(n−1)| and Δp'i(n) = |p'i(n) − p'i(n−1)|.
As can be seen from the above technical solution, the present invention has the following advantages:
The spoken-language comparison method has the user and the computer work from the same text and compares the user's reading against it, so the user can see which words in his or her speech are pronounced inaccurately relative to the standard pronunciation and which words need improvement and further practice. This makes language study more convenient for the learner, improves the efficiency of foreign-language learning, and increases the learner's interest.
Brief description of the drawings
Fig. 1 is a flow chart of the spoken-language comparison method.
Specific embodiment
To make the objects, features and advantages of the invention clearer and easier to understand, the technical solution protected by the present invention is described clearly and completely below with reference to specific embodiments and the accompanying drawing. The embodiments disclosed below are obviously only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this patent without creative work fall within the scope of protection of this patent.
The present invention provides a spoken-language comparison method, as shown in Fig. 1. The method uses a standard text: the computer first obtains the content of the standard text and the standard pronunciation of that text. The method of the invention is implemented on computer hardware running a corresponding program. The user and the computer therefore work from the same text, and the user's reading is compared against it, so the user can see which words in his or her speech are pronounced inaccurately relative to the standard pronunciation and which words need improvement and further practice. This makes language study more convenient for the learner, improves the efficiency of foreign-language learning, and increases the learner's interest.
The method includes:
S1: set a standard text, obtain the standard-pronunciation features of the standard text, and store the standard-pronunciation features in a database;
S2: have the user read the standard text aloud, acquire the user's speech data, and extract the user's speech features from the speech data;
S3: align the user's speech features with the standard-pronunciation features and compare the two;
S4: store the user's speech features and the comparison result in the database.
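The overall S1-S4 flow can be pictured with a short Python sketch. Everything in it is hypothetical scaffolding added for illustration (the SQLite table layout, the `extract_features` placeholder and the single difference score are not part of the patent); it only shows the order of operations: store the standard features, capture the user's reading, align and compare, then store the result.

```python
import sqlite3
import numpy as np

def extract_features(audio: np.ndarray) -> np.ndarray:
    """Placeholder for the MFCC-based feature extraction of step S2 (not the real features)."""
    return audio.reshape(-1, 320).mean(axis=1)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE features (role TEXT, vec BLOB)")
db.execute("CREATE TABLE results (score REAL)")

# S1: the standard text's pronunciation features go into the database
standard_audio = np.random.randn(16000)          # stand-in for the reference reading
std = extract_features(standard_audio)
db.execute("INSERT INTO features VALUES ('standard', ?)", (std.tobytes(),))

# S2: the user reads the same text aloud; extract the user's features
user_audio = np.random.randn(16000)
usr = extract_features(user_audio)

# S3: align (trivially, equal length here) and compare
score = float(np.mean(np.abs(std - usr)))

# S4: store the user's features and the comparison result
db.execute("INSERT INTO features VALUES ('user', ?)", (usr.tobytes(),))
db.execute("INSERT INTO results VALUES (?)", (score,))
db.commit()
print("difference score:", score)
```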
Step S2 further includes:
S21: segment the user's speech data in time into n segments of 20 ms each, and apply a rectangular or Hamming window to each segment to obtain the windowed speech signal Xn, where n is the segment index;
S22: apply a short-time Fourier transform to each windowed signal Xn, converting the time-domain signal into the frequency-domain signal Yn, and compute its short-time energy spectrum Qn = |Yn|²;
S23: pass the short-time energy spectrum Qn from the vector space S through a bank of band-pass filters in first-in-first-out order; because the components within each band act on the human ear additively, the energies within each filter band are summed, giving the output power spectrum x'(k) of the k-th filter;
S24: take the logarithm of each filter output to obtain the log power spectrum of each band, then apply an inverse discrete cosine transform to obtain M MFCC coefficients, where M is typically 13-15;
S25: take the resulting MFCC features as static features, then compute their first- and second-order differences to obtain the corresponding dynamic features.
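Steps S21-S25 can be illustrated with a minimal Python sketch. This is only an illustration, not the patented implementation: the 20 ms framing, Hamming window, short-time energy spectrum, mel-style band-pass filters, log compression, DCT and delta features follow the steps above, but the sample rate, FFT size, filter count and frame layout are assumed values.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_features(signal, sr=16000, frame_ms=20, n_filters=26, n_mfcc=13):
    """Rough MFCC sketch following steps S21-S25 (all parameters are assumptions)."""
    # S21: split into 20 ms frames and apply a Hamming window
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    frames = frames * np.hamming(frame_len)

    # S22: short-time Fourier transform and short-time energy spectrum Qn = |Yn|^2
    nfft = 512
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2

    # S23: triangular mel band-pass filters; sum the energy inside each band -> x'(k)
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for k in range(1, n_filters + 1):
        l, c, r = bins[k - 1], bins[k], bins[k + 1]
        fbank[k - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[k - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    band_energy = power @ fbank.T                      # x'(k) for every frame

    # S24: log power spectrum, then an inverse DCT gives the M MFCC coefficients
    log_energy = np.log(band_energy + 1e-10)
    mfcc = dct(log_energy, type=2, axis=1, norm='ortho')[:, :n_mfcc]

    # S25: static features plus first- and second-order differences (dynamic features)
    delta = np.diff(mfcc, axis=0, prepend=mfcc[:1])
    delta2 = np.diff(delta, axis=0, prepend=delta[:1])
    return np.hstack([mfcc, delta, delta2])
```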
In the present embodiment, step S2 further includes:
obtaining the spectral energy h(fk) at each frequency of each speech segment and, given the upper frequency limit k1 and lower frequency limit k2 of the band of interest within the segment, obtaining the spectral energy ratio PNn of the segment as PNn = (Σk=k1..k2 h(fk) / Σk h(fk)) × 100%.
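A small sketch of this band-energy ratio, under the assumption that h(fk) is simply the power-spectrum value at frequency fk; the sample rate, FFT size and band limits in the usage example are made up for illustration.

```python
import numpy as np

def band_energy_ratio(power_spectrum, freqs, f_low, f_high):
    """Spectral energy ratio PNn of one speech segment: the energy between the
    band limits divided by the total energy, expressed as a percentage."""
    band = (freqs >= f_low) & (freqs <= f_high)
    return power_spectrum[band].sum() / power_spectrum.sum() * 100

# Hypothetical usage: power spectrum of one 20 ms frame sampled at 16 kHz
frame = np.random.randn(320)
spectrum = np.abs(np.fft.rfft(frame, 512)) ** 2
freqs = np.fft.rfftfreq(512, d=1 / 16000)
print(band_energy_ratio(spectrum, freqs, 300, 3000))
```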
Step S3 further includes:
if the spectral energy h(fk) of a speech segment is not less than a first threshold and the spectral energy ratio PNn of the segment is not less than a second threshold, the segment is judged to be a vowel segment; the first threshold is 0.1-0.5 and the second threshold is 60%-85%;
taking the spectral energy of the vowel segment as the reference, if the zero-crossing rate of the frame preceding the vowel segment exceeds a third threshold, that frame is judged to be a consonant preceding the vowel; the third threshold is 100;
taking the spectral energy of the vowel segment as the reference, if the zero-crossing rate of the frame following the vowel segment exceeds the third threshold, that frame is judged to be a consonant following the vowel;
if the zero-crossing rate of the frame following the vowel segment exceeds the third threshold and that frame is the last frame of the speech segment, it is judged to be a nasal tail consonant (nasal coda).
Decomposing each of the user's speech segments in this way yields its vowel segments, its consonant segments, and whether the last frame of the segment carries a nasal tail consonant; a nasal tail consonant is a nasal sound.
The vowel segments, consonant segments, and the presence of a nasal tail consonant in the last frame of each speech segment of the standard text are pre-set in the computer. The vowel segments, consonant segments, and the nasal tail consonant of the last frame of each speech segment read aloud by the user are each compared with the corresponding standard-pronunciation features.
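The threshold logic above can be sketched as follows. This is a simplified reading of the rules, with assumed threshold values inside the stated ranges; in particular, the unit of the zero-crossing threshold (a raw crossing count per frame) is an assumption.

```python
import numpy as np

def zero_crossing_count(frame):
    """Number of sign changes in one frame (used here as the zero-crossing rate)."""
    return int(np.sum(np.abs(np.diff(np.sign(frame))) > 0))

def label_frames(frames, energies, ratios, t1=0.3, t2=0.70, t3=100):
    """Label each 20 ms frame as vowel / consonant / nasal coda / other.
    t1, t2, t3 stand for the first, second and third thresholds; ratios are
    given as fractions (0.70 = 70%). Values are assumptions within the ranges
    stated in the description."""
    n = len(frames)
    labels = ['other'] * n
    for i in range(n):
        if energies[i] >= t1 and ratios[i] >= t2:
            labels[i] = 'vowel'
    for i, lab in enumerate(labels):
        if lab != 'vowel':
            continue
        # frame before a vowel with a high zero-crossing rate -> preceding consonant
        if i > 0 and labels[i - 1] == 'other' and zero_crossing_count(frames[i - 1]) > t3:
            labels[i - 1] = 'consonant'
        # frame after a vowel with a high zero-crossing rate -> following consonant,
        # or a nasal coda if it is the last frame of the segment
        if i < n - 1 and labels[i + 1] == 'other' and zero_crossing_count(frames[i + 1]) > t3:
            labels[i + 1] = 'nasal coda' if i + 1 == n - 1 else 'consonant'
    return labels
```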
Step S1 further includes:
segmenting the standard-pronunciation features in time into n segments of 20 ms each;
dividing the standard-pronunciation features of each time segment into static features and dynamic features;
decomposing the spectral energy of the standard-pronunciation features of each time segment to obtain the spectral energy distribution of its vowel segments and the spectral energy distribution of its consonant segments;
setting the vowel-segment MFCC feature vectors and the consonant-segment MFCC feature vectors of the standard-pronunciation features within each time segment.
Step S3 further includes:
setting the vowel-segment MFCC feature vectors and the consonant-segment MFCC feature vectors of the user's speech features within each time segment;
using the DTW algorithm, obtaining the alignment path with the minimum error and the corresponding DTW distance;
based on that alignment path and DTW distance, comparing the vowel-segment MFCC feature vectors of the user's speech features with the vowel-segment MFCC feature vectors of the standard pronunciation within the same time segment, and comparing the consonant-segment MFCC feature vectors of the user's speech features with the consonant-segment MFCC feature vectors of the standard pronunciation within the same time segment, thereby obtaining the pronunciation difference between the user's speech features and the standard-pronunciation features.
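A textbook DTW alignment between two MFCC sequences, for illustration only. The Euclidean frame distance and the unconstrained warping path are assumptions; the patent only states that the minimum-error alignment path and the corresponding DTW distance are used.

```python
import numpy as np

def dtw_align(ref, test):
    """Plain dynamic time warping between two MFCC sequences (frames x coefficients).
    Returns the minimum-error alignment path and the DTW distance."""
    R, T = len(ref), len(test)
    cost = np.full((R + 1, T + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, R + 1):
        for j in range(1, T + 1):
            d = np.linalg.norm(ref[i - 1] - test[j - 1])   # local frame distance
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # backtrack the minimum-error path
    path, i, j = [], R, T
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1], cost[R, T]
```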
Step S1 further includes:
setting the vowel-segment standard speech feature vector of the standard pronunciation within each time segment as P1 = [p1(1), p1(2), …, p1(R)], with first-order difference vector PΔ1 = [pΔ1(1), pΔ1(2), …, pΔ1(R)], where R is the vowel-segment speech length of the standard-pronunciation features, pΔ1(n) = |p1(n) − p1(n−1)|, n = 1, 2, …, R, and p1(0) = 0;
setting the consonant-segment standard speech feature vector of the standard pronunciation within each time segment as P'1 = [p'1(1), p'1(2), …, p'1(R)], with first-order difference vector P'Δ1 = [p'Δ1(1), p'Δ1(2), …, p'Δ1(R)], where R is the consonant-segment speech length of the standard-pronunciation features, p'Δ1(n) = |p'1(n) − p'1(n−1)|, n = 1, 2, …, R, and p'1(0) = 0.
Step S3 further includes:
setting the vowel-segment feature vector of the user's speech features within each time segment as P2 = [p2(1), p2(2), …, p2(T)], with first-order difference vector PΔ2 = [pΔ2(1), pΔ2(2), …, pΔ2(T)], where T is the length of the speech to be evaluated, pΔ2(n) = |p2(n) − p2(n−1)|, n = 1, 2, …, T, and p2(0) = 0;
setting the consonant-segment feature vector of the user's speech features within each time segment as P'2 = [p'2(1), p'2(2), …, p'2(T)], with first-order difference vector P'Δ2 = [p'Δ2(1), p'Δ2(2), …, p'Δ2(T)], where T is the length of the speech to be evaluated, p'Δ2(n) = |p'2(n) − p'2(n−1)|, n = 1, 2, …, T, and p'2(0) = 0;
using the DTW algorithm, obtaining the alignment path with the minimum error, and comparing the vowel segments and consonant segments within each time segment along that path;
computing the vowel-segment gap dp and its variation gap Δdp, and the consonant-segment gap d'p and its variation gap Δd'p, to obtain the similarity between the user's speech features and the standard-pronunciation features, namely:
dp = |p1(n) − p2(m)|
d'p = |p'1(n) − p'2(m)|
Δdp = |Δp1(n) − Δp2(m)|
Δd'p = |Δp'1(n) − Δp'2(m)|
where Δpi(n) = |pi(n) − pi(n−1)| and Δp'i(n) = |p'i(n) − p'i(n−1)|.
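A sketch of the per-pair gaps above, computed along a DTW path for a single (vowel or consonant) segment. The averaging of the gaps into one similarity score is an assumption; the description only defines the per-pair values dp, d'p, Δdp and Δd'p. The usage example reuses the dtw_align sketch shown earlier.

```python
import numpy as np

def segment_differences(p_ref, p_usr, path):
    """Gaps d_p and variation gaps Δd_p along a DTW path; p_ref and p_usr are
    1-D feature vectors and `path` pairs an index n of the reference with an
    index m of the user's speech."""
    def delta(p, n):                      # Δp(n) = |p(n) - p(n-1)|, with p(0) taken as 0
        return abs(p[n] - (p[n - 1] if n > 0 else 0.0))
    d = [abs(p_ref[n] - p_usr[m]) for n, m in path]                  # d_p
    dd = [abs(delta(p_ref, n) - delta(p_usr, m)) for n, m in path]   # Δd_p
    return float(np.mean(d)), float(np.mean(dd))  # averaged into a score (assumption)

# Hypothetical usage with the dtw_align sketch above (vectors treated as 1-D frames)
ref = np.array([0.1, 0.4, 0.5, 0.3])
usr = np.array([0.2, 0.5, 0.4, 0.2, 0.1])
path, _ = dtw_align(ref.reshape(-1, 1), usr.reshape(-1, 1))
print(segment_differences(ref, usr, path))
```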

Claims (8)

1. A spoken-language comparison method, characterised in that the method includes:
S1: setting a standard text, obtaining the standard-pronunciation features of the standard text, and storing the standard-pronunciation features in a database;
S2: having the user read the standard text aloud, acquiring the user's speech data, and extracting the user's speech features from the speech data;
S3: aligning the user's speech features with the standard-pronunciation features and comparing the two;
S4: storing the user's speech features and the comparison result in the database.
2. The spoken-language comparison method according to claim 1, characterised in that step S2 further includes:
S21: segmenting the user's speech data in time into n segments of 20 ms each, and applying a rectangular or Hamming window to each segment to obtain the windowed speech signal Xn, where n is the segment index;
S22: applying a short-time Fourier transform to each windowed signal Xn, converting the time-domain signal into the frequency-domain signal Yn, and computing its short-time energy spectrum Qn = |Yn|²;
S23: passing the short-time energy spectrum Qn from the vector space S through a bank of band-pass filters in first-in-first-out order; because the components within each band act on the human ear additively, the energies within each filter band are summed, giving the output power spectrum x'(k) of the k-th filter;
S24: taking the logarithm of each filter output to obtain the log power spectrum of each band, then applying an inverse discrete cosine transform to obtain M MFCC coefficients, where M typically takes 13-15; the MFCC coefficients are
Cn = Σk=1..M log x'(k) · cos((2k+1)nπ / (2M));
S25: taking the resulting MFCC features as static features, then computing their first- and second-order differences to obtain the corresponding dynamic features.
3. The spoken-language comparison method according to claim 1, characterised in that step S2 further includes:
obtaining the spectral energy h(fk) at each frequency of each speech segment and, given the upper frequency limit k1 and lower frequency limit k2 of the band of interest within the segment, obtaining the spectral energy ratio PNn of the segment as
PNn = (Σk=k1..k2 h(fk) / Σk h(fk)) × 100%.
4. The spoken-language comparison method according to claim 1, characterised in that step S3 further includes:
if the spectral energy h(fk) of a speech segment is not less than a first threshold and the spectral energy ratio PNn of the segment is not less than a second threshold, judging the segment to be a vowel segment; the first threshold is 0.1-0.5 and the second threshold is 60%-85%;
taking the spectral energy of the vowel segment as the reference, if the zero-crossing rate of the frame preceding the vowel segment exceeds a third threshold, judging that frame to be a consonant preceding the vowel; the third threshold is 100;
taking the spectral energy of the vowel segment as the reference, if the zero-crossing rate of the frame following the vowel segment exceeds the third threshold, judging that frame to be a consonant following the vowel;
if the zero-crossing rate of the frame following the vowel segment exceeds the third threshold and that frame is the last frame of the speech segment, judging it to be a nasal tail consonant.
5. The spoken-language comparison method according to claim 4, characterised in that step S1 further includes:
segmenting the standard-pronunciation features in time into n segments of 20 ms each;
dividing the standard-pronunciation features of each time segment into static features and dynamic features;
decomposing the spectral energy of the standard-pronunciation features of each time segment to obtain the spectral energy distribution of its vowel segments and the spectral energy distribution of its consonant segments;
setting the vowel-segment MFCC feature vectors and the consonant-segment MFCC feature vectors of the standard-pronunciation features within each time segment.
6. The spoken-language comparison method according to claim 5, characterised in that step S3 further includes:
setting the vowel-segment MFCC feature vectors and the consonant-segment MFCC feature vectors of the user's speech features within each time segment;
using the DTW algorithm, obtaining the alignment path with the minimum error and the corresponding DTW distance;
based on that alignment path and DTW distance, comparing the vowel-segment MFCC feature vectors of the user's speech features with the vowel-segment MFCC feature vectors of the standard pronunciation within the same time segment, and comparing the consonant-segment MFCC feature vectors of the user's speech features with the consonant-segment MFCC feature vectors of the standard pronunciation within the same time segment, thereby obtaining the pronunciation difference between the user's speech features and the standard-pronunciation features.
7. The spoken-language comparison method according to claim 5, characterised in that step S1 further includes:
setting the vowel-segment standard speech feature vector of the standard pronunciation within each time segment as P1 = [p1(1), p1(2), …, p1(R)], with first-order difference vector PΔ1 = [pΔ1(1), pΔ1(2), …, pΔ1(R)], where R is the vowel-segment speech length of the standard-pronunciation features, pΔ1(n) = |p1(n) − p1(n−1)|, n = 1, 2, …, R, and p1(0) = 0;
setting the consonant-segment standard speech feature vector of the standard pronunciation within each time segment as P'1 = [p'1(1), p'1(2), …, p'1(R)], with first-order difference vector P'Δ1 = [p'Δ1(1), p'Δ1(2), …, p'Δ1(R)], where R is the consonant-segment speech length of the standard-pronunciation features, p'Δ1(n) = |p'1(n) − p'1(n−1)|, n = 1, 2, …, R, and p'1(0) = 0.
8. The spoken-language comparison method according to claim 7, characterised in that step S3 further includes:
setting the vowel-segment feature vector of the user's speech features within each time segment as P2 = [p2(1), p2(2), …, p2(T)], with first-order difference vector PΔ2 = [pΔ2(1), pΔ2(2), …, pΔ2(T)], where T is the length of the speech to be evaluated, pΔ2(n) = |p2(n) − p2(n−1)|, n = 1, 2, …, T, and p2(0) = 0;
setting the consonant-segment feature vector of the user's speech features within each time segment as P'2 = [p'2(1), p'2(2), …, p'2(T)], with first-order difference vector P'Δ2 = [p'Δ2(1), p'Δ2(2), …, p'Δ2(T)], where T is the length of the speech to be evaluated, p'Δ2(n) = |p'2(n) − p'2(n−1)|, n = 1, 2, …, T, and p'2(0) = 0;
using the DTW algorithm, obtaining the alignment path with the minimum error, and comparing the vowel segments and consonant segments within each time segment along that path;
computing the vowel-segment gap dp and its variation gap Δdp, and the consonant-segment gap d'p and its variation gap Δd'p, to obtain the similarity between the user's speech features and the standard-pronunciation features, namely:
dp = |p1(n) − p2(m)|
d'p = |p'1(n) − p'2(m)|
Δdp = |Δp1(n) − Δp2(m)|
Δd'p = |Δp'1(n) − Δp'2(m)|
where Δpi(n) = |pi(n) − pi(n−1)| and Δp'i(n) = |p'i(n) − p'i(n−1)|.
CN201710003810.6A 2016-12-20 2017-01-03 A kind of spoken comparison method Pending CN106782609A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611181163 2016-12-20
CN201611181163X 2016-12-20

Publications (1)

Publication Number Publication Date
CN106782609A true CN106782609A (en) 2017-05-31

Family

ID=58950067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710003810.6A Pending CN106782609A (en) 2016-12-20 2017-01-03 A kind of spoken comparison method

Country Status (1)

Country Link
CN (1) CN106782609A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101740024A (en) * 2008-11-19 2010-06-16 中国科学院自动化研究所 Method for automatic evaluation based on generalized fluent spoken language fluency
CN101782941A (en) * 2009-01-16 2010-07-21 国际商业机器公司 Method and system for evaluating spoken language skill
CN101872616A (en) * 2009-04-22 2010-10-27 索尼株式会社 Endpoint detection method and system using same
CN102568475A (en) * 2011-12-31 2012-07-11 安徽科大讯飞信息科技股份有限公司 System and method for assessing proficiency in Putonghua
CN105609114A (en) * 2014-11-25 2016-05-25 科大讯飞股份有限公司 Method and device for detecting pronunciation
CN104732977A (en) * 2015-03-09 2015-06-24 广东外语外贸大学 On-line spoken language pronunciation quality evaluation method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
庄毅 著: "《面向互联网的多媒体大数据信息高效查询处理》", 30 June 2015, 浙江大学出版社 *
王炳锡 等著: "《实用语音识别基础》", 31 January 2005, 国防工业出版社 *
韩纪庆 等编著: "《音频信息处理技术》", 31 January 2007, 清华大学出版社 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107767862A (en) * 2017-11-06 2018-03-06 深圳市领芯者科技有限公司 Voice data processing method, system and storage medium
CN108470476A (en) * 2018-05-15 2018-08-31 黄淮学院 A kind of pronunciation of English matching correcting system
CN108470476B (en) * 2018-05-15 2020-06-30 黄淮学院 English pronunciation matching correction system
CN109192223A (en) * 2018-09-20 2019-01-11 广州酷狗计算机科技有限公司 The method and apparatus of audio alignment
CN109326162A (en) * 2018-11-16 2019-02-12 深圳信息职业技术学院 A kind of spoken language exercise method for automatically evaluating and device
CN111241308A (en) * 2020-02-27 2020-06-05 曾兴 Self-help learning method and system for spoken language
CN111241308B (en) * 2020-02-27 2024-04-26 曾兴 Self-help learning method and system for spoken language
WO2022169417A1 (en) * 2021-02-07 2022-08-11 脸萌有限公司 Speech similarity determination method, device and program product
CN113436487A (en) * 2021-07-08 2021-09-24 上海松鼠课堂人工智能科技有限公司 Chinese reciting skill training method and system based on virtual reality scene

Similar Documents

Publication Publication Date Title
CN106782609A (en) A kind of spoken comparison method
CN105529028B (en) Speech analysis method and apparatus
US20190266998A1 (en) Speech recognition method and device, computer device and storage medium
CN103345923B (en) A kind of phrase sound method for distinguishing speek person based on rarefaction representation
CN106486131A (en) A kind of method and device of speech de-noising
CN106531189A (en) Intelligent spoken language evaluation method
CN110457432A (en) Interview methods of marking, device, equipment and storage medium
CN101887725A (en) Phoneme confusion network-based phoneme posterior probability calculation method
WO2020034628A1 (en) Accent identification method and device, computer device, and storage medium
CN104078039A (en) Voice recognition system of domestic service robot on basis of hidden Markov model
Marković et al. Whispered speech database: Design, processing and application
CN103456302A (en) Emotion speaker recognition method based on emotion GMM model weight synthesis
CN112509568A (en) Voice awakening method and device
Yang et al. Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection
Dua et al. Discriminative training using heterogeneous feature vector for Hindi automatic speech recognition system
Yadav et al. Non-Uniform Spectral Smoothing for Robust Children's Speech Recognition.
Hou et al. Intelligent model for speech recognition based on svm: a case study on English language
Shah et al. Speech emotion recognition based on SVM using MATLAB
Akila et al. Isolated Tamil word speech recognition system using HTK
CN112767961B (en) Accent correction method based on cloud computing
Shufang Design of an automatic english pronunciation error correction system based on radio magnetic pronunciation recording devices
Khetri et al. Automatic speech recognition for marathi isolated words
Ghonem et al. Classification of stuttering events using i-vector
Kathania et al. Spectral modification for recognition of children’s speech under mismatched conditions
Ma et al. Statistical formant descriptors with linear predictive coefficients for accent classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2017-05-31)