CN106782609A - A kind of spoken comparison method

- Publication number: CN106782609A
- Application number: CN201710003810.6A
- Authority: CN (China)
- Prior art keywords: feature; section; spectrum energy; user; vowel
- Legal status: Pending (an assumption, not a legal conclusion)
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B19/00—Teaching not covered by other main groups of this subclass
- G09B19/06—Foreign languages
Abstract
The present invention provides a spoken comparison method: a standard text is set, the standard pronunciation features of the standard text are obtained and stored in a database; the user reads the standard text aloud, user speech data is acquired, and the user speech features are extracted from it; the user speech features are aligned with the standard pronunciation features, and the two are compared; the user speech features and the comparison result are stored in the database. The user can thus learn which words of his or her speech are pronounced inaccurately relative to the standard. This makes language study more convenient for the learner, improves the efficiency of foreign-language learning, and increases the learner's interest.
Description
Technical field
The present invention relates to the field of language communication, and more particularly to a spoken comparison method.
Background art
Speech is the acoustic realization of language and a principal means by which people exchange information. Producing, transmitting, storing, and retrieving linguistic information more efficiently promotes the development of society.
As China's reform and opening-up and its international cooperation deepen, activities such as commercial exchange, cultural exchange, and cross-border tourism become ever more frequent, and more and more people need to learn a foreign language. A common problem in foreign-language study is inaccurate pronunciation: given a text, learners cannot tell where their own pronunciation differs from the standard pronunciation. This causes considerable frustration and reduces the efficiency of foreign-language study.
Summary of the invention
To overcome the above deficiencies of the prior art, it is an object of the present invention to provide a spoken comparison method. The method includes:
S1: set a standard text, obtain the standard pronunciation features of the standard text, and store the standard pronunciation features in a database;
S2: have the user read the standard text aloud, acquire the user speech data, and extract the user speech features from the user speech data;
S3: align the user speech features with the standard pronunciation features, and compare the user speech features with the standard pronunciation features;
S4: store the user speech features and the comparison result in the database.
Preferably, step S2 further includes:
S21: segment the user speech data in time into n segments, each segment being a 20 ms time slice, and apply a rectangular or Hamming window to each segment of user speech data to obtain the windowed speech signal Xn, where n is the segment index;
S22: apply a short-time Fourier transform to the windowed speech signal Xn, converting the short-time time-domain signal into the frequency-domain signal Yn, and compute the short-time energy spectrum Qn = |Yn|²;
S23: pass the short-time energy spectrum Qn from the vector space S through a bank of band-pass filters in first-in, first-out fashion; because within each frequency band the components act on the human ear additively, the energies within each filter band are summed, giving the k-th filter's output power spectrum x'(k);
S24: take the logarithm of each filter's output to obtain the log power spectrum of each band, and apply an inverse discrete cosine transform to obtain M MFCC coefficients, where M is typically 13-15; the MFCC coefficients are c(m) = Σ(k=1..K) log x'(k) · cos(πm(2k−1)/(2K)), m = 1, 2, …, M, with K the number of filters;
S25: use the obtained MFCC features as static features, then take the first- and second-order differences of the static features to obtain the corresponding dynamic features.
Preferably, step S2 further includes:
obtain the spectrum energy (fk) of each speech segment's frequency band, together with the segment's upper frequency limit k1 and lower frequency limit k2, and obtain the spectrum-energy ratio PNn within the speech segment.
Preferably, step S3 further includes:
if the spectrum energy (fk) in a speech segment is ≥ a first threshold and the spectrum-energy ratio PNn in the segment is ≥ a second threshold, the segment is judged to be a vowel segment; the first threshold is 0.1-0.5 and the second threshold is 60%-85%;
taking the vowel segment's spectrum energy as the reference, judge whether the zero-crossing rate of the spectrum energy immediately before the vowel segment exceeds a third threshold; if it does, that spectrum energy is judged to be a consonant preceding the vowel; the third threshold is 100;
likewise, judge whether the zero-crossing rate of the spectrum energy immediately after the vowel segment exceeds the third threshold; if it does, that spectrum energy is judged to be a consonant following the vowel;
if the zero-crossing rate of the spectrum energy after the vowel segment exceeds the third threshold and that spectrum energy lies in the last frame of the speech segment, it is judged to be a nasal coda consonant.
Preferably, step S1 further includes:
segment the standard pronunciation features in time into n segments, each segment being a 20 ms time slice;
divide the standard pronunciation features of each time segment into static features and dynamic features;
decompose the spectrum energy of each time segment's standard pronunciation features into the spectrum-energy distribution of its vowel segments and the spectrum-energy distribution of its consonant segments;
set the vowel-segment MFCC feature vectors and the consonant-segment MFCC feature vectors of the standard pronunciation features in each time segment.
Preferably, step S3 further includes:
set the vowel-segment MFCC feature vectors and the consonant-segment MFCC feature vectors of the user speech features in each time segment;
use the DTW algorithm to obtain the minimum-error alignment path and its corresponding DTW distance;
based on that alignment path and DTW distance, compare the vowel-segment MFCC feature vectors of the user speech features with the vowel-segment MFCC feature vectors of the standard pronunciation features, and the consonant-segment MFCC feature vectors of the user speech features with the consonant-segment MFCC feature vectors of the standard pronunciation features, within the same time segment, to obtain the pronunciation difference between the user speech features and the standard pronunciation features.
Preferably, step S1 further includes:
set the vowel-segment standard speech feature vector of each time segment's standard pronunciation features as P1 = [p1(1), p1(2), …, p1(R)], with first-order difference vector PΔ1 = [pΔ1(1), pΔ1(2), …, pΔ1(R)], where R is the vowel-segment speech length of the standard pronunciation features, pΔ1(n) = |p1(n) − p1(n−1)|, n = 1, 2, …, R, and p1(0) = 0;
set the consonant-segment standard speech feature vector of each time segment's standard pronunciation features as P′1 = [p′1(1), p′1(2), …, p′1(R)], with first-order difference vector P′Δ1 = [p′Δ1(1), p′Δ1(2), …, p′Δ1(R)], where R is the consonant-segment speech length of the standard pronunciation features, p′Δ1(n) = |p′1(n) − p′1(n−1)|, n = 1, 2, …, R, and p′1(0) = 0.
Preferably, step S3 further includes:
set the vowel-segment feature vector of the user speech features in each time segment as P2 = [p2(1), p2(2), …, p2(T)], with first-order difference vector PΔ2 = [pΔ2(1), pΔ2(2), …, pΔ2(T)], where T is the length of the speech to be evaluated, pΔ2(n) = |p2(n) − p2(n−1)|, n = 1, 2, …, T, and p2(0) = 0;
set the consonant-segment feature vector of the user speech features in each time segment as P′2 = [p′2(1), p′2(2), …, p′2(T)], with first-order difference vector P′Δ2 = [p′Δ2(1), p′Δ2(2), …, p′Δ2(T)], where T is the length of the speech to be evaluated, p′Δ2(n) = |p′2(n) − p′2(n−1)|, n = 1, 2, …, T, and p′2(0) = 0;
use the DTW algorithm to obtain the minimum-error alignment path, and compare the vowel segments and the consonant segments within each time segment;
the comparison yields the vowel-segment gap dp and its variation gap Δdp, and the consonant-segment gap d′p and its variation gap Δd′p, from which the similarity between the user speech features and the standard pronunciation features is obtained, i.e.:
dp = |p1(n) − p2(m)|
d′p = |p′1(n) − p′2(m)|
Δdp = |Δp1(n) − Δp2(m)|
Δd′p = |Δp′1(n) − Δp′2(m)|
where Δpi(n) = |pi(n) − pi(n−1)| and Δp′i(n) = |p′i(n) − p′i(n−1)|.
As can be seen from the above technical scheme, the present invention has the following advantages: the spoken comparison method has the user and the computer work from the same piece of text and compares the user's reading against it, so that the user can learn which words of his or her speech are pronounced inaccurately relative to the standard, and which words need improvement and further practice. This makes language study more convenient for the learner, improves the efficiency of foreign-language learning, and increases the learner's interest.
Brief description of the drawings
Fig. 1 is the flow chart of spoken comparison method.
Specific embodiment
To make the objects, features, and advantages of the invention clearer and easier to understand, the technical scheme protected by the invention is described below clearly and completely with reference to specific embodiments and the accompanying drawing. Obviously, the embodiments disclosed below are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments in this patent without creative effort fall within the scope of protection of this patent.
The present invention provides a spoken comparison method, as shown in Fig. 1. The computer first obtains the content of a standard text and the standard pronunciation of that text. The method of the present invention is implemented on computer hardware running a corresponding program. The user and the computer thus work from the same piece of text, and the user's reading is compared against the standard, so that the user can learn which words of his or her speech are pronounced inaccurately relative to the standard and which words need improvement and further practice. This makes language study more convenient for the learner, improves the efficiency of foreign-language learning, and increases the learner's interest.
The method includes:
S1: set a standard text, obtain the standard pronunciation features of the standard text, and store the standard pronunciation features in a database;
S2: have the user read the standard text aloud, acquire the user speech data, and extract the user speech features from the user speech data;
S3: align the user speech features with the standard pronunciation features, and compare the user speech features with the standard pronunciation features;
S4: store the user speech features and the comparison result in the database.
Step S2 further includes:
S21: segment the user speech data in time into n segments, each segment being a 20 ms time slice, and apply a rectangular or Hamming window to each segment of user speech data to obtain the windowed speech signal Xn, where n is the segment index;
S22: apply a short-time Fourier transform to the windowed speech signal Xn, converting the short-time time-domain signal into the frequency-domain signal Yn, and compute the short-time energy spectrum Qn = |Yn|²;
S23: pass the short-time energy spectrum Qn from the vector space S through a bank of band-pass filters in first-in, first-out fashion; because within each frequency band the components act on the human ear additively, the energies within each filter band are summed, giving the k-th filter's output power spectrum x'(k);
S24: take the logarithm of each filter's output to obtain the log power spectrum of each band, and apply an inverse discrete cosine transform to obtain M MFCC coefficients, where M is typically 13-15; the MFCC coefficients are c(m) = Σ(k=1..K) log x'(k) · cos(πm(2k−1)/(2K)), m = 1, 2, …, M, with K the number of filters;
S25: use the obtained MFCC features as static features, then take the first- and second-order differences of the static features to obtain the corresponding dynamic features.
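Steps S21-S25 describe a conventional MFCC front end (framing, windowing, short-time power spectrum, filterbank, log, inverse DCT, differences). The sketch below, in Python with NumPy, is an illustration of that pipeline rather than the patent's implementation; the 16 kHz sampling rate, the mel-spaced filterbank, and the function name `mfcc_features` are assumptions.

```python
import numpy as np

def mfcc_features(signal, sr=16000, frame_ms=20, n_filters=26, n_coeffs=13):
    """Illustrative MFCC pipeline for steps S21-S25 (sr, n_filters assumed)."""
    # S21: split into 20 ms frames and apply a Hamming window
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    frames = frames * np.hamming(frame_len)
    # S22: short-time Fourier transform and energy spectrum Q_n = |Y_n|^2
    spectrum = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # S23: triangular band-pass (mel-spaced) filterbank; energies summed per band
    mel_max = 2595 * np.log10(1 + (sr / 2) / 700)
    hz_pts = 700 * (10 ** (np.linspace(0, mel_max, n_filters + 2) / 2595) - 1)
    bins = np.floor((frame_len + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_filters, spectrum.shape[1]))
    for m in range(1, n_filters + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, c):
            fbank[m - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fbank[m - 1, k] = (hi - k) / max(hi - c, 1)
    # S24: log power per band, then DCT of the log energies (the text's
    # "inverse discrete cosine transform") -> M static MFCC coefficients
    log_e = np.log(spectrum @ fbank.T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    mfcc = log_e @ dct.T
    # S25: first- and second-order differences as dynamic features
    d1 = np.abs(np.diff(mfcc, axis=0, prepend=mfcc[:1]))
    d2 = np.abs(np.diff(d1, axis=0, prepend=d1[:1]))
    return mfcc, d1, d2
```

One second of 16 kHz audio yields 50 frames of 13 static coefficients plus two matching difference matrices.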
In the present embodiment, step S2 further includes:
obtain the spectrum energy (fk) of each speech segment's frequency band, together with the segment's upper frequency limit k1 and lower frequency limit k2, and obtain the spectrum-energy ratio PNn within the speech segment.
Step S3 further includes:
if the spectrum energy (fk) in a speech segment is ≥ a first threshold and the spectrum-energy ratio PNn in the segment is ≥ a second threshold, the segment is judged to be a vowel segment; the first threshold is 0.1-0.5 and the second threshold is 60%-85%;
taking the vowel segment's spectrum energy as the reference, judge whether the zero-crossing rate of the spectrum energy immediately before the vowel segment exceeds a third threshold; if it does, that spectrum energy is judged to be a consonant preceding the vowel; the third threshold is 100;
likewise, judge whether the zero-crossing rate of the spectrum energy immediately after the vowel segment exceeds the third threshold; if it does, that spectrum energy is judged to be a consonant following the vowel;
if the zero-crossing rate of the spectrum energy after the vowel segment exceeds the third threshold and that spectrum energy lies in the last frame of the speech segment, it is judged to be a nasal coda consonant.
Decomposing each of the user's speech segments thus yields its vowel segments, its consonant segments, and whether the last frame of the segment carries a nasal coda consonant (a nasal sound).
The vowel segments, consonant segments, and last-frame nasal coda of each speech segment of the standard text are preset in the computer. The vowel segments, consonant segments, and last-frame nasal coda of each speech segment read aloud by the user are then each compared with the standard pronunciation features.
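The vowel/consonant/nasal-coda decision above can be sketched as follows. The thresholds follow the ranges quoted in the text (first threshold 0.1-0.5, second 60%-85%, third 100), while the frequency-band limits, the energy scale, and the function name `label_frames` are illustrative assumptions.

```python
import numpy as np

def label_frames(frames, band=(2, 40), e_thresh=0.3, r_thresh=0.7, z_thresh=100):
    """Label each 20 ms frame: 'V' vowel, 'C' consonant, 'N' nasal coda, '-' other.
    A frame is a vowel if its band energy and energy ratio clear the first two
    thresholds; a neighbour of a vowel whose zero-crossing count exceeds the
    third threshold is a consonant, or a nasal coda if it is the last frame."""
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    k2, k1 = band                                  # lower / upper band limits
    band_e = spec[:, k2:k1].sum(axis=1)            # spectrum energy in the band
    ratio = band_e / (spec.sum(axis=1) + 1e-12)    # spectrum-energy ratio PN_n
    zcr = (np.abs(np.diff(np.sign(frames), axis=1)) > 0).sum(axis=1)
    labels = ['-'] * len(frames)
    for i in range(len(frames)):
        if band_e[i] >= e_thresh and ratio[i] >= r_thresh:
            labels[i] = 'V'                        # first and second thresholds
    for i, lab in enumerate(labels):
        if lab != 'V':
            continue
        if i > 0 and labels[i - 1] == '-' and zcr[i - 1] > z_thresh:
            labels[i - 1] = 'C'                    # consonant before the vowel
        if i + 1 < len(labels) and labels[i + 1] == '-' and zcr[i + 1] > z_thresh:
            # consonant after the vowel; nasal coda if it is the last frame
            labels[i + 1] = 'N' if i + 1 == len(labels) - 1 else 'C'
    return labels
```

A low-frequency tone frame lands in the band and is labelled a vowel; a rapidly alternating frame next to it has a high zero-crossing count and is labelled a consonant, or a nasal coda in the final position.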
Step S1 further includes:
segment the standard pronunciation features in time into n segments, each segment being a 20 ms time slice;
divide the standard pronunciation features of each time segment into static features and dynamic features;
decompose the spectrum energy of each time segment's standard pronunciation features into the spectrum-energy distribution of its vowel segments and the spectrum-energy distribution of its consonant segments;
set the vowel-segment MFCC feature vectors and the consonant-segment MFCC feature vectors of the standard pronunciation features in each time segment.
Step S3 further includes:
set the vowel-segment MFCC feature vectors and the consonant-segment MFCC feature vectors of the user speech features in each time segment;
use the DTW algorithm to obtain the minimum-error alignment path and its corresponding DTW distance;
based on that alignment path and DTW distance, compare the vowel-segment MFCC feature vectors of the user speech features with the vowel-segment MFCC feature vectors of the standard pronunciation features, and the consonant-segment MFCC feature vectors of the user speech features with the consonant-segment MFCC feature vectors of the standard pronunciation features, within the same time segment, to obtain the pronunciation difference between the user speech features and the standard pronunciation features.
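The alignment step can be illustrated with a textbook DTW. The sketch below computes the minimum-error alignment path and the corresponding DTW distance between two feature sequences (rows = frames); the function name and the Euclidean per-frame cost are assumptions, not details taken from the patent.

```python
import numpy as np

def dtw(a, b):
    """Return the minimum-error alignment path between sequences a and b
    (as (i, j) frame-index pairs) and the accumulated DTW distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # per-frame error
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # backtrack the minimum-error alignment path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1], D[n, m]
```

Aligning a sequence with itself yields a zero distance and a purely diagonal path, which is the sanity check for the recurrence.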
Step S1 further includes:
set the vowel-segment standard speech feature vector of each time segment's standard pronunciation features as P1 = [p1(1), p1(2), …, p1(R)], with first-order difference vector PΔ1 = [pΔ1(1), pΔ1(2), …, pΔ1(R)], where R is the vowel-segment speech length of the standard pronunciation features, pΔ1(n) = |p1(n) − p1(n−1)|, n = 1, 2, …, R, and p1(0) = 0;
set the consonant-segment standard speech feature vector of each time segment's standard pronunciation features as P′1 = [p′1(1), p′1(2), …, p′1(R)], with first-order difference vector P′Δ1 = [p′Δ1(1), p′Δ1(2), …, p′Δ1(R)], where R is the consonant-segment speech length of the standard pronunciation features, p′Δ1(n) = |p′1(n) − p′1(n−1)|, n = 1, 2, …, R, and p′1(0) = 0.
Step S3 further includes:
set the vowel-segment feature vector of the user speech features in each time segment as P2 = [p2(1), p2(2), …, p2(T)], with first-order difference vector PΔ2 = [pΔ2(1), pΔ2(2), …, pΔ2(T)], where T is the length of the speech to be evaluated, pΔ2(n) = |p2(n) − p2(n−1)|, n = 1, 2, …, T, and p2(0) = 0;
set the consonant-segment feature vector of the user speech features in each time segment as P′2 = [p′2(1), p′2(2), …, p′2(T)], with first-order difference vector P′Δ2 = [p′Δ2(1), p′Δ2(2), …, p′Δ2(T)], where T is the length of the speech to be evaluated, p′Δ2(n) = |p′2(n) − p′2(n−1)|, n = 1, 2, …, T, and p′2(0) = 0;
use the DTW algorithm to obtain the minimum-error alignment path, and compare the vowel segments and the consonant segments within each time segment;
the comparison yields the vowel-segment gap dp and its variation gap Δdp, and the consonant-segment gap d′p and its variation gap Δd′p, from which the similarity between the user speech features and the standard pronunciation features is obtained, i.e.:
dp = |p1(n) − p2(m)|
d′p = |p′1(n) − p′2(m)|
Δdp = |Δp1(n) − Δp2(m)|
Δd′p = |Δp′1(n) − Δp′2(m)|
where Δpi(n) = |pi(n) − pi(n−1)| and Δp′i(n) = |p′i(n) − p′i(n−1)|.
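The gap formulas above can be evaluated directly over the aligned frame pairs (n, m) produced by the DTW step. A minimal sketch, assuming scalar per-frame features and a path given as a list of index pairs (the function name is an assumption):

```python
def pronunciation_gaps(p1, p2, path):
    """For every aligned pair (n, m) return the feature gap d_p = |p1(n) - p2(m)|
    and the variation gap Δd_p = |Δp1(n) - Δp2(m)|, where Δp(n) = |p(n) - p(n-1)|
    with p(0) = 0, as in the formulas of the text. Large gaps flag frames where
    the user's pronunciation departs from the standard."""
    def delta(p, n):
        # first-order difference with the text's convention p(0) = 0
        return abs(p[n] - (p[n - 1] if n > 0 else 0.0))
    gaps = []
    for n, m in path:
        d = abs(p1[n] - p2[m])
        dd = abs(delta(p1, n) - delta(p2, m))
        gaps.append((d, dd))
    return gaps
```

Identical reference and user sequences produce all-zero gaps, i.e. maximal similarity; the same function applies unchanged to the consonant-segment vectors P′1 and P′2.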
Claims (8)
1. A spoken comparison method, characterized in that the method includes:
S1: setting a standard text, obtaining the standard pronunciation features of the standard text, and storing the standard pronunciation features in a database;
S2: having a user read the standard text aloud, acquiring user speech data, and extracting the user speech features from the user speech data;
S3: aligning the user speech features with the standard pronunciation features, and comparing the user speech features with the standard pronunciation features;
S4: storing the user speech features and the comparison result in the database.
2. The spoken comparison method according to claim 1, characterized in that step S2 further includes:
S21: segmenting the user speech data in time into n segments, each segment being a 20 ms time slice, and applying a rectangular or Hamming window to each segment of user speech data to obtain the windowed speech signal Xn, where n is the segment index;
S22: applying a short-time Fourier transform to the windowed speech signal Xn, converting the short-time time-domain signal into the frequency-domain signal Yn, and computing the short-time energy spectrum Qn = |Yn|²;
S23: passing the short-time energy spectrum Qn from the vector space S through a bank of band-pass filters in first-in, first-out fashion; because within each frequency band the components act on the human ear additively, summing the energies within each filter band to obtain the k-th filter's output power spectrum x'(k);
S24: taking the logarithm of each filter's output to obtain the log power spectrum of each band, and applying an inverse discrete cosine transform to obtain M MFCC coefficients, M typically being 13-15; the MFCC coefficients are c(m) = Σ(k=1..K) log x'(k) · cos(πm(2k−1)/(2K)), m = 1, 2, …, M, with K the number of filters;
S25: using the obtained MFCC features as static features, then taking the first- and second-order differences of the static features to obtain the corresponding dynamic features.
3. The spoken comparison method according to claim 1, characterized in that step S2 further includes:
obtaining the spectrum energy (fk) of each speech segment's frequency band, together with the segment's upper frequency limit k1 and lower frequency limit k2, and obtaining the spectrum-energy ratio PNn within the speech segment.
4. The spoken comparison method according to claim 1, characterized in that step S3 further includes:
if the spectrum energy (fk) in a speech segment is ≥ a first threshold and the spectrum-energy ratio PNn in the segment is ≥ a second threshold, judging the segment to be a vowel segment, the first threshold being 0.1-0.5 and the second threshold 60%-85%;
taking the vowel segment's spectrum energy as the reference, judging whether the zero-crossing rate of the spectrum energy before the vowel segment exceeds a third threshold, and if so, judging that spectrum energy to be a consonant preceding the vowel, the third threshold being 100;
taking the vowel segment's spectrum energy as the reference, judging whether the zero-crossing rate of the spectrum energy after the vowel segment exceeds the third threshold, and if so, judging that spectrum energy to be a consonant following the vowel;
if the zero-crossing rate of the spectrum energy after the vowel segment exceeds the third threshold and that spectrum energy lies in the last frame of the speech segment, judging it to be a nasal coda consonant.
5. The spoken comparison method according to claim 4, characterized in that step S1 further includes:
segmenting the standard pronunciation features in time into n segments, each segment being a 20 ms time slice;
dividing the standard pronunciation features of each time segment into static features and dynamic features;
decomposing the spectrum energy of each time segment's standard pronunciation features into the spectrum-energy distribution of its vowel segments and the spectrum-energy distribution of its consonant segments;
setting the vowel-segment MFCC feature vectors and the consonant-segment MFCC feature vectors of the standard pronunciation features in each time segment.
6. The spoken comparison method according to claim 5, characterized in that step S3 further includes:
setting the vowel-segment MFCC feature vectors and the consonant-segment MFCC feature vectors of the user speech features in each time segment;
using the DTW algorithm to obtain the minimum-error alignment path and its corresponding DTW distance;
based on that alignment path and DTW distance, comparing the vowel-segment MFCC feature vectors of the user speech features with the vowel-segment MFCC feature vectors of the standard pronunciation features, and the consonant-segment MFCC feature vectors of the user speech features with the consonant-segment MFCC feature vectors of the standard pronunciation features, within the same time segment, to obtain the pronunciation difference between the user speech features and the standard pronunciation features.
7. The spoken comparison method according to claim 5, characterized in that step S1 further includes:
setting the vowel-segment standard speech feature vector of each time segment's standard pronunciation features as P1 = [p1(1), p1(2), …, p1(R)], with first-order difference vector PΔ1 = [pΔ1(1), pΔ1(2), …, pΔ1(R)], where R is the vowel-segment speech length of the standard pronunciation features, pΔ1(n) = |p1(n) − p1(n−1)|, n = 1, 2, …, R, and p1(0) = 0;
setting the consonant-segment standard speech feature vector of each time segment's standard pronunciation features as P′1 = [p′1(1), p′1(2), …, p′1(R)], with first-order difference vector P′Δ1 = [p′Δ1(1), p′Δ1(2), …, p′Δ1(R)], where R is the consonant-segment speech length of the standard pronunciation features, p′Δ1(n) = |p′1(n) − p′1(n−1)|, n = 1, 2, …, R, and p′1(0) = 0.
8. The spoken comparison method according to claim 7, characterized in that step S3 further includes:
setting the vowel-segment feature vector of the user speech features in each time segment as P2 = [p2(1), p2(2), …, p2(T)], with first-order difference vector PΔ2 = [pΔ2(1), pΔ2(2), …, pΔ2(T)], where T is the length of the speech to be evaluated, pΔ2(n) = |p2(n) − p2(n−1)|, n = 1, 2, …, T, and p2(0) = 0;
setting the consonant-segment feature vector of the user speech features in each time segment as P′2 = [p′2(1), p′2(2), …, p′2(T)], with first-order difference vector P′Δ2 = [p′Δ2(1), p′Δ2(2), …, p′Δ2(T)], where T is the length of the speech to be evaluated, p′Δ2(n) = |p′2(n) − p′2(n−1)|, n = 1, 2, …, T, and p′2(0) = 0;
using the DTW algorithm to obtain the minimum-error alignment path, and comparing the vowel segments and the consonant segments within each time segment;
the comparison yielding the vowel-segment gap dp and its variation gap Δdp, and the consonant-segment gap d′p and its variation gap Δd′p, from which the similarity between the user speech features and the standard pronunciation features is obtained, i.e.:
dp = |p1(n) − p2(m)|
d′p = |p′1(n) − p′2(m)|
Δdp = |Δp1(n) − Δp2(m)|
Δd′p = |Δp′1(n) − Δp′2(m)|
where Δpi(n) = |pi(n) − pi(n−1)| and Δp′i(n) = |p′i(n) − p′i(n−1)|.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611181163 | 2016-12-20 | ||
CN201611181163X | 2016-12-20 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106782609A true CN106782609A (en) | 2017-05-31 |
Family
ID=58950067
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710003810.6A Pending CN106782609A (en) | 2016-12-20 | 2017-01-03 | A kind of spoken comparison method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106782609A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107767862A (en) * | 2017-11-06 | 2018-03-06 | 深圳市领芯者科技有限公司 | Voice data processing method, system and storage medium |
CN108470476A (en) * | 2018-05-15 | 2018-08-31 | 黄淮学院 | A kind of pronunciation of English matching correcting system |
CN109192223A (en) * | 2018-09-20 | 2019-01-11 | 广州酷狗计算机科技有限公司 | The method and apparatus of audio alignment |
CN109326162A (en) * | 2018-11-16 | 2019-02-12 | 深圳信息职业技术学院 | A kind of spoken language exercise method for automatically evaluating and device |
CN111241308A (en) * | 2020-02-27 | 2020-06-05 | 曾兴 | Self-help learning method and system for spoken language |
CN113436487A (en) * | 2021-07-08 | 2021-09-24 | 上海松鼠课堂人工智能科技有限公司 | Chinese reciting skill training method and system based on virtual reality scene |
WO2022169417A1 (en) * | 2021-02-07 | 2022-08-11 | 脸萌有限公司 | Speech similarity determination method, device and program product |
CN111241308B (en) * | 2020-02-27 | 2024-04-26 | 曾兴 | Self-help learning method and system for spoken language |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101740024A (en) * | 2008-11-19 | 2010-06-16 | 中国科学院自动化研究所 | Method for automatic evaluation based on generalized fluent spoken language fluency |
CN101782941A (en) * | 2009-01-16 | 2010-07-21 | 国际商业机器公司 | Method and system for evaluating spoken language skill |
CN101872616A (en) * | 2009-04-22 | 2010-10-27 | 索尼株式会社 | Endpoint detection method and system using same |
CN102568475A (en) * | 2011-12-31 | 2012-07-11 | 安徽科大讯飞信息科技股份有限公司 | System and method for assessing proficiency in Putonghua |
CN104732977A (en) * | 2015-03-09 | 2015-06-24 | 广东外语外贸大学 | On-line spoken language pronunciation quality evaluation method and system |
CN105609114A (en) * | 2014-11-25 | 2016-05-25 | 科大讯飞股份有限公司 | Method and device for detecting pronunciation |
Non-Patent Citations (3)
Title |
---|
Zhuang Yi, "Efficient Query Processing of Multimedia Big Data Information for the Internet", Zhejiang University Press, 30 June 2015 |
Wang Bingxi et al., "Fundamentals of Practical Speech Recognition", National Defense Industry Press, 31 January 2005 |
Han Jiqing et al., "Audio Information Processing Technology", Tsinghua University Press, 31 January 2007 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20170531