CN104732977A - On-line spoken language pronunciation quality evaluation method and system - Google Patents


Info

Publication number: CN104732977A (application CN201510102425.8A)
Authority: CN (China)
Prior art keywords: test speech, pronunciation, characteristic parameter
Other languages: Chinese (zh)
Other versions: CN104732977B (granted)
Inventor
李心广
李苏梅
徐集优
张胜斌
陈君宇
李升恒
朱小凡
王泽铿
许港帆
陈嘉华
林帆
Original Assignee
广东外语外贸大学
李心广
Application filed by 广东外语外贸大学 and 李心广
Priority: CN201510102425.8A
Published as CN104732977A; granted as CN104732977B


Abstract

The invention discloses an online spoken-language pronunciation quality evaluation method and system. The method comprises the following steps: receiving, over a network, test speech collected by a mobile client; preprocessing the received test speech; extracting speech characteristic parameters from the preprocessed test speech to obtain the characteristic parameters of the test speech; evaluating the test speech according to its characteristic parameters and the characteristic parameters of a standard (reference) speech to obtain an evaluation result; and feeding the evaluation result back to the mobile client over the network, where the mobile client displays it. The method and system make spoken-language pronunciation quality evaluation available online, conveniently and accurately.

Description

Online spoken-language pronunciation quality evaluation method and system

Technical field

The present invention relates to the field of speech recognition and assessment, and in particular to an online spoken-language pronunciation quality evaluation method and system.

Background art

Applying signal-processing technology to language learning is a key part of integrating information technology with language education. The goal is to combine up-to-date speech technology with current teaching and learning methods to build computer-assisted language learning systems, and within such systems the assessment of spoken pronunciation quality has always been an important focus.

Traditional spoken-pronunciation quality assessment systems, however, are mostly confined to dedicated language-learning devices or PCs; they are inconvenient to carry and tied to wired network connections, so learners cannot practice anytime and anywhere. Existing assessment systems designed for smartphones run offline, but today's mainstream smartphones cannot support large-scale storage or complex speech computation; this limits the complexity of the scoring algorithms that can run on the device, so their results do not truly reflect the speech quality of the spoken-English learner. Moreover, existing schemes deploy the assessment logic across language-learning devices, PCs and mobile devices, which hinders data updates, storage and algorithm improvement. Finally, existing systems consider too few evaluation indicators, mostly a single index or a small number of them; they cannot evaluate a user's pronunciation scientifically, comprehensively and accurately, and often merely output a score without diagnostic feedback.

Summary of the invention

It is an object of embodiments of the present invention to provide an online spoken-language pronunciation quality evaluation method and system that make pronunciation quality assessment available online, conveniently and accurately.

In one aspect, an embodiment of the invention provides an online spoken-language pronunciation quality evaluation method, comprising:

receiving, over a network, test speech collected by a mobile client;

preprocessing the received test speech;

extracting speech characteristic parameters from the preprocessed test speech to obtain the characteristic parameters of the test speech;

evaluating the test speech according to its characteristic parameters and the characteristic parameters of the standard speech, to obtain an evaluation result;

feeding the evaluation result back to the mobile client over the network, where the mobile client displays it.

Preferably, the online spoken-language pronunciation quality evaluation method further comprises:

storing the evaluation result in a database, and statistically analysing the evaluation results to obtain statistics;

sending the statistics to a web management terminal, where they are displayed.

Preferably, the method further comprises:

obtaining the standard speech;

preprocessing the standard speech;

extracting speech characteristic parameters from the preprocessed standard speech to obtain the characteristic parameters of the standard speech.

Preferably, the preprocessing comprises pre-emphasis, framing, windowing and endpoint detection.

Preferably, extracting speech characteristic parameters from the preprocessed test speech comprises:

applying a discrete Fourier transform to the test speech to obtain its spectral coefficients, filtering the spectral coefficient sequence with a bank of triangular filters, taking the logarithm of the filtered data, and applying a discrete cosine transform to obtain the MFCC characteristic parameters of the test speech;

extracting the fundamental-frequency, short-time-energy and formant features of the test speech, and combining them into the emotion characteristic parameters of the test speech;

computing the utterance duration of the test speech to obtain its duration characteristic parameter;

dividing the test speech into stress units and extracting the sets of stress start-frame and end-frame positions, to obtain its stress-position characteristic parameters;

dividing the test speech into speech units and computing the duration of each unit, to obtain its speech-unit duration characteristic parameters;

extracting the pitch of each frame of the test speech with the time-domain autocorrelation function (ACF) method, to obtain its pitch characteristic parameters.

Preferably, evaluating the test speech according to its characteristic parameters and those of the standard speech comprises:

performing speech recognition on the test speech with a probabilistic-neural-network ensemble recognition model based on segment clustering, using the MFCC characteristic parameters of the test speech, to obtain a recognition result; computing the similarity between the MFCC parameters of the test speech and of the standard speech to obtain an MFCC correlation coefficient; and computing an accuracy score for the test speech from the recognition result and the MFCC correlation coefficient;

performing emotion recognition on the test speech with an SVM emotion model, using the emotion characteristic parameters of the test speech, to obtain an emotion recognition result; computing the similarity between the emotion parameters of the test speech and of the standard speech to obtain an emotion correlation coefficient; and computing an emotion score from the recognition result and the emotion correlation coefficient;

computing the speech-rate ratio of the standard speech to the test speech from their duration characteristic parameters, and computing a speech-rate score from that ratio;

comparing the stress-position characteristic parameters of the test speech and of the standard speech to obtain their stress-position difference, and computing a stress score from that difference;

applying the dPVI (durational Pairwise Variability Index) algorithm to the speech-unit duration characteristic parameters of the test speech and of the standard speech to obtain the dPVI parameter of the test speech, and computing a rhythm score from it;

applying the DTW (dynamic time warping) algorithm to the pitch characteristic parameters of the test speech and of the standard speech to obtain their pitch difference, and computing an intonation score from it.

Preferably, the evaluating further comprises:

computing a weighted sum of the accuracy, emotion, speech-rate, stress, rhythm and intonation scores to obtain an overall score; mapping each score to a grade through the score-to-grade mapping, yielding accuracy, emotion, speech-rate, stress, rhythm, intonation and overall grade evaluations of the test speech; and taking these grade evaluations as the evaluation result of the test speech.
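
As an illustration only, the weighted summation and score-to-grade mapping above can be sketched in Python as follows; the weight values and grade-band thresholds below are invented placeholders, not values disclosed by the patent:

```python
def overall_score(scores, weights):
    # weighted sum of the six sub-scores; the weights used here are
    # illustrative placeholders, not values taken from the patent
    assert set(scores) == set(weights)
    return sum(scores[k] * weights[k] for k in scores)

def grade(score):
    # score-to-grade mapping; these band boundaries are assumed
    for threshold, label in ((90, "excellent"), (75, "good"), (60, "pass")):
        if score >= threshold:
            return label
    return "fail"
```

With equal sub-scores of 80 and weights summing to 1, the overall score is 80 and maps to the "good" band.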

Preferably, the method further comprises:

generating pronunciation guidance for the user's spoken pronunciation according to the evaluation result;

feeding the pronunciation guidance back to the mobile client over the network, where the mobile client displays it.

In another aspect, an embodiment of the invention provides an online spoken-language pronunciation quality assessment system, comprising a mobile client and a server connected by a network.

The mobile client comprises:

a speech acquisition unit, for collecting test speech and sending it to the server over the network.

The server comprises:

a preprocessing unit, for preprocessing the received test speech;

a characteristic-parameter extraction unit, for extracting speech characteristic parameters from the preprocessed test speech to obtain the characteristic parameters of the test speech;

a speech evaluation unit, for evaluating the test speech according to its characteristic parameters and those of the standard speech to obtain the evaluation result, and feeding the evaluation result back to the mobile client over the network.

The mobile client further comprises:

a data display unit, for displaying the evaluation result.

Preferably, the system further comprises a web management terminal connected to the server by a network, and the server further comprises a database and a statistical analysis unit:

the database, for storing the evaluation results;

the statistical analysis unit, for statistically analysing the evaluation results to obtain statistics, and sending the statistics to the web management terminal;

the web management terminal, for displaying the received statistics.

Compared with the prior art, embodiments of the present invention have the following advantages.

Embodiments of the invention build a mobile client and a server on a C/S (client/server) architecture: the mobile client collects the user's test-speech signal and sends it to the server, the server evaluates the test speech and returns the evaluation result to the mobile client, and the mobile client displays it. Users can reach the server conveniently over the mobile Internet to obtain services and data; the corpus and the evaluation methods stay synchronised at the server; and the server can run speech-analysis algorithms with better performance and better results than a mobile device could.

Secondly, embodiments also build a web management terminal and the server on a B/S (browser/server) architecture, so the spoken-pronunciation assessment statistics of mobile-client users can be fetched in real time from the server's database through a web browser. This gives a third party (such as an instructor) a view of the mobile-client users' spoken pronunciation, making it easy to formulate offline guidance and improvement strategies.

Further, embodiments evaluate the test speech along multiple dimensions, with a reasonable and credible evaluation method for each index, and feed pronunciation guidance back to the user, which helps correct mispronunciations and improve speech quality.

Brief description of the drawings

Fig. 1 is the step flow chart of an embodiment of the online spoken-language pronunciation quality evaluation method provided by the invention;

Fig. 2 is a schematic diagram of the construction of the probabilistic-neural-network ensemble classifier provided by the invention;

Fig. 3 is the C/S architecture diagram of an embodiment of the online spoken-language pronunciation quality assessment system provided by the invention;

Fig. 4 is the B/S architecture diagram of the system shown in Fig. 3.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Note that the labels before the steps in the embodiments serve only to identify the steps clearly; they do not impose a necessary order between the steps.

Referring to Fig. 1, the step flow chart of an embodiment of the online spoken-language pronunciation quality evaluation method provided by the invention, the method comprises:

S1: receive, over the network, test speech collected by the mobile client.

In a concrete implementation, the mobile client is installed as an application on the user's mobile phone or other mobile device. It calls the recording facility of the mobile device to capture the speech the user produces in the oral test and generates an audio file in a unified format; the mobile client then compresses and encodes the audio file and sends it to the server over the network. The audio file is preferably in WAV format, the network is preferably the mobile Internet, and the mobile client and server exchange data through sockets over TCP/IP (Transmission Control Protocol/Internet Protocol).

S2: preprocess the received test speech.

After receiving the data sent by the mobile client, the server decompresses and decodes it to recover the original test-speech file. Before the test speech is processed and analysed, it is preprocessed to eliminate the influence of the speaker's own vocal organs and of the recording equipment; this supplies high-quality source data for the subsequent extraction of speech characteristic parameters and so improves the quality of the speech processing. The preprocessing in this embodiment includes, without limitation, pre-emphasis, framing, windowing and endpoint detection, as follows:

2.1) Pre-emphasis: the average power spectrum of the test speech is shaped by glottal excitation and mouth-nose radiation and falls off at roughly 6 dB/oct above about 800 Hz, so the higher the frequency, the smaller the corresponding component. The high-frequency part of the test speech therefore needs to be boosted before analysis. This embodiment applies a 6 dB/oct high-frequency-boost pre-emphasis digital filter before analysing the test speech, which flattens the spectrum of the test speech and preserves the full band from low to high frequency. The pre-emphasis formula is:

y(n) = x(n) − 0.9375 · x(n − 1)   (formula 1)

where x(n) is the original test speech and y(n) is the pre-emphasised signal.
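
A minimal Python sketch (not part of the patent text) makes formula 1 concrete:

```python
def pre_emphasis(x, alpha=0.9375):
    # y(n) = x(n) - alpha * x(n - 1) (formula 1); the first sample passes through
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]
```

A constant (low-frequency) input is strongly attenuated after the first sample, which is exactly the high-pass behaviour the pre-emphasis step aims for.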

2.2) Framing: speech is a time-varying signal, but over a short range its characteristics remain essentially unchanged, i.e. relatively stable; this is known as the "short-time" property of speech, and the short range is generally 10–30 ms. The processing and analysis of the test speech is therefore built on this short-time property, i.e. the test speech undergoes short-time analysis (frame-by-frame processing). Because successive samples of a speech signal are correlated, this embodiment frames the test speech with overlapping frames.
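
Overlapping framing as described can be sketched as follows (frame length and shift are in samples; a 10–30 ms frame with a smaller shift is typical, but the exact values are not specified in the patent):

```python
def frame_signal(x, frame_len, frame_shift):
    # overlapping frames: frame_shift < frame_len gives the overlap;
    # a trailing partial frame is dropped for simplicity
    return [x[i:i + frame_len]
            for i in range(0, len(x) - frame_len + 1, frame_shift)]
```

For example, 10 samples framed with length 4 and shift 2 produce 4 frames, each sharing half its samples with its neighbour.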

2.3) Windowing: to emphasise the speech waveform near each sample position in the test speech and attenuate the rest of the waveform, this embodiment applies a Hamming window to the test speech. Windowing after framing reduces the Gibbs phenomenon caused by truncation and smooths the spectrum of the test speech. In one realisable form, the windowing formula is:

s_ω(n) = y(n) · ω(n)   (formula 2)

where y(n) is the pre-emphasised speech signal and ω(n) is the window function.
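
The Hamming window and formula 2 can be sketched as follows (the window definition is the standard one, assumed here rather than quoted from the patent):

```python
import math

def hamming_window(N):
    # standard Hamming window: w(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1))
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def apply_window(frame):
    # formula 2: s_w(n) = y(n) * w(n), applied sample by sample
    return [y * w for y, w in zip(frame, hamming_window(len(frame)))]
```

The window is 1 at its centre and 0.08 at its edges, so each frame's ends are tapered toward zero before spectral analysis.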

2.4) Endpoint detection: this embodiment uses the double-threshold comparison method to detect the start point and end point of the test speech. The method uses short-time energy E and short-time average zero-crossing rate Z as features; combining the strengths of the two makes detection more accurate, effectively reduces the system's processing time, improves its real-time behaviour, and rejects the noise of silent segments, thereby improving the processing of the speech signal.
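
A simplified sketch of the double-threshold idea, using energy only (the patent combines energy E with the zero-crossing rate Z; the ZCR refinement is omitted here for brevity):

```python
def short_time_energy(frame):
    # E = sum of squared samples in the frame
    return sum(s * s for s in frame)

def detect_endpoints(frames, high, low):
    # simplified double-threshold scheme: find the frames that clearly exceed
    # the high threshold, then widen both ends while the energy stays above
    # the low threshold; a full detector also uses the zero-crossing rate
    energies = [short_time_energy(f) for f in frames]
    above = [i for i, e in enumerate(energies) if e > high]
    if not above:
        return None
    start, end = above[0], above[-1]
    while start > 0 and energies[start - 1] > low:
        start -= 1
    while end < len(frames) - 1 and energies[end + 1] > low:
        end += 1
    return start, end
```

The high threshold anchors the detection on clearly-voiced frames, and the low threshold recovers the quieter onset and tail of the utterance.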

S3: extract speech characteristic parameters from the preprocessed test speech to obtain the characteristic parameters of the test speech. These comprise the MFCC (Mel-frequency cepstral coefficient) characteristic parameters, the emotion characteristic parameters, the utterance-duration characteristic parameter, the stress-position characteristic parameters, the speech-unit duration characteristic parameters and the pitch characteristic parameters. The extraction performed at the server is as follows:

3.1) Apply a discrete Fourier transform (DFT) to the test speech to obtain its spectral coefficients; filter the spectral coefficient sequence with a bank of triangular filters; take the logarithm of the filter outputs; and apply a discrete cosine transform to obtain the MFCC characteristic parameters of the test speech. The concrete steps are as follows:

Apply the DFT to the preprocessed test speech to obtain the spectral coefficients X(k).

Filter the sequence X(k) with the triangular filter bank to obtain a set of coefficients m_i:

m_i = Σ_k |X(k)|² · H_i(k)   (formula 3)

where H_i(k) is the frequency response of the i-th triangular filter:

H_i(k) = (k − f[i−1]) / (f[i] − f[i−1]) for f[i−1] ≤ k ≤ f[i];  H_i(k) = (f[i+1] − k) / (f[i+1] − f[i]) for f[i] ≤ k ≤ f[i+1];  H_i(k) = 0 otherwise   (formula 4)

and f[i] is the centre frequency of the i-th triangular filter, satisfying:

Mel(f[i+1]) − Mel(f[i]) = Mel(f[i]) − Mel(f[i−1])   (formula 5)

Take the logarithm of the outputs of all the filters, then apply the discrete cosine transform to obtain the cepstral coefficients:

C_i = sqrt(2/P) · Σ_{j=1}^{P} log(m_j) · cos[(π·i/P)(j − 0.5)]   (formula 6)

where P is the number of triangular filters and C_i are the required MFCC characteristic parameters. Preferably, the order of the MFCC characteristic parameters is set to 12.
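
Step 3.1 can be sketched as follows; the direct DFT and the DCT of formula 6 are shown, while the mel filter-bank stage (formulas 3–5) is assumed to have already produced the log filter outputs:

```python
import math

def dft_power_spectrum(frame):
    # |X(k)|^2 by a direct DFT (an FFT would be used in practice)
    N = len(frame)
    spec = []
    for k in range(N):
        re = sum(x * math.cos(2 * math.pi * k * n / N) for n, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * n / N) for n, x in enumerate(frame))
        spec.append(re * re + im * im)
    return spec

def mfcc(log_filter_outputs, order=12):
    # formula 6 with the logarithm already applied: the inputs play the role
    # of log(m_j) for the P triangular filters
    P = len(log_filter_outputs)
    return [math.sqrt(2.0 / P) *
            sum(log_filter_outputs[j - 1] * math.cos(math.pi * i / P * (j - 0.5))
                for j in range(1, P + 1))
            for i in range(1, order + 1)]
```

A constant filter-bank output yields (near-)zero cepstral coefficients, since the DCT basis vectors for i ≥ 1 sum to zero; this is a quick sanity check on the implementation.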

3.2) Extract the fundamental-frequency, short-time-energy and formant features of the test speech, and combine them into the emotion characteristic parameters of the test speech.

3.2.1) Fundamental-frequency feature: the pitch period is the periodicity of voiced sound caused by vocal-cord vibration, and the fundamental frequency is its reciprocal. The fundamental frequency is one of the most important parameters of a speech signal, and research shows that it reflects changes in emotion. Detection methods include, without limitation, the autocorrelation function (ACF) method, cepstral analysis, the average magnitude difference function (AMDF) method and wavelet methods. This embodiment preferably uses cepstral analysis: apply a Fourier transform to the preprocessed test speech to obtain its magnitude spectrum; take the logarithm of the magnitude spectrum to obtain a signal that is periodic in the frequency domain; computing the frequency of that periodic signal then yields the fundamental-frequency value of the test speech. Applying an inverse Fourier transform to the periodic signal produces a peak at the pitch period. Once the fundamental-frequency values have been obtained, the 7th-order fundamental-frequency statistical parameters (maximum, minimum, mean, median, standard deviation, etc.) are computed as the fundamental-frequency feature of the test speech.
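
The statistical summary of the fundamental-frequency contour can be sketched as follows; only five of the seven statistics are named in the text, so the sketch computes just those:

```python
import statistics

def f0_statistics(f0_contour):
    # five of the statistics named in the description (maximum, minimum,
    # mean, median, standard deviation); which two further statistics
    # complete the 7th-order parameter is not specified, so they are omitted
    return {
        "max": max(f0_contour),
        "min": min(f0_contour),
        "mean": statistics.mean(f0_contour),
        "median": statistics.median(f0_contour),
        "std": statistics.stdev(f0_contour),
    }
```

The same summary pattern applies to the short-time-energy contour in section 3.2.2.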

3.2.2) Short-time energy feature: the energy of the speech signal correlates strongly with the expression of emotion; high energy indicates comparatively high volume and loudness. In everyday life, people speak louder when angry or indignant, and more softly when depressed or sad. Speech-signal energy is usually measured either as short-time energy or as short-time average magnitude; here the short-time energy of the test speech is preferably chosen as the energy parameter. Short-time energy is the sum of squares of the (windowed) sample values in one frame, defined as:

E_n = Σ_{m=0}^{N−1} x_n²(m)   (formula 7)

where x_n(m) is the m-th sample of the n-th frame of the test speech and N is the frame length.

From the short-time energy values, the 7th-order short-time-energy statistical parameters (maximum, minimum, mean, median, standard deviation, etc.) are then computed as the short-time energy feature of the test speech.

3.2.3) Formant feature: formants are important parameters reflecting vocal-tract characteristics; when the excitation passes through the vocal tract, formant frequencies are produced. Under different emotional states a speaker's nervous tension differs, which deforms the vocal tract and shifts the formant frequencies accordingly. This embodiment preferably extracts the formant parameters of each speech frame by linear prediction, which obtains them quickly and effectively: linear prediction yields the first and second formants of the speech signal, and segment clustering then normalises the first and second formants into 32th-order parameters as the formant feature of the test speech. Combining the formant, fundamental-frequency and short-time-energy features yields the 46th-order speech emotion characteristic parameters.
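
Assembling the 46th-order emotion vector can be sketched as a simple concatenation (the dimension counts 7 + 7 + 32 = 46 come from the description; the ordering of the three blocks is assumed):

```python
def emotion_feature_vector(f0_stats, energy_stats, formant_params):
    # 7th-order F0 stats + 7th-order energy stats + 32th-order formant
    # parameters = 46th-order emotion feature, as in the description
    assert len(f0_stats) == 7 and len(energy_stats) == 7 and len(formant_params) == 32
    return list(f0_stats) + list(energy_stats) + list(formant_params)
```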

3.3) Compute the utterance duration of the test speech to obtain its duration characteristic parameter.

In a concrete implementation, setting upper and lower thresholds on the short-time energy and the zero-crossing rate allows endpoint detection on the test speech, which yields its utterance duration.

3.4) Divide the test speech into stress units and extract the sets of stress start-frame and end-frame positions, obtaining the stress-position characteristic parameters of the test speech.

The stress-unit division proceeds as follows:

A. Extract the energy values of the test speech. The loudness of stressed syllables is reflected in the time-domain energy intensity: stressed syllables show high speech-energy intensity.

B. Normalise the test speech. Because speakers differ in speech rate, the utterance duration of the same sentence varies from speaker to speaker; nevertheless, different speakers' renderings of the same sentence follow the rule that each stress unit's duration occupies a certain proportion of the whole sentence. When scoring the test speech, its utterance duration can therefore be scaled proportionally, using the stored duration characteristic parameter of the standard speech, to equal the duration of the standard speech; this simplifies the data processing and makes the system's evaluation more objective.

C. Extract the syllables of the test speech. In a concrete implementation, double-threshold stress endpoint detection can be used: from the energy values, search for the largest speech-signal value S_max that exceeds the stress threshold T_u; search left and right from S_max for the values S_L and S_R that fall to the non-stress threshold T_L; take S_L and S_R as the boundaries of a stress signal of the test speech; and zero the samples between S_L and S_R so the same span is not searched again. Stressed syllables also tend to be comparatively long, whereas a unit found in the first pass may have high energy (i.e. sound loud) yet a very short duration; such units may be short vowels or mere signal spikes and do not constitute stressed syllables. Candidate units are therefore further screened by duration: the minimum duration of a stress unit is set to roughly one stressed-vowel duration, preferably 100 ms, and each candidate is checked against this minimum.

Through the above steps the division of the sentence into stress units is complete, the sets of stress start-frame and end-frame positions are known, and these sets are taken as the stress-position characteristic parameters of the test speech.
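
The search in step C can be sketched as follows, operating on a per-frame energy contour; the thresholds and minimum length are parameters, and the frame-level granularity is an assumption of this sketch:

```python
def find_stress_units(energies, t_u, t_l, min_frames):
    # greedy search: take the global energy peak above the stress threshold
    # T_u, widen to where energy falls to the non-stress threshold T_l, zero
    # the region so it is not found again, and keep only units at least
    # min_frames long (min_frames corresponding to roughly 100 ms)
    e = list(energies)
    units = []
    while True:
        peak = max(range(len(e)), key=e.__getitem__)
        if e[peak] <= t_u:
            break
        left = peak
        while left > 0 and e[left - 1] > t_l:
            left -= 1
        right = peak
        while right < len(e) - 1 and e[right + 1] > t_l:
            right += 1
        if right - left + 1 >= min_frames:
            units.append((left, right))
        for i in range(left, right + 1):
            e[i] = 0.0
    return sorted(units)
```

In the test below, the short loud spike near the end is found first but discarded by the duration screen, while the longer stressed region survives.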

3.5) Divide the test speech into speech units and compute the duration of each unit, obtaining the speech-unit duration characteristic parameters of the test speech. The duration of a speech unit is the time from its start to its end.

3.6) Extract the pitch of each frame of the test speech with the time-domain autocorrelation function (ACF) method, obtaining the pitch characteristic parameters of the test speech.

The ACF method uses the autocorrelation function to measure the similarity of a sound frame s(i), i = 0 … n−1, with itself:

acf(τ) = Σ_{i=0}^{n−1−τ} s(i) · s(i+τ)   (formula 8)

where n is the length of one frame of speech data and τ is the lag. Finding the τ that maximises acf(τ) within a reasonable range yields the pitch of the frame. In the concrete ACF computation, the speech frame is shifted right one sample at a time and the overlap of the shifted frame with the original frame is inner-multiplied; the n inner products obtained after n repetitions are the ACF values of the frame.
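
Formula 8 and the pitch search can be sketched as follows (the plausible lag range tau_min..tau_max is an assumption of this sketch; in practice it would be derived from the expected pitch range):

```python
def acf(frame, tau):
    # formula 8: acf(tau) = sum_{i=0}^{n-1-tau} s(i) * s(i + tau)
    return sum(frame[i] * frame[i + tau] for i in range(len(frame) - tau))

def pitch_from_acf(frame, fs, tau_min, tau_max):
    # the lag maximising acf() within a plausible range gives the pitch
    # period in samples; pitch = sampling rate / period
    best_tau = max(range(tau_min, tau_max + 1), key=lambda t: acf(frame, t))
    return fs / best_tau
```

For a signal with a 4-sample period at an 8 kHz sampling rate, the maximum falls at lag 4 and the detected pitch is 2000 Hz.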

S4: evaluate the test speech according to its characteristic parameters and the characteristic parameters of the standard speech, obtaining the evaluation result.

Note that the characteristic parameters of the standard speech are obtained in advance by extracting speech characteristic parameters from the standard speech; they are stored in the database and loaded when needed. The concrete steps are: obtain the standard speech; preprocess it; and extract speech characteristic parameters from the preprocessed standard speech to obtain its characteristic parameters. These steps are identical to the characteristic-parameter extraction for the test speech and are not repeated here.

The process of evaluating the test speech according to its characteristic parameters and the characteristic parameters of the standard speech is as follows:

4.1) According to the MFCC characteristic parameters of the test speech, perform speech recognition on the test speech with a probabilistic neural network (PNN) ensemble speech recognition model based on segment clustering, obtaining a speech recognition result. Also compute the similarity between the MFCC characteristic parameters of the test speech and those of the standard speech, obtaining an MFCC correlation coefficient. From the speech recognition result and the MFCC correlation coefficient, calculate the accuracy score of the test speech. Note that the segment-clustering PNN ensemble speech recognition model is trained in advance, stored in a database, and retrieved when needed.

In this embodiment, Bagging (bootstrap aggregating) is used to generate the individual probabilistic neural network models required for the ensemble. Bagging is an ensemble learning method that integrates multiple different individual learners into one learner: by repeated sampling with replacement, different data subsets are obtained, so the individual learners trained on the different subsets have higher generalization capability and greater diversity. Distributed computation over the existing network can further improve the time efficiency of the algorithm, and Bagging improves learner performance, which helps raise the classification accuracy and generalization ability of the probabilistic neural network.

Referring to Fig. 2, a schematic diagram of the process of building the probabilistic neural network ensemble classifier provided by the invention: n samples are drawn at random each time from training sample set A (Bagging sample A1, Bagging sample A2, …, Bagging sample An in the figure) and trained with the probabilistic neural network classification algorithm to obtain one PNN classifier; the same method is used to generate multiple PNN classifiers (PNN classifiers C_1(x), C_2(x), …, C_n(x) in the figure). After training, a sequence of classification functions C_1(x), C_2(x), …, C_n(x) is obtained, namely the PNN ensemble classifier, i.e. the PNN ensemble speech recognition model of this embodiment. The final classification function C(x) decides classification problems by voting: the classification result with the most votes is the final class of C(x).
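The Bagging-plus-voting construction of Fig. 2 can be sketched as follows. The PNN members are replaced here by a simple nearest-centroid stand-in (a real implementation would train PNN classifiers on MFCC features); the toy data, member count and stand-in classifier are all assumptions of this example:

```python
import numpy as np

rng = np.random.default_rng(0)

class CentroidClassifier:
    """Stand-in for one PNN member: classify by nearest class centroid."""
    def fit(self, X, y):
        self.labels = np.unique(y)
        self.centroids = np.array([X[y == c].mean(axis=0) for c in self.labels])
        return self
    def predict(self, X):
        d = ((X[:, None, :] - self.centroids[None, :, :]) ** 2).sum(axis=2)
        return self.labels[d.argmin(axis=1)]

def bagging_ensemble(X, y, n_members=5):
    """Train members on bootstrap resamples of (X, y), as in Bagging."""
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
        members.append(CentroidClassifier().fit(X[idx], y[idx]))
    return members

def vote(members, X):
    """Final classification function C(x): majority vote over the members."""
    preds = np.stack([m.predict(X) for m in members])  # (n_members, n_samples)
    return np.array([np.bincount(col).argmax() for col in preds.T])

# Two well-separated toy classes standing in for MFCC feature vectors
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
members = bagging_ensemble(X, y)
pred = vote(members, np.array([[0.1, 0.0], [3.1, 2.9]]))
```

The voting step mirrors the patent's C(x): each member casts one vote per sample, and the most-voted class wins.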

In the speech recognition process, the MFCC characteristic parameters of the test speech are simply input into the PNN ensemble speech recognition model, which classifies by voting and judges whether the content is correct. At the same time, the similarity between the MFCC characteristic parameters of the test speech and those of the standard speech is computed, and the accuracy of the test speech is finally scored according to whether the content is correct and the magnitude of the MFCC correlation coefficient.

4.2) According to the affective characteristic parameters of the test speech, perform emotion recognition on the test speech with an SVM (support vector machine) emotion model, obtaining an emotion recognition result. Also compute the similarity between the affective characteristic parameters of the test speech and those of the standard speech, obtaining an emotion correlation coefficient. From the emotion recognition result and the emotion correlation coefficient, calculate the emotion score of the test speech.

After the affective characteristic parameters of the test speech have been extracted, they are input into the SVM-based emotion model for classification, and at the same time the correlation coefficient between the affective characteristic parameters of the test speech and those of the standard speech is calculated. Finally, the emotion score is derived from whether the emotion classification result is correct and the magnitude of the correlation coefficient.
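The correlation step can be illustrated as below. The feature values and the rule combining the classification outcome with the correlation coefficient are hypothetical; the patent does not fix a concrete formula for the emotion score:

```python
import numpy as np

def feature_correlation(test_feat, std_feat):
    """Pearson correlation between test and standard feature vectors."""
    return float(np.corrcoef(test_feat, std_feat)[0, 1])

def emotion_score(label_correct, corr, base=60.0):
    """Toy scoring rule (an assumption of this example): pass/fail on the
    recognised emotion class, then scale the remainder by the correlation."""
    if not label_correct:
        return base * max(corr, 0.0)
    return base + (100.0 - base) * max(corr, 0.0)

# Made-up affective features: pitch, short-time energy, two formants
test_feat = np.array([110.0, 0.52, 700.0, 1200.0])
std_feat = np.array([120.0, 0.50, 720.0, 1180.0])
corr = feature_correlation(test_feat, std_feat)
score = emotion_score(True, corr)
```

Any monotone mapping from (classification correctness, correlation) to a score would fit the text; this one is only the simplest such choice.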

4.3) According to the pronunciation duration characteristic parameters of the standard speech and the test speech, obtain the speech-rate ratio of the standard speech to the test speech, and calculate the speech-rate score of the test speech from this ratio.

After the pronunciation duration characteristic parameter of the test speech has been extracted, the speech-rate ratio is computed by the following formula:

ratio = S_duration / T_duration  (formula 9)

where S_duration is the pronunciation duration of the standard speech and T_duration is the pronunciation duration of the test speech.

A speech rate that is too fast or too slow fails to meet the requirements of linguistic expression; therefore the speech rate of the test speech can be scored according to the speech-rate ratio and the degree to which the rate is too fast or too slow.
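A minimal sketch of formula 9's ratio and one possible penalty for deviating from it (the linear penalty is an assumption of this example, not the patent's scoring rule):

```python
def speech_rate_ratio(std_duration, test_duration):
    """Speech-rate ratio of formula 9: standard duration over test duration.
    ratio > 1 means the test speaker talks faster than the reference,
    ratio < 1 means slower."""
    return std_duration / test_duration

def speech_rate_score(ratio):
    """Toy mapping: full marks when the ratio is 1, linear penalty for
    deviation in either direction, floored at zero."""
    return max(0.0, 100.0 - 100.0 * abs(ratio - 1.0))

ratio = speech_rate_ratio(2.0, 2.5)   # reference 2.0 s, test 2.5 s
score = speech_rate_score(ratio)
```

Because the penalty is symmetric in |ratio − 1|, both too-fast and too-slow speech lose marks, matching the text's requirement.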

4.4) According to the stress position characteristic parameters of the test speech and of the standard speech, compare the stress position difference between the test speech and the standard speech, and calculate the stress score of the test speech from this difference.

When extracting the stress position characteristic parameters, the start frame position set and end frame position set of the stresses are obtained, and the stress distribution difference diff between the test speech and the standard speech is computed by the following formula:

diff = Σ_{i=1}^{n} { (left_std[i]/Len_std − left_test[i]/Len_test) + (right_std[i]/Len_std − right_test[i]/Len_test) }  (formula 10)

where Len_std is the effective speech frame length of the standard speech and Len_test is the effective speech frame length of the test speech; left_std[i] and right_std[i] are the start and end frame position sets of the standard speech, and left_test[i] and right_test[i] are the start and end frame position sets of the test speech.

The stress of the test speech is scored according to the magnitude of the stress position difference between the test speech and the standard speech.
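Formula 10 can be computed directly from the start/end frame position sets, as sketched below with hypothetical frame indices. Note that, as printed, the summands are signed, so positive and negative offsets can cancel; a practical system might take absolute values instead:

```python
def stress_difference(left_std, right_std, left_test, right_test,
                      len_std, len_test):
    """Stress distribution difference of formula 10: compare the
    normalised start/end frame positions of each stressed unit."""
    diff = 0.0
    for ls, rs, lt, rt in zip(left_std, right_std, left_test, right_test):
        diff += (ls / len_std - lt / len_test) + (rs / len_std - rt / len_test)
    return diff

# Hypothetical stress spans (frame indices): two stresses in a
# 100-frame reference and a 120-frame test utterance
diff = stress_difference([10, 50], [20, 60], [18, 66], [30, 78], 100, 120)
```

Here every test stress starts and ends 5% late in relative terms, so each of the four terms contributes −0.05.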

4.5) According to the voice-unit duration characteristic parameters of the test speech and of the standard speech, obtain the dPVI (Distinct Pairwise Variability Index) parameter of the test speech with the dPVI algorithm, and calculate the rhythm score of the test speech from the dPVI parameter.

After the voice-unit duration characteristic parameter of the test speech has been extracted, it is compared against the voice-unit duration characteristic parameter of the standard speech, and the resulting dPVI parameter serves as the basis for the system score. The dPVI parameter is computed as follows:

dPVI = 100 × ( Σ_{k=1}^{m−1} |d_{1k} − d_{2k}| + |d_{1t} − d_{2t}| ) / Len  (formula 11)

where d is a voice-unit duration from the sentence division (e.g. d_k is the duration of the k-th voice unit), m = min(S_snum, T_snum), S_snum is the number of voice units of the standard speech, T_snum is the number of voice units of the test speech, and Len is the duration of the standard speech.

The rhythm score of the test speech is calculated according to the magnitude of the dPVI parameter.
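A sketch of the dPVI computation of formula 11 on hypothetical per-unit durations (the sum over k = 1 … m−1 plus the final pair is written here as a single sum over the m matched units):

```python
def dpvi(d_std, d_test, len_std):
    """dPVI of formula 11: accumulated absolute difference between the
    per-unit durations of the standard and test utterances, normalised
    by the standard utterance's total duration and scaled by 100."""
    m = min(len(d_std), len(d_test))
    total = sum(abs(d_std[k] - d_test[k]) for k in range(m))
    return 100.0 * total / len_std

# Hypothetical per-syllable durations in seconds
d_std = [0.20, 0.30, 0.25]
d_test = [0.25, 0.25, 0.30]
value = dpvi(d_std, d_test, sum(d_std))
```

A small dPVI means the test speaker's unit-by-unit timing tracks the reference rhythm closely, so the rhythm score would decrease as dPVI grows.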

4.6) According to the pitch characteristic parameters of the test speech and of the standard speech, obtain the pitch difference between the standard speech and the test speech with the DTW (dynamic time warping) algorithm, and calculate the intonation score of the test speech from this pitch difference.

After the pitch characteristic parameter of the test speech has been extracted, the pitch contour can additionally be smoothed with a median filter to exclude unstable speech frames with abnormal pitch values. The DTW algorithm is used to contrast the pitch characteristic parameters of the test speech and the standard speech and to calculate the pitch difference parameter dist between them; the intonation score of the test speech is then computed by the following formula:

S_intonation = 100 / (1 + a × dist^b)  (formula 12)

where, through simulation experiments contrasting expert scoring data with system scoring data, the values a = 0.0005 and b = 2 were determined.
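The DTW distance and formula 12 can be sketched as follows; the pitch contours are made-up values, while a = 0.0005 and b = 2 are the values given in the text:

```python
def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two pitch contours."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def intonation_score(dist, a=0.0005, b=2):
    """Formula 12: S = 100 / (1 + a * dist**b)."""
    return 100.0 / (1.0 + a * dist ** b)

# Hypothetical per-frame pitch values (Hz) for reference and test speech
contour_std = [100, 120, 140, 130, 110]
contour_test = [100, 118, 142, 128, 112]
dist = dtw_distance(contour_std, contour_test)
score = intonation_score(dist)
```

DTW lets the two contours align even when the test speaker's intonation pattern is stretched or compressed in time, so dist reflects pitch shape rather than timing alone.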

4.7) Compute the weighted sum of the accuracy score, emotion score, speech-rate score, stress score, rhythm score and intonation score to obtain the overall score. Then, from these six scores and the overall score, combined with the mapping between each score and its grade evaluation, obtain the accuracy grade, emotion grade, speech-rate grade, stress grade, rhythm grade, intonation grade and overall grade of the test speech, and take these grade evaluations as the evaluation result of the test speech.

In the weighted summation of the accuracy, emotion, speech-rate, stress, rhythm and intonation scores, the weight of each index can take different values according to different demands, and a weight combination suited to the user's needs can be selected according to the user's own characteristics. Grades for each index and the overall grade are obtained from the mapping between scores and grade evaluations. For example, an accuracy score in the range 90–100 maps to grade A; 70–90 to grade B; 60–70 to grade C; and 0–60 to grade D. The mappings for the other scores are similar to this accuracy mapping and are not repeated here. Note that the above score-to-grade mapping is only an example; in practical applications, different thresholds can be set as needed, different score ranges can be mapped to different grades, and naturally more grades can be defined.
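The aggregation and grade mapping can be sketched as below, using equal weights and the example thresholds from the text (both are configurable in the method):

```python
def overall_score(scores, weights):
    """Weighted sum of the six per-index scores (weights should sum to 1)."""
    return sum(s * w for s, w in zip(scores, weights))

def grade(score):
    """Score-to-grade mapping using the example thresholds from the text."""
    if score >= 90:
        return "A"
    if score >= 70:
        return "B"
    if score >= 60:
        return "C"
    return "D"

# accuracy, emotion, speech rate, stress, rhythm, intonation (made-up scores)
scores = [92, 85, 80, 75, 70, 88]
total = overall_score(scores, [1 / 6] * 6)
grades = [grade(s) for s in scores] + [grade(total)]
```

Swapping in a different weight vector (e.g. weighting accuracy more heavily for beginners) changes only the `weights` argument, matching the text's per-user weight selection.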

S5. Feed the evaluation result back to the mobile client over the network, and display the evaluation result through the mobile client.

After the server obtains the evaluation result, it feeds the result back to the mobile client over the mobile Internet; the mobile client displays the evaluation result on the screen of the mobile device or announces it audibly.

In concrete implementations, after obtaining the evaluation result of the test speech, the server can also coach the user's spoken pronunciation according to the evaluation result, producing pronunciation guidance. The evaluation result can be matched against the pronunciation guidance stored in the database.

The pronunciation guidance is fed back to the mobile client over the network and displayed by the mobile client. The guidance points out the mistakes and shortcomings in the user's spoken pronunciation and proposes improvements; for example, if the user's speech rate is detected to be too fast and the rhythm chaotic, the user can be prompted to slow down a little and hold the sentence rhythm.

This embodiment of the invention is based on a C/S (client/server) architecture: a mobile client and a server are built; the mobile client collects the user's test speech signal and sends it to the server; the server evaluates the test speech and returns the speech evaluation result to the mobile client, which displays it. Users can conveniently access the server over the mobile Internet to obtain services and data; the corpus and evaluation methods can all be synchronized by the server, and the server provides speech analysis algorithms with better performance and better results.

Further, the online spoken pronunciation quality evaluation method also comprises:

S6. Store the evaluation result in a database, perform statistical analysis on the evaluation results, and obtain statistics.

In concrete implementations, when a user's test is complete, the user's profile, test speech and evaluation result can be stored in the database. The server performs statistical analysis on the evaluation results in the database (including each index score and the overall score) to obtain a learning-information analysis for a single user; it can also produce a group learning-information analysis for a specific user group, or network-wide learning statistics covering all users.

S7. Send the statistics to the web management terminal, and display the statistics through the web management terminal. The web management terminal receives from the server the statistics on the spoken pronunciation evaluations of mobile client users and presents them in visual form to a third party (such as an instructor).

This embodiment of the invention is based on a B/S (browser/server) architecture: a web management terminal and a server are built, and the spoken pronunciation quality evaluation statistics of mobile client users can be obtained in real time from the server's database through a web browser, providing a third party with the spoken pronunciation situation of mobile client users and making it convenient for the third party to formulate offline spoken-language coaching and improvement strategies.

Referring to Fig. 3, the C/S architecture diagram of an embodiment of the online spoken pronunciation quality evaluation system provided by the invention: the system shares the basic principles of the online spoken pronunciation quality evaluation method of the embodiment shown in Fig. 1, and details not elaborated in this embodiment can be found in the related description of the embodiment shown in Fig. 1.

The system comprises a mobile client 100 and a server 200 connected by a network.

The mobile client 100 comprises:

a voice collection unit 101, for collecting the test speech and sending it to the server 200 over the network.

The server 200 comprises:

a preprocessing unit 201, for preprocessing the received test speech;

a characteristic parameter extraction unit 202, for extracting speech characteristic parameters from the preprocessed test speech to obtain the characteristic parameters of the test speech;

a voice evaluation unit 203, for evaluating the test speech according to its characteristic parameters and the characteristic parameters of the standard speech to obtain an evaluation result, and feeding the evaluation result back to the mobile client 100 over the network.

The mobile client 100 also comprises:

a data display unit 102, for displaying the evaluation result.

Referring to Fig. 4, the B/S architecture diagram of the online spoken pronunciation quality evaluation system shown in Fig. 3.

The system also comprises a web management terminal 300, connected to the server 200 by a network. The server 200 also comprises a database 204 and a statistical analysis unit 205.

The database 204 stores the evaluation result.

The statistical analysis unit 205 performs statistical analysis on the evaluation results to obtain statistics, and sends the statistics to the web management terminal 300.

The web management terminal 300 displays the received statistics.

This embodiment of the invention is based on a C(B)/S architecture: a mobile client 100, a server 200 and a web management terminal 300 are built. The mobile client 100 collects the user's test speech signal and sends it to the server 200; the server 200 evaluates the test speech and returns the speech evaluation result to the mobile client 100, which displays it. Users can conveniently access the server 200 over the mobile Internet to obtain services and data; the corpus and evaluation methods can all be synchronized by the server 200, and the server 200 provides speech analysis algorithms with better performance and better results. The web management terminal 300 can also obtain the spoken pronunciation quality evaluation statistics of mobile client users in real time from the database of the server 200, providing a third party (such as an instructor) with the spoken pronunciation situation of mobile client users and making it convenient for the third party to formulate offline spoken-language coaching and improvement strategies.

The online spoken pronunciation quality evaluation method and system provided by the embodiments of the invention can be applied in spoken English learning to assess the quality of spoken English. They can also be applied to pronunciation quality evaluation for other languages, such as Japanese and French.

From the description of the above embodiments, those skilled in the art will clearly understand that the invention can be implemented by software plus the necessary common hardware, and certainly also by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memory, dedicated components and the like. In essence, the part of the technical solution of the invention that contributes over the prior art can be embodied in the form of a software product stored on a readable storage medium, such as a computer floppy disk, USB flash drive, portable hard drive, read-only memory (ROM), random access memory (RAM), magnetic disk or optical disc.

The above are only specific embodiments of the invention, but the protection scope of the invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall be covered by the protection scope of the invention. Therefore, the protection scope of the invention shall be determined by the protection scope of the claims.

Claims (10)

1. An online spoken pronunciation quality evaluation method, characterized by comprising:
receiving, over a network, a test speech collected by a mobile client;
preprocessing the received test speech;
extracting speech characteristic parameters from the preprocessed test speech to obtain characteristic parameters of the test speech;
evaluating the test speech according to the characteristic parameters of the test speech and characteristic parameters of a standard speech to obtain an evaluation result;
feeding the evaluation result back to the mobile client over the network, and displaying the evaluation result through the mobile client.
2. The online spoken pronunciation quality evaluation method of claim 1, characterized in that the method further comprises:
storing the evaluation result in a database, performing statistical analysis on evaluation results, and obtaining statistics;
sending the statistics to a web management terminal, and displaying the statistics through the web management terminal.
3. The online spoken pronunciation quality evaluation method of claim 1, characterized in that the method further comprises:
obtaining a standard speech;
preprocessing the standard speech;
extracting speech characteristic parameters from the preprocessed standard speech to obtain the characteristic parameters of the standard speech.
4. The online spoken pronunciation quality evaluation method of any one of claims 1 to 3, characterized in that the preprocessing comprises pre-emphasis, framing, windowing and endpoint detection.
5. The online spoken pronunciation quality evaluation method of any one of claims 1 to 3, characterized in that extracting speech characteristic parameters from the preprocessed test speech to obtain the characteristic parameters of the test speech comprises:
performing a discrete Fourier transform on the test speech to obtain its spectral coefficients, filtering the spectral coefficients with a sequence of triangular filters, taking the logarithm of the filtered data, and applying a discrete cosine transform to obtain the MFCC characteristic parameters of the test speech;
extracting the fundamental frequency, short-time energy and formant features of the test speech, and composing the fundamental frequency, short-time energy and formant features into the affective characteristic parameters of the test speech;
calculating the pronunciation duration of the test speech to obtain the pronunciation duration characteristic parameter of the test speech;
dividing the test speech into stress units and extracting the start frame position set and end frame position set of the stresses to obtain the stress position characteristic parameters of the test speech;
dividing the test speech into voice units and calculating the duration of each unit to obtain the voice-unit duration characteristic parameters of the test speech;
extracting the pitch of each frame of the test speech by the autocorrelation function method in the time domain to obtain the pitch characteristic parameters of the test speech.
6. The online spoken pronunciation quality evaluation method of claim 5, characterized in that evaluating the test speech according to the characteristic parameters of the test speech and the characteristic parameters of the standard speech to obtain an evaluation result comprises:
performing speech recognition on the test speech according to its MFCC characteristic parameters with a probabilistic neural network ensemble speech recognition model based on segment clustering to obtain a speech recognition result; computing the similarity between the MFCC characteristic parameters of the test speech and those of the standard speech to obtain an MFCC correlation coefficient; and calculating the accuracy score of the test speech from the speech recognition result and the MFCC correlation coefficient;
performing emotion recognition on the test speech according to its affective characteristic parameters with an SVM emotion model to obtain an emotion recognition result; computing the similarity between the affective characteristic parameters of the test speech and those of the standard speech to obtain an emotion correlation coefficient; and calculating the emotion score of the test speech from the emotion recognition result and the emotion correlation coefficient;
obtaining the speech-rate ratio of the standard speech to the test speech from their pronunciation duration characteristic parameters, and calculating the speech-rate score of the test speech from the speech-rate ratio;
comparing the stress position difference between the test speech and the standard speech according to their stress position characteristic parameters, and calculating the stress score of the test speech from the stress position difference;
obtaining the dPVI parameter of the test speech from the voice-unit duration characteristic parameters of the test speech and the standard speech with the dPVI algorithm, and calculating the rhythm score of the test speech from the dPVI parameter;
obtaining the pitch difference between the standard speech and the test speech from their pitch characteristic parameters with the DTW algorithm, and calculating the intonation score of the test speech from the pitch difference.
7. The online spoken pronunciation quality evaluation method of claim 6, characterized in that evaluating the test speech according to the characteristic parameters of the test speech and the characteristic parameters of the standard speech to obtain an evaluation result further comprises:
computing the weighted sum of the accuracy score, emotion score, speech-rate score, stress score, rhythm score and intonation score to obtain an overall score; obtaining the accuracy, emotion, speech-rate, stress, rhythm, intonation and overall grade evaluations of the test speech from these scores and the overall score in combination with the mapping between each score and its grade evaluation; and taking these grade evaluations as the evaluation result of the test speech.
8. The online spoken pronunciation quality evaluation method of claim 7, characterized in that the method further comprises:
coaching the user's spoken pronunciation according to the evaluation result to obtain pronunciation guidance;
feeding the pronunciation guidance back to the mobile client over the network, and displaying the pronunciation guidance through the mobile client.
9. An online spoken pronunciation quality evaluation system, characterized by comprising a mobile client and a server connected by a network;
the mobile client comprises:
a voice collection unit, for collecting the test speech and sending it to the server over the network;
the server comprises:
a preprocessing unit, for preprocessing the received test speech;
a characteristic parameter extraction unit, for extracting speech characteristic parameters from the preprocessed test speech to obtain the characteristic parameters of the test speech;
a voice evaluation unit, for evaluating the test speech according to its characteristic parameters and the characteristic parameters of the standard speech to obtain an evaluation result, and feeding the evaluation result back to the mobile client over the network;
the mobile client further comprises:
a data display unit, for displaying the evaluation result.
10. The online spoken pronunciation quality evaluation system of claim 9, characterized in that the system further comprises a web management terminal connected to the server by a network, and the server further comprises a database and a statistical analysis unit;
the database, for storing the evaluation result;
the statistical analysis unit, for performing statistical analysis on evaluation results to obtain statistics, and sending the statistics to the web management terminal;
the web management terminal, for displaying the received statistics.
CN201510102425.8A 2015-03-09 2015-03-09 A kind of online spoken language pronunciation quality evaluating method and system CN104732977B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510102425.8A CN104732977B (en) 2015-03-09 2015-03-09 A kind of online spoken language pronunciation quality evaluating method and system


Publications (2)

Publication Number Publication Date
CN104732977A true CN104732977A (en) 2015-06-24
CN104732977B CN104732977B (en) 2018-05-11



Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5867813A (en) * 1995-05-01 1999-02-02 Ascom Infrasys Ag. Method and apparatus for automatically and reproducibly rating the transmission quality of a speech transmission system
WO2005022786A1 (en) * 2003-09-03 2005-03-10 Huawei Technologies Co., Ltd. A method and apparatus for testing voice quality
CN101630448A (en) * 2008-07-15 2010-01-20 上海启态网络科技有限公司 Language learning client and system
CN102054375A (en) * 2009-11-09 2011-05-11 康俊义 Teaching main system for language ability
CN102800314A (en) * 2012-07-17 2012-11-28 广东外语外贸大学 English sentence recognizing and evaluating system with feedback guidance and method of system
CN103617799A (en) * 2013-11-28 2014-03-05 广东外语外贸大学 Method for detecting English statement pronunciation quality suitable for mobile device
CN103928023A (en) * 2014-04-29 2014-07-16 广东外语外贸大学 Voice scoring method and system
CN104050965A (en) * 2013-09-02 2014-09-17 广东外语外贸大学 English phonetic pronunciation quality evaluation system with emotion recognition function and method thereof
CN204117590U (en) * 2014-09-24 2015-01-21 广东外语外贸大学 Voice collecting denoising device and voice quality assessment system


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Cao Wenming et al.: "High-Dimensional Information Geometry and Speech Analysis", 31 March 2011, Science Press *
Zhao Li: "Speech Signal Processing (2nd Edition)", 30 June 2009, China Machine Press *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105205763A (en) * 2015-11-06 2015-12-30 陈国庆 Teaching method and apparatus based on new media modes
CN105513610A (en) * 2015-11-23 2016-04-20 南京工程学院 Voice analysis method and device
CN105488142A (en) * 2015-11-24 2016-04-13 科大讯飞股份有限公司 Student score information input method and system
CN105488142B (en) * 2015-11-24 2019-07-30 科大讯飞股份有限公司 Performance information input method and system
CN105608960A (en) * 2016-01-27 2016-05-25 广东外语外贸大学 Spoken language formative teaching method and system based on multi-parameter analysis
CN105741832A (en) * 2016-01-27 2016-07-06 广东外语外贸大学 Spoken language evaluation method based on deep learning and spoken language evaluation system
CN105741832B (en) * 2016-01-27 2020-01-07 广东外语外贸大学 Spoken language evaluation method and system based on deep learning
CN105825852A (en) * 2016-05-23 2016-08-03 渤海大学 Oral English reading test scoring method
CN106056989A (en) * 2016-06-23 2016-10-26 广东小天才科技有限公司 Language learning method and apparatus, and terminal device
CN106056989B (en) * 2016-06-23 2018-10-16 广东小天才科技有限公司 A kind of interactive learning methods and device, terminal device
CN106205635A (en) * 2016-07-13 2016-12-07 中南大学 Method of speech processing and system
CN106205634A (en) * 2016-07-14 2016-12-07 东北电力大学 A kind of spoken English in college level study and test system and method
CN106328168A (en) * 2016-08-30 2017-01-11 成都普创通信技术股份有限公司 Voice signal similarity detection method
CN106531182A (en) * 2016-12-16 2017-03-22 上海斐讯数据通信技术有限公司 Language learning system
CN106782609A (en) * 2016-12-20 2017-05-31 杨白宇 A kind of spoken comparison method
CN107067834A (en) * 2017-03-17 2017-08-18 麦片科技(深圳)有限公司 Point-of-reading system with oral evaluation function
US10499149B2 (en) 2017-05-05 2019-12-03 Boe Technology Group Co., Ltd. Microphone, vocal training apparatus comprising microphone and vocal analyzer, vocal training method, and non-transitory tangible computer-readable storage medium
WO2018201688A1 (en) * 2017-05-05 2018-11-08 Boe Technology Group Co., Ltd. Microphone, vocal training apparatus comprising microphone and vocal analyzer, vocal training method, and non-transitory tangible computer-readable storage medium
CN108806720A (en) * 2017-05-05 2018-11-13 京东方科技集团股份有限公司 Microphone, data processor, monitoring system and monitoring method
US20190124441A1 (en) * 2017-05-05 2019-04-25 Boe Technology Group Co., Ltd. Microphone, vocal training apparatus comprising microphone and vocal analyzer, vocal training method, and non-transitory tangible computer-readable storage medium
CN108806720B (en) * 2017-05-05 2019-12-06 京东方科技集团股份有限公司 Microphone, data processor, monitoring system and monitoring method
CN107221318A (en) * 2017-05-12 2017-09-29 广东外语外贸大学 Oral English Practice pronunciation methods of marking and system
CN107342079A (en) * 2017-07-05 2017-11-10 谌勋 A kind of acquisition system of the true voice based on internet
WO2019075828A1 (en) * 2017-10-20 2019-04-25 深圳市鹰硕音频科技有限公司 Voice evaluation method and apparatus
CN108322791A (en) * 2018-02-09 2018-07-24 咪咕数字传媒有限公司 A kind of speech evaluating method and device

Also Published As

Publication number Publication date
CN104732977B (en) 2018-05-11

Similar Documents

Publication Publication Date Title
Li et al. Automatic speaker age and gender recognition using acoustic and prosodic level information fusion
US7299188B2 (en) Method and apparatus for providing an interactive language tutor
Eyben et al. The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing
CN102124515B (en) Speaker characterization through speech analysis
Strik et al. Comparing different approaches for automatic pronunciation error detection
Wu et al. Automatic speech emotion recognition using modulation spectral features
Kinnunen Spectral features for automatic text-independent speaker recognition
Sroka et al. Human and machine consonant recognition
Fernandez A computational model for the automatic recognition of affect in speech
Orozco-Arroyave et al. New Spanish speech corpus database for the analysis of people suffering from Parkinson's disease.
Ramamohan et al. Sinusoidal model-based analysis and classification of stressed speech
Schuller et al. Timing levels in segment-based speech emotion recognition
CN101201980B (en) Remote Chinese language teaching system based on voice affection identification
Orozco-Arroyave et al. Automatic detection of Parkinson's disease in running speech spoken in three different languages
CN102231278B (en) Method and system for realizing automatic addition of punctuation marks in speech recognition
CN103928023B (en) A kind of speech assessment method and system
Pao et al. Mandarin emotional speech recognition based on SVM and NN
Ververidis et al. Fast sequential floating forward selection applied to emotional speech features estimated on DES and SUSAS data collections
Perrot et al. Voice disguise and automatic detection: review and perspectives
Kandali et al. Emotion recognition from Assamese speeches using MFCC features and GMM classifier
Alías et al. A review of physical and perceptual feature extraction techniques for speech, music and environmental sounds
Kane et al. Wavelet maxima dispersion for breathy to tense voice discrimination
Le et al. Investigation of spectral centroid features for cognitive load classification
CN104700843A (en) Method and device for identifying ages
Farrús et al. Using jitter and shimmer in speaker verification

Legal Events

Date Code Title Description
PB01 Publication
C06 Publication
SE01 Entry into force of request for substantive examination
C10 Entry into substantive examination
GR01 Patent grant