Summary of the invention
To this end, the technical problem to be solved by the present invention is how to provide a user, during spoken-language practice (for example, practising spoken English), with speech assessment results for the content the user is interested in.
According to a first aspect of the invention, a speech assessment method is provided for assessing the user's speech on content of interest, comprising the following steps:
Step S101, input speech acquisition: during the spoken-practice stage of the user's language learning, the user's speech input is obtained through the recording device of an electronic device;
Step S102, speech unit division: the recorded speech is divided into basic speech units to form a speech unit sequence;
Step S103, phonetic feature acquisition: the speech unit sequence is analysed to obtain its phonetic features;
Step S104, determination of the content to be evaluated: a feature calculation is performed on the extracted phonetic features, and if the result meets a predetermined condition, the speech units that meet the condition are taken as the content to be evaluated;
Step S105, speech comparison analysis: the phonetic features of the content to be evaluated are obtained and compared with the standard pronunciation predicted by a speech prediction model;
Step S106, comparison result generation: the speech comparison results are annotated on the text of the user's speech and provided to the user.
The basic speech units may be syllables, phonemes, etc. Dividing the speech yields the basic speech units of the recorded speech, which form the speech unit sequence.
The phonetic features of the speech unit sequence include prosodic features and syllable features.
The prosodic features include the boundary features of each basic speech unit, the pronunciation duration of each unit, the pause duration between adjacent basic speech units, the pronunciation duration of the whole speech unit sequence, etc.;
the syllable features include the pronunciation of each basic speech unit.
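A minimal sketch of how the prosodic and syllable features listed above could be derived, assuming each basic speech unit already carries start and end times (for example from a forced aligner, which this document does not specify). The names SpeechUnit, PhoneticFeatures and extract_phonetic_features are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SpeechUnit:
    pronunciation: str  # syllable feature: the unit's pronunciation, e.g. a phone string
    start: float        # unit boundary: start time in seconds
    end: float          # unit boundary: end time in seconds


@dataclass
class PhoneticFeatures:
    boundaries: List[Tuple[float, float]]  # (start, end) of each unit
    durations: List[float]                 # pronunciation duration of each unit
    pauses: List[float]                    # pause between each pair of adjacent units
    total_duration: float                  # duration of the whole unit sequence
    pronunciations: List[str]              # syllable features


def extract_phonetic_features(units: List[SpeechUnit]) -> PhoneticFeatures:
    """Derive prosodic and syllable features from time-aligned speech units."""
    boundaries = [(u.start, u.end) for u in units]
    durations = [u.end - u.start for u in units]
    # A pause is the gap between the end of one unit and the start of the next;
    # overlapping boundaries are clamped to zero.
    pauses = [max(0.0, nxt.start - cur.end) for cur, nxt in zip(units, units[1:])]
    total = units[-1].end - units[0].start if units else 0.0
    return PhoneticFeatures(boundaries, durations, pauses, total,
                            [u.pronunciation for u in units])
```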
In step S104, the phonetic features of the speech unit sequence may be evaluated by computing an optimal-scoring path, as follows:
the extracted phonetic features of the speech unit sequence are scored with a trained acoustic model to compute the optimal-scoring path;
if the optimal-scoring path contains the content to be evaluated that is being detected, the content to be evaluated is deemed detected.
The optimal-scoring path is computed as:
$$W^{*} = \arg\max_{W} P(X \mid W)\,P(W)$$
where
X is the phonetic feature vector of the speech unit sequence, W ranges over candidate word sequences, and W^{*} is the highest-scoring (optimal) word sequence;
the conditional probability P(X | W) is the acoustic model score, computed by the trained acoustic model;
the prior probability P(W) is the language model score, which serves as a penalty term weighing the competing acoustic hypotheses.
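The decision rule above can be illustrated with a small sketch that works in the log domain (sums of log-probabilities instead of products, to avoid numerical underflow). It assumes the candidate word sequences and their scores are already available; a real recogniser would search a decoding graph rather than enumerate hypotheses, and the function names are hypothetical.

```python
from typing import Dict, Tuple

WordSeq = Tuple[str, ...]


def best_word_sequence(hyps: Dict[WordSeq, Tuple[float, float]]) -> WordSeq:
    """Return W* = argmax_W P(X|W) P(W).

    `hyps` maps each candidate word sequence W to
    (log P(X|W) from the acoustic model, log P(W) from the language model).
    Maximising the sum of logs is equivalent to maximising the product.
    """
    return max(hyps, key=lambda w: hyps[w][0] + hyps[w][1])


def content_detected(hyps: Dict[WordSeq, Tuple[float, float]], target: str) -> bool:
    """The content to be evaluated is detected if the optimal path contains it."""
    return target in best_word_sequence(hyps)


if __name__ == "__main__":
    hyps = {
        ("at", "platform", "one"): (-120.0, -8.0),
        ("at", "plat", "form", "one"): (-118.0, -15.0),
        ("at", "pattern", "one"): (-125.0, -9.0),
    }
    print(best_word_sequence(hyps), content_detected(hyps, "platform"))
```

In the toy scores, the split hypothesis "plat form" has the best acoustic score but loses once the language model term is added, which is the penalising role P(W) plays in the formula.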
The phonetic features of the content to be evaluated may also include the phonetic features of its context, etc.
Step S105 further comprises:
dividing the recorded user speech into basic speech units;
extracting the corresponding phonetic features to be evaluated from the speech unit sequence;
loading the prediction model corresponding to each type of phonetic feature and predicting the corresponding standard pronunciation;
comparing the phonetic features of the user's speech with those of the standard pronunciation to obtain the corresponding evaluation result.
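A hedged sketch of the final comparison in step S105, assuming the prediction models have already produced standard values for duration-type features (in seconds). The feature names and the 0.1 s tolerance are illustrative assumptions, not values from this document.

```python
from typing import Dict


def compare_with_standard(user: Dict[str, float], standard: Dict[str, float],
                          tolerance: float = 0.1) -> Dict[str, str]:
    """Label each feature 'ok', 'too long' or 'too short' relative to the standard.

    `tolerance` is a hypothetical absolute threshold in seconds.
    Only features present in `standard` are evaluated.
    """
    result = {}
    for name, ref in standard.items():
        diff = user[name] - ref
        if abs(diff) <= tolerance:
            result[name] = "ok"
        elif diff > 0:
            result[name] = "too long"
        else:
            result[name] = "too short"
    return result
```

An absolute threshold is used here so that features whose standard value is zero (for example, no expected pause) do not cause a division by zero.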
According to a second aspect of the invention, a speech assessment device is provided for assessing the user's speech on content of interest. The device comprises an input speech acquisition module, an information storage module, a speech unit division module, a phonetic feature acquisition module, a to-be-evaluated content determination module, a speech comparison analysis module, an evaluation module and a comparison result generation module, wherein:
the input speech acquisition module is configured to obtain the user's speech input and store the recorded user speech in the information storage module;
the speech unit division module is configured to divide the recorded speech into basic speech units to obtain the speech unit sequence of the recorded speech;
the phonetic feature acquisition module is configured to perform feature extraction on the speech unit sequence to obtain its phonetic features;
the to-be-evaluated content determination module is configured to perform a feature calculation on the extracted phonetic features and, if the result meets a predetermined condition, take the qualifying speech units as the content to be evaluated;
the speech comparison analysis module is configured to obtain the phonetic features of the content to be evaluated and compare them with the standard pronunciation predicted by the speech prediction model;
the comparison result generation module is configured to annotate the speech assessment results on the text of the user's speech and provide them to the user.
The basic speech units may be syllables, phonemes, etc. Dividing the speech yields the basic speech units of the recorded speech and the speech unit sequence.
The phonetic features of the speech unit sequence include prosodic features and syllable features.
The prosodic features include the boundary features of each basic speech unit, the pronunciation duration of each unit, the pause duration between adjacent basic speech units and the pronunciation duration of the whole speech unit sequence;
the syllable features include the pronunciation of each basic speech unit.
The to-be-evaluated content determination module may evaluate the phonetic features of the speech unit sequence by computing an optimal-scoring path, comprising:
scoring the extracted phonetic features of the speech unit sequence with a trained acoustic model to compute the optimal-scoring path;
if the optimal-scoring path contains the content to be evaluated that is being detected, deeming the content to be evaluated detected.
The optimal-scoring path is computed as:
$$W^{*} = \arg\max_{W} P(X \mid W)\,P(W)$$
where
X is the phonetic feature vector of the speech unit sequence, W ranges over candidate word sequences, and W^{*} is the highest-scoring (optimal) word sequence;
the conditional probability P(X | W) is the acoustic model score, computed by the trained acoustic model;
the prior probability P(W) is the language model score, which serves as a penalty term weighing the competing acoustic hypotheses.
The phonetic features of the content to be evaluated may also include the phonetic features of its context.
For the speech comparison analysis module, speech assessment using the speech prediction model comprises:
dividing the recorded user speech into basic speech units;
extracting the corresponding phonetic features to be evaluated from the speech unit sequence;
loading the prediction model corresponding to each type of phonetic feature and predicting the corresponding standard pronunciation;
comparing the phonetic features of the user's speech with those of the standard pronunciation to obtain the corresponding evaluation result.
The speech assessment device further comprises a display module configured to display to the user the text of the user's speech annotated with the speech assessment results.
According to a third aspect of the invention, a computer-readable storage medium is provided on which a computer program is stored; when executed by a processor, the program implements the steps of the method described above.
With the speech assessment method and device of the invention, when the user reads a whole sentence or an entire passage aloud, only the part of the content the user is interested in is analysed and given an evaluation result. This sharpens the user's focus while reducing the amount of data the system has to analyse, thereby saving system resources.
Detailed description of the embodiments
Before the exemplary embodiments are discussed in greater detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations as sequential processing, many of the operations may be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be rearranged. A process may be terminated when its operations are completed, but it may also have additional steps not included in the drawings.
The "speech assessment device" referred to in this context is a "computer device", i.e. an intelligent electronic device that performs predetermined processing such as numerical and/or logical computation by running preset programs or instructions. It may include a processor and a memory, with the processor executing instructions pre-stored in the memory to perform the predetermined processing; alternatively, the predetermined processing may be performed by hardware such as an ASIC, FPGA or DSP, or by a combination of the two.
The computer device includes user equipment and/or network equipment. The user equipment includes, but is not limited to, computers, smartphones, PDAs, etc.; the network equipment includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of computers or network servers based on cloud computing, where cloud computing is a form of distributed computing in which a loosely coupled set of computers acts as one super virtual computer. The computer device may implement the present invention on its own, or it may access a network and implement the present invention by interacting with other computer devices in the network. The network in which the computer device resides includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, VPNs, etc.
Those skilled in the art will understand that the "speech assessment device" described in the present invention may be user equipment alone, with the user equipment performing the corresponding operations; it may also be formed by integrating user equipment with network equipment or a server, with the user equipment and the network equipment cooperating to perform the corresponding operations.
It should be noted that the above user equipment, network equipment, networks, etc. are only examples. Other existing or future computer devices or networks, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Those skilled in the art will understand that the present invention can be applied to both mobile and non-mobile terminals; for example, when the user uses a mobile phone or a PC, the method or device of the present invention can be used to provide and present the results.
The specific structural and functional details disclosed herein are merely representative and serve to describe exemplary embodiments of the present invention. The present invention may, however, be embodied in many alternative forms and should not be construed as limited to the embodiments set forth herein.
It should further be noted that, in some alternative implementations, the functions or actions mentioned may occur in an order different from that indicated in the drawings. For example, depending on the functions or actions involved, two figures shown in succession may in fact be executed substantially simultaneously, or sometimes in the reverse order.
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 shows the flowchart of the speech assessment method of the invention. The method is used to assess the user's speech on content of interest.
First, in step S101, input speech is acquired: during the spoken-practice stage of the user's language learning, the user's speech input is obtained through the recording device of an electronic device.
For example, after learning new course content, such as a new word or phrase, the user may enter a pronunciation practice stage. In this stage, besides having the user practise the new content on its own by repeating after a model, a typical teaching process also lets the user experience how the content is used in an authentic context, i.e. it provides a sentence or short passage containing the newly learned content for the user to read aloud. Suppose, for example, that the new content the user has just learned is the word "platform". After the user has learned the meaning and pronunciation of the word, the teaching software further provides the example sentence "The train now standing at platform 1 is for Leeds", which contains the word, for the user to practise reading aloud. At this point, the user is mainly concerned with whether his or her pronunciation of the newly learned word in the example sentence is standard and fluent, so as to gauge how well the content has been mastered, and pays less attention to whether the other words in the sentence are pronounced accurately. Therefore, in order to evaluate the user's pronunciation of "platform" in the example sentence, the recording device of the teaching equipment is started before the user begins reading, putting it into the recording state, and the user's speech is recorded and saved while the user reads aloud.
In step S102, speech unit division: the recorded speech is divided into basic speech units to form a basic speech unit sequence.
The basic speech units may be syllables, phonemes, etc. Dividing the speech yields the basic speech units of the recorded speech and the speech unit sequence.
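The division into basic speech units is often obtained by collapsing a frame-level alignment (one label per analysis frame) into contiguous segments. A minimal sketch, assuming such per-frame labels are available, a 10 ms frame shift by default and "sil" as the silence label; all of these are assumptions for illustration, and frames_to_units is a hypothetical name.

```python
from itertools import groupby
from typing import List, Tuple


def frames_to_units(frame_labels: List[str], frame_shift: float = 0.01,
                    silence: str = "sil") -> List[Tuple[str, float, float]]:
    """Collapse per-frame labels into (unit, start_s, end_s) segments, dropping silence."""
    units = []
    t = 0  # index of the first frame of the current run
    for label, run in groupby(frame_labels):
        n = len(list(run))  # number of consecutive frames with this label
        if label != silence:
            units.append((label, t * frame_shift, (t + n) * frame_shift))
        t += n
    return units
```

Because this run-length approach merges consecutive identical labels, two adjacent units with the same label would come out as one segment; an aligner that emits explicit unit boundaries avoids that.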
Different speech recognition systems decode the speech signal in different ways: they may rely on different acoustic features, such as acoustic models based on MFCC (Mel-Frequency Cepstral Coefficients) features or on PLP (Perceptual Linear Prediction) features; they may use different acoustic models, such as HMM-GMM (Hidden Markov Model-Gaussian Mixture Model) models or neural-network acoustic models based on DBN (Dynamic Bayesian Network); or they may use different decoding strategies, such as Viterbi search or A* search.
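Of the decoding strategies mentioned, Viterbi search is the simplest to illustrate. The sketch below is a generic textbook Viterbi decoder over HMM states in the log domain, not the decoder of any particular system; the per-frame emission scores are assumed to come from whichever acoustic model (GMM or neural network) is used, and the toy numbers are invented for illustration.

```python
import math
from typing import List, Sequence


def viterbi(log_init: Sequence[float],
            log_trans: Sequence[Sequence[float]],
            log_emit: Sequence[Sequence[float]]) -> List[int]:
    """Return the most likely HMM state sequence.

    log_init[s]     -- log P(first state = s)
    log_trans[r][s] -- log P(next state = s | current state = r)
    log_emit[t][s]  -- log P(frame t | state s), supplied by the acoustic model
    """
    n = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n)]
    backptrs = []
    for frame in log_emit[1:]:
        # Best predecessor for each state, then extend the path scores.
        prev = [max(range(n), key=lambda r: score[r] + log_trans[r][s]) for s in range(n)]
        score = [score[prev[s]] + log_trans[prev[s]][s] + frame[s] for s in range(n)]
        backptrs.append(prev)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for prev in reversed(backptrs):
        state = prev[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    LOG = math.log
    # Two states that prefer to persist; frames first favour state 0, then state 1.
    print(viterbi([LOG(0.5), LOG(0.5)],
                  [[LOG(0.9), LOG(0.1)], [LOG(0.1), LOG(0.9)]],
                  [[LOG(0.9), LOG(0.1)], [LOG(0.8), LOG(0.2)],
                   [LOG(0.2), LOG(0.8)], [LOG(0.1), LOG(0.9)]]))
```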
Step S103, phonetic feature acquisition: the speech unit sequence is analysed to obtain its phonetic features.
The phonetic features include prosodic features and syllable features. The prosodic features include the boundary features of each basic speech unit, the pronunciation duration of each unit, the pause duration between adjacent basic speech units and the pronunciation duration of the whole speech unit sequence. The syllable features include the pronunciation of each basic speech unit.
Step S104, determination of the content to be evaluated: a feature calculation is performed on the extracted phonetic features, and if the result meets a predetermined condition, the qualifying speech units are taken as the content to be evaluated.
The phonetic features may be evaluated by computing an optimal-scoring path: the extracted phonetic features are scored with a trained acoustic model to compute the optimal-scoring path, and if that path contains the content to be evaluated that is being detected, the content to be evaluated is deemed detected. The optimal-scoring path is computed as:
$$W^{*} = \arg\max_{W} P(X \mid W)\,P(W)$$
where X is the phonetic feature vector of the speech unit sequence, W ranges over candidate word sequences, and W^{*} is the highest-scoring (optimal) word sequence; the conditional probability P(X | W) is the acoustic model score, computed by the trained acoustic model; and the prior probability P(W) is the language model score, which serves as a penalty term weighing the competing acoustic hypotheses.
For example, when the user reads the example sentence "The train now standing at platform 1 is for Leeds" aloud, the optimal-scoring path computation gives "platform" the highest score, so it is determined to belong to the optimal word sequence, and "platform" is therefore taken as the content to be evaluated.
Step S105, speech comparison analysis: the phonetic features of the content to be evaluated are obtained and compared with the standard pronunciation predicted by the speech prediction model.
In this step, the phonetic features of the content to be evaluated are obtained, for example the phonetic features of "platform". These features are compared with the standard pronunciation predicted by the speech prediction model, and the user is given an evaluation result for the content to be evaluated.
To further assess how fluently the user reads the content to be evaluated, the phonetic features may also include the phonetic features of its context. For example, when the pronunciation of "platform" is evaluated, its phonetic features include not only those of the word "platform" itself but also those of its context, i.e. of "at" and "1". Comparing factors such as pronunciation duration and pause duration then yields an evaluation result on reading fluency.
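A sketch of gathering the context-aware features described above, assuming word-level alignments of the form (word, start, end) in seconds. Only the immediate neighbours are used as context, matching the "at" / "1" example; the function name and feature keys are hypothetical.

```python
from typing import Dict, List, Tuple

AlignedWord = Tuple[str, float, float]  # (word, start_s, end_s)


def context_features(words: List[AlignedWord], target: str) -> Dict[str, float]:
    """Duration of the target word plus pause/duration features of its neighbours."""
    labels = [w for w, _, _ in words]
    i = labels.index(target)  # raises ValueError if the target word was not read
    _, start, end = words[i]
    feats = {"duration": end - start}
    if i > 0:  # preceding context word, e.g. "at"
        _, p_start, p_end = words[i - 1]
        feats["prev_duration"] = p_end - p_start
        feats["pause_before"] = max(0.0, start - p_end)
    if i + 1 < len(words):  # following context word, e.g. "1"
        _, n_start, n_end = words[i + 1]
        feats["next_duration"] = n_end - n_start
        feats["pause_after"] = max(0.0, n_start - end)
    return feats
```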
Speech assessment with the speech prediction model can use existing speech evaluation techniques: the recorded user speech is divided into basic speech units, the corresponding phonetic features to be evaluated are extracted from the speech unit sequence, the prediction model corresponding to each type of phonetic feature is loaded to predict the corresponding standard pronunciation, and the phonetic features of the user's speech are then compared with those of the standard pronunciation to obtain the corresponding evaluation result.
Step S106, comparison result generation: the speech comparison results are annotated on the text of the user's speech and provided to the user.
In this step, the evaluation results obtained in step S105 by comparison with the standard pronunciation predicted by the speech prediction model are annotated visually on the speech text and displayed to the user. From the displayed evaluation results, the user can see whether the newly learned content was pronounced accurately and fluently within the whole passage.
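One simple way to attach the evaluation results to the read text is an inline marker after each evaluated word. The bracket notation below is a hypothetical plain-text stand-in for the visual marking (for example colour or underlining) that a real display would use, and annotate_text is an illustrative name.

```python
from typing import Dict, List


def annotate_text(words: List[str], results: Dict[str, str]) -> str:
    """Append each evaluated word's result in brackets; other words stay unmarked."""
    return " ".join(f"{w}[{results[w]}]" if w in results else w for w in words)


if __name__ == "__main__":
    sentence = "The train now standing at platform 1 is for Leeds".split()
    print(annotate_text(sentence, {"platform": "pause before: too long"}))
```

Keying results by the word string marks every occurrence of a repeated word; a fuller version would key results by word position.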
Fig. 2 shows a schematic diagram of a speech assessment device according to an embodiment of the present invention. The speech assessment device is used to implement the speech assessment method of the invention and comprises an input speech acquisition module 1, an information storage module 2, a speech unit division module 3, a phonetic feature acquisition module 4, a to-be-evaluated content determination module 5, a speech comparison analysis module 6, a comparison result generation module 7, a display module 8 and a speech prediction model 9.
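The data flow between the numbered modules can be sketched as a small pipeline in which each module is an injected callable. This is an illustrative wiring under stated assumptions, not the device itself: the class, field names and signatures are hypothetical, the speech prediction model 9 is assumed to live inside the comparison callable, and module 2 is reduced to an in-memory list.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class SpeechAssessmentDevice:
    """Hypothetical wiring of modules 1-8 into the S101-S106 flow."""
    acquire: Callable[[], Any]                          # module 1: returns recorded speech
    divide_units: Callable[[Any], List[Any]]            # module 3: speech unit sequence
    extract_features: Callable[[List[Any]], Dict]       # module 4: phonetic features
    select_content: Callable[[Dict], List[str]]         # module 5: content to be evaluated
    compare: Callable[[List[str], Dict], Dict]          # module 6 (uses prediction model 9)
    annotate: Callable[[Dict], str]                     # module 7: annotated text
    display: Callable[[str], None]                      # module 8
    storage: List[Any] = field(default_factory=list)    # module 2: stored recordings

    def run(self) -> str:
        speech = self.acquire()
        self.storage.append(speech)
        units = self.divide_units(speech)
        features = self.extract_features(units)
        content = self.select_content(features)
        results = self.compare(content, features)
        text = self.annotate(results)
        self.display(text)
        return text
```

Injecting each stage as a callable keeps the sketch independent of any particular recogniser or prediction model, so individual modules can be swapped or stubbed.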
In the spoken-practice stage of language learning, the user's speech input is obtained by the input speech acquisition module 1 of the speech assessment device, and the recorded speech is stored in the information storage module 2.
For example, after learning new course content, such as a new word or phrase, the user may enter a pronunciation practice stage. In this stage, besides having the user practise the new content on its own by repeating after a model, a typical teaching process also lets the user experience how the content is used in an authentic context, i.e. it provides a sentence or short passage containing the newly learned content for the user to read aloud. Suppose, for example, that the new content the user has just learned is the word "platform". After the user has learned the meaning and pronunciation of the word, the teaching software further provides the example sentence "The train now standing at platform 1 is for Leeds", which contains the word, for the user to practise reading aloud. At this point, the user is mainly concerned with whether his or her pronunciation of the newly learned word in the example sentence is standard and fluent, so as to gauge how well the content has been mastered, and pays less attention to whether the other words in the sentence are pronounced accurately. Therefore, in order to evaluate the user's pronunciation of "platform" in the example sentence, the input speech acquisition module 1 of the teaching equipment is started before the user begins reading, putting it into the recording state, and the user's speech is recorded and saved while the user reads aloud.
The speech unit division module 3 divides the speech recorded by the user into basic speech units.
The basic speech units may be syllables, phonemes, etc. Dividing the speech yields the basic speech units of the recorded speech and the speech unit sequence.
Different speech recognition systems decode the speech signal in different ways: they may rely on different acoustic features, such as acoustic models based on MFCC (Mel-Frequency Cepstral Coefficients) features or on PLP (Perceptual Linear Prediction) features; they may use different acoustic models, such as HMM-GMM (Hidden Markov Model-Gaussian Mixture Model) models or neural-network acoustic models based on DBN (Dynamic Bayesian Network); or they may use different decoding strategies, such as Viterbi search or A* search.
The phonetic feature acquisition module 4 analyses the speech unit sequence to obtain its phonetic features.
The phonetic features include prosodic features and syllable features. The prosodic features include the boundary features of each basic speech unit, the pronunciation duration of each unit, the pause duration between adjacent basic speech units and the pronunciation duration of the whole speech unit sequence. The syllable features include the pronunciation of each basic speech unit.
The to-be-evaluated content determination module 5 performs a feature calculation on the extracted phonetic features and, if the result meets a predetermined condition, takes the qualifying speech units as the content to be evaluated.
The phonetic features may be evaluated by computing an optimal-scoring path: the extracted phonetic features are scored with a trained acoustic model to compute the optimal-scoring path, and if that path contains the content to be evaluated that is being detected, the content to be evaluated is deemed detected. The optimal-scoring path is computed as:
$$W^{*} = \arg\max_{W} P(X \mid W)\,P(W)$$
where X is the phonetic feature vector of the speech unit sequence, W ranges over candidate word sequences, and W^{*} is the highest-scoring (optimal) word sequence; the conditional probability P(X | W) is the acoustic model score, computed by the trained acoustic model; and the prior probability P(W) is the language model score, which serves as a penalty term weighing the competing acoustic hypotheses.
For example, when the user reads the example sentence "The train now standing at platform 1 is for Leeds" aloud, the optimal-scoring path computation gives "platform" the highest score, so it is determined to belong to the optimal word sequence, and "platform" is therefore taken as the content to be evaluated.
The speech comparison analysis module 6 obtains the phonetic features of the content to be evaluated and compares them with the standard pronunciation predicted by the speech prediction model 9.
For example, the speech comparison analysis module 6 obtains the phonetic features of "platform", compares them with the standard pronunciation predicted by the speech prediction model 9, and gives the user an evaluation result for the content to be evaluated.
To further assess how fluently the user reads the content to be evaluated, the phonetic features may also include the phonetic features of its context. For example, when the pronunciation of "platform" is evaluated, its phonetic features include not only those of the word "platform" itself but also those of its context, i.e. of "at" and "1". Comparing factors such as pronunciation duration and pause duration then yields an evaluation result on reading fluency.
Speech assessment with the speech prediction model can use existing speech evaluation techniques: the recorded user speech is divided into basic speech units, the corresponding phonetic features to be evaluated are extracted from the speech unit sequence, the prediction model corresponding to each type of phonetic feature is loaded to predict the corresponding standard pronunciation, and the phonetic features of the user's speech are then compared with those of the standard pronunciation to obtain the corresponding evaluation result.
The comparison result generation module 7 annotates the speech comparison results on the text of the user's speech and provides them to the user.
To annotate the text the user has read, the comparison result generation module 7 takes the speech assessment results given by the speech comparison analysis module 6, annotates them visually on that text, and displays them to the user through the display module 8. From the displayed evaluation results, the user can see whether the newly learned content was pronounced accurately and fluently within the whole passage.
Those of ordinary skill in the art will appreciate that all or part of the steps of the methods in the above embodiments can be completed by a computer program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and executed by a processor. The computer-readable storage medium may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, etc.
The preferred embodiments of the present invention have been described above in order to make the spirit of the invention clearer and easier to understand; they are not intended to limit the present invention. Any modification, replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection defined by the appended claims.