CN109697975A - Speech assessment method and device

Info

Publication number: CN109697975A
Application number: CN201710981866.9A
Authority: CN
Inventors: 宾晓皎; 李明; 蔡泽鑫
Original assignee: Shenzhen Yingshuo Audio Technology Co Ltd
Current assignee: Shenzhen Yingshuo Intelligent Technology Co.,Ltd.
Priority date: 2017-10-20
Filing date: 2017-10-20
Publication date: 2019-04-30
Anticipated expiration: 2037-10-20
Also published as: CN109697975B; WO2019075827A1

Abstract

The present invention provides a speech assessment method and device. The method comprises: step S101, input voice acquisition, in which the user's voice input is obtained through a recording device during the spoken-language practice stage of the user's language learning; step S102, voice unit division, in which the recorded voice is divided into basic voice units to form a speech unit sequence; step S103, musical note feature acquisition, in which the speech unit sequence is analyzed to obtain its musical note features; step S104, determination of the content to be evaluated, in which a feature calculation is performed on the extracted musical note features and, if the calculated result meets a predetermined condition, the qualifying voice units are taken as the content to be evaluated; step S105, speech comparison analysis, in which the musical note features of the content to be evaluated are obtained and compared with the standard pronunciation predicted by a voice prediction model; and step S106, comparison result generation, in which the speech comparison result is annotated on the text of the user's speech and provided to the user.

Description

Speech assessment method and device

Technical field

The present invention relates to the field of multimedia teaching technology, and in particular to a speech assessment method and device for spoken-language learning in multimedia teaching.

Background art

As a medium of communication, language plays a very important role in life and work. Whether during a student's school years or during a person's working years, spoken-language learning is content to which people attach great importance. As online teaching has become increasingly widespread, it has become popular with users because it is not constrained by the time and place of lessons. Many users are therefore now more willing to use their spare time to learn languages over the network. When a user learns a new word or phrase and practises speaking it, the user may, in addition to practising the pronunciation of the word or phrase on its own, also practise speaking a sentence that contains the word or phrase.

To solve the above problems, it has been proposed to evaluate a student's speech according to a voice prediction model. CN101197084A discloses an automatic spoken English evaluation and learning system, characterized in that the system includes a part for detecting spoken pronunciation, which comprises the following steps: (1) establishing a standard-speaker corpus: 1) finding standard English speakers; 2) designing a first recording text according to the requirements of spoken English learning and the principle of phoneme balance; 3) having the standard speakers record against the recording text; (2) collecting an oral evaluation corpus: in a simulated environment of English-learning software use, designing a second recording text according to English-learning requirements, finding ordinary speakers, and recording their spoken pronunciation; (3) annotating the oral evaluation corpus: experts annotate in detail whether the pronunciation of each phoneme in each word is correct; (4) establishing a standard-pronunciation acoustic model: training an acoustic model of standard pronunciation on the recordings in the standard-speaker corpus and their associated text; (5) calculating the error detection parameter of the speech: 1) extracting the MFCC parameters of the speech; 2) based on the standard acoustic model, and on the ordinary speakers' recordings in the evaluation corpus and the phoneme sequences corresponding to their text, automatically segmenting the ordinary speakers' speech data into segments in units of phonemes, while calculating, based on the standard model, a first likelihood value of each segment as that phoneme; 3) recognizing each segment of the ordinary speakers' speech with the standard acoustic model, while calculating, based on the standard acoustic model, a second likelihood value of the segment as the recognized phoneme; 4) dividing the first likelihood value of the segment by the second likelihood value to obtain the likelihood ratio of the segment, which serves as the error detection parameter of that speech segment; (6) establishing an error detection mapping model from the error detection parameter to the experts' pronunciation-error annotations: on a batch of evaluation speech, associating the error detection parameter and formant sequence of each segment with the experts' detailed annotations, obtaining by statistical methods the correspondence between these parameters and the experts' detailed annotations, and saving these relationships as the error detection mapping model from the error detection parameter to the experts' pronunciation-error labels.

CN101650886A discloses a method for automatically detecting reading errors of language learners, characterized by comprising the following steps: 1) front-end processing: pre-processing the input speech and performing feature extraction, the extracted features being MFCC feature vectors; 2) constructing a reduced search space: taking the content the user is to read aloud as a reference answer, and constructing a reduced search space from the reference answer, a pronunciation dictionary, a multi-pronunciation model and an acoustic model; 3) constructing a read-aloud language model: constructing the user's read-aloud language model from the reference answer, the model describing the content the user may read in context when reading the reference sentence aloud, together with its probability information; 4) searching: within the search space, searching according to the acoustic model, the read-aloud language model and the multi-pronunciation model for the path that best matches the input feature vector stream, taking it as the content the user actually read aloud, and forming a recognition result sequence; 5) alignment: aligning the reference answer with the recognition result to obtain detection results for the user's extra, skipped and misread words.

In the prior art, pronunciation practice mostly relies on recording: after the user reads along, the recording is played back to the user, who judges for themselves whether the pronunciation is accurate. Alternatively, a teacher gives online instruction and offers guidance and suggestions on the user's pronunciation. These approaches only let the user perceive their own pronunciation subjectively and cannot provide an effective, accurate evaluation result. In recent years, speech assessment methods for online teaching have evaluated the user's pronunciation by comparing its features with those of standard pronunciation. For example, the input speech is pre-processed and features are extracted; the content the user is to read aloud is taken as a reference answer, and a reduced search space is constructed from the reference answer, a pronunciation dictionary, a multi-pronunciation model and an acoustic model; within the search space, the path that best matches the input feature vector stream is found according to the acoustic model, the read-aloud language model and the multi-pronunciation model, taken as the content the user actually read aloud, and formed into a recognition result sequence; the reference answer is then aligned with the recognition result to obtain detection results for the user's extra, skipped and misread words.

Although the above evaluation methods can provide a pronunciation evaluation result for the user's speech, the result is usually an analysis of everything the user read aloud. Sometimes, however, the user is more concerned with whether the newly learned word or phrase is pronounced accurately and fluently within the whole sentence or paragraph, and the pronunciation of the other parts is not the user's focus.

Therefore, it is necessary to provide a speech assessment method which, when the user reads aloud a whole sentence or passage, analyzes only the part of the content the user is concerned with and provides the corresponding evaluation result, thereby focusing on what the user cares about while reducing the amount of data the system must analyze and saving system resources.

Summary of the invention

To this end, the technical problem to be solved by the present invention is how to provide the user, during spoken-language practice such as spoken English practice, with a speech assessment result for the content the user is concerned with.

According to a first aspect of the invention, a speech assessment method is provided for performing speech assessment on the speech of the content a user is concerned with, comprising the following steps:

Step S101, input voice acquisition: during the spoken-language practice stage of the user's language learning, obtaining the user's voice input through the recording device of an electronic device;

Step S102, voice unit division: dividing the recorded voice into basic voice units to form a speech unit sequence;

Step S103, musical note feature acquisition: analyzing the speech unit sequence to obtain the musical note features of the speech unit sequence;

Step S104, determination of the content to be evaluated: performing a feature calculation on the extracted musical note features and, if the calculated result meets a predetermined condition, taking the voice units that meet the predetermined condition as the content to be evaluated;

Step S105, speech comparison analysis: obtaining the musical note features of the content to be evaluated, and comparing these musical note features with the standard pronunciation predicted by a voice prediction model;

Step S106, comparison result generation: annotating the speech comparison result on the text of the user's speech and providing it to the user.

The basic voice unit may be a syllable, a phoneme, or the like. By dividing the voice, the basic voice units of the recorded voice are obtained and formed into a speech unit sequence.

The musical note features of the speech unit sequence include prosodic features and syllable features.

The prosodic features include the boundary features and pronunciation duration of each basic voice unit, the pause time between adjacent basic voice units, the pronunciation duration of the entire speech unit sequence, and the like.

The syllable features include the pronunciation of each basic voice unit.
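As an illustration only, the prosodic quantities listed here can be derived from unit boundaries roughly as follows. This is a minimal sketch: the `SpeechUnit` class, the `prosodic_features` function and the use of seconds as the time unit are assumptions introduced for the example and are not specified by the invention.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpeechUnit:
    """One basic voice unit (e.g. a syllable or phoneme); times in seconds (assumed)."""
    label: str
    start: float
    end: float


def prosodic_features(units: List[SpeechUnit]) -> dict:
    """Derive boundaries, durations, pauses and total duration from unit boundaries."""
    durations = [u.end - u.start for u in units]
    # Pause between adjacent units: gap from one unit's end to the next unit's start.
    pauses = [max(0.0, nxt.start - cur.end) for cur, nxt in zip(units, units[1:])]
    total = units[-1].end - units[0].start if units else 0.0
    return {
        "boundaries": [(u.start, u.end) for u in units],
        "durations": durations,
        "pauses": pauses,
        "total_duration": total,
    }
```

Keeping the boundaries as the only stored quantity and deriving the rest avoids inconsistencies between durations and pauses when a boundary is later corrected.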

In step S104, the calculation on the musical note features of the speech unit sequence may use an optimal score path calculation, as follows:

The extracted musical note features of the speech unit sequence are used to calculate the optimal score path with a trained acoustic model;

If the optimal score path contains the content to be evaluated that is being detected, it is determined that the content to be evaluated has been detected.

The optimal score path is calculated as:

$$\hat{W} = \arg\max_{W} P(W \mid X) = \arg\max_{W} \frac{P(X \mid W)\,P(W)}{P(X)} = \arg\max_{W} P(X \mid W)\,P(W)$$

where

X denotes the musical note feature vector of the speech unit sequence, W denotes a word sequence, and $\hat{W}$ is the optimal word sequence, that is, the one with the maximum score;

the conditional probability P(X | W) is the acoustic model score, calculated by the trained acoustic model;

the prior probability P(W) is the language model score, which serves as a penalty added to the different acoustic models.
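The selection rule described here, which combines an acoustic model score with a language model score and keeps the highest-scoring word sequence, can be sketched as follows. This is a hedged illustration: the function names, the use of log-probabilities and the explicit table of candidate sequences are assumptions for the example; a real decoder would search a lattice rather than enumerate candidates.

```python
import math
from typing import Dict, Tuple

WordSeq = Tuple[str, ...]


def best_word_sequence(
    acoustic_logp: Dict[WordSeq, float],
    lm_logp: Dict[WordSeq, float],
) -> WordSeq:
    """Return the candidate W that maximises log P(X|W) + log P(W)."""
    # A candidate with no language-model entry gets log-probability -inf.
    return max(
        acoustic_logp,
        key=lambda w: acoustic_logp[w] + lm_logp.get(w, -math.inf),
    )


def contains_target(path: WordSeq, target: str) -> bool:
    """True if the content to be evaluated appears on the best-scoring path."""
    return target.lower() in (w.lower() for w in path)


# Hypothetical scores for two candidate transcriptions of the same audio.
acoustic = {("at", "platform", "one"): -10.0, ("at", "plat", "form", "one"): -9.5}
language = {("at", "platform", "one"): -2.0, ("at", "plat", "form", "one"): -6.0}
best = best_word_sequence(acoustic, language)
detected = contains_target(best, "platform")
```

Working in the log domain turns the product P(X | W) P(W) into a sum, which avoids numerical underflow for long sequences.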

The musical note features of the content to be evaluated may also include the musical note features of its context, and the like.

Step S105 further comprises:

Dividing the recorded user speech into basic voice units;

Extracting the corresponding musical note features to be evaluated from the speech unit sequence;

Loading the corresponding prediction model for each kind of musical note feature and predicting the corresponding standard pronunciation;

Comparing the musical note features of the user's speech with those of the standard pronunciation to obtain the corresponding evaluation result.
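The final comparison in these sub-steps might look like the sketch below. It assumes that both the user's features and the predicted standard features have already been reduced to named numeric values; the `compare_features` function, the feature names and the 20% relative tolerance are hypothetical choices for illustration, since the text does not fix a comparison metric.

```python
from typing import Dict


def compare_features(
    user: Dict[str, float],
    standard: Dict[str, float],
    tolerance: float = 0.2,  # hypothetical threshold: 20% relative deviation
) -> Dict[str, dict]:
    """Compare each user feature with the predicted standard value."""
    report = {}
    for name, ref in standard.items():
        value = user.get(name)
        if value is None:
            continue  # feature not measured for the user; skip it
        # Relative deviation; fall back to absolute deviation when ref is 0.
        deviation = abs(value - ref) / ref if ref else abs(value)
        report[name] = {
            "user": value,
            "standard": ref,
            "deviation": deviation,
            "ok": deviation <= tolerance,
        }
    return report


report = compare_features(
    user={"duration": 0.75, "pause_before": 0.5},
    standard={"duration": 0.5, "pause_before": 0.5},
)
```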

According to a second aspect of the invention, a speech assessment device is provided for performing speech assessment on the speech of the content a user is concerned with, including an input voice acquisition module, an information storage module, a voice unit division module, a musical note feature acquisition module, a to-be-evaluated content determination module, a speech comparison analysis module, an evaluation module and a comparison result generation module, in which:

The input voice acquisition module is used to obtain the user's voice input and store the recorded user speech in the information storage module;

The voice unit division module is used to divide the recorded voice into basic voice units to obtain the speech unit sequence of the recorded voice;

The musical note feature acquisition module is used to perform feature extraction on the speech unit sequence to obtain the musical note features of the speech unit sequence;

The to-be-evaluated content determination module is used to perform a feature calculation on the extracted musical note features and, if the calculated result meets a predetermined condition, to take the qualifying voice units as the content to be evaluated;

The speech comparison analysis module is used to obtain the musical note features of the content to be evaluated and to compare these musical note features with the standard pronunciation predicted by the voice prediction model;

The comparison result generation module is used to annotate the speech assessment result on the text of the user's speech and provide it to the user.

The basic voice unit may be a syllable, a phoneme, or the like. By dividing the voice, the basic voice units and the speech unit sequence of the recorded voice are obtained.

The prosodic features include the boundary features and pronunciation duration of each basic voice unit, the pause time between adjacent basic voice units, and the pronunciation duration of the entire speech unit sequence;

For the to-be-evaluated content determination module, the calculation on the musical note features of the speech unit sequence may use an optimal score path calculation, comprising:

Using the extracted musical note features of the speech unit sequence to calculate the optimal score path with a trained acoustic model;

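The optimal score path is calculated as:

$$\hat{W} = \arg\max_{W} P(W \mid X) = \arg\max_{W} \frac{P(X \mid W)\,P(W)}{P(X)} = \arg\max_{W} P(X \mid W)\,P(W)$$

where X, W, P(X | W) and P(W) have the same meanings as in the first aspect.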

The musical note features of the content to be evaluated may also include the musical note features of the context of the content to be evaluated.

For the speech comparison analysis module, the operations for performing speech assessment using the voice prediction model include:

Dividing the recorded user speech into basic voice units;

The speech assessment device further includes a display module for displaying to the user the text of the user's speech annotated with the speech assessment result.

According to a third aspect of the invention, a computer-readable storage medium is provided, on which a computer program is stored, the program implementing the steps of the method described above when executed by a processor.

With the speech assessment method and device of the invention, when the user reads aloud a whole sentence or passage, only the part of the content the user is concerned with is analyzed and the corresponding evaluation result is provided, thereby focusing on what the user cares about while reducing the amount of data the system must analyze and saving system resources.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from the content of the embodiments and these drawings without creative effort.

Fig. 1 is a flow chart of the speech assessment method according to the present invention; and

Fig. 2 is a schematic diagram of the speech assessment device according to the present invention.

Detailed description of the embodiments

Before the exemplary embodiments are discussed in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flow charts. Although a flow chart describes the operations as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations can be rearranged. A process may be terminated when its operations are completed, but may also have additional steps not included in the drawings.

The "speech assessment device" referred to in this context is a "computer device", that is, an intelligent electronic device that can execute predetermined processes such as numerical and/or logical calculation by running preset programs or instructions. It may include a processor and a memory, with the processor executing instructions pre-stored in the memory to carry out the predetermined process; or the predetermined process may be carried out by hardware such as an ASIC, FPGA or DSP; or by a combination of the two.

The computer device includes user equipment and/or network equipment. The user equipment includes, but is not limited to, computers, smartphones, PDAs and the like; the network equipment includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud based on cloud computing and consisting of a large number of computers or network servers, where cloud computing is a kind of distributed computing: a super virtual computer consisting of a set of loosely coupled computers. The computer device may run alone to implement the invention, or may access a network and implement the invention through interaction with other computer devices in the network. The network in which the computer device is located includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, VPN networks and the like.

Those skilled in the art will understand that the "speech assessment device" described in the present invention may be user equipment alone, with the user equipment performing the corresponding operations; or it may be formed by integrating user equipment with network equipment or a server, with the user equipment and the network equipment cooperating to perform the corresponding operations.

It should be noted that the user equipment, network equipment and networks above are only examples; other existing or future computer devices or networks, where applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.

Here, those skilled in the art will understand that the present invention can be applied to both mobile and non-mobile terminals; for example, when a user uses a mobile phone or a PC, the method or device of the present invention can be used for provision and presentation.

The specific structural and functional details disclosed herein are merely representative and serve the purpose of describing exemplary embodiments of the present invention. The present invention may, however, be embodied in many alternative forms and should not be construed as limited only to the embodiments set forth herein.

It should further be mentioned that, in some alternative implementations, the functions or actions mentioned may occur in an order different from that indicated in the drawings. For example, depending on the functions or actions involved, two figures shown in succession may in fact be executed substantially simultaneously or may sometimes be executed in the reverse order.

The present invention is described in further detail below with reference to the accompanying drawings.

Fig. 1 shows the flow chart of the speech assessment method of the invention. The method is used to perform speech assessment on the speech of the content a user is concerned with.

First, in step S101, input voice acquisition, the user's voice input is obtained through the recording device of an electronic device during the spoken-language practice stage of the user's language learning.

For example, after learning new teaching content, such as a new word or phrase, the user can enter the pronunciation practice stage. In this stage, besides letting the user practise reading the newly learned content on its own, the usual teaching process also lets the user experience how the content is used in an authentic context; that is, a sentence or short passage containing the newly learned content is provided for the user to read aloud. For example, the content the user has just learned is the word "platform". After the user has learned the meaning and pronunciation of the word, the teaching software further provides the example sentence "The train now standing at platform 1 is for Leeds", which contains the word, for the user to practise reading aloud. At this point the user is more concerned with whether their pronunciation of the newly learned word within the example sentence is standard and fluent, so as to understand how well they have mastered it, and pays less attention to whether the other words in the example sentence are pronounced accurately. Therefore, in order to provide an evaluation of the user's pronunciation of "platform" in the example sentence, the recording device of the teaching equipment is started before the user begins reading, placed in the recording state, and used to record and save the user's speech while the user reads aloud.

In step S102, voice unit division, the recorded voice is divided into basic voice units to form a speech unit sequence.

Different speech recognition systems decode the voice signal using different acoustic features, such as acoustic models based on MFCC (Mel-Frequency Cepstral Coefficients) features or on PLP (Perceptual Linear Prediction) features; using different acoustic models, such as HMM-GMM (Hidden Markov Model-Gaussian Mixture Model) or neural-network acoustic models based on DBN (Dynamic Bayesian Network); or using different decoding methods, such as Viterbi search or A* search.
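To make one of the decoding methods named here concrete, the following is a minimal sketch of Viterbi search over a small hidden Markov model. It is a generic textbook formulation, not the decoder of any particular speech recognition system; the function signature and the dictionary-based log-probability tables are assumptions for the example.

```python
import math
from typing import Dict, List, Sequence


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    log_start: Dict[str, float],
    log_trans: Dict[str, Dict[str, float]],
    log_emit: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the most likely state sequence for the observations (log domain)."""
    # best[t][s]: best log score of any path that ends in state s at time t.
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back: List[Dict[str, str]] = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor of s given the scores at the previous time step.
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

In a unit-division setting, the states would typically correspond to phoneme or syllable models and the observations to acoustic frames, so that changes of state along the best path give the unit boundaries.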

In step S103, musical note feature acquisition, the speech unit sequence is analyzed to obtain the musical note features of the speech unit sequence.

The musical note features include prosodic features and syllable features. The prosodic features include the boundary features and pronunciation duration of each basic voice unit, the pause time between adjacent basic voice units, and the pronunciation duration of the entire speech unit sequence. The syllable features include the pronunciation of each basic voice unit.

In step S104, determination of the content to be evaluated, a feature calculation is performed on the extracted musical note features and, if the calculated result meets a predetermined condition, the qualifying voice units are taken as the content to be evaluated.

The calculation on the musical note features may use an optimal score path calculation: the extracted musical note features are used to calculate the optimal score path with a trained acoustic model, and if the optimal score path contains the content to be evaluated that is being detected, it is determined that the content to be evaluated has been detected. The optimal score path is calculated as:

$$\hat{W} = \arg\max_{W} P(W \mid X) = \arg\max_{W} \frac{P(X \mid W)\,P(W)}{P(X)} = \arg\max_{W} P(X \mid W)\,P(W)$$

where X denotes the musical note feature vector of the speech unit sequence, W denotes a word sequence, and $\hat{W}$ is the optimal word sequence, that is, the one with the maximum score; the conditional probability P(X | W) is the acoustic model score, calculated by the trained acoustic model; and the prior probability P(W) is the language model score, which serves as a penalty added to the different acoustic models.

For example, an optimal score path calculation is performed on the example sentence "The train now standing at platform 1 is for Leeds" read aloud by the user, in which the calculated score of "platform" is the highest; it is therefore determined to be the optimal word sequence, and "platform" is taken as the content to be evaluated.

In step S105, speech comparison analysis, the musical note features of the content to be evaluated are obtained and compared with the standard pronunciation predicted by the voice prediction model.

In this step, the musical note features of the content to be evaluated are obtained, for example the musical note features of "platform". These musical note features are compared with the standard pronunciation predicted by the voice prediction model, and an evaluation result for the content to be evaluated is provided to the user.

To further understand how fluently the user reads the content to be evaluated, the musical note features may also include those of its context. For example, when evaluating the pronunciation of "platform", the musical note features of "platform" include, in addition to those of the word itself, those of its context, namely of "at" and "1". By comparing factors such as pronunciation duration and pause time, an evaluation result on reading fluency is provided.
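The contextual quantities mentioned in this example (the duration of the target word and the pauses separating it from its neighbours) could be collected as sketched below. The word-level alignment format `(text, start, end)`, the `context_features` function and the timing values are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Word = Tuple[str, float, float]  # (text, start in seconds, end in seconds)


def context_features(words: List[Word], target: str) -> dict:
    """Collect duration and neighbouring pauses for the first match of target.

    Raises StopIteration if the target word is not in the alignment.
    """
    idx = next(i for i, w in enumerate(words) if w[0].lower() == target.lower())
    text, start, end = words[idx]
    prev_w: Optional[Word] = words[idx - 1] if idx > 0 else None
    next_w: Optional[Word] = words[idx + 1] if idx + 1 < len(words) else None
    return {
        "target": text,
        "duration": end - start,
        "previous": prev_w[0] if prev_w else None,
        "pause_before": start - prev_w[2] if prev_w else 0.0,
        "next": next_w[0] if next_w else None,
        "pause_after": next_w[1] - end if next_w else 0.0,
    }


# Hypothetical alignment of part of the example sentence.
alignment = [("at", 1.0, 1.25), ("platform", 1.5, 2.25), ("1", 2.25, 2.5)]
features = context_features(alignment, "platform")
```

A long `pause_before` relative to the predicted standard would suggest hesitation before the new word, which is the kind of fluency signal the paragraph describes.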

The speech assessment using the voice prediction model may use existing speech evaluation techniques: the recorded user speech is divided into basic voice units, the corresponding musical note features to be evaluated are extracted from the speech unit sequence, the corresponding prediction model is loaded for each kind of musical note feature to predict the corresponding standard pronunciation, and the musical note features of the user's speech are then compared with those of the standard pronunciation to obtain the corresponding evaluation result.

In step S106, comparison result generation, the speech comparison result is annotated on the text of the user's speech and provided to the user.

In this step, the evaluation result obtained in step S105 by comparison with the standard pronunciation predicted by the voice prediction model is annotated on the speech text in a visual manner and displayed to the user. From the displayed evaluation result, the user learns whether the newly learned content is pronounced accurately and fluently within the whole paragraph.
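One simple way to annotate an evaluation result on the text is sketched below. The bracket notation, the `annotate` function and the verdict string are hypothetical; the text only requires that the result be marked on the speech text in some visual manner.

```python
def annotate(sentence: str, target: str, verdict: str) -> str:
    """Mark every occurrence of the target word with its verdict in brackets."""
    out = []
    for token in sentence.split():
        # Ignore trailing punctuation and case when matching the target word.
        if token.strip(".,!?").lower() == target.lower():
            out.append(f"[{token}: {verdict}]")
        else:
            out.append(token)
    return " ".join(out)


marked = annotate(
    "The train now standing at platform 1 is for Leeds",
    "platform",
    "accurate, fluent",
)
```

In a graphical teaching application the same matching step would more likely drive highlighting or colour coding than inline brackets.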

Fig. 2 shows a schematic diagram of the speech assessment device according to an embodiment of the present invention. The speech assessment device is used to implement the speech assessment method of the invention and includes an input voice acquisition module 1, an information storage module 2, a voice unit division module 3, a musical note feature acquisition module 4, a to-be-evaluated content determination module 5, a speech comparison analysis module 6, a comparison result generation module 7, a display module 8 and a voice prediction model 9.

During the spoken-language practice stage of language learning, the user's voice input is obtained through the input voice acquisition module 1 of the speech assessment device, and the recorded voice is stored in the information storage module 2.

For example, after learning new teaching content, such as a new word or phrase, the user can enter the pronunciation practice stage. In this stage, besides letting the user practise reading the newly learned content on its own, the usual teaching process also lets the user experience how the content is used in an authentic context; that is, a sentence or short passage containing the newly learned content is provided for the user to read aloud. For example, the content the user has just learned is the word "platform". After the user has learned the meaning and pronunciation of the word, the teaching software further provides the example sentence "The train now standing at platform 1 is for Leeds", which contains the word, for the user to practise reading aloud. At this point the user is more concerned with whether their pronunciation of the newly learned word within the example sentence is standard and fluent, so as to understand how well they have mastered it, and pays less attention to whether the other words in the example sentence are pronounced accurately. Therefore, in order to provide an evaluation of the user's pronunciation of "platform" in the example sentence, the input voice acquisition module 1 of the teaching equipment is started before the user begins reading, placed in the recording state, and used to record and save the user's speech while the user reads aloud.

The voice unit division module 3 is used to divide the voice recorded by the user into basic voice units.

The musical note feature acquisition module 4 is used to analyze the speech unit sequence and obtain the musical note features of the speech unit sequence.

The to-be-evaluated content determination module 5 is used to perform a feature calculation on the extracted musical note features and, if the calculated result meets a predetermined condition, to take the qualifying voice units as the content to be evaluated.

For example, an optimal score path calculation is performed on the example sentence "The train now standing at platform 1 is for Leeds" read aloud by the user, in which the calculated score of "platform" is the highest; it is therefore determined to be the optimal word sequence, and "platform" is taken as the content to be evaluated.

The speech comparison analysis module 6 is used to obtain the musical note features of the content to be evaluated and to compare these musical note features with the standard pronunciation predicted by the voice prediction model 9.

The speech comparison analysis module 6 obtains the musical note features of the content to be evaluated, for example the musical note features of "platform". These musical note features are compared with the standard pronunciation predicted by the voice prediction model 9, and an evaluation result for the content to be evaluated is provided to the user.

The comparison result generation module 7 annotates the speech comparison result on the text of the user's speech and provides it to the user.

To annotate the text read by the user, the comparison result generation module 7 obtains the speech assessment result given by the speech comparison analysis module 6, annotates it on the text read by the user in a visual manner, and displays it to the user through the display module 8. From the displayed evaluation result, the user learns whether the newly learned content is pronounced accurately and fluently within the whole paragraph.

Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and executed by a processor. The computer-readable storage medium may include: read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and so on.

The preferred embodiments of the present invention have been described above in order to make the spirit of the invention clearer and easier to understand; they are not intended to limit the invention. Any modification, replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection outlined by the appended claims.

Claims

1. A speech evaluation method for performing speech evaluation on the speech of content that a user is interested in, comprising the following steps:

Step S101, input voice acquisition: in the spoken-language practice session of the user's language learning, obtaining the user's voice input through a recording device of an electronic device;

Step S103, prosodic feature acquisition: analyzing the speech unit sequence to obtain the prosodic features of the speech unit sequence;

Step S104, to-be-evaluated content determination: performing feature calculation on the extracted prosodic features and, if the calculation result satisfies a predetermined condition, taking the voice unit that satisfies the predetermined condition as the content to be evaluated;

Step S105, speech comparison analysis: obtaining the prosodic features of the content to be evaluated and comparing them with the standard pronunciation predicted by a voice prediction model;

2. The speech evaluation method according to claim 1, characterized in that:

The basic voice unit may be a syllable, a phoneme, or the like; by dividing the voice, the basic voice units of the recorded voice are obtained and a speech unit sequence is formed.

3. The speech evaluation method according to claim 1, characterized in that:

The prosodic features include the boundary features of each basic voice unit, the pronunciation duration, the pause time between adjacent basic voice units, the pronunciation duration of the entire speech unit sequence, and the like;

4. The speech evaluation method according to claim 1, characterized in that:

In step S104, the calculation on the prosodic features of the speech unit sequence may use an optimal score path calculation method, specifically as follows:

An optimal score path is calculated from the extracted prosodic features of the speech unit sequence using a trained acoustic model;

5. The speech evaluation method according to claim 4, characterized in that:

The calculation formula of the optimal score path is:

Wherein,

6. The speech evaluation method according to claim 1, characterized in that:

7. The speech evaluation method according to claim 1, characterized in that:

Step S105 further comprises:

Performing basic voice unit division on the recorded user speech;

8. A speech evaluation device for performing speech evaluation on the speech of content that a user is interested in, comprising an input voice acquisition module, an information storage module, a voice unit division module, a prosodic feature acquisition module, a to-be-evaluated content determination module, a speech comparison analysis module, an evaluation module, and a comparison result generation module, characterized in that:

The input voice acquisition module is configured to obtain the user's voice input and to store the recorded user speech in the information storage module;

The voice unit division module is configured to divide the recorded speech into basic voice units to obtain the speech unit sequence of the recorded speech;

The prosodic feature acquisition module is configured to perform feature extraction on the speech unit sequence to obtain the prosodic features of the speech unit sequence;

The to-be-evaluated content determination module is configured to perform feature calculation on the extracted prosodic features and, if the calculation result satisfies a predetermined condition, to take the voice unit that satisfies the condition as the content to be evaluated;

The speech comparison analysis module is configured to obtain the prosodic features of the content to be evaluated and to compare them with the standard pronunciation predicted by a voice prediction model;

9. The speech evaluation device according to claim 8, characterized in that:

The basic voice unit may be a syllable, a phoneme, or the like; by dividing the voice, the basic voice units of the recorded voice and the speech unit sequence are obtained.

10. The speech evaluation device according to claim 8, characterized in that:

The prosodic features include the boundary features of each basic voice unit, the pronunciation duration, the pause time between adjacent basic voice units, and the pronunciation duration of the entire speech unit sequence;

11. The speech evaluation device according to claim 8, characterized in that:

In the to-be-evaluated content determination module, the calculation on the prosodic features of the speech unit sequence may use an optimal score path calculation method, comprising:

Calculating an optimal score path from the extracted prosodic features of the speech unit sequence using a trained acoustic model;

12. The speech evaluation device according to claim 11, characterized in that:

The calculation formula of the optimal score path is:

Wherein,

13. The speech evaluation device according to claim 8, characterized in that:

14. The speech evaluation device according to claim 8, characterized in that:

Basic voice unit division is performed on the recorded user speech;

15. The speech evaluation device according to claim 8, characterized in that it further comprises a display module for displaying to the user the text of the user's speech annotated with the speech evaluation result.

16. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method steps of any one of claims 1-7.

17. A computer storage medium storing a computer-executable program which, when executed, implements the method steps of any one of claims 1-7.