Summary of the invention
Accordingly, the technical problem to be solved by the present invention is how to provide a user, during language learning, with both the evaluation result of a comparison against the teaching sample voice and the evaluation result of a comparison against the standard pronunciation predicted by a voice prediction model, so as to help the user fully understand his or her own learning situation.
To this end, the present invention provides a speech assessment method for evaluating the pronunciation of a user during language learning, characterized by comprising:
Step S101: obtaining the voice input of the user through the recording device of a speech assessment device;
Step S102: dividing the recorded speech into basic voice units to obtain the speech unit sequence of the recorded speech;
Step S103: performing feature extraction on the speech unit sequence to obtain the musical note features of the speech unit sequence;
Step S104: comparing and analyzing the extracted musical note features against the teaching sample voice and against the standard pronunciation predicted by a voice prediction model, respectively;
Step S105: annotating the speech comparison results on the user's speech text.
The basic voice unit may be a syllable, a phoneme, or the like. Dividing the recorded speech yields the basic voice units and the speech unit sequence of that recorded speech.
The musical note features include prosodic features and syllable features. The prosodic features include the boundary features and pronunciation duration of each basic voice unit, the pause time between adjacent basic voice units, and the pronunciation duration of the entire speech unit sequence. The syllable features include the pronunciation of each basic voice unit and the pronunciation of the entire speech unit sequence.
The process of comparing and analyzing against the teaching sample voice includes:
obtaining the teaching sample voice saved in the system;
dividing the teaching sample voice into basic voice units to obtain the basic voice units and the speech unit sequence of the teaching sample voice;
extracting the musical note features of the teaching speech unit sequence, the musical note features of the teaching speech unit sequence corresponding to the musical note features of the user speech unit sequence;
comparing the musical note features of the user speech unit sequence with the musical note features of the teaching speech unit sequence, and giving the corresponding evaluation result.
The process of performing speech assessment with the voice prediction model includes:
dividing the recorded user speech into basic voice units, and extracting the corresponding musical note features to be assessed from the speech unit sequence;
loading the corresponding prediction model for each different musical note feature, and predicting the corresponding standard pronunciation;
comparing the musical note features of the user speech with the musical note features of the standard pronunciation to obtain the corresponding evaluation result.
The annotation of the speech comparison results specifically includes:
converting the recorded user speech into speech text;
annotating, in a visual manner, the evaluation result of the comparison against the teaching sample voice and the evaluation result of the comparison against the standard pronunciation predicted by the voice prediction model on the speech text, respectively, and displaying them to the user.
The present invention also provides a speech assessment device. The speech assessment device includes a recording module, a storage module, a speech processing module, a feature extraction module, a speech analysis module, an evaluation module, a labeling module, and a display module, characterized in that:
the recording module is used to obtain the voice input of the user;
the speech processing module is used to divide the recorded speech into basic voice units to obtain the speech unit sequence of the recorded speech;
the feature extraction module performs feature extraction on the speech unit sequence to obtain the musical note features of the speech unit sequence;
the speech analysis module compares and analyzes the extracted musical note features against the teaching sample voice and against the standard pronunciation predicted by the voice prediction model, respectively;
the labeling module annotates the speech assessment results on the user's speech text.
The speech assessment device further includes a display module for displaying to the user the user speech text annotated with the speech assessment results.
The speech assessment method and device of the present invention provide the user simultaneously with the evaluation result of the user speech against the teaching sample voice and the evaluation result against the standard pronunciation predicted by the voice prediction model, so that the user fully understands his or her own pronunciation and can improve its accuracy.
Specific embodiment
Before the exemplary embodiments are discussed in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations can be rearranged. A process may be terminated when its operations are completed, but it may also have additional steps not included in the drawings.
The "speech assessment device" referred to in this context is a "computer device", that is, an intelligent electronic device that can execute predetermined processing such as numerical calculation and/or logical calculation by running predetermined programs or instructions. It may include a processor and a memory, with the processor executing instructions pre-stored in the memory to carry out the predetermined processing; or the predetermined processing may be carried out by hardware such as an ASIC, FPGA, or DSP; or by a combination of the two.
The computer device includes user equipment and/or network equipment. The user equipment includes, but is not limited to, computers, smartphones, PDAs, and the like. The network equipment includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud based on cloud computing (Cloud Computing) composed of a large number of computers or network servers, where cloud computing is a form of distributed computing: a super virtual computer composed of a set of loosely coupled computers. The computer device may run on its own to implement the present invention, or may access a network and implement the present invention through interaction with other computer devices in the network. The network in which the computer device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, and the like.
Those skilled in the art will understand that the "speech assessment device" described in the present invention may consist of user equipment only, in which case the corresponding operations are executed by the user equipment; it may also be formed by integrating user equipment with network equipment or a server, in which case the user equipment cooperates with the network equipment to execute the corresponding operations.
It should be noted that the user equipment, network equipment, network, and so on are only examples. Other existing or future computer devices or networks, where applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Here, those skilled in the art will understand that the present invention can be applied to both mobile and non-mobile terminals; for example, when the user uses a mobile phone or a PC, the method or device of the present invention can be used for provision and presentation.
The specific structural and functional details disclosed herein are merely representative and serve the purpose of describing exemplary embodiments of the present invention. The present invention may, however, be implemented in many alternative forms and should not be construed as limited only to the embodiments set forth herein.
The terminology used herein is for the purpose of describing specific embodiments only and is not intended to limit the exemplary embodiments. Unless the context clearly indicates otherwise, the singular forms "a" and "an" used herein are intended to include the plural as well. It should also be understood that the terms "comprising" and/or "including" used herein specify the presence of the stated features, integers, steps, operations, units, and/or components, and do not preclude the presence or addition of one or more other features, integers, steps, operations, units, components, and/or combinations thereof.
It should further be mentioned that, in some alternative implementations, the functions or actions noted may occur in an order different from that indicated in the drawings. For example, depending on the functions or actions involved, two figures shown in succession may in fact be executed substantially simultaneously, or may sometimes be executed in the reverse order.
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 shows the flowchart of the speech assessment method of the present invention.
In step S101, while the user is in the spoken follow-along reading stage of language learning, the voice input of the user is recorded by the recording device of the speech assessment device.
Specifically, after the user has studied the voice sample in the teaching courseware, the user enters the follow-along reading stage and triggers the recording device in the speech assessment device so that it enters the recording state. When the user starts to read along with the voice sample, the recording device starts to record the user speech and stores the user's follow-along speech in the storage module of the speech assessment device for further analysis.
In step S102, the user's follow-along speech recorded in the storage module is obtained, and the recorded speech is divided into basic voice units to obtain the speech unit sequence of the recorded follow-along speech.
The basic voice unit may be a syllable, a phoneme, or the like. Dividing the recorded speech yields the basic voice units and the speech unit sequence of that recorded speech.
Different speech recognition systems may decode the voice signal on the basis of different acoustic features, for example an acoustic model based on MFCC (Mel-Frequency Cepstral Coefficients) features or an acoustic model based on PLP (Perceptual Linear Predictive) features; or with different acoustic models, for example HMM-GMM (Hidden Markov Model-Gaussian Mixture Model) or a neural network acoustic model based on a DBN (Dynamic Bayesian Network); or with different decoding methods, for example Viterbi search or A* search.
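As a rough illustration of the Viterbi search mentioned above, the following sketch decodes the most probable state path of a toy two-state HMM. The state names ("sil", "voiced"), the coarse frame-energy observations, and all probabilities are assumptions made up for this example; they are not taken from the invention, which does not prescribe a particular model.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its probability) for an HMM."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, prev = max((best[-1][p][0] * trans_p[p][s], p) for p in states)
            column[s] = (prob * emit_p[s][obs], prev)
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Toy model (all values hypothetical): silence vs. voiced frames.
states = ("sil", "voiced")
start_p = {"sil": 0.6, "voiced": 0.4}
trans_p = {"sil": {"sil": 0.7, "voiced": 0.3},
           "voiced": {"sil": 0.4, "voiced": 0.6}}
emit_p = {"sil": {"low": 0.5, "mid": 0.4, "high": 0.1},
          "voiced": {"low": 0.1, "mid": 0.3, "high": 0.6}}

path, prob = viterbi(["low", "mid", "high"], states, start_p, trans_p, emit_p)
print(path)  # ['sil', 'sil', 'voiced']
```

A real recognizer would run the same recursion over acoustic-model scores for MFCC or PLP frames, usually in the log domain to avoid numerical underflow.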
In step S103, feature extraction is performed on the speech unit sequence to obtain the musical note features of the speech unit sequence.
The musical note features include prosodic features and syllable features. The prosodic features include the boundary features and pronunciation duration of each basic voice unit, the pause time between adjacent basic voice units, and the pronunciation duration of the entire speech unit sequence. The syllable features include the pronunciation of each basic voice unit and the pronunciation of the entire speech unit sequence.
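The prosodic features listed above can be sketched as follows, assuming that the unit division of step S102 has already produced a label with start and end times (in seconds) for each basic voice unit. The `VoiceUnit` type and the `prosodic_features` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VoiceUnit:
    label: str    # syllable or phoneme label (the unit's pronunciation)
    start: float  # start boundary in seconds
    end: float    # end boundary in seconds


def prosodic_features(units: List[VoiceUnit]) -> Dict[str, object]:
    """Collect boundaries, per-unit durations, pauses, and total duration."""
    durations = [u.end - u.start for u in units]
    # Pause between each pair of adjacent units (clamped at zero on overlap).
    pauses = [max(0.0, nxt.start - cur.end) for cur, nxt in zip(units, units[1:])]
    total = units[-1].end - units[0].start if units else 0.0
    return {
        "boundaries": [(u.start, u.end) for u in units],
        "durations": durations,
        "pauses": pauses,
        "total_duration": total,
    }


features = prosodic_features([
    VoiceUnit("ni", 0.0, 0.5),
    VoiceUnit("hao", 0.75, 1.0),
    VoiceUnit("ma", 1.0, 1.5),
])
print(features["durations"])  # [0.5, 0.25, 0.5]
print(features["pauses"])     # [0.25, 0.0]
```

The syllable features (the pronunciation of each unit) are represented here only by the `label` field; how pronunciation is actually scored is left open.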
In step S104, the extracted musical note features are compared and analyzed against the teaching sample voice and against the standard pronunciation predicted by the voice prediction model, respectively.
The comparison against the teaching sample voice proceeds as follows. The teaching sample voice saved in the system is obtained and divided into basic voice units to obtain the basic voice units and the speech unit sequence of the teaching sample voice. The musical note features of the teaching speech unit sequence are then extracted; these correspond to the musical note features of the user speech unit sequence. The musical note features of the user speech unit sequence are compared with those of the teaching speech unit sequence, and the corresponding evaluation result is given.
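One possible form of this unit-by-unit comparison is sketched below. It assumes that the user and the teaching sample read the same text, so that their units can be paired by position, and it uses an arbitrary 30% duration tolerance; the function name, the pairing rule, and the threshold are illustrative assumptions, not part of the described method.

```python
def compare_with_sample(user_units, sample_units, tolerance=0.3):
    """Compare (label, duration) pairs of the user with those of the teaching sample.

    Units are paired by position; sample durations are assumed to be positive.
    """
    results = []
    for (u_label, u_dur), (s_label, s_dur) in zip(user_units, sample_units):
        ratio = u_dur / s_dur
        results.append({
            "unit": s_label,
            "pronunciation_ok": u_label == s_label,
            "duration_ratio": ratio,
            "duration_ok": abs(ratio - 1.0) <= tolerance,
        })
    return results


evaluation = compare_with_sample(
    user_units=[("ni", 0.5), ("hao", 1.0)],
    sample_units=[("ni", 0.5), ("hao", 0.5)],
)
print([r["duration_ok"] for r in evaluation])  # [True, False]
```

In practice the two sequences may differ in length (skipped or inserted units), in which case an alignment step would be needed before the comparison.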
Speech assessment with the voice prediction model can use existing speech evaluation techniques: the recorded user speech is divided into basic voice units, the corresponding musical note features to be assessed are extracted from the speech unit sequence, the corresponding prediction model is loaded for each different musical note feature to predict the corresponding standard pronunciation, and the musical note features of the user speech are then compared with those of the standard pronunciation to obtain the corresponding evaluation result.
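The idea of loading a different prediction model for each musical note feature can be sketched as a simple lookup from feature name to predictor. The two "models" below are trivial stand-ins (a duration table and an all-zero pause predictor) invented for this example; the text does not specify the actual prediction models.

```python
# Hypothetical reference durations (seconds) standing in for a trained model.
STANDARD_DURATION = {"ni": 0.25, "hao": 0.5}


def predict_duration(labels):
    """Stand-in duration model: look up a standard duration per unit."""
    return [STANDARD_DURATION.get(label, 0.25) for label in labels]


def predict_pause(labels):
    """Stand-in pause model: standard speech has no pause between units."""
    return [0.0] * (len(labels) - 1)


# One prediction model per feature type.
MODELS = {"duration": predict_duration, "pause": predict_pause}


def assess_with_models(labels, user_features):
    """Return, per feature type, the user's deviation from the predicted standard."""
    report = {}
    for name, user_values in user_features.items():
        model = MODELS[name]      # load the model matching this feature
        standard = model(labels)  # predict the standard pronunciation's feature
        report[name] = [u - s for u, s in zip(user_values, standard)]
    return report


report = assess_with_models(
    ["ni", "hao"],
    {"duration": [0.5, 0.5], "pause": [0.25]},
)
print(report)  # {'duration': [0.25, 0.0], 'pause': [0.25]}
```

Keeping the models in a registry means a new feature type only needs a new entry, without changes to the comparison loop.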
In step S105, the speech comparison results are annotated on the user's speech text and provided to the user.
In this step, the recorded user speech is further converted into speech text by the speech processing module. The evaluation result of the comparison against the teaching sample voice and the evaluation result of the comparison against the standard pronunciation predicted by the voice prediction model, both obtained in step S104, are annotated on the speech text in a visual manner, respectively, and displayed to the user. From the displayed evaluation results, the user can see how his or her pronunciation differs from the teaching sample pronunciation and from the standard pronunciation predicted by the voice prediction model, and thus fully understands what problems exist in the pronunciation of the text read, which helps the user make the pronunciation more standard. The comparison results may include an evaluation of the pronunciation of each basic voice unit, an evaluation of the pronunciation duration of each basic voice unit, an evaluation of the fluency of the full text, and the like.
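A minimal text-only sketch of the annotation step follows. It assumes that each token of the speech text already has one pass/fail flag from the teaching sample comparison and one from the prediction model comparison; the bracketed tags `S` (differs from the sample) and `M` (differs from the model) are a made-up marking scheme, whereas an actual display would more likely use colors or underlines.

```python
def annotate(tokens, sample_ok, model_ok):
    """Mark each token with the comparisons it failed: S = sample, M = model."""
    marked = []
    for token, s_ok, m_ok in zip(tokens, sample_ok, model_ok):
        tags = ""
        if not s_ok:
            tags += "S"
        if not m_ok:
            tags += "M"
        marked.append(f"{token}[{tags}]" if tags else token)
    return " ".join(marked)


annotated = annotate(
    ["ni", "hao", "ma"],
    sample_ok=[True, False, False],
    model_ok=[True, True, False],
)
print(annotated)  # ni hao[S] ma[SM]
```

Keeping the two results as separate tags preserves the distinction the method relies on: the user can see which units deviate from the teaching sample, which deviate from the predicted standard pronunciation, and which deviate from both.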
Fig. 2 shows a speech assessment device according to an embodiment of the present invention. The speech assessment device is used to implement the speech assessment method of the present invention: after the user performs spoken follow-along reading, it provides the user simultaneously with the evaluation result against the teaching sample voice and the evaluation result against the standard pronunciation predicted by the voice prediction model. The speech assessment device includes a recording module 1, a storage module 2, a speech processing module 3, a feature extraction module 4, a speech analysis module 5, a labeling module 6, and a display module 7.
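The way the seven modules hand data to one another can be sketched as a small pipeline class. The class name, the callable-per-module design, and the stub modules in the usage example are all assumptions for illustration; the embodiment does not fix how the modules are implemented.

```python
class SpeechAssessmentDevice:
    """Hypothetical wiring of modules 1 to 7; each module is a plain callable."""

    def __init__(self, record, segment, extract, analyze, annotate, display):
        self.record = record      # recording module 1
        self.storage = {}         # storage module 2
        self.segment = segment    # speech processing module 3
        self.extract = extract    # feature extraction module 4
        self.analyze = analyze    # speech analysis module 5
        self.annotate = annotate  # labeling module 6
        self.display = display    # display module 7

    def run(self):
        self.storage["user_speech"] = self.record()
        units = self.segment(self.storage["user_speech"])
        features = self.extract(units)
        results = self.analyze(features)
        annotated = self.annotate(units, results)
        self.display(annotated)
        return annotated


# Stub modules: "speech" is plain text and the only feature is unit length.
shown = []
device = SpeechAssessmentDevice(
    record=lambda: "ni hao",
    segment=lambda speech: speech.split(),
    extract=lambda units: [len(u) for u in units],
    analyze=lambda feats: [f <= 2 for f in feats],
    annotate=lambda units, ok: " ".join(
        u if good else u + "[!]" for u, good in zip(units, ok)
    ),
    display=shown.append,
)
result = device.run()
print(result)  # ni hao[!]
```

Passing the modules in as callables keeps the sketch independent of whether each one runs on the user equipment or on network equipment.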
While the user is in the spoken follow-along reading stage of language learning, the voice input of the user is recorded by the recording module 1 of the speech assessment device.
Specifically, after the user has studied the voice sample in the teaching courseware, the user enters the follow-along reading stage and triggers the recording module 1 in the speech assessment device so that it enters the recording state. When the user starts to read along with the voice sample, the recording module 1 starts to record the user speech and stores the user's follow-along speech in the storage module 2 of the speech assessment device for further analysis.
The speech processing module 3 obtains the user's follow-along speech recorded in the storage module 2 and divides the recorded speech into basic voice units.
The basic voice unit may be a syllable, a phoneme, or the like. Dividing the recorded speech yields the basic voice units and the speech unit sequence of that recorded speech.
After the speech processing module 3 has divided the recorded speech into basic voice units, the feature extraction module 4 further performs feature extraction on the generated speech unit sequence to obtain the musical note features of the speech unit sequence.
The musical note features include prosodic features and syllable features. The prosodic features include the boundary features and pronunciation duration of each basic voice unit, the pause time between adjacent basic voice units, and the pronunciation duration of the entire speech unit sequence. The syllable features include the pronunciation of each basic voice unit and the pronunciation of the entire speech unit sequence.
The speech analysis module 5 compares and analyzes the extracted musical note features against the teaching sample voice and against the standard pronunciation predicted by the voice prediction model, respectively.
The comparison against the teaching sample voice proceeds as follows. The speech analysis module 5 obtains the teaching sample voice saved in the storage module 2 and divides it into basic voice units to obtain the basic voice units and the speech unit sequence of the teaching sample voice. The musical note features of the teaching speech unit sequence are then extracted; these correspond to the musical note features of the user speech unit sequence. The musical note features of the user speech unit sequence are compared with those of the teaching speech unit sequence, and the corresponding evaluation result is given.
Speech assessment with the voice prediction model can use existing speech evaluation techniques: the recorded user speech is divided into basic voice units, the corresponding musical note features to be assessed are extracted from the speech unit sequence, the corresponding prediction model is loaded for each different musical note feature to predict the corresponding standard pronunciation, and the musical note features of the user speech are then compared with those of the standard pronunciation to obtain the corresponding evaluation result.
The labeling module 6 annotates the speech comparison results on the user's speech text, and the display module 7 provides them to the user.
Specifically, the recorded user speech is further converted into speech text by the speech processing module 3. The evaluation result of the comparison against the teaching sample voice and the evaluation result of the comparison against the standard pronunciation predicted by the voice prediction model, both obtained by the speech analysis module 5, are annotated on the speech text in a visual manner, respectively, and displayed to the user by the display module 7. From the displayed evaluation results, the user can see how his or her pronunciation differs from the teaching sample pronunciation and from the standard pronunciation predicted by the voice prediction model, and thus fully understands what problems exist in the pronunciation of the text read, which helps the user make the pronunciation more standard. The comparison results may include an evaluation of the pronunciation of each basic voice unit, an evaluation of the pronunciation duration of each basic voice unit, an evaluation of the fluency of the full text, and the like.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and executed by a processor. The computer-readable storage medium may include read-only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk, an optical disc, or the like.
The preferred embodiments of the present invention have been described above in order to make the spirit of the invention clearer and easier to understand, not to limit the invention. Any modification, replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection defined by the appended claims.