CN109697988A - Voice evaluation method and device - Google Patents

Voice evaluation method and device

Info

Publication number
CN109697988A
CN109697988A (application CN201710996819.1A, granted as CN109697988B)
Authority
CN
China
Prior art keywords
speech
voice
user
phonological
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710996819.1A
Other languages
Chinese (zh)
Other versions
CN109697988B (en)
Inventor
卢炀
宾晓皎
李明
蔡泽鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yingshuo Intelligent Technology Co.,Ltd.
Original Assignee
Shenzhen Yingshuo Audio Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Yingshuo Audio Technology Co Ltd
Priority to CN201710996819.1A, granted as CN109697988B
Priority to PCT/CN2017/111822, published as WO2019075828A1
Publication of CN109697988A
Application granted
Publication of CN109697988B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 Electrically-operated educational appliances
    • G09B5/04 Electrically-operated educational appliances with audible presentation of the material to be studied
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention provides a voice evaluation method for evaluating a user's pronunciation during language learning, comprising the following steps: step S101, capturing the user's voice input through the recording device of a voice evaluation device; step S102, dividing the recorded speech into basic speech units to obtain the speech unit sequence of the recorded speech; step S103, performing feature extraction on the speech unit sequence to obtain the phonological features of the speech unit sequence; step S104, comparing and analyzing the extracted phonological features against the teaching sample speech and against the standard speech predicted by a speech prediction model, respectively; and step S105, annotating the speech comparison results on the text of the user's speech.

Description

Voice evaluation method and device
Technical field
The present invention relates to the field of multimedia teaching technology, and in particular to a voice evaluation method and device for multimedia teaching.
Background art
As a medium of communication, language plays a very important role in life and work, and whether people are studying at school or already working, language learning is something they attach great importance to. With the continued spread of online teaching, and because online instruction is not constrained by the time and place of lessons, it has become popular with users. As a result, many users now prefer to spend their spare time learning languages over the network.
In current online teaching, pronunciation practice is typically handled in one of several ways: after a passage of speech is played in the video (or audio) lesson, a pause is provided for the learner to practise repeating it on their own; or a recording approach is used, in which the learner's repetition is recorded and played back so that learners judge for themselves whether their pronunciation is accurate; or a teacher teaches online and gives guidance and suggestions on the learner's pronunciation. These existing teaching methods either cannot give targeted guidance on a learner's pronunciation, resulting in poor learning outcomes, or require a teacher to teach online, which demands substantial human, material and financial resources.
To solve the above problems, evaluating a learner's speech against a speech prediction model has been proposed. CN101197084A discloses an automatic spoken-English evaluation and learning system, characterized in that the system includes a spoken-pronunciation detection part comprising the following steps: (1) building a standard-speaker corpus: 1) recruiting standard English speakers; 2) designing a first recording text according to the requirements of spoken-English learning and the principle of phoneme balance; 3) having the standard speakers record the recording text; (2) collecting an oral-evaluation corpus: in a simulated English-learning software environment, designing a second recording text according to the learning requirements, recruiting ordinary speakers, and recording their spoken pronunciation; (3) annotating the oral-evaluation corpus: experts annotate in detail whether each phoneme in each word is pronounced correctly; (4) building a standard-pronunciation acoustic model: training an acoustic model of standard pronunciation from the recordings and associated texts in the standard-speaker corpus; (5) computing error-detection parameters of the speech: 1) extracting the MFCC cepstral features of the speech; 2) based on the standard acoustic model and the phoneme sequence corresponding to the text of each recording in the evaluation corpus, automatically segmenting the ordinary speaker's speech into phoneme-level segments and computing, for each segment, a first likelihood of the segment given the expected phoneme; 3) recognizing each segment of the ordinary speaker's speech with the standard acoustic model and computing a second likelihood of the segment given the recognized phoneme; 4) dividing the first likelihood by the second likelihood to obtain the likelihood ratio of the segment, which serves as the error-detection parameter of that speech segment; (6) building an error-detection mapping model from error-detection parameters to expert-annotated pronunciation errors: over a batch of evaluation recordings, associating the evaluation parameters and formant sequences of each segment with the detailed expert annotations, statistically deriving the correspondence between these parameters and the annotations, and saving these relationships as the error-detection mapping model.
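To make the likelihood-ratio error-detection parameter of step (5) concrete, the following is a minimal Python/numpy sketch; the log-likelihood values would in practice come from the standard acoustic model described above, and the numbers used here are made up for illustration.

```python
# Likelihood-ratio error-detection parameter (sketch of the prior-art scheme
# above; the log-likelihoods are placeholders for acoustic-model scores).
import numpy as np

def error_detection_parameter(loglik_expected: float, loglik_recognized: float) -> float:
    """Ratio of the first likelihood (segment scored against the phoneme it
    should be) to the second likelihood (segment scored against the phoneme
    the recognizer actually chose). Values near 1 suggest a correct
    pronunciation; small values flag a likely mispronunciation."""
    return float(np.exp(loglik_expected - loglik_recognized))

print(error_detection_parameter(-42.1, -41.8))   # ~0.74: borderline
print(error_detection_parameter(-55.0, -40.2))   # ~3.7e-7: likely mispronounced
```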
CN101650886A discloses a method for automatically detecting reading errors of language learners, characterized by comprising the following steps: 1) front-end processing: pre-processing the input speech and extracting features, the extracted features being MFCC feature vectors; 2) building a simplified search space: taking the content the user is expected to read aloud as the reference answer, and constructing a simplified search space from the reference answer, a pronunciation dictionary, a multi-pronunciation model and an acoustic model; 3) building a read-aloud language model: constructing from the reference answer a read-aloud language model that describes the content the user may actually produce when reading the reference sentence and its probabilities; 4) searching: in the search space, searching according to the acoustic model, the read-aloud language model and the multi-pronunciation model for the path that best matches the input feature-vector stream, taking it as the content the user actually read aloud and forming the recognition-result sequence; 5) alignment: aligning the reference answer with the recognition result to detect the user's insertions, omissions and mispronunciations.
In the prior art, a speech recognition system is used to obtain the speech segment corresponding to each basic speech unit in the speech signal; the obtained segments are merged into a valid speech segment sequence corresponding to the signal; evaluation features are extracted from the valid segment sequence; a score prediction model corresponding to each evaluation feature type is loaded; the similarity between each evaluation feature and the corresponding score prediction model is computed; and this similarity is taken as the score of the speech signal. In actual language learning, however, users usually learn pronunciation from the teacher's sample speech in the teaching video (or audio), and the teacher's sample, for reasons of personal style, is rarely fully consistent with the standard pronunciation predicted by a speech prediction model. Consequently, when the user's pronunciation is evaluated only against a speech prediction model, the predicted standard pronunciation often differs from the teaching sample in some respects (such as tone and rhythm), so the resulting evaluation reflects the comparison between the user's speech and the predicted speech and cannot truly reflect the comparison between the user's speech and the teaching sample speech.
Therefore, there is a need for a voice evaluation method that, in addition to the evaluation result produced by a speech prediction model, also provides an evaluation result against the teaching sample speech, so that users gain a full picture of their own learning.
Summary of the invention
To this end, the technical problem to be solved by the present invention is how, during language learning, to provide the user simultaneously with an evaluation result from comparison with the teaching sample speech and an evaluation result from comparison with the standard speech predicted by a speech prediction model, so as to help the user fully understand their own learning.
To this end, the present invention provides a voice evaluation method for evaluating a user's pronunciation during language learning, characterized by:
Step S101: capturing the user's voice input through the recording device of a voice evaluation device;
Step S102: dividing the recorded speech into basic speech units to obtain the speech unit sequence of the recorded speech;
Step S103: performing feature extraction on the speech unit sequence to obtain the phonological features of the speech unit sequence;
Step S104: comparing and analyzing the extracted phonological features against the teaching sample speech and against the standard speech predicted by a speech prediction model, respectively;
Step S105: annotating the speech comparison results on the text of the user's speech.
The basic speech unit may be a syllable, a phoneme or the like; by dividing the recorded speech, the basic speech units and the speech unit sequence of the recorded speech are obtained.
The phonological features include prosodic features and syllable features. The prosodic features include the boundary features and pronunciation duration of each basic speech unit, the pause time between adjacent basic speech units, and the pronunciation duration of the whole speech unit sequence; the syllable features include the pronunciation of each basic speech unit and the pronunciation of the whole speech unit sequence.
The process of comparing and analyzing against the teaching sample speech includes:
obtaining the teaching sample speech saved in the system;
dividing the teaching sample speech into basic speech units to obtain the basic speech units and the speech unit sequence of the teaching sample speech;
extracting the phonological features of the teaching speech unit sequence, the phonological features of the teaching speech unit sequence corresponding to the phonological features of the user speech unit sequence;
comparing the phonological features of the user speech unit sequence with the phonological features of the teaching speech unit sequence and producing the corresponding evaluation result.
The process of performing speech evaluation with the speech prediction model includes:
dividing the recorded user speech into basic speech units and extracting the corresponding phonological features to be evaluated from the speech unit sequence;
loading the corresponding prediction model for each phonological feature and predicting the corresponding standard pronunciation;
comparing the phonological features of the user speech with the phonological features of the standard pronunciation and obtaining the corresponding evaluation result.
The process of annotating the speech comparison results specifically includes:
converting the recorded user speech into a speech text;
annotating the evaluation result obtained by comparison with the teaching sample speech and the evaluation result obtained by comparison with the standard speech predicted by the speech prediction model on the speech text, each in a visualized manner, and displaying them to the user.
The present invention also provides a voice evaluation device comprising a recording module, a storage module, a speech processing module, a feature extraction module, a speech analysis module, an evaluation module, a labeling module and a display module, characterized in that:
the recording module is configured to capture the user's voice input;
the speech processing module is configured to divide the recorded speech into basic speech units to obtain the speech unit sequence of the recorded speech;
the feature extraction module is configured to perform feature extraction on the speech unit sequence to obtain the phonological features of the speech unit sequence;
the speech analysis module is configured to compare and analyze the extracted phonological features against the teaching sample speech and against the standard speech predicted by the speech prediction model, respectively;
the labeling module is configured to annotate the speech evaluation results on the text of the user's speech.
The voice evaluation device further includes a display module for displaying the user speech text annotated with the speech evaluation results to the user.
By simultaneously providing the user with the evaluation result of the user's speech against the teaching sample speech and the evaluation result against the standard speech predicted by the speech prediction model, the voice evaluation method and device of the present invention give users a full picture of their own pronunciation and improve the accuracy of their pronunciation.
Brief description of the drawings
In order to describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from the described embodiments and these drawings without creative effort.
Fig. 1 is a flowchart of a voice evaluation method according to an embodiment of the present invention; and
Fig. 2 is a structural diagram of a voice evaluation device according to an embodiment of the present invention.
Detailed description of the embodiments
Before the exemplary embodiments are discussed in greater detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations as a sequential process, many of the operations may be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be rearranged. A process may be terminated when its operations are completed, and it may also have additional steps not included in the figure.
The term "voice evaluation device" as used herein refers to a "computer device", that is, an intelligent electronic device that can perform predetermined processes such as numerical computation and/or logical computation by running preset programs or instructions. It may comprise a processor and a memory, the processor executing program instructions prestored in the memory to carry out the predetermined processes; or the predetermined processes may be carried out by hardware such as an ASIC, an FPGA or a DSP, or by a combination of the two.
The computer device includes user equipment and/or network equipment. The user equipment includes, but is not limited to, computers, smartphones, PDAs and the like; the network equipment includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud, based on cloud computing, composed of a large number of computers or network servers, where cloud computing is a form of distributed computing in which a super virtual computer is composed of a group of loosely coupled computers. The computer device may operate on its own to implement the present invention, or it may access a network and implement the present invention by interacting with other computer devices in the network. The network in which the computer device is located includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, VPN networks and the like.
Those skilled in the art will understand that the "voice evaluation device" described in the present invention may be only a user device, with the corresponding operations performed by the user device; it may also be an integrated combination of a user device and a network device or server, with the corresponding operations performed by the user device in cooperation with the network device.
It should be noted that the user device, the network device and the network are only examples; other existing or future computer devices or networks, where applicable to the present invention, shall also fall within the scope of protection of the present invention and are incorporated herein by reference.
Those skilled in the art will also understand that the present invention is applicable to both mobile and non-mobile terminals; for example, when a user uses a mobile phone or a PC, the method or device of the present invention can be applied to provide and present the results.
The specific structural and functional details disclosed herein are merely representative and serve the purpose of describing exemplary embodiments of the present invention. The present invention may, however, be embodied in many alternative forms and should not be construed as being limited to the embodiments set forth herein.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments. Unless the context clearly indicates otherwise, the singular forms "a" and "an" as used herein are intended to include the plural as well. It should also be understood that the terms "comprise" and/or "include", as used herein, specify the presence of the stated features, integers, steps, operations, units and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, units, components and/or combinations thereof.
It should further be mentioned that, in some alternative implementations, the functions/acts mentioned may occur out of the order indicated in the figures. For example, two figures shown in succession may in fact be executed substantially concurrently, or may sometimes be executed in the reverse order, depending on the functions/acts involved.
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 shows the flowchart of the voice evaluation method of the present invention.
In step S101, while the user is in the spoken read-along (follow-up reading) stage of language learning, the recording device of the voice evaluation device records the user's voice input.
Specifically, after studying the sample speech in the teaching courseware, the user enters the read-along stage and triggers the recording device in the voice evaluation device, putting it into the recording state. When the user starts reading along with the sample speech, the recording device starts recording the user's speech, and the user's read-along speech is stored in the storage module of the voice evaluation device for further analysis.
In step S102, the user's read-along speech recorded in the storage module is obtained and divided into basic speech units, yielding the speech unit sequence of the recorded read-along speech.
The basic speech unit may be a syllable, a phoneme or the like; by dividing the recorded speech, the basic speech units and the speech unit sequence of the recorded speech are obtained.
Different speech recognition systems decode the speech signal using different acoustic features, such as acoustic models based on MFCC (Mel-Frequency Cepstral Coefficients) features or on PLP (Perceptual Linear Predictive) features; using different acoustic models, such as HMM-GMM (Hidden Markov Model - Gaussian Mixture Model) models or neural-network acoustic models based on DBN (Dynamic Bayesian Network); or using different decoding methods, such as Viterbi search or A* search.
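As an illustration of such a front end, the following is a minimal sketch assuming Python with the librosa library (the patent does not prescribe any particular toolkit); it loads a recording and computes the MFCC features on which an acoustic model could then operate. The file name and parameter values are hypothetical.

```python
# Minimal MFCC front-end sketch (assumption: librosa is available; model,
# toolkit and parameter values are not mandated by the patent).
import librosa
import numpy as np

def extract_mfcc(wav_path: str, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
    """Return an (n_mfcc, n_frames) MFCC matrix for one recording."""
    y, sr = librosa.load(wav_path, sr=sr)                # resample to 16 kHz mono
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

mfcc = extract_mfcc("user_read_along.wav")               # hypothetical file name
print(mfcc.shape)                                        # e.g. (13, n_frames)
```

An HMM-GMM or neural acoustic model would then decode this feature stream into the syllable- or phoneme-level unit sequence used in the following steps.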
In step S103, feature extraction is performed on the speech unit sequence to obtain the phonological features of the speech unit sequence.
The phonological features include prosodic features and syllable features. The prosodic features include the boundary features and pronunciation duration of each basic speech unit, the pause time between adjacent basic speech units, and the pronunciation duration of the whole speech unit sequence; the syllable features include the pronunciation of each basic speech unit and the pronunciation of the whole speech unit sequence.
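The following sketch illustrates how the prosodic part of these features (per-unit duration, inter-unit pauses, total duration) could be computed from a time-aligned unit sequence; the data layout and the example values are assumptions for illustration, not part of the patent.

```python
# Prosodic-feature sketch computed from a time-aligned unit sequence (the
# alignment itself would come from the decoder in step S102).
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class AlignedUnit:
    label: str    # syllable or phoneme label
    start: float  # start time in seconds
    end: float    # end time in seconds

def prosodic_features(units: List[AlignedUnit]) -> Dict[str, object]:
    durations = [u.end - u.start for u in units]                   # per-unit pronunciation duration
    pauses = [b.start - a.end for a, b in zip(units, units[1:])]   # pause between adjacent units
    return {
        "unit_durations": durations,
        "inter_unit_pauses": pauses,
        "total_duration": units[-1].end - units[0].start,          # whole-sequence duration
    }

# toy example: the word "good" segmented into three phones
seq = [AlignedUnit("g", 0.00, 0.08), AlignedUnit("uh", 0.08, 0.21), AlignedUnit("d", 0.27, 0.35)]
print(prosodic_features(seq))
```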
In step S104, the extracted phonological features are compared and analyzed against the teaching sample speech and against the standard speech predicted by the speech prediction model, respectively.
The process of comparison with the teaching sample speech is as follows: the teaching sample speech saved in the system is obtained; the teaching sample speech is divided into basic speech units, yielding the basic speech units and the speech unit sequence of the teaching sample speech; the phonological features of the teaching speech unit sequence are then extracted, and they correspond to the phonological features of the user speech unit sequence. The phonological features of the user speech unit sequence are compared with those of the teaching speech unit sequence, and the corresponding evaluation result is produced, for example as sketched below.
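A minimal sketch of this unit-by-unit comparison follows. It assumes that the user and the teacher read the same text, so their unit sequences can be compared position by position; the 30% tolerance and the choice of duration as the compared feature are illustrative assumptions only.

```python
# Unit-by-unit comparison against the teaching sample (illustrative threshold).
from typing import Dict, List

def compare_with_teaching_sample(user_feats: Dict[str, list],
                                 teacher_feats: Dict[str, list],
                                 tol: float = 0.30) -> List[dict]:
    results = []
    for i, (du, dt) in enumerate(zip(user_feats["unit_durations"],
                                     teacher_feats["unit_durations"])):
        rel_diff = abs(du - dt) / max(dt, 1e-6)      # relative deviation from the teacher
        results.append({
            "unit_index": i,
            "duration_rel_diff": round(rel_diff, 3),
            "ok": rel_diff <= tol,                   # within the assumed 30% tolerance
        })
    return results

user = {"unit_durations": [0.08, 0.13, 0.08]}
teacher = {"unit_durations": [0.10, 0.18, 0.09]}
print(compare_with_teaching_sample(user, teacher))
```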
The speech evaluation based on the speech prediction model may use existing speech evaluation techniques: the recorded user speech is divided into basic speech units; the corresponding phonological features to be evaluated are extracted from the speech unit sequence; for each phonological feature the corresponding prediction model is loaded and the corresponding standard pronunciation is predicted; the phonological features of the user speech are then compared with those of the standard pronunciation to obtain the corresponding evaluation result.
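The sketch below illustrates this second path. The per-feature "models" are stand-in callables; in practice they would be trained prediction models as in the prior art cited above, and the feature names are the same illustrative ones used in the earlier sketches.

```python
# Prediction-model path sketch: one stand-in model per phonological feature
# predicts the standard value for the text, and the user's value is scored
# against it (feature names and values are illustrative).
from typing import Callable, Dict, List

def evaluate_with_prediction_models(text_units: List[str],
                                    user_feats: Dict[str, list],
                                    models: Dict[str, Callable]) -> Dict[str, list]:
    report = {}
    for feat_name, model in models.items():
        predicted = model(text_units)                   # predicted standard feature values
        observed = user_feats[feat_name]
        report[feat_name] = [
            {"predicted": p, "observed": o, "abs_diff": round(abs(p - o), 3)}
            for p, o in zip(predicted, observed)
        ]
    return report

# toy usage: a stand-in model that predicts 120 ms per unit
models = {"unit_durations": lambda units: [0.12] * len(units)}
user_feats = {"unit_durations": [0.08, 0.13, 0.08]}
print(evaluate_with_prediction_models(["g", "uh", "d"], user_feats, models))
```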
In step S105, the speech comparison results are annotated on the text of the user's speech and provided to the user.
In this step, the recorded user speech is further converted into a speech text by the speech processing module. The evaluation result obtained in step S104 from the comparison with the teaching sample speech and the evaluation result from the comparison with the standard speech predicted by the speech prediction model are each annotated on the speech text in a visualized manner and displayed to the user. From the displayed evaluation results, the user can see the differences between their pronunciation and the teaching sample pronunciation, as well as the differences between their pronunciation and the standard pronunciation predicted by the speech prediction model, so that the user fully understands what problems exist in the pronunciation of the text they read and can further improve the correctness of their pronunciation. The comparison results may include the pronunciation evaluation of each basic speech unit, the pronunciation-duration evaluation of each basic speech unit, the evaluation of the fluency of the whole text, and so on.
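One possible way of making the two results visible on the transcript is sketched below; the HTML colour scheme is an illustrative choice only, since the patent merely requires that both results be annotated on the speech text in a visualized manner.

```python
# Transcript-annotation sketch: colour each word according to how it fared
# against the teaching sample and against the predicted standard speech.
from typing import List

def annotate_transcript(words: List[str],
                        vs_teacher_ok: List[bool],
                        vs_model_ok: List[bool]) -> str:
    spans = []
    for w, t_ok, m_ok in zip(words, vs_teacher_ok, vs_model_ok):
        if t_ok and m_ok:
            color = "green"    # acceptable against both references
        elif t_ok or m_ok:
            color = "orange"   # acceptable against only one reference
        else:
            color = "red"      # flagged by both comparisons
        spans.append(f'<span style="color:{color}">{w}</span>')
    return " ".join(spans)

print(annotate_transcript(["good", "morning"], [True, False], [True, True]))
```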
Fig. 2 shows a voice evaluation device according to an embodiment of the present invention. The voice evaluation device is used to implement the voice evaluation method of the present invention: after the user completes the spoken read-along, it simultaneously provides the user with the evaluation result against the teaching sample speech and the evaluation result against the standard speech predicted by the speech prediction model. The voice evaluation device includes a recording module 1, a storage module 2, a speech processing module 3, a feature extraction module 4, a speech analysis module 5, a labeling module 6 and a display module 7; their cooperation across steps S101-S105 is sketched below.
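The following class skeleton is a hypothetical sketch of how the seven modules could be wired together in the order of steps S101-S105; the module interfaces are assumptions for illustration, and the bodies of the individual modules are omitted.

```python
# Hypothetical wiring of the seven modules (interfaces assumed, bodies omitted).
class VoiceEvaluationDevice:
    def __init__(self, recorder, storage, speech_processor,
                 feature_extractor, analyzer, labeler, display):
        self.recorder = recorder                    # recording module 1
        self.storage = storage                      # storage module 2
        self.speech_processor = speech_processor    # speech processing module 3
        self.feature_extractor = feature_extractor  # feature extraction module 4
        self.analyzer = analyzer                    # speech analysis module 5
        self.labeler = labeler                      # labeling module 6
        self.display = display                      # display module 7

    def run(self, teaching_sample):
        audio = self.recorder.record()                            # step S101
        self.storage.save(audio)
        units = self.speech_processor.split_into_units(audio)     # step S102
        feats = self.feature_extractor.extract(units)             # step S103
        results = self.analyzer.compare(feats, teaching_sample)   # step S104 (sample + prediction model)
        text = self.speech_processor.to_text(audio)
        annotated = self.labeler.annotate(text, results)          # step S105
        self.display.show(annotated)
        return annotated
```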
While the user is in the spoken read-along stage of language learning, the recording module 1 of the voice evaluation device records the user's voice input.
Specifically, after studying the sample speech in the teaching courseware, the user enters the read-along stage and triggers the recording module 1 in the voice evaluation device, putting it into the recording state. When the user starts reading along with the sample speech, the recording module 1 starts recording the user's speech, and the user's read-along speech is stored in the storage module 2 of the voice evaluation device for further analysis.
The speech processing module 3 obtains the user's read-along speech recorded in the storage module 2 and divides the recorded speech into basic speech units.
The basic speech unit may be a syllable, a phoneme or the like; by dividing the recorded speech, the basic speech units and the speech unit sequence of the recorded speech are obtained.
After the speech processing module 3 has divided the recorded speech into basic speech units, the feature extraction module 4 further performs feature extraction on the generated speech unit sequence to obtain the phonological features of the speech unit sequence.
The phonological features include prosodic features and syllable features. The prosodic features include the boundary features and pronunciation duration of each basic speech unit, the pause time between adjacent basic speech units, and the pronunciation duration of the whole speech unit sequence; the syllable features include the pronunciation of each basic speech unit and the pronunciation of the whole speech unit sequence.
The speech analysis module 5 compares and analyzes the extracted phonological features against the teaching sample speech and against the standard speech predicted by the speech prediction model, respectively.
The process of comparison with the teaching sample speech is as follows: the speech analysis module 5 obtains the teaching sample speech saved in the storage module 2 and divides it into basic speech units, yielding the basic speech units and the speech unit sequence of the teaching sample speech; it then extracts the phonological features of the teaching speech unit sequence, which correspond to the phonological features of the user speech unit sequence. The phonological features of the user speech unit sequence are compared with those of the teaching speech unit sequence, and the corresponding evaluation result is produced.
The speech evaluation based on the speech prediction model may use existing speech evaluation techniques: the recorded user speech is divided into basic speech units; the corresponding phonological features to be evaluated are extracted from the speech unit sequence; for each phonological feature the corresponding prediction model is loaded and the corresponding standard pronunciation is predicted; the phonological features of the user speech are then compared with those of the standard pronunciation to obtain the corresponding evaluation result.
The labeling module 6 annotates the speech comparison results on the user's speech text, which the display module 7 then presents to the user.
Specifically, the recorded user speech is further converted into a speech text by the speech processing module 3. The evaluation result obtained by the speech analysis module 5 from the comparison with the teaching sample speech and the evaluation result from the comparison with the standard speech predicted by the speech prediction model are each annotated on the speech text in a visualized manner and displayed to the user through the display module 7. From the displayed evaluation results, the user can see the differences between their pronunciation and the teaching sample pronunciation, as well as the differences between their pronunciation and the standard pronunciation predicted by the speech prediction model, so that the user fully understands what problems exist in the pronunciation of the text they read and can further improve the correctness of their pronunciation. The comparison results may include the pronunciation evaluation of each basic speech unit, the pronunciation-duration evaluation of each basic speech unit, the evaluation of the fluency of the whole text, and so on.
Those of ordinary skill in the art will understand that all or part of the steps in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware; the program can be stored in a computer-readable storage medium and executed by a processor. The computer-readable storage medium may include read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs and the like.
The preferred embodiments of the present invention are described above with the intention of making the spirit of the invention clearer and easier to understand, not of limiting the invention. Any modification, replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection defined by the appended claims of the present invention.

Claims (15)

1. A voice evaluation method for evaluating a user's pronunciation during language learning, characterized by:
step S101: capturing the user's voice input through the recording device of a voice evaluation device;
step S102: dividing the recorded speech into basic speech units to obtain the speech unit sequence of the recorded speech;
step S103: performing feature extraction on the speech unit sequence to obtain the phonological features of the speech unit sequence;
step S104: comparing and analyzing the extracted phonological features against the teaching sample speech and against the standard speech predicted by a speech prediction model, respectively;
step S105: annotating the speech comparison results on the text of the user's speech.
2. The voice evaluation method according to claim 1, characterized in that:
the basic speech unit may be a syllable, a phoneme or the like, and the basic speech units and the speech unit sequence of the recorded speech are obtained by dividing the recorded speech.
3. The voice evaluation method according to claim 1, characterized in that:
the phonological features include prosodic features and syllable features, the prosodic features including the boundary features and pronunciation duration of each basic speech unit, the pause time between adjacent basic speech units, and the pronunciation duration of the whole speech unit sequence;
the syllable features include the pronunciation of each basic speech unit and the pronunciation of the whole speech unit sequence.
4. The voice evaluation method according to claim 1, characterized in that:
the process of comparing and analyzing against the teaching sample speech includes:
obtaining the teaching sample speech saved in the system;
dividing the teaching sample speech into basic speech units to obtain the basic speech units and the speech unit sequence of the teaching sample speech;
extracting the phonological features of the teaching speech unit sequence, the phonological features of the teaching speech unit sequence corresponding to the phonological features of the user speech unit sequence;
comparing the phonological features of the user speech unit sequence with the phonological features of the teaching speech unit sequence and producing the corresponding evaluation result.
5. The voice evaluation method according to claim 1, characterized in that:
the process of performing speech evaluation with the speech prediction model includes:
dividing the recorded user speech into basic speech units and extracting the corresponding phonological features to be evaluated from the speech unit sequence;
loading the corresponding prediction model for each phonological feature and predicting the corresponding standard pronunciation;
comparing the phonological features of the user speech with the phonological features of the standard pronunciation and obtaining the corresponding evaluation result.
6. The voice evaluation method according to claim 1, characterized in that:
the process of annotating the speech comparison results specifically includes:
converting the recorded user speech into a speech text;
annotating the evaluation result obtained by comparison with the teaching sample speech and the evaluation result obtained by comparison with the standard speech predicted by the speech prediction model on the speech text, each in a visualized manner, and displaying them to the user.
7. A voice evaluation device comprising a recording module, a storage module, a speech processing module, a feature extraction module, a speech analysis module and a labeling module, characterized in that:
the recording module is configured to capture the user's voice input;
the speech processing module is configured to divide the recorded speech into basic speech units to obtain the speech unit sequence of the recorded speech;
the feature extraction module is configured to perform feature extraction on the speech unit sequence to obtain the phonological features of the speech unit sequence;
the speech analysis module is configured to compare and analyze the extracted phonological features against the teaching sample speech and against the standard speech predicted by a speech prediction model, respectively;
the labeling module is configured to annotate the speech evaluation results on the text of the user's speech.
8. The voice evaluation device according to claim 7, characterized in that:
the basic speech unit may be a syllable, a phoneme or the like, and the basic speech units and the speech unit sequence of the recorded speech are obtained by dividing the recorded speech.
9. The voice evaluation device according to claim 7, characterized in that:
the phonological features include prosodic features and syllable features, the prosodic features including the boundary features and pronunciation duration of each basic speech unit, the pause time between adjacent basic speech units, and the pronunciation duration of the whole speech unit sequence, and the syllable features including the pronunciation of each basic speech unit and the pronunciation of the whole speech unit sequence.
10. The voice evaluation device according to claim 7, characterized in that:
the process of comparing and analyzing against the teaching sample speech includes:
obtaining the teaching sample speech saved in the system;
dividing the teaching sample speech into basic speech units to obtain the basic speech units and the speech unit sequence of the teaching sample speech;
extracting the phonological features of the teaching speech unit sequence, the phonological features of the teaching speech unit sequence corresponding to the phonological features of the user speech unit sequence;
comparing the phonological features of the user speech unit sequence with the phonological features of the teaching speech unit sequence and producing the corresponding evaluation result.
11. The voice evaluation device according to claim 7, characterized in that:
the process of performing speech evaluation with the speech prediction model includes:
dividing the recorded user speech into basic speech units and extracting the corresponding phonological features to be evaluated from the speech unit sequence;
loading the corresponding prediction model for each phonological feature and predicting the corresponding standard pronunciation;
comparing the phonological features of the user speech with the phonological features of the standard pronunciation and obtaining the corresponding evaluation result.
12. The voice evaluation device according to claim 7, characterized in that:
the process of annotating the speech comparison results specifically includes:
converting the recorded user speech into a speech text;
annotating the evaluation result obtained by comparison with the teaching sample speech and the evaluation result obtained by comparison with the standard speech predicted by the speech prediction model on the speech text, each in a visualized manner, and displaying them to the user.
13. The voice evaluation device according to claim 7, characterized in that:
the voice evaluation device further comprises a display module configured to display the user speech text annotated with the speech evaluation results to the user.
14. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method steps of any one of claims 1 to 6.
15. A computer storage medium storing a program executable by a computer, characterized in that, when the program is executed, the method steps of any one of claims 1 to 6 are implemented.
CN201710996819.1A 2017-10-20 2017-10-20 Voice evaluation method and device Active CN109697988B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710996819.1A CN109697988B (en) 2017-10-20 2017-10-20 Voice evaluation method and device
PCT/CN2017/111822 WO2019075828A1 (en) 2017-10-20 2017-11-20 Voice evaluation method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710996819.1A CN109697988B (en) 2017-10-20 2017-10-20 Voice evaluation method and device

Publications (2)

Publication Number Publication Date
CN109697988A 2019-04-30
CN109697988B CN109697988B (en) 2021-05-14

Family

ID=66172985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710996819.1A Active CN109697988B (en) 2017-10-20 2017-10-20 Voice evaluation method and device

Country Status (2)

Country Link
CN (1) CN109697988B (en)
WO (1) WO2019075828A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110534100A (en) * 2019-08-27 2019-12-03 北京海天瑞声科技股份有限公司 A kind of Chinese speech proofreading method and device based on speech recognition
CN110910687A (en) * 2019-12-04 2020-03-24 深圳追一科技有限公司 Teaching method and device based on voice information, electronic equipment and storage medium
CN111081080A (en) * 2019-05-29 2020-04-28 广东小天才科技有限公司 Voice detection method and learning device
CN112767932A (en) * 2020-12-11 2021-05-07 北京百家科技集团有限公司 Voice evaluation system, method, device, equipment and computer readable storage medium
CN113053409A (en) * 2021-03-12 2021-06-29 科大讯飞股份有限公司 Audio evaluation method and device
CN113192494A (en) * 2021-04-15 2021-07-30 辽宁石油化工大学 Intelligent English language identification and output system and method

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060057545A1 (en) * 2004-09-14 2006-03-16 Sensory, Incorporated Pronunciation training method and apparatus
CN101246685A (en) * 2008-03-17 2008-08-20 清华大学 Pronunciation quality evaluation method of computer auxiliary language learning system
CN101739870A (en) * 2009-12-03 2010-06-16 深圳先进技术研究院 Interactive language learning system and method
CN103514765A (en) * 2013-10-28 2014-01-15 苏州市思玛特电力科技有限公司 Language teaching assessment method
CN103559894A (en) * 2013-11-08 2014-02-05 安徽科大讯飞信息科技股份有限公司 Method and system for evaluating spoken language
CN203773766U (en) * 2014-04-10 2014-08-13 滕坊坪 Language learning machine
CN105825852A (en) * 2016-05-23 2016-08-03 渤海大学 Oral English reading test scoring method
CN106971647A (en) * 2017-02-07 2017-07-21 广东小天才科技有限公司 Spoken language training method and system combining body language
CN107067834A (en) * 2017-03-17 2017-08-18 麦片科技(深圳)有限公司 Point-of-reading system with oral evaluation function

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7219059B2 (en) * 2002-07-03 2007-05-15 Lucent Technologies Inc. Automatic pronunciation scoring for language learning
CN100514446C (en) * 2004-09-16 2009-07-15 北京中科信利技术有限公司 Pronunciation evaluating method based on voice identification and voice analysis
US20150287339A1 (en) * 2014-04-04 2015-10-08 Xerox Corporation Methods and systems for imparting training
CN103928023B (en) * 2014-04-29 2017-04-05 广东外语外贸大学 A kind of speech assessment method and system
CN104732977B (en) * 2015-03-09 2018-05-11 广东外语外贸大学 A kind of online spoken language pronunciation quality evaluating method and system

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060057545A1 (en) * 2004-09-14 2006-03-16 Sensory, Incorporated Pronunciation training method and apparatus
CN101246685A (en) * 2008-03-17 2008-08-20 清华大学 Pronunciation quality evaluation method of computer auxiliary language learning system
CN101739870A (en) * 2009-12-03 2010-06-16 深圳先进技术研究院 Interactive language learning system and method
CN103514765A (en) * 2013-10-28 2014-01-15 苏州市思玛特电力科技有限公司 Language teaching assessment method
CN103559894A (en) * 2013-11-08 2014-02-05 安徽科大讯飞信息科技股份有限公司 Method and system for evaluating spoken language
CN203773766U (en) * 2014-04-10 2014-08-13 滕坊坪 Language learning machine
CN105825852A (en) * 2016-05-23 2016-08-03 渤海大学 Oral English reading test scoring method
CN106971647A (en) * 2017-02-07 2017-07-21 广东小天才科技有限公司 Spoken language training method and system combining body language
CN107067834A (en) * 2017-03-17 2017-08-18 麦片科技(深圳)有限公司 Point-of-reading system with oral evaluation function

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李超雷 (Li Chaolei): Doctoral Dissertation, Graduate University of the Chinese Academy of Sciences, 30 October 2013 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111081080A (en) * 2019-05-29 2020-04-28 广东小天才科技有限公司 Voice detection method and learning device
CN110534100A (en) * 2019-08-27 2019-12-03 北京海天瑞声科技股份有限公司 A kind of Chinese speech proofreading method and device based on speech recognition
CN110910687A (en) * 2019-12-04 2020-03-24 深圳追一科技有限公司 Teaching method and device based on voice information, electronic equipment and storage medium
CN112767932A (en) * 2020-12-11 2021-05-07 北京百家科技集团有限公司 Voice evaluation system, method, device, equipment and computer readable storage medium
CN113053409A (en) * 2021-03-12 2021-06-29 科大讯飞股份有限公司 Audio evaluation method and device
CN113053409B (en) * 2021-03-12 2024-04-12 科大讯飞股份有限公司 Audio evaluation method and device
CN113192494A (en) * 2021-04-15 2021-07-30 辽宁石油化工大学 Intelligent English language identification and output system and method

Also Published As

Publication number Publication date
WO2019075828A1 (en) 2019-04-25
CN109697988B (en) 2021-05-14

Similar Documents

Publication Publication Date Title
O’Brien et al. Directions for the future of technology in pronunciation research and teaching
Agarwal et al. A review of tools and techniques for computer aided pronunciation training (CAPT) in English
CN109801193B (en) Follow-up teaching system with voice evaluation function
CN109697988A (en) A kind of Speech Assessment Methods and device
US6397185B1 (en) Language independent suprasegmental pronunciation tutoring system and methods
CN102360543B (en) HMM-based bilingual (mandarin-english) TTS techniques
Weinberger et al. The Speech Accent Archive: towards a typology of English accents
US9449522B2 (en) Systems and methods for evaluating difficulty of spoken text
JP4391109B2 (en) Automatic Pronunciation Symbol Labeling Method and Automatic Pronunciation Symbol Labeling System for Pronunciation Correction
CN109858038A (en) A kind of text punctuate determines method and device
Cucchiarini et al. Second language learners' spoken discourse: Practice and corrective feedback through automatic speech recognition
CN109697975B (en) Voice evaluation method and device
Matusevych et al. Evaluating computational models of infant phonetic learning across languages
CN104700831B (en) The method and apparatus for analyzing the phonetic feature of audio file
CN110647613A (en) Courseware construction method, courseware construction device, courseware construction server and storage medium
Ai Automatic pronunciation error detection and feedback generation for call applications
Larabi-Marie-Sainte et al. A new framework for Arabic recitation using speech recognition and the Jaro Winkler algorithm
Marujo et al. Porting REAP to European Portuguese.
Lounis et al. Mispronunciation detection and diagnosis using deep neural networks: a systematic review
Dielen Improving the Automatic Speech Recognition Model Whisper with Voice Activity Detection
Dong et al. The application of big data to improve pronunciation and intonation evaluation in foreign language learning
Lobanov et al. On a way to the computer aided speech intonation training
Yuwan et al. Automatic extraction phonetically rich and balanced verses for speaker-dependent quranic speech recognition system
Zhang et al. Cognitive state classification in a spoken tutorial dialogue system
Nakamura et al. Objective evaluation of English learners' timing control based on a measure reflecting perceptual characteristics

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 518000 Jianda Industrial Park, Xin'an Street, Baoan District, Shenzhen City, Guangdong Province, 202B, 2nd floor, 1 building

Applicant after: Shenzhen Yingshuo Education Service Co.,Ltd.

Address before: 518100 Guangdong city of Shenzhen province Baoan District Xin'an three industrial zone 1 road Cantor Fitzgerald building two floor 202B

Applicant before: SHENZHEN YINGSHUO AUDIO TECHNOLOGY Co.,Ltd.

CB02 Change of applicant information
CB02 Change of applicant information

Address after: 518000 Jianda Industrial Park, Xin'an Street, Baoan District, Shenzhen City, Guangdong Province, 202B, 2nd floor, 1 building

Applicant after: Shenzhen YINGSHUO Education Service Co.,Ltd.

Address before: 518000 Jianda Industrial Park, Xin'an Street, Baoan District, Shenzhen City, Guangdong Province, 202B, 2nd floor, 1 building

Applicant before: Shenzhen Yingshuo Education Service Co.,Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 301, building D, Hongwei Industrial Zone, No.6 Liuxian 3rd road, Xingdong community, Xin'an street, Bao'an District, Shenzhen City, Guangdong Province

Patentee after: Shenzhen Yingshuo Intelligent Technology Co.,Ltd.

Address before: 518000 202b, 2nd floor, building 1, Jianda Industrial Park, Xin'an street, Bao'an District, Shenzhen City, Guangdong Province

Patentee before: Shenzhen YINGSHUO Education Service Co.,Ltd.

CP03 Change of name, title or address