CN109697975A - Speech evaluation method and device - Google Patents

Speech evaluation method and device

Info

Publication number
CN109697975A
CN109697975A (application CN201710981866.9A)
Authority
CN
China
Prior art keywords
speech
musical note
voice
user
note feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710981866.9A
Other languages
Chinese (zh)
Other versions
CN109697975B (en)
Inventor
宾晓皎
李明
蔡泽鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yingshuo Intelligent Technology Co.,Ltd.
Original Assignee
Shenzhen Yingshuo Audio Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Yingshuo Audio Technology Co Ltd
Priority to CN201710981866.9A
Priority to PCT/CN2017/111818 (published as WO2019075827A1)
Publication of CN109697975A
Application granted
Publication of CN109697975B
Legal status: Active


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/005 Language recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/08 Speech classification or search
    • G10L15/26 Speech to text systems
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/60 Speech or voice analysis techniques for comparison or discrimination, for measuring the quality of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention provides a speech evaluation method and device, comprising: step S101, speech input acquisition: during the spoken-practice stage of language learning, the user's speech input is captured through a recording device; step S102, speech unit division: the recorded speech is divided into basic speech units, forming a speech unit sequence; step S103, phonological feature acquisition: the speech unit sequence is analyzed to obtain its phonological features; step S104, determination of the content to be evaluated: feature calculation is performed on the extracted phonological features, and if the result satisfies a predetermined condition, the qualifying speech units are taken as the content to be evaluated; step S105, speech comparison analysis: the phonological features of the content to be evaluated are obtained and compared against the standard pronunciation predicted by a speech prediction model; step S106, comparison result generation: the speech comparison result is annotated on the text of the user's speech and provided to the user.

Description

Speech evaluation method and device
Technical field
The present invention relates to the field of multimedia teaching technology, and in particular to a speech evaluation method and device for spoken-language learning in multimedia teaching.
Background technique
As a medium of communication, language plays a very important role in daily life and work; whether during school study or in later working life, language learning is something people attach great importance to. With the continuous spread of online education, online teaching has become popular among users because it is not constrained by a fixed time and place of instruction, and many users are now willing to spend their spare time learning languages over the network. When a user studies a new word or phrase and practices speaking it, besides practicing the pronunciation of the word or phrase on its own, the user may also practice speaking sentences that contain that word or phrase.
To address the above problems, it has been proposed to evaluate a student's speech against a speech prediction model. CN101197084A discloses an automatic spoken-English evaluation and learning system, characterized in that the system includes a spoken-pronunciation detection part comprising the following steps: (1) establishment of a standard-speaker corpus: 1) find standard English speakers; 2) design a first recording text according to the requirements of spoken-English learning and the principle of phoneme balance; 3) have the standard speakers record the recording text; (2) collection of an oral-evaluation corpus: in a simulated English-learning software environment, design a second recording text according to the learning requirements, find ordinary speakers, and record their spoken pronunciation; (3) annotation of the oral-evaluation corpus: experts annotate in detail whether the pronunciation of each phoneme in each word is correct; (4) establishment of a standard-pronunciation acoustic model: based on the recordings in the standard-speaker corpus and their associated texts, train an acoustic model of standard pronunciation; (5) calculation of the error-detection parameters of the speech: 1) extract the MFCC cepstral parameters of the speech; 2) based on the standard acoustic model and on the phoneme sequences corresponding to the ordinary speakers' recordings and texts in the evaluation corpus, automatically segment the ordinary speakers' speech data into phoneme-sized segments, and compute a first likelihood value of each segment as its phoneme from the standard model; 3) recognize each segment of the ordinary speaker's speech with the standard acoustic model, and simultaneously compute from the standard acoustic model a second likelihood value of the segment as the recognized phoneme; 4) divide the first likelihood value by the second likelihood value to obtain the likelihood ratio of the segment, which serves as the error-detection parameter of that speech segment; (6) establishment of an error-detection mapping model from the error-detection parameters to the expert-annotated pronunciation errors: over a batch of evaluation speech, associate each segment's evaluation parameters and formant sequence with the expert's detailed annotations, derive the correspondence between the above parameters and the annotations statistically, and save these relationships as the error-detection mapping model from error-detection parameters to expert pronunciation-error labels.
CN101650886A discloses a method for automatically detecting the reading errors of language learners, characterized by comprising the following steps: 1) front-end processing: pre-process the input speech and perform feature extraction; the extracted features are MFCC feature vectors; 2) construction of a simplified search space: take the content the user is to read aloud as the reference answer and construct a simplified search space from the reference answer, a pronunciation dictionary, multi-pronunciation models, and an acoustic model; 3) construction of a reading language model: according to the reference answer, build the user's reading language model, which describes the context the user may read aloud when reading the reference sentence, together with its probability information; 4) search: in the search space, search according to the acoustic model, the reading language model, and the multi-pronunciation models for the path that best matches the input feature vector stream, take it as the content the user actually read aloud, and form the recognition-result sequence; 5) alignment: align the reference answer with the recognition result to obtain the detection results of the user's insertions, skips, and mispronunciations.
In the prior art, when a user does pronunciation practice, recording is the most common approach: the user reads aloud and then plays the recording back, judging on his own whether the pronunciation is accurate; alternatively, a teacher gives guidance and suggestions on the user's pronunciation in an online lesson. Such approaches only let the user perceive his own pronunciation subjectively and cannot provide an effective, accurate evaluation result. In recent years, speech evaluation methods for online teaching have evaluated the user's pronunciation by comparing features against a standard pronunciation. For example: pre-process the input speech and extract features; take the content the user is to read aloud as the reference answer, and construct a simplified search space from the reference answer, a pronunciation dictionary, multi-pronunciation models, and an acoustic model; in the search space, search according to the acoustic model, the reading language model, and the multi-pronunciation models for the path that best matches the input feature vector stream, take it as the content the user actually read aloud, and form the recognition-result sequence; and align the reference answer with the recognition result to obtain the detection results of the user's insertions, skips, and mispronunciations.
Although the above evaluation methods can provide a pronunciation evaluation of the user's speech, the result given is usually an analysis of everything the user read aloud, whereas what the user sometimes cares about more is whether the pronunciation of the newly learned word or phrase within the whole sentence or paragraph is accurate and fluent; the pronunciation of the other parts is not the focus of attention.
Therefore, there is a need for a speech evaluation method that, when the user reads aloud a whole sentence or passage, analyzes only the part the user cares about and provides a corresponding evaluation result, thereby reducing the amount of data the system has to analyze and saving system resources while better serving the user's focus.
Summary of the invention
Accordingly, the technical problem to be solved by the present invention is how, during spoken-language practice (for example, spoken English practice), to provide the user with a speech evaluation result for the content the user cares about.
According to a first aspect of the invention, a speech evaluation method is provided for evaluating the speech of the content a user cares about, comprising the following steps:
Step S101, speech input acquisition: during the spoken-practice stage of language learning, the user's speech input is obtained through the recording equipment of an electronic device;
Step S102, speech unit division: the recorded speech is divided into basic speech units, forming a speech unit sequence;
Step S103, phonological feature acquisition: the speech unit sequence is analyzed to obtain its phonological features;
Step S104, determination of the content to be evaluated: feature calculation is performed on the extracted phonological features; if the result satisfies a predetermined condition, the speech units that satisfy the condition are taken as the content to be evaluated;
Step S105, speech comparison analysis: the phonological features of the content to be evaluated are obtained and compared against the standard pronunciation predicted by a speech prediction model;
Step S106, comparison result generation: the speech comparison result is annotated on the text of the user's speech and provided to the user.
The basic speech units may be syllables, phonemes, and the like; by dividing the speech, the basic speech units of the recorded speech are obtained and a speech unit sequence is formed.
The phonological features of the speech unit sequence include prosodic features and syllable features.
The prosodic features include the boundary features of each basic speech unit, the pronunciation duration, the pause time between adjacent basic speech units, and the pronunciation duration of the entire speech unit sequence;
The syllable features include the pronunciation of each basic speech unit.
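The prosodic side of these features can be derived from the unit time boundaries alone. The sketch below is a minimal, hypothetical illustration (the unit triples, field names, and time units are assumptions, not part of the patent): it computes per-unit durations, inter-unit pauses, and the total duration from a sequence of (label, start, end) entries.

```python
# Minimal sketch of prosodic-feature extraction from a speech unit sequence.
# Each unit is (label, start_time_s, end_time_s); this structure is assumed
# for illustration and is not prescribed by the patent.

def prosodic_features(units):
    durations = [(label, round(end - start, 3)) for label, start, end in units]
    pauses = [round(units[i + 1][1] - units[i][2], 3)
              for i in range(len(units) - 1)]
    total = round(units[-1][2] - units[0][1], 3)
    return {"durations": durations, "pauses": pauses, "total_duration": total}

# Invented example: three units of "platform 1" with their time boundaries.
units = [("plat", 0.50, 0.78), ("form", 0.82, 1.10), ("one", 1.30, 1.55)]
feats = prosodic_features(units)
print(feats["pauses"])          # pause time between adjacent units
print(feats["total_duration"])  # duration of the whole sequence
```

In a real system the boundaries would come from the unit division of step S102; here they are hand-written so the arithmetic is easy to follow.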
In step S104, the calculation over the phonological features of the speech unit sequence may use the optimal-scoring-path method, which operates as follows:
The extracted phonological features of the speech unit sequence are computed with a trained acoustic model to obtain the optimal scoring path;
If the optimal scoring path contains the content to be evaluated, it is determined that the content to be evaluated has been detected.
The optimal scoring path is computed according to:

W* = argmax_W P(X|W)P(W)

where X represents the phonological feature vector of the speech unit sequence, and W* represents the word sequence with the maximum score;
the conditional probability P(X|W) is the acoustic model score, calculated by the trained acoustic model;
the prior probability P(W) is the language model score, which acts as a penalty applied to the different acoustic hypotheses.
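The argmax above can be illustrated with toy numbers. In the hedged sketch below, the candidate word sequences, their acoustic scores P(X|W), and their language-model priors P(W) are all invented; the point is only the combination rule, done in the log domain as decoders commonly do.

```python
import math

# Toy illustration of W* = argmax_W P(X|W) * P(W), computed in log space.
# All candidate sequences and scores are invented for illustration.
acoustic = {"at platform one": 0.20,   # P(X|W) per hypothesis
            "a plat form one": 0.05,
            "at platforms won": 0.10}
language = {"at platform one": 0.60,   # P(W) per hypothesis
            "a plat form one": 0.15,
            "at platforms won": 0.25}

def best_path(acoustic, language):
    # Maximize log P(X|W) + log P(W), equivalent to maximizing the product.
    return max(acoustic, key=lambda w: math.log(acoustic[w]) + math.log(language[w]))

w_star = best_path(acoustic, language)
print(w_star)
```

Because the winning sequence contains the target word, step S104 would then treat that word as detected content to be evaluated.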
The phonological features of the content to be evaluated may further include the phonological features of the context of the content to be evaluated.
Step S105 further comprises:
dividing the recorded user speech into basic speech units;
extracting the corresponding phonological features to be evaluated from the speech unit sequence;
loading the corresponding prediction model for each kind of phonological feature and predicting the corresponding standard pronunciation;
comparing the phonological features of the user's speech with those of the standard pronunciation to obtain the corresponding evaluation result.
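The comparison sub-step above can be sketched end to end. Everything below, including the feature dictionaries, the tolerance, and the per-feature "predicted standard" values, is a hypothetical stand-in for the trained prediction models the patent assumes; it is meant only to show the shape of the comparison.

```python
# Hedged sketch of the comparison in step S105: user phonological features
# versus the predicted standard ones. Real prediction models are replaced
# here by a plain lookup of pre-computed standard values.
def evaluate(user_feats, standard_feats, tolerance=0.05):
    results = {}
    for name, user_value in user_feats.items():
        deviation = abs(user_value - standard_feats[name])
        results[name] = "ok" if deviation <= tolerance else "check pronunciation"
    return results

# Invented feature values (seconds) for the word under evaluation.
user_feats = {"duration": 0.61, "pause_before": 0.04}
standard_feats = {"duration": 0.55, "pause_before": 0.02}
print(evaluate(user_feats, standard_feats))
```

A production system would compare richer features (boundary features, syllable pronunciation) and produce graded scores rather than a binary verdict; the tolerance of 0.05 s is an arbitrary illustration.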
According to a second aspect of the invention, a speech evaluation device is provided for evaluating the speech of the content a user cares about, comprising a speech input acquisition module, an information storage module, a speech unit division module, a phonological feature acquisition module, a content-to-be-evaluated determination module, a speech comparison analysis module, an evaluation module, and a comparison result generation module, in which:
the speech input acquisition module obtains the user's speech input and stores the recorded user speech in the information storage module;
the speech unit division module divides the recorded speech into basic speech units to obtain the speech unit sequence of the recording;
the phonological feature acquisition module performs feature extraction on the speech unit sequence to obtain its phonological features;
the content-to-be-evaluated determination module performs feature calculation on the extracted phonological features and, if the result satisfies a predetermined condition, takes the qualifying speech units as the content to be evaluated;
the speech comparison analysis module obtains the phonological features of the content to be evaluated and compares them against the standard pronunciation predicted by the speech prediction model;
the comparison result generation module annotates the speech evaluation result on the text of the user's speech and provides it to the user.
The basic speech units may be syllables, phonemes, and the like; by dividing the speech, the basic speech units and the speech unit sequence of the recorded speech are obtained.
The phonological features of the speech unit sequence include prosodic features and syllable features.
The prosodic features include the boundary features of each basic speech unit, the pronunciation duration, the pause time between adjacent basic speech units, and the pronunciation duration of the entire speech unit sequence;
The syllable features include the pronunciation of each basic speech unit.
The content-to-be-evaluated determination module may apply the optimal-scoring-path method to the calculation over the phonological features of the speech unit sequence, comprising:
computing, from the extracted phonological features of the speech unit sequence and a trained acoustic model, the optimal scoring path;
if the optimal scoring path contains the content to be evaluated, determining that the content to be evaluated has been detected.
The optimal scoring path is computed according to:

W* = argmax_W P(X|W)P(W)

where X represents the phonological feature vector of the speech unit sequence, and W* represents the word sequence with the maximum score;
the conditional probability P(X|W) is the acoustic model score, calculated by the trained acoustic model;
the prior probability P(W) is the language model score, which acts as a penalty applied to the different acoustic hypotheses.
The phonological features of the content to be evaluated may further include the phonological features of the context of the content to be evaluated.
For the speech comparison analysis module, the operation of performing speech evaluation with the speech prediction model comprises:
dividing the recorded user speech into basic speech units;
extracting the corresponding phonological features to be evaluated from the speech unit sequence;
loading the corresponding prediction model for each kind of phonological feature and predicting the corresponding standard pronunciation;
comparing the phonological features of the user's speech with those of the standard pronunciation to obtain the corresponding evaluation result.
The speech evaluation device further includes a display module for showing the user the text of the user's speech annotated with the speech evaluation result.
According to a third aspect of the invention, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the steps of the method described above are carried out.
With the speech evaluation method and device of the invention, when the user reads aloud a whole sentence or passage, only the part the user cares about is analyzed and a corresponding evaluation result is provided, thereby reducing the amount of data the system has to analyze and saving system resources while better serving the user's focus.
Detailed description of the invention
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below represent only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from the content of the embodiments and these drawings without creative effort.
Fig. 1 is a flowchart of the speech evaluation method according to the present invention; and
Fig. 2 is a schematic diagram of the speech evaluation device according to the present invention.
Specific embodiment
Before the exemplary embodiments are discussed in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations as a sequential process, many of the operations may be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations may be rearranged. A process may be terminated when its operations are completed, and it may also have additional steps not shown in the figures.
The "speech evaluation device" referred to in this context is a "computer device", meaning an intelligent electronic device that can execute predetermined processes such as numerical calculation and/or logical calculation by running preset programs or instructions. It may include a processor and a memory, with the processor executing program instructions prestored in the memory to carry out the predetermined processes; alternatively, the predetermined processes may be carried out by hardware such as an ASIC, FPGA, or DSP, or by a combination of the two.
The computer device includes user equipment and/or network equipment. The user equipment includes, but is not limited to, computers, smartphones, PDAs, and the like; the network equipment includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud based on cloud computing and composed of a large number of computers or network servers, where cloud computing is a kind of distributed computing: a super virtual computer consisting of a loosely coupled set of computers. The computer device may operate alone to realize the present invention, or it may access a network and realize the present invention through interaction with other computer devices in the network. The network in which the computer device is located includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, VPN networks, and the like.
Those skilled in the art will understand that the "speech evaluation device" described herein may be only the user equipment, with the user equipment performing the corresponding operations; it may also be composed of the user equipment integrated with a network device or server, with the user equipment cooperating with the network device to perform the corresponding operations.
It should be noted that the user equipment, network equipment, networks, and so on are only examples; other existing or future computer devices or networks, where applicable to the present invention, should also be included within the scope of protection of the present invention and are incorporated herein by reference.
Those skilled in the art will also understand that the present invention can be applied to both mobile and non-mobile terminals; for example, whether the user uses a mobile phone or a PC, the method or device of the present invention can be used to provide and present results.
The specific structural and functional details disclosed herein are merely representative and serve the purpose of describing exemplary embodiments of the present invention. The present invention may, however, be embodied in many alternative forms and should not be construed as limited to the embodiments set forth herein.
It should further be noted that in some alternative implementations, the mentioned functions or actions may occur in an order different from that indicated in the figures. For example, depending on the functions or actions involved, two figures shown in succession may in fact be executed substantially simultaneously, or sometimes in the reverse order.
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 shows the flowchart of the speech evaluation method of the invention. The method is used to perform speech evaluation on the speech of the content a user cares about.
First, in step S101, speech input is acquired: during the spoken-practice stage of language learning, the user's speech input is obtained through the recording equipment of an electronic device.
For example, after learning new teaching content, such as a new word or phrase, the user may enter a pronunciation practice stage. In this stage, besides letting the user read the newly learned content aloud on its own, the usual teaching process also lets the user experience the use of the content in an authentic context, i.e., it provides a sentence or short passage containing the newly learned content for the user to read aloud. For example, suppose the user's newly learned content is the word "platform"; after the user has learned the word's meaning and pronunciation, the teaching software further provides an example sentence containing the word, "The train now standing at platform 1 is for Leeds", for reading practice. Here, the user is mainly concerned with whether his pronunciation of the newly learned word within the example sentence is standard and fluent, so as to gauge his mastery of the content, and pays little attention to whether the pronunciation of the other words in the sentence is accurate. Therefore, in order to provide the user with an evaluation of the pronunciation of "platform" in the example sentence, the recording device of the teaching equipment is started and put into recording state before the user begins to read, and the user's speech is recorded and saved while the user reads aloud.
In step S102, speech unit division is performed: the recorded speech is divided into basic speech units, forming a basic speech unit sequence.
The basic speech units may be syllables, phonemes, and the like; by dividing the speech, the basic speech units and the speech unit sequence of the recorded speech are obtained.
Different speech recognition systems may rely on different acoustic features, such as acoustic models based on MFCC (Mel-Frequency Cepstral Coefficients) features or on PLP (Perceptual Linear Predictive) features; on different acoustic models, such as HMM-GMM (Hidden Markov Model - Gaussian Mixture Model) or neural-network acoustic models based on DBN (Dynamic Bayesian Network); or on different decoding methods, such as Viterbi search or A* search, to decode the speech signal.
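None of the concrete feature pipelines above are prescribed by the patent, but the front end they share, slicing the waveform into overlapping frames and computing a per-frame value, can be sketched in a few lines. The frame length, hop size, and the use of log energy (rather than full MFCC or PLP features) below are simplifications chosen purely for illustration.

```python
import math

def frame_log_energy(samples, frame_len=4, hop=2):
    """Split a waveform into overlapping frames and compute log energy per frame.

    A stand-in for a real acoustic front end (MFCC, PLP); in practice frame_len
    and hop would correspond to roughly 25 ms and 10 ms of audio samples.
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame)
        energies.append(math.log(energy + 1e-10))  # small floor avoids log(0)
    return energies

# Invented toy waveform of 8 samples.
samples = [0.0, 0.1, -0.2, 0.4, -0.1, 0.3, 0.0, -0.2]
energies = frame_log_energy(samples)
print(len(energies))  # number of frames produced
```

The per-frame values produced here would feed whichever acoustic model and decoder a concrete system chooses.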
In step S103, the phonological features are acquired: the speech unit sequence is analyzed to obtain its phonological features.
The phonological features include prosodic features and syllable features. The prosodic features include the boundary features of each basic speech unit, the pronunciation duration, the pause time between adjacent basic speech units, and the pronunciation duration of the entire speech unit sequence. The syllable features include the pronunciation of each basic speech unit.
In step S104, the content to be evaluated is determined: feature calculation is performed on the extracted phonological features, and if the result satisfies a predetermined condition, the qualifying speech units are taken as the content to be evaluated.
The calculation over the phonological features may use the optimal-scoring-path method: the extracted phonological features are computed with a trained acoustic model to obtain the optimal scoring path, and if the optimal scoring path contains the content to be evaluated, it is determined that the content to be evaluated has been detected. The optimal scoring path is computed according to:

W* = argmax_W P(X|W)P(W)

where X represents the phonological feature vector of the speech unit sequence, and W* represents the word sequence with the maximum score; the conditional probability P(X|W) is the acoustic model score, calculated by the trained acoustic model; and the prior probability P(W) is the language model score, which acts as a penalty applied to the different acoustic hypotheses.
For example, in the optimal-scoring-path computation over the example sentence "The train now standing at platform 1 is for Leeds" read aloud by the user, the computed score for "platform" is the highest, so it is determined to belong to the optimal word sequence, and "platform" is therefore taken as the content to be evaluated.
In step S105, speech comparison analysis is performed: the phonological features of the content to be evaluated are obtained and compared against the standard pronunciation predicted by the speech prediction model.
In this step, the phonological features of the content to be evaluated are obtained, for example the phonological features of "platform". These features are compared against the standard pronunciation predicted by the speech prediction model, and the user is given an evaluation result for the content to be evaluated.
In order to further assess the fluency with which the user reads the content to be evaluated, the phonological features may also include the phonological features of the context of the content to be evaluated. For example, when evaluating the pronunciation of "platform", the phonological features include not only those of the word "platform" itself but also those of its context, i.e., of "at" and "1"; through comparative analysis of factors such as pronunciation duration and pause time, an evaluation result on reading fluency is given.
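A minimal way to turn the context features just described into a fluency verdict is to compare the pauses around the target word against those of a standard reading. The sketch below is an assumption-laden illustration: the pause values, the 2x threshold, and the verdict strings are all invented, not taken from the patent.

```python
# Hedged sketch: judge reading fluency from the pauses around the target
# word (e.g. "at" -> "platform" -> "1"), relative to a standard reading.
def fluency(user_pauses, standard_pauses, factor=2.0):
    """Flag the reading as hesitant if any pause around the target word
    exceeds `factor` times the corresponding standard pause."""
    for user_p, std_p in zip(user_pauses, standard_pauses):
        if user_p > factor * std_p:
            return "hesitant around the target word"
    return "fluent"

# Pauses (before target, after target) in seconds; values are invented.
print(fluency(user_pauses=(0.45, 0.08), standard_pauses=(0.10, 0.06)))
```

A fuller implementation would also weigh the pronunciation durations of the context units, as the paragraph above suggests; this sketch keeps only the pause comparison.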
The method of performing speech evaluation with the speech prediction model may use existing speech evaluation techniques: divide the recorded user speech into basic speech units, extract the corresponding phonological features to be evaluated from the speech unit sequence, load the corresponding prediction model for each kind of phonological feature to predict the corresponding standard pronunciation, and then compare the phonological features of the user's speech with those of the standard pronunciation to obtain the corresponding evaluation result.
Step S106, comparison result generation: the speech comparison result is annotated on the user's speech text and provided to the user.
In this step, the evaluation result obtained in step S105 from the comparison with the standard pronunciation predicted by the speech prediction model is annotated on the speech text in a visualized manner and displayed to the user. From the displayed evaluation result, the user learns whether the pronunciation of the newly learned content within the whole passage is accurate and fluent.
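A minimal sketch of this annotation step (the bracket markup and verdict strings are an assumed presentation format, not one specified in the patent) simply marks the evaluated word inline in the read text:

```python
# Hypothetical sketch of step S106: annotate the comparison result
# on the user's read text so the evaluated word is visibly marked.

def annotate(text, evaluations):
    """evaluations maps a word to a short verdict string."""
    out = []
    for word in text.split():
        if word in evaluations:
            out.append(f"[{word}: {evaluations[word]}]")
        else:
            out.append(word)
    return " ".join(out)

marked = annotate("The train now standing at platform 1 is for Leeds",
                  {"platform": "accurate, long pause before"})
```

A real implementation would more likely emit color or highlight markup for a display module, but the mapping from per-word evaluation results to annotated text is the same.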
Fig. 2 shows a schematic diagram of a speech evaluation device according to an embodiment of the present invention. The speech evaluation device is used to implement the speech evaluation method of the present invention and comprises an input speech acquisition module 1, an information storage module 2, a speech unit division module 3, a pronunciation feature acquisition module 4, a content-to-be-evaluated determination module 5, a speech comparison analysis module 6, a comparison result generation module 7, a display module 8, and a speech prediction model 9.
During the spoken-language practice stage of language learning, the user's speech input is acquired through the input speech acquisition module 1 of the speech evaluation device, and the recorded speech is stored in the information storage module 2.
For example, after learning new course content such as a new word or phrase, the user may enter a pronunciation practice stage. In this stage, besides letting the user practice reading the newly learned content on its own, a typical teaching flow also lets the user experience the use of the content in an authentic context; that is, a sentence or short passage containing the newly learned content is provided for the user to read aloud. For example, if the newly learned content is the word "platform", then after the user has studied its definition and pronunciation, the teaching software further provides an example sentence containing the word, "The train now standing at platform 1 is for Leeds", for the user to practice reading aloud. At this point the user cares mainly about whether his or her pronunciation of the newly learned word in the example sentence is accurate and fluent, so as to gauge mastery of the content, and cares less about whether the pronunciation of the other words in the sentence is accurate. Therefore, in order to evaluate the user's pronunciation of "platform" in the example sentence, before the user starts reading the content, the input speech acquisition module 1 of the teaching device is activated and enters a recording state, and the user's speech is recorded and saved while the user reads aloud.
The speech unit division module 3 divides the recorded user speech into basic speech units.
The basic speech units may be syllables, phonemes, or the like; by dividing the speech, the basic speech units and the speech unit sequence of the recorded speech are obtained.
Different speech recognition systems may be based on different acoustic features, such as acoustic models based on MFCC (Mel-Frequency Cepstral Coefficients) features or on PLP (Perceptual Linear Predictive) features; may use different acoustic models, such as HMM-GMM (Hidden Markov Model-Gaussian Mixture Model) or neural network acoustic models based on DBN (Dynamic Bayesian Network); and may decode the speech signal with different decoding methods, such as Viterbi search or A* search.
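Since Viterbi search is named as one decoding option, a compact reference implementation may help. The toy HMM below (two states, "sil" for silence and "ph" for a phone, with invented probabilities) is purely illustrative; real recognizers decode over far larger state spaces:

```python
# Minimal Viterbi decoding sketch over a toy HMM (states, observations,
# and probabilities are invented for illustration, not from the patent).
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state path for an observation sequence,
    computed in the log domain."""
    V = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []
    for t in range(1, len(obs)):
        scores, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: V[-1][p] + log_trans[p][s])
            ptr[s] = prev
            scores[s] = V[-1][prev] + log_trans[prev][s] + log_emit[s][obs[t]]
        V.append(scores)
        back.append(ptr)
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

lp = math.log
log_start = {"sil": lp(0.6), "ph": lp(0.4)}
log_trans = {"sil": {"sil": lp(0.5), "ph": lp(0.5)},
             "ph": {"sil": lp(0.3), "ph": lp(0.7)}}
log_emit = {"sil": {"low": lp(0.8), "high": lp(0.2)},
            "ph": {"low": lp(0.2), "high": lp(0.8)}}

path = viterbi(["low", "high", "high"], ["sil", "ph"],
               log_start, log_trans, log_emit)
```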
The pronunciation feature acquisition module 4 analyzes the speech unit sequence to obtain the pronunciation features of the speech unit sequence.
The pronunciation features include prosodic features and syllable features. The prosodic features include the boundary features and pronunciation duration of each basic speech unit, the pause time between adjacent basic speech units, and the pronunciation duration of the entire speech unit sequence. The syllable features include the pronunciation of each basic speech unit.
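Given per-unit start/end timestamps (which a forced aligner or unit divider would produce; the timestamps below are invented), the prosodic features named above reduce to simple arithmetic over the boundaries:

```python
# Hypothetical sketch: derive the prosodic features listed above
# (per-unit duration, inter-unit pause, total duration) from
# unit start/end timestamps in seconds.

def prosodic_features(units):
    """units: list of (label, start, end) tuples, in time order."""
    feats = {"durations": {}, "pauses": [], "total_duration": 0.0}
    for label, start, end in units:
        feats["durations"][label] = round(end - start, 3)
    # Pause = gap between one unit's end and the next unit's start.
    for (_, _, end1), (_, start2, _) in zip(units, units[1:]):
        feats["pauses"].append(round(start2 - end1, 3))
    feats["total_duration"] = round(units[-1][2] - units[0][1], 3)
    return feats

seq = [("at", 0.90, 1.02), ("platform", 1.07, 1.62), ("1", 1.70, 2.00)]
f = prosodic_features(seq)
```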
The content-to-be-evaluated determination module 5 performs feature calculation on the extracted pronunciation features; if the calculation result satisfies a predetermined condition, the qualifying speech unit is taken as the content to be evaluated.
The calculation on the pronunciation features may use an optimal score path calculation method: the extracted pronunciation features are used with a trained acoustic model to compute the optimal score path, and if the optimal score path contains the content to be evaluated to be detected, it is determined that the content to be evaluated has been detected. The formula for computing the optimal score path is:

W* = argmax_W P(X|W) · P(W)

wherein X represents the pronunciation feature vector of the speech unit sequence, and W* represents the optimal word sequence with the maximum score; the conditional probability P(X|W) is the acoustic model score, computed by the trained acoustic model; and the prior probability P(W) is the language model score, serving as a penalty term applied to the scores of the different acoustic models.
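As a toy numeric illustration of this argmax (the candidate word sequences and all probability values below are invented, not from the patent), the acoustic score P(X|W) and language model score P(W) are combined in the log domain and the highest-scoring sequence wins:

```python
# Toy illustration of W* = argmax_W P(X|W) * P(W) in the log domain.
# Candidate sequences and scores are invented for illustration.
import math

candidates = {
    "at platform 1": {"acoustic": 0.020, "lm": 0.30},
    "at plat form 1": {"acoustic": 0.025, "lm": 0.02},
}

def best_sequence(cands):
    # Sum of log-probabilities == log of the product P(X|W) * P(W).
    return max(cands, key=lambda w: math.log(cands[w]["acoustic"])
                                    + math.log(cands[w]["lm"]))

w_star = best_sequence(candidates)
```

Note how the language model term acts as the penalty described above: "at plat form 1" has the slightly better acoustic score, but its low prior P(W) keeps it from winning.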
For example, an optimal score path computation is performed on the example sentence "The train now standing at platform 1 is for Leeds" read aloud by the user. The computed score of "platform" is the highest, so "platform" is determined to be the optimal word sequence and is therefore taken as the content to be evaluated.
The speech comparison analysis module 6 obtains the pronunciation features of the content to be evaluated and compares and analyzes the pronunciation features against the standard pronunciation predicted by the speech prediction model 9.
The speech comparison analysis module 6 obtains the pronunciation features of the content to be evaluated, for example the pronunciation features of "platform". The pronunciation features are compared and analyzed against the standard pronunciation predicted by the speech prediction model 9, and an evaluation result concerning the content to be evaluated is provided to the user.
In order to further assess how fluently the user reads the content to be evaluated, the pronunciation features may also include the pronunciation features of the context of the content to be evaluated. For example, when evaluating the pronunciation of "platform", the pronunciation features of "platform" include not only the features of the word "platform" itself but also those of its context, i.e. the pronunciation features of "at" and "1". By comparing and analyzing factors such as pronunciation duration and pause time, an evaluation result concerning reading fluency is provided.
The method of performing speech evaluation with a speech prediction model may use existing speech evaluation techniques: the recorded user speech is divided into basic speech units, the corresponding pronunciation features to be evaluated are extracted from the speech unit sequence, the corresponding prediction model is loaded for each pronunciation feature to predict the corresponding standard pronunciation, and the pronunciation features of the user speech are then compared with those of the standard pronunciation to obtain the corresponding evaluation result.
The comparison result generation module 7 annotates the speech comparison result on the user's speech text and provides it to the user.
To annotate the text read by the user, the comparison result generation module 7 obtains the speech evaluation result produced by the speech comparison analysis module 6, annotates it on the read text in a visualized manner, and shows it to the user through the display module 8. From the displayed evaluation result, the user learns whether the pronunciation of the newly learned content within the whole passage is accurate and fluent.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments can be completed by a computer program instructing the relevant hardware; the program can be stored in a computer-readable storage medium and executed by a processor. The computer-readable storage medium may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.
The preferred embodiments of the present invention are described above with the intent of making the spirit of the invention clearer and easier to understand, and are not meant to limit the present invention. Any modification, replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope outlined by the appended claims of the present invention.

Claims (17)

1. A speech evaluation method for performing speech evaluation on a user's speech for content of interest, comprising the following steps:
step S101, input speech acquisition: in a spoken-language practice stage of language learning, acquiring the user's speech input through a recording device of an electronic apparatus;
step S102, speech unit division: dividing the recorded speech into basic speech units to form a speech unit sequence;
step S103, pronunciation feature acquisition: analyzing the speech unit sequence to obtain pronunciation features of the speech unit sequence;
step S104, content-to-be-evaluated determination: performing feature calculation on the extracted pronunciation features, and if the calculation result satisfies a predetermined condition, taking the speech unit that satisfies the predetermined condition as content to be evaluated;
step S105, speech comparison analysis: obtaining pronunciation features of the content to be evaluated, and comparing and analyzing the pronunciation features against a standard pronunciation predicted by a speech prediction model;
step S106, comparison result generation: annotating the speech comparison result on the user's speech text and providing it to the user.
2. The speech evaluation method according to claim 1, characterized in that:
the basic speech units may be syllables, phonemes, or the like, and by dividing the speech, the basic speech units of the recorded speech are obtained and form the speech unit sequence.
3. The speech evaluation method according to claim 1, characterized in that:
the pronunciation features of the speech unit sequence include prosodic features and syllable features,
the prosodic features including the boundary features of each basic speech unit, the pronunciation duration, the pause time between adjacent basic speech units, and the pronunciation duration of the entire speech unit sequence;
the syllable features including the pronunciation of each basic speech unit.
4. The speech evaluation method according to claim 1, characterized in that:
in step S104, an optimal score path calculation method may be used for the calculation on the pronunciation features of the speech unit sequence, operating as follows:
the extracted pronunciation features of the speech unit sequence are used with a trained acoustic model to compute the optimal score path;
if the optimal score path contains the content to be evaluated to be detected, it is determined that the content to be evaluated has been detected.
5. The speech evaluation method according to claim 4, characterized in that:
the formula for computing the optimal score path is:
W* = argmax_W P(X|W) · P(W)
wherein,
X represents the pronunciation feature vector of the speech unit sequence, and W* represents the optimal word sequence with the maximum score;
the conditional probability P(X|W) is the acoustic model score, computed by the trained acoustic model;
the prior probability P(W) is the language model score, serving as a penalty term applied to the scores of the different acoustic models.
6. The speech evaluation method according to claim 1, characterized in that:
the pronunciation features of the content to be evaluated may also include the pronunciation features of the context of the content to be evaluated.
7. The speech evaluation method according to claim 1, characterized in that step S105 further comprises:
dividing the recorded user speech into basic speech units;
extracting the corresponding pronunciation features to be evaluated from the speech unit sequence;
loading the corresponding prediction model for each pronunciation feature to predict the corresponding standard pronunciation;
comparing the pronunciation features of the user speech with the pronunciation features of the standard pronunciation to obtain the corresponding evaluation result.
8. A speech evaluation device for performing speech evaluation on a user's speech for content of interest, comprising an input speech acquisition module, an information storage module, a speech unit division module, a pronunciation feature acquisition module, a content-to-be-evaluated determination module, a speech comparison analysis module, an evaluation module, and a comparison result generation module, characterized in that:
the input speech acquisition module acquires the user's speech input and stores the recorded user speech in the information storage module;
the speech unit division module divides the recorded speech into basic speech units to obtain the speech unit sequence of the recorded speech;
the pronunciation feature acquisition module performs feature extraction on the speech unit sequence to obtain the pronunciation features of the speech unit sequence;
the content-to-be-evaluated determination module performs feature calculation on the extracted pronunciation features, and if the calculation result satisfies a predetermined condition, takes the qualifying speech unit as content to be evaluated;
the speech comparison analysis module obtains the pronunciation features of the content to be evaluated and compares and analyzes the pronunciation features against a standard pronunciation predicted by a speech prediction model;
the comparison result generation module annotates the speech evaluation result on the user's speech text and provides it to the user.
9. The speech evaluation device according to claim 8, characterized in that:
the basic speech units may be syllables, phonemes, or the like, and by dividing the speech, the basic speech units and the speech unit sequence of the recorded speech are obtained.
10. The speech evaluation device according to claim 8, characterized in that:
the pronunciation features of the speech unit sequence include prosodic features and syllable features,
the prosodic features including the boundary features of each basic speech unit, the pronunciation duration, the pause time between adjacent basic speech units, and the pronunciation duration of the entire speech unit sequence;
the syllable features including the pronunciation of each basic speech unit.
11. The speech evaluation device according to claim 8, characterized in that:
in the content-to-be-evaluated determination module, an optimal score path calculation method may be used for the calculation on the pronunciation features of the speech unit sequence, comprising:
using the extracted pronunciation features of the speech unit sequence with a trained acoustic model to compute the optimal score path;
if the optimal score path contains the content to be evaluated to be detected, determining that the content to be evaluated has been detected.
12. The speech evaluation device according to claim 11, characterized in that:
the formula for computing the optimal score path is:
W* = argmax_W P(X|W) · P(W)
wherein,
X represents the pronunciation feature vector of the speech unit sequence, and W* represents the optimal word sequence with the maximum score;
the conditional probability P(X|W) is the acoustic model score, computed by the trained acoustic model;
the prior probability P(W) is the language model score, serving as a penalty term applied to the scores of the different acoustic models.
13. The speech evaluation device according to claim 8, characterized in that:
the pronunciation features of the content to be evaluated may also include the pronunciation features of the context of the content to be evaluated.
14. The speech evaluation device according to claim 8, characterized in that in the speech comparison analysis module, the operation of performing speech evaluation with a speech prediction model comprises:
dividing the recorded user speech into basic speech units;
extracting the corresponding pronunciation features to be evaluated from the speech unit sequence;
loading the corresponding prediction model for each pronunciation feature to predict the corresponding standard pronunciation;
comparing the pronunciation features of the user speech with the pronunciation features of the standard pronunciation to obtain the corresponding evaluation result.
15. The speech evaluation device according to claim 8, further comprising a display module for displaying to the user the user's speech text annotated with the speech evaluation result.
16. A computer device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, characterized in that the processor, when executing the program, implements the method steps of any one of claims 1-7.
17. A computer storage medium storing a program executable by a computer, characterized in that the program, when executed, implements the method steps of any one of claims 1-7.
CN201710981866.9A 2017-10-20 2017-10-20 Voice evaluation method and device Active CN109697975B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710981866.9A CN109697975B (en) 2017-10-20 2017-10-20 Voice evaluation method and device
PCT/CN2017/111818 WO2019075827A1 (en) 2017-10-20 2017-11-20 Voice evaluation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710981866.9A CN109697975B (en) 2017-10-20 2017-10-20 Voice evaluation method and device

Publications (2)

Publication Number Publication Date
CN109697975A true CN109697975A (en) 2019-04-30
CN109697975B CN109697975B (en) 2021-05-14

Family

ID=66173188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710981866.9A Active CN109697975B (en) 2017-10-20 2017-10-20 Voice evaluation method and device

Country Status (2)

Country Link
CN (1) CN109697975B (en)
WO (1) WO2019075827A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113192494A (en) * 2021-04-15 2021-07-30 辽宁石油化工大学 Intelligent English language identification and output system and method
CN115346421A (en) * 2021-05-12 2022-11-15 北京猿力未来科技有限公司 Spoken language fluency scoring method, computing device and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113539272A (en) * 2021-09-13 2021-10-22 腾讯科技(深圳)有限公司 Voice recognition method and device, storage medium and electronic equipment

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101727764A (en) * 2008-10-21 2010-06-09 微星科技股份有限公司 Method and device for assisting in correcting pronunciation
CN101739870A (en) * 2009-12-03 2010-06-16 深圳先进技术研究院 Interactive language learning system and method
KR20110024624A (en) * 2009-09-02 2011-03-09 주식회사 케이티 System and method for evaluating foreign language pronunciation
CN102930866A (en) * 2012-11-05 2013-02-13 广州市神骥营销策划有限公司 Evaluation method for student reading assignment for oral practice
CN103617799A (en) * 2013-11-28 2014-03-05 广东外语外贸大学 Method for detecting English statement pronunciation quality suitable for mobile device
US8744856B1 (en) * 2011-02-22 2014-06-03 Carnegie Speech Company Computer implemented system and method and computer program product for evaluating pronunciation of phonemes in a language
CN104361896A (en) * 2014-12-04 2015-02-18 上海流利说信息技术有限公司 Voice quality evaluation equipment, method and system
CN104485115A (en) * 2014-12-04 2015-04-01 上海流利说信息技术有限公司 Pronunciation evaluation equipment, method and system
KR101562222B1 (en) * 2014-07-22 2015-10-23 조광호 Apparatus for evaluating accuracy of pronunciation and method thereof
CN105513612A (en) * 2015-12-02 2016-04-20 广东小天才科技有限公司 Language vocabulary audio processing method and device
CN105825852A (en) * 2016-05-23 2016-08-03 渤海大学 Oral English reading test scoring method
CN105845134A (en) * 2016-06-14 2016-08-10 科大讯飞股份有限公司 Spoken language evaluation method through freely read topics and spoken language evaluation system thereof

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1510590A (en) * 2002-12-24 2004-07-07 英业达股份有限公司 Language learning system and method with visual prompting to pronunciaton
CN100514446C (en) * 2004-09-16 2009-07-15 北京中科信利技术有限公司 Pronunciation evaluating method based on voice identification and voice analysis
CN103794210A (en) * 2012-10-29 2014-05-14 无敌科技(西安)有限公司 Mandarin voice evaluating system and mandarin voice evaluating method
CN106531185B (en) * 2016-11-01 2019-12-13 云知声(上海)智能科技有限公司 voice evaluation method and system based on voice similarity


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Guanxiong, "Research on Several Problems in Acoustic Modeling", China Master's Theses Full-text Database, Information Science and Technology Series *


Also Published As

Publication number Publication date
CN109697975B (en) 2021-05-14
WO2019075827A1 (en) 2019-04-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 518000 Jianda Industrial Park, Xin'an Street, Baoan District, Shenzhen City, Guangdong Province, 202B, 2nd floor, 1 building
Applicant after: Shenzhen Yingshuo Education Service Co.,Ltd.
Address before: 518100 Guangdong city of Shenzhen province Baoan District Xin'an three industrial zone 1 road Cantor Fitzgerald building two floor 202B
Applicant before: SHENZHEN YINGSHUO AUDIO TECHNOLOGY Co.,Ltd.
CB02 Change of applicant information
Address after: 518000 Jianda Industrial Park, Xin'an Street, Baoan District, Shenzhen City, Guangdong Province, 202B, 2nd floor, 1 building
Applicant after: Shenzhen YINGSHUO Education Service Co.,Ltd.
Address before: 518000 Jianda Industrial Park, Xin'an Street, Baoan District, Shenzhen City, Guangdong Province, 202B, 2nd floor, 1 building
Applicant before: Shenzhen Yingshuo Education Service Co.,Ltd.
GR01 Patent grant
CP03 Change of name, title or address
Address after: Room 301, building D, Hongwei Industrial Zone, No.6 Liuxian 3rd road, Xingdong community, Xin'an street, Bao'an District, Shenzhen City, Guangdong Province
Patentee after: Shenzhen Yingshuo Intelligent Technology Co.,Ltd.
Address before: 518000 202b, 2nd floor, building 1, Jianda Industrial Park, Xin'an street, Bao'an District, Shenzhen City, Guangdong Province
Patentee before: Shenzhen YINGSHUO Education Service Co.,Ltd.