CN109697988A - Voice evaluation method and device - Google Patents

Voice evaluation method and device

Info

Publication number
CN109697988A
CN109697988A (application CN201710996819.1A, granted as CN109697988B)
Authority
CN
China
Prior art keywords
speech
voice
user
phonological
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710996819.1A
Other languages
Chinese (zh)
Other versions
CN109697988B (en)
Inventor
卢炀
宾晓皎
李明
蔡泽鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yingshuo Intelligent Technology Co.,Ltd.
Original Assignee
Shenzhen Yingshuo Audio Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Yingshuo Audio Technology Co Ltd
Priority to CN201710996819.1A, granted as CN109697988B
Priority to PCT/CN2017/111822, published as WO2019075828A1
Publication of CN109697988A
Application granted
Publication of CN109697988B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 Electrically-operated educational appliances
    • G09B5/04 Electrically-operated educational appliances with audible presentation of the material to be studied
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention provides a voice evaluation method for evaluating a user's pronunciation during language learning, comprising the following steps: step S101, capturing the user's voice input through the recording device of a voice evaluation device; step S102, dividing the recorded speech into basic speech units to obtain the speech unit sequence of the recorded speech; step S103, performing feature extraction on the speech unit sequence to obtain the phonological features of the speech unit sequence; step S104, comparing and analyzing the extracted phonological features against the teaching sample speech and against the standard speech predicted by a speech prediction model, respectively; and step S105, annotating the speech comparison results on the text of the user's speech.

Description

Voice evaluation method and device
Technical field
The present invention relates to the field of multimedia teaching technology, and in particular to a voice evaluation method and device for multimedia teaching.
Background art
As a medium of communication, language plays a very important role in life and work, and whether people are studying at school or already working, language learning is something they attach great importance to. With the continued spread of online teaching, and because online instruction is not constrained by the time and place of lessons, it has become popular with users. As a result, many users now prefer to spend their spare time learning languages over the network.
In current online teaching, pronunciation practice is typically handled in one of several ways: after a passage of speech is played in the video (or audio) lesson, a pause is provided for the learner to practise repeating it on their own; or a recording approach is used, in which the learner's repetition is recorded and played back so that learners judge for themselves whether their pronunciation is accurate; or a teacher teaches online and gives guidance and suggestions on the learner's pronunciation. These existing teaching methods either cannot give targeted guidance on a learner's pronunciation, resulting in poor learning outcomes, or require a teacher to teach online, which demands substantial human, material and financial resources.
To solve the above problems, evaluating a learner's speech against a speech prediction model has been proposed. CN101197084A discloses an automatic spoken-English evaluation and learning system, characterized in that the system includes a spoken-pronunciation detection part comprising the following steps: (1) building a standard-speaker corpus: 1) recruiting standard English speakers; 2) designing a first recording text according to the requirements of spoken-English learning and the principle of phoneme balance; 3) having the standard speakers record the recording text; (2) collecting an oral-evaluation corpus: in a simulated English-learning software environment, designing a second recording text according to the learning requirements, recruiting ordinary speakers, and recording their spoken pronunciation; (3) annotating the oral-evaluation corpus: experts annotate in detail whether each phoneme in each word is pronounced correctly; (4) building a standard-pronunciation acoustic model: training an acoustic model of standard pronunciation from the recordings and associated texts in the standard-speaker corpus; (5) computing error-detection parameters of the speech: 1) extracting the MFCC cepstral features of the speech; 2) based on the standard acoustic model and the phoneme sequence corresponding to the text of each recording in the evaluation corpus, automatically segmenting the ordinary speaker's speech into phoneme-level segments and computing, for each segment, a first likelihood of the segment given the expected phoneme; 3) recognizing each segment of the ordinary speaker's speech with the standard acoustic model and computing a second likelihood of the segment given the recognized phoneme; 4) dividing the first likelihood by the second likelihood to obtain the likelihood ratio of the segment, which serves as the error-detection parameter of that speech segment; (6) building an error-detection mapping model from error-detection parameters to expert-annotated pronunciation errors: over a batch of evaluation recordings, associating the evaluation parameters and formant sequences of each segment with the detailed expert annotations, statistically deriving the correspondence between these parameters and the annotations, and saving these relationships as the error-detection mapping model.
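To make the likelihood-ratio error-detection parameter of step (5) concrete, the following is a minimal Python/numpy sketch; the log-likelihood values would in practice come from the standard acoustic model described above, and the numbers used here are made up for illustration.

```python
# Likelihood-ratio error-detection parameter (sketch of the prior-art scheme
# above; the log-likelihoods are placeholders for acoustic-model scores).
import numpy as np

def error_detection_parameter(loglik_expected: float, loglik_recognized: float) -> float:
    """Ratio of the first likelihood (segment scored against the phoneme it
    should be) to the second likelihood (segment scored against the phoneme
    the recognizer actually chose). Values near 1 suggest a correct
    pronunciation; small values flag a likely mispronunciation."""
    return float(np.exp(loglik_expected - loglik_recognized))

print(error_detection_parameter(-42.1, -41.8))   # ~0.74: borderline
print(error_detection_parameter(-55.0, -40.2))   # ~3.7e-7: likely mispronounced
```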
CN101650886A discloses a method for automatically detecting reading errors of language learners, characterized by comprising the following steps: 1) front-end processing: pre-processing the input speech and extracting features, the extracted features being MFCC feature vectors; 2) building a simplified search space: taking the content the user is expected to read aloud as the reference answer, and constructing a simplified search space from the reference answer, a pronunciation dictionary, a multi-pronunciation model and an acoustic model; 3) building a read-aloud language model: constructing from the reference answer a read-aloud language model that describes the content the user may actually produce when reading the reference sentence and its probabilities; 4) searching: in the search space, searching according to the acoustic model, the read-aloud language model and the multi-pronunciation model for the path that best matches the input feature-vector stream, taking it as the content the user actually read aloud and forming the recognition-result sequence; 5) alignment: aligning the reference answer with the recognition result to detect the user's insertions, omissions and mispronunciations.
In the prior art, a speech recognition system is used to obtain the speech segment corresponding to each basic speech unit in the speech signal; the obtained segments are merged into a valid speech segment sequence corresponding to the signal; evaluation features are extracted from the valid segment sequence; a score prediction model corresponding to each evaluation feature type is loaded; the similarity between each evaluation feature and the corresponding score prediction model is computed; and this similarity is taken as the score of the speech signal. In actual language learning, however, users usually learn pronunciation from the teacher's sample speech in the teaching video (or audio), and the teacher's sample, for reasons of personal style, is rarely fully consistent with the standard pronunciation predicted by a speech prediction model. Consequently, when the user's pronunciation is evaluated only against a speech prediction model, the predicted standard pronunciation often differs from the teaching sample in some respects (such as tone and rhythm), so the resulting evaluation reflects the comparison between the user's speech and the predicted speech and cannot truly reflect the comparison between the user's speech and the teaching sample speech.
Therefore, there is a need for a voice evaluation method that, in addition to the evaluation result produced by a speech prediction model, also provides an evaluation result against the teaching sample speech, so that users gain a full picture of their own learning.
Summary of the invention
To this end, the technical problem to be solved by the present invention is how, during language learning, to provide the user simultaneously with an evaluation result from comparison with the teaching sample speech and an evaluation result from comparison with the standard speech predicted by a speech prediction model, so as to help the user fully understand their own learning.
To this end, the present invention provides a voice evaluation method for evaluating a user's pronunciation during language learning, characterized by:
Step S101: capturing the user's voice input through the recording device of a voice evaluation device;
Step S102: dividing the recorded speech into basic speech units to obtain the speech unit sequence of the recorded speech;
Step S103: performing feature extraction on the speech unit sequence to obtain the phonological features of the speech unit sequence;
Step S104: comparing and analyzing the extracted phonological features against the teaching sample speech and against the standard speech predicted by a speech prediction model, respectively;
Step S105: annotating the speech comparison results on the text of the user's speech.
The basic speech unit may be a syllable, a phoneme or the like; by dividing the recorded speech, the basic speech units and the speech unit sequence of the recorded speech are obtained.
The phonological features include prosodic features and syllable features. The prosodic features include the boundary features and pronunciation duration of each basic speech unit, the pause time between adjacent basic speech units, and the pronunciation duration of the whole speech unit sequence; the syllable features include the pronunciation of each basic speech unit and the pronunciation of the whole speech unit sequence.
The process of comparing and analyzing against the teaching sample speech includes:
obtaining the teaching sample speech saved in the system;
dividing the teaching sample speech into basic speech units to obtain the basic speech units and the speech unit sequence of the teaching sample speech;
extracting the phonological features of the teaching speech unit sequence, the phonological features of the teaching speech unit sequence corresponding to the phonological features of the user speech unit sequence;
comparing the phonological features of the user speech unit sequence with the phonological features of the teaching speech unit sequence and producing the corresponding evaluation result.
The process of performing speech evaluation with the speech prediction model includes:
dividing the recorded user speech into basic speech units and extracting the corresponding phonological features to be evaluated from the speech unit sequence;
loading the corresponding prediction model for each phonological feature and predicting the corresponding standard pronunciation;
comparing the phonological features of the user speech with the phonological features of the standard pronunciation and obtaining the corresponding evaluation result.
The process of annotating the speech comparison results specifically includes:
converting the recorded user speech into a speech text;
annotating the evaluation result obtained by comparison with the teaching sample speech and the evaluation result obtained by comparison with the standard speech predicted by the speech prediction model on the speech text, each in a visualized manner, and displaying them to the user.
The present invention also provides a voice evaluation device comprising a recording module, a storage module, a speech processing module, a feature extraction module, a speech analysis module, an evaluation module, a labeling module and a display module, characterized in that:
the recording module is configured to capture the user's voice input;
the speech processing module is configured to divide the recorded speech into basic speech units to obtain the speech unit sequence of the recorded speech;
the feature extraction module is configured to perform feature extraction on the speech unit sequence to obtain the phonological features of the speech unit sequence;
the speech analysis module is configured to compare and analyze the extracted phonological features against the teaching sample speech and against the standard speech predicted by the speech prediction model, respectively;
the labeling module is configured to annotate the speech evaluation results on the text of the user's speech.
The voice evaluation device further includes a display module for displaying the user speech text annotated with the speech evaluation results to the user.
By simultaneously providing the user with the evaluation result of the user's speech against the teaching sample speech and the evaluation result against the standard speech predicted by the speech prediction model, the voice evaluation method and device of the present invention give users a full picture of their own pronunciation and improve the accuracy of their pronunciation.
Brief description of the drawings
In order to describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from the described embodiments and these drawings without creative effort.
Fig. 1 is a flowchart of a voice evaluation method according to an embodiment of the present invention; and
Fig. 2 is a structural diagram of a voice evaluation device according to an embodiment of the present invention.
Detailed description of the embodiments
Before the exemplary embodiments are discussed in greater detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations as a sequential process, many of the operations may be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be rearranged. A process may be terminated when its operations are completed, and it may also have additional steps not included in the figure.
The term "voice evaluation device" as used herein refers to a "computer device", that is, an intelligent electronic device that can perform predetermined processes such as numerical computation and/or logical computation by running preset programs or instructions. It may comprise a processor and a memory, the processor executing program instructions prestored in the memory to carry out the predetermined processes; or the predetermined processes may be carried out by hardware such as an ASIC, an FPGA or a DSP, or by a combination of the two.
The computer device includes user equipment and/or network equipment. The user equipment includes, but is not limited to, computers, smartphones, PDAs and the like; the network equipment includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud, based on cloud computing, composed of a large number of computers or network servers, where cloud computing is a form of distributed computing in which a super virtual computer is composed of a group of loosely coupled computers. The computer device may operate on its own to implement the present invention, or it may access a network and implement the present invention by interacting with other computer devices in the network. The network in which the computer device is located includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, VPN networks and the like.
Those skilled in the art will understand that the "voice evaluation device" described in the present invention may be only a user device, with the corresponding operations performed by the user device; it may also be an integrated combination of a user device and a network device or server, with the corresponding operations performed by the user device in cooperation with the network device.
It should be noted that the user device, the network device and the network are only examples; other existing or future computer devices or networks, where applicable to the present invention, shall also fall within the scope of protection of the present invention and are incorporated herein by reference.
Those skilled in the art will also understand that the present invention is applicable to both mobile and non-mobile terminals; for example, when a user uses a mobile phone or a PC, the method or device of the present invention can be applied to provide and present the results.
The specific structural and functional details disclosed herein are merely representative and serve the purpose of describing exemplary embodiments of the present invention. The present invention may, however, be embodied in many alternative forms and should not be construed as being limited to the embodiments set forth herein.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments. Unless the context clearly indicates otherwise, the singular forms "a" and "an" as used herein are intended to include the plural as well. It should also be understood that the terms "comprise" and/or "include", as used herein, specify the presence of the stated features, integers, steps, operations, units and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, units, components and/or combinations thereof.
It should further be mentioned that, in some alternative implementations, the functions/acts mentioned may occur out of the order indicated in the figures. For example, two figures shown in succession may in fact be executed substantially concurrently, or may sometimes be executed in the reverse order, depending on the functions/acts involved.
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 shows the flowchart of the voice evaluation method of the present invention.
In step S101, while the user is in the spoken read-along (follow-up reading) stage of language learning, the recording device of the voice evaluation device records the user's voice input.
Specifically, after studying the sample speech in the teaching courseware, the user enters the read-along stage and triggers the recording device in the voice evaluation device, putting it into the recording state. When the user starts reading along with the sample speech, the recording device starts recording the user's speech, and the user's read-along speech is stored in the storage module of the voice evaluation device for further analysis.
In step S102, the user's read-along speech recorded in the storage module is obtained and divided into basic speech units, yielding the speech unit sequence of the recorded read-along speech.
The basic speech unit may be a syllable, a phoneme or the like; by dividing the recorded speech, the basic speech units and the speech unit sequence of the recorded speech are obtained.
Different speech recognition systems decode the speech signal using different acoustic features, such as acoustic models based on MFCC (Mel-Frequency Cepstral Coefficients) features or on PLP (Perceptual Linear Predictive) features; using different acoustic models, such as HMM-GMM (Hidden Markov Model - Gaussian Mixture Model) models or neural-network acoustic models based on DBN (Dynamic Bayesian Network); or using different decoding methods, such as Viterbi search or A* search.
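As an illustration of such a front end, the following is a minimal sketch assuming Python with the librosa library (the patent does not prescribe any particular toolkit); it loads a recording and computes the MFCC features on which an acoustic model could then operate. The file name and parameter values are hypothetical.

```python
# Minimal MFCC front-end sketch (assumption: librosa is available; model,
# toolkit and parameter values are not mandated by the patent).
import librosa
import numpy as np

def extract_mfcc(wav_path: str, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
    """Return an (n_mfcc, n_frames) MFCC matrix for one recording."""
    y, sr = librosa.load(wav_path, sr=sr)                # resample to 16 kHz mono
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

mfcc = extract_mfcc("user_read_along.wav")               # hypothetical file name
print(mfcc.shape)                                        # e.g. (13, n_frames)
```

An HMM-GMM or neural acoustic model would then decode this feature stream into the syllable- or phoneme-level unit sequence used in the following steps.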
In step S103, feature extraction is performed on the speech unit sequence to obtain the phonological features of the speech unit sequence.
The phonological features include prosodic features and syllable features. The prosodic features include the boundary features and pronunciation duration of each basic speech unit, the pause time between adjacent basic speech units, and the pronunciation duration of the whole speech unit sequence; the syllable features include the pronunciation of each basic speech unit and the pronunciation of the whole speech unit sequence.
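The following sketch illustrates how the prosodic part of these features (per-unit duration, inter-unit pauses, total duration) could be computed from a time-aligned unit sequence; the data layout and the example values are assumptions for illustration, not part of the patent.

```python
# Prosodic-feature sketch computed from a time-aligned unit sequence (the
# alignment itself would come from the decoder in step S102).
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class AlignedUnit:
    label: str    # syllable or phoneme label
    start: float  # start time in seconds
    end: float    # end time in seconds

def prosodic_features(units: List[AlignedUnit]) -> Dict[str, object]:
    durations = [u.end - u.start for u in units]                   # per-unit pronunciation duration
    pauses = [b.start - a.end for a, b in zip(units, units[1:])]   # pause between adjacent units
    return {
        "unit_durations": durations,
        "inter_unit_pauses": pauses,
        "total_duration": units[-1].end - units[0].start,          # whole-sequence duration
    }

# toy example: the word "good" segmented into three phones
seq = [AlignedUnit("g", 0.00, 0.08), AlignedUnit("uh", 0.08, 0.21), AlignedUnit("d", 0.27, 0.35)]
print(prosodic_features(seq))
```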
In step S104, the extracted phonological features are compared and analyzed against the teaching sample speech and against the standard speech predicted by the speech prediction model, respectively.
The process of comparison with the teaching sample speech is as follows: the teaching sample speech saved in the system is obtained; the teaching sample speech is divided into basic speech units, yielding the basic speech units and the speech unit sequence of the teaching sample speech; the phonological features of the teaching speech unit sequence are then extracted, and they correspond to the phonological features of the user speech unit sequence. The phonological features of the user speech unit sequence are compared with those of the teaching speech unit sequence, and the corresponding evaluation result is produced, for example as sketched below.
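A minimal sketch of this unit-by-unit comparison follows. It assumes that the user and the teacher read the same text, so their unit sequences can be compared position by position; the 30% tolerance and the choice of duration as the compared feature are illustrative assumptions only.

```python
# Unit-by-unit comparison against the teaching sample (illustrative threshold).
from typing import Dict, List

def compare_with_teaching_sample(user_feats: Dict[str, list],
                                 teacher_feats: Dict[str, list],
                                 tol: float = 0.30) -> List[dict]:
    results = []
    for i, (du, dt) in enumerate(zip(user_feats["unit_durations"],
                                     teacher_feats["unit_durations"])):
        rel_diff = abs(du - dt) / max(dt, 1e-6)      # relative deviation from the teacher
        results.append({
            "unit_index": i,
            "duration_rel_diff": round(rel_diff, 3),
            "ok": rel_diff <= tol,                   # within the assumed 30% tolerance
        })
    return results

user = {"unit_durations": [0.08, 0.13, 0.08]}
teacher = {"unit_durations": [0.10, 0.18, 0.09]}
print(compare_with_teaching_sample(user, teacher))
```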
The speech evaluation based on the speech prediction model may use existing speech evaluation techniques: the recorded user speech is divided into basic speech units; the corresponding phonological features to be evaluated are extracted from the speech unit sequence; for each phonological feature the corresponding prediction model is loaded and the corresponding standard pronunciation is predicted; the phonological features of the user speech are then compared with those of the standard pronunciation to obtain the corresponding evaluation result.
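The sketch below illustrates this second path. The per-feature "models" are stand-in callables; in practice they would be trained prediction models as in the prior art cited above, and the feature names are the same illustrative ones used in the earlier sketches.

```python
# Prediction-model path sketch: one stand-in model per phonological feature
# predicts the standard value for the text, and the user's value is scored
# against it (feature names and values are illustrative).
from typing import Callable, Dict, List

def evaluate_with_prediction_models(text_units: List[str],
                                    user_feats: Dict[str, list],
                                    models: Dict[str, Callable]) -> Dict[str, list]:
    report = {}
    for feat_name, model in models.items():
        predicted = model(text_units)                   # predicted standard feature values
        observed = user_feats[feat_name]
        report[feat_name] = [
            {"predicted": p, "observed": o, "abs_diff": round(abs(p - o), 3)}
            for p, o in zip(predicted, observed)
        ]
    return report

# toy usage: a stand-in model that predicts 120 ms per unit
models = {"unit_durations": lambda units: [0.12] * len(units)}
user_feats = {"unit_durations": [0.08, 0.13, 0.08]}
print(evaluate_with_prediction_models(["g", "uh", "d"], user_feats, models))
```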
In step S105, the speech comparison results are annotated on the text of the user's speech and provided to the user.
In this step, the recorded user speech is further converted into a speech text by the speech processing module. The evaluation result obtained in step S104 from the comparison with the teaching sample speech and the evaluation result from the comparison with the standard speech predicted by the speech prediction model are each annotated on the speech text in a visualized manner and displayed to the user. From the displayed evaluation results, the user can see the differences between their pronunciation and the teaching sample pronunciation, as well as the differences between their pronunciation and the standard pronunciation predicted by the speech prediction model, so that the user fully understands what problems exist in the pronunciation of the text they read and can further improve the correctness of their pronunciation. The comparison results may include the pronunciation evaluation of each basic speech unit, the pronunciation-duration evaluation of each basic speech unit, the evaluation of the fluency of the whole text, and so on.
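One possible way of making the two results visible on the transcript is sketched below; the HTML colour scheme is an illustrative choice only, since the patent merely requires that both results be annotated on the speech text in a visualized manner.

```python
# Transcript-annotation sketch: colour each word according to how it fared
# against the teaching sample and against the predicted standard speech.
from typing import List

def annotate_transcript(words: List[str],
                        vs_teacher_ok: List[bool],
                        vs_model_ok: List[bool]) -> str:
    spans = []
    for w, t_ok, m_ok in zip(words, vs_teacher_ok, vs_model_ok):
        if t_ok and m_ok:
            color = "green"    # acceptable against both references
        elif t_ok or m_ok:
            color = "orange"   # acceptable against only one reference
        else:
            color = "red"      # flagged by both comparisons
        spans.append(f'<span style="color:{color}">{w}</span>')
    return " ".join(spans)

print(annotate_transcript(["good", "morning"], [True, False], [True, True]))
```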
Fig. 2 shows a voice evaluation device according to an embodiment of the present invention. The voice evaluation device is used to implement the voice evaluation method of the present invention: after the user completes the spoken read-along, it simultaneously provides the user with the evaluation result against the teaching sample speech and the evaluation result against the standard speech predicted by the speech prediction model. The voice evaluation device includes a recording module 1, a storage module 2, a speech processing module 3, a feature extraction module 4, a speech analysis module 5, a labeling module 6 and a display module 7; their cooperation across steps S101-S105 is sketched below.
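The following class skeleton is a hypothetical sketch of how the seven modules could be wired together in the order of steps S101-S105; the module interfaces are assumptions for illustration, and the bodies of the individual modules are omitted.

```python
# Hypothetical wiring of the seven modules (interfaces assumed, bodies omitted).
class VoiceEvaluationDevice:
    def __init__(self, recorder, storage, speech_processor,
                 feature_extractor, analyzer, labeler, display):
        self.recorder = recorder                    # recording module 1
        self.storage = storage                      # storage module 2
        self.speech_processor = speech_processor    # speech processing module 3
        self.feature_extractor = feature_extractor  # feature extraction module 4
        self.analyzer = analyzer                    # speech analysis module 5
        self.labeler = labeler                      # labeling module 6
        self.display = display                      # display module 7

    def run(self, teaching_sample):
        audio = self.recorder.record()                            # step S101
        self.storage.save(audio)
        units = self.speech_processor.split_into_units(audio)     # step S102
        feats = self.feature_extractor.extract(units)             # step S103
        results = self.analyzer.compare(feats, teaching_sample)   # step S104 (sample + prediction model)
        text = self.speech_processor.to_text(audio)
        annotated = self.labeler.annotate(text, results)          # step S105
        self.display.show(annotated)
        return annotated
```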
While the user is in the spoken read-along stage of language learning, the recording module 1 of the voice evaluation device records the user's voice input.
Specifically, after studying the sample speech in the teaching courseware, the user enters the read-along stage and triggers the recording module 1 in the voice evaluation device, putting it into the recording state. When the user starts reading along with the sample speech, the recording module 1 starts recording the user's speech, and the user's read-along speech is stored in the storage module 2 of the voice evaluation device for further analysis.
The speech processing module 3 obtains the user's read-along speech recorded in the storage module 2 and divides the recorded speech into basic speech units.
The basic speech unit may be a syllable, a phoneme or the like; by dividing the recorded speech, the basic speech units and the speech unit sequence of the recorded speech are obtained.
After the speech processing module 3 has divided the recorded speech into basic speech units, the feature extraction module 4 further performs feature extraction on the generated speech unit sequence to obtain the phonological features of the speech unit sequence.
The phonological features include prosodic features and syllable features. The prosodic features include the boundary features and pronunciation duration of each basic speech unit, the pause time between adjacent basic speech units, and the pronunciation duration of the whole speech unit sequence; the syllable features include the pronunciation of each basic speech unit and the pronunciation of the whole speech unit sequence.
The speech analysis module 5 compares and analyzes the extracted phonological features against the teaching sample speech and against the standard speech predicted by the speech prediction model, respectively.
The process of comparison with the teaching sample speech is as follows: the speech analysis module 5 obtains the teaching sample speech saved in the storage module 2 and divides it into basic speech units, yielding the basic speech units and the speech unit sequence of the teaching sample speech; it then extracts the phonological features of the teaching speech unit sequence, which correspond to the phonological features of the user speech unit sequence. The phonological features of the user speech unit sequence are compared with those of the teaching speech unit sequence, and the corresponding evaluation result is produced.
The speech evaluation based on the speech prediction model may use existing speech evaluation techniques: the recorded user speech is divided into basic speech units; the corresponding phonological features to be evaluated are extracted from the speech unit sequence; for each phonological feature the corresponding prediction model is loaded and the corresponding standard pronunciation is predicted; the phonological features of the user speech are then compared with those of the standard pronunciation to obtain the corresponding evaluation result.
The labeling module 6 annotates the speech comparison results on the user's speech text, which the display module 7 then presents to the user.
Specifically, the recorded user speech is further converted into a speech text by the speech processing module 3. The evaluation result obtained by the speech analysis module 5 from the comparison with the teaching sample speech and the evaluation result from the comparison with the standard speech predicted by the speech prediction model are each annotated on the speech text in a visualized manner and displayed to the user through the display module 7. From the displayed evaluation results, the user can see the differences between their pronunciation and the teaching sample pronunciation, as well as the differences between their pronunciation and the standard pronunciation predicted by the speech prediction model, so that the user fully understands what problems exist in the pronunciation of the text they read and can further improve the correctness of their pronunciation. The comparison results may include the pronunciation evaluation of each basic speech unit, the pronunciation-duration evaluation of each basic speech unit, the evaluation of the fluency of the whole text, and so on.
Those of ordinary skill in the art will understand that all or part of the steps in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware; the program can be stored in a computer-readable storage medium and executed by a processor. The computer-readable storage medium may include read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs and the like.
The preferred embodiments of the present invention are described above with the intention of making the spirit of the invention clearer and easier to understand, not of limiting the invention. Any modification, replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection defined by the appended claims of the present invention.

Claims (15)

1. A voice evaluation method for evaluating a user's pronunciation during language learning, characterized by:
step S101: capturing the user's voice input through the recording device of a voice evaluation device;
step S102: dividing the recorded speech into basic speech units to obtain the speech unit sequence of the recorded speech;
step S103: performing feature extraction on the speech unit sequence to obtain the phonological features of the speech unit sequence;
step S104: comparing and analyzing the extracted phonological features against the teaching sample speech and against the standard speech predicted by a speech prediction model, respectively;
step S105: annotating the speech comparison results on the text of the user's speech.
2. The voice evaluation method according to claim 1, characterized in that:
the basic speech unit may be a syllable, a phoneme or the like, and the basic speech units and the speech unit sequence of the recorded speech are obtained by dividing the recorded speech.
3. The voice evaluation method according to claim 1, characterized in that:
the phonological features include prosodic features and syllable features, the prosodic features including the boundary features and pronunciation duration of each basic speech unit, the pause time between adjacent basic speech units, and the pronunciation duration of the whole speech unit sequence;
the syllable features include the pronunciation of each basic speech unit and the pronunciation of the whole speech unit sequence.
4. The voice evaluation method according to claim 1, characterized in that:
the process of comparing and analyzing against the teaching sample speech includes:
obtaining the teaching sample speech saved in the system;
dividing the teaching sample speech into basic speech units to obtain the basic speech units and the speech unit sequence of the teaching sample speech;
extracting the phonological features of the teaching speech unit sequence, the phonological features of the teaching speech unit sequence corresponding to the phonological features of the user speech unit sequence;
comparing the phonological features of the user speech unit sequence with the phonological features of the teaching speech unit sequence and producing the corresponding evaluation result.
5. The voice evaluation method according to claim 1, characterized in that:
the process of performing speech evaluation with the speech prediction model includes:
dividing the recorded user speech into basic speech units and extracting the corresponding phonological features to be evaluated from the speech unit sequence;
loading the corresponding prediction model for each phonological feature and predicting the corresponding standard pronunciation;
comparing the phonological features of the user speech with the phonological features of the standard pronunciation and obtaining the corresponding evaluation result.
6. The voice evaluation method according to claim 1, characterized in that:
the process of annotating the speech comparison results specifically includes:
converting the recorded user speech into a speech text;
annotating the evaluation result obtained by comparison with the teaching sample speech and the evaluation result obtained by comparison with the standard speech predicted by the speech prediction model on the speech text, each in a visualized manner, and displaying them to the user.
7. A voice evaluation device comprising a recording module, a storage module, a speech processing module, a feature extraction module, a speech analysis module and a labeling module, characterized in that:
the recording module is configured to capture the user's voice input;
the speech processing module is configured to divide the recorded speech into basic speech units to obtain the speech unit sequence of the recorded speech;
the feature extraction module is configured to perform feature extraction on the speech unit sequence to obtain the phonological features of the speech unit sequence;
the speech analysis module is configured to compare and analyze the extracted phonological features against the teaching sample speech and against the standard speech predicted by a speech prediction model, respectively;
the labeling module is configured to annotate the speech evaluation results on the text of the user's speech.
8. The voice evaluation device according to claim 7, characterized in that:
the basic speech unit may be a syllable, a phoneme or the like, and the basic speech units and the speech unit sequence of the recorded speech are obtained by dividing the recorded speech.
9. The voice evaluation device according to claim 7, characterized in that:
the phonological features include prosodic features and syllable features, the prosodic features including the boundary features and pronunciation duration of each basic speech unit, the pause time between adjacent basic speech units, and the pronunciation duration of the whole speech unit sequence, and the syllable features including the pronunciation of each basic speech unit and the pronunciation of the whole speech unit sequence.
10. The voice evaluation device according to claim 7, characterized in that:
the process of comparing and analyzing against the teaching sample speech includes:
obtaining the teaching sample speech saved in the system;
dividing the teaching sample speech into basic speech units to obtain the basic speech units and the speech unit sequence of the teaching sample speech;
extracting the phonological features of the teaching speech unit sequence, the phonological features of the teaching speech unit sequence corresponding to the phonological features of the user speech unit sequence;
comparing the phonological features of the user speech unit sequence with the phonological features of the teaching speech unit sequence and producing the corresponding evaluation result.
11. The voice evaluation device according to claim 7, characterized in that:
the process of performing speech evaluation with the speech prediction model includes:
dividing the recorded user speech into basic speech units and extracting the corresponding phonological features to be evaluated from the speech unit sequence;
loading the corresponding prediction model for each phonological feature and predicting the corresponding standard pronunciation;
comparing the phonological features of the user speech with the phonological features of the standard pronunciation and obtaining the corresponding evaluation result.
12. The voice evaluation device according to claim 7, characterized in that:
the process of annotating the speech comparison results specifically includes:
converting the recorded user speech into a speech text;
annotating the evaluation result obtained by comparison with the teaching sample speech and the evaluation result obtained by comparison with the standard speech predicted by the speech prediction model on the speech text, each in a visualized manner, and displaying them to the user.
13. The voice evaluation device according to claim 7, characterized in that:
the voice evaluation device further comprises a display module configured to display the user speech text annotated with the speech evaluation results to the user.
14. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method steps of any one of claims 1 to 6.
15. A computer storage medium storing a program executable by a computer, characterized in that, when the program is executed, the method steps of any one of claims 1 to 6 are implemented.
CN201710996819.1A 2017-10-20 2017-10-20 Voice evaluation method and device Active CN109697988B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710996819.1A CN109697988B (en) 2017-10-20 2017-10-20 Voice evaluation method and device
PCT/CN2017/111822 WO2019075828A1 (en) 2017-10-20 2017-11-20 Voice evaluation method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710996819.1A CN109697988B (en) 2017-10-20 2017-10-20 Voice evaluation method and device

Publications (2)

Publication Number Publication Date
CN109697988A 2019-04-30
CN109697988B CN109697988B (en) 2021-05-14

Family

ID=66172985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710996819.1A Active CN109697988B (en) 2017-10-20 2017-10-20 Voice evaluation method and device

Country Status (2)

Country Link
CN (1) CN109697988B (en)
WO (1) WO2019075828A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110534100A (en) * 2019-08-27 2019-12-03 北京海天瑞声科技股份有限公司 A kind of Chinese speech proofreading method and device based on speech recognition
CN110910687A (en) * 2019-12-04 2020-03-24 深圳追一科技有限公司 Teaching method and device based on voice information, electronic equipment and storage medium
CN111081080A (en) * 2019-05-29 2020-04-28 广东小天才科技有限公司 Voice detection method and learning device
CN112767932A (en) * 2020-12-11 2021-05-07 北京百家科技集团有限公司 Voice evaluation system, method, device, equipment and computer readable storage medium
CN113053409A (en) * 2021-03-12 2021-06-29 科大讯飞股份有限公司 Audio evaluation method and device
CN113192494A (en) * 2021-04-15 2021-07-30 辽宁石油化工大学 Intelligent English language identification and output system and method

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060057545A1 (en) * 2004-09-14 2006-03-16 Sensory, Incorporated Pronunciation training method and apparatus
CN101246685A (en) * 2008-03-17 2008-08-20 清华大学 Pronunciation quality evaluation method of computer auxiliary language learning system
CN101739870A (en) * 2009-12-03 2010-06-16 深圳先进技术研究院 Interactive language learning system and method
CN103514765A (en) * 2013-10-28 2014-01-15 苏州市思玛特电力科技有限公司 Language teaching assessment method
CN103559894A (en) * 2013-11-08 2014-02-05 安徽科大讯飞信息科技股份有限公司 Method and system for evaluating spoken language
CN203773766U (en) * 2014-04-10 2014-08-13 滕坊坪 Language learning machine
CN105825852A (en) * 2016-05-23 2016-08-03 渤海大学 Oral English reading test scoring method
CN106971647A (en) * 2017-02-07 2017-07-21 广东小天才科技有限公司 Spoken language training method and system combining body language
CN107067834A (en) * 2017-03-17 2017-08-18 麦片科技(深圳)有限公司 Point-of-reading system with oral evaluation function

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7219059B2 (en) * 2002-07-03 2007-05-15 Lucent Technologies Inc. Automatic pronunciation scoring for language learning
CN100514446C (en) * 2004-09-16 2009-07-15 北京中科信利技术有限公司 Pronunciation evaluating method based on voice identification and voice analysis
US20150287339A1 (en) * 2014-04-04 2015-10-08 Xerox Corporation Methods and systems for imparting training
CN103928023B (en) * 2014-04-29 2017-04-05 广东外语外贸大学 A kind of speech assessment method and system
CN104732977B (en) * 2015-03-09 2018-05-11 广东外语外贸大学 A kind of online spoken language pronunciation quality evaluating method and system

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060057545A1 (en) * 2004-09-14 2006-03-16 Sensory, Incorporated Pronunciation training method and apparatus
CN101246685A (en) * 2008-03-17 2008-08-20 清华大学 Pronunciation quality evaluation method of computer auxiliary language learning system
CN101739870A (en) * 2009-12-03 2010-06-16 深圳先进技术研究院 Interactive language learning system and method
CN103514765A (en) * 2013-10-28 2014-01-15 苏州市思玛特电力科技有限公司 Language teaching assessment method
CN103559894A (en) * 2013-11-08 2014-02-05 安徽科大讯飞信息科技股份有限公司 Method and system for evaluating spoken language
CN203773766U (en) * 2014-04-10 2014-08-13 滕坊坪 Language learning machine
CN105825852A (en) * 2016-05-23 2016-08-03 渤海大学 Oral English reading test scoring method
CN106971647A (en) * 2017-02-07 2017-07-21 广东小天才科技有限公司 Spoken language training method and system combining body language
CN107067834A (en) * 2017-03-17 2017-08-18 麦片科技(深圳)有限公司 Point-of-reading system with oral evaluation function

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李超雷 (Li Chaolei): Doctoral Dissertation, Graduate University of the Chinese Academy of Sciences, 30 October 2013 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111081080A (en) * 2019-05-29 2020-04-28 广东小天才科技有限公司 Voice detection method and learning device
CN110534100A (en) * 2019-08-27 2019-12-03 北京海天瑞声科技股份有限公司 A kind of Chinese speech proofreading method and device based on speech recognition
CN110910687A (en) * 2019-12-04 2020-03-24 深圳追一科技有限公司 Teaching method and device based on voice information, electronic equipment and storage medium
CN112767932A (en) * 2020-12-11 2021-05-07 北京百家科技集团有限公司 Voice evaluation system, method, device, equipment and computer readable storage medium
CN113053409A (en) * 2021-03-12 2021-06-29 科大讯飞股份有限公司 Audio evaluation method and device
CN113053409B (en) * 2021-03-12 2024-04-12 科大讯飞股份有限公司 Audio evaluation method and device
CN113192494A (en) * 2021-04-15 2021-07-30 辽宁石油化工大学 Intelligent English language identification and output system and method

Also Published As

Publication number Publication date
WO2019075828A1 (en) 2019-04-25
CN109697988B (en) 2021-05-14

Similar Documents

Publication Publication Date Title
O’Brien et al. Directions for the future of technology in pronunciation research and teaching
Agarwal et al. A review of tools and techniques for computer aided pronunciation training (CAPT) in English
CN109801193B (en) Follow-up teaching system with voice evaluation function
CN109697988A (en) A kind of Speech Assessment Methods and device
US6397185B1 (en) Language independent suprasegmental pronunciation tutoring system and methods
CN102360543B (en) HMM-based bilingual (mandarin-english) TTS techniques
Weinberger et al. The Speech Accent Archive: towards a typology of English accents
US9449522B2 (en) Systems and methods for evaluating difficulty of spoken text
JP4391109B2 (en) Automatic Pronunciation Symbol Labeling Method and Automatic Pronunciation Symbol Labeling System for Pronunciation Correction
CN109858038A (en) A kind of text punctuate determines method and device
Cucchiarini et al. Second language learners' spoken discourse: Practice and corrective feedback through automatic speech recognition
CN109697975B (en) Voice evaluation method and device
Matusevych et al. Evaluating computational models of infant phonetic learning across languages
CN104700831B (en) The method and apparatus for analyzing the phonetic feature of audio file
CN110647613A (en) Courseware construction method, courseware construction device, courseware construction server and storage medium
Ai Automatic pronunciation error detection and feedback generation for call applications
Larabi-Marie-Sainte et al. A new framework for Arabic recitation using speech recognition and the Jaro Winkler algorithm
Marujo et al. Porting REAP to European Portuguese.
Lounis et al. Mispronunciation detection and diagnosis using deep neural networks: a systematic review
Dielen Improving the Automatic Speech Recognition Model Whisper with Voice Activity Detection
Dong et al. The application of big data to improve pronunciation and intonation evaluation in foreign language learning
Lobanov et al. On a way to the computer aided speech intonation training
Yuwan et al. Automatic extraction phonetically rich and balanced verses for speaker-dependent quranic speech recognition system
Zhang et al. Cognitive state classification in a spoken tutorial dialogue system
Nakamura et al. Objective evaluation of English learners' timing control based on a measure reflecting perceptual characteristics

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 518000 Jianda Industrial Park, Xin'an Street, Baoan District, Shenzhen City, Guangdong Province, 202B, 2nd floor, 1 building

Applicant after: Shenzhen Yingshuo Education Service Co.,Ltd.

Address before: 518100 Guangdong city of Shenzhen province Baoan District Xin'an three industrial zone 1 road Cantor Fitzgerald building two floor 202B

Applicant before: SHENZHEN YINGSHUO AUDIO TECHNOLOGY Co.,Ltd.

CB02 Change of applicant information
CB02 Change of applicant information

Address after: 518000 Jianda Industrial Park, Xin'an Street, Baoan District, Shenzhen City, Guangdong Province, 202B, 2nd floor, 1 building

Applicant after: Shenzhen YINGSHUO Education Service Co.,Ltd.

Address before: 518000 Jianda Industrial Park, Xin'an Street, Baoan District, Shenzhen City, Guangdong Province, 202B, 2nd floor, 1 building

Applicant before: Shenzhen Yingshuo Education Service Co.,Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 301, building D, Hongwei Industrial Zone, No.6 Liuxian 3rd road, Xingdong community, Xin'an street, Bao'an District, Shenzhen City, Guangdong Province

Patentee after: Shenzhen Yingshuo Intelligent Technology Co.,Ltd.

Address before: 518000 202b, 2nd floor, building 1, Jianda Industrial Park, Xin'an street, Bao'an District, Shenzhen City, Guangdong Province

Patentee before: Shenzhen YINGSHUO Education Service Co.,Ltd.

CP03 Change of name, title or address