CN109697988B - Voice evaluation method and device - Google Patents


Info

Publication number
CN109697988B
CN109697988B CN201710996819.1A
Authority
CN
China
Prior art keywords
voice
user
speech
evaluation
unit sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710996819.1A
Other languages
Chinese (zh)
Other versions
CN109697988A (en)
Inventor
卢炀
宾晓皎
李明
蔡泽鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yingshuo Intelligent Technology Co ltd
Original Assignee
Shenzhen Eaglesoul Education Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Eaglesoul Education Service Co Ltd
Priority to CN201710996819.1A (CN109697988B)
Priority to PCT/CN2017/111822 (WO2019075828A1)
Publication of CN109697988A
Application granted
Publication of CN109697988B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B 5/00 Electrically-operated educational appliances
    • G09B 5/04 Electrically-operated educational appliances with audible presentation of the material to be studied
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques specially adapted for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention provides a voice evaluation method for evaluating a user's pronunciation during language learning, comprising the following steps: step S101, acquiring the user's voice input through a recording device of a voice evaluation apparatus; step S102, dividing the recorded voice into basic voice units to obtain a voice unit sequence of the recorded voice; step S103, extracting features from the voice unit sequence to obtain its prosodic features; step S104, comparing the extracted prosodic features with the teaching example voice and with the standard voice predicted by a voice prediction model, respectively; and step S105, marking the voice comparison results on the user's voice text.

Description

Voice evaluation method and device
Technical Field
The invention relates to the technical field of multimedia teaching, in particular to a voice evaluation method and device for multimedia teaching.
Background
As a communication tool, language occupies a very important place in life and work, and spoken-language learning matters both for students at school and for people at work. Because network teaching is not constrained by time or place, it has become increasingly popular, and many users prefer to use their leisure time to learn languages online.
In current network teaching, pronunciation practice usually takes one of three forms: the video (or audio) leaves free time after a piece of voice is played and the user practices follow-up reading; or a recording mode is adopted, in which the recording is played back to the student after follow-up reading and the student self-evaluates whether the pronunciation is accurate; or a teacher teaches online and gives guidance and suggestions on the student's pronunciation. The existing modes either cannot give targeted guidance according to the student's pronunciation, so the learning effect is poor, or require online teaching by a teacher, which consumes a large amount of manpower, material, and financial resources.
To solve the above problem, it has been proposed to evaluate a learner's voice based on a voice prediction model. CN101197084A discloses an automatic spoken-English evaluation and learning system whose spoken-pronunciation detection part comprises the following steps. (1) Establishing a standard-pronunciation corpus: 1) searching for standard English speakers; 2) designing a first recording text according to the spoken-English learning requirement and the phoneme-balance principle; 3) recording the standard speakers against the recording text. (2) Collecting the spoken-language evaluation corpus: in a simulated English-learning software environment, designing a second recording text according to the English learning requirement, searching for ordinary speakers, and recording their spoken pronunciation. (3) Labeling the spoken-language evaluation corpus: an expert labels in detail whether the pronunciation of each phoneme in each word is correct. (4) Establishing a standard-voice acoustic model: training an acoustic model of standard voice based on the recordings in the standard-pronunciation corpus and their associated texts. (5) Calculating error-detection parameters of the speech: 1) extracting Mel cepstrum coefficient parameters of the voice; 2) based on the standard acoustic model, the ordinary-speaker recordings in the evaluation corpus, and the phoneme sequence corresponding to their text, automatically cutting the ordinary-speaker voice data into segments taking phonemes as units, and calculating, based on the standard model, a first likelihood value of each segment being the expected phoneme; 3) recognizing each segment of the ordinary speaker's voice with the standard acoustic model, and calculating a second likelihood value of the segment being the recognized phoneme; 4) dividing the first likelihood value by the second likelihood value to obtain the likelihood ratio of the segment, which is used as its error-detection parameter. (6) Establishing an error-detection mapping model from error-detection parameters to expert pronunciation-error labels: on a batch of evaluation voices, correlating each segment's evaluation parameters and formant sequence with the expert's detailed labels, obtaining the correspondence between the parameters and the labels by statistical methods, and storing the relation as the error-detection mapping model.
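The likelihood-ratio error-detection parameter described in this prior-art system can be sketched in a few lines of Python; the segment structure and likelihood values below are hypothetical placeholders for illustration, not data from the patent:

```python
import math

def gop_scores(segments):
    """Compute a likelihood-ratio error-detection parameter per phone segment.

    Each segment carries two likelihoods from the standard acoustic model
    (illustrative names, not from the patent):
      - forced:     likelihood of the segment given the expected phoneme
      - recognized: likelihood of the segment given the best recognized phoneme
    The ratio forced/recognized -- their difference in the log domain -- is the
    error-detection parameter: near 0 suggests a correct pronunciation,
    strongly negative suggests a likely pronunciation error.
    """
    scores = {}
    for seg in segments:
        scores[seg["phone"]] = math.log(seg["forced"]) - math.log(seg["recognized"])
    return scores

segments = [
    {"phone": "ae", "forced": 0.80, "recognized": 0.82},  # close match
    {"phone": "th", "forced": 0.10, "recognized": 0.70},  # likely error
]
print(gop_scores(segments))
```

In a real system the two likelihoods would come from forced alignment and free phone recognition against the trained acoustic model; the mapping from this ratio to an expert-style error label is what step (6) learns statistically.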
CN101650886A discloses a method for automatically detecting reading errors of language learners, comprising the following steps: 1) front-end processing: preprocessing the input voice and extracting features, the extracted features being MFCC feature vectors; 2) constructing a simplified search space: taking the content to be read by the user as a reference answer and constructing a simplified search space from the reference answer, a pronunciation dictionary, a multi-pronunciation model, and an acoustic model; 3) constructing a reading language model: constructing the user's reading language model from the reference answer, the language model describing the context content and probability information that the user may read when reading the reference sentence; 4) searching: in the search space, searching according to the acoustic model, the reading language model, and the multi-pronunciation model for the path that best matches the input feature-vector stream, and taking that path as the content actually read by the user, i.e., the recognition result sequence; 5) alignment: aligning the reference answer with the recognition result to obtain the detection results for the user's extra reading, missed reading, and misreading.
In the prior art, a voice recognition system acquires the voice segment corresponding to each basic voice unit in a voice signal, fuses the acquired segments into an effective voice segment sequence, extracts evaluation features from that sequence, loads the scoring prediction model corresponding to each feature type, calculates the similarity of the evaluation features to the scoring prediction model, and takes the similarity as the score of the voice signal. However, when actually learning a language, a user often learns pronunciation from a teacher's voice example in a teaching video (or audio), and because of the teacher's individual characteristics, the example cannot be completely consistent with the standard pronunciation predicted by a voice prediction model. When the user's pronunciation is evaluated only by the voice prediction model, the predicted standard pronunciation often differs from the teaching example in some aspects (such as tone and rhythm), so the evaluation result reflects only the comparison between the user's voice and the predicted voice and cannot truly reflect the comparison between the user's voice and the teaching example.
Therefore, it is necessary to provide a speech evaluation method that gives both an evaluation result produced by a speech prediction model and an evaluation result compared with the teaching speech example, so that a user can comprehensively understand his or her learning situation.
Disclosure of Invention
Therefore, the technical problem to be solved by the invention is how to simultaneously provide the user, during language learning, with an evaluation result compared against the teaching example voice and an evaluation result compared against the standard voice predicted by a voice prediction model, so as to help the user comprehensively understand his or her own learning situation.
Therefore, the invention provides a voice evaluation method for evaluating a user's pronunciation during language learning, characterized by comprising:
step S101, acquiring voice input of a user through a recording device of a voice evaluation device;
step S102, performing basic voice unit division on the recorded voice to obtain a voice unit sequence of the recorded voice;
step S103, extracting features from the voice unit sequence to obtain the prosodic features of the voice unit sequence;
step S104, comparing the extracted prosodic features with the teaching example voice and with the standard voice predicted by a voice prediction model, respectively;
and step S105, marking the voice comparison result on the voice text of the user.
The basic speech unit may be a syllable, a phoneme, etc., and the basic speech unit and the speech unit sequence of the recorded speech are obtained by dividing the recorded speech.
The prosodic features comprise rhythm features and syllable features. The rhythm features comprise, for each basic voice unit, its boundary features, pronunciation duration, and the pause time to the adjacent basic voice unit, together with the pronunciation duration of the whole voice unit sequence; the syllable features comprise the pronunciation of each basic voice unit and the pronunciation of the whole voice unit sequence.
The process of performing comparative analysis with the teaching example speech includes:
acquiring teaching example voice stored in a system;
dividing basic voice units of the teaching example voice to obtain basic voice units and voice unit sequences of the teaching example voice;
extracting the prosodic features of the teaching voice unit sequence, which correspond to the prosodic features of the user's voice unit sequence;
and comparing the prosodic features of the user's voice unit sequence with those of the teaching voice unit sequence, and giving a corresponding evaluation result.
The process of evaluating the voice by using the voice prediction model comprises the following steps:
dividing the recorded user voice into basic voice units and extracting the corresponding prosodic features to be evaluated from the voice unit sequence;
loading the corresponding prediction model for each prosodic feature and predicting the corresponding standard pronunciation;
and comparing the prosodic features of the user's voice with those of the standard pronunciation to obtain a corresponding evaluation result.
The process for labeling the voice comparison result specifically comprises the following steps:
converting the recorded user voice into a voice text;
and respectively marking the obtained evaluation result of comparison with the teaching example voice and the evaluation result of comparison with the standard voice predicted by the voice prediction model on the voice text in a visual manner, and displaying them to the user.
The invention also provides a voice evaluation device, which comprises a recording module, a storage module, a voice processing module, a feature extraction module, a voice analysis module, an evaluation module, a labeling module and a display module, and is characterized in that:
the recording module is used for acquiring voice input of a user;
the voice processing module is used for dividing the recorded voice into basic voice units to obtain a voice unit sequence of the recorded voice;
the feature extraction module is used for extracting features from the voice unit sequence to obtain the prosodic features of the voice unit sequence;
the voice analysis module is used for comparing the extracted prosodic features with the teaching example voice and with the standard voice predicted by the voice prediction model, respectively;
and the marking module is used for marking the voice evaluation result on the voice text of the user.
The voice evaluation device also comprises a display module which is used for displaying the user voice text with the voice evaluation result label to the user.
The voice evaluation method and device provided by the invention simultaneously give the user the evaluation result of comparing the user's voice with the teaching example voice and the evaluation result of comparing it with the standard voice predicted by the voice prediction model, so that the user can fully understand his or her own pronunciation and improve its accuracy.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments of the present invention will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the contents of the embodiments of the present invention and the drawings without creative efforts.
FIG. 1 is a flow diagram of a speech evaluation method according to an embodiment of the invention; and
fig. 2 is a structural diagram of a voice evaluation apparatus according to an embodiment of the present invention.
Detailed Description
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure.
The "voice evaluation device" herein is a "computer device", i.e., an intelligent electronic device that can execute predetermined processing such as numerical and/or logical calculation by running a predetermined program or instruction. It may comprise a processor and a memory, with the processor executing pre-stored instructions in the memory to perform the predetermined processing; alternatively, the predetermined processing may be performed by hardware such as an ASIC, FPGA, or DSP, or a combination thereof.
The computer device comprises user equipment and/or network equipment. The user equipment includes but is not limited to computers, smart phones, PDAs, etc.; the network equipment includes but is not limited to a single network server, a server group consisting of a plurality of network servers, or a cloud based on cloud computing and consisting of a large number of computers or network servers, where cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The computer device can operate alone to implement the invention, or can access a network and implement the invention through interoperation with other computer devices in the network. The network in which the computer device is located includes but is not limited to the internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.
Those skilled in the art should understand that the "voice evaluation device" described in the present invention may be only a user equipment, that is, the user equipment performs corresponding operations; or the user equipment and the network equipment or the server are integrated to form the system, namely the user equipment and the network equipment are matched to execute corresponding operations.
It should be noted that the user equipment, the network equipment, the network, etc. are only examples; other existing or future computer devices or networks, insofar as they are applicable to the present invention, are also included within its scope of protection.
Here, it should be understood by those skilled in the art that the present invention can be applied to both mobile terminals and non-mobile terminals, for example, when a user uses a mobile phone or a PC, the method or the apparatus of the present invention can be used for providing and presenting.
Specific structural and functional details disclosed herein are merely representative and are provided for purposes of describing example embodiments of the present invention. The present invention may, however, be embodied in many alternate forms and should not be construed as limited to only the embodiments set forth herein.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be noted that, in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed substantially concurrently, or the figures may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
The present invention is described in further detail below with reference to the attached drawing figures.
FIG. 1 shows a flow chart of a speech assessment method of the present invention.
In step S101, during the spoken-language follow-up reading link of language learning, the user's voice input is recorded through the recording device of the voice evaluation apparatus.
Specifically, after learning the voice examples in the teaching courseware, the user enters a follow-up reading link, and at the moment, the recording equipment in the voice evaluation device is triggered to enter a recording state. When the user starts to follow the voice example, the recording device starts to record the voice of the user, and the follow-up voice of the user is stored in a storage module of the voice evaluation device for further analysis and use.
In step S102, the user follow-up reading voice recorded in the storage module is acquired, and basic voice unit division is performed on the recorded voice to obtain a voice unit sequence of the recorded user follow-up reading voice.
The basic speech unit may be a syllable, a phoneme, etc., and the basic speech unit and the speech unit sequence of the recorded speech are obtained by dividing the recorded speech.
Different speech recognition systems decode the speech signal based on different acoustic features, such as acoustic models based on MFCC (Mel-Frequency Cepstral Coefficient) features or on PLP (Perceptual Linear Prediction) features; or using different acoustic models, such as HMM-GMM (Hidden Markov Model-Gaussian Mixture Model) or DBN (Dynamic Bayesian Network); or using different decoding methods, such as Viterbi search or A* search.
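As an illustration of such a front-end, the sketch below implements only the first, model-agnostic stage shared by MFCC- and PLP-style pipelines (framing, Hamming windowing, per-frame log energy); the full cepstral computation is omitted, and the frame and hop sizes are common defaults, not values from the patent:

```python
import math

def frame_log_energy(samples, rate=16000, frame_ms=25, hop_ms=10):
    """Split a waveform into overlapping frames and compute per-frame log
    energy -- the shared first stage of MFCC/PLP feature extraction."""
    frame_len = int(rate * frame_ms / 1000)
    hop = int(rate * hop_ms / 1000)
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Hamming window reduces spectral leakage at the frame edges
        windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1)))
                    for i, x in enumerate(frame)]
        energy = sum(x * x for x in windowed)
        feats.append(math.log(energy + 1e-10))  # guard against log(0) in silence
    return feats

# 100 ms of a 440 Hz tone as a stand-in for recorded speech
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
print(len(frame_log_energy(tone)))  # → 8 frames
```

A production system would continue with an FFT, Mel filterbank, and DCT to obtain the MFCC vectors the text mentions.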
In step S103, features are extracted from the voice unit sequence to obtain the prosodic features of the voice unit sequence.
The prosodic features comprise rhythm features and syllable features. The rhythm features comprise, for each basic voice unit, its boundary features, pronunciation duration, and the pause time to the adjacent basic voice unit, together with the pronunciation duration of the whole voice unit sequence; the syllable features comprise the pronunciation of each basic voice unit and the pronunciation of the whole voice unit sequence.
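A minimal sketch of deriving these rhythm features from a time-aligned unit sequence; the (label, start, end) alignment format is an assumption for illustration, not a structure defined by the patent:

```python
def prosodic_features(units):
    """Derive the rhythm features named in the text from a time-aligned
    unit sequence: per-unit pronunciation duration, pause time between
    adjacent units, and the duration of the whole sequence.
    `units` is a list of (label, start_s, end_s) tuples."""
    durations = {u: round(e - s, 3) for u, s, e in units}
    pauses = [round(units[i + 1][1] - units[i][2], 3)   # gap to the next unit
              for i in range(len(units) - 1)]
    total = round(units[-1][2] - units[0][1], 3)
    return {"durations": durations, "pauses": pauses, "total": total}

aligned = [("hel", 0.00, 0.30), ("lo", 0.35, 0.60), ("world", 0.70, 1.20)]
print(prosodic_features(aligned))
```

The boundary and syllable (pronunciation) features would come from the acoustic model itself; only the timing-derived features are shown here.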
In step S104, the extracted prosodic features are compared with the teaching example speech and with the standard speech predicted by the speech prediction model, respectively.
The process of comparing with the teaching example voice is as follows: the teaching example voice stored in the system is acquired and divided into basic voice units to obtain the basic voice units and the voice unit sequence of the teaching example voice, and the prosodic features of the teaching voice unit sequence are then extracted, corresponding to the prosodic features of the user's voice unit sequence. The prosodic features of the user's voice unit sequence are compared with those of the teaching voice unit sequence, and a corresponding evaluation result is given.
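The comparison of user and teaching prosodic features could, for example, score per-unit duration deviations; the scoring rule and tolerance below are illustrative assumptions, not values specified by the patent:

```python
def compare_prosody(user, teacher, tolerance=0.25):
    """Score each unit by how far the learner's duration deviates from the
    teaching example's; a relative deviation within `tolerance` counts as
    a match. Both inputs map unit label -> duration in seconds."""
    results = {}
    for unit, t_dur in teacher.items():
        u_dur = user.get(unit)
        if u_dur is None:
            results[unit] = "missing"          # unit not found in user speech
        else:
            rel = abs(u_dur - t_dur) / t_dur   # relative duration deviation
            results[unit] = "good" if rel <= tolerance else "off-tempo"
    return results

teacher = {"hel": 0.30, "lo": 0.25, "world": 0.50}
user = {"hel": 0.32, "lo": 0.45, "world": 0.50}
print(compare_prosody(user, teacher))
```

A fuller implementation would compare the other prosodic features (pauses, boundaries, syllable pronunciation) the same way and aggregate them into the evaluation result.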
The voice evaluation using the voice prediction model may adopt existing voice evaluation technology: the recorded user voice is divided into basic voice units, the corresponding prosodic features to be evaluated are extracted from the voice unit sequence, the corresponding prediction model is loaded for each prosodic feature to predict the corresponding standard pronunciation, and the prosodic features of the user's voice are then compared with those of the standard pronunciation to obtain the corresponding evaluation result.
And step S105, marking the voice comparison result on the voice text of the user and providing the voice comparison result for the user.
In this step, the recorded user voice is further converted into a voice text by a voice processing module. The evaluation result of comparison with the teaching example voice and the evaluation result of comparison with the standard voice predicted by the voice prediction model, both obtained in step S104, are respectively marked on the voice text in a visual manner and displayed to the user. From the displayed results the user can see both the difference between his or her pronunciation and the teaching example and the difference between it and the standard speech predicted by the speech prediction model, and can thus comprehensively understand the problems in the pronunciation of the read text and make the pronunciation more standard. The comparison results may include the pronunciation evaluation of each basic voice unit, the pronunciation-duration evaluation of each basic voice unit, a full-text fluency evaluation, and the like.
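The visual marking of both evaluation results on the voice text might look like the following sketch; the bracketed marker scheme is a hypothetical stand-in for an actual colour-coded display:

```python
def annotate_text(words, example_eval, model_eval):
    """Attach both evaluation results to each word of the speech text,
    e.g. for a colour-coded display. '+' marks an acceptable pronunciation,
    '-' a flagged one; inputs map word -> bool (True = acceptable)."""
    marked = []
    for w in words:
        ex = "+" if example_eval.get(w, False) else "-"  # vs. teaching example
        md = "+" if model_eval.get(w, False) else "-"    # vs. predicted standard
        marked.append(f"{w}[example:{ex}|model:{md}]")
    return " ".join(marked)

words = ["hello", "world"]
print(annotate_text(words,
                    example_eval={"hello": True, "world": False},
                    model_eval={"hello": True, "world": True}))
```

Showing the two verdicts side by side is the point of the method: a word can match the predicted standard voice while still deviating from the teaching example, or vice versa.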
Fig. 2 shows a speech evaluation device according to an embodiment of the invention. The voice evaluation device is used for realizing the voice evaluation method, and simultaneously providing the evaluation result of the teaching example voice and the evaluation result of the standard voice predicted by the voice prediction model to the user after the user reads the following spoken language. The voice evaluation device comprises a recording module 1, a storage module 2, a voice processing module 3, a feature extraction module 4, a voice analysis module 5, a labeling module 6 and a display module 7.
In the following reading link of the spoken language for language learning, the user records the voice input of the user through the recording module 1 of the voice evaluation device.
Specifically, after learning the voice examples in the teaching courseware, the user enters a follow-up reading link and triggers the recording module 1 in the voice evaluation device to enter a recording state. When the user starts to follow the voice example, the recording module 1 starts to record the voice of the user, and stores the follow-up voice of the user in the storage module 2 of the voice evaluation device for further analysis.
The voice processing module 3 obtains the user follow-up reading voice recorded in the storage module 2, and performs basic voice unit division on the recorded voice.
The basic speech unit may be a syllable, a phoneme, etc., and the basic speech unit and the speech unit sequence of the recorded speech are obtained by dividing the recorded speech.
After the voice processing module 3 divides the recorded voice into basic voice units, the feature extraction module 4 further extracts features from the generated voice unit sequence to obtain its prosodic features.
The prosodic features comprise rhythm features and syllable features. The rhythm features comprise, for each basic voice unit, its boundary features, pronunciation duration, and the pause time to the adjacent basic voice unit, together with the pronunciation duration of the whole voice unit sequence; the syllable features comprise the pronunciation of each basic voice unit and the pronunciation of the whole voice unit sequence.
The voice analysis module 5 compares the extracted prosodic features with the teaching example voice and with the standard voice predicted by the voice prediction model, respectively.
The process of comparing with the teaching example voice is as follows: the voice analysis module 5 acquires the teaching example voice stored in the storage module 2 and divides it into basic voice units to obtain the basic voice units and the voice unit sequence of the teaching example voice, and then extracts the prosodic features of the teaching voice unit sequence, corresponding to the prosodic features of the user's voice unit sequence. The prosodic features of the user's voice unit sequence are compared with those of the teaching voice unit sequence, and a corresponding evaluation result is given.
The voice evaluation using the voice prediction model may adopt existing voice evaluation technology: the recorded user voice is divided into basic voice units, the corresponding prosodic features to be evaluated are extracted from the voice unit sequence, the corresponding prediction model is loaded for each prosodic feature to predict the corresponding standard pronunciation, and the prosodic features of the user's voice are then compared with those of the standard pronunciation to obtain the corresponding evaluation result.
The marking module 6 marks the voice comparison results on the user's voice text and provides them to the user through the display module 7.
Specifically, the recorded user voice is further converted into a voice text by the voice processing module 3. The evaluation result of comparison with the teaching example voice and the evaluation result of comparison with the standard voice predicted by the voice prediction model, both obtained by the voice analysis module 5, are respectively marked on the voice text in a visual manner and displayed to the user through the display module 7. From the displayed results the user can see both the difference between his or her pronunciation and the teaching example and the difference between it and the standard speech predicted by the speech prediction model, and can thus comprehensively understand the problems in the pronunciation of the read text and make the pronunciation more standard. The comparison results may include the pronunciation evaluation of each basic voice unit, the pronunciation-duration evaluation of each basic voice unit, a full-text fluency evaluation, and the like.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware as instructed by a computer program, which may be stored on a computer readable storage medium and executed by a processor. The computer-readable storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
The foregoing describes preferred embodiments of the present invention. They are intended to illustrate the invention, not to limit it; all modifications, substitutions, and alterations falling within the spirit and scope of the invention as defined by the appended claims are included.

Claims (9)

1. A speech evaluation method for evaluating a user's speech pronunciation during language learning, comprising:
step S101, in a spoken-language follow-up reading link of language learning, obtaining the user's voice input through a recording device of a voice evaluation device; after learning the voice example in the teaching courseware, the user enters the follow-up reading link, at which point the recording device in the voice evaluation device is triggered into a recording state; when the user starts to follow-read the voice example, the recording device starts to record the user's voice, and the user's follow-up reading voice is stored in a storage module of the voice evaluation device;
step S102, acquiring the user follow-up reading voice recorded in the storage module, and dividing the recorded voice into basic voice units to obtain a voice unit sequence of the recorded voice;
step S103, performing feature extraction on the voice unit sequence to obtain the prosodic features of the voice unit sequence;
step S104, comparing and analyzing the extracted prosodic features with the teaching example voice and with the standard voice predicted by a voice prediction model, respectively;
step S105, marking a voice comparison result on a user voice text;
in step S104, the process of performing comparative analysis with the teaching example speech includes:
acquiring teaching example voice stored in a system;
dividing basic voice units of the teaching example voice to obtain basic voice units and voice unit sequences of the teaching example voice;
extracting the voice rhythm characteristics of the teaching voice unit sequence, wherein the voice rhythm characteristics of the teaching voice unit sequence correspond to the voice rhythm characteristics of the user voice unit sequence;
comparing the voice rhythm characteristics of the user voice unit sequence with the voice rhythm characteristics of the teaching voice unit sequence to give a corresponding evaluation result;
the process of comparing and analyzing the standard speech predicted by the speech prediction model comprises the following steps:
dividing the recorded user voice into basic voice units, and extracting the corresponding to-be-evaluated prosodic features from the voice unit sequence;
loading the corresponding prediction model for each prosodic feature, and predicting the corresponding standard voice;
comparing the voice rhythm characteristics of the user voice with the voice rhythm characteristics of the standard voice to obtain corresponding evaluation results;
the process for labeling the voice comparison result specifically comprises the following steps:
converting the recorded user voice into a voice text through a voice processing module;
respectively marking the obtained evaluation result of the teaching example voice comparison and the evaluation result of the standard voice comparison predicted by the voice prediction model on the voice text in a visual mode, and displaying the evaluation results to a user;
the user learns, from the displayed evaluation results, the difference between his or her pronunciation and that of the teaching example voice, and the difference from the standard voice predicted by the voice prediction model.
2. The speech evaluation method according to claim 1, characterized in that:
the basic voice unit is syllable and phoneme, and the basic voice unit and the voice unit sequence of the recorded voice are obtained by dividing the recorded voice.
3. The speech evaluation method according to claim 1, characterized in that:
the prosody features comprise boundary features, pronunciation duration, pause time between adjacent basic voice units and pronunciation duration of the whole voice unit sequence;
the syllable characteristics include the pronunciation of each basic speech unit and the pronunciation of the entire sequence of speech units.
4. A voice evaluation device, comprising a recording module, a storage module, a voice processing module, a feature extraction module, a voice analysis module and a marking module, characterized in that:
the recording module is used for acquiring the user's voice input in the spoken-language follow-up reading link of the user's language learning; after learning the voice example in the teaching courseware, the user enters the follow-up reading link, at which point the recording device in the voice evaluation device is triggered into a recording state; when the user starts to follow-read the voice example, the recording device starts to record the user's voice, and the user's follow-up reading voice is stored in the storage module of the voice evaluation device;
the voice processing module is used for acquiring the user follow-up reading voice recorded in the storage module, and performing basic voice unit division on the recorded voice to obtain a voice unit sequence of the recorded voice;
the feature extraction module is used for performing feature extraction on the voice unit sequence to obtain the prosodic features of the voice unit sequence;
the voice analysis module is used for comparing and analyzing the extracted prosodic features with the teaching example voice and with the standard voice predicted by the voice prediction model, respectively;
the marking module is used for marking the voice evaluation result on the voice text of the user;
wherein, in the voice analysis module, the process of performing comparative analysis with the teaching example voice comprises:
acquiring teaching example voice stored in a system;
dividing basic voice units of the teaching example voice to obtain basic voice units and voice unit sequences of the teaching example voice;
extracting the voice rhythm characteristics of the teaching voice unit sequence, wherein the voice rhythm characteristics of the teaching voice unit sequence correspond to the voice rhythm characteristics of the user voice unit sequence;
comparing the voice rhythm characteristics of the user voice unit sequence with the voice rhythm characteristics of the teaching voice unit sequence to give a corresponding evaluation result;
the process of comparing and analyzing the standard speech predicted by the speech prediction model comprises the following steps:
dividing the recorded user voice into basic voice units, and extracting the corresponding to-be-evaluated prosodic features from the voice unit sequence;
loading the corresponding prediction model for each prosodic feature, and predicting the corresponding standard voice;
comparing the voice rhythm characteristics of the user voice with the voice rhythm characteristics of the standard voice to obtain corresponding evaluation results;
the process for labeling the voice comparison result specifically comprises the following steps:
converting the recorded user voice into a voice text through a voice processing module;
respectively marking the obtained evaluation result of the teaching example voice comparison and the evaluation result of the standard voice comparison predicted by the voice prediction model on the voice text in a visual mode, and displaying the evaluation results to a user;
the user learns, from the displayed evaluation results, the difference between his or her pronunciation and that of the teaching example voice, and the difference from the standard voice predicted by the voice prediction model.
5. The speech evaluation device according to claim 4, characterized in that:
the basic voice unit is syllable and phoneme, and the basic voice unit and the voice unit sequence of the recorded voice are obtained by dividing the recorded voice.
6. The speech evaluation device according to claim 4, characterized in that:
the rhythm characteristics comprise rhythm characteristics and syllable characteristics, wherein the rhythm characteristics comprise boundary characteristics, pronunciation duration, pause time between adjacent basic voice units and pronunciation duration of the whole voice unit sequence of each basic voice unit, and the syllable characteristics comprise pronunciation of each basic voice unit and pronunciation of the whole voice unit sequence.
7. The speech evaluation device according to claim 4, characterized in that:
the voice evaluation device also comprises a display module which is used for displaying the user voice text with the voice evaluation result label to the user.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method steps of any of claims 1-3 when executing the program.
9. A computer storage medium storing a program executable by a computer, the program when executed implementing the method steps of any of claims 1-3.
CN201710996819.1A 2017-10-20 2017-10-20 Voice evaluation method and device Active CN109697988B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710996819.1A CN109697988B (en) 2017-10-20 2017-10-20 Voice evaluation method and device
PCT/CN2017/111822 WO2019075828A1 (en) 2017-10-20 2017-11-20 Voice evaluation method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710996819.1A CN109697988B (en) 2017-10-20 2017-10-20 Voice evaluation method and device

Publications (2)

Publication Number Publication Date
CN109697988A CN109697988A (en) 2019-04-30
CN109697988B true CN109697988B (en) 2021-05-14

Family

ID=66172985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710996819.1A Active CN109697988B (en) 2017-10-20 2017-10-20 Voice evaluation method and device

Country Status (2)

Country Link
CN (1) CN109697988B (en)
WO (1) WO2019075828A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111081080B (en) * 2019-05-29 2022-05-03 广东小天才科技有限公司 Voice detection method and learning device
CN110534100A (en) * 2019-08-27 2019-12-03 北京海天瑞声科技股份有限公司 A kind of Chinese speech proofreading method and device based on speech recognition
CN110910687A (en) * 2019-12-04 2020-03-24 深圳追一科技有限公司 Teaching method and device based on voice information, electronic equipment and storage medium
CN112767932A (en) * 2020-12-11 2021-05-07 北京百家科技集团有限公司 Voice evaluation system, method, device, equipment and computer readable storage medium
CN113053409B (en) * 2021-03-12 2024-04-12 科大讯飞股份有限公司 Audio evaluation method and device
CN113192494A (en) * 2021-04-15 2021-07-30 辽宁石油化工大学 Intelligent English language identification and output system and method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101246685A (en) * 2008-03-17 2008-08-20 清华大学 Pronunciation quality evaluation method of computer auxiliary language learning system
CN101739870A (en) * 2009-12-03 2010-06-16 深圳先进技术研究院 Interactive language learning system and method
CN103514765A (en) * 2013-10-28 2014-01-15 苏州市思玛特电力科技有限公司 Language teaching assessment method
CN103559894A (en) * 2013-11-08 2014-02-05 安徽科大讯飞信息科技股份有限公司 Method and system for evaluating spoken language
CN203773766U (en) * 2014-04-10 2014-08-13 滕坊坪 Language learning machine
CN105825852A (en) * 2016-05-23 2016-08-03 渤海大学 Oral English reading test scoring method
CN106971647A (en) * 2017-02-07 2017-07-21 广东小天才科技有限公司 A kind of Oral Training method and system of combination body language
CN107067834A (en) * 2017-03-17 2017-08-18 麦片科技(深圳)有限公司 Point-of-reading system with oral evaluation function

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7219059B2 (en) * 2002-07-03 2007-05-15 Lucent Technologies Inc. Automatic pronunciation scoring for language learning
US20060057545A1 (en) * 2004-09-14 2006-03-16 Sensory, Incorporated Pronunciation training method and apparatus
CN100514446C (en) * 2004-09-16 2009-07-15 北京中科信利技术有限公司 Pronunciation evaluating method based on voice identification and voice analysis
US20150287339A1 (en) * 2014-04-04 2015-10-08 Xerox Corporation Methods and systems for imparting training
CN103928023B (en) * 2014-04-29 2017-04-05 广东外语外贸大学 A kind of speech assessment method and system
CN104732977B (en) * 2015-03-09 2018-05-11 广东外语外贸大学 A kind of online spoken language pronunciation quality evaluating method and system


Also Published As

Publication number Publication date
WO2019075828A1 (en) 2019-04-25
CN109697988A (en) 2019-04-30

Similar Documents

Publication Publication Date Title
CN109697988B (en) Voice evaluation method and device
CN105845134B (en) Spoken language evaluation method and system for freely reading question types
CN109256152A (en) Speech assessment method and device, electronic equipment, storage medium
CN108389573B (en) Language identification method and device, training method and device, medium and terminal
US9449522B2 (en) Systems and methods for evaluating difficulty of spoken text
CN111402862B (en) Speech recognition method, device, storage medium and equipment
CN109741732A (en) Name entity recognition method, name entity recognition device, equipment and medium
US9489864B2 (en) Systems and methods for an automated pronunciation assessment system for similar vowel pairs
CN103559892A (en) Method and system for evaluating spoken language
Schillingmann et al. AlignTool: The automatic temporal alignment of spoken utterances in German, Dutch, and British English for psycholinguistic purposes
CN109697975B (en) Voice evaluation method and device
CN110647613A (en) Courseware construction method, courseware construction device, courseware construction server and storage medium
CN111798871B (en) Session link identification method, device and equipment and storage medium
Bang et al. An automatic feedback system for English speaking integrating pronunciation and prosody assessments
WO2012152290A1 (en) A mobile device for literacy teaching
Bai Pronunciation Tutor for Deaf Children based on ASR
CN114420159A (en) Audio evaluation method and device and non-transient storage medium
Dong et al. The application of big data to improve pronunciation and intonation evaluation in foreign language learning
Wu et al. Efficient personalized mispronunciation detection of Taiwanese-accented English speech based on unsupervised model adaptation and dynamic sentence selection
CN111341346A (en) Language expression capability evaluation method and system for fusion depth language generation model
Mbogho et al. The impact of accents on automatic recognition of South African English speech: a preliminary investigation
Qin et al. Automatic Detection of Word-Level Reading Errors in Non-native English Speech Based on ASR Output
CN112951276B (en) Method and device for comprehensively evaluating voice and electronic equipment
CN114783412B (en) Spanish spoken language pronunciation training correction method and system
Marie-Sainte et al. A new system for Arabic recitation using speech recognition and Jaro Winkler algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 518000 Jianda Industrial Park, Xin'an Street, Baoan District, Shenzhen City, Guangdong Province, 202B, 2nd floor, 1 building

Applicant after: Shenzhen Yingshuo Education Service Co.,Ltd.

Address before: 518100 Guangdong city of Shenzhen province Baoan District Xin'an three industrial zone 1 road Cantor Fitzgerald building two floor 202B

Applicant before: SHENZHEN YINGSHUO AUDIO TECHNOLOGY Co.,Ltd.

CB02 Change of applicant information

Address after: 518000 Jianda Industrial Park, Xin'an Street, Baoan District, Shenzhen City, Guangdong Province, 202B, 2nd floor, 1 building

Applicant after: Shenzhen YINGSHUO Education Service Co.,Ltd.

Address before: 518000 Jianda Industrial Park, Xin'an Street, Baoan District, Shenzhen City, Guangdong Province, 202B, 2nd floor, 1 building

Applicant before: Shenzhen Yingshuo Education Service Co.,Ltd.

GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 301, building D, Hongwei Industrial Zone, No.6 Liuxian 3rd road, Xingdong community, Xin'an street, Bao'an District, Shenzhen City, Guangdong Province

Patentee after: Shenzhen Yingshuo Intelligent Technology Co.,Ltd.

Address before: 518000 202b, 2nd floor, building 1, Jianda Industrial Park, Xin'an street, Bao'an District, Shenzhen City, Guangdong Province

Patentee before: Shenzhen YINGSHUO Education Service Co.,Ltd.
