WO2020027394A1 - Apparatus and method for evaluating the pronunciation accuracy of a phoneme unit - Google Patents

Apparatus and method for evaluating the pronunciation accuracy of a phoneme unit

Info

Publication number
WO2020027394A1
WO2020027394A1 (PCT/KR2019/000147)
Authority
WO
WIPO (PCT)
Prior art keywords
score
time interval
unit
information
phoneme
Prior art date
Application number
PCT/KR2019/000147
Other languages
English (en)
Korean (ko)
Inventor
윤종성
권용대
홍연정
김서현
조영선
양형원
Original Assignee
미디어젠 주식회사
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 미디어젠 주식회사
Publication of WO2020027394A1

Links

Images

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/01Assessment or evaluation of speech recognition systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/10Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/14Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]

Definitions

  • The present invention relates to a phoneme-unit pronunciation accuracy evaluation apparatus and evaluation method. More particularly, it improves on the conventional automatic pronunciation evaluation device, which provides only an overall pronunciation evaluation score for the speech signal spoken in response to a given word or sentence, by providing a score for each phoneme (pronounced sound), the smallest unit of the speech signal, so that not only the overall pronunciation score but also the score for each phoneme can be fed back to the learner.
  • Pronunciation is the spoken sound of a language, and its characteristics differ according to the language and the individual speaker.
  • Even allowing for individual differences, the pronunciation of a given language must be produced in a way that enables accurate mutual communication.
  • Conventional pronunciation correction methods have the problem that they presuppose good listening ability on the part of the learner and are difficult to apply uniformly to diverse pronunciations.
  • A common way to learn foreign-language speaking and conversation is to attend a language school and learn directly from a native-speaking instructor.
  • Speech recognition systems are typically built on the hidden Markov model (HMM).
  • Such a speech recognition system extracts feature vectors, in frame units defined by the system, from a speech signal that has undergone preprocessing such as spectral subtraction, sound source separation, and noise filtering, and then processes the signal using the extracted feature vectors.
  • A first object of the present invention is to improve on the conventional automatic pronunciation evaluation apparatus, which provides only an overall pronunciation evaluation score for the speech signal spoken in response to a given word or sentence.
  • To this end, both a pronunciation evaluation score for each phoneme and the overall pronunciation evaluation score are provided.
  • The present invention thus aims to provide a phoneme-unit pronunciation accuracy evaluation apparatus and evaluation method capable of feeding back not only the overall pronunciation score but also the score for each phoneme (pronounced sound).
  • A second object of the present invention is to provide the score for each phoneme as a value between 0 and 100 points through a web page or mobile app that is easily accessible to the user.
  • To achieve these objects, the phoneme-unit pronunciation accuracy evaluation apparatus of the present invention comprises:
  • a voice information extraction unit (100) that acquires, from the learner, spoken text information and the voice information in which the learner pronounces that text, divides the acquired voice information into set time-interval units, and extracts a speech feature vector for each time interval;
  • a forced alignment unit (400) that forcibly aligns the spoken text information acquired by the voice information extraction unit to each time interval to generate forced alignment result information;
  • an adjustment score providing unit (700) that provides an adjustment score for each time interval to the score output unit according to whether the speech recognition result information of the speech recognition unit and the forced alignment result information of the forced alignment unit match in each time interval; and
  • a score output unit (800) that calculates an average score for each phoneme of the input voice information based on the adjustment scores for each time interval provided by the adjustment score providing unit, or calculates and outputs an overall average score for the input voice information.
  • a log likelihood calculation step (S400) of calculating the log likelihood for each time interval of the forced alignment result information, using the speech feature vector for each time interval extracted in the voice information extraction step and the forced alignment result information generated in the forced alignment step;
  • By improving on the conventional automatic pronunciation evaluation device, which provides only an overall pronunciation evaluation score for the speech signal spoken in response to a given word or sentence, the invention provides scores for each phoneme (pronunciation).
  • FIG. 1 is an overall configuration diagram schematically showing an apparatus for evaluating phoneme pronunciation accuracy according to a first embodiment of the present invention.
  • FIG. 2 is an exemplary waveform graph of a speech signal acquired by the phoneme-unit pronunciation accuracy evaluation apparatus according to the first embodiment of the present invention.
  • FIG. 3 is an exemplary diagram of per-phoneme average scores calculated by the phoneme-unit pronunciation accuracy evaluation apparatus according to the first embodiment of the present invention.
  • FIG. 4 is an exemplary view showing the per-phoneme average scores and the overall average score calculated by the phoneme-unit pronunciation accuracy evaluation apparatus according to the first embodiment of the present invention.
  • FIG. 5 is an overall flowchart of a phoneme-unit pronunciation accuracy evaluation method according to a first embodiment of the present invention.
  • The terms first and second may be used to describe various components, but the components are not limited by these terms.
  • For example, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component.
  • When a component is referred to as being connected or coupled to another component, it may be directly connected or coupled to that other component, or intervening components may be present.
  • a voice information extraction unit (100) that acquires, from the learner, spoken text information and the learner's voice information for that text, divides the acquired voice information into predetermined time-interval units, and extracts a speech feature vector for each time interval;
  • a forced alignment unit (400) that forcibly aligns the spoken text information acquired by the voice information extraction unit to each time interval to generate forced alignment result information;
  • a log likelihood score conversion unit (600) that converts the calculated log likelihood for each time interval of the forced alignment result information into a score between 0 and 100 points;
  • an adjustment score providing unit (700) that provides an adjustment score for each time interval to the score output unit according to whether the speech recognition result information of the speech recognition unit and the forced alignment result information of the forced alignment unit match in each time interval; and
  • a score output unit (800) that calculates an average score for each phoneme based on the adjustment scores for each time interval provided by the adjustment score providing unit, or calculates and outputs an overall average score for the input voice information.
  • The native speaker acoustic model information stored in the native speaker acoustic model storage unit (200) contains pronunciation characteristic information for each phoneme of native speakers, obtained by analyzing the native speakers' speaking speed, the length of silent sections between pronunciations, and the like, using a deep learning model.
  • The score output unit (800) treats the average score for each phoneme as a score value between 0 and 100 points.
  • The score output unit (800) outputs at least one of the per-phoneme average scores and the overall average score on the screen.
  • The time-interval unit is a time interval in the range of 1 msec to 20 msec, preferably 10 msec.
  • The log likelihood calculator (500) calculates the log likelihood for each time interval of the forced alignment result information, using the speech feature vector extracted by the voice information extraction unit (100) and the forced alignment result information generated by the forced alignment unit (400), according to the log likelihood equation below.
  • Using the log likelihood score conversion equation below, the log likelihood for each time interval of the forced alignment result information is converted into a score between 0 and 100 points.
  • For a time interval in which the speech recognition result information of the speech recognition unit matches the forced alignment result information of the forced alignment unit, the adjustment score of the corresponding interval is set to 100 and provided to the score output unit; for a time interval in which they do not match, the log likelihood conversion score of the corresponding interval, converted by the log likelihood score converter, is provided to the score output unit as the adjustment score.
  • a log likelihood calculation step (S400) of calculating the log likelihood for each time interval of the forced alignment result information, using the speech feature vector for each time interval extracted in the voice information extraction step and the forced alignment result information generated in the forced alignment step;
  • In the log likelihood calculation step, the log likelihood calculator (500) calculates the log likelihood for each time interval of the forced alignment result information according to the log likelihood equation below, using the forced alignment result information generated in the forced alignment step and the speech feature vector extracted for each time interval in the voice information extraction step.
  • In the log likelihood score conversion step (S500), the log likelihood score conversion unit (600) converts the log likelihood for each time interval of the forced alignment result information into a score between 0 and 100 points using the log likelihood score conversion equation below.
  • In the adjustment score providing step (S600), the adjustment score providing unit (700) sets the adjustment score of the corresponding interval to 100 for time intervals in which the speech recognition result information of the speech recognition unit matches the forced alignment result information, according to the log likelihood adjustment equation below, and provides it to the score output unit; for time intervals in which they do not match, the log likelihood conversion score of the corresponding interval, converted by the log likelihood score converter, is provided to the score output unit as the adjustment score.
  • FIG. 1 is an overall configuration diagram schematically showing an apparatus for evaluating phoneme pronunciation accuracy according to a first embodiment of the present invention.
  • The phoneme-unit pronunciation accuracy evaluation apparatus (1000) of the present invention improves on the conventional automatic pronunciation evaluation apparatus, which provides only an overall pronunciation evaluation score for the speech signal spoken in response to a given word or sentence.
  • By providing a score for each phoneme (pronounced sound), the smallest unit of the speech signal, it can feed back not only the overall pronunciation score but also the score for each phoneme, which enhances the learning effect.
  • That is, the related art, which provides only an overall pronunciation evaluation score for the input voice signal, is improved so as to provide a score for each phoneme, the smallest unit of the voice signal.
  • The minimal sound unit that distinguishes meaning is called a phoneme, and when learning a foreign language it is important to learn pronunciation at the phoneme level of that language.
  • Existing pronunciation evaluation score calculation methods have in common that they evaluate the input foreign-language voice signal as a whole, so the user receives only limited feedback because no score is provided for each phoneme.
  • With per-phoneme scores, the learning effect of pronunciation evaluation feedback is enhanced.
  • For example, for the word 'cat', the conventional method of calculating a pronunciation score provides only a total score such as 'the pronunciation score for cat is 80 points'.
  • In contrast, the present invention provides, for 'cat':
  • the pronunciation score for 'c' is 80 points,
  • the pronunciation score for 'a' is 90 points,
  • the pronunciation score for 't' is 90 points, and
  • the overall pronunciation score is 86.6 points.
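The per-phoneme arithmetic above can be sanity-checked with a short sketch (the phoneme labels and scores are the ones from the example; the rounding to one decimal place is ours, and the text truncates the same value to 86.6):

```python
# Per-phoneme scores from the example for the word "cat".
phoneme_scores = {"c": 80, "a": 90, "t": 90}

# The overall pronunciation score is the average of the per-phoneme scores.
overall = sum(phoneme_scores.values()) / len(phoneme_scores)
print(round(overall, 1))  # 86.7 (the text truncates this to 86.6)
```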
  • Referring to FIG. 1, the phoneme-unit pronunciation accuracy evaluation apparatus (1000) includes a voice information extraction unit (100), a native speaker acoustic model storage unit (200), a speech recognition unit (300), a forced alignment unit (400), a log likelihood calculator (500), a log likelihood score converter (600), an adjustment score providing unit (700), and a score output unit (800).
  • The voice information extraction unit (100) acquires, from the learner, the spoken text information and the voice information in which the learner pronounces that text, divides the acquired voice information into set time-interval units, and extracts a speech feature vector for each time interval.
  • For example, spoken text information corresponding to the text 'cat' and the voice information of a learner pronouncing 'cat' are acquired.
  • To this end, the present invention is provided with an input means for entering text and an input means for entering voice information.
  • The learner supplies the spoken text to the voice information extraction unit (100) through the text input means, and the voice information pronouncing the spoken text 'cat' is entered through the voice input means (e.g., a microphone).
  • The voice information extraction unit (100), having received the spoken text information and the voice information, acquires the voice signal for 'cat', divides the acquired voice information into units of the set time interval, and extracts a speech feature vector for each time interval, as shown in FIG. 2.
  • That is, for the speech signal illustrated in FIG. 2, the time axis is divided into 10 ms intervals, and a feature vector (MFCC) of the speech signal is extracted for each time interval.
  • Here, MFCC stands for Mel Frequency Cepstral Coefficient.
  • The time-interval unit for extracting the speech feature vector is a time unit in the range of 1 msec to 20 msec, preferably set to 10 msec.
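As a rough sketch of the framing described above, the snippet below splits a speech signal into non-overlapping 10 ms intervals; a real system would then compute an MFCC vector per interval with a signal-processing library, which is omitted here. The 16 kHz sampling rate is an assumption, not something the text specifies.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed sampling rate (not given in the text)
INTERVAL_MS = 10     # preferred time-interval unit from the 1-20 msec range

def frame_signal(signal, sample_rate=SAMPLE_RATE, interval_ms=INTERVAL_MS):
    """Split a 1-D speech signal into non-overlapping fixed-length time intervals."""
    samples_per_interval = sample_rate * interval_ms // 1000
    n_frames = len(signal) // samples_per_interval
    return signal[:n_frames * samples_per_interval].reshape(n_frames, samples_per_interval)

# 90 ms of synthetic audio yields nine 10 ms sections, as in the FIG. 3 example.
speech = np.random.randn(SAMPLE_RATE * 90 // 1000)
frames = frame_signal(speech)
print(frames.shape)  # (9, 160)
```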
  • The native speaker acoustic model storage unit (200) stores native speaker acoustic model information.
  • The native speaker acoustic model information stored in the native speaker acoustic model storage unit (200) is per-phoneme pronunciation characteristic information of native speakers obtained using a deep learning model.
  • For example, per-phoneme pronunciation characteristic information of native speakers, obtained by analyzing a native speaker's speaking speed and the length of silent sections between pronunciations, is stored in the native speaker acoustic model storage unit (200) and used.
  • The speech recognition unit (300) performs speech recognition on the voice pronounced by the learner.
  • Specifically, the speech recognition unit (300) performs speech recognition on the speech feature vectors for each time interval extracted by the voice information extraction unit (100), using the native speaker acoustic model information stored in the native speaker acoustic model storage unit (200), to generate speech recognition result information.
  • For example, as shown in FIG. 3, the speech recognition result information is: the b phoneme in 0-10 ms (section 1), b in 10-20 ms (section 2), b in 20-30 ms (section 3), b in 30-40 ms (section 4), the ⁇ phoneme in 40-50 ms (section 5) and 50-60 ms (section 6), the t phoneme in 60-70 ms (section 7) and 70-80 ms (section 8), and the s phoneme in 80-90 ms (section 9).
  • The forced alignment unit (400) generates forced alignment result information by forcibly aligning the spoken text information acquired by the voice information extraction unit (100) to each time interval.
  • Specifically, the forced alignment unit (400) forcibly aligns the phoneme-unit pronunciation corresponding to the text 'cat' to the voice signal, as shown in FIG. 3.
  • That is, the pronunciation of the phoneme units corresponding to the spoken text is forcibly aligned to each 10 ms time interval.
  • For example, as shown in FIG. 3, the forced alignment result information is: the k phoneme in 0-10 ms (section 1), 10-20 ms (section 2), and 20-30 ms (section 3); the ⁇ phoneme in 30-40 ms (section 4), 40-50 ms (section 5), and 50-60 ms (section 6); and the t phoneme in 60-70 ms (section 7), 70-80 ms (section 8), and 80-90 ms (section 9).
  • The log likelihood calculator (500) calculates the log likelihood for each time interval of the forced alignment result information, using the speech feature vectors extracted for each time interval by the voice information extraction unit (100) and the forced alignment result information generated by the forced alignment unit (400).
  • Specifically, the log likelihood calculator (500) calculates the log likelihood for each time interval of the forced alignment result information using the log likelihood equation below.
  • oi denotes the speech feature vector of the i-th time interval,
  • qi denotes the phoneme of the i-th time interval according to the forced alignment result information, and
  • p(oi|qi) denotes the probability of oi being produced from qi in the i-th time interval.
  • The log likelihood for each time interval of the forced alignment result information can take a negative value, because p(oi|qi), the probability of oi being produced from qi in the i-th time interval, is a value between 0 and 1, and the logarithm of such a value is negative.
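The patent does not specify the acoustic model's density; as an illustration only, the sketch below computes log p(oi|qi) assuming a diagonal-covariance Gaussian per phoneme, a common choice in HMM-based recognizers. The feature dimensions and model parameters are made up.

```python
import math

def log_likelihood(o, mean, var):
    """log p(o | q): log density of feature vector o under a diagonal Gaussian
    model of phoneme q with the given per-dimension means and variances."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(o, mean, var))

# Hypothetical 3-dimensional feature vector for one 10 ms interval,
# scored against a hypothetical model of the force-aligned phoneme.
o_i = [0.2, -0.1, 0.4]
ll = log_likelihood(o_i, mean=[0.0, 0.0, 0.5], var=[1.0, 1.0, 1.0])
print(ll < 0)  # True: with unit variances the density is below 1, so its log is negative
```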
  • The log likelihood score converter (600) converts the log likelihood calculated for each time interval of the forced alignment result information into a score between 0 and 100 points.
  • The reason for this conversion is that the log likelihood value calculated for each time interval is negative, and it needs to be expressed as a value in the positive region that is intuitive to users.
  • Using the log likelihood score conversion equation below, the log likelihood for each time interval of the forced alignment result information is converted into a score between 0 and 100 points.
  • oi denotes the speech feature vector of the i-th time interval,
  • qi denotes the phoneme of the i-th time interval according to the forced alignment result information, and
  • p(oi|qi) denotes the probability of oi being produced from qi in the i-th time interval.
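The exact conversion equation is not legible in this text; from the worked example below (the log of the probability 'plus 100'), one plausible reading, sketched here, is score = 100 + log-likelihood, clipped to the 0-100 range. This is an assumption, not the patent's verbatim formula.

```python
def to_score(log_likelihood):
    """Convert a (typically negative) per-interval log likelihood to a 0-100 score,
    assuming the 'log value plus 100' reading of the conversion equation."""
    return max(0.0, min(100.0, 100.0 + log_likelihood))

# Section 1 of the example: log p(o1|q1) = -10 would give the 90-point converted score.
print(to_score(-10.0))   # 90.0
print(to_score(-150.0))  # 0.0 (a very unlikely pronunciation floors at 0)
```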
  • For example, the converted log likelihood of 90 points for section 1 (0-10 ms) shown in FIG. 3 is obtained by adding 100 to the log of the probability that the speech feature vector of section 1 is produced from 'k', the phoneme of section 1 according to the forced alignment result information.
  • Likewise, the converted log likelihood of section 5 is obtained by adding 100 to the log of the probability that the section-5 feature vector is produced from the '⁇' phoneme of section 5 according to the forced alignment result information, and the converted log likelihood of 90 points for section 7 (60-70 ms) is obtained by adding 100 to the log of the probability that the section-7 feature vector is produced from the 't' phoneme.
  • That is, p(o1|q1) is the probability that o1, the speech feature vector of the first time interval, is produced from the 'k' phoneme; taking the logarithm of this probability value, which is much smaller than 1, and adding 100 yields the converted log likelihood value of 90 points shown in FIG. 3.
  • As shown in FIG. 3, the log likelihood score converter (600) converts the log likelihoods as follows: the converted log likelihood of the 'k' phoneme (the phoneme according to the forced alignment result information) is 90 points for section 1 (0-10 ms), 80 points for section 2 (10-20 ms), and 100 points for section 3 (20-30 ms); the converted log likelihood of the '⁇' phoneme is 40 points for section 4 (30-40 ms), 80 points for section 5 (40-50 ms), and 80 points for section 6 (50-60 ms); and the converted log likelihood of the 't' phoneme is 90 points for section 7 (60-70 ms), 90 points for section 8 (70-80 ms), and 70 points for section 9 (80-90 ms).
  • The adjustment score providing unit (700) provides the adjustment score for each time interval to the score output unit according to whether the speech recognition result information of the speech recognition unit and the forced alignment result information of the forced alignment unit match in each time interval.
  • Specifically, for a time interval in which the speech recognition result information of the speech recognition unit matches the forced alignment result information of the forced alignment unit, the adjustment score of the corresponding interval is set to 100 and provided to the score output unit.
  • For a time interval in which the speech recognition result information of the speech recognition unit and the forced alignment result information of the forced alignment unit do not match, the log likelihood conversion score of the corresponding interval, converted by the log likelihood score converter, is provided to the score output unit.
  • oi denotes the speech feature vector of the i-th time interval,
  • qi denotes the phoneme of the i-th time interval according to the forced alignment result information, and
  • p(oi|qi) denotes the probability of oi being produced from qi in the i-th time interval.
  • For example, the speech recognition result and the forced alignment result match in sections 5 and 6 ('⁇' and '⁇') and in sections 7 and 8 ('t' and 't').
  • Although the log likelihood conversion scores for these time intervals are 80, 80, 90, and 90 points respectively, the adjustment score of 100 points is provided to the score output unit for each of them.
  • The reason for setting the adjustment score to 100 is that, even when the speech recognition result information matches the forced alignment result information, the evaluation score would be considerably lowered if no adjustment were made; an adjustment score of 100 points is therefore introduced to eliminate this score error.
  • For the time intervals in which they do not match, the log likelihood conversion score of the corresponding time interval is provided to the score output unit as the adjustment score.
  • In the example, 90 points for section 1, 80 points for section 2, 100 points for section 3, 40 points for section 4, and 70 points for section 9 are provided to the score output unit.
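Putting the matching rule together with the example's numbers gives the sketch below. The vowel phoneme, which is garbled as '⁇' in this text, is written 'ae' here purely for legibility (an assumption); all scores are the ones from the example.

```python
# Per-section speech recognition and forced alignment results from FIG. 3
# ('ae' stands in for the vowel phoneme garbled as '⁇' in the text).
recognized = ["b", "b", "b", "b", "ae", "ae", "t", "t", "s"]
aligned    = ["k", "k", "k", "ae", "ae", "ae", "t", "t", "t"]
converted  = [90, 80, 100, 40, 80, 80, 90, 90, 70]  # log likelihood conversion scores

# Matching sections receive the fixed 100-point adjustment score;
# mismatching sections keep their converted log likelihood score.
adjusted = [100 if r == a else c for r, a, c in zip(recognized, aligned, converted)]
print(adjusted)  # [90, 80, 100, 40, 100, 100, 100, 100, 70]
```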
  • The score output unit (800) calculates an average score for each phoneme of the input voice information based on the adjustment scores provided by the adjustment score providing unit, or calculates and outputs an overall average score for the input voice information.
  • As shown in FIG. 4, 'cat' is composed of the 'k', '⁇', and 't' phonemes.
  • The time-interval adjustment scores for the 'k' phoneme are 90, 80, and 100 points, so its average score is (90 + 80 + 100) / 3 = 90 points; the adjustment scores for the '⁇' phoneme are 40, 100, and 100 points, so its average score is (40 + 100 + 100) / 3 = 80 points; and the adjustment scores for the 't' phoneme are 100, 100, and 70 points, so its average score is (100 + 100 + 70) / 3 = 90 points, which are output.
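The averaging performed by the score output unit can be reproduced as follows (again writing the garbled vowel phoneme as 'ae'); the per-phoneme averages match the 90/80/90 points in the text, and the overall average works out to about 86.7.

```python
# Aligned phoneme and adjustment score per 10 ms section from the example.
aligned  = ["k", "k", "k", "ae", "ae", "ae", "t", "t", "t"]
adjusted = [90, 80, 100, 40, 100, 100, 100, 100, 70]

# Average the adjustment scores over the sections belonging to each phoneme.
per_phoneme = {}
for phoneme in dict.fromkeys(aligned):  # keeps first-seen order: k, ae, t
    scores = [s for p, s in zip(aligned, adjusted) if p == phoneme]
    per_phoneme[phoneme] = sum(scores) / len(scores)

overall = sum(adjusted) / len(adjusted)
print(per_phoneme)        # {'k': 90.0, 'ae': 80.0, 't': 90.0}
print(round(overall, 1))  # 86.7
```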
  • At least one of the per-phoneme average scores and the overall average score of the input voice information is output on the screen.
  • That is, the per-phoneme average scores and the overall average score may be provided simultaneously, or only the per-phoneme average scores, or only the overall average score may be provided.
  • FIG. 5 is a flowchart illustrating a method for evaluating phoneme-unit pronunciation accuracy according to a first embodiment of the present invention.
  • As shown in FIG. 5, the phoneme-unit pronunciation accuracy evaluation method includes a voice information extraction step (S100), a speech recognition step (S200), a forced alignment step (S300), a log likelihood calculation step (S400), a log likelihood score conversion step (S500), an adjustment score providing step (S600), and a score output step (S700).
  • In the voice information extraction step (S100), the voice information extraction unit (100) acquires, from the learner, the spoken text information and the learner's spoken voice information, divides the acquired voice information into set time-interval units, and extracts a speech feature vector for each time interval.
  • In the speech recognition step (S200), the speech recognition unit (300) performs speech recognition on the speech feature vectors for each time interval extracted in the voice information extraction step (S100), using the native speaker acoustic model information stored in the native speaker acoustic model storage unit (200), to generate speech recognition result information.
  • In the forced alignment step (S300), the forced alignment unit (400) forcibly aligns the spoken text information acquired in the voice information extraction step (S100) to each time interval to generate forced alignment result information.
  • In the log likelihood calculation step (S400), the log likelihood calculator (500) calculates the log likelihood for each time interval of the forced alignment result information according to the log likelihood equation below, using the speech feature vector for each time interval extracted in the voice information extraction step (S100) and the forced alignment result information generated in the forced alignment step (S300).
  • oi denotes the speech feature vector of the i-th time interval,
  • qi denotes the phoneme of the i-th time interval according to the forced alignment result information, and
  • p(oi|qi) denotes the probability of oi being produced from qi in the i-th time interval.
  • In the log likelihood score conversion step (S500), the log likelihood for each time interval of the forced alignment result information is converted into a score between 0 and 100 points using the log likelihood score conversion equation below.
  • oi denotes the speech feature vector of the i-th time interval,
  • qi denotes the phoneme of the i-th time interval according to the forced alignment result information, and
  • p(oi|qi) denotes the probability of oi being produced from qi in the i-th time interval.
  • In the adjustment score providing step (S600), for a time interval in which the speech recognition result information of the speech recognition unit matches the forced alignment result information of the forced alignment unit, the adjustment score of the corresponding interval is set to 100 and provided to the score output unit according to the log likelihood adjustment equation below; for a time interval in which the speech recognition result information of the speech recognition unit and the forced alignment result information of the forced alignment unit do not match, the log likelihood conversion score of the corresponding interval, converted by the log likelihood score converter, is provided to the score output unit.
  • oi denotes the speech feature vector of the i-th time interval,
  • qi denotes the phoneme of the i-th time interval according to the forced alignment result information, and
  • p(oi|qi) denotes the probability of oi being produced from qi in the i-th time interval.
  • the present invention by improving the problem of the conventional automatic pronunciation evaluation device that provides only the overall pronunciation evaluation score for the spoken speech signal corresponding to a given word or sentence, by providing a score for each phoneme (phoneme) which is a detailed unit of the speech signal.
  • the overall pronunciation score can be fed back to the score for each phoneme (pronounced) can concentrate on what the phoneme is insufficient to enhance the learning effect accordingly.
  • by thus addressing the limitation of conventional automatic pronunciation evaluation devices, which report only an overall score, and providing per-phoneme scores instead, the industrial applicability of the invention is also increased.
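The scoring rule described in the claim text above (a fixed maximum score for time intervals where the recognizer output agrees with the forced alignment, and an adjusted log-likelihood conversion score where it does not) can be sketched as follows. This is an illustrative reconstruction, not the patented implementation: the function name `phoneme_scores`, the frame dictionary fields, and the linear mapping bounds `min_ll`/`max_ll` are all assumptions introduced here for the example.

```python
def phoneme_scores(frames, min_ll=-12.0, max_ll=-2.0):
    """Per-phoneme pronunciation scores (illustrative sketch).

    Each frame dict carries:
      ll      -- log p(oi|qi): log-likelihood of the feature vector oi
                 under the forced-aligned phoneme qi for that interval
      matched -- True if the recognizer's phoneme for the interval
                 equals the forced-alignment phoneme

    min_ll/max_ll bound a linear mapping of log-likelihood onto 0..100.
    """
    scores = []
    for f in frames:
        if f["matched"]:
            # Matching intervals are pinned to the maximum score of 100.
            score = 100.0
        else:
            # Mismatched intervals: clip the log-likelihood and map it
            # linearly onto the 0..100 score range.
            clipped = min(max(f["ll"], min_ll), max_ll)
            score = 100.0 * (clipped - min_ll) / (max_ll - min_ll)
        scores.append(score)
    return scores

frames = [
    {"ll": -3.1, "matched": True},   # recognizer agrees with alignment
    {"ll": -9.5, "matched": False},  # mismatch, weak acoustic fit
    {"ll": -2.0, "matched": False},  # mismatch, strong acoustic fit
]
print(phoneme_scores(frames))  # [100.0, 25.0, 100.0]
```

In a real system, `ll` would come from an acoustic model's frame-level log-likelihoods p(oi|qi), and `matched` from comparing the recognizer's phoneme sequence against the forced-alignment segmentation; the linear clip-and-scale conversion stands in for whatever log-likelihood adjustment equation the patent actually specifies.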

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention relates to an apparatus and method for evaluating the pronunciation accuracy of a phoneme unit and, more specifically, to an apparatus and method that mitigate a problem of conventional automatic pronunciation evaluation devices, which provide only an overall pronunciation evaluation score for the speech signal of a voice uttered by a learner in response to a given word or sentence, by providing a score for each phoneme (pronunciation), the detailed unit of a speech signal, thereby feeding back both the overall pronunciation score and the per-phoneme score so that an unsatisfactory phoneme can be studied more intensively, improving the learning effect.
PCT/KR2019/000147 2018-08-02 2019-01-04 Apparatus and method for evaluating the pronunciation accuracy of a phoneme unit WO2020027394A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR20180090123 2018-08-02
KR10-2018-0090123 2018-08-02

Publications (1)

Publication Number Publication Date
WO2020027394A1 true WO2020027394A1 (fr) 2020-02-06

Family

ID=69232268

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2019/000147 WO2020027394A1 (fr) Apparatus and method for evaluating the pronunciation accuracy of a phoneme unit

Country Status (1)

Country Link
WO (1) WO2020027394A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20050074298A (ko) * 2004-01-08 2005-07-18 정보통신연구진흥원 외국어 발음 평가 시스템 및 외국어 발음 평가 방법
KR20100049201A (ko) * 2008-11-03 2010-05-12 윤병원 발음 학습기능을 갖는 전자사전 서비스 방법 및 그 전자사전 장치
KR20150001189A (ko) * 2013-06-26 2015-01-06 한국전자통신연구원 음성인식을 이용한 외국어 말하기 능력의 훈련 및 평가 방법과 그 장치
KR101609473B1 (ko) * 2014-10-14 2016-04-05 충북대학교 산학협력단 영어 말하기 시험의 유창성 평가 시스템 및 방법
KR20160122542A (ko) * 2015-04-14 2016-10-24 주식회사 셀바스에이아이 발음 유사도 측정 방법 및 장치

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111986650A (zh) * 2020-08-07 2020-11-24 云知声智能科技股份有限公司 借助语种识别辅助语音评测的方法及系统
CN111986650B (zh) * 2020-08-07 2024-02-27 云知声智能科技股份有限公司 借助语种识别辅助语音评测的方法及系统
WO2022048354A1 (fr) * 2020-09-07 2022-03-10 北京世纪好未来教育科技有限公司 Procédé et appareil d'évaluation de modèle d'alignement forcé de paroles, dispositif électronique et support de stockage
US11749257B2 (en) 2020-09-07 2023-09-05 Beijing Century Tal Education Technology Co., Ltd. Method for evaluating a speech forced alignment model, electronic device, and storage medium
CN112331180A (zh) * 2020-11-03 2021-02-05 北京猿力未来科技有限公司 一种口语评测方法及装置
CN112466288A (zh) * 2020-12-18 2021-03-09 北京百度网讯科技有限公司 语音识别方法、装置、电子设备及存储介质
CN112767919A (zh) * 2021-01-22 2021-05-07 北京读我科技有限公司 一种语音测评方法及装置
CN112908360A (zh) * 2021-02-02 2021-06-04 早道(大连)教育科技有限公司 一种在线口语发音评价方法、装置及存储介质
CN112908360B (zh) * 2021-02-02 2024-06-07 早道(大连)教育科技有限公司 一种在线口语发音评价方法、装置及存储介质
CN113823329A (zh) * 2021-07-30 2021-12-21 腾讯科技(深圳)有限公司 数据处理方法以及计算机设备
CN115376547A (zh) * 2022-08-12 2022-11-22 腾讯科技(深圳)有限公司 发音评测方法、装置、计算机设备和存储介质
CN115376547B (zh) * 2022-08-12 2024-06-04 腾讯科技(深圳)有限公司 发音评测方法、装置、计算机设备和存储介质

Similar Documents

Publication Publication Date Title
WO2020027394A1 (fr) Apparatus and method for evaluating the pronunciation accuracy of a phoneme unit
WO2020213996A1 (fr) Method and apparatus for interruption detection
WO2020231181A1 (fr) Method and device for providing a speech recognition service
WO2020145439A1 (fr) Method and device for speech synthesis based on emotion information
WO2020190050A1 (fr) Speech synthesis apparatus and method therefor
WO2020189850A1 (fr) Electronic device and method for controlling speech recognition by said electronic device
WO2017217661A1 (fr) Word sense embedding apparatus and method using a lexical semantic network, and homograph discrimination apparatus and method using a lexical semantic network and word embedding
WO2021112642A1 (fr) Voice user interface
WO2019139431A1 (fr) Speech translation method and system using a multilingual text-to-speech synthesis model
WO2017082447A1 (fr) Foreign-language read-aloud and display device and method, motor-learning device and motor-learning method based on a foreign-language rhythmic-action detection sensor using the same, and electronic medium and study materials in which the same is recorded
WO2019078615A1 (fr) Method and electronic device for translating a speech signal
WO2020230926A1 (fr) Speech synthesis apparatus for evaluating the quality of synthesized speech using artificial intelligence, and operating method thereof
WO2020085794A1 (fr) Electronic device and control method therefor
WO2020263034A1 (fr) Device for recognizing a user's speech input and operating method thereof
WO2020050509A1 (fr) Speech synthesis device
WO2020145472A1 (fr) Neural vocoder implementing a speaker-adaptive model to generate a synthesized speech signal, and method for training the neural vocoder
WO2022080774A1 (fr) Speech disorder evaluation device, method, and program
WO2021040490A1 (fr) Speech synthesis method and apparatus
WO2014163231A1 (fr) Speech signal extraction method and speech signal extraction apparatus for use in speech recognition in an environment where multiple sound sources are output
EP3841460A1 (fr) Electronic device and control method therefor
WO2023085584A1 (fr) Speech synthesis device and method
WO2022035183A1 (fr) Device for recognizing a user's speech input and method of using the same
WO2021085661A1 (fr) Intelligent speech recognition method and apparatus
WO2022260432A1 (fr) Method and system for generating composite speech using a style tag expressed in natural language
WO2023177095A1 (fr) Corrected multi-condition training for robust speech recognition

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19843830

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06.07.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19843830

Country of ref document: EP

Kind code of ref document: A1