KR20210152687A - Method of evaluating english pronunciation using encoder-decoder attention difference of speech transformer - Google Patents


Publication number
KR20210152687A
Authority
KR
South Korea
Prior art keywords
english pronunciation
encoder
pronunciation
native speaker
evaluating
Prior art date
Application number
KR1020200069483A
Other languages
Korean (ko)
Inventor
김광호
김덕규
Original Assignee
김광호
김덕규
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 김광호, 김덕규
Priority to KR1020200069483A
Publication of KR20210152687A

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 Teaching not covered by other main groups of this subclass
    • G09B19/06 Foreign languages
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

An English pronunciation evaluation method according to one embodiment of the present invention may comprise: a step of learning the English pronunciation of a native speaker; and a step of evaluating the English pronunciation of a non-native speaker based on the learned English pronunciation of the native speaker. According to the present invention, the English pronunciation of a non-native speaker can be evaluated using deep learning-based speech recognition technology.

Description

Method for evaluating English pronunciation using the encoder-decoder attention difference of a speech transformer {METHOD OF EVALUATING ENGLISH PRONUNCIATION USING ENCODER-DECODER ATTENTION DIFFERENCE OF SPEECH TRANSFORMER}

The present application relates to a method for evaluating English pronunciation using the difference in the encoder-decoder attention of a speech transformer.

With the internationalization of industry, interest in second foreign languages is growing, and to respond to this trend it is necessary to research and develop foreign-language pronunciation evaluation methods for language-learning programs.

According to an embodiment of the present invention, there is provided a method for evaluating English pronunciation using the encoder-decoder attention difference of a speech transformer, capable of evaluating the English pronunciation of a non-native speaker using deep learning-based speech recognition technology.

According to an embodiment of the present invention, there is provided an English pronunciation evaluation method comprising: a step of learning the English pronunciation sequence of a native speaker; and a step of evaluating the English pronunciation of a non-native speaker based on the learned English pronunciation sequence of the native speaker.

According to an embodiment of the present invention, the English pronunciation of a non-native speaker can be evaluated using deep learning-based speech recognition technology.

FIG. 1 is a diagram illustrating the architecture of a transformer according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating an English pronunciation evaluation process using a native speaker's attention matrix and a non-native speaker's attention matrix according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. However, the embodiments of the present invention may be modified into various other forms, and the scope of the present invention is not limited to the embodiments described below. The shapes and sizes of elements in the drawings may be exaggerated for clarity, and elements denoted by the same reference numerals in the drawings are the same elements.

FIG. 1 is a diagram illustrating the architecture of a transformer according to an embodiment of the present invention. Meanwhile, FIG. 2 is a diagram illustrating an English pronunciation evaluation process using a native speaker's attention matrix and a non-native speaker's attention matrix according to an embodiment of the present invention.

Measurement of the encoder-decoder attention difference of a transformer, as proposed in the present application, consists of the following two steps.

(1) Native speaker's English pronunciation-sequence learning step

(2) Non-native speaker's (second language learner's) English pronunciation evaluation step

That is, the method consists of an English pronunciation-sequence learning step and a pronunciation-sequence evaluation (prediction) step in the field of deep learning-based speech recognition.

(1) Native speaker's English pronunciation-sequence learning step

- Modeling technique: uses the architecture of the Speech Transformer

- Input: English sentences spoken by a native speaker

- Output: an English pronunciation-label sequence (about 84 English pronunciation labels are defined)

- The pronunciation labels were defined using the CMU Pronouncing Dictionary (source: http://www.speech.cs.cmu.edu/cgi-bin/cmudict).
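
A pronunciation-label inventory of the kind described above can be derived from CMU Pronouncing Dictionary entries, whose format is `WORD PHONE PHONE ...` with ARPAbet phones and stress digits (0/1/2) on vowels; stress-marked variants are one way the label set grows toward the roughly 84 labels mentioned. The sample entries and helper below are an illustrative sketch, not taken from the patent.

```python
# Sketch: deriving a pronunciation-label inventory from CMUdict-style entries.
# The three sample entries below are illustrative; the full dictionary is at
# http://www.speech.cs.cmu.edu/cgi-bin/cmudict.

SAMPLE_ENTRIES = [
    "HELLO  HH AH0 L OW1",
    "WORLD  W ER1 L D",
    "ATTENTION  AH0 T EH1 N SH AH0 N",
]

def phone_inventory(entries):
    """Collect the sorted set of distinct phone labels used by the entries."""
    phones = set()
    for line in entries:
        word, *labels = line.split()  # first token is the word, rest are phones
        phones.update(labels)
    return sorted(phones)

inventory = phone_inventory(SAMPLE_ENTRIES)
print(inventory)
```

Run over the full dictionary instead of the toy entries, this yields the complete stress-marked ARPAbet label set.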

Specifically, the transformer has shown strong performance as a deep learning model in both natural language processing and speech recognition.

In the present application, pronunciation is modeled by using the structure of a speech-recognition transformer while changing the recognition unit from words to English pronunciation-sequence labels.

This is the step that models the English pronunciation of native speakers.

The pronunciation evaluation result is presented by computing the difference between the encoder-decoder attention of a native speaker and that of a non-native speaker (second language learner).

In FIG. 1, the output of the encoder-decoder attention block in the right middle part is a vector at each time step, so over time it forms a matrix. In the present patent, the final pronunciation evaluation result is presented by computing the difference between these encoder-decoder attention matrices.
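
To make the matrix structure concrete: encoder-decoder attention produces, for each decoder step, a distribution over encoder frames, and stacking those rows gives a (decoder steps x encoder frames) matrix. The following is a minimal pure-Python sketch of scaled dot-product attention weights; the toy query and key vectors are illustrative assumptions, not the patent's trained model.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_matrix(queries, keys):
    """Scaled dot-product attention weights.

    queries: decoder-side vectors (one per output label)
    keys:    encoder-side vectors (one per input speech frame)
    Returns a (len(queries) x len(keys)) matrix; each row sums to 1.
    """
    d = len(keys[0])
    rows = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        rows.append(softmax(scores))
    return rows

# Illustrative toy vectors: 2 decoder steps attending over 3 encoder frames.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
M = attention_matrix(Q, K)
print(len(M), len(M[0]))  # a 2 x 3 attention matrix
```

In a real speech transformer these weights would come from a trained multi-head attention layer; the point here is only the matrix shape that the method compares between speakers.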

(2) Non-native speaker's (second language learner's) English pronunciation evaluation step (assuming that the native speaker and the non-native speaker utter the same sentence)

- Generate the encoder-decoder attention matrix for the native speaker's utterance (NS-M).

- Assuming that the non-native speaker's utterance is the same sentence as the native speaker's, generate the encoder-decoder attention matrix with the teacher forcing technique applied (SL-M).

- Compute the difference between NS-M and SL-M,

- apply the sigmoid function to normalize the values to between 0 and 1, and

- take the average value and present it as the English pronunciation evaluation result.
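
The evaluation steps above (difference, sigmoid normalization, average) can be sketched as follows. The toy matrices are illustrative, and details the text does not fix, such as whether the difference is signed or absolute, are assumptions here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pronunciation_score(ns_m, sl_m):
    """Score a learner utterance from two aligned attention matrices.

    ns_m, sl_m: equal-shaped attention matrices for the native speaker
    (NS-M) and the learner (SL-M, obtained via teacher forcing).
    Steps: (1) element-wise difference, (2) sigmoid to map each value
    into (0, 1), (3) average over all elements.
    """
    total, count = 0.0, 0
    for ns_row, sl_row in zip(ns_m, sl_m):
        for a, b in zip(ns_row, sl_row):
            total += sigmoid(a - b)  # signed difference is an assumption
            count += 1
    return total / count

# With identical matrices every sigmoid(0) is 0.5, so the score is 0.5.
ns = [[0.7, 0.3], [0.2, 0.8]]
print(pronunciation_score(ns, ns))
```

How the final scalar maps onto a pronunciation grade (e.g., whether 0.5 means a perfect match under the signed-difference reading) is not specified in the text and would be a calibration choice.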

As described above, according to an embodiment of the present invention, the English pronunciation of a non-native speaker can be evaluated using deep learning-based speech recognition technology.

The English pronunciation evaluation method using the encoder-decoder attention difference of a speech transformer according to the embodiment of the present invention described above may be implemented as a program to be executed on a computer and stored in a computer-readable recording medium. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and also include media implemented in the form of carrier waves (for example, transmission over the Internet). Furthermore, the computer-readable recording medium may be distributed over network-connected computer systems so that computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the method can be easily inferred by programmers in the art to which the present invention pertains.

The present invention is not limited by the above-described embodiments and the accompanying drawings. The scope of rights is intended to be defined by the appended claims, and it will be apparent to those of ordinary skill in the art that various substitutions, modifications, and changes can be made without departing from the technical spirit of the present invention described in the claims.

100: architecture of the transformer

Claims (1)

A method for evaluating English pronunciation, comprising:
learning the English pronunciation sequence of a native speaker; and
evaluating the English pronunciation of a non-native speaker based on the learned English pronunciation sequence of the native speaker.
KR1020200069483A 2020-06-09 2020-06-09 Method of evaluating english pronunciation using encoder-decoder attention difference of speech transformer KR20210152687A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020200069483A KR20210152687A (en) 2020-06-09 2020-06-09 Method of evaluating english pronunciation using encoder-decoder attention difference of speech transformer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020200069483A KR20210152687A (en) 2020-06-09 2020-06-09 Method of evaluating english pronunciation using encoder-decoder attention difference of speech transformer

Publications (1)

Publication Number Publication Date
KR20210152687A 2021-12-16

Family

ID=79033178

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020200069483A KR20210152687A (en) 2020-06-09 2020-06-09 Method of evaluating english pronunciation using encoder-decoder attention difference of speech transformer

Country Status (1)

Country Link
KR (1) KR20210152687A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20230114893A (en) 2022-01-26 2023-08-02 서강대학교산학협력단 Self-supervised Swin transformer model structure and method of learning the self-supervised Swin transformer model
KR20230149554A (en) 2022-04-20 2023-10-27 중앙대학교 산학협력단 Apparatus and method for normalization of vision transformers

Similar Documents

Publication Publication Date Title
Agarwal et al. A review of tools and techniques for computer aided pronunciation training (CAPT) in English
US10573296B1 (en) Reconciliation between simulator and speech recognition output using sequence-to-sequence mapping
CN107103900B (en) Cross-language emotion voice synthesis method and system
Feraru et al. Cross-language acoustic emotion recognition: An overview and some tendencies
Chao et al. 3m: An effective multi-view, multi-granularity, and multi-aspect modeling approach to english pronunciation assessment
KR20210152687A (en) Method of evaluating english pronunciation using encoder-decoder attention difference of speech transformer
US8157566B2 (en) Adjustable hierarchical scoring method and system
Khomitsevich et al. A bilingual Kazakh-Russian system for automatic speech recognition and synthesis
Nagano et al. Data augmentation based on vowel stretch for improving children's speech recognition
Maraoui et al. Arabic discourse analysis based on acoustic, prosodic and phonetic modeling: elocution evaluation, speech classification and pathological speech correction
KR20210059995A (en) Method for Evaluating Foreign Language Speaking Based on Deep Learning and System Therefor
Zhang et al. Learning Syllable-Level Discrete Prosodic Representation for Expressive Speech Generation.
Peabody et al. Towards automatic tone correction in non-native mandarin
Ekpenyong et al. Improved syllable-based text to speech synthesis for tone language systems
Hanzlíček et al. LSTM-based speech segmentation trained on different foreign languages
Shufang Design of an automatic english pronunciation error correction system based on radio magnetic pronunciation recording devices
Azim et al. Large vocabulary Arabic continuous speech recognition using tied states acoustic models
CN111508522A (en) Statement analysis processing method and system
Newman et al. Speaker independent visual-only language identification
Lee et al. Foreign language tutoring in oral conversations using spoken dialog systems
Baranwal et al. Improved Mispronunciation detection system using a hybrid CTC-ATT based approach for L2 English speakers
US20210304628A1 (en) Systems and Methods for Automatic Video to Curriculum Generation
KR102395702B1 (en) Method for providing english education service using step-by-step expanding sentence structure unit
US10783873B1 (en) Native language identification with time delay deep neural networks trained separately on native and non-native english corpora
Minematsu Pronunciation assessment based upon the compatibility between a learner's pronunciation structure and the target language's lexical structure.