WO2021137534A1 - Method and system for learning korean pronunciation via voice analysis - Google Patents

Method and system for learning korean pronunciation via voice analysis Download PDF

Info

Publication number
WO2021137534A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
pronunciation
evaluation
korean
learning
Prior art date
Application number
PCT/KR2020/019110
Other languages
French (fr)
Korean (ko)
Inventor
송진주
Original Assignee
(주)헤이스타즈
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by (주)헤이스타즈 filed Critical (주)헤이스타즈
Publication of WO2021137534A1 publication Critical patent/WO2021137534A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00Teaching not covered by other main groups of this subclass
    • G09B19/06Foreign languages
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B7/00Electrically-operated teaching apparatus or devices working with questions and answers
    • G09B7/02Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student
    • G09B7/04Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student characterised by modifying the teaching programme in response to a wrong answer, e.g. repeating the question, supplying a further explanation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/005Language recognition
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems

Definitions

  • the present invention relates to a method and system for learning Korean pronunciation and, more particularly, to a Korean pronunciation learning system based on voice analysis that maximizes the learning effect by allowing foreign learners to intensively practice the words or sentences they are likely to get wrong, based on data such as the learner's voice, nationality, gender, and age.
  • human hearing adapts to hear frequently heard sounds better and to attenuate sounds that are rarely heard.
  • the middle ear blocks sounds that are not normally heard in order to protect the ear, so a person becomes dull to sounds in frequency bands that are rarely heard.
  • Patent Document 1 analyzes the learner's pronunciation of studied words or sentences, analyzes stress and rhythm as an enveloped voice waveform, and displays the result, so that pronunciation, stress, rhythm, and mouth shape can be analyzed objectively.
  • Patent Document 1, however, merely checks learning performance by analyzing the frequency characteristics of the words or sentences the learner has studied; it does not disclose a way to improve learning efficiency by improving listening ability for a specific language before learning.
  • in Patent Document 2, an audio signal in a first frequency band is shifted to an audio signal in a second frequency band so that people accustomed to a specific frequency band can better hear a foreign language that mainly uses a different frequency band.
  • in Patent Document 2, however, shifting the frequency band of the foreign language changes the tone and nuance of the sentences the learner hears from the original; even though the foreign sentences can be heard well at that moment, the learner's listening ability may actually decline.
  • the present invention relates to a Korean pronunciation learning system based on voice analysis that maximizes the learning effect by allowing foreign learners to intensively practice the words or sentences they are likely to get wrong, based on data such as the learner's voice, nationality, gender, and age.
  • the user uploads personal information to the system
  • the system provides Korean video content to the user
  • the system presents the pronunciation evaluation problem to the user and then requests feedback
  • the user's pronunciation is evaluated after user voice data is collected in response to the feedback.
  • the user's voice data and the evaluation score are stored in a user terminal.
  • the stored evaluation scores are collected in a server at a predetermined interval.
  • the server corrects the pronunciation evaluation problems provided to the user by using the collected information, which together provides a method for learning Korean pronunciation through voice analysis.
  • the user's pronunciation evaluation is performed by speech-to-text (STT) technology.
  • the pronunciation evaluation is determined by how many characters of the foreigner's pronunciation input the STT server can accurately recognize as Hangul.
  • the evaluation scores are collected anonymously according to the user's gender, area of residence, nationality, and age.
  • the system presents a level-test pronunciation evaluation problem to the user and then classifies the user's level; to a classified user, it presents pronunciation evaluation problems that received low scores from users of the same level who share at least two of gender, area of residence, nationality, and age.
  • the users' pronunciation evaluation score data are replicated and aggregated in the main server and collected anonymously according to the users' gender, area of residence, nationality, age, and so on; the data can then be broken down into words, sentences, and expressions that are frequently mispronounced by country, age, gender, and so on, and big data analysis is enabled by quantifying how many users mispronounce an item out of the total number of pronunciation attempts.
  • FIG. 1 is a conceptual diagram of a system for learning Korean pronunciation through voice analysis according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of a method for learning Korean pronunciation through voice analysis according to an embodiment of the present invention.
  • FIG. 3 is a conceptual diagram illustrating an embodiment of content that is customized to a learner through big data analysis.
  • FIG. 4 is a conceptual diagram illustrating an embodiment of a pronunciation pattern analysis for providing customized content.
  • Learners may be provided with language learning content through the user terminal.
  • User terminals may include mobile terminals such as mobile phones, smartphones, laptop computers, digital broadcasting terminals, personal digital assistants (PDAs), portable multimedia players (PMPs), navigation devices, slate PCs, tablet PCs, ultrabooks, and wearable devices (for example, watch-type terminals (smartwatches), glass-type terminals (smart glasses), and head-mounted displays (HMDs)), as well as fixed terminals such as digital TVs, desktop computers, and digital signage.
  • the system 100 may include a control module 110, a data collection module 120, a database 130, a sentence generation module 140, an analysis module 150, and the like.
  • the control module 110 may be formed to include the functions of the other modules.
  • system 100 may be a specific server including the modules or a user terminal including the modules.
  • content may be delivered to a user terminal, or analysis and content execution may be performed in a user terminal in which an application program (application) is installed.
  • the control module can improve Korean listening/speaking ability by providing the learner with read-aloud request sentences generated through the analysis module, the sentence generation module, and the like.
  • the data collection module 120 may collect data for sentences exposed online and supply it to the user terminal.
  • the database 130 stores various data for operating the system 100.
  • a plurality of voice data corresponding to words may be stored in the database 130 .
  • the database 130 may store information on the frequency band used by each language, information on the frequency band used in a specific passage, and the like.
  • evaluation score data 131 with which users were evaluated according to the pronunciation feedback provided by the system, and voice data 131 that users stored in the user terminal according to the system's feedback, may also be stored in the database.
  • the sentence generation module 140 generates the read-aloud request sentences provided to the learner based on the user's personal information or on the big data analysis results stored in the database.
  • FIG. 2 is a flowchart of a method for learning Korean pronunciation through voice analysis according to an embodiment of the present invention.
  • the method for learning Korean pronunciation may include the steps of providing Korean video content and requesting pronunciation feedback from the user (S100), evaluating the user's pronunciation through the STT system (S200), storing the user's pronunciation and evaluation data in the user terminal (S300), collecting the users' data in the main server (S400), and analyzing the collected big data and then providing content customized for each user (S500).
  • Learners can enter personal information such as their nationality, gender, and age before starting learning.
  • the system provides customized learning content to individual learners by analyzing the learner's personal information and learning-related big data stored in the server (or database).
  • the Korean video content collected through the data collection module 120, or the Korean video content stored in the user terminal through the application, is provided to the user.
  • after the learner watches the video, the system requests pronunciation feedback from the learner on the sentence generated by the sentence generation module 140.
  • in the step (S200) of evaluating the user's pronunciation through the STT system, speech-to-text (STT) technology provided by global speech recognition providers (such as Google and MINDs Lab) is used to evaluate how accurately foreign users of the application pronounce Korean.
  • the pronunciation evaluation criterion is as follows: the foreigner's pronunciation of the read-aloud request sentence presented by the application is captured through the microphone of the user terminal, transmitted to the server through the STT API, and a numerical score is given for how many characters the STT server can accurately recognize as Hangul.
  • the application stores the voice data pronounced by the foreign user in the user's mobile device.
  • the system creates a voice data storage space that lets the user listen to his or her own pronunciation again and a pronunciation evaluation score data storage space that stores pronunciation evaluation scores; each time a foreign user repeats pronunciation training, the original voice data is compressed and stored in the voice data storage space.
  • the evaluation score is simultaneously stored as a history in the score management space, indexed by pronunciation time and sentence; this data is used so that the user can listen again and correct pronunciation according to the history.
  • the evaluation score data stored in the user terminal is transmitted to the server.
  • when the application is installed, the user's consent to the statistical use of evaluation score data (the use of anonymized data to improve application functions and the sentence recommendation service) is obtained in advance, and the user's pronunciation evaluation score data are replicated and aggregated in the main server.
  • This evaluation data may be collected anonymously according to the user's gender, residence area, nationality, age, and the like.
  • for example, if the learner is a Vietnamese woman in her twenties, the system presents to the learner sentences containing pronunciations that Vietnamese women in their twenties often get wrong (e.g., 'fate'), based on analysis of the data collected in step S300 (see FIG. 3).
  • the system can present to learners various types of sentences: the selected word followed by another word, the selected word followed by a postposition (particle), the selected word at the end of a sentence, the selected word in an interrogative sentence, the selected word in an exclamatory sentence, and so on.
  • the system can provide learning content by analyzing the pronunciation pattern of the language mainly used by the learner.
  • FIG. 4 is a conceptual diagram illustrating an embodiment of pronunciation pattern analysis for providing customized content.
  • Δt denotes the main period of the stress pattern.
  • the system analyzes the main frequency band and stress pattern of the user's native language, making it possible to reinforce learning of the sentence components that differ from those of Korean.
  • since Koreans usually use sounds of 500 to 2,200 Hz, the part of the 800 to 3,500 Hz band used in (American) English that does not overlap, namely 2,200 to 3,500 Hz, is not heard accurately or is difficult to distinguish.
  • the system can enhance the learning of words or sentences outside the frequency band mainly used in the learner's native language.
  • languages can be divided into syllable-timed languages, in which syllables are simply strung together without a rhythm repeating at regular intervals (there is no stress cycle in a sentence), and stress-timed languages, in which a regular rhythmic pattern exists (stresses recur at a roughly constant period within a sentence).
  • in a stress-timed language, the length of a syllable varies with the situation, but the interval at which stress is placed is constant.
  • the time intervals between stressed syllables are similar, and however many syllables fall between them, they are compressed and pronounced to fit that interval.
  • the analysis module, which includes a language type analysis module and a stress pattern analysis module, analyzes which country's language a sentence belongs to, whether that language is a stress-timed language, and what the stress pattern of the sentence is.
  • the analysis module determines what language a sentence is in based on data such as the type of language used by the foreigner, its words, and its word order, and, based on the data collected about that language, determines whether it is a syllable-timed language or a stress-timed language.
  • the analysis module analyzes the voice output data of a sentence (the positions at which the speaker pauses, the output dB values, the tone or intonation, and so on) to determine the stress pattern of the sentence the learner wants to listen to.
  • the period of the stress pattern may differ slightly from user to user or from sentence to sentence, but the system can identify the average of the periods of the collected foreign-language sentences, or the period that occurs most frequently among them, as the main period.
  • the system may place important words at positions corresponding to this main period and provide the result to learners, so that learners can study Korean sentences in a rhythm they are accustomed to.
  • the users' pronunciation evaluation score data are replicated and aggregated in the main server, collected anonymously according to the users' gender, area of residence, nationality, age, and so on, and can then be broken down into words, sentences, and expressions that are frequently mispronounced by country, age, gender, and so on; big data analysis is enabled by quantifying how many users mispronounce an item out of the total number of pronunciation attempts.
  • effects differentiated from the prior art can be expected, such as being able to intensively train pronunciations, sentences, and expressions that learners in a similar user environment find difficult.

Abstract

The present invention provides a method for learning Korean pronunciation via voice analysis, the method comprising the steps of: uploading, by a user, personal information to a system; providing, by the system, Korean image content to the user, presenting pronunciation evaluation questions, and requesting feedback; collecting user voice data in response to the feedback and then evaluating the pronunciation of the user; storing, in a user terminal, the user voice data and scores of the evaluation; collecting, in a server, the stored scores of the evaluation during a certain period; and correcting, by the server, the pronunciation evaluation questions provided to the user, by using the collected information.

Description

Method and system for learning Korean pronunciation through voice analysis
The present invention relates to a method and system for learning Korean pronunciation and, more particularly, to a Korean pronunciation learning system based on voice analysis that maximizes the learning effect by allowing foreign learners to intensively practice the words or sentences they are likely to get wrong, based on data such as the learner's voice, nationality, gender, and age.
There are many languages in the world, and each has different characteristics such as stress, intonation, and tone. Depending on the environment in which a person has lived, there are therefore words and sentences the person can hear and speak well, and words and sentences that are hard to hear and hard to say.
This is influenced not only by the anatomical structure of the ear, which varies with ethnicity, but also to a large extent by acquired factors: depending on the language environment, people can become better at hearing or producing certain pronunciations.
Acquired changes stem from a person's living environment. Human hearing adapts to hear frequently heard sounds better and to attenuate sounds that are rarely heard. For example, the middle ear blocks sounds that are not normally heard in order to protect the ear, so a person becomes dull to sounds in frequency bands that are rarely heard.
Since pronunciation and intonation differ from language to language and from speaker to speaker, these characteristics can be used to improve listening ability.
Korean Patent Registration No. 10-0405061, 'Language learning apparatus and language analysis method thereof' (hereinafter 'Patent Document 1'), analyzes the learner's pronunciation of studied words or sentences, analyzes stress and rhythm as an enveloped voice waveform, and displays the result, so that pronunciation, stress, rhythm, mouth shape, and the like can be analyzed objectively.
However, Patent Document 1 merely checks learning performance by analyzing the frequency characteristics of the words or sentences the learner has studied; it does not disclose a way to improve learning efficiency by improving listening ability for a specific language before learning.
In addition, Korean Patent Registration No. 10-1983772, 'Listening learning system' (hereinafter 'Patent Document 2'), shifts an audio signal in a first frequency band to an audio signal in a second frequency band so that people accustomed to a specific frequency band can better hear a foreign language that mainly uses a different frequency band.
However, according to Patent Document 2, shifting the frequency band of the foreign language changes the tone and nuance of the sentences the learner hears from the original. In this case, even though the foreign sentences can be heard well at that moment, the learner's listening ability may actually decline.
The present invention relates to a Korean pronunciation learning system based on voice analysis that maximizes the learning effect by allowing foreign learners to intensively practice the words or sentences they are likely to get wrong, based on data such as the learner's voice, nationality, gender, and age.
The present invention provides a method for learning Korean pronunciation through voice analysis, comprising the steps of: a user uploading personal information to the system; the system providing Korean video content to the user, presenting a pronunciation evaluation problem, and then requesting feedback; evaluating the user's pronunciation after user voice data is collected in response to the feedback; storing the user voice data and the evaluation score in a user terminal; collecting the stored evaluation scores in a server at a predetermined interval; and the server correcting the pronunciation evaluation problems provided to the user by using the collected information.
According to an embodiment of the present invention, the user's pronunciation is evaluated by speech-to-text (STT) technology.
According to an embodiment of the present invention, the pronunciation evaluation is determined by how many characters of the foreigner's pronunciation input the STT server can accurately recognize as Hangul.
According to an embodiment of the present invention, the evaluation scores are collected anonymously according to the user's gender, area of residence, nationality, and age.
According to an embodiment of the present invention, the system presents a level-test pronunciation evaluation problem to the user and then classifies the user's level; to a classified user, the system presents pronunciation evaluation problems that received low scores from users of the same level who share at least two of gender, area of residence, nationality, and age.
According to the present invention, the users' pronunciation evaluation score data are replicated and aggregated in the main server and collected anonymously according to the users' gender, area of residence, nationality, age, and so on; the data can then be broken down into words, sentences, and expressions that are frequently mispronounced by country, age, gender, and so on, and big data analysis is enabled by quantifying how many users mispronounce an item out of the total number of pronunciation attempts.
In addition, according to the present invention, if the user's nationality, gender, age, and the like are entered accurately, pronunciations, sentences, and expressions that learners in a similar environment find difficult can be trained intensively.
FIG. 1 is a conceptual diagram of a system for learning Korean pronunciation through voice analysis according to an embodiment of the present invention.
FIG. 2 is a flowchart of a method for learning Korean pronunciation through voice analysis according to an embodiment of the present invention.
FIG. 3 is a conceptual diagram illustrating an embodiment of content customized for a learner through big data analysis.
FIG. 4 is a conceptual diagram illustrating an embodiment of pronunciation pattern analysis for providing customized content.
The present invention provides a method for learning Korean pronunciation through voice analysis, comprising the steps of: a user uploading personal information to the system; the system providing Korean video content to the user, presenting a pronunciation evaluation problem, and then requesting feedback; evaluating the user's pronunciation after user voice data is collected in response to the feedback; storing the user voice data and the evaluation score in a user terminal; collecting the stored evaluation scores in a server at a predetermined interval; and the server correcting the pronunciation evaluation problems provided to the user by using the collected information.
Hereinafter, the present invention will be described in more detail with reference to the drawings. In this specification, the same or similar reference numerals are given to the same or similar components even in different embodiments, and their description is replaced by the first description. As used herein, singular expressions include plural expressions unless the context clearly indicates otherwise.
In addition, the suffixes 'module' and 'unit' for the components used in the following description are given or used interchangeably only for ease of drafting the specification, and do not in themselves have meanings or roles that distinguish them from each other.
FIG. 1 is a conceptual diagram of a system for learning Korean pronunciation through voice analysis according to an embodiment of the present invention.
According to the system of the present invention, customized pronunciation learning content is provided to learners so that foreign learners can improve their Korean listening and speaking abilities.
Learners may be provided with language learning content through a user terminal. The user terminal may include mobile terminals such as mobile phones, smartphones, laptop computers, digital broadcasting terminals, personal digital assistants (PDAs), portable multimedia players (PMPs), navigation devices, slate PCs, tablet PCs, ultrabooks, and wearable devices (for example, watch-type terminals (smartwatches), glass-type terminals (smart glasses), and head-mounted displays (HMDs)), as well as fixed terminals such as digital TVs, desktop computers, and digital signage.
Referring to FIG. 1, the system 100 may include a control module 110, a data collection module 120, a database 130, a sentence generation module 140, an analysis module 150, and the like.
Hereinafter, each module is described separately for convenience of explanation, but in practice the control module 110 may be formed to include the functions of the other modules.
Although the system is illustrated in FIG. 1 as separate from the server and the user terminal, the system 100 referred to in the present invention may be a specific server including the modules or a user terminal including the modules.
For example, content may be delivered to a user terminal after analysis is performed by a central server, or analysis and content execution may be performed in a user terminal in which an application program (application) is installed.
In the present invention, the control module can improve Korean listening and speaking abilities by providing the learner with the read-aloud request sentences generated through the analysis module, the sentence generation module, and the like.
The data collection module 120 may collect data on sentences exposed online and supply it to the user terminal.
The database 130 stores various data for operating the system 100. For example, a plurality of voice data items corresponding to words may be stored in the database 130. In addition, the database 130 may store information on the frequency band used by each language, information on the frequency band used in a specific passage, and the like.
In addition, the database may also store evaluation score data 131 with which users were evaluated according to the pronunciation feedback provided by the system, and voice data 131 that users stored in their user terminals according to the system's feedback.
The sentence generation module 140 generates the read-aloud request sentences to be provided to the learner based on the user's personal information or on the big data analysis results stored in the database.
FIG. 2 is a flowchart of a method for learning Korean pronunciation through voice analysis according to an embodiment of the present invention.
Referring to FIG. 2, the method for learning Korean pronunciation may include the steps of providing Korean video content and requesting pronunciation feedback from the user (S100), evaluating the user's pronunciation through the STT system (S200), storing the user's pronunciation and evaluation data in the user terminal (S300), collecting the users' data in the main server (S400), and analyzing the collected big data and then providing content customized for each user (S500).
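Purely as an illustration of how the five steps hand data to one another (and not as an implementation defined by the present invention), the following Python sketch stubs each step with placeholder logic; every function name, the example sentence, and the score threshold are assumptions made for the example.

```python
# A highly simplified, hypothetical rendering of the S100-S500 flow.
def s100_present_content_and_prompt():          # S100: show video, request read-aloud
    return "운명은 스스로 만들어 가는 것이다"

def s200_evaluate(prompt, recorded_audio):      # S200: STT-based scoring (stubbed)
    return 0.87

def s300_store_locally(store, prompt, audio, score):   # S300: keep history on the device
    store.append({"sentence": prompt, "score": score})

def s400_collect(server_db, store):             # S400: periodic upload to the main server
    server_db.extend(store)

def s500_customize(server_db):                  # S500: pick low-scoring items to drill
    return [r["sentence"] for r in server_db if r["score"] < 0.9]

local_store, server_db = [], []
prompt = s100_present_content_and_prompt()
score = s200_evaluate(prompt, recorded_audio=b"")
s300_store_locally(local_store, prompt, b"", score)
s400_collect(server_db, local_store)
print(s500_customize(server_db))
```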
Learners can enter personal information such as their nationality, gender, and age before starting to learn.
The system analyzes the learner's personal information and the learning-related big data stored in the server (or database) to provide learning content customized for the individual learner.
In the step of providing Korean video content and requesting pronunciation feedback from the user (S100), the Korean video content collected through the data collection module 120, or the Korean video content stored in the user terminal through the application, is provided to the user.
After the learner watches the video, the system requests pronunciation feedback from the learner on a sentence generated by the sentence generation module 140.
In the step of evaluating the user's pronunciation through the STT system (S200), speech-to-text (STT) technology provided by global speech recognition providers (professional AI developers such as Google and MINDs Lab) is used to evaluate how accurately the foreign users of the application pronounce Korean.
The pronunciation evaluation criterion is as follows: the foreigner's pronunciation of the read-aloud request sentence presented by the application is captured through the microphone of the user terminal, transmitted to the server through the STT API, and a numerical score is given for how many characters the STT server can accurately recognize as Hangul.
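As an illustration of this character-counting criterion only, the following Python sketch compares the presented sentence with a speech-to-text transcript and reports the fraction of characters recognized correctly; the transcript string is a hard-coded stand-in for whatever an actual STT API would return, and the function name is an assumption.

```python
# A minimal sketch of character-level scoring: count how many characters of the
# prompt the STT output reproduces, ignoring spaces.
from difflib import SequenceMatcher

def hangul_match_score(prompt: str, stt_transcript: str) -> float:
    """Fraction of characters in the prompt that the STT output reproduces."""
    prompt = prompt.replace(" ", "")
    stt_transcript = stt_transcript.replace(" ", "")
    if not prompt:
        return 0.0
    matched = sum(block.size for block in
                  SequenceMatcher(None, prompt, stt_transcript).get_matching_blocks())
    return matched / len(prompt)

prompt = "운명은 스스로 만들어 가는 것이다"
transcript = "운영은 스스로 만드러 가는 것이다"   # assumed STT output for a learner
print(round(hangul_match_score(prompt, transcript), 2))
```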
In the step of storing the user's pronunciation and evaluation data in the user terminal (S300), the application stores the voice data pronounced by the foreign user in the user's mobile device.
The system creates a voice data storage space that lets the user listen to his or her own pronunciation again and a pronunciation evaluation score data storage space that stores pronunciation evaluation scores, and each time a foreign user repeats pronunciation training, the original voice data is compressed and stored in the voice data storage space.
The evaluation score is simultaneously stored as a history in the score management space, indexed by pronunciation time and sentence. This data is used so that the user can listen to the recordings again and correct his or her pronunciation according to the history.
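The following Python sketch illustrates one possible on-device layout for this history (compressed copies of each recording plus a score log keyed by time and sentence); the directory names, file format, and field names are assumptions, not details taken from the present invention.

```python
# A minimal sketch of per-device storage: compressed audio plus a score history.
import gzip
import json
import time
from pathlib import Path

class PracticeHistory:
    def __init__(self, root: Path):
        self.audio_dir = root / "voice_data"
        self.audio_dir.mkdir(parents=True, exist_ok=True)
        self.score_file = root / "score_history.jsonl"

    def add_attempt(self, sentence: str, audio_bytes: bytes, score: float) -> None:
        ts = int(time.time())
        audio_path = self.audio_dir / f"{ts}.wav.gz"
        audio_path.write_bytes(gzip.compress(audio_bytes))   # keep the original, compressed
        entry = {"time": ts, "sentence": sentence, "score": score,
                 "audio": audio_path.name}
        with self.score_file.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")

history = PracticeHistory(Path("practice_data"))
history.add_attempt("운명은 스스로 만들어 가는 것이다", b"\x00" * 1024, 0.87)
```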
In the step of collecting the users' data in the main server (S400), the evaluation score data stored in the user terminal are transmitted to the server.
When the application is installed, the user's consent to the statistical use of evaluation score data (the use of anonymized data to improve application functions and the sentence recommendation service) is obtained in advance, and at a specific point in time (e.g., when connected to a Wi-Fi environment) the user's pronunciation evaluation score data are replicated and aggregated in the main server. These evaluation data may be collected anonymously according to the user's gender, area of residence, nationality, age, and the like.
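As a rough illustration of the anonymous aggregation, the Python sketch below groups score records by demographic fields and sentence and computes an error rate per cohort; the record layout, the sample values, and the pass threshold are invented for the example and are not specified by the present invention.

```python
# A minimal sketch of cohort-level aggregation of anonymized score records.
from collections import defaultdict

records = [
    {"gender": "F", "region": "Hanoi", "nationality": "VN", "age_band": "20s",
     "sentence": "운명", "score": 0.55},
    {"gender": "F", "region": "Hanoi", "nationality": "VN", "age_band": "20s",
     "sentence": "운명", "score": 0.92},
    {"gender": "M", "region": "Tokyo", "nationality": "JP", "age_band": "30s",
     "sentence": "운명", "score": 0.80},
]

PASS = 0.8  # assumed threshold below which an attempt counts as mispronounced
stats = defaultdict(lambda: {"attempts": 0, "errors": 0})
for r in records:
    key = (r["gender"], r["nationality"], r["age_band"], r["sentence"])
    stats[key]["attempts"] += 1
    stats[key]["errors"] += r["score"] < PASS

for key, s in stats.items():
    print(key, "error rate:", s["errors"] / s["attempts"])
```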
In the step of analyzing the collected big data and then providing content customized for each user (S500), content optimized according to the user's personal information is provided.
For example, if the learner is a Vietnamese woman in her twenties, the system presents to the learner sentences containing pronunciations that Vietnamese women in their twenties often get wrong (e.g., '운명', 'fate'), based on analysis of the data collected in step S300 (see FIG. 3).
In addition, according to an embodiment of the present invention, sentences in which a frequently mispronounced word appears in different forms can be presented in sequence so that the learner can internalize how its pronunciation changes. For example, the system can present to the learner various types of sentences: the selected word followed by another word, the selected word followed by a postposition (particle), the selected word at the end of a sentence, the selected word in an interrogative sentence, the selected word in an exclamatory sentence, and so on.
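As one possible illustration of this selection step, the Python sketch below picks the words that a matching cohort mispronounces most often and emits each of them in several sentence frames (with a particle, sentence-final, interrogative, exclamatory); the cohort error rates and the Korean frame templates are made up for the example.

```python
# A minimal sketch of cohort-based problem selection with varied sentence frames.
cohort_error_rates = {"운명": 0.46, "학교": 0.12, "빨리": 0.31}   # assumed aggregates

frames = [
    "{word}을(를) 믿으세요.",        # word followed by a particle
    "그것이 바로 {word}.",           # word at the end of a sentence
    "이것이 {word}인가요?",          # interrogative sentence
    "정말 놀라운 {word}이네요!",     # exclamatory sentence
]

def build_drill(error_rates: dict, top_n: int = 2) -> list:
    hard_words = sorted(error_rates, key=error_rates.get, reverse=True)[:top_n]
    return [frame.format(word=w) for w in hard_words for frame in frames]

for sentence in build_drill(cohort_error_rates):
    print(sentence)
```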
In addition, according to another embodiment of the present invention, the system can provide learning content by analyzing the pronunciation pattern of the language the learner mainly uses.
FIG. 4 is a conceptual diagram illustrating an embodiment of pronunciation pattern analysis for providing customized content.
Referring to FIG. 4, a pronunciation pattern familiar to a user of a language with periodic stress (such as English) can be identified. Here, Δt denotes the main period of the stress pattern.
The system analyzes the main frequency band and stress pattern of the user's native language, making it possible to reinforce learning of the sentence components that differ from those of Korean.
The principle of improving Korean listening and speaking ability using frequency-band sound is as follows.
Each language uses a different range of frequencies.
For example, Korean communication uses sounds in the 500 to 2,200 Hz band, whereas English (American) communication uses sounds in the 800 to 3,500 Hz band.
People in each country become accustomed to the frequency band of the language they use and have difficulty perceiving sounds outside this band. For this reason, the same sound can be heard differently by different people. Since it is impossible to produce a sound one cannot hear, this difference in audible frequency bands can be an obstacle to learning a non-native language.
If the frequencies used by the language being learned differ from those of the language mainly used, the parts that fall outside the familiar frequency range are not heard accurately, which in turn makes them difficult to pronounce.
For example, since Koreans usually use sounds of 500 to 2,200 Hz, the part of the 800 to 3,500 Hz band used in (American) English that does not overlap, namely 2,200 to 3,500 Hz, is not heard accurately or is difficult to distinguish.
Using this principle, the system can reinforce learning of words or sentences that fall outside the frequency band mainly used in the learner's native language.
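To illustrate the band comparison numerically, the Python sketch below takes the band limits quoted above, computes the non-overlapping range, and measures how much of a signal's spectral energy falls in that range; the test signal is synthetic and the helper names are assumptions, so this is only a sketch of the idea, not the system's actual analysis.

```python
# A minimal sketch of the frequency-band idea using a synthetic signal.
import numpy as np

NATIVE_BAND = (500.0, 2200.0)    # Korean, per the description above
TARGET_BAND = (800.0, 3500.0)    # American English, per the description above

def unfamiliar_band(native, target):
    """Part of the target band that lies above the native band."""
    return (max(native[1], target[0]), target[1])

def energy_fraction_in_band(signal, sample_rate, band):
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return spectrum[mask].sum() / spectrum.sum()

sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 1000 * t) + 0.8 * np.sin(2 * np.pi * 2800 * t)
band = unfamiliar_band(NATIVE_BAND, TARGET_BAND)          # (2200.0, 3500.0)
print("energy outside the native band:",
      round(energy_fraction_in_band(signal, sr, band), 2))
```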
Languages can also be divided into syllable-timed languages, in which syllables are simply strung together without a rhythm repeating at regular intervals (there is no stress cycle in a sentence), and stress-timed languages, in which a regular rhythmic pattern exists (stresses recur at a roughly constant period within a sentence).
In a stress-timed language, the length of a syllable varies with the situation, but the interval at which stress is placed is constant. That is, the time intervals between stressed syllables are similar, and however many syllables fall between them, they are compressed and pronounced to fit that interval.
The analysis module, which includes a language type analysis module and a stress pattern analysis module, analyzes which country's language a sentence belongs to, whether that language is a stress-timed language, and what the stress pattern of the sentence is.
Specifically, the analysis module determines what language a sentence is in based on data such as the type of language used by the foreigner, its words, and its word order, and, based on the data collected about that language, determines whether it is a syllable-timed language or a stress-timed language.
The analysis module analyzes the voice output data of a sentence (the positions at which the speaker pauses, the output dB values, the tone or intonation, and so on) to determine the stress pattern of the sentence the learner wants to listen to.
The period of the stress pattern may differ slightly from user to user or from sentence to sentence, but the system can identify the average of the periods of the collected foreign-language sentences, or the period that occurs most frequently among them, as the main period.
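The Python sketch below shows one simple way to estimate such a main period Δt from stress timestamps: form the inter-stress intervals across collected sentences and report the most frequent interval after rounding. The timestamps and the rounding resolution are invented for the example; how stresses are detected in the first place is not shown.

```python
# A minimal sketch of estimating the main stress period from stress onsets.
from collections import Counter

stress_times = [
    [0.10, 0.62, 1.15, 1.66],        # sentence 1, stress onsets in seconds (assumed)
    [0.05, 0.55, 1.08, 1.60, 2.12],  # sentence 2 (assumed)
]

def main_period(all_times, resolution=0.05):
    intervals = []
    for times in all_times:
        intervals += [round((b - a) / resolution) * resolution
                      for a, b in zip(times, times[1:])]
    period, _ = Counter(intervals).most_common(1)[0]
    return period

print("estimated main period (s):", main_period(stress_times))
```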
The system may place important words at positions corresponding to this main period and provide the result to learners, so that learners can study Korean sentences in a rhythm they are accustomed to.
According to the embodiments of the present invention described above, the users' pronunciation evaluation score data are replicated and aggregated in the main server, collected anonymously according to the users' gender, area of residence, nationality, age, and so on, and can then be broken down into words, sentences, and expressions that are frequently mispronounced by country, age, gender, and so on; big data analysis is enabled by quantifying how many users mispronounce an item out of the total number of pronunciation attempts; and, if the user's nationality, gender, age, and the like are entered accurately, pronunciations, sentences, and expressions that learners in a similar environment find difficult can be trained intensively, which are effects differentiated from the prior art.
The present invention described above is not limited to the configurations and methods of the embodiments described above; all or some of the embodiments may be selectively combined so that various modifications can be made.

Claims (5)

  1. A method for learning Korean pronunciation through voice analysis, comprising the steps of:
    uploading, by a user, personal information to a system;
    providing, by the system, Korean video content to the user, presenting a pronunciation evaluation problem, and requesting feedback;
    evaluating the user's pronunciation after user voice data is collected in response to the feedback;
    storing the user voice data and the evaluation score in a user terminal;
    collecting the stored evaluation scores in a server at a predetermined interval; and
    correcting, by the server, the pronunciation evaluation problems provided to the user by using the collected information.
  2. The method of claim 1, wherein the user's pronunciation is evaluated by speech-to-text (STT) technology.
  3. The method of claim 2, wherein the pronunciation evaluation is determined by how many characters of the foreigner's pronunciation input the STT server can accurately recognize as Hangul.
  4. The method of claim 3, wherein the evaluation scores are collected anonymously according to the user's gender, area of residence, nationality, and age.
  5. The method of claim 4, wherein the system presents a level-test pronunciation evaluation problem to the user and then classifies the user's level, and presents, to a classified user, pronunciation evaluation problems that received low scores from users of the same level who share at least two of gender, area of residence, nationality, and age.
PCT/KR2020/019110 2019-12-31 2020-12-24 Method and system for learning korean pronunciation via voice analysis WO2021137534A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020190179952A KR102396833B1 (en) 2019-12-31 2019-12-31 System and method for studying korean pronunciation using voice analysis
KR10-2019-0179952 2019-12-31

Publications (1)

Publication Number Publication Date
WO2021137534A1 true WO2021137534A1 (en) 2021-07-08

Family

ID=76686944

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/019110 WO2021137534A1 (en) 2019-12-31 2020-12-24 Method and system for learning korean pronunciation via voice analysis

Country Status (2)

Country Link
KR (2) KR102396833B1 (en)
WO (1) WO2021137534A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20230018095A (en) 2021-07-29 2023-02-07 케이랩 주식회사 Method of Korean Postposition Study for Non-native Learners
KR102631382B1 (en) * 2023-06-16 2024-01-31 주식회사 미카 System for providing korean education service for korean native speaker

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR19990047118A (en) * 1997-12-02 1999-07-05 정선종 Korean language education system and control method
JP2003066818A (en) * 2001-08-27 2003-03-05 Advanced Telecommunication Research Institute International Foreign language learning device
KR20050074298A (en) * 2004-01-08 2005-07-18 정보통신연구진흥원 Pronunciation test system and method of foreign language
KR20110068490A (en) * 2009-12-16 2011-06-22 포항공과대학교 산학협력단 Apparatus for foreign language learning and method for providing foreign language learning service
KR20160008949A (en) * 2014-07-15 2016-01-25 한국전자통신연구원 Apparatus and method for foreign language learning based on spoken dialogue

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100405061B1 (en) 2000-03-10 2003-11-10 문창호 Apparatus for training language and Method for analyzing language thereof
KR101048214B1 (en) * 2008-12-30 2011-07-08 주식회사 케이티 Pronunciation Correction Service Device Using Social Learning and Semantic Technology
KR101071392B1 (en) * 2009-04-16 2011-10-12 전일호 Learner centered foreign language Education system and its teaching method
KR101983772B1 (en) 2016-02-26 2019-05-29 소리노리닷컴(주) Learning system for listening
KR20190041772A (en) * 2017-10-13 2019-04-23 주식회사 하얀마인드 Apparatus and method for evaluating linguistic performance based on silence interval using comparison with other users

Also Published As

Publication number Publication date
KR20220039679A (en) 2022-03-29
KR102396833B1 (en) 2022-05-13
KR102542602B1 (en) 2023-06-14
KR20210086182A (en) 2021-07-08

Similar Documents

Publication Publication Date Title
CN105792752B (en) Computing techniques for diagnosing and treating language-related disorders
Nowrouzi et al. Iranian EFL students' listening comprehension problems
WO2021137534A1 (en) Method and system for learning korean pronunciation via voice analysis
Bashori et al. ‘Look, I can speak correctly’: learning vocabulary and pronunciation through websites equipped with automatic speech recognition technology
WO2017061753A1 (en) Language learning system using text visualization and learner corpus
JPWO2014136534A1 (en) Understanding support system, understanding support server, understanding support method, and program
CN110853422A (en) Immersive language learning system and learning method thereof
WO2011074772A2 (en) Grammatical error simulation device and method
Yan An examination of foreign language classroom anxiety: Its sources and effects in a college English program in China
WO2009119991A2 (en) Method and system for learning language based on sound analysis on the internet
JP2002258729A (en) Foreign language learning system, information processing terminal for the same and server
Rees et al. Effects of english Cued Speech on speech perception, phonological awareness and literacy: a case study of a 9-year-old deaf boy using a cochlear implant
Wang et al. “Oh my gosh! The expression is too Chinese”: Attitudes of university teachers and students towards China English
Huang et al. An exploration of listening strategy use and proficiency in China
Yunina ARTIFICIAL INTELLIGENCE TOOLS IN FOREIGN LANGUAGE TEACHING IN HIGHER EDUCATION INSTITUTIONS
Ekiz The Effect of Human Relations and Communication Lesson on Eloquent Speaking Skill
Chua et al. Perceived difficulties in learning of mandarin among foreign-language learners and strategies to mitigate them
KR102236861B1 (en) Language acquisition assistance system using frequency bands by language
Zhang et al. Chinese EFL learners’ use of mobile dictionaries in reading comprehension tasks
KR20160086152A (en) English trainning method and system based on sound classification in internet
Podder et al. Sentence Learning System for Children with Dyslexia to Learn English As Second Language
KR20190122399A (en) Method for providing foreign language education service learning grammar using puzzle game
KR102550406B1 (en) System for providing online interactive realtime english speaking class platform service
Lan et al. A Speech Accent Detection Metrics (L2AM) for Processing Accented Speech Signal
FREITAG THE DEVELOPMENT OF SOCIOLINGUISTIC AWARENESS AND SUCCESS IN READING PERFORMANCE

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20910141

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20910141

Country of ref document: EP

Kind code of ref document: A1