KR102321141B1

KR102321141B1 - Apparatus and method for user interface for pronunciation assessment

Info

Publication number: KR102321141B1
Application number: KR1020200000805A
Authority: KR
Inventors: 조창수; 김상하; 문대영
Original assignee: Selvas AI Inc. (주식회사 셀바스에이아이)
Priority date: 2020-01-03
Filing date: 2020-01-03
Publication date: 2021-11-03
Also published as: KR20210087727A

Abstract

An apparatus and method for providing a user interface for pronunciation evaluation are provided according to an embodiment of the present invention. The apparatus comprises a communication unit configured to transmit and receive data, a display unit configured to display data, and a control unit connected to the communication unit and the display unit. The control unit is configured to:
- acquire a user's voice data for a specific phoneme or a specific word;
- transmit the acquired voice data to a service providing server that evaluates the user's pronunciation;
- receive, from the service providing server, pronunciation evaluation result data obtained by evaluating the user's pronunciation; and
- display, through the display unit, an interface screen representing the received pronunciation evaluation result data.

Description

Apparatus and method for providing a user interface for pronunciation evaluation

The present invention relates to an apparatus and method for providing a user interface for pronunciation evaluation.

With the trend toward industrial specialization and internationalization, foreign languages are becoming increasingly important. Accordingly, a variety of services for foreign language learning are being provided.

Foreign language learning is generally conducted under the guidance of a native-speaking instructor. However, such instruction is costly and is constrained by place and time.

Accordingly, there is a need for a method that allows a foreign language to be learned at low cost, anytime and anywhere, without restrictions of place and time. To meet this need, various language learning programs have been developed and provided.

However, such language learning programs have two weaknesses. They are poor at analyzing which of the user's foreign language pronunciations are weak and need training. They are also poor at providing feedback on those pronunciations.

Therefore, a pronunciation evaluation method is needed that works without restrictions of place and time. The method should analyze a user's foreign language pronunciation and provide feedback on which pronunciations are weak and need training.

An object of the present invention is to provide an apparatus and method for providing a user interface for pronunciation evaluation.

Specifically, an object of the present invention is to provide an apparatus and method for providing a user interface that evaluates a user's foreign language pronunciation. This supports the user's foreign language learning without restrictions of place and time.

Another object of the present invention is to provide an apparatus and method for providing a pronunciation evaluation user interface that analyzes a user's foreign language pronunciation and provides feedback on its weak parts.

The objects of the present invention are not limited to those mentioned above. Other objects not mentioned here will be clearly understood by those skilled in the art from the following description.

To achieve the objects described above, an apparatus and method for providing a user interface for pronunciation evaluation according to an embodiment of the present invention are provided.

The apparatus comprises a communication unit configured to transmit and receive data, a display unit configured to display data, and a control unit connected to the communication unit and the display unit. The control unit is configured to:
- acquire a user's voice data for a specific phoneme or a specific word;
- transmit the acquired voice data to a service providing server that evaluates the user's pronunciation;
- receive, from the service providing server, pronunciation evaluation result data in which the user's pronunciation has been evaluated for each pronunciation characteristic; and
- display, through the display unit, an interface screen representing the received pronunciation evaluation result data.

The pronunciation evaluation result data includes at least one of the following:
- an evaluation score that quantifies the evaluation of the user's pronunciation;
- pronunciation characteristics extracted from the user's voice data; and
- feedback data for guiding the user's pronunciation toward that of a native speaker.

According to an embodiment of the present invention, a method of providing a user interface for pronunciation evaluation is performed by the control unit of the apparatus. The method includes the following steps:
1. acquiring a user's voice data for a specific phoneme or a specific word;
2. transmitting the acquired voice data to a service providing server for evaluating the user's pronunciation;
3. receiving, from the service providing server, pronunciation evaluation result data in which the user's pronunciation has been evaluated for each pronunciation characteristic; and
4. displaying an interface screen representing the received pronunciation evaluation result data.

The pronunciation evaluation result data includes at least one of the following:
- an evaluation score that quantifies the evaluation of the user's pronunciation;
- pronunciation characteristics extracted from the user's voice data; and
- feedback data for guiding the user's pronunciation toward that of a native speaker.
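The four method steps above can be sketched as a small client-side routine. This is an illustrative sketch under stated assumptions, not the patented implementation. The names `PronunciationResult`, `provide_evaluation_ui`, `acquire_voice`, `evaluate_remotely`, and `display` are all hypothetical. The sketch also assumes that transmitting the voice data and receiving the result happen in one synchronous call, so steps 2 and 3 are merged.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class PronunciationResult:
    # Hypothetical container: per the description, at least one field is populated.
    score: Optional[float] = None                                     # evaluation score
    characteristics: Dict[str, float] = field(default_factory=dict)  # extracted pronunciation characteristics
    feedback: List[str] = field(default_factory=list)                 # guidance toward native pronunciation


def provide_evaluation_ui(
    target: str,
    acquire_voice: Callable[[str], bytes],
    evaluate_remotely: Callable[[str, bytes], PronunciationResult],
    display: Callable[[PronunciationResult], None],
) -> PronunciationResult:
    """Run the four method steps for one target phoneme or word (hypothetical)."""
    voice = acquire_voice(target)              # 1. acquire voice data
    result = evaluate_remotely(target, voice)  # 2-3. transmit to server, receive result data
    display(result)                            # 4. display the interface screen
    return result


# Usage with stand-in callables in place of a microphone, a network, and a screen.
shown: List[PronunciationResult] = []
res = provide_evaluation_ui(
    "water",
    acquire_voice=lambda t: b"\x00\x01",
    evaluate_remotely=lambda t, v: PronunciationResult(score=82.0, feedback=["Round your lips for /w/."]),
    display=shown.append,
)
```

Injecting the three callables keeps the step order testable without real audio hardware or a live server.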

Details of other embodiments are included in the detailed description and drawings.

The present invention can provide an apparatus and method for providing a user interface for evaluating a user's foreign language pronunciation without restrictions of place and time.

In addition, the present invention can analyze a user's foreign language pronunciation and provide feedback on the weak parts of each pronunciation characteristic. The user can then practice the weak pronunciations.

In addition, the present invention guides the user to correct their foreign language pronunciation toward a native speaker's pronunciation. This can raise the user's pronunciation to a native-like level.

In addition, the present invention provides pronunciation accuracy analysis results for each phoneme. This enables more intensive foreign language pronunciation training.

The effects of the present invention are not limited to those exemplified above. Further effects are described throughout this specification.

FIG. 1 is a schematic diagram illustrating a system for providing a user interface for pronunciation evaluation according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a user device according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a service providing server according to an embodiment of the present invention.
FIG. 4 is a schematic flowchart illustrating a method of providing a user interface for pronunciation evaluation in a user device according to an embodiment of the present invention.
FIG. 5 is a schematic flowchart illustrating a method for pronunciation evaluation between a user device and a service providing server according to an embodiment of the present invention.
FIGS. 6A, 6B, 6C, 6D, 6E, 6F, and 6G are exemplary views of various interface screens related to pronunciation evaluation according to an embodiment of the present invention.
FIG. 7 is an exemplary view of a mobile web screen for evaluating a user's pronunciation according to an embodiment of the present invention.

Advantages and features of the present invention, and methods of achieving them, will become apparent from the embodiments described in detail below in conjunction with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure is complete and fully conveys the scope of the invention to those of ordinary skill in the art to which the present invention pertains. The present invention is defined only by the scope of the claims. In the description of the drawings, similar reference numerals may be used for similar components.

In this document, expressions such as "has," "may have," "includes," or "may include" indicate the presence of a corresponding feature (e.g., a numerical value, function, operation, or component such as a part). They do not exclude the presence of additional features.

In this document, expressions such as "A or B," "at least one of A or/and B," or "one or more of A or/and B" may include all possible combinations of the listed items. For example, "A or B," "at least one of A and B," or "at least one of A or B" may refer to any of the following cases:
(1) including at least one A;
(2) including at least one B; or
(3) including both at least one A and at least one B.

As used herein, expressions such as "first," "second," "firstly," or "secondly" may modify various components regardless of order and/or importance. They are used only to distinguish one component from another and do not limit those components. For example, a first user device and a second user device may represent different user devices regardless of order or importance. Likewise, without departing from the scope of rights described in this document, a first component may be named a second component, and a second component may similarly be renamed a first component.

A component (e.g., a first component) may be referred to as being "(operatively or communicatively) coupled with/to" or "connected to" another component (e.g., a second component). In that case, the component may be connected to the other component directly or through yet another component (e.g., a third component). In contrast, when a component (e.g., a first component) is referred to as being "directly coupled" or "directly connected" to another component (e.g., a second component), no other component (e.g., a third component) exists between them.

As used herein, the expression "configured to (or set to)" may be used interchangeably, depending on the context, with "suitable for," "having the capacity to," "designed to," "adapted to," "made to," or "capable of." The term "configured to (or set to)" does not necessarily mean only "specifically designed to" in hardware. Instead, in some circumstances, the expression "a device configured to" may mean that the device is "capable of" an operation together with other devices or components. For example, the phrase "a processor configured (or set) to perform A, B, and C" may mean a dedicated processor (e.g., an embedded processor) for performing those operations. It may also mean a general-purpose processor (e.g., a CPU or an application processor) that performs those operations by executing one or more software programs stored in a memory device.

Terms used in this document describe only specific embodiments and are not intended to limit the scope of other embodiments. Singular expressions may include plural expressions unless the context clearly indicates otherwise. Terms used herein, including technical or scientific terms, may have the same meanings as commonly understood by those of ordinary skill in the art described in this document. Terms defined in general dictionaries may be interpreted as having meanings identical or similar to their contextual meanings in the related art. Unless explicitly defined in this document, such terms are not to be interpreted in an idealized or overly formal sense. In some cases, even terms defined in this document cannot be interpreted to exclude embodiments of this document.

The features of the various embodiments of the present invention may be partially or wholly coupled to or combined with each other. As those skilled in the art will fully understand, various technical interworking and operation among them is possible. The embodiments may be implemented independently of each other or together in association.

Hereinafter, various embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a schematic diagram illustrating a system for providing a user interface for pronunciation evaluation according to an embodiment of the present invention.

Referring to FIG. 1, the system 100 for providing a user interface for pronunciation evaluation analyzes a user's voice data to evaluate the user's pronunciation. It then provides a user interface for pronunciation evaluation based on the evaluation result. The system 100 may include the following:
- a user device 110, which provides the user's voice data to request evaluation and correction of the user's pronunciation; and
- a service providing server 120, which provides the pronunciation evaluation service.

First, the user device 110 is an electronic device that requests pronunciation evaluation of the user's voice data and provides a user interface presenting the evaluation result data. It may include at least one of a smartphone, a tablet PC (personal computer), a notebook computer, and/or a PC.

The user device 110 may acquire the user's voice data for evaluating and correcting the user's pronunciation, and may transmit the acquired voice data to the service providing server 120. For example, the user device 110 may include an input device such as a microphone for acquiring the user's voice data. It may also provide a user interface for acquiring the voice data through the microphone. This user interface may include an area that displays video data for learning a native speaker's pronunciation of a specific phoneme or a specific pronunciation.

When voice data is acquired through this user interface, the user device 110 may transmit the acquired voice data to the service providing server 120.

The user device 110 may receive, from the service providing server 120, pronunciation evaluation result data representing the evaluation of the user's pronunciation. It may then display the received data through its display unit. The pronunciation evaluation result data may include at least one of the following:
- an evaluation score for the user's pronunciation;
- per-phoneme pronunciation characteristics extracted from the user's voice data; and
- feedback data.
For example, the feedback data may explain the weak parts of the user's pronunciation.
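The description does not specify a wire format for the pronunciation evaluation result data. The following is therefore a minimal sketch that assumes a JSON body with hypothetical keys `score`, `phonemes` (each entry holding a `phoneme` and its `features`), and `feedback`. It parses the body into typed fields and enforces the "at least one of" condition stated above.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PhonemeCharacteristics:
    phoneme: str
    features: Dict[str, float]  # feature name -> score (hypothetical 0-100 scale)


@dataclass
class EvaluationResult:
    score: Optional[float] = None
    phonemes: List[PhonemeCharacteristics] = field(default_factory=list)
    feedback: List[str] = field(default_factory=list)


def parse_result(payload: str) -> EvaluationResult:
    """Parse a hypothetical JSON result body from the service providing server."""
    data = json.loads(payload)
    result = EvaluationResult(
        score=data.get("score"),
        phonemes=[
            PhonemeCharacteristics(p["phoneme"], dict(p.get("features", {})))
            for p in data.get("phonemes", [])
        ],
        feedback=list(data.get("feedback", [])),
    )
    # The result data must contain at least one of score, characteristics, or feedback.
    if result.score is None and not result.phonemes and not result.feedback:
        raise ValueError("result data contains no score, characteristics, or feedback")
    return result
```

Rejecting an empty body at parse time means the display code never has to handle a result with nothing to show.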

Next, the service providing server 120 may include a general-purpose computer, a laptop, and/or a data server. It performs various computations to analyze the user's voice data provided from the user device 110 and to provide the pronunciation evaluation service. In various embodiments, the service providing server 120 may be, but is not limited to, either of the following:
- a web server that provides a web page for the pronunciation evaluation service upon a client's request; or
- a mobile web server that provides a mobile website.

Specifically, the service providing server 120 may receive voice data from the user device 110, analyze the received voice data, and perform pronunciation evaluation for each pronunciation characteristic. For example, the service providing server 120 may align the voice data by phoneme, detect at least one pronunciation characteristic for each phoneme, and then score the detected characteristic(s). The pronunciation characteristic detected for each phoneme may be, but is not limited to, a distinctive feature. In this case, the at least one pronunciation characteristic may include at least one of the following: high tongue, low tongue, front tongue, back tongue, rounded lips, voiced, nasal, plosive, fricative, labial (or labiodental), interdental/alveolar, liquid, affricate, R-liquid, Y semivowel, W semivowel, closing diphthong, opening diphthong, centering diphthong, and velar.

Using distinctive features as the pronunciation characteristics has several benefits. It expresses the phonetic properties of segments clearly and distinguishes the interrelationships between phonemes clearly. It can also distinguish a large number of segments and allows phonological rules to be stated explicitly.
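The per-feature scoring described above can be sketched as follows. The description does not disclose the alignment model, the feature detector, or the scoring formula, so all of these are assumptions in this sketch. Phoneme alignment is taken as already done. A detector is assumed to output, for each aligned segment, the probability that each distinctive feature is present. Each feature is then scored as its agreement with a hypothetical canonical feature table, scaled to 0-100. The table `CANONICAL` and the 60-point weak threshold are illustrative only.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical canonical table: phoneme -> {distinctive feature: expected present?}.
# A real system would cover the full phoneme inventory and feature set.
CANONICAL: Dict[str, Dict[str, bool]] = {
    "r": {"voiced": True, "liquid": True, "rhotic": True},
    "l": {"voiced": True, "liquid": True, "rhotic": False},
    "th": {"voiced": False, "fricative": True, "interdental": True},
}


def score_features(
    aligned: List[Tuple[str, Dict[str, float]]],
    weak_threshold: float = 60.0,
) -> Tuple[float, List[Tuple[str, Dict[str, float]]], List[str]]:
    """Score every distinctive feature of every phoneme-aligned segment.

    `aligned` holds (phoneme, detector output) pairs; the detector output maps a
    feature name to the probability (0.0 to 1.0) that the feature is present.
    Returns the overall score, per-phoneme feature scores, and weak features.
    """
    per_phoneme: List[Tuple[str, Dict[str, float]]] = []
    weak: List[str] = []
    for phoneme, detected in aligned:
        scores: Dict[str, float] = {}
        for feature, expected in CANONICAL[phoneme].items():
            p = detected.get(feature, 0.0)  # a feature the detector did not report is treated as absent
            # Agreement with the canonical value, scaled to 0-100.
            scores[feature] = 100.0 * (p if expected else 1.0 - p)
            if scores[feature] < weak_threshold:
                weak.append(f"{phoneme}:{feature}")
        per_phoneme.append((phoneme, scores))
    overall = mean(s for _, scores in per_phoneme for s in scores.values())
    return overall, per_phoneme, weak
```

For example, an /r/ whose rhotic feature is detected with probability 0.25 scores 25 on that feature and is reported as weak. The feedback data could target that weakness.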

The service providing server 120 may provide the user device 110 with pronunciation evaluation result data representing the evaluation of the user's pronunciation.

The data from the service providing server 120 may be provided as a web page through a web browser installed on the user device 110, or in the form of an application or program. In various embodiments, such data may be provided as part of a platform in a client-server environment.

Through this, the present invention enables the user to learn independently and increases the user's learning satisfaction. It also provides a more efficient learning service for pronunciation evaluation and correction.

Hereinafter, the user device 110 will be described in detail with reference to FIG. 2.

FIG. 2 is a schematic diagram of a user device according to an embodiment of the present invention.

Referring to FIG. 2, the user device 200 includes a communication unit 210, a display unit 220, a storage unit 230, and a control unit 240. In the illustrated embodiment, the user device 200 may refer to the user device 110 of FIG. 1.

The communication unit 210 connects the user device 200 so that it can communicate with external devices. The communication unit 210 may be connected to the service providing server 120 via wired/wireless communication to transmit and receive various data. Specifically, the communication unit 210 may transmit the user's voice data to the service providing server 120 and receive pronunciation evaluation result data from it.
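The description does not define how the communication unit 210 packages the voice data. This sketch shows one plausible encoding. It assumes mono 16-bit PCM samples, wraps them in an in-memory WAV container using Python's standard `wave` module, and base64-encodes the result into a JSON body. The field names `target` and `audio_wav_base64` are hypothetical, and so is the 16 kHz default sample rate.

```python
import base64
import io
import json
import wave
from typing import Sequence


def pcm16_to_wav(samples: Sequence[int], sample_rate: int = 16000) -> bytes:
    """Wrap mono 16-bit PCM samples in a WAV container, entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        w.setframerate(sample_rate)
        w.writeframes(b"".join(s.to_bytes(2, "little", signed=True) for s in samples))
    return buf.getvalue()


def build_request(target: str, wav_bytes: bytes) -> str:
    """Build a hypothetical JSON request body naming the target phoneme or word."""
    return json.dumps({
        "target": target,
        "audio_wav_base64": base64.b64encode(wav_bytes).decode("ascii"),
    })
```

Base64 keeps the binary audio safe inside a text body. A multipart upload would avoid the roughly 33% size overhead if that matters.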

The display unit 220 may display various contents (e.g., text, images, videos, icons, banners, or symbols) to the user. Specifically, it may display various interface screens for requesting pronunciation evaluation of the user's voice data and for presenting the evaluation result data.

In various embodiments, the display unit 220 may include a touch screen. The touch screen may receive, for example, touch, gesture, proximity, drag, swipe, or hovering input made with an electronic pen or a part of the user's body.

The storage unit 230 may store various data used to request pronunciation evaluation of the user's voice data and to provide a user interface presenting the evaluation result data. In various embodiments, the storage unit 230 may include at least one of the following types of storage medium: flash memory, hard disk, multimedia card micro, card-type memory (e.g., SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, and optical disk. The user device 200 may also operate in conjunction with web storage that performs the storage function of the storage unit 230 on the Internet.

The control unit 240 is operatively connected to the communication unit 210, the display unit 220, and the storage unit 230. It may execute various instructions for requesting pronunciation evaluation of the user's voice data and for providing a user interface that presents the evaluation result data for each pronunciation characteristic.

Specifically, the control unit 240 may acquire the user's voice data for a specific phoneme or a specific word and request the service providing server 120 to evaluate the pronunciation in the acquired voice data. Here, the specific word may be, but is not limited to, a word containing the pronunciation of the specific phoneme. For example, the control unit 240 may further include an input unit such as a microphone and acquire the user's voice data through it, but is not limited thereto. In various embodiments, when the user device 200 has no microphone, it may be connected to an external device such as an external microphone and acquire voice data through that device.

To acquire the user's voice data for a specific phoneme or word, the control unit 240 may provide video data for learning a native speaker's pronunciation of that phoneme or word. For example, the video data may capture a native speaker pronouncing the phoneme or word, including the speaker's mouth shape. The user can then practice pronouncing that phoneme or word.

Subsequently, the control unit 240 may display, through the display unit 220, an interface screen for requesting pronunciation evaluation of the acquired voice data.

The control unit 240 may receive pronunciation evaluation result data from the service providing server 120 and display an interface screen representing the received data. This interface screen may include graphic objects or display areas presenting at least one of the following:
- the determined evaluation score;
- the per-phoneme pronunciation characteristics extracted from the voice data; and
- the feedback data.
In various embodiments, the interface screen may further include a display area presenting video data for learning a native speaker's pronunciation of the specific phoneme or word.
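The actual interface screens are graphical (FIGS. 6A to 6G). The plain-text stand-in below only illustrates one way to arrange the three kinds of result data on such a screen. It shows the score first, then each phoneme's features ordered weakest first, then the feedback. The function name `render_result_screen`, the "practice" marker, and the 60-point threshold are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def render_result_screen(
    score: Optional[float],
    phoneme_features: List[Tuple[str, Dict[str, float]]],
    feedback: List[str],
    weak_threshold: float = 60.0,
) -> str:
    """Render a plain-text stand-in for the result interface screen."""
    lines: List[str] = []
    if score is not None:
        lines.append(f"Overall score: {score:.0f}/100")
    for phoneme, features in phoneme_features:
        # Weakest features first, so the user sees what to practice.
        for name, value in sorted(features.items(), key=lambda kv: kv[1]):
            mark = "  <- practice" if value < weak_threshold else ""
            lines.append(f"/{phoneme}/ {name}: {value:.0f}{mark}")
    lines.extend(f"Tip: {tip}" for tip in feedback)
    return "\n".join(lines)


print(render_result_screen(83.4, [("r", {"voiced": 100.0, "rhotic": 25.0})], ["Curl the tongue tip back."]))
```

Every section is optional, matching the "at least one of" wording. A result carrying only feedback still renders.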

이를 통해 본 발명은 사용자의 발음을 원어민 발음으로 교정하기 위해 사용자 스스로 학습을 수행할 수 있다.Through this, according to the present invention, the user can perform self-learning in order to correct the user's pronunciation to the native speaker's pronunciation.

Hereinafter, the service providing server 120 will be described in detail with reference to FIG. 3.

FIG. 3 is a schematic diagram of a service providing server according to an embodiment of the present invention.

Referring to FIG. 3, the service providing server 300 includes a communication unit 310, a storage unit 320, and a controller 330. In the presented embodiment, the service providing server 300 may refer to the service providing server 120 of FIG. 1.

The communication unit 310 connects the service providing server 300 so that it can communicate with external devices. The communication unit 310 may connect to the user device 110 via wired or wireless communication to send and receive various data.

Specifically, the communication unit 310 may receive the user's voice data from the user device 110 and transmit pronunciation evaluation result data to the user device 110.

The storage unit 320 may store various data used to analyze the user's voice data and provide the pronunciation evaluation service. In various embodiments, the storage unit 320 may include at least one type of storage medium among flash memory, hard disk, multimedia card micro, card-type memory (e.g., SD or XD memory), RAM, SRAM, ROM, EEPROM, PROM, magnetic memory, magnetic disk, and optical disk. The service providing server 300 may also operate in connection with web storage that performs the storage function of the storage unit 320 over the Internet.

The controller 330 is operatively connected to the communication unit 310 and the storage unit 320, and may execute various instructions for analyzing the user's voice data to provide the pronunciation evaluation service.

Specifically, the controller 330 may receive the user's voice data from the user device 110 through the communication unit 310 and evaluate the user's pronunciation based on the received voice data.

To this end, the controller 330 may analyze the received voice data to extract at least one pronunciation characteristic, and may use a pronunciation evaluation model, trained in advance, to evaluate the user's pronunciation based on the extracted characteristics. For example, the pronunciation evaluation model may be trained with native-speaker pronunciation as the ground truth, and may be used either to generate an evaluation score that quantifies the similarity to native pronunciation or to determine whether that similarity is high or low.

Using this pronunciation evaluation model, the controller 330 may extract at least one pronunciation characteristic from the voice data and determine an evaluation score for the user's pronunciation based on it. Here, the evaluation score may be a numerical measure of the similarity between the user's pronunciation characteristics and those of a native speaker.

For example, assume that for the word “Boy” a native speaker's per-phoneme distinctive features are “B: fricative”, “o: rounded lips”, and “y: Y semivowel”. The controller 330 may align the user's voice data by phoneme and extract at least one distinctive feature from each phoneme-aligned segment. In other words, the controller 330 may extract at least one distinctive feature from each of “B”, “o”, and “y”.

If the distinctive features extracted from the user's voice data match “low tongue”, “fricative”, and “Y semivowel”, the controller 330 may determine that the similarity between the user's pronunciation characteristics and the native speaker's is high. In various embodiments, if the extracted distinctive features include features other than “low tongue”, “fricative”, and “Y semivowel”, or if only one of “fricative”, “rounded lips”, and “Y semivowel” is extracted, the controller 330 may determine that the similarity is not high, or is low.
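The high/low judgment above can be sketched as a comparison of per-phoneme feature sets. This is a minimal illustration, not the patented model: it assumes phoneme alignment and feature extraction have already produced a set of feature labels per phoneme, and the strict exact-match rule (any missing or extra feature means "low") is an assumption. `NATIVE_BOY` and `judge_similarity` are hypothetical names.

```python
from typing import Dict, Set

# Native-speaker reference features for "Boy", copied from the example above.
NATIVE_BOY: Dict[str, Set[str]] = {
    "B": {"fricative"},
    "o": {"rounded lips"},
    "y": {"Y semivowel"},
}


def judge_similarity(native: Dict[str, Set[str]], user: Dict[str, Set[str]]) -> str:
    """Return "high" only if, for every phoneme, the user's features equal the native ones.

    An extra (unexpected) feature or a missing native feature both yield "low".
    This strict exact-match rule is an assumption made for illustration.
    """
    for phoneme, expected in native.items():
        extracted = user.get(phoneme, set())
        if extracted != expected:
            return "low"
    return "high"
```

For instance, adding an unexpected “low tongue” to the user's “o” segment turns an otherwise matching result into `"low"`.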

Once the similarity is determined, the controller 330 may determine an evaluation score for the user's pronunciation based on it. The resulting score may be provided as the result of analyzing the accuracy of the user's pronunciation of each phoneme.
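The text does not say how similarity becomes a number. One plausible rule, shown here purely as an assumption, is the Jaccard similarity of the native and user feature sets scaled to 0-100, which penalizes both missing native features and extra user features, consistent with the judgment described above. The function names are hypothetical.

```python
from typing import Dict, Set


def phoneme_score(expected: Set[str], extracted: Set[str]) -> float:
    """Jaccard similarity of two feature sets, scaled to 0-100.

    Missing native features and extra user features both lower the score.
    """
    union = expected | extracted
    if not union:
        return 100.0  # nothing expected and nothing extracted: treat as a match
    return 100.0 * len(expected & extracted) / len(union)


def per_phoneme_scores(native: Dict[str, Set[str]],
                       user: Dict[str, Set[str]]) -> Dict[str, float]:
    """Score each phoneme in the native reference; a phoneme absent from the
    user's data scores 0 whenever its native feature set is non-empty."""
    return {p: phoneme_score(f, user.get(p, set())) for p, f in native.items()}
```

A learned model would replace this set arithmetic in practice; the sketch only shows how a per-phoneme score could follow from the feature comparison.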

In various embodiments, the controller 330 may provide comparison data that compares the per-phoneme voice features extracted from the user's voice data with a native speaker's per-phoneme pronunciation characteristics. For example, if the per-phoneme distinctive features extracted from native speech for “B”, “o”, and “y” are “low tongue”, “fricative”, and “Y semivowel”, the controller 330 may provide data indicating whether the per-phoneme distinctive features extracted from the user's voice data correspond to “low tongue”, “fricative”, and “Y semivowel”, or, if distinctive features other than these were extracted, data identifying those features, but is not limited thereto.
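The comparison data described above could be structured as follows: for each phoneme, a per-feature match flag plus a list of any extra features found only in the user's speech. This is a sketch under the assumption that features arrive as label sets; the `comparison_data` name and the `"matches"`/`"extra"` keys are hypothetical.

```python
from typing import Dict, Set


def comparison_data(native: Dict[str, Set[str]],
                    user: Dict[str, Set[str]]) -> Dict[str, Dict[str, object]]:
    """Per phoneme: which native features the user produced, plus any extra features."""
    result: Dict[str, Dict[str, object]] = {}
    for phoneme, expected in native.items():
        extracted = user.get(phoneme, set())
        result[phoneme] = {
            # True if the user's speech shows this native feature, False otherwise
            "matches": {feat: feat in extracted for feat in sorted(expected)},
            # Features extracted from the user's speech that the native reference lacks
            "extra": sorted(extracted - expected),
        }
    return result
```

Keeping matches and extras separate lets the client show both kinds of deviation the text mentions.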

In various embodiments, the controller 330 may provide predetermined feedback data according to the determined evaluation score and the per-phoneme pronunciation characteristics. Here, the feedback data may be data that guides the user's pronunciation toward a native speaker's. Specifically, the feedback data may guide the user so that, for a specific phoneme or word, the pronunciation characteristics extracted from the user's voice data come to match at least one pronunciation characteristic extracted from native speech. For example, the feedback data may be text such as “Don't drag out the sound. Open your mouth wide to make the sound. Keep the tip of your tongue from touching the roof of your mouth.”, but is not limited thereto.
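A minimal sketch of selecting "predetermined" feedback: a lookup table keyed by phoneme and missing native feature, consulted only when the score falls below a pass threshold. The table contents, the 80-point threshold, the feature label "low tongue" for /a/, and all names here are hypothetical; the text does not specify them.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical predetermined feedback: (phoneme, missing native feature) -> guidance text.
FEEDBACK_TABLE: Dict[Tuple[str, str], str] = {
    ("a", "low tongue"): "Lower your tongue to make the sound.",
}
DEFAULT_FEEDBACK = "Watch the native speaker video and try again."
PASS_SCORE = 80.0  # assumed threshold; the text does not specify one


def select_feedback(phoneme: str, score: float,
                    expected: Set[str], extracted: Set[str]) -> List[str]:
    """Pick guidance for one phoneme from its score and its missing native features."""
    if score >= PASS_SCORE:
        return []  # close enough to native pronunciation; no feedback needed
    missing = sorted(expected - extracted)
    messages = [FEEDBACK_TABLE.get((phoneme, feat), DEFAULT_FEEDBACK) for feat in missing]
    # Drop duplicates while keeping order; fall back to the default if nothing was missing
    return list(dict.fromkeys(messages)) or [DEFAULT_FEEDBACK]
```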

The controller 330 may provide the user device 110 with pronunciation evaluation result data including at least one of the determined evaluation score, the per-phoneme pronunciation characteristics extracted from the voice data, and the feedback data.
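Because the result data carries "at least one of" three parts, a wire format would naturally include only the parts that are present. The sketch below assumes a JSON payload with hypothetical field names `score`, `phoneme_features`, and `feedback`; the patent does not specify a serialization format.

```python
import json
from typing import Dict, List, Optional


def build_result_payload(score: Optional[float] = None,
                         phoneme_features: Optional[Dict[str, List[str]]] = None,
                         feedback: Optional[List[str]] = None) -> str:
    """Serialize only the parts that are present; at least one part is required."""
    payload: Dict[str, object] = {}
    if score is not None:
        payload["score"] = score
    if phoneme_features:
        payload["phoneme_features"] = phoneme_features
    if feedback:
        payload["feedback"] = feedback
    if not payload:
        raise ValueError("result data must contain a score, features, or feedback")
    # ensure_ascii=False keeps non-ASCII feedback text (e.g. Korean) readable
    return json.dumps(payload, ensure_ascii=False, sort_keys=True)
```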

In this way, by providing the results of analyzing the pronunciation accuracy of each phoneme in the user's foreign-language speech, the present invention enables the user to carry out more intensive pronunciation training.

Hereinafter, a method of providing a user interface for pronunciation evaluation in the user device 110 will be described with reference to FIG. 4.

FIG. 4 is a schematic flowchart illustrating a method of providing a user interface for pronunciation evaluation in a user device according to an embodiment of the present invention. The operations described below may be performed by the controller 240 of the user device 200.

Referring to FIG. 4, the user device 200 acquires the user's voice data for a specific phoneme or word (S400) and transmits the acquired voice data to the service providing server 120 (S410). For example, the user device 200 may use a microphone to receive voice data of the user pronouncing the phoneme or word, and may transmit that voice data to the service providing server 120 to request pronunciation evaluation.

The user device 200 receives, from the service providing server 120, pronunciation evaluation result data obtained by evaluating the user's pronunciation of the phoneme or word based on the user's voice data (S420), and displays an interface screen representing the received data (S430). For example, the pronunciation evaluation result data may include at least one of: an evaluation score that the service providing server 120 determines, using the pronunciation evaluation model, from the similarity between the user's and a native speaker's pronunciation of the phoneme or word; the per-phoneme pronunciation characteristics extracted from the user's voice data; and feedback data guiding the user to correct their pronunciation toward native pronunciation.
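Steps S400 to S430 can be sketched as a small function with the device-specific I/O injected as callables, so recording, networking, and rendering can be swapped for real implementations or test stubs. All names here are hypothetical; the patent does not define a client API.

```python
from typing import Callable, Dict


def run_client_flow(record: Callable[[], bytes],
                    send_to_server: Callable[[bytes], Dict],
                    render: Callable[[Dict], None]) -> Dict:
    """S400-S430 on the user device, with I/O supplied by the caller."""
    voice = record()                # S400: acquire voice data (e.g. from a microphone)
    result = send_to_server(voice)  # S410 + S420: transmit voice, receive result data
    render(result)                  # S430: display the interface screen
    return result
```

Injecting the three steps keeps the flow itself independent of whether the device is a PC or a mobile browser.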

Accordingly, the interface screen representing the pronunciation evaluation result data may include graphic objects or display areas representing the evaluation score, the per-phoneme pronunciation characteristics, and the feedback data.

Hereinafter, a method for pronunciation evaluation between the user device 110 and the service providing server 120 will be described with reference to FIG. 5.

FIG. 5 is a schematic flowchart illustrating a method for pronunciation evaluation between a user device and a service providing server according to an embodiment of the present invention.

Referring to FIG. 5, the user device 110 acquires the user's voice data for a specific phoneme or word (S500) and transmits the acquired voice data to the service providing server 120 (S510).

The service providing server 120 evaluates the user's pronunciation using the pronunciation evaluation model trained to evaluate pronunciation from voice data (S520), and transmits the pronunciation evaluation result data to the user device 110 (S530).

Specifically, the service providing server 120 may use the pronunciation evaluation model to extract per-phoneme pronunciation characteristics from the user's voice data and determine the similarity between them and a native speaker's per-phoneme pronunciation characteristics. The service providing server 120 may then determine an evaluation score corresponding to that similarity, and transmit to the user device 110 pronunciation evaluation result data including at least one of the determined evaluation score, the extracted per-phoneme pronunciation characteristics, and the feedback data.

The user device 110 displays an interface screen representing the pronunciation evaluation result data (S540).

As a result, the present invention lets users practice foreign-language pronunciation without constraints of time or place, and can reduce the burden of pronunciation coaching on educators such as foreign-language instructors.

Hereinafter, various interface screens related to pronunciation evaluation will be described with reference to FIGS. 6A, 6B, 6C, 6D, 6E, 6F, and 6G.

FIGS. 6A, 6B, 6C, 6D, 6E, 6F, and 6G are exemplary views of various interface screens related to pronunciation evaluation according to an embodiment of the present invention. These interface screens may be displayed through the display unit 220 of the user device 200. The presented embodiment describes the case where the user device 200 is a PC and the display unit 220 is a monitor.

Referring to FIG. 6A, the user device 200 may display an interface screen 600 for evaluating the user's pronunciation of a specific phoneme or word. The interface screen 600 may include a first area 602 for choosing whether a phoneme or a word is to be collected from the user for pronunciation evaluation, a second area 604 showing the available items of the selected type, and a third area 606 showing various data related to the selected phoneme or word.

When the “phoneme” icon for evaluating the pronunciation of a specific phoneme is selected in the first area 602, the user device 200 may display, in the second area 604, at least one graphic object representing the phonemes available for evaluation.

When the graphic object 608 corresponding to the “/a/” sound is selected from among the graphic objects in the second area 604, the user device 200 may display, in the third area 606, a fourth area 610 showing a video for learning a native speaker's pronunciation of “/a/” and a fifth area 612 for acquiring the user's voice data for “/a/”. Here, the video may show a native speaker's pronunciation and mouth shape for “/a/”. The fifth area 612 may also include a recording icon 614 for acquiring (recording) the user's voice data.

When the recording icon 614 is selected and the user's voice data is input through a microphone that is built into the user device 200 or connected to it as an external device, the user device 200 may display, in the fifth area 612, a graphic object 616 representing the voice data being input through the microphone, as shown in FIG. 6B.

When voice input is complete, the user device 200 may transmit the voice data to the service providing server 120 to request evaluation of the “/a/” pronunciation.

When pronunciation evaluation result data is received from the service providing server 120, the user device 200 may display graphic objects 618, 620, 622, 624, and 626 representing the result data in the third area 606, as shown in FIG. 6C.

Referring to FIG. 6C, the graphic objects 618, 620, 622, 624, and 626 may include: a first graphic object 618 indicating the user's overall pronunciation evaluation result for “/a/”; a second graphic object 620 indicating at least one pronunciation characteristic extracted from native speech for “/a/”; a third graphic object 622 indicating at least one pronunciation characteristic extracted from the user's speech for “/a/”; a fourth graphic object 624 indicating the similarity between the characteristics extracted from native speech and those extracted from the user's speech; and a fifth graphic object 626 for guiding the user to correct their pronunciation toward native pronunciation.

Here, the first graphic object 618 may include an image expressing the user's pronunciation evaluation score, determined by the service providing server 120 using the pronunciation evaluation model, as some or all of five stars, and text expressing the score as a word such as “bad”, “good”, or “excellent”.
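The five-star image and word label could be derived from a 0-100 score as below. The 20-points-per-star scale and the 50/80 label cut-offs are illustrative assumptions; the text only names the star display and the “bad”, “good”, “excellent” labels.

```python
from typing import Tuple


def score_to_display(score: float) -> Tuple[int, str]:
    """Map a 0-100 score to (filled stars out of 5, word label).

    20 points per star and the 50/80 label cut-offs are illustrative assumptions.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    stars = int(score // 20)
    if score >= 80:
        label = "excellent"
    elif score >= 50:
        label = "good"
    else:
        label = "bad"
    return stars, label


def star_string(stars: int) -> str:
    """Render filled and empty stars, e.g. 3 -> '★★★☆☆'."""
    return "★" * stars + "☆" * (5 - stars)
```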

The second graphic object 620 may be an icon or image representing at least one pronunciation characteristic extracted from native-speaker voice data for “/a/”.

The third graphic object 622 may be an icon or image representing at least one pronunciation characteristic extracted from the user's voice data for “/a/”.

The fourth graphic object 624 may be an icon or image that indicates, with O or X, whether the at least one pronunciation characteristic extracted from native-speaker voice data matches the at least one pronunciation characteristic extracted from the user's voice data.
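A text stand-in for the O/X indicator: one row per native feature, marked O when the same feature was extracted from the user's speech and X otherwise. The feature labels used in the usage example are hypothetical, and a real screen would draw icons rather than text.

```python
from typing import List, Set


def match_rows(native: Set[str], user: Set[str]) -> List[str]:
    """One row per native feature, marked O if the user's speech shows it, X otherwise."""
    rows = []
    for feature in sorted(native):  # sorted for a stable on-screen order
        mark = "O" if feature in user else "X"
        rows.append(f"{feature}: {mark}")
    return rows
```

For example, `match_rows({"low tongue", "unrounded lips"}, {"unrounded lips"})` returns `["low tongue: X", "unrounded lips: O"]`.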

The fifth graphic object 626 may be text presenting feedback data that guides the user to correct their pronunciation toward native pronunciation when the at least one pronunciation characteristic extracted from native-speaker voice data does not match that extracted from the user's voice data. For example, it may be text such as “Lower your tongue to make the sound.”

Referring to FIG. 6D, when the “word” icon 628 for evaluating the pronunciation of a specific word is selected in the first area 602, the user device 200 may display, in the third area 606, at least one graphic object 630 representing the words available for evaluation in connection with the “/a/” sound.

For example, when the graphic object 632 for evaluating the pronunciation of the word “fox” is selected from among the graphic objects 630, the user device 200 may display, in the third area 606, a sixth area 634 showing a video for learning a native speaker's pronunciation of “fox” and a seventh area 636 for acquiring the user's voice data for “fox”. Here, the seventh area 636 may include a recording icon 638 for recording the user's voice data.

When the recording icon 638 is selected and the user's voice data is input through the microphone, the user device 200 may display, in the seventh area 636, a graphic object 640 representing the voice data being input through the microphone, as shown in FIG. 6E.

When voice input is complete, the user device 200 may transmit the voice data to the service providing server 120 to request evaluation of the user's pronunciation of the word “fox”.

When pronunciation evaluation result data is received from the service providing server 120, the user device 200 may display graphic objects 642, 644, 646, 648, and 650 representing the result data in the third area 606, as shown in FIG. 6F.

Referring to FIG. 6F, the graphic objects 642, 644, 646, 648, and 650 may include: a first graphic object 642 indicating the evaluation score for each of “f”, “a”, “k”, and “s”, the phonemes of the word “fox”; a second graphic object 644 indicating the user's overall pronunciation evaluation result for “fox”; a third graphic object 646 indicating the per-phoneme pronunciation characteristics extracted from native speech for “fox”; a fourth graphic object 648 indicating the per-phoneme pronunciation characteristics extracted from the user's speech for “fox”; and a fifth graphic object 650 indicating the similarity between the per-phoneme characteristics extracted from native speech and those extracted from the user's speech.

In various embodiments, the graphic objects representing the pronunciation evaluation result data may further include a graphic object for guiding the user to correct their pronunciation toward native pronunciation.

Here, the first graphic object 642 may be text showing the pronunciation evaluation score that the service providing server 120 determines, using the pronunciation evaluation model, for each of the phonemes “f”, “a”, “k”, and “s” in “/faks/”, the pronunciation of “fox”. For example, it may show per-phoneme scores of 80 for “f”, 77 for “a”, 44 for “k”, and 96 for “s”.

The second graphic object 644 may include an image expressing the user's pronunciation evaluation score, determined by the service providing server 120 using the pronunciation evaluation model, as some or all of five stars, and text expressing the score as a word such as “bad”, “good”, or “excellent”.

The third graphic object 646 may be an icon or image representing the per-phoneme pronunciation characteristics extracted from native-speaker voice data for the word “fox”.

The fourth graphic object 648 may be an icon or image representing the per-phoneme pronunciation characteristics extracted from the user's voice data for the word “fox”.

The fifth graphic object 650 may be an icon or image that indicates, with O or X, whether the at least one pronunciation characteristic extracted from native-speaker voice data matches the at least one pronunciation characteristic extracted from the user's voice data.

Referring to FIG. 6G, the user device 200 may display an interface screen 652 showing the overall evaluation results for the user's pronunciation of each phoneme.

The interface screen 652 may include a first graphic object 654 indicating how many of the phonemes the user asked to have evaluated are highly similar to native pronunciation, a second graphic object 656 indicating the sounds that need more practice, and a third graphic object 658 graphing the evaluation score for each sound. This lets the user identify which of several phonemes they still pronounce poorly.
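The counts behind the summary screen can be sketched from per-phoneme scores alone. The 80-point cut-off for "high similarity" is an assumption, and `summarize` is a hypothetical name; sorting weak phonemes lowest-first is one reasonable way to order the practice list.

```python
from typing import Dict, List, Tuple

HIGH_SIMILARITY = 80.0  # assumed cut-off for "high similarity to native pronunciation"


def summarize(scores: Dict[str, float],
              threshold: float = HIGH_SIMILARITY) -> Tuple[int, List[str]]:
    """Return (number of well-pronounced phonemes, phonemes needing practice, weakest first)."""
    good = [p for p, s in scores.items() if s >= threshold]
    weak = sorted((p for p, s in scores.items() if s < threshold), key=lambda p: scores[p])
    return len(good), weak
```

With the “fox” scores from FIG. 6F, `summarize({"f": 80, "a": 77, "k": 44, "s": 96})` returns `(2, ["k", "a"])`, flagging “k” as the sound to practice first.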

Hereinafter, an embodiment in which the interface screen for pronunciation evaluation is implemented as a mobile web screen will be described with reference to FIG. 7.

FIG. 7 is an exemplary view of a mobile web screen for evaluating a user's pronunciation according to an embodiment of the present invention. The presented embodiment describes the case where the user device 200 is a mobile device.

Referring to FIG. 7, the user device 200 may display an interface screen 700 for acquiring voice data for a specific phoneme or word to be evaluated and for evaluating the user's pronunciation of that phoneme or word based on the acquired voice data.

The interface screen 700 may include a first area 710 for selecting the phoneme or word to be evaluated, a second area 720 showing video data for learning a native speaker's pronunciation of that phoneme or word, and a third area 730 for acquiring the user's voice data.

When the recording icon 732 for recording the user's voice is selected, the user device 200 may acquire the user's voice data through its built-in microphone. The acquired voice data may be displayed in the third area 730.

The user device 200 may transmit the acquired voice data to the service providing server 120 and receive pronunciation evaluation result data from it. The received result may be displayed on the interface screen 700 as a first graphic object 740 indicating an evaluation score that quantifies the user's pronunciation of the phoneme or word, a second graphic object 750 indicating at least one pronunciation characteristic extracted from native speech for the phoneme or word, and a third graphic object 760 indicating whether the at least one pronunciation characteristic extracted from the user's speech matches the at least one pronunciation characteristic extracted from native speech.

The configuration of the interface screens described in the presented embodiment is not limited to the above, and the objects making up each interface screen may be arranged in various ways.

As described above, the present invention enables intensive training on the sounds the user pronounces poorly, thereby improving the user's foreign-language learning ability.

In addition, because the present invention supports phoneme-by-phoneme learning, it allows detailed or intensive training on the specific phonemes that need pronunciation correction.

The apparatus and method according to an embodiment of the present invention may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the computer-readable medium may be specially designed and configured for the present invention, or they may be known to and usable by those skilled in the field of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that a computer can execute using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

Although embodiments of the present invention have been described in detail with reference to the accompanying drawings, the present invention is not necessarily limited to these embodiments, and various modifications may be made without departing from its technical spirit. Accordingly, the embodiments disclosed herein are intended to explain, not to limit, the technical spirit of the present invention, and the scope of that technical spirit is not limited by these embodiments. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of protection of the present invention should be construed according to the following claims, and all technical ideas within the scope of their equivalents should be construed as falling within the scope of the present invention.

100: user interface providing system for pronunciation evaluation
110, 200: user device
120, 300: service providing server

Claims

An apparatus for providing a user interface for pronunciation evaluation, the apparatus comprising:
a communication unit configured to transmit and receive data;
a display unit configured to display data; and
a control unit connected to the communication unit and the display unit,
wherein the control unit is configured to:
acquire a user's voice data for a specific phoneme or a specific word;
transmit the acquired voice data to a service providing server for evaluating the user's pronunciation;
receive, from the service providing server, pronunciation evaluation result data evaluating the user's pronunciation for each pronunciation characteristic, wherein each pronunciation characteristic corresponds to at least one distinctive feature detected for each phoneme; and
display, through the display unit, an interface screen indicating the received pronunciation evaluation result data,
wherein the distinctive feature
includes at least one of high tongue, low tongue, front tongue, back tongue, lip rounding, voiced, nasal, plosive, fricative, labial, interdental, alveolar, liquid, affricate, R-colored vowel, Y-semivowel, W-semivowel, closed diphthong, open diphthong, central diphthong, and velar consonant,
wherein the pronunciation evaluation result data
is determined by the service providing server, for each pronunciation characteristic corresponding to the distinctive features, using a pronunciation evaluation model pre-trained to evaluate the user's pronunciation based on the user's voice data, and
includes at least one of an evaluation score that scores the result of evaluating the user's pronunciation, the pronunciation characteristics extracted from the user's voice data, and feedback data for guiding the user's pronunciation toward the native speaker's pronunciation, together with video data capturing the native speaker's pronunciation and mouth shape, and
wherein the video data
includes a graphic object representing a combination of at least two distinctive features of any one phoneme in the native speaker's pronunciation of the specific phoneme or the specific word.
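The last element of the claim above is a graphic object that combines at least two distinctive features of one phoneme. It can be illustrated with a small lookup table. This is a sketch only: the feature subset, the simplified English phoneme bundles, and the `feature_combination_label` helper are assumptions for illustration, and voicelessness is modeled as the absence of `VOICED`.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Feature(Enum):
    """A subset of the distinctive features listed in the claims."""
    VOICED = "voiced"
    NASAL = "nasal"
    PLOSIVE = "plosive"
    FRICATIVE = "fricative"
    LABIAL = "labial"
    INTERDENTAL = "interdental"
    ALVEOLAR = "alveolar"
    VELAR = "velar"


# Simplified, illustrative feature bundles for a few English consonants.
# A consonant without VOICED is treated as voiceless.
PHONEME_FEATURES: Dict[str, FrozenSet[Feature]] = {
    "p": frozenset({Feature.LABIAL, Feature.PLOSIVE}),
    "b": frozenset({Feature.VOICED, Feature.LABIAL, Feature.PLOSIVE}),
    "m": frozenset({Feature.VOICED, Feature.LABIAL, Feature.NASAL}),
    "t": frozenset({Feature.ALVEOLAR, Feature.PLOSIVE}),
    "g": frozenset({Feature.VOICED, Feature.VELAR, Feature.PLOSIVE}),
    "th": frozenset({Feature.INTERDENTAL, Feature.FRICATIVE}),  # as in "think"
}


def feature_combination_label(phoneme: str) -> str:
    """Text for a graphic object combining two or more features of one phoneme."""
    features = PHONEME_FEATURES[phoneme]
    if len(features) < 2:
        raise ValueError(f"/{phoneme}/ has fewer than two features in this table")
    # Sort for a stable, reproducible label order.
    return " + ".join(sorted(feature.value for feature in features))


print(feature_combination_label("b"))  # labial + plosive + voiced
```

Storing each phoneme as a set of features, rather than a single label, allows the same table to drive both the combined graphic object and the per-feature comparison with the user's voice.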

delete

The apparatus for providing a user interface for pronunciation evaluation of claim 1, wherein the pronunciation evaluation model
is a model pre-trained using the native speaker's pronunciation of the specific phoneme or the specific word as the correct answer.

The apparatus for providing a user interface for pronunciation evaluation of claim 1, wherein the evaluation score
is data quantifying the degree of similarity between the user's pronunciation characteristics and the native speaker's pronunciation characteristics, based on at least one pronunciation characteristic extracted from the voice data by the service providing server using the pronunciation evaluation model.

The apparatus for providing a user interface for pronunciation evaluation of claim 1, wherein the degree of similarity between the user's pronunciation characteristics and the native speaker's pronunciation characteristics
is determined according to whether the user's pronunciation characteristics match the native speaker's pronunciation characteristics.

The apparatus for providing a user interface for pronunciation evaluation of claim 1, wherein the feedback data,
when the user's pronunciation characteristics do not match the native speaker's pronunciation characteristics, is provided to guide the user so that pronunciation characteristics matching those of the native speaker are extracted from the user's voice data.
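The dependent claims above describe the evaluation score as a quantified similarity that depends on whether the user's and the native speaker's pronunciation characteristics match, with feedback issued on a mismatch. One way to realize this, assumed here rather than taken from the patent, is a Jaccard ratio over feature sets combined with per-feature guidance. The `GUIDANCE` texts and the function names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical coaching text per distinctive feature (not from the patent).
GUIDANCE: Dict[str, str] = {
    "voiced": "Let your vocal cords vibrate while making the sound.",
    "labial": "Use your lips to shape the sound.",
    "plosive": "Block the airflow completely, then release it in a short burst.",
    "fricative": "Leave a narrow gap so the air makes a hissing friction noise.",
    "interdental": "Place the tip of your tongue between your upper and lower teeth.",
}


def similarity_score(user: Set[str], native: Set[str]) -> float:
    """Jaccard similarity of the two feature sets, scaled to 0-100.

    Both missing native features and extra user features lower the score.
    """
    union = user | native
    if not union:
        return 100.0
    return 100.0 * len(user & native) / len(union)


def feedback(user: Set[str], native: Set[str]) -> List[str]:
    """Guidance lines for each mismatch between the user's and native features."""
    messages = []
    for feature in sorted(native - user):  # expected but not detected
        messages.append(GUIDANCE.get(feature, f"Try to add the '{feature}' quality."))
    for feature in sorted(user - native):  # detected but not expected
        messages.append(f"Avoid the '{feature}' quality for this sound.")
    return messages


user_b = {"labial", "plosive"}              # e.g. /b/ produced without voicing
native_b = {"voiced", "labial", "plosive"}
print(round(similarity_score(user_b, native_b), 1))  # 66.7
print(feedback(user_b, native_b))
```

Jaccard similarity is only one option. A weighted per-feature score would fit the claim language equally well, for example if some features matter more than others for intelligibility.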

delete

A method for providing a user interface for pronunciation evaluation performed by a control unit of a user interface providing device for pronunciation evaluation, the method comprising:
obtaining voice data of a user for a specific phoneme or a specific word;
transmitting the acquired voice data to a service providing server for evaluating the user's pronunciation;
receiving, from the service providing server, pronunciation evaluation result data evaluating the user's pronunciation for each pronunciation characteristic, wherein each pronunciation characteristic corresponds to at least one distinctive feature detected for each phoneme; and
displaying an interface screen indicating the received pronunciation evaluation result data,
wherein the distinctive feature
includes at least one of high tongue, low tongue, front tongue, back tongue, lip rounding, voiced, nasal, plosive, fricative, labial, interdental, alveolar, liquid, affricate, R-colored vowel, Y-semivowel, W-semivowel, closed diphthong, open diphthong, central diphthong, and velar consonant,
wherein the pronunciation evaluation result data
is determined by the service providing server, for each pronunciation characteristic corresponding to the distinctive features, using a pronunciation evaluation model pre-trained to evaluate the user's pronunciation based on the user's voice data, and
includes at least one of an evaluation score that scores the result of evaluating the user's pronunciation, the pronunciation characteristics extracted from the user's voice data, and feedback data for guiding the user's pronunciation toward the native speaker's pronunciation, together with video data capturing the native speaker's pronunciation and mouth shape, and
wherein the video data
includes a graphic object representing a combination of at least two distinctive features of any one phoneme in the native speaker's pronunciation of the specific phoneme or the specific word.

delete

The method for providing a user interface for pronunciation evaluation of claim 8, wherein the pronunciation evaluation model
is a model pre-trained using the native speaker's pronunciation of the specific phoneme or the specific word as the correct answer.

The method for providing a user interface for pronunciation evaluation of claim 8, wherein the evaluation score
is data quantifying the degree of similarity between the user's pronunciation characteristics and the native speaker's pronunciation characteristics, based on at least one pronunciation characteristic extracted from the voice data by the service providing server using the pronunciation evaluation model.

The method for providing a user interface for pronunciation evaluation of claim 8, wherein the degree of similarity between the user's pronunciation characteristics and the native speaker's pronunciation characteristics
is determined according to whether the user's pronunciation characteristics match the native speaker's pronunciation characteristics.

The method for providing a user interface for pronunciation evaluation of claim 8, wherein the feedback data,
when the user's pronunciation characteristics do not match the native speaker's pronunciation characteristics, is provided to guide the user so that pronunciation characteristics matching those of the native speaker are extracted from the user's voice data.