KR20200042137A

KR20200042137A - Electronic device providing variation utterance text and operating method thereof

Info

Publication number: KR20200042137A
Application number: KR1020180122313A
Authority: KR
Inventors: 박상민; 송가진
Original assignee: 삼성전자주식회사
Priority date: 2018-10-15
Filing date: 2018-10-15
Publication date: 2020-04-23
Also published as: US20220051661A1; WO2020080771A1

Abstract

An operation method of an electronic device communicating with a server to generate a variation utterance text set comprises the operations of: receiving a domain and a category; transmitting the domain and the category to the server; receiving a variation utterance text corresponding to the domain and the category from the server; and displaying the variation utterance text. The variation utterance text is generated through a generative model or a transfer learning model based on user utterance data previously stored in the server. The user utterance data is obtained when the server converts voice data, transmitted to it by an external electronic device that receives user utterances, into text and stores the text. In addition, various other embodiments identified through the specification are possible.

Description

ELECTRONIC DEVICE PROVIDING VARIATION UTTERANCE TEXT AND OPERATING METHOD THEREOF

Embodiments disclosed in this document relate to a technique for providing a variation utterance text corresponding to a training utterance text.

In addition to traditional input methods using a keyboard or mouse, recent electronic devices can support various input methods such as voice input. For example, electronic devices such as smartphones and tablets may recognize a user's voice input while a voice recognition service is running, and then perform an operation corresponding to the voice input or provide a search result.

Recently, speech recognition services have been developing on the basis of natural language processing technology. Natural language processing is a technology that identifies the intent of a user utterance and provides the user with a result that matches that intent.

A server providing a speech recognition service is trained on a training utterance text set written manually by a developer. The developer creates a representative utterance and then writes applied utterances for that representative utterance to build the training utterance text set. The training effect of the training utterance text set therefore depends on the developer's ability.

Various embodiments of the present invention propose a method of generating, within a server, an additional variation utterance text set for training a speech recognition service, based on a training utterance text set or on actual user utterances.

In addition, various embodiments of the present invention propose a method of providing the generated variation utterance text set to a developer or a user.

An operation method of an electronic device communicating with a server according to an embodiment disclosed in this document includes an operation of receiving a domain and a category, an operation of transmitting the domain and the category to the server, an operation of receiving a variation utterance text corresponding to the domain and the category from the server, and an operation of displaying the variation utterance text. The variation utterance text is generated through a generative model or a transfer learning model based on user utterance data previously stored in the server. The user utterance data is obtained when the server converts voice data, transmitted to the server by an external electronic device that receives user utterances, into text and stores the text.
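The four operations of this method (receive, transmit, receive, display) can be sketched as a small client routine. This is only an illustrative sketch: `VariationRequest`, `FakeServer`, `run_client`, and the sample domain and category values are hypothetical names that do not appear in the specification, and the stand-in server simply looks up stored texts instead of running a generative or transfer learning model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariationRequest:
    """Hypothetical request payload: the domain and category entered on the device."""
    domain: str
    category: str


class FakeServer:
    """Stand-in for the server. A real server would generate the texts with a
    generative or transfer learning model; this one only returns stored lists."""

    def __init__(self, stored):
        # stored maps (domain, category) -> list of variation utterance texts
        self._stored = stored

    def variation_texts(self, request):
        return list(self._stored.get((request.domain, request.category), []))


def run_client(server, domain, category, display=print):
    request = VariationRequest(domain, category)  # receive a domain and a category
    texts = server.variation_texts(request)       # transmit them and receive the texts
    for text in texts:                            # display each variation utterance text
        display(text)
    return texts
```

Passing `display` as a parameter keeps the sketch independent of any particular user interface.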

In addition, an operation method of an electronic device communicating with a server according to an embodiment disclosed in this document includes an operation of receiving a domain and a category, an operation of receiving a training utterance text set corresponding to the domain and the category, an operation of transmitting the domain, the category, and the training utterance text set to the server, an operation of receiving a variation utterance text set corresponding to the training utterance text set from the server, and an operation of displaying the variation utterance text set. The variation utterance text set is generated through a generative model or a transfer learning model based on user utterance data previously stored in the server. The user utterance data is obtained when the server converts voice data, transmitted to the server by an external electronic device that receives user utterances, into text and stores the text.

In addition, an operation method of an electronic device communicating with a server according to an embodiment disclosed in this document includes an operation of receiving a domain and a category, an operation of receiving a training utterance text set corresponding to the domain and the category, an operation of transmitting the domain, the category, and the training utterance text set to the server, an operation of receiving a variation utterance text set corresponding to the training utterance text set from the server, and an operation of displaying, based on the variation utterance text set, a plurality of second parameters corresponding to a first parameter included in the training utterance text set.
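One way to picture the last operation is to treat the first parameter as a slot in a training utterance and to collect the values that fill the same slot in the variation utterances. The specification does not state how the second parameters are derived, so the string alignment below is purely an assumption, and `second_parameters` and the sample sentences are hypothetical.

```python
def second_parameters(training_text, first_param, variation_texts):
    """Collect alternative slot values (second parameters) for first_param.

    Assumption: a variation utterance is usable only if it shares the text
    before and after the first parameter with the training utterance.
    """
    prefix, found, suffix = training_text.partition(first_param)
    if not found:
        return []
    candidates = []
    for text in variation_texts:
        long_enough = len(text) >= len(prefix) + len(suffix)
        if long_enough and text.startswith(prefix) and text.endswith(suffix):
            value = text[len(prefix):len(text) - len(suffix)]
            # Skip empty values, the first parameter itself, and duplicates.
            if value and value != first_param and value not in candidates:
                candidates.append(value)
    return candidates
```

For the training utterance "Order a pizza from the store" with first parameter "pizza", variation utterances such as "Order a burger from the store" would contribute "burger" as a second parameter.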

According to the embodiments disclosed in this document, a variation utterance text set may be generated based on user utterance data accumulated in the past.

According to the embodiments disclosed in this document, a variation utterance text set may be generated based on a generative model or a transfer learning model.

According to the embodiments disclosed in this document, a variation utterance text set may be generated based on user characteristics.

According to the embodiments disclosed in this document, a natural language understanding module of the server may be trained based on the generated variation utterance text set.

According to the embodiments disclosed in this document, the performance of the speech recognition service may be improved by recommending the generated variation utterance text set to a developer or a user.

In addition, various effects that are directly or indirectly identified through this document may be provided.

FIG. 1 is a block diagram illustrating an integrated intelligence system according to an embodiment.
FIG. 2 is a diagram illustrating a form in which relationship information between concepts and actions is stored in a database, according to an embodiment.
FIG. 3 is a diagram illustrating a user terminal displaying a screen for processing a voice input received through an intelligent app, according to an embodiment.
FIG. 4 is a block diagram illustrating an intelligent server that generates a variation utterance text set according to an embodiment of the present invention.
FIG. 5 is a block diagram illustrating an embodiment of the parameter collection module of FIG. 4.
FIG. 6 is a flowchart illustrating an operation method of an intelligent server in a natural language understanding training mode according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating an embodiment of a method of generating a variation utterance text set in operation 650 of FIG. 6.
FIG. 8 is a flowchart illustrating another embodiment of a method of generating a variation utterance text set in operation 650 of FIG. 6.
FIG. 9 is a flowchart illustrating an operation method of an intelligent server in an utterance recommendation mode according to an embodiment of the present invention.
FIGS. 10A to 10C are diagrams illustrating a method of recommending a variation utterance text when a training utterance text is input through an utterance input device, according to an embodiment of the present invention.
FIG. 11 is a diagram illustrating a method of recommending a variation utterance text to a user when the user speaks, according to an embodiment of the present invention.
FIG. 12 is a block diagram of an electronic device in a network environment according to various embodiments.
In connection with the description of the drawings, the same or similar reference numerals may be used for the same or similar components.

Hereinafter, various embodiments of the present invention are described with reference to the accompanying drawings. However, this is not intended to limit the present invention to specific embodiments, and the description should be understood to cover various modifications, equivalents, and/or alternatives of the embodiments of the present invention.

Before describing an embodiment of the present invention, an integrated intelligence system to which an embodiment of the present invention can be applied is described.

FIG. 1 is a block diagram illustrating an integrated intelligence system according to an embodiment.

Referring to FIG. 1, the integrated intelligence system according to an embodiment may include a user terminal 100, an intelligent server 200, and a service server 300.

The user terminal 100 according to an embodiment may be a terminal device (or electronic device) that can connect to the Internet, for example, a mobile phone, a smartphone, a personal digital assistant (PDA), a laptop computer, a TV, a home appliance, a wearable device, an HMD, or a smart speaker.

According to the illustrated embodiment, the user terminal 100 may include a communication interface 110, a microphone 120, a speaker 130, a display 140, a memory 150, or a processor 160. The components listed above may be operatively or electrically connected to each other.

The communication interface 110 according to an embodiment may be configured to connect to an external device and to transmit and receive data. The microphone 120 according to an embodiment may receive sound (e.g., a user utterance) and convert it into an electrical signal. The speaker 130 according to an embodiment may output an electrical signal as sound (e.g., voice). The display 140 according to an embodiment may be configured to display an image or video. The display 140 may also display a graphical user interface (GUI) of an app (or application program) being executed.

The memory 150 according to an embodiment may store a client module 151, a software development kit (SDK) 153, and a plurality of apps 155. The client module 151 and the SDK 153 may constitute a framework (or solution program) for performing general-purpose functions. In addition, the client module 151 or the SDK 153 may constitute a framework for processing voice input.

The plurality of apps 155 according to an embodiment may be programs for performing designated functions. According to an embodiment, the plurality of apps 155 may include a first app 155_1 and a second app 155_3. According to an embodiment, each of the plurality of apps 155 may include a plurality of operations for performing a designated function. For example, the plurality of apps 155 may include an alarm app, a message app, and/or a schedule app. According to an embodiment, the plurality of apps 155 may be executed by the processor 160 to sequentially execute at least some of the plurality of operations.

The processor 160 according to an embodiment may control the overall operation of the user terminal 100. For example, the processor 160 may be electrically connected to the communication interface 110, the microphone 120, the speaker 130, and the display 140 to perform designated operations.

The processor 160 according to an embodiment may also execute a program stored in the memory 150 to perform a designated function. For example, the processor 160 may execute at least one of the client module 151 or the SDK 153 to perform the following operations for processing voice input. The processor 160 may, for example, control the operations of the plurality of apps 155 through the SDK 153. The operations described below as operations of the client module 151 or the SDK 153 may be operations performed through execution by the processor 160.

The client module 151 according to an embodiment may receive a voice input. For example, the client module 151 may receive a voice signal corresponding to a user utterance detected through the microphone 120. The client module 151 may transmit the received voice input to the intelligent server 200. The client module 151 may transmit state information of the user terminal 100 to the intelligent server 200 together with the received voice input. The state information may be, for example, execution state information of an app.

The client module 151 according to an embodiment may receive a result corresponding to the received voice input. For example, when the intelligent server 200 is able to calculate a result corresponding to the received voice input, the client module 151 may receive that result. The client module 151 may display the received result on the display 140.

The client module 151 according to an embodiment may receive a plan corresponding to the received voice input. The client module 151 may display, on the display 140, the results of executing a plurality of operations of an app according to the plan. For example, the client module 151 may sequentially display the execution results of the plurality of operations on the display. As another example, the user terminal 100 may display only some of the results of executing the plurality of operations (e.g., the result of the last operation) on the display.
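The two display behaviors described here (all results in sequence, or only part of them such as the last one) can be sketched as a small selection function. The function name `results_to_display` and the `mode` values are hypothetical and are not taken from the specification.

```python
def results_to_display(step_results, mode="all"):
    """Pick which execution results of a plan's operations are shown.

    mode="all"  -> every result, in execution order
    mode="last" -> only the result of the last operation
    """
    if mode == "all":
        return list(step_results)
    if mode == "last":
        # Slicing keeps this safe when no operation produced a result.
        return list(step_results[-1:])
    raise ValueError(f"unknown mode: {mode}")
```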

According to an embodiment, the client module 151 may receive, from the intelligent server 200, a request for information needed to calculate a result corresponding to the voice input. According to an embodiment, the client module 151 may transmit the needed information to the intelligent server 200 in response to the request.

The client module 151 according to an embodiment may transmit, to the intelligent server 200, result information obtained by executing the plurality of operations according to the plan. The intelligent server 200 may use the result information to confirm that the received voice input was processed correctly.

The client module 151 according to an embodiment may include a speech recognition module. According to an embodiment, the client module 151 may recognize, through the speech recognition module, voice inputs that perform a limited set of functions. For example, in response to a designated input (e.g., "Wake up!"), the client module 151 may run an intelligent app that processes voice input for performing organically linked operations.

The intelligent server 200 according to an embodiment may receive information related to a user voice input from the user terminal 100 through a communication network. According to an embodiment, the intelligent server 200 may convert data related to the received voice input into text data. According to an embodiment, the intelligent server 200 may generate, based on the text data, a plan for performing a task corresponding to the user voice input.

According to an embodiment, the plan may be generated by an artificial intelligence (AI) system. The AI system may be a rule-based system or a neural network-based system (e.g., a feedforward neural network (FNN) or a recurrent neural network (RNN)). Alternatively, it may be a combination of the above or a different AI system. According to an embodiment, the plan may be selected from a set of predefined plans or may be generated in real time in response to a user request. For example, the AI system may select at least one plan from among a plurality of predefined plans.
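The two ways of obtaining a plan (selecting a predefined one, or generating one in real time) can be sketched as follows. The specification does not say which path is tried first, so the "lookup, then generate" order is an assumption, and `choose_plan`, `predefined_plans`, and `generate_plan` are hypothetical names.

```python
def choose_plan(intent, predefined_plans, generate_plan):
    """Return a predefined plan for the intent if one exists,
    otherwise generate a plan on demand (assumed fallback order)."""
    plan = predefined_plans.get(intent)
    if plan is not None:
        return plan
    return generate_plan(intent)
```

Here a plan is represented as a plain list of operation names; a caller could pass any callable as `generate_plan`, for example a rule-based or neural planner.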

The intelligent server 200 according to an embodiment may transmit a result according to the generated plan to the user terminal 100, or may transmit the generated plan itself to the user terminal 100. According to an embodiment, the user terminal 100 may display the result according to the plan on the display. According to an embodiment, the user terminal 100 may display the result of executing the operations according to the plan on the display.

The intelligent server 200 according to an embodiment may include a front end 210, a natural language platform 220, a capsule database (capsule DB) 230, an execution engine 240, an end user interface 250, a management platform 260, a big data platform 270, or an analytic platform 280.

The front end 210 according to an embodiment may receive the voice input transmitted from the user terminal 100. The front end 210 may transmit a response corresponding to the voice input.

According to an embodiment, the natural language platform 220 may include an automatic speech recognition module (ASR module) 221, a natural language understanding module (NLU module) 223, a planner module 225, a natural language generator module (NLG module) 227, or a text-to-speech module (TTS module) 229.

The automatic speech recognition module 221 according to an embodiment may convert the voice input received from the user terminal 100 into text data. The natural language understanding module 223 according to an embodiment may identify the user's intent using the text data of the voice input. For example, the natural language understanding module 223 may identify the user's intent by performing syntactic analysis or semantic analysis. The natural language understanding module 223 according to an embodiment may identify the meaning of a word extracted from the voice input using linguistic features (e.g., grammatical elements) of morphemes or phrases, and may determine the user's intent by matching the identified meaning of the word to an intent.
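As a rough illustration of matching word meanings to an intent, the toy function below scores each intent by how many of its keywords appear in the recognized text. This keyword overlap is a deliberately simplified stand-in chosen for illustration, not the syntactic or semantic analysis the specification refers to, and `INTENT_KEYWORDS` and `determine_intent` are hypothetical.

```python
# Hypothetical intents and keywords, for illustration only.
INTENT_KEYWORDS = {
    "set_alarm": {"alarm", "wake"},
    "send_message": {"message", "text", "send"},
}


def determine_intent(text, intent_keywords=INTENT_KEYWORDS):
    """Return the intent whose keywords overlap most with the words in text,
    or None when no keyword matches at all."""
    words = set(text.lower().split())
    best_intent, best_score = None, 0
    for intent, keywords in intent_keywords.items():
        score = len(words & keywords)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent
```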

The planner module 225 according to an embodiment may generate a plan using the intent and parameters determined by the natural language understanding module 223. According to an embodiment, the planner module 225 may determine a plurality of domains needed to perform a task based on the determined intent. The planner module 225 may determine a plurality of operations included in each of the domains determined based on the intent. According to an embodiment, the planner module 225 may determine the parameters needed to execute the determined operations, or the result values output by executing those operations. The parameters and the result values may be defined as concepts of a designated format (or class). Accordingly, the plan may include a plurality of operations determined by the user's intent and a plurality of concepts. The planner module 225 may determine the relationships between the operations and the concepts stepwise (or hierarchically). For example, the planner module 225 may determine, based on the concepts, the execution order of the operations determined from the user's intent. In other words, the planner module 225 may determine the execution order of the operations based on the parameters needed to execute them and the results output by executing them. Accordingly, the planner module 225 may generate a plan that includes association information (e.g., an ontology) between the operations and the concepts. The planner module 225 may generate the plan using information stored in the capsule database 230, which stores a set of relationships between concepts and operations.
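Ordering operations by the parameters they need and the results they output amounts to ordering a dependency graph, which can be sketched with a topological sort. The specification does not name an algorithm, so the use of Python's `graphlib.TopologicalSorter` (Python 3.9 or later) is an assumption, and `execution_order` and the sample action and concept names are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(actions):
    """Order actions so that each runs after the actions producing its inputs.

    actions maps an action name to a pair (input concepts, output concepts).
    """
    # Record which action outputs each concept.
    producers = {}
    for name, (_inputs, outputs) in actions.items():
        for concept in outputs:
            producers[concept] = name
    # An action depends on the producers of the concepts it consumes.
    graph = {
        name: {producers[c] for c in inputs if c in producers}
        for name, (inputs, _outputs) in actions.items()
    }
    return list(TopologicalSorter(graph).static_order())
```

Concepts that no action produces (for example, values taken directly from the utterance) create no dependency in this sketch, and a circular dependency raises `graphlib.CycleError`.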

The natural language generator module 227 according to an embodiment may convert designated information into text form. The information converted into text form may take the form of a natural language utterance. The text-to-speech module 229 according to an embodiment may convert information in text form into information in voice form.

The capsule database 230 may store information on the relationships between a plurality of concepts and operations corresponding to a plurality of domains. A capsule according to an embodiment may include a plurality of action objects (or action information) and concept objects (or concept information) included in a plan. According to an embodiment, the capsule database 230 may store a plurality of capsules in the form of a concept action network (CAN). According to an embodiment, the plurality of capsules may be stored in a function registry included in the capsule database 230.
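The relationship between domains, capsules, action objects, and concept objects can be pictured with a minimal data structure. This is only a sketch under the assumption of one capsule per domain; the class names `Capsule` and `CapsuleDatabase`, their fields, and the sample values are hypothetical, and the concept action network itself is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Capsule:
    """Hypothetical capsule: the action and concept objects of one domain."""
    domain: str
    actions: set = field(default_factory=set)
    concepts: set = field(default_factory=set)


class CapsuleDatabase:
    """Stores capsules keyed by domain (assumed: one capsule per domain)."""

    def __init__(self):
        self._by_domain = {}

    def register(self, capsule):
        self._by_domain[capsule.domain] = capsule

    def lookup(self, domain):
        # Returns None when no capsule is registered for the domain.
        return self._by_domain.get(domain)
```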

The capsule database 230 may include a strategy registry that stores strategy information needed to determine a plan corresponding to a voice input. The strategy information may include reference information for selecting one plan when a plurality of plans correspond to the voice input. According to an embodiment, the capsule database 230 may include a follow-up registry that stores information on follow-up actions for suggesting a follow-up action to the user in a designated situation. The follow-up action may include, for example, a follow-up utterance. According to an embodiment, the capsule database 230 may include a layout registry that stores layout information for the information output through the user terminal 100. According to an embodiment, the capsule database 230 may include a vocabulary registry that stores vocabulary information included in capsule information. According to an embodiment, the capsule database 230 may include a dialog registry that stores information on dialogs (or interactions) with the user.

The capsule database 230 may update stored objects through a developer tool. The developer tool may include, for example, a function editor for updating an action object or a concept object. The developer tool may include a vocabulary editor for updating the vocabulary. The developer tool may include a strategy editor for creating and registering strategies that determine a plan. The developer tool may include a dialog editor for creating dialogs with the user. The developer tool may include a follow-up editor that can activate a follow-up goal and edit follow-up utterances that provide hints. The follow-up goal may be determined based on the currently set goal, the user's preferences, or environmental conditions. In an embodiment, the capsule database 230 may also be implemented within the user terminal 100.

The execution engine 240 of an embodiment may compute a result using the generated plan. The end user interface 250 may transmit the computed result to the user terminal 100, which may receive the result and provide it to the user. The management platform 260 of an embodiment may manage information used by the intelligent server 200. The big data platform 270 of an embodiment may collect user data. The analysis platform 280 of an embodiment may manage the quality of service (QoS) of the intelligent server 200. For example, the analysis platform 280 may manage the components and processing speed (or efficiency) of the intelligent server 200.

The service server 300 of an embodiment may provide a specified service (e.g., food ordering or hotel reservation) to the user terminal 100. According to an embodiment, the service server 300 may be a server operated by a third party. The service server 300 of an embodiment may provide the intelligent server 200 with information for generating a plan corresponding to a received voice input. The provided information may be stored in the capsule database 230. In addition, the service server 300 may provide the intelligent server 200 with result information according to the plan.

In the integrated intelligence system described above, the user terminal 100 may provide various intelligent services to the user in response to user input. The user input may include, for example, input through a physical button, touch input, or voice input.

In an embodiment, the user terminal 100 may provide a voice recognition service through an intelligent app (or voice recognition app) stored on the terminal. In this case, for example, the user terminal 100 may recognize a user utterance or voice input received through the microphone and provide the user with a service corresponding to the recognized voice input.

In an embodiment, the user terminal 100 may perform a specified action based on the received voice input, either alone or together with the intelligent server and/or the service server. For example, the user terminal 100 may launch an app corresponding to the received voice input and perform the specified action through the launched app.

In an embodiment, when the user terminal 100 provides a service together with the intelligent server 200 and/or the service server, the user terminal may detect a user utterance using the microphone 120 and generate a signal (or voice data) corresponding to the detected utterance. The user terminal may transmit the voice data to the intelligent server 200 using the communication interface 110.

In response to a voice input received from the user terminal 100, the intelligent server 200 according to an embodiment may generate a plan for performing a task corresponding to the voice input, or a result of performing actions according to the plan. The plan may include, for example, a plurality of actions for performing the task corresponding to the user's voice input and a plurality of concepts related to those actions. A concept may define a parameter input to the execution of the actions or a result value output by their execution. The plan may include association information between the plurality of actions and the plurality of concepts.

The user terminal 100 of an embodiment may receive the response using the communication interface 110. The user terminal 100 may output a voice signal generated inside the user terminal 100 to the outside using the speaker 130, or output an image generated inside the user terminal 100 to the outside using the display 140.

FIG. 2 is a diagram illustrating a form in which relationship information between concepts and actions is stored in a database, according to various embodiments.

Referring to FIGS. 1 and 2, the capsule database (e.g., the capsule database 230) of the intelligent server 200 may store capsules in the form of a concept action network (CAN). The capsule database may store, in CAN form, the actions for processing a task corresponding to a user's voice input and the parameters required for those actions.

The capsule database may store a plurality of capsules (capsule (A) 401, capsule (B) 404), each corresponding to one of a plurality of domains (e.g., applications). According to an embodiment, one capsule (e.g., capsule (A) 401) may correspond to one domain (e.g., location (geo), application). In addition, one capsule may correspond to at least one service provider (e.g., CP 1 402 or CP 2 403) that performs functions for the domain associated with the capsule. According to an embodiment, one capsule may include at least one action 410 and at least one concept 420 for performing a specified function.

The natural language platform 220 may use the capsules stored in the capsule database to generate a plan for performing a task corresponding to a received voice input. For example, the planner module 225 of the natural language platform may generate the plan using the capsules stored in the capsule database. For example, the plan 407 may be generated using the actions 4011 and 4013 and the concepts 4012 and 4014 of capsule A 401, together with the action 4041 and the concept 4042 of capsule B 404.
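As a rough illustration of how a plan could be assembled from actions and concepts spread across capsules, the following Python sketch backward-chains from a goal concept to the actions that produce it. The source does not describe the planner's algorithm, so the `Action` and `Capsule` classes, the `build_plan` function, and the example action names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    """Hypothetical action: consumes input concepts and produces one output concept."""
    name: str
    inputs: tuple
    output: str


@dataclass
class Capsule:
    """Hypothetical capsule: one domain plus the actions it offers."""
    domain: str
    actions: list = field(default_factory=list)


def build_plan(capsules, goal_concept, available):
    """Return action names in execution order that yield goal_concept.

    Assumes each concept is produced by at most one action and that the
    dependency graph has no cycles (illustrative simplifications).
    """
    producers = {a.output: a for c in capsules for a in c.actions}
    plan, known = [], set(available)

    def resolve(concept):
        if concept in known:
            return
        action = producers[concept]  # KeyError if nothing produces the concept
        for needed in action.inputs:
            resolve(needed)
        plan.append(action.name)
        known.add(concept)

    resolve(goal_concept)
    return plan


capsule_a = Capsule("geo", [Action("find_location", ("query",), "location")])
capsule_b = Capsule("hotel", [Action("book_hotel", ("location", "date"), "reservation")])
plan = build_plan([capsule_a, capsule_b], "reservation", {"query", "date"})
```

Here the plan chains an action from one capsule into an action from another through the shared `location` concept, which mirrors how plan 407 draws on both capsule A and capsule B.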

FIG. 3 is a diagram illustrating screens on which a user terminal processes a voice input received through an intelligent app, according to various embodiments.

The user terminal 100 may run an intelligent app to process user input through the intelligent server 200.

According to an embodiment, on screen 310, when the user terminal 100 recognizes a specified voice input (e.g., "Wake up!") or receives an input through a hardware key (e.g., a dedicated hardware key), it may run the intelligent app to process voice input. The user terminal 100 may, for example, run the intelligent app while a schedule app is running. According to an embodiment, the user terminal 100 may display an object (e.g., an icon) 311 corresponding to the intelligent app on the display 140. According to an embodiment, the user terminal 100 may receive a voice input from a user utterance. For example, the user terminal 100 may receive the voice input "Let me know this week's schedule!" According to an embodiment, the user terminal 100 may display on the display a user interface (UI) 313 (e.g., an input window) of the intelligent app in which the text data of the received voice input is shown.

According to an embodiment, on screen 320, the user terminal 100 may display a result corresponding to the received voice input on the display. For example, the user terminal 100 may receive a plan corresponding to the received user input and display "this week's schedule" on the display according to the plan.

FIG. 4 is a block diagram illustrating an intelligent server that generates a modified utterance text set according to an embodiment of the present invention. In FIG. 4, descriptions of components that duplicate those described with reference to the preceding drawings may be omitted.

Referring to FIG. 4, the intelligent server 200 may include at least part of the configuration described in FIG. 1 (e.g., the automatic speech recognition module 221 and the natural language understanding module 223), as well as a parameter collection module 291, a modified utterance generation module 292, first and second modified utterance recommendation modules 293 and 294, and a natural language understanding (NLU) training module 295.

According to an embodiment, the intelligent server 200 may include at least one communication circuit, a memory, and a processor. The communication circuit may establish a communication channel with at least one external electronic device (e.g., the developer terminal 500 or the user terminal 100) and exchange data with the external electronic device over that channel. The memory may store various data, instructions, algorithms, engines, and the like related to the operation of the intelligent server 200. The processor may execute the instructions stored in the memory to run the parameter collection module 291, the modified utterance generation module 292, the first and second modified utterance recommendation modules 293 and 294, and the NLU training module 295. The intelligent server 200 may exchange data (or information) with external electronic devices (e.g., the user terminal 100 and the developer terminal 500) through the communication circuit.

The user terminal 100 may receive a user's utterance as user input and transmit the user input (e.g., voice data) to the automatic speech recognition module 221. The automatic speech recognition module 221 may convert the user input received from the user terminal 100 into user utterance text. The user utterance text may be passed to the modified utterance generation module 292 through the natural language understanding module 223 and the parameter collection module 291. The modified utterance generation module 292 may generate a modified utterance text set corresponding to the user utterance text. The modified utterance text set may include a plurality of modified utterance texts. The user terminal 100 may be configured identically or similarly to the user terminal 100 of FIG. 1.

The developer terminal 500 may transmit a training utterance text set for training the natural language understanding module 223 to the modified utterance generation module 292 and the NLU training module 295. For example, the training utterance text set may be written by a developer. The developer terminal 500 may include an utterance input tool. Using the utterance input tool, the developer may enter representative utterance text (e.g., utterances that users are expected to use frequently in each service) and, according to the domain, intent, and parameters, enter applied utterance text corresponding to the representative utterance text into the developer terminal 500. The developer terminal 500 may store a training utterance text set that includes the representative utterance text and the applied utterance text. For example, the training utterance text set may be entered manually by the developer. The training utterance text set may include a plurality of training utterance texts written by the developer. The modified utterance generation module 292 may generate a modified utterance text set corresponding to the training utterance text set received from the developer terminal 500. The developer terminal 500 may be configured identically or similarly to the user terminal 100 of FIG. 1.

According to an embodiment, the developer may use the utterance input tool to enter training utterance information (e.g., domain information, category information, example user utterances, and intent information) for generating a training utterance text set. The developer terminal 500 may transmit the training utterance information to the modified utterance generation module 292. The modified utterance generation module 292 may generate a modified utterance text set based on the training utterance information received from the developer terminal 500.

The intelligent server 200 may operate in a natural language understanding training mode (or function), in which it receives a training utterance text set and trains the natural language understanding module 223. For example, in the natural language understanding training mode, the NLU training module 295 may train the natural language understanding module 223 based on the training utterance text set. However, because the training utterance text set is created manually by a developer, the performance of training based on it may depend on the developer's skill. To improve training performance, the intelligent server 200 according to an embodiment of the present invention may generate additional utterance text and use it to train the natural language understanding module 223.

According to an embodiment, the modified utterance generation module 292 may receive a training utterance text set (or training utterance information) and generate an additional modified utterance text set. The NLU training module 295 may additionally train the natural language understanding module 223 based on the modified utterance text set. Because the natural language understanding module 223 is trained with both the training utterance text set and the modified utterance text set, its training can be more effective than when only the training utterance text set is used.

The intelligent server 200 may operate in an utterance recommendation mode (or function), in which it provides a modified utterance text set to a developer or a user based on a training utterance text set or user utterance text.

According to an embodiment, when the modified utterance generation module 292 receives a training utterance text set (or training utterance information), it may generate a modified utterance text set corresponding to it. The generated modified utterance text set may be transmitted to the first modified utterance recommendation module 293, which may transmit it to the developer terminal 500. The developer may use the modified utterance text set to create a new training utterance text set. For example, the developer may enter training utterance information (e.g., domain information, category information, example user utterances, and intent information) through the utterance input tool running on the developer terminal 500, and the utterance input tool may generate a training utterance text set based on the entered information. While receiving the training utterance information, the utterance input tool may present the modified utterance text set to the developer. By referring to the presented set, the developer can enter a wider variety of example user utterances, and the utterance input tool may create a new training utterance text set by adding the newly entered examples to the previously stored training utterance text. The developer terminal 500 may transmit the new training utterance text set to the intelligent server 200, and the NLU training module 295 may use it to train the natural language understanding module 223, which can improve training performance.

According to an embodiment, when a user utterance is input to the user terminal 100, the utterance is converted into user utterance text through the automatic speech recognition module 221 and the natural language understanding module 223, and on receiving the user utterance text the modified utterance generation module 292 may generate a corresponding modified utterance text set. The generated set may be transmitted to the second modified utterance recommendation module 294, which may transmit it to the user terminal 100. The user terminal 100 may provide the modified utterance text set when a user utterance is input. For example, when the user utterance text first recognized by the user terminal 100 does not match the user's intention, the user may be recommended utterance text (e.g., the modified utterance text set) that is similar to the user's utterance pattern (or familiar to the user). For a user utterance (e.g., “전화 닫아주삼”, a casual way of saying "close the call"), the user terminal 100 may recommend not the representative utterance (e.g., “전화 종료”, "end call") but utterance text similar to the user's utterance pattern (e.g., “전화 꺼주삼”, a casual way of saying "hang up the call"). Because users' utterance patterns vary, and the utterance models used in the natural language understanding module 223 also vary, the utterance patterns a user often uses may differ from those the natural language understanding module 223 handles well. As a result, some user utterances may not be processed by the natural language understanding module 223. The modified utterance text set generated by the modified utterance generation module 292 can cover the cases that the natural language understanding module 223 fails to process.

In the natural language understanding training mode or the utterance recommendation mode, the modified utterance generation module 292 may generate a modified utterance text set based on various criteria. The modified utterance generation module 292 may generate a modified utterance text set based on user utterances.

According to an embodiment, user utterance data, that is, previously entered user input converted into text, may be stored in a natural language recognition database through the natural language understanding module 223. The parameter collection module 291 may receive the user utterance data from the natural language recognition database and generate user utterance classification information. The user utterance classification information may include domain information, intent information, and parameter information for the user utterance data. The modified utterance generation module 292 may receive the user utterance classification information from the parameter collection module 291 and, based on it, generate a modified utterance text set per domain or per intent.

According to an embodiment, in the natural language understanding training mode, the modified utterance generation module 292 may generate a modified utterance text set when the number of training utterance texts in the received training utterance text set is smaller than a reference utterance count. When the training utterance text set contains fewer training utterance texts than the reference utterance count, training of the natural language understanding module 223 may be less effective, so an additional modified utterance text set may be needed.

The modified utterance generation module 292 may generate a modified utterance text set based on a generative model or a transfer learning model. For example, the generative model may include a generative adversarial network (GAN), a variational autoencoder (VAE), a deep neural network (DNN), or the like, and the transfer learning model may include style transfer or the like.

According to an embodiment, the modified utterance generation module 292 may include a generation module and an inspection module, which together may implement the generative model. The generation module may generate candidate utterance text using user utterance data. The inspection module may compare the candidate utterance text with reference utterance text (e.g., a training utterance text set or user utterance text) to determine whether they are similar. When the candidate utterance text is similar to the reference utterance text (e.g., when the similarity is at or above a specified ratio), the inspection module may select that candidate for inclusion in the modified utterance text set. By repeating generation and inspection while varying at least one of the domain, intent, and parameters, the generation module and the inspection module may produce a variety of modified utterance text sets similar to the reference utterance text.
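The inspection step described above can be sketched in Python as a similarity filter over candidate texts. This is only an illustration: the source does not specify how similarity is measured, so character-level matching from the standard library's `difflib` stands in for it, and the function names, the `min_similarity` value of 0.6, and the example utterances are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; the source does not define the metric."""
    return SequenceMatcher(None, a, b).ratio()


def select_modified_utterances(candidates, reference, min_similarity=0.6):
    """Keep candidates that are similar enough to the reference utterance text.

    Exact copies of the reference and duplicate candidates are skipped,
    since they add nothing new to the modified utterance text set.
    """
    selected = []
    for text in candidates:
        if text == reference or text in selected:
            continue
        if similarity(text, reference) >= min_similarity:
            selected.append(text)
    return selected


# Hypothetical candidates, as if produced by the generation module.
candidates = ["end the call please", "end the call", "play some jazz music"]
modified_set = select_modified_utterances(candidates, "end the call")
```

In a real system the candidates would come from the generative model, and the loop would be repeated with different domain, intent, or parameter settings to widen the resulting set.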

According to an embodiment, the modified utterance generation module 292 may identify the domain (a first domain) of the reference utterance text. The modified utterance generation module 292 may determine a second domain that is similar to the first domain. The modified utterance generation module 292 may generate, from the second domain, a modified utterance text set for training the natural language understanding module 223 for the first domain.

According to an embodiment, the second domain similar to the first domain may be determined based on category. For example, when the category of the first domain (e.g., the Pizza Hut app) is "pizza delivery", the second domain (e.g., the Domino's Pizza app) may be selected from the domains in the "pizza delivery" category (e.g., the Domino's Pizza app, the Mr. Pizza app).

According to an embodiment, the second domain similar to the first domain may be determined based on intent. For example, when the intent of the first domain (e.g., a message app) is "send text message", the second domain (e.g., the KakaoTalk app) may be selected from the domains that have the "send text message" intent (e.g., the KakaoTalk app, the LINE app).
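The two selection criteria above, category and intent, can be sketched in Python as simple lookups over a domain registry. The registry layout, the domain identifiers, and both function names are hypothetical; the source only states that the second domain shares a category or an intent with the first.

```python
# Hypothetical registry: domain name -> category and supported intents.
DOMAINS = {
    "pizza_app_a": {"category": "pizza delivery", "intents": {"order pizza"}},
    "pizza_app_b": {"category": "pizza delivery", "intents": {"order pizza"}},
    "message_app": {"category": "messaging", "intents": {"send text message"}},
    "chat_app": {"category": "social", "intents": {"send text message", "share photo"}},
}


def similar_by_category(first_domain, domains):
    """Candidate second domains that share the first domain's category."""
    category = domains[first_domain]["category"]
    return sorted(
        name for name, info in domains.items()
        if name != first_domain and info["category"] == category
    )


def similar_by_intent(first_domain, intent, domains):
    """Candidate second domains that support the given intent."""
    return sorted(
        name for name, info in domains.items()
        if name != first_domain and intent in info["intents"]
    )


by_category = similar_by_category("pizza_app_a", DOMAINS)
by_intent = similar_by_intent("message_app", "send text message", DOMAINS)
```

Note that the intent-based lookup can find a second domain in a different category, which is why the two criteria are described separately.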

According to an embodiment, the modified utterance generation module 292 may generate a modified utterance text set through transfer learning. For example, the modified utterance generation module 292 may generate a modified utterance text set for the first domain using utterance patterns that are used in the second domain but not in the first domain. The modified utterance generation module 292 may generate a modified utterance text set for the first domain by transferring an intent used in the second domain to the first domain.

The modified utterance generation module 292 may generate a modified utterance text set based on user characteristics.

According to an embodiment, the parameter collection module 291 may receive user utterance data from the natural language understanding module 223. The parameter collection module 291 may preprocess the user utterance data (at least one of noise removal, sample utterance extraction, and related utterance selection) to convert it into a form used by the modified utterance generation module 292. The parameter collection module 291 may analyze the preprocessed user utterance data to generate information on user characteristics (e.g., age, region, gender), hereinafter referred to as user characteristic information. The user characteristic information may include information on terms frequently used by age, region, or gender. Depending on their characteristics, users may express the same meaning with differently formed terms (e.g., “해주세요”, “해주삼”, “해주세욤”, all variants of "please do it").

According to an embodiment, the parameter collection module 291 may extract frequently used utterance patterns according to age, region, and gender based on the user characteristic information. For example, user utterance patterns based on user characteristics may include utterance patterns frequently used by people in their twenties, utterance patterns frequently used by people in their forties, utterance patterns frequently used in Busan, utterance patterns frequently used on Jeju Island, utterance patterns frequently used by men, and utterance patterns frequently used by women.

According to an embodiment, when the count of an extracted user utterance pattern is greater than a reference pattern count, the modified utterance generation module 292 may generate a modified utterance text set based on that user utterance pattern. The modified utterance generation module 292 may compare the count of the user utterance pattern with the reference pattern count. A count greater than the reference pattern count means that the specific user utterance pattern is frequently used by users. Accordingly, the modified utterance generation module 292 may use the specific user utterance pattern to generate an additional modified utterance text set. The reference pattern count may be determined based on the utterance volume. The reference pattern count may also be determined according to utterance complexity. For example, utterance complexity may be proportional to the number of parameters (or slots) included in a user utterance. For a complex user utterance (e.g., a user utterance containing many parameters (or slots)), the reference pattern count may be set low.
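The thresholding described above can be illustrated with a minimal sketch. The specification does not give a formula for the reference pattern count, so the rule below (a fraction of the utterance volume divided by the slot count) is purely an assumption, as are the function names, the `base_ratio` parameter, and the `<slot>` placeholder notation.

```python
from collections import Counter


def reference_pattern_count(total_utterances, slot_count, base_ratio=0.01):
    """Hypothetical threshold: grows with utterance volume and shrinks
    as the number of slots (utterance complexity) increases."""
    base = int(total_utterances * base_ratio)
    return max(1, base // max(1, slot_count))


def frequent_patterns(patterns, total_utterances):
    """Return the patterns whose occurrence count exceeds their threshold.

    Slots are assumed to be written as <name> placeholders, so the slot
    count of a pattern is the number of '<' characters it contains.
    """
    counts = Counter(patterns)
    selected = []
    for pattern, count in counts.items():
        threshold = reference_pattern_count(total_utterances, pattern.count("<"))
        if count > threshold:
            selected.append(pattern)
    return sorted(selected)
```

With this rule, a pattern with three slots needs far fewer occurrences to qualify than a one-slot pattern, which mirrors the statement that the reference pattern count may be set low for complex utterances.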

According to an embodiment, the modified utterance generation module 292 may generate a modified utterance text set through transfer learning based on the user characteristic information. For example, the modified utterance generation module 292 may generate a modified utterance text set for a first domain frequently used by people in their thirties, using an utterance pattern used in a second domain frequently used by teenagers.

As described above, according to various embodiments, the intelligent server 200 may generate various modified utterance text sets in response to a training utterance text set received from the developer terminal 500 or a user input received from the user terminal 100. The intelligent server 200 may train the natural language understanding module 223 using the generated modified utterance text set. The intelligent server 200 may transmit the generated modified utterance text set to the developer terminal so that the developer can use it when writing the training utterance text set. The intelligent server 200 may transmit the generated modified utterance text set to the user terminal so that the user can easily select an operation corresponding to the user utterance.

FIG. 5 is a block diagram illustrating an embodiment of the parameter collection module of FIG. 4.

Referring to FIG. 5, the parameter collection module 291 may include a preprocessing module 2911 and a user utterance classification module 2912. The preprocessing module 2911 may include a noise removal module 2911a, a sampling module 2911b, and a related utterance selection module 2911c.

According to an embodiment, the user utterance data received from the natural language understanding module 223 may be noisy (e.g., ambient noise captured between the start and end of a user utterance), may be large in volume (e.g., the number of user utterances collected and accumulated, or stored in the natural language understanding module 223), may be unbalanced (e.g., not divided by category or domain), and may contain uncertainty (e.g., utterances for which the natural language understanding module 223 produces an ambiguous result, utterances whose domain is unknown, or utterances the natural language understanding module 223 cannot understand, such as "어제 라이트검정등이 있어"). The preprocessing module 2911 may preprocess user utterance data having these characteristics to convert it into a form usable by the modified utterance generation module 292. The noise removal module 2911a may remove noise using a filtering technique or an ensemble technique. The sampling module 2911b may extract patterned sample utterances from the user utterance data. By extracting repeated sample utterances, the sampling module 2911b may reduce the amount of user utterance data. The related utterance selection module 2911c may remove from the user utterance data those user utterances that have little semantic relevance to a reference utterance text (e.g., a training utterance text set or a user utterance text). That is, the related utterance selection module 2911c may select user utterances that are highly relevant to the reference utterance text.
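The three preprocessing stages can be sketched as a simple pipeline. This is only an illustration: the specification names filtering or ensemble techniques for noise removal and semantic relevance for selection, whereas the sketch substitutes a token-count filter, order-preserving de-duplication, and Jaccard token overlap as stand-ins. All function names and the `min_overlap` parameter are hypothetical.

```python
def remove_noise(utterances):
    """Stand-in noise filter: drop utterances with fewer than two tokens."""
    return [u.strip() for u in utterances if len(u.split()) >= 2]


def sample(utterances):
    """Stand-in sampling: collapse repeated utterances, keeping first occurrences."""
    return list(dict.fromkeys(utterances))


def select_related(utterances, reference, min_overlap=0.2):
    """Stand-in relevance check: keep utterances whose Jaccard token overlap
    with the reference utterance text reaches min_overlap."""
    ref_tokens = set(reference.lower().split())
    kept = []
    for u in utterances:
        tokens = set(u.lower().split())
        union = tokens | ref_tokens
        if union and len(tokens & ref_tokens) / len(union) >= min_overlap:
            kept.append(u)
    return kept


def preprocess(utterances, reference):
    """Run noise removal, sampling, and related utterance selection in order."""
    return select_related(sample(remove_noise(utterances)), reference)
```

A real system would likely replace the overlap measure with a semantic similarity model; the point here is only the order of the stages and how each one shrinks the data.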

According to an embodiment, the user utterance classification module 2912 may receive the preprocessed user utterance data from the preprocessing module 2911. The user utterance classification module 2912 may generate user utterance classification information based on the preprocessed user utterance data, and may transmit the preprocessed user utterance data and the user utterance classification information to the modified utterance generation module 292. The user utterance classification module 2912 may also receive the current user utterance text from the natural language understanding module 223 and transmit it to the modified utterance generation module 292.

FIG. 6 is a flowchart illustrating an operation method 600 of the intelligent server in a natural language understanding training mode according to an embodiment of the present invention. In the natural language understanding training mode, the operation method 600 may be performed differently depending on the number of training utterance texts included in the training utterance text set.

Referring to FIG. 6, in operation 610, the intelligent server 200 may receive a training utterance text set. According to an embodiment, the modified utterance generation module 292 may receive the training utterance text set from the developer terminal 500. The training utterance text set may include a plurality of training utterance texts written by the developer.

In operation 620, the intelligent server 200 may compare the number of training utterance texts included in the training utterance text set with a reference utterance count. According to an embodiment, when the number of training utterance texts included in the training utterance text set is less than the reference utterance count, the modified utterance generation module 292 may perform the operations for generating a modified utterance text set (operations 630 to 660). When the number of training utterance texts included in the training utterance text set is greater than or equal to the reference utterance count, operation 670 may be performed.

In operation 630, when the number of training utterance texts included in the training utterance text set is less than the reference utterance count, the intelligent server 200 may determine the domain (first domain) of the training utterance text set. According to an embodiment, the modified utterance generation module 292 may determine the domain of the training utterance text set using the natural language understanding module 223.

In operation 640, the intelligent server 200 may determine a second domain having utterance patterns similar to those of the first domain. According to an embodiment, the modified utterance generation module 292 may determine a second domain similar to the first domain based on category. For example, when the category of the first domain (e.g., the Pizza Hut app) is "pizza delivery", the second domain (e.g., the Domino's Pizza app) may be selected from among the domains in the "pizza delivery" category (e.g., the Domino's Pizza app, the Mr. Pizza app). According to an embodiment, the modified utterance generation module 292 may determine a second domain similar to the first domain based on intent. For example, when the intent of the first domain (e.g., a message app) is "send text message", the second domain (e.g., the KakaoTalk app) may be selected from among the domains having the "send text message" intent (e.g., the KakaoTalk app, the LINE app).
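The category-based and intent-based selection in operation 640 can be sketched as a lookup over a domain registry. The `Domain` record, the registry list, and the `by` argument are assumptions made for illustration; the specification does not describe how domains are stored or how ties between candidates are resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """Hypothetical record describing one domain (app)."""
    name: str
    category: str
    intents: frozenset


def candidate_second_domains(first, registry, by="category"):
    """List the names of domains similar to `first`, matched either by
    shared category or by at least one shared intent."""
    if by == "category":
        def matches(d):
            return d.category == first.category
    elif by == "intent":
        def matches(d):
            return bool(d.intents & first.intents)
    else:
        raise ValueError("by must be 'category' or 'intent'")
    return [d.name for d in registry if d.name != first.name and matches(d)]
```

Using the app names from the examples above, a registry holding Pizza Hut, Domino's Pizza, and Mr. Pizza under "pizza delivery" would yield the latter two as candidates for the first; the choice among candidates is left open here.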

In operation 650, the intelligent server 200 may generate a modified utterance text set to be applied to the first domain based on user utterance patterns used in the second domain. According to an embodiment, the parameter collection module 291 may receive user utterance data from the natural language understanding module 223. The parameter collection module 291 may preprocess the user utterance data (noise removal, sample utterance extraction, related utterance selection) to convert it into a form usable by the modified utterance generation module 292. The parameter collection module 291 may generate user utterance classification information based on the preprocessed user utterance data, and may transmit the preprocessed user utterance data and the user utterance classification information to the modified utterance generation module 292. The modified utterance generation module 292 may extract the user utterance patterns used in the second domain based on the user utterance classification information. The modified utterance generation module 292 may generate the modified utterance text set to be applied to the first domain using the extracted user utterance patterns. The modified utterance text set may include a plurality of modified utterance texts.
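As a rough illustration of carrying utterance patterns from the second domain over to the first, the sketch below uses plain template substitution. This is a deliberately simplified stand-in: the specification generates the texts with a generative model or a transfer learning model, not with string templates. The function names, the `<slot>` notation, and the slot dictionaries are all hypothetical.

```python
def extract_patterns(utterances, slot_values):
    """Turn second-domain utterances into patterns by replacing known
    slot values with <slot> placeholders."""
    patterns = []
    for utterance in utterances:
        pattern = utterance
        for slot, values in slot_values.items():
            for value in values:
                pattern = pattern.replace(value, f"<{slot}>")
        # Keep only utterances that actually contained a slot value.
        if pattern != utterance and pattern not in patterns:
            patterns.append(pattern)
    return patterns


def fill_patterns(patterns, slot_values):
    """Instantiate each pattern with the first domain's slot values."""
    results = []
    for pattern in patterns:
        texts = [pattern]
        for slot, values in slot_values.items():
            placeholder = f"<{slot}>"
            expanded = []
            for text in texts:
                if placeholder in text:
                    expanded.extend(text.replace(placeholder, v) for v in values)
                else:
                    expanded.append(text)
            texts = expanded
        results.extend(texts)
    return results
```

The sketch only shows the data flow of operation 650: phrasing observed in one domain is reused with the vocabulary of another.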

In operation 660, the intelligent server 200 may train the natural language understanding module 223 for the first domain based on the received training utterance text set and the generated modified utterance text set. According to an embodiment, the natural language understanding (NLU) training module 295 may receive the training utterance text set from the developer terminal 500. The NLU training module 295 may train the natural language understanding module 223 based on the training utterance text set. In addition, the NLU training module 295 may receive the modified utterance text set from the modified utterance generation module 292. The NLU training module 295 may additionally train the natural language understanding module 223 based on the modified utterance text set. Accordingly, the performance of the natural language understanding module 223 may be improved beyond what is achieved by training with the training utterance text set alone.

In operation 670, when the number of training utterance texts included in the training utterance text set is greater than or equal to the reference utterance count, the intelligent server 200 may train the natural language understanding module 223 for the first domain based on the training utterance text set. According to an embodiment, when the number of training utterance texts included in the training utterance text set is greater than or equal to the reference utterance count, sufficient training utterance texts may already exist for the first domain. In this case, the modified utterance generation module 292 may not operate. Accordingly, the NLU training module 295 may receive the training utterance text set from the developer terminal 500 and train the natural language understanding module 223.
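The branching of method 600 can be summarized in a short control-flow sketch. The callables `generate_variants` and `train` are hypothetical placeholders for the modified utterance generation module 292 and the NLU training module 295, and the returned labels exist only to make the branch taken visible.

```python
def run_training_mode(training_texts, reference_count, generate_variants, train):
    """Sketch of method 600: augment the training data only when the
    training utterance text set is smaller than the reference count."""
    if len(training_texts) < reference_count:
        # Operations 630 to 650: determine domains and generate variants.
        variants = generate_variants(training_texts)
        # Operation 660: train on the original and generated texts.
        train(list(training_texts) + list(variants))
        return "augmented"
    # Operation 670: enough training texts already exist.
    train(list(training_texts))
    return "training_only"
```

Passing the generator and trainer as arguments keeps the sketch independent of any particular model.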

FIG. 7 is a flowchart illustrating an embodiment 700 of a method of generating the modified utterance text set in operation 650 of FIG. 6. The method 700 of FIG. 7 may be performed by a generative model or a transfer learning model according to user utterance classification information generated based on user utterance data.

Referring to FIG. 7, in operation 710, the parameter collection module 291 may receive user utterance data. According to an embodiment, the parameter collection module 291 may receive the user utterance data from the natural language understanding module 223. The parameter collection module 291 may preprocess the user utterance data (noise removal, sample utterance extraction, related utterance selection) to convert it into a form usable by the modified utterance generation module 292.

In operation 720, the parameter collection module 291 may generate user utterance classification information based on the user utterance data. According to an embodiment, the parameter collection module 291 may generate the user utterance classification information based on the preprocessed user utterance data, and may transmit the preprocessed user utterance data and the user utterance classification information to the modified utterance generation module 292.

In operation 730, the modified utterance generation module 292 may generate a modified utterance text set using a generative model or a transfer learning model based on the user utterance classification information. According to an embodiment, the modified utterance generation module 292 may extract the user utterance patterns used in the second domain based on the user utterance classification information. The modified utterance generation module 292 may generate the modified utterance text set to be applied to the first domain using the extracted user utterance patterns. The modified utterance text set may include a plurality of modified utterance texts. The plurality of modified utterance texts may be generated by the generative model or the transfer learning model based on the intents and parameters used in the second domain.

FIG. 8 is a flowchart illustrating another embodiment 800 of a method of generating the modified utterance text set in operation 650 of FIG. 6. The method 800 of FIG. 8 may be performed according to user characteristics identified based on user utterance data.

Referring to FIG. 8, in operation 810, the parameter collection module 291 may receive user utterance data. According to an embodiment, the parameter collection module 291 may receive the user utterance data from the natural language understanding module 223. The parameter collection module 291 may preprocess the user utterance data (noise removal, sample utterance extraction, related utterance selection) to convert it into a form usable by the modified utterance generation module 292.

In operation 820, the parameter collection module 291 may identify user characteristics based on the user utterance data. According to an embodiment, the parameter collection module 291 may analyze the preprocessed user utterance data to generate information about user characteristics (e.g., age, region, gender) (hereinafter, user characteristic information). The user characteristic information may include information on terms frequently used according to age, region, or gender. Depending on their characteristics, users may use differently formed terms for the same meaning (e.g., "해주세요", "해주삼", "해주세욤", which are all ways of saying "please do this").

In operation 830, the parameter collection module 291 may extract user utterance patterns based on the user characteristics. According to an embodiment, the parameter collection module 291 may extract frequently used utterance patterns according to age, region, and gender based on the user characteristic information. For example, user utterance patterns based on user characteristics may include utterance patterns frequently used by people in their twenties, utterance patterns frequently used by people in their forties, utterance patterns frequently used in Busan, utterance patterns frequently used on Jeju Island, utterance patterns frequently used by men, and utterance patterns frequently used by women.

In operation 840, when the count of an extracted user utterance pattern is greater than a reference pattern count, the modified utterance generation module 292 may generate a modified utterance text set based on that user utterance pattern. According to an embodiment, the modified utterance generation module 292 may compare the count of the user utterance pattern with the reference pattern count. A count greater than the reference pattern count means that the specific user utterance pattern is frequently used by users. Accordingly, the modified utterance generation module 292 may use the specific user utterance pattern to generate an additional modified utterance text set. The reference pattern count may be determined based on the utterance volume. The reference pattern count may also be determined according to utterance complexity. For example, for a complex user utterance, the reference pattern count may be set low.

FIG. 9 is a flowchart illustrating an operation method 900 of the intelligent server in an utterance recommendation mode according to an embodiment of the present invention. In the utterance recommendation mode, the operation method 900 may be performed in response to a received training utterance text set or user utterance text.

Referring to FIG. 9, in operation 910, the modified utterance generation module 292 may receive a training utterance text set or a user utterance text. According to an embodiment, the modified utterance generation module 292 may receive the training utterance text set from the developer terminal 500. The training utterance text set may include a plurality of training utterance texts written by the developer. In addition, the modified utterance generation module 292 may receive the user utterance text from the natural language understanding module 223 through the parameter collection module 291. The automatic speech recognition module 221 may convert a user input (e.g., a user utterance) received from the user terminal 100 into the user utterance text.

In operation 920, the modified utterance generation module 292 may determine the domain (first domain) of the training utterance text set or the user utterance text. According to an embodiment, the modified utterance generation module 292 may determine the domain of the training utterance text set or the user utterance text using the natural language understanding module 223.

In operation 930, the modified utterance generation module 292 may determine a second domain having utterance patterns similar to those of the first domain. According to an embodiment, the modified utterance generation module 292 may determine a second domain similar to the first domain based on category. For example, when the category of the first domain (e.g., the Pizza Hut app) is "pizza delivery", the second domain (e.g., the Domino's Pizza app) may be selected from among the domains in the "pizza delivery" category (e.g., the Domino's Pizza app, the Mr. Pizza app). According to an embodiment, the modified utterance generation module 292 may determine a second domain similar to the first domain based on intent. For example, when the intent of the first domain (e.g., a message app) is "send text message", the second domain (e.g., the KakaoTalk app) may be selected from among the domains having the "send text message" intent (e.g., the KakaoTalk app, the LINE app).

In operation 940, the modified utterance generation module 292 may generate a modified utterance text set to be applied to the first domain based on user utterance patterns used in the second domain. According to an embodiment, the parameter collection module 291 may receive user utterance data from the natural language understanding module 223. The parameter collection module 291 may preprocess the user utterance data (e.g., by performing at least one of noise removal, sample utterance extraction, and related utterance selection) to convert it into a form usable by the modified utterance generation module 292. The parameter collection module 291 may generate user utterance classification information based on the preprocessed user utterance data, and may transmit the preprocessed user utterance data and the user utterance classification information to the modified utterance generation module 292. The modified utterance generation module 292 may extract the user utterance patterns used in the second domain based on the user utterance classification information. The modified utterance generation module 292 may generate the modified utterance text set to be applied to the first domain using the extracted user utterance patterns. The modified utterance text set may include a plurality of modified utterance texts. For example, in operation 940, the modified utterance generation module 292 may generate the modified utterance text set through the method of FIG. 7 or the method of FIG. 8.

In operation 950, the intelligent server 200 may transmit the generated modified utterance text set to the developer terminal or the user terminal. According to an embodiment, the modified utterance generation module 292 may transmit the modified utterance text set to the first modified utterance recommendation module 293 or the second modified utterance recommendation module 294. When the training utterance text set was received from the developer terminal 500, the modified utterance generation module 292 may transmit the generated modified utterance text set to the first modified utterance recommendation module 293. The first modified utterance recommendation module 293 may transmit the modified utterance text set to the developer terminal 500. When the user utterance text was received from the parameter collection module 291, the modified utterance generation module 292 may transmit the generated modified utterance text set to the second modified utterance recommendation module 294. The second modified utterance recommendation module 294 may transmit the modified utterance text set to the user terminal 100.

Hereinafter, embodiments in which a modified utterance text set is recommended at the developer terminal are described with reference to FIGS. 10A to 10C.

FIG. 10A is a diagram illustrating how a modified utterance text is recommended according to the category of the entered domain when a training utterance text is entered through an utterance input tool, according to an embodiment of the present invention.

FIG. 10B is a diagram illustrating a method of recommending modified utterance text according to the intent of an input user utterance example when training utterance text is input through an utterance input device, according to an embodiment of the present invention.

FIG. 10C is a diagram illustrating a method of recommending modified utterance text according to keywords included in an input user utterance example when training utterance text is input through an utterance input device, according to an embodiment of the present invention.

Referring to FIGS. 10A to 10C, the developer terminal (e.g., the developer terminal 500 of FIG. 4) may display the utterance input device 1000 on a screen. The utterance input device 1000 may receive various items from a developer and generate a training utterance text set for training the natural language understanding module (e.g., the natural language understanding module 223 of FIG. 4) of the intelligent server (e.g., the intelligent server 200 of FIG. 4). The intelligent server may receive the training utterance text set and train the natural language understanding module. Meanwhile, the utterance input device 1000 may provide additional user utterances (e.g., modified utterance text) while the various items are being input.

According to an embodiment, the developer may input a domain item 1001, a category item 1002, a user utterance example item 1003, an intent item 1004, an action item 1005, a parameter item 1006, and a response item 1007 through the utterance input device 1000. The utterance input device 1000 may generate training utterance text based on the input domain information, category information, user utterance example information, intent information, action information, parameter information, and response information.
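As an illustration only, the seven items 1001 to 1007 can be modeled as a simple record from which training utterance entries are derived. The class name, field names, and output layout below are assumptions made for this sketch; the specification does not prescribe a data format.

```python
from dataclasses import dataclass, field


@dataclass
class UtteranceInputForm:
    """Hypothetical container for items 1001-1007 of the utterance input device."""
    domain: str = ""                                  # item 1001
    category: str = ""                                # item 1002
    examples: list = field(default_factory=list)      # item 1003
    intent: str = ""                                  # item 1004
    action: str = ""                                  # item 1005
    parameters: dict = field(default_factory=dict)    # item 1006
    response: str = ""                                # item 1007

    def training_utterances(self):
        """Build one training entry per user utterance example."""
        return [
            {
                "text": text,
                "domain": self.domain,
                "category": self.category,
                "intent": self.intent,
                "action": self.action,
                "response": self.response,
            }
            for text in self.examples
        ]
```

Every example entered in item 1003 is labeled with the same intent, which mirrors the statement that a plurality of similar user utterance examples may be recognized with the same intent.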

According to an embodiment, the developer terminal may transmit at least one of the input domain information, category information, user utterance example information, intent information, action information, parameter information, and response information to the intelligent server. The developer terminal may also transmit the training utterance text to the intelligent server. The intelligent server may generate a modified utterance text set based on at least one of the domain information, category information, user utterance example information, intent information, action information, parameter information, and response information. The intelligent server may also generate a modified utterance text set based on the training utterance text.

According to an embodiment, the intelligent server may transmit, to the developer terminal, a modified utterance text set corresponding to at least one of the domain information, category information, user utterance example information, intent information, action information, parameter information, and response information. The intelligent server may also transmit, to the developer terminal, a modified utterance text set corresponding to the training utterance text. The modified utterance text set may be generated and stored in advance, or may be newly generated based on at least one of the received domain information, category information, user utterance example information, intent information, action information, parameter information, and response information, or based on a modified utterance text set. The modified utterance text set may be generated by the methods described with reference to FIGS. 4 to 8.

According to an embodiment, the developer may input, in the domain item 1001, the domain for which the developer is responsible (e.g., Domino's Pizza, Pizza Hut, Alarm, Calendar). The developer may input, in the category item 1002, the category to which the domain belongs. For example, when the domain is a service related to food ordering (e.g., Domino's Pizza, Pizza Hut, Yogiyo, Starbucks, BHC), the developer may input "food order" in the category item 1002. The category item 1002 may be input directly by the developer or selected from pre-entered candidates. The developer may input, in the user utterance example item 1003, a user utterance example that a user is expected to use (e.g., representative utterance text or applied utterance text). The developer may input a plurality of similarly worded user utterance examples (e.g., "menu recommendation", "recommend a menu", "recommend a menu for me", "please recommend a menu") in the user utterance example item 1003. The plurality of user utterance examples input in the user utterance example item 1003 may be recognized by the intelligent server as having the same intent (e.g., the intent input in the intent item 1004). The developer may input, in the intent item 1004, the intent corresponding to the user utterance example (e.g., menu recommendation, message transmission). The developer may input, in the action item 1005, the action corresponding to the intent (e.g., launching the Domino's Pizza app, launching the message app, turning Wi-Fi on or off). The developer may input, in the parameter item 1006, the values (e.g., place: Seoul, Gwangju, Busan) of elements (e.g., place, time, person) included in the user utterance example. For example, the parameter item 1006 may be input directly by the developer or may be filled in based on data provided by the system (e.g., the developer terminal 500 or the intelligent server 200 of FIG. 4). The developer may input, in the response item 1007, the response corresponding to the intent (e.g., when the intent is "message transmission", a notification of the result of the corresponding action, such as "The message has been sent").

Referring to FIG. 10A, according to an embodiment, when the domain item 1001 and the category item 1002 have been input, the utterance input device 1000 may display a recommended user utterance 1010a. When the domain item 1001 and the category item 1002 are input through the utterance input device 1000, the developer terminal may transmit the input domain information and category information to the intelligent server and receive, from the intelligent server, a modified utterance text set corresponding to the domain information and the category information. The utterance input device 1000 may display the received modified utterance text set as the recommended user utterance 1010a, or may display the recommended user utterance 1010a based on the received modified utterance text set. For example, the recommended user utterance 1010a (e.g., "Recommend a menu", "Order a pizza", "Show the delivery status") is generated based on user utterances used in other domains (e.g., Pizza Hut, Starbucks, BHC) that belong to the same category as the input domain (e.g., Domino's Pizza). The developer may refer to the recommended user utterance 1010a to add further entries to the user utterance example item 1003.
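The category-based recommendation of FIG. 10A can be sketched as a lookup over stored user utterances. This is a minimal sketch under stated assumptions: the log format (a list of dictionaries with "domain", "category", and "text" keys), the function name, and the limit of three results are all hypothetical, and a simple filter stands in for the generative or transfer learning model mentioned elsewhere in this document.

```python
def recommend_user_utterances(domain, category, utterance_log, limit=3):
    """Collect distinct utterances from other domains in the same category."""
    seen = set()
    picks = []
    for entry in utterance_log:
        same_category = entry["category"] == category
        other_domain = entry["domain"] != domain
        if same_category and other_domain and entry["text"] not in seen:
            seen.add(entry["text"])
            picks.append(entry["text"])
    return picks[:limit]
```

Utterances already used in the developer's own domain are excluded, so that only utterances borrowed from sibling domains are offered as candidates.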

Referring to FIG. 10B, according to an embodiment, when the domain item 1001, the category item 1002, the user utterance example item 1003, and the intent item 1004 have been input, the utterance input device 1000 may display a recommended modified utterance 1020a. When these four items are input through the utterance input device 1000, the developer terminal may transmit the input domain information, category information, user utterance example information, and intent information to the intelligent server and receive, from the intelligent server, a modified utterance text set corresponding to that information. The utterance input device 1000 may display the recommended modified utterance 1020a based on the received modified utterance text set. For example, the recommended modified utterance 1020a (e.g., "Recommend a new menu", "Show the popular menu", "What is the most popular pizza these days?") is generated based on user utterances that have an intent similar to the input intent (e.g., menu recommendation), that is, an intent the intelligent server determines to be similar to the input intent, and that are used in a similar domain, that is, a domain the intelligent server determines to be similar to the input domain. The developer may refer to the recommended modified utterance 1020a to add further entries to the user utterance example item 1003.

Referring to FIG. 10C, according to an embodiment, the utterance input device 1000 may display a recommended modified utterance 1020b based on the received modified utterance text set. For example, the recommended modified utterance 1020b may be generated for each keyword (e.g., "Everland", "trip", "send") included in the user utterance example (e.g., "Send Julie the photos I took on my trip to Everland"). The developer may refer to the recommended modified utterance 1020b to add further entries to the user utterance example item 1003.

As described above, according to various embodiments, the developer terminal may provide the recommended user utterance 1010 or the recommended modified utterance 1020 through the utterance input device 1000. Accordingly, the developer may input additional user utterance examples based on the recommended user utterance 1010 or the recommended modified utterance 1020, and the utterance input device 1000 may generate a more diverse training utterance text set.

According to various embodiments, the developer terminal 500 may transmit a domain and a category to the intelligent server 200 and receive, from the intelligent server 200, modified utterance text (or a modified utterance text set) corresponding to the domain and the category. The modified utterance text (or modified utterance text set) may be generated through a generative model or a transfer learning model based on user utterance data previously stored in the intelligent server 200. The intelligent server 200 may convert voice data, delivered to the intelligent server 200 by a user terminal that receives user utterances, into text and store the text as the user utterance data. For example, the generative model may include a generative adversarial network (GAN), a variational autoencoder (VAE), and a deep neural network (DNN), and the transfer learning model may include style transfer.

According to various embodiments, the developer terminal 500 may transmit a domain, a category, and a user utterance example (e.g., training utterance text or a training utterance text set) to the intelligent server 200 and receive modified utterance text (or a modified utterance text set) corresponding to the domain, the category, and the user utterance example.

According to various embodiments, the developer terminal 500 may display, based on the received modified utterance text (or modified utterance text set), a plurality of second parameters corresponding to one parameter (a first parameter) included in the training utterance text (or training utterance text set). When one of the plurality of second parameters is selected, the developer terminal 500 may display the modified utterance text (or modified utterance text set) that includes the selected parameter.
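The relationship between a first parameter and its second parameters can be illustrated with a short sketch. It assumes, purely for illustration, that the second parameters are already available as a list of strings and that a modified utterance is formed by literal substitution; the function name is hypothetical, and the specification leaves the actual derivation to the intelligent server.

```python
def variants_for_parameter(text, first_param, second_params):
    """Map each alternative parameter to the utterance that contains it."""
    if first_param not in text:
        raise ValueError("first parameter does not occur in the utterance")
    return {
        param: text.replace(first_param, param)
        for param in second_params
        if param != first_param  # the original parameter is not a variant
    }
```

A terminal could list the dictionary keys as selectable second parameters and display the corresponding value once a key is chosen.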

According to various embodiments, the intelligent server 200 may set the domain received from the developer terminal 500 as a first domain, determine, within the category received from the developer terminal 500, a second domain having an utterance pattern similar to that of the first domain, and generate modified utterance text based on the utterance pattern of the second domain. For example, the intelligent server 200 may determine, as the second domain, a domain in which an intent similar to an intent used in the first domain is used. Alternatively, the intelligent server 200 may identify the intent of the training utterance text (or of a training utterance text included in the training utterance text set) and determine, as the second domain, a domain in which an intent similar to the intent of the training utterance text is used. According to an embodiment, the intelligent server 200 may identify the parameters included in the training utterance text (or training utterance text set) and generate a modified utterance text set using parameters of the second domain that are similar to the identified parameters.
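One possible way to pick the second domain is to compare intent sets. The sketch below uses Jaccard similarity as a stand-in measure; the specification does not name a similarity measure, and the catalog layout and function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_second_domain(first_domain, category, catalog):
    """Return the same-category domain whose intents best match the first domain."""
    first_intents = catalog[first_domain]["intents"]
    candidates = [
        (jaccard(first_intents, info["intents"]), name)
        for name, info in catalog.items()
        if name != first_domain and info["category"] == category
    ]
    # max() compares the score first; ties fall back to the domain name.
    return max(candidates)[1] if candidates else None
```

Restricting the candidates to the received category reflects the statement that the second domain is determined within that category.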

According to various embodiments, the intelligent server 200 may generate modified utterance text (or a modified utterance text set) when the number of training utterance texts included in the training utterance text set is smaller than a reference utterance count. For example, the reference utterance count may be set differently for each domain. For a domain in which a large number of training utterance texts are collected, the reference utterance count may be set relatively high. For a domain in which a small number of training utterance texts are collected, the reference utterance count may be set relatively low.
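The per-domain threshold check can be expressed in a few lines. The reference counts, the default value, and the function name below are hypothetical values chosen for illustration; the specification does not give concrete numbers.

```python
def needs_modified_utterances(domain, training_set, reference_counts, default=100):
    """True when the training set is smaller than the domain's reference count."""
    reference = reference_counts.get(domain, default)
    return len(training_set) < reference
```

A domain with abundant training data would be given a larger reference count, so generation is still triggered until that larger target is reached.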

According to various embodiments, the intelligent server 200 may generate modified utterance text based on user characteristics extracted from the user utterance data. Alternatively, the intelligent server 200 may extract a user utterance pattern based on the user characteristics and, when the number of occurrences of the user utterance pattern is greater than a reference pattern count, generate modified utterance text based on the user utterance pattern. The reference pattern count may be determined based on the utterance volume of the user utterance pattern or the number of parameters included in the user utterance pattern. For example, the user characteristics may include age, region, and gender.
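The pattern-count condition can be sketched with a frequency counter. The record layout (dictionaries with "group" and "pattern" keys, where "group" stands for a user characteristic such as an age band) and the function name are assumptions for this sketch only.

```python
from collections import Counter


def frequent_patterns(utterances, group, reference_count):
    """Return patterns of one user group that occur more than reference_count times."""
    counts = Counter(u["pattern"] for u in utterances if u["group"] == group)
    return [pattern for pattern, n in counts.items() if n > reference_count]
```

Only the patterns returned here would be passed on as a basis for generating modified utterance text for that user group.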

According to various embodiments, the intelligent server 200 may generate user utterance classification information based on the user utterance data and generate modified utterance text based on the user utterance classification information. For example, the user utterance classification information may include domain information, intent information, and parameter information of the user utterances included in the user utterance data.

According to various embodiments, the intelligent server 200 may remove noise from the user utterance data, extract patterned sample patterns from the user utterance data, and remove from the user utterance data any user utterance that is not semantically related to the training utterance text (or training utterance text set).
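Two of the preprocessing steps above, noise removal and removal of unrelated utterances, can be sketched as follows. Token overlap is used here as a crude stand-in for semantic relatedness, which the specification does not define; the function names and the overlap threshold are hypothetical, and sample pattern extraction is not shown.

```python
import re


def normalize(text):
    """Strip punctuation noise, lowercase, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s']", " ", text).lower()
    return " ".join(cleaned.split())


def filter_related(user_utterances, training_texts, min_overlap=2):
    """Keep normalized utterances sharing at least min_overlap tokens with training text."""
    vocabulary = set()
    for text in training_texts:
        vocabulary.update(normalize(text).split())
    kept = []
    for utterance in user_utterances:
        cleaned = normalize(utterance)
        # Empty strings are treated as noise; low-overlap ones as unrelated.
        if cleaned and len(set(cleaned.split()) & vocabulary) >= min_overlap:
            kept.append(cleaned)
    return kept
```

A production system would more plausibly compare sentence embeddings or intents, but the control flow, clean first and then filter against the training text, stays the same.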

FIG. 11 is a diagram illustrating a method of recommending modified utterance text to a user when the user speaks, according to an embodiment of the present invention.

Referring to FIG. 11, the user terminal (e.g., the user terminal 100 of FIG. 4) may receive a user utterance 1101 and provide modified utterance text similar to the user utterance 1101.

According to an embodiment, the user terminal may convert the user utterance 1101 into utterance text 1111 and display it on a first screen 1110. The user terminal may display a view-result item 1112 on the first screen 1110. When the user selects the view-result item 1112, the user terminal may display, on the display, the result found based on the utterance text 1111 (e.g., the execution of a path rule corresponding to the utterance text 1111).

According to an embodiment, when the user selects a modified utterance recommendation item 1113, the user terminal may display a second screen 1120. On the second screen 1120, the user terminal may display utterance text 1121 corresponding to the user utterance 1101 and display modified utterance texts 1122, 1123, and 1124 based on the utterance text 1121. The user terminal may transmit a user input (e.g., voice data) corresponding to the user utterance 1101 to the intelligent server (e.g., the intelligent server 200 of FIG. 4). The intelligent server may transmit a modified utterance text set corresponding to the received user input to the user terminal. The modified utterance text set may be generated and stored in advance or newly generated based on the received user input. The modified utterance text set may be generated by the methods described with reference to FIGS. 4 to 8.

As described above, according to various embodiments, the user terminal may provide the modified utterance recommendation item 1113 when the user utterance 1101 is input, and may provide the modified utterance texts 1122, 1123, and 1124 when the user selects the modified utterance recommendation item 1113. Accordingly, the user terminal may provide utterance text similar to the user's own utterance pattern. For example, for an informal user utterance (e.g., "Close the call, pls"), the user terminal may recommend utterance text that is similar to the user's utterance pattern and therefore familiar to the user (e.g., "Hang up the call, pls") rather than the representative utterance (e.g., "End call").

FIG. 12 is a block diagram of an electronic device 1201 in a network environment 1200 according to various embodiments. Referring to FIG. 12, in the network environment 1200, the electronic device 1201 (e.g., the user terminal 100) may communicate with an electronic device 1202 through a first network 1298 (e.g., short-range wireless communication), or may communicate with an electronic device 1204 or a server 1208 (e.g., the intelligent server 200) through a second network 1299 (e.g., long-range wireless communication). According to an embodiment, the electronic device 1201 may communicate with the electronic device 1204 through the server 1208. According to an embodiment, the electronic device 1201 may include a processor 1220 (e.g., the processor 160), a memory 1230 (e.g., the memory 150), an input device 1250 (e.g., the microphone 120), a sound output device 1255 (e.g., the speaker 130), a display device 1260 (e.g., the display 140), an audio module 1270, a sensor module 1276, an interface 1277, a haptic module 1279, a camera module 1280, a power management module 1288, a battery 1289, a communication module 1290, a subscriber identification module 1296, and an antenna module 1297. In some embodiments, at least one of these components (e.g., the display device 1260 or the camera module 1280) may be omitted from the electronic device 1201, or other components may be added. In some embodiments, some components may be integrated, as in the case of the sensor module 1276 (e.g., a fingerprint sensor, an iris sensor, or an illuminance sensor) embedded in the display device 1260 (e.g., a display).

The processor 1220 may, for example, run software (e.g., the program 1240) to control at least one other component (e.g., a hardware or software component) of the electronic device 1201 connected to the processor 1220, and may perform various data processing and computation. The processor 1220 may load a command or data received from another component (e.g., the sensor module 1276 or the communication module 1290) into the volatile memory 1232, process it, and store the resulting data in the nonvolatile memory 1234. According to an embodiment, the processor 1220 may include a main processor 1221 (e.g., a central processing unit or an application processor) and an auxiliary processor 1223 (e.g., a graphics processing unit, an image signal processor, a sensor hub processor, or a communication processor) that operates independently of the main processor 1221 and that, additionally or alternatively, uses less power than the main processor 1221 or is specialized for a designated function. The auxiliary processor 1223 may be operated separately from the main processor 1221 or embedded in it.

In this case, the auxiliary processor 1223 may control at least some of the functions or states related to at least one of the components of the electronic device 1201 (e.g., the display device 1260, the sensor module 1276, or the communication module 1290), either in place of the main processor 1221 while the main processor 1221 is in an inactive (e.g., sleep) state, or together with the main processor 1221 while the main processor 1221 is in an active (e.g., application-executing) state. According to an embodiment, the auxiliary processor 1223 (e.g., an image signal processor or a communication processor) may be implemented as part of another functionally related component (e.g., the camera module 1280 or the communication module 1290).

The memory 1230 may store various data used by at least one component of the electronic device 1201 (e.g., the processor 1220 or the sensor module 1276), for example, software (e.g., the program 1240) and input data or output data for commands related to the software. The memory 1230 may include the volatile memory 1232 or the nonvolatile memory 1234.

The program 1240 is software stored in the memory 1230 and may include, for example, an operating system 1242, middleware 1244, or an application 1246.

The input device 1250 is a device for receiving, from outside the electronic device 1201 (e.g., from a user), a command or data to be used by a component of the electronic device 1201 (e.g., the processor 1220), and may include, for example, a microphone, a mouse, or a keyboard.

The sound output device 1255 is a device for outputting a sound signal to the outside of the electronic device 1201, and may include, for example, a speaker used for general purposes such as multimedia playback or recording playback, and a receiver used only for receiving calls. According to an embodiment, the receiver may be formed integrally with or separately from the speaker.

The display device 1260 is a device for visually providing information to a user of the electronic device 1201, and may include, for example, a display, a hologram device, or a projector, together with a control circuit for controlling the corresponding device. According to an embodiment, the display device 1260 may include touch circuitry or a pressure sensor capable of measuring the intensity of the pressure of a touch.

The audio module 1270 may convert sound into an electrical signal and an electrical signal into sound. According to an embodiment, the audio module 1270 may acquire sound through the input device 1250, or may output sound through the sound output device 1255 or through an external electronic device (e.g., the electronic device 1202, such as a speaker or headphones) connected to the electronic device 1201 by wire or wirelessly.

The sensor module 1276 may generate an electrical signal or data value corresponding to an internal operating state (e.g., power or temperature) of the electronic device 1201 or to an external environmental state. The sensor module 1276 may include, for example, a gesture sensor, a gyro sensor, a barometric pressure sensor, a magnetic sensor, an acceleration sensor, a grip sensor, a proximity sensor, a color sensor, an infrared (IR) sensor, a biometric sensor, a temperature sensor, a humidity sensor, or an illuminance sensor.

The interface 1277 may support a designated protocol for connecting to an external electronic device (e.g., the electronic device 1202) by wire or wirelessly. According to an embodiment, the interface 1277 may include a high definition multimedia interface (HDMI), a universal serial bus (USB) interface, an SD card interface, or an audio interface.

The connection terminal 1278 may include a connector capable of physically connecting the electronic device 1201 to an external electronic device (e.g., the electronic device 1202), for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (e.g., a headphone connector).

The haptic module 1279 may convert an electrical signal into a mechanical stimulus (e.g., vibration or movement) or an electrical stimulus that the user can perceive through the sense of touch or the kinesthetic sense. The haptic module 1279 may include, for example, a motor, a piezoelectric element, or an electrical stimulation device.

The camera module 1280 may capture still images and videos. According to an embodiment, the camera module 1280 may include one or more lenses, an image sensor, an image signal processor, or a flash.

The power management module 1288 is a module for managing the power supplied to the electronic device 1201, and may be configured, for example, as at least a part of a power management integrated circuit (PMIC).

The battery 1289 is a device for supplying power to at least one component of the electronic device 1201, and may include, for example, a non-rechargeable primary cell, a rechargeable secondary cell, or a fuel cell.

The communication module 1290 may support establishing a wired or wireless communication channel between the electronic device 1201 and an external electronic device (e.g., the electronic device 1202, the electronic device 1204, or the server 1208), and performing communication through the established channel. The communication module 1290 may include one or more communication processors that operate independently of the processor 1220 (e.g., an application processor) and support wired or wireless communication. According to an embodiment, the communication module 1290 may include a wireless communication module 1292 (e.g., a cellular communication module, a short-range wireless communication module, or a global navigation satellite system (GNSS) communication module) or a wired communication module 1294 (e.g., a local area network (LAN) communication module or a power line communication module), and may use the corresponding communication module to communicate with an external electronic device through a first network 1298 (e.g., a short-range communication network such as Bluetooth, WiFi direct, or infrared data association (IrDA)) or a second network 1299 (e.g., a long-range communication network such as a cellular network, the Internet, or a computer network (e.g., a LAN or WAN)). The various types of communication modules 1290 described above may be implemented as a single chip or as separate chips.

According to an embodiment, the wireless communication module 1292 may identify and authenticate the electronic device 1201 within a communication network using user information stored in the subscriber identification module 1296.

The antenna module 1297 may include one or more antennas for transmitting a signal or power to the outside or receiving a signal or power from the outside. According to an embodiment, the communication module 1290 (e.g., the wireless communication module 1292) may transmit a signal to, or receive a signal from, an external electronic device through an antenna suited to the communication method.

Some of the above components may be connected to one another through a communication method used between peripheral devices (e.g., a bus, general purpose input/output (GPIO), a serial peripheral interface (SPI), or a mobile industry processor interface (MIPI)) and may exchange signals (e.g., commands or data) with one another.

According to an embodiment, commands or data may be transmitted or received between the electronic device 1201 and the external electronic device 1204 through the server 1208 connected to the second network 1299. Each of the electronic devices 1202 and 1204 may be a device of the same type as, or a different type from, the electronic device 1201. According to an embodiment, all or some of the operations executed in the electronic device 1201 may be executed in one or more other external electronic devices. According to an embodiment, when the electronic device 1201 needs to perform a function or service automatically or upon request, the electronic device 1201 may, instead of or in addition to executing the function or service itself, request an external electronic device to perform at least some of the functions associated with it. The external electronic device that receives the request may execute the requested function or an additional function and deliver the result to the electronic device 1201. The electronic device 1201 may provide the requested function or service by using the received result as it is or by processing it further. For this purpose, for example, cloud computing, distributed computing, or client-server computing technology may be used.
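The delegation described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the implementation of the embodiments: the names `provide_service`, `local_handlers`, `remote_request`, `post_process`, and `fake_server` are hypothetical, the external electronic device is replaced by a plain function, and the sketch covers only the case where the device either runs a function itself or delegates it entirely.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[str], str]


def provide_service(
    function_name: str,
    argument: str,
    local_handlers: Dict[str, Handler],
    remote_request: Callable[[str, str], str],
    post_process: Optional[Handler] = None,
) -> str:
    """Run a function on the device if it can, otherwise ask an external device."""
    handler = local_handlers.get(function_name)
    if handler is not None:
        # The device executes the function itself.
        result = handler(argument)
    else:
        # The device requests the function from an external device or server.
        result = remote_request(function_name, argument)
    # The received result is used as it is, or processed further if requested.
    return post_process(result) if post_process is not None else result


def fake_server(function_name: str, argument: str) -> str:
    # Hypothetical stand-in for an external electronic device or server.
    return f"{function_name}:{argument.upper()}"


# Hypothetical table of functions the device can run locally.
local = {"echo": lambda text: text}
```

In this sketch, a call such as `provide_service("asr", "hi", local, fake_server)` is delegated because no local handler named `asr` exists, while `provide_service("echo", "hi", local, fake_server)` runs on the device.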

An electronic device according to the various embodiments disclosed in this document may be any of various types of devices. The electronic device may include, for example, at least one of a portable communication device (e.g., a smartphone), a computer device, a portable multimedia device, a portable medical device, a camera, a wearable device, or a home appliance. The electronic device according to the embodiments of this document is not limited to the devices listed above.

The various embodiments of this document and the terms used in them are not intended to limit the technology described in this document to a specific embodiment, and should be understood to include various modifications, equivalents, and/or alternatives of the embodiments. In the description of the drawings, similar reference numerals may be used for similar components. A singular expression may include the plural unless the context clearly indicates otherwise. In this document, expressions such as "A or B", "at least one of A and/or B", "A, B, or C", or "at least one of A, B, and/or C" may include all possible combinations of the items listed together. Expressions such as "first" and "second" may modify the corresponding components regardless of order or importance, and are used only to distinguish one component from another without limiting the components. When a (e.g., first) component is said to be "(functionally or communicatively) connected" or "coupled" to another (e.g., second) component, the component may be connected to the other component directly or through yet another component (e.g., a third component).

The term "module" as used in this document includes a unit composed of hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit. A module may be an integrally formed component, or a minimum unit that performs one or more functions, or a part of such a unit. For example, a module may be configured as an application-specific integrated circuit (ASIC).

Various embodiments of this document may be implemented as software (e.g., the program 1240) including instructions stored in a machine-readable (e.g., computer-readable) storage medium (e.g., the internal memory 1236 or the external memory 1238). The machine is a device that can call the stored instructions from the storage medium and operate according to the called instructions, and may include an electronic device according to the disclosed embodiments (e.g., the electronic device 1201). When the instructions are executed by a processor (e.g., the processor 1220), the processor may perform the functions corresponding to the instructions directly, or by using other components under its control. The instructions may include code generated or executed by a compiler or an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, "non-transitory" means only that the storage medium does not include a signal and is tangible; it does not distinguish between data being stored semi-permanently and data being stored temporarily.

According to an embodiment, a method according to the various embodiments disclosed in this document may be provided as part of a computer program product. A computer program product may be traded between a seller and a buyer as a commodity. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., a compact disc read only memory (CD-ROM)), or online through an application store (e.g., Play Store™). In the case of online distribution, at least a portion of the computer program product may be at least temporarily stored, or temporarily generated, in a storage medium such as the memory of a manufacturer's server, an application store's server, or a relay server.

Each component (e.g., a module or a program) according to the various embodiments may consist of a single entity or a plurality of entities, and some of the sub-components described above may be omitted, or other sub-components may be further included in the various embodiments. Alternatively or additionally, some components (e.g., modules or programs) may be integrated into a single entity that performs, identically or similarly, the functions performed by each corresponding component before the integration. Operations performed by a module, a program, or another component according to the various embodiments may be executed sequentially, in parallel, repeatedly, or heuristically; at least some operations may be executed in a different order or omitted; or other operations may be added.

Claims

A method of operating an electronic device that communicates with a server, the method comprising:
receiving a domain and a category;
transmitting the domain and the category to the server;
receiving modified speech text corresponding to the domain and the category from the server; and
displaying the modified speech text,
wherein the modified speech text is generated through a generation model or a transfer learning model based on user utterance data previously stored in the server, and
wherein the user utterance data is text that the server obtains by converting voice data, transmitted to the server by an external electronic device that receives a user utterance, and then stores.

The method according to claim 1,
wherein the generation model includes a generative adversarial network (GAN), a variational autoencoder (VAE), and a deep neural network (DNN), and
wherein the transfer learning model includes style transfer.

The method according to claim 1,
wherein the server is configured to
set the domain as a first domain, determine a second domain having an utterance pattern similar to that of the first domain in the category, and generate the modified speech text based on the utterance pattern of the second domain.

The method according to claim 3,
wherein the server is configured to
determine, as the second domain, a domain in which an intent similar to an intent used in the first domain is used.

The method according to claim 1,
wherein the server is configured to
generate the modified speech text based on a user characteristic extracted from the user utterance data.

The method according to claim 5,
wherein the server is configured to
extract a user utterance pattern based on the user characteristic, and generate the modified speech text based on the user utterance pattern when the number of occurrences of the user utterance pattern is greater than a reference pattern count.

The method according to claim 6,
wherein the reference pattern count is determined based on the utterance amount of the user utterance pattern or the number of parameters included in the user utterance pattern.

The method according to claim 5,
wherein the user characteristic includes a user's age, region, and gender.

The method according to claim 1,
wherein the server is configured to
generate user utterance classification information based on the user utterance data, and generate the modified speech text based on the user utterance classification information, and
wherein the user utterance classification information includes domain information, degree of intention, and parameter information of the user utterances included in the user utterance data.

A method of operating an electronic device that communicates with a server, the method comprising:
receiving a domain and a category;
receiving a training speech text set corresponding to the domain and the category;
transmitting the domain, the category, and the training speech text set to the server;
receiving a modified speech text set corresponding to the training speech text set from the server; and
displaying the modified speech text set,
wherein the modified speech text set is generated through a generation model or a transfer learning model based on user utterance data previously stored in the server, and
wherein the user utterance data is text that the server obtains by converting voice data, transmitted to the server by an external electronic device that receives a user utterance, and then stores.

The method according to claim 10,
wherein the server is configured to
remove noise from the user utterance data, extract a sample pattern from the user utterance data, and remove from the user utterance data any user utterance that is not semantically related to the training speech text set.

The method according to claim 10,
wherein the server is configured to
set the domain as a first domain, determine a second domain having an utterance pattern similar to that of the first domain in the category, and generate the modified speech text set based on the utterance pattern of the second domain.

The method according to claim 12,
wherein the server is configured to
determine an intent of a training speech text included in the training speech text set, and determine, as the second domain, a domain in which an intent similar to the intent of the training speech text is used.

The method according to claim 12,
wherein the server is configured to
determine parameters included in the training speech text set, and generate the modified speech text set using parameters of the second domain that are similar to the determined parameters.

The method according to claim 10,
wherein the server is configured to
generate the modified speech text set when the number of training speech texts included in the training speech text set is smaller than a reference number of utterances.

The method according to claim 15,
wherein the reference number of utterances is set differently for each domain.

A method of operating an electronic device that communicates with a server, the method comprising:
receiving a domain and a category;
receiving a training speech text set corresponding to the domain and the category;
transmitting the domain, the category, and the training speech text set to the server;
receiving a modified speech text set corresponding to the training speech text set from the server; and
displaying, based on the modified speech text set, a plurality of second parameters corresponding to a first parameter included in the training speech text set.

The method according to claim 17,
wherein the modified speech text set is generated through a generation model or a transfer learning model based on user utterance data previously stored in the server,
wherein the generation model includes a generative adversarial network (GAN), a variational autoencoder (VAE), and a deep neural network (DNN), and
wherein the transfer learning model includes style transfer.

The method according to claim 18,
wherein the user utterance data is text that the server obtains by converting voice data, transmitted to the server by an external electronic device that receives a user utterance, and then stores.

The method according to claim 17, further comprising:
when one of the plurality of second parameters is selected, displaying the modified speech text that includes the selected second parameter.