KR20210087384A

KR20210087384A - The server, the client device, and the method for training the natural language model

Info

Publication number: KR20210087384A
Application number: KR1020200019989A
Authority: KR
Inventors: 양희정; 김광윤; 김성수
Original assignee: 삼성전자주식회사
Priority date: 2020-01-02
Filing date: 2020-02-18
Publication date: 2021-07-12

Abstract

Provided are a server and a client device for training a natural language understanding model, and an operating method thereof. In one embodiment of the present disclosure, the operating method: identifies, in input text used for training the natural language understanding model, a word or phrase composed of a named entity or the like that a user pronounces inaccurately or finds difficult to pronounce accurately; generates training candidate text by replacing the identified word or phrase with a word or phrase that the user is expected to utter and that has high phonetic similarity; and trains the natural language understanding model using the generated training candidate text.

Description

Server, client device, and operating method for training a natural language understanding model {THE SERVER, THE CLIENT DEVICE, AND THE METHOD FOR TRAINING THE NATURAL LANGUAGE MODEL}

The present disclosure relates to a server and a client device for training a natural language understanding model, and to an operating method thereof.

With advances in multimedia and network technology, users can receive a variety of services through their devices. In particular, with advances in speech recognition technology, a user can input speech (for example, an utterance) to a device and receive a response message to the speech input through a voice service providing agent.

Artificial intelligence (AI) technology may be used to determine the intention contained in a user's speech input, and rule-based natural language understanding (NLU) technology may also be used. When a device or server receives a user's speech input and performs automatic speech recognition (ASR) on it, a mismatch can arise between the ASR result and the text input to the natural language understanding model, so that the intention of the utterance cannot be determined accurately. In conventional natural language understanding technology, the ASR output text fails to match the text previously learned by the natural language understanding model in cases such as the following: the user utters with inaccurate pronunciation; the user utters a named entity, place, place name, movie title, or game name inaccurately because the user does not know it precisely; or the user pronounces it clearly, but the ASR output text is inaccurate because the ASR model has not been trained on the named entity, place, place name, movie title, or game name uttered by the user. Because the ASR output text and the text of the natural language understanding model do not match, conventional natural language understanding technology may fail to provide an appropriate response to the user's intention.

An object of the present disclosure is to provide a server, a client device, and an operating method thereof that: identify, in input text used for training a natural language understanding model, a word or phrase composed of a named entity that a user pronounces inaccurately or finds difficult to pronounce accurately, a named entity that is likely to produce an error when converted to text through ASR, or the like; generate training candidate text (text candidates) by replacing the identified word or phrase with a word or phrase that the user is expected to utter and that has high phonetic similarity; and train the natural language understanding model using the generated training candidate text.

To solve the above technical problem, one embodiment of the present disclosure provides a method by which a server trains a natural language understanding model using text. The method includes: receiving, from a client device, input text entered by a user; identifying, among one or more words included in the input text, replacement target text that needs to be replaced; generating, for the identified replacement target text, replacement text that the user is expected to utter and that has high phonetic similarity; generating at least one training candidate text by replacing the replacement target text in the input text with the generated replacement text; and training a natural language understanding (NLU) model using the input text and the at least one training candidate text as training data.
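The replacement step of this method can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `make_training_candidates` and the replacement strings are hypothetical, and the replacement text is assumed to have been generated already by one of the approaches described later.

```python
from typing import List


def make_training_candidates(input_text: str, target: str,
                             replacements: List[str]) -> List[str]:
    """Replace the replacement target text with each replacement text."""
    candidates: List[str] = []
    if target not in input_text:
        return candidates  # nothing to replace
    for replacement in replacements:
        candidate = input_text.replace(target, replacement)
        # Skip no-op replacements and duplicates.
        if candidate != input_text and candidate not in candidates:
            candidates.append(candidate)
    return candidates


input_text = "Search for the release date of the movie Avengers Endgame"
# Hypothetical replacement texts a user might plausibly utter.
candidates = make_training_candidates(
    input_text, "Avengers Endgame", ["Avengers End Game", "Avenger Endgame"])
# The input text and the candidates together form the training data.
training_data = [input_text] + candidates
```

The original input text stays in the training data alongside the candidates, as the method describes.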

For example, identifying the replacement target text may include: receiving, from the client device, a user input selecting at least one word or phrase of the input text; identifying the at least one word or phrase of the input text selected based on the user input; and determining the identified at least one word or phrase to be the replacement target text.

For example, identifying the replacement target text may include: parsing the received input text into words, morphemes, and phrases; searching for at least one parsed word in a dictionary DB that includes phoneme sequence information or embedding vector information for a plurality of words; and, based on the result of searching the dictionary DB, determining a word that is not found or whose frequency of use is lower than a preset first threshold to be the replacement target text.
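A minimal sketch of this dictionary-based identification, under simplifying assumptions: parsing is reduced to lowercasing and whitespace splitting, the dictionary DB is a plain mapping from word to usage frequency, and the names `DICTIONARY_DB` and `FIRST_THRESHOLD` as well as all frequency values are hypothetical.

```python
from typing import Dict, List

# Hypothetical dictionary DB: word -> frequency of use.
DICTIONARY_DB: Dict[str, int] = {
    "search": 9500, "movie": 8700, "release": 4100, "date": 7800, "endgame": 12,
}
FIRST_THRESHOLD = 100  # hypothetical preset first threshold


def find_replacement_targets(input_text: str,
                             dictionary: Dict[str, int] = DICTIONARY_DB,
                             threshold: int = FIRST_THRESHOLD) -> List[str]:
    """Return words that are missing from the dictionary or rarely used."""
    targets: List[str] = []
    for word in input_text.lower().split():  # simplified parsing step
        frequency = dictionary.get(word)
        if frequency is None or frequency < threshold:
            targets.append(word)
    return targets


targets = find_replacement_targets("Search movie Avengers Endgame release date")
```

Here "avengers" is selected because it is not found, and "endgame" because its frequency falls below the threshold.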

For example, identifying the replacement target text may include: detecting the domain into which the input text is classified by interpreting the input text using a pre-trained natural language understanding model; detecting an intent from the input text by interpreting the input text using the pre-trained natural language understanding model; identifying a slot in the input text and performing slot tagging by interpreting the input text using the pre-trained natural language understanding model; and determining the text corresponding to the slot to be the replacement target text.

For example, generating the at least one training candidate text may include assigning the generated replacement text to the same slot as the slot identified from the input text.
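The slot-based variant can be pictured with a small data structure. In this sketch the NLU interpretation result is assumed to be given; the `TaggedText` class, the slot name `movie_title`, and the replacement string are hypothetical. The point illustrated is that the replacement text inherits the slot of the text it replaces, while domain and intent are unchanged.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TaggedText:
    text: str
    domain: str
    intent: str
    slots: Dict[str, str]  # slot name -> text tagged with that slot


def replace_slot_value(tagged: TaggedText, slot_name: str,
                       replacement: str) -> TaggedText:
    """Build a training candidate whose replacement text keeps the same slot."""
    old_value = tagged.slots[slot_name]
    new_slots = dict(tagged.slots)
    new_slots[slot_name] = replacement
    return TaggedText(
        text=tagged.text.replace(old_value, replacement),
        domain=tagged.domain,
        intent=tagged.intent,
        slots=new_slots,
    )


original = TaggedText(
    text="Search for the release date of the movie Avengers Endgame",
    domain="movie",
    intent="movie search",
    slots={"movie_title": "Avengers Endgame"},
)
candidate = replace_slot_value(original, "movie_title", "Avengers End Game")
```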

For example, generating the replacement text may include: extracting a phoneme sequence for the replacement target text; searching, based on the phonetic relevance of phoneme sequences, the words pre-stored in the dictionary DB for text similar to the extracted phoneme sequence; and, based on the search result, generating the replacement text using at least one text having high similarity to the extracted phoneme sequence.
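The disclosure does not fix a particular similarity measure for phoneme sequences. As one possible illustration, the sketch below scores similarity with a normalized edit distance over phoneme symbols. The edit-distance measure, the toy phoneme symbols, the `PHONEME_DB` contents, and the 0.8 cutoff are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical dictionary DB entries: text -> phoneme sequence (toy symbols).
PHONEME_DB: Dict[str, List[str]] = {
    "endgame": ["EH", "N", "D", "G", "EY", "M"],
    "and game": ["AE", "N", "D", "G", "EY", "M"],
    "end gate": ["EH", "N", "D", "G", "EY", "T"],
    "movie": ["M", "UW", "V", "IY"],
}


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a: List[str], b: List[str]) -> float:
    """Similarity in [0, 1]; 1.0 means identical sequences."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - edit_distance(a, b) / longest


def find_similar(target: str, min_similarity: float = 0.8) -> List[str]:
    """Return dictionary texts whose phonemes are close to the target's."""
    target_phonemes = PHONEME_DB[target]
    scored = [(similarity(target_phonemes, phonemes), text)
              for text, phonemes in PHONEME_DB.items() if text != target]
    scored = [pair for pair in scored if pair[0] >= min_similarity]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort
    return [text for _, text in scored]


replacement_texts = find_similar("endgame")
```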

For example, generating the replacement text may include: converting the replacement target text into an embedding vector using a word embedding model; generating, using a neural network model, text having a vector value similar to the converted embedding vector; and generating the replacement text using the generated text.
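The notion of "text having a similar vector value" can be illustrated with a nearest-neighbor lookup. Note that this sketch substitutes a cosine-similarity search over a fixed table for the neural network model named in the text; the `EMBEDDINGS` table and its three-dimensional vectors are invented for illustration and do not come from any trained word embedding model.

```python
import math
from typing import Dict, List

# Hypothetical embedding table: text -> embedding vector.
EMBEDDINGS: Dict[str, List[float]] = {
    "endgame": [0.90, 0.10, 0.30],
    "end game": [0.88, 0.12, 0.31],
    "movie": [0.10, 0.90, 0.20],
}


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_texts(target: str, k: int = 1) -> List[str]:
    """Return the k texts whose vectors are most similar to the target's."""
    target_vec = EMBEDDINGS[target]
    scored = [(cosine(target_vec, vec), text)
              for text, vec in EMBEDDINGS.items() if text != target]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


replacement_texts = nearest_texts("endgame", k=1)
```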

For example, generating the replacement text may include: converting the replacement target text into an audio signal (wave signal) using a text-to-speech (TTS) model; outputting the converted audio signal; converting the output audio signal into output text using an ASR model; and generating the replacement text by replacing the replacement target text with the converted output text.
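The TTS-to-ASR round trip can be sketched with the two models passed in as plain callables. No real TTS or ASR library is used here: `fake_tts`, `fake_asr`, and the `CONFUSIONS` table are stand-ins invented for the example, so that the control flow can be run without any model.

```python
from typing import Callable, Dict, Optional

# Hypothetical table of recognition errors made by the stand-in ASR model.
CONFUSIONS: Dict[str, str] = {"Endgame": "End Game"}


def fake_tts(text: str) -> bytes:
    """Stand-in TTS model: 'synthesizes' audio as UTF-8 bytes."""
    return text.encode("utf-8")


def fake_asr(wave: bytes) -> str:
    """Stand-in ASR model: decodes the audio and applies a known confusion."""
    text = wave.decode("utf-8")
    return CONFUSIONS.get(text, text)


def round_trip_replacement(target: str,
                           tts: Callable[[str], bytes],
                           asr: Callable[[bytes], str]) -> Optional[str]:
    """Return the ASR output for the synthesized target, or None if unchanged."""
    wave = tts(target)      # replacement target text -> audio signal
    recognized = asr(wave)  # audio signal -> output text
    return recognized if recognized != target else None


replacement = round_trip_replacement("Endgame", fake_tts, fake_asr)
```

With real models, the output text would reflect the errors the ASR model actually makes on the synthesized audio, which is what makes it useful as replacement text.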

For example, training the natural language understanding model may include: transmitting the input text and the at least one training candidate text to the client device; receiving, from the client device, an identification value for at least one text selected by a user input from among the input text and the at least one training candidate text; selecting, based on the received identification value, at least one text from among the input text and the at least one training candidate text; and training the natural language understanding model using the selected at least one text as training data.

To solve the above technical problem, one embodiment of the present disclosure provides a method by which a client device provides an application for training a natural language understanding model. The method may include: displaying, on a display unit of the client device, a first graphical user interface (GUI) for receiving a user input that enters input text for training the natural language understanding model; transmitting the input text received through the first GUI to a server; displaying a second GUI for receiving a user input that selects at least one of the one or more replacement target texts identified from the input text; receiving, from the server, at least one training candidate text generated by replacing the at least one replacement target text selected through the second GUI in the input text with replacement text that the user is expected to utter; and displaying, on the display unit, a third GUI for receiving a user input that selects at least one of the at least one training candidate text.

For example, the method may further include displaying a fourth GUI for receiving a user input that selects at least one item of context information considered in generating the replacement text, the context information including at least one of the user's age, gender, region, language used, and dialect.

To solve the above technical problem, one embodiment of the present disclosure provides a server that trains a natural language understanding model using text. The server includes a communication interface that performs data communication with a client device, a memory that stores a program including one or more instructions, and a processor that executes the one or more instructions of the program stored in the memory. The processor may: receive, from the client device through the communication interface, input text entered by a user; identify, among one or more words included in the input text, replacement target text that needs to be replaced; generate, for the identified replacement target text, replacement text that the user is expected to utter and that has high phonetic similarity; generate at least one training candidate text by replacing the replacement target text in the input text with the generated replacement text; and train a natural language understanding (NLU) model using the input text and the at least one training candidate text as training data.

For example, the processor may receive, from the client device using the communication interface, a user input selecting at least one word or phrase of the input text, identify the at least one word or phrase of the input text selected based on the user input, and determine the identified at least one word or phrase to be the replacement target text.

For example, the memory may store a dictionary DB that includes phoneme sequence information or embedding vector information for a plurality of words, and the processor may parse the received input text into words, morphemes, and phrases, search the dictionary DB for at least one parsed word, and, based on the search result, determine a word that is not found or whose frequency of use is lower than a preset first threshold to be the replacement target text.

For example, the memory may store at least one pre-trained natural language understanding model, and the processor may: detect the domain into which the input text is classified, and an intent, by interpreting the input text using any one of the at least one natural language understanding model pre-stored in the memory; identify a slot in the input text and perform slot tagging by interpreting the input text using the pre-trained natural language understanding model; and determine the text corresponding to the slot to be the replacement target text.

For example, the processor may extract a phoneme sequence for the replacement target text, search, based on the phonetic relevance of phoneme sequences, the words included in the dictionary DB pre-stored in the memory for text similar to the extracted phoneme sequence, and, based on the search result, generate the replacement text using at least one text having high similarity to the extracted phoneme sequence.

For example, the memory may store a word embedding model, and the processor may convert the replacement target text into an embedding vector using the word embedding model, generate, using a neural network model, text having a vector value similar to the converted embedding vector, and generate the replacement text using the generated text.

For example, the memory may store a text-to-speech (TTS) model that converts text into an audio signal (wave signal), and the processor may convert the replacement target text into an audio signal using the TTS model, output the converted audio signal, convert the output audio signal into output text using an ASR model pre-stored in the memory, and generate the replacement text by replacing the replacement target text with the converted output text.

For example, the processor may: transmit the input text and the at least one training candidate text to the client device using the communication interface; receive, from the client device, an identification value for at least one text selected by a user input from among the input text and the at least one training candidate text; select, based on the received identification value, at least one text from among the input text and the at least one training candidate text; and train the natural language understanding model using the selected at least one text as training data.

To solve the above technical problem, one embodiment of the present disclosure provides a client device that provides an application for training a natural language understanding model. The client device includes a display unit, a user input unit configured to receive a user input that enters input text, a communication interface that performs data communication with a server, a memory that stores a program including one or more instructions, and a processor that executes the one or more instructions of the program stored in the memory. The processor may: control the display unit to display a first graphical user interface (GUI) for receiving the user input that enters the input text; control the display unit to display a second GUI for receiving a user input that selects at least one of the one or more replacement target texts identified from the input text; control the communication interface to receive, from the server, at least one training candidate text generated by replacing the at least one replacement target text selected through the second GUI in the input text with replacement text that the user is expected to utter; and control the display unit to display a third GUI for receiving a user input that selects at least one of the at least one training candidate text.

For example, the processor may control the display unit to display a fourth GUI for receiving a user input that selects at least one item of context information considered in generating the replacement text, the context information including at least one of the user's age, gender, region, language used, and dialect.

To solve the above technical problem, another embodiment of the present disclosure provides a computer-readable recording medium on which a program to be executed on a computer is recorded.

FIG. 1 is a conceptual diagram illustrating operations by which a server and a client device train a language model according to an embodiment of the present disclosure.
FIG. 2 is a block diagram illustrating the configuration of a server according to an embodiment of the present disclosure.
FIG. 3 is a block diagram illustrating the configuration of a client device according to an embodiment of the present disclosure.
FIG. 4 is a flowchart illustrating an embodiment in which the server of the present disclosure trains a natural language understanding model.
FIG. 5 is a flowchart of an embodiment in which the server of the present disclosure determines replacement target text based on a user input.
FIG. 6 is a flowchart of an embodiment in which the server of the present disclosure identifies replacement target text based on a search result of a dictionary DB.
FIG. 7 is a flowchart of an embodiment in which the server of the present disclosure automatically identifies replacement target text using a natural language understanding model.
FIG. 8 is a diagram illustrating an embodiment in which the server of the present disclosure generates replacement text from replacement target text.
FIG. 9 is a flowchart of an embodiment in which the server of the present disclosure generates replacement text from replacement target text.
FIG. 10 is a diagram illustrating an embodiment in which the server of the present disclosure generates replacement text from replacement target text using a neural network model.
FIG. 11 is a flowchart of an embodiment in which the server of the present disclosure generates replacement text using a neural network model.
FIG. 12 is a diagram illustrating an embodiment in which the server of the present disclosure generates replacement text from replacement target text.
FIG. 13 is a flowchart of an embodiment in which the server of the present disclosure generates replacement text from replacement target text.
FIG. 14 is a flowchart of a method by which the server of the present disclosure interprets the input text and at least one training candidate text using a natural language understanding model and generates information about the interpretation result.
FIG. 15 is a flowchart of an embodiment in which the server of the present disclosure determines the text to be input to the natural language understanding model as training data.
FIG. 16 is a flowchart of an embodiment in which the server of the present disclosure trains a natural language understanding model based on the number of input texts and at least one training candidate text.
FIG. 17 is a flowchart illustrating the operation of the client device of the present disclosure.
FIG. 18 is a diagram illustrating an example of a GUI displayed by the client device of the present disclosure.
FIG. 19A is a diagram illustrating an example of a GUI displayed by the client device of the present disclosure.
FIG. 19B is a diagram illustrating an example of a GUI displayed by the client device of the present disclosure.
FIG. 20 is a diagram illustrating an example of a GUI displayed by the client device of the present disclosure.
FIG. 21 is a diagram illustrating an example of a GUI displayed by the client device of the present disclosure.
FIG. 22 is a block diagram illustrating the configuration of a language understanding service according to an embodiment of the present disclosure.

The terms used in the embodiments of this specification were selected, as far as possible, from general terms in wide current use in view of their functions in the present disclosure; however, they may vary with the intention of those skilled in the art, precedent, the emergence of new technology, and the like. In certain cases a term has been chosen arbitrarily by the applicant, in which case its meaning is described in detail in the description of the corresponding embodiment. Accordingly, the terms used in this specification should be defined on the basis of their meaning and the overall content of the present disclosure, not simply by their names.

A singular expression may include the plural unless the context clearly indicates otherwise. Terms used herein, including technical and scientific terms, may have the same meanings as commonly understood by a person of ordinary skill in the technical field described in this specification.

Throughout the present disclosure, when a part is said to "include" a component, this means that it may further include other components, not that it excludes them, unless specifically stated otherwise. In addition, terms such as "... unit" and "... module" used in this specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

The expression "configured to (or set to)" used in this specification may, depending on the context, be used interchangeably with, for example, "suitable for", "having the capacity to", "designed to", "adapted to", "made to", or "capable of". The term "configured to (or set to)" does not necessarily mean only "specifically designed to" in hardware. Instead, in some contexts, the expression "a system configured to" may mean that the system is "capable of" an operation together with other devices or components. For example, the phrase "a processor configured (or set) to perform A, B, and C" may mean a dedicated processor (for example, an embedded processor) for performing those operations, or a generic-purpose processor (for example, a CPU or an application processor) capable of performing those operations by executing one or more software programs stored in a memory.

In the present disclosure, a 'natural language understanding (NLU) model' is a model trained to interpret text converted from a speech signal and to obtain a domain and an intent corresponding to the text. By interpreting the text, the natural language understanding model can identify slot information in addition to the domain and the intent.

In the present disclosure, the term 'natural language understanding model' may be used interchangeably with at least one of the expressions 'language model', 'language understanding model', 'natural language understanding model', 'natural language processing model', 'language processing model', and 'language understanding technology model'.

In the present disclosure, a 'domain' is a category or area related to the user's intention identified by interpreting text. The domain may be detected by interpreting the text using the natural language understanding model, and may be related to the intent that the natural language understanding model detects from the text. In an embodiment, domains may be classified according to the service to which the text relates. The domain is the category to which the text belongs and may include one or more areas, for example, a movie domain, a music domain, a book domain, a game domain, an aviation domain, and a food domain.

In the present disclosure, an 'intent' is information indicating the user's intention as determined by interpreting text. The intent indicates the user's utterance intention and may include information indicating an operation or function that the user wants to execute using the device. The intent may be determined by interpreting the text using the natural language understanding (NLU) model. For example, when the text converted from the user's voice input is "Search for the release date of the movie Avengers Endgame", the domain may be 'movie' and the intent may be 'movie search'. The intent may include an intent action and an intent object.

The intent may include not only information indicating the user's utterance intention (hereinafter, intention information) but also a numerical value corresponding to that intention information. The numerical value may indicate the probability that the text relates to information indicating a particular intention. When interpreting the text with the natural language understanding model yields a plurality of pieces of intention information, the piece of intention information whose corresponding numerical value is the largest may be determined as the intent.
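The selection rule described above amounts to taking the candidate with the largest numerical value. The following is a minimal sketch, assuming the natural language understanding model has already produced one probability per piece of intention information; the function name `pick_intent`, the labels, and the values are hypothetical and do not appear in the disclosure.

```python
def pick_intent(scores):
    """Return the (intention, value) pair with the largest numerical value."""
    if not scores:
        raise ValueError("no intention information was obtained")
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical values for the text
# "Search for the release date of the movie Avengers Endgame".
candidates = {"movie_search": 0.82, "movie_booking": 0.11, "music_play": 0.07}
intent, value = pick_intent(candidates)
print(intent, value)
```

How ties between equal values are resolved is not specified in the disclosure; this sketch simply keeps the first maximum encountered.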

In the present disclosure, a 'slot' means variable information used to obtain detailed information related to an intent or to determine a detailed operation. A slot is information related to an intent, and a plurality of slot types may correspond to one intent. A slot may include a numerical value indicating the probability that the text relates to that variable information. In an embodiment, interpreting the text with the natural language understanding model may yield a plurality of pieces of variable information indicating slots. In this case, the piece of variable information whose corresponding numerical value is the largest may be determined as the slot. For example, when the text is "Search for the release date of the movie Avengers Endgame", the slots obtained from the text may be 'movie title (Avengers Endgame)' and 'movie release date'.
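To make the relationship between domain, intent, and slots concrete, the example above can be written as a small record. This is only an illustrative sketch: the class `NluResult`, the field names, the split of 'movie search' into an action and an object, and the scores are all assumptions rather than structures defined in the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class NluResult:
    """Hypothetical container for one interpreted text."""
    domain: str
    intent_action: str
    intent_object: str
    slots: dict = field(default_factory=dict)  # slot type -> value found in text


def pick_slot_type(scores):
    """Choose the slot type whose numerical value is the largest."""
    return max(scores, key=scores.get)


# Hypothetical scores for the span "Avengers Endgame".
span_scores = {"movie_title": 0.91, "music_title": 0.06, "book_title": 0.03}

result = NluResult(
    domain="movie",
    intent_action="search",
    intent_object="movie",
    slots={
        pick_slot_type(span_scores): "Avengers Endgame",
        "movie_release_date": None,  # the value the user is asking for
    },
)
print(result)
```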

Domains, intents, and slots may be identified or detected automatically using the natural language understanding model, but the disclosure is not limited thereto. In an embodiment of the present disclosure, the domain, the intent, and the slot may each be manually designated or determined by a user input entered through the client device.

FIG. 1 is a conceptual diagram illustrating operations by which a server 1000 and a client device 2000 train a language model according to an embodiment of the present disclosure. The server 1000 and the client device 2000 may perform operations for training a natural language understanding model.

Referring to FIG. 1, the server 1000 may transmit data to and receive data from the client device 2000 through wired or wireless communication.

The client device 2000 may be a computing device that provides a user with a developer software tool for training a natural language understanding model. The client device 2000 may be, for example, a smartphone, a tablet PC, a PC, a laptop, a smart TV, a mobile phone, a personal digital assistant (PDA), a media player, a global positioning system (GPS) device, an e-book reader, a digital broadcasting terminal, a navigation device, a kiosk, an MP3 player, a digital camera, a home appliance, or another mobile or non-mobile computing device, but is not limited thereto.

The developer software tool provided by the client device 2000 may receive input text through an application and, through interaction with the server 1000, display on a display unit 2510 at least one learning candidate text that the server generated for the input text. The at least one learning candidate text is text that the server generates by replacing a word or phrase of the input text with a word or phrase that a user of a voice assistant function is predicted to utter and that has a similar pronunciation or a similar pronunciation sequence. The at least one learning candidate text may be input data used to train the natural language understanding model 1320. The natural language understanding model 1320 may be a model pre-trained for a specific domain (pre-trained NLU model), but is not limited thereto.

The server 1000 may receive the input text from the client device 2000, generate at least one learning candidate text for the received input text, and train the natural language understanding model by inputting the input text and the generated at least one learning candidate text as training data. Using the input text and the at least one learning candidate text as training data, the server 1000 may update the pre-trained natural language understanding model 1320 or generate a new natural language understanding model. In an embodiment, the server 1000 may include an ASR model 1310, a natural language understanding model 1320, a TTS model 1330, a replacement target text identification module 1340, an alternative text generation module 1350, and a learning candidate text generation module 1360.

Hereinafter, the data exchange and operations between the server 1000 and the client device 2000 are described.

In step S110, the client device 2000 receives input text from the user and transmits the input text to the server 1000. In an embodiment, the client device 2000 may display, on the display unit 2510, a first graphical user interface (GUI) 2010 for receiving the input text, and may receive the input text through the first GUI 2010.

In step S120, the server 1000 generates alternative text, reconstructs the input text using the alternative text to generate at least one learning candidate text, and transmits the generated at least one learning candidate text to the client device 2000. In an embodiment, the server 1000 may use the data and program code of the replacement target text identification module 1340 to identify, within the input text, replacement target text that needs to be replaced. In an embodiment, the server 1000 may use the data and program code of the alternative text generation module 1350 to generate, for the identified replacement target text, alternative text that the user is expected to utter and that has high phonetic similarity. In an embodiment, the server 1000 may use the data and program code of the learning candidate text generation module 1360 to generate learning candidate text, which is input to the training of the natural language understanding model 1320, by replacing the replacement target text identified in the input text with the alternative text.

For example, when the input text entered by the user through the client device 2000 is "페퍼로니 피자 3판 언주로 30길로 배달해줘" ("Deliver three pepperoni pizzas to Eonju-ro 30-gil"), the server 1000 may identify '페퍼로니' (pepperoni) as the replacement target text; generate, as alternative texts, '페파로니' (peparoni), '파페로니' (paperoni), and '페포로니' (peporoni), which the user is predicted to pronounce for '페퍼로니' and which have similar pronunciation sequences; and, by replacing the replacement target text with each generated alternative text, generate the learning candidate texts "페파로니 피자 3판 언주로 30길로 배달해줘", "파페로니 피자 3판 언주로 30길로 배달해줘", and "페포로니 피자 3판 언주로 30길로 배달해줘".
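The substitution in this example can be sketched in a few lines. This is an illustration only, assuming the replacement target text and its alternative texts have already been produced; the function name `make_candidates` is hypothetical and is not the module interface of the disclosure.

```python
def make_candidates(input_text, target, alternatives):
    """Replace `target` in `input_text` with each alternative text in turn."""
    if target not in input_text:
        return []  # nothing to replace, so no learning candidate text
    return [input_text.replace(target, alt) for alt in alternatives]


input_text = "페퍼로니 피자 3판 언주로 30길로 배달해줘"
candidates = make_candidates(
    input_text, "페퍼로니", ["페파로니", "파페로니", "페포로니"]
)
for text in candidates:
    print(text)
```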

In the embodiment illustrated in FIG. 1, only '페퍼로니' (pepperoni) is described as being identified as replacement target text in the input text, but the disclosure is not limited thereto. In an embodiment, the server 1000 may use the data and program code of the replacement target text identification module 1340 to identify a plurality of replacement target texts in the input text. For example, from the input text "페퍼로니 피자 3판 언주로 30길로 배달해줘", the server 1000 may also identify '언주로 30길' (Eonju-ro 30-gil), a slot indicating an address, as replacement target text. When '언주로 30길' is identified as replacement target text, the server 1000 may use the data and program code of the alternative text generation module 1350 to generate, as alternative texts, '안주로 30길' (Anju-ro 30-gil), '운주로 30길' (Unju-ro 30-gil), and '언조로 30길' (Eonjo-ro 30-gil), which the user is predicted to pronounce for '언주로 30길' and which have similar pronunciation sequences, and, by replacing the replacement target text with each generated alternative text, generate the learning candidate texts "페퍼로니 피자 3판 안주로 30길로 배달해줘", "페퍼로니 피자 3판 운주로 30길로 배달해줘", and "페퍼로니 피자 3판 언조로 30길로 배달해줘".

When a plurality of replacement target texts are identified, the server 1000 may generate alternative text for each of them. For example, when both '페퍼로니' (pepperoni), a slot indicating the type of pizza topping, and '언주로 30길' (Eonju-ro 30-gil), a slot indicating an address, are identified as replacement target text, the server 1000 may use the alternative text generation module 1350 to generate alternative text for each of '페퍼로니' and '언주로 30길'. As in the examples above, the server 1000 may generate '페파로니', '파페로니', and '페포로니' as alternative texts for '페퍼로니', and '안주로 30길', '운주로 30길', and '언조로 30길' as alternative texts for '언주로 30길', and may generate learning candidate texts that replace the input text by combining the generated alternative texts.
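One way to read "combining the generated alternative texts" is as a Cartesian product over the choices for each replacement target text. The sketch below follows that reading. The function name `combine_candidates` and the `keep_original` option, which also allows a target to stay unchanged so that single-replacement candidates are included, are assumptions and are not stated in the disclosure.

```python
from itertools import product


def combine_candidates(input_text, alternatives_by_target, keep_original=True):
    """Build candidate texts from every combination of per-target choices."""
    targets = list(alternatives_by_target)
    options = []
    for target in targets:
        alts = list(alternatives_by_target[target])
        # Optionally let a target keep its original form in a combination.
        options.append([target] + alts if keep_original else alts)

    results = []
    for choice in product(*options):
        text = input_text
        # Sequential replacement; assumes no alternative contains another target.
        for target, replacement in zip(targets, choice):
            text = text.replace(target, replacement)
        if text != input_text:  # skip the combination that changes nothing
            results.append(text)
    return results


input_text = "페퍼로니 피자 3판 언주로 30길로 배달해줘"
alternatives = {
    "페퍼로니": ["페파로니", "파페로니", "페포로니"],
    "언주로 30길": ["안주로 30길", "운주로 30길", "언조로 30길"],
}
results = combine_candidates(input_text, alternatives)
print(len(results))
```

With `keep_original=False`, only combinations in which every replacement target text is replaced are produced.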

The server 1000 may transmit the generated at least one learning candidate text to the client device 2000.

In step S130, the client device 2000 receives the at least one learning candidate text from the server 1000 and transmits to the server 1000 at least one text selected by the user from among the at least one learning candidate text. In an embodiment, the client device 2000 may display, on the display unit 2510, a second graphical user interface 2020 showing a new sentence list that includes the at least one learning candidate text received from the server 1000. The client device 2000 may receive, through the second graphical user interface 2020, a user input selecting at least one text from among the at least one learning candidate text included in the new sentence list, and may select the at least one text based on the received user input.

The client device 2000 may transmit to the server 1000 an identification value for the selected at least one text.

In an embodiment, the server 1000 may transmit to the client device 2000 the results (domain, intent, and slot information) learned using the input text and the at least one learning candidate text. In an embodiment, the server 1000 may use the natural language understanding model 1320 to identify the domain, intent action, intent object, and slots into which the input text and the at least one learning candidate text are classified. In an embodiment, the server 1000 may also automatically perform slot tagging on the slots in the input text and the at least one learning candidate text, based on the identified domain, intent action, and intent object.

In step S140, the server 1000 performs training by inputting, as training data, the text selected based on the user input received through the client device 2000 into the natural language understanding model 1320, and transmits information on the completion of training to the client device 2000. In an embodiment, the server 1000 may use the natural language understanding model 1320 to identify the domain into which the input text and the at least one learning candidate text are classified, and may select, from among a plurality of natural language understanding models (1320a to 1320c, see FIG. 2) stored in a memory (1300, see FIG. 2), the natural language understanding model trained specifically for the identified domain. In an embodiment, the server 1000 may perform training by inputting the input text and the text selected from among the at least one learning candidate text into the selected natural language understanding model as training data, thereby updating that natural language understanding model. However, the disclosure is not limited thereto, and the server 1000 may generate a new natural language understanding model using the input text and the at least one learning candidate text.
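The choice between updating a domain-specific model and generating a new one can be sketched as a lookup with a fallback. This is a simplified illustration under stated assumptions: the registry `models_by_domain`, the string identifiers standing in for stored models, the domain keys, and the rule "create a new model when no model exists for the domain" are all hypothetical, since the disclosure does not specify when a new model is generated.

```python
def select_or_create(models_by_domain, domain):
    """Return (model_id, created) for the identified domain."""
    if domain in models_by_domain:
        return models_by_domain[domain], False  # update the existing model
    model_id = f"nlu_model_{domain}"            # placeholder for a new model
    models_by_domain[domain] = model_id
    return model_id, True


# Placeholder identifiers standing in for the stored models 1320a to 1320c.
models = {
    "movie": "nlu_model_1320a",
    "game": "nlu_model_1320b",
    "home_appliance": "nlu_model_1320c",
}
print(select_or_create(models, "movie"))
print(select_or_create(models, "food"))
```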

FIG. 2 is a block diagram illustrating the configuration of the server 1000 according to an embodiment of the present disclosure.

Referring to FIG. 2, the server 1000 may include a communication interface 1100, a processor 1200, and a memory 1300.

The communication interface 1100 may perform data communication with the client device 2000 under the control of the processor 1200. The communication interface 1100 may perform data communication not only with the client device 2000 but also with other servers. The communication interface 1100 may perform data communication with the client device 2000 or another server using at least one data communication method including, for example, wired LAN, wireless LAN, Wi-Fi, Bluetooth, Zigbee, Wi-Fi Direct (WFD), infrared communication (IrDA, Infrared Data Association), Bluetooth Low Energy (BLE), Near Field Communication (NFC), Wireless Broadband Internet (WiBro), Worldwide Interoperability for Microwave Access (WiMAX), Shared Wireless Access Protocol (SWAP), Wireless Gigabit Alliance (WiGig), and RF communication.

The processor 1200 may execute one or more instructions of a program stored in the memory 1300. The processor 1200 may consist of hardware components that perform arithmetic, logic, and input/output operations and signal processing. The processor 1200 may consist of, for example, at least one of a central processing unit (CPU), a microprocessor, a graphics processing unit (GPU), application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), and field programmable gate arrays (FPGAs), but is not limited thereto.

The memory 1300 may include, for example, a non-volatile memory including at least one of a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (e.g., SD or XD memory), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), a magnetic memory, a magnetic disk, and an optical disk, and a volatile memory such as random access memory (RAM) or static random access memory (SRAM).

The memory 1300 may store instructions, data structures, and program code that the processor 1200 can read. In the following embodiments, the functions of the processor 1200 may be implemented by executing the instructions or code of a program stored in the memory. In addition, the memory 1300 may store data and program instruction code corresponding to the ASR model 1310, the NLU model 1320, the TTS model 1330, the replacement target text identification module 1340, the alternative text generation module 1350, the learning candidate text generation module 1360, a dictionary DB 1370, and a word embedding model 1380.

The processor 1200 may use the communication interface 1100 to receive, from the client device 2000, the input text entered by the user. The input text may be obtained through typing on a user input unit (2410, see FIG. 3) of the client device 2000, but is not limited thereto. In an embodiment, the input text may be obtained by converting a voice input received through a microphone (2420, see FIG. 3) of the client device 2000 into text using the ASR model 1310. In this case, the server 1000 may receive from the client device 2000 the speech signal into which the voice input was converted.

In an embodiment, the processor 1200 may perform automatic speech recognition (ASR) using the data and instruction code of the ASR model 1310 and convert the speech signal received from the client device 2000 into input text. The ASR model 1310 is a speech recognition model that recognizes the user's voice, and may convert the user's voice input into input text and output it. The ASR model 1310 may be, for example, an artificial intelligence model including an acoustic model, a pronunciation dictionary, and a language model. Alternatively, the ASR model 1310 may be, for example, an end-to-end speech recognition model whose structure includes an integrated neural network without separate acoustic model, pronunciation dictionary, and language model components. Because an end-to-end ASR model uses an integrated neural network, it can convert speech into text without the intermediate step of first recognizing phonemes from the speech and then converting the phonemes into text.

The memory 1300 may store a plurality of natural language understanding models 1320. Although only a first natural language understanding model 1320a, a second natural language understanding model 1320b, and a third natural language understanding model 1320c are shown in the drawing, this is for convenience of description, and the number of natural language understanding models stored in the memory 1300 is not limited to the number shown. In an embodiment, only the first natural language understanding model 1320a may be stored in the memory 1300.

The first natural language understanding model 1320a to the third natural language understanding model 1320c may be pre-trained natural language understanding models (pre-trained NLU). Each of the first to third natural language understanding models 1320a to 1320c may be an artificial intelligence model trained for a specific domain. For example, the first natural language understanding model 1320a may be a model trained using movie-related words or text together with label values (ground-truth values) for intents and slots specific to the movie domain; the second natural language understanding model 1320b may be a model trained using game-related words or text together with label values for intents and slots specific to the game domain; and the third natural language understanding model 1320c may be a model trained using words or text related to home appliance control together with label values for intents and slots specific to the home appliance control domain. However, the disclosure is not limited thereto, and at least one of the first to third natural language understanding models 1320a to 1320c may be a model that is not specialized for a particular domain and is trained using general-purpose words and text together with label values for intents and slots.

In an embodiment, the first to third natural language understanding models 1320a to 1320c may be new natural language understanding models trained using the input text together with a new domain, a new intent, and a new slot newly created by the user.

Hereinafter, the components are described with reference to the first natural language understanding model 1320a.

The first natural language understanding model 1320a may include a labeling model 1321a, a domain identification model 1322a, an intent action identification model 1323a, an intent object identification model 1324a, and a slot tagging model 1325a. However, this is merely an example, and the first natural language understanding model 1320a need not include all of the labeling model 1321a, the domain identification model 1322a, the intent action identification model 1323a, the intent object identification model 1324a, and the slot tagging model 1325a shown in the drawing; it may include only some of them.

The labeling model 1321a may include data and program code configured to label information about the domain, intent, and slots detected from the input text. For example, the labeling model 1321a may be used by a user to add the user's own tags to intents and slots. The processor 1200 may use the data and program code of the labeling model 1321a to automatically determine labels for the domain, intent, and slots detected from the input text.

However, the present disclosure is not limited thereto, and the processor 1200 may use the communication interface 1100 to receive, from the client device 2000, information about a domain, an intent, and a slot determined for the input text based on a user input. In this case, the processor 1200 may use the data and program code of the labeling model 1321a to label the input text with the received domain, intent, and slot.

The processor 1200 may use the data and program code of the labeling model 1321a to allow a user to create a new domain, intent action, intent object, and slot. For example, the labeling model 1321a may be used by the user to add arbitrary tags to intents and slots, or to select automatically determined tags. The labeling model 1321a may use the domain identification model 1322a, the intent action identification model 1323a, the intent object identification model 1324a, and the slot tagging model 1325a.

The domain identification model 1322a is a model used to identify the domain corresponding to the input text by interpreting the input text. The processor 1200 may interpret the input text using the data and program code of the domain identification model 1322a, and thereby identify a domain, which is category information indicating the category to which the input text belongs or into which it can be classified. In an embodiment, the processor 1200 may identify one or more domains from the input text. In an embodiment, the processor 1200 may interpret the input text using the domain identification model 1322a and calculate the degree of relevance between the input text and a domain as a numerical value. In an embodiment, the domain identification model 1322a may be configured to calculate the degree of relevance between the input text and one or more domains as probability values, and to determine a domain having a high probability value among the calculated probability values as the domain into which the input text is classified.
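The probability-based domain decision described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `identify_domain`, the raw relevance scores, the use of a softmax to obtain probability values, and the choice of the single highest-probability domain are all assumptions made for the example.

```python
import math


def identify_domain(relevance_scores):
    """Convert raw relevance scores into probabilities and pick one domain.

    relevance_scores: dict mapping a domain name to a raw relevance score
    (hypothetical values; the disclosure does not specify how they are made).
    """
    # Softmax, shifted by the maximum score for numerical stability.
    top = max(relevance_scores.values())
    exps = {d: math.exp(s - top) for d, s in relevance_scores.items()}
    total = sum(exps.values())
    probabilities = {d: e / total for d, e in exps.items()}
    # Assumption: "a high probability value" is read here as the highest one.
    best_domain = max(probabilities, key=probabilities.get)
    return best_domain, probabilities


raw_scores = {"movie": 2.0, "food": 0.5, "weather": -1.0}
domain, probs = identify_domain(raw_scores)
print(domain, probs)
```

Returning the full probability table alongside the chosen domain keeps the numerical relevance values available, which matches the embodiments in which those values are reported rather than only the final label.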

The intent action identification model 1323a is a model configured to predict an intent action from the input text by interpreting the input text. An intent action is the action expressed by the input text, for example, search, post, play, purchase, or order. The processor 1200 may identify the intent action from the input text by performing syntactic analysis or semantic analysis using the data and program code of the intent action identification model 1323a. In an embodiment, the processor 1200 may use the data and program code of the intent action identification model 1323a to parse the input text into units of morphemes, words, or phrases, and may infer the meaning of a word or phrase extracted from the parsed text by using linguistic features (e.g., grammatical elements) of the parsed morphemes, words, or phrases. The processor 1200 may determine the intent action corresponding to the inferred meaning of the word or phrase by comparing that meaning with predefined intents provided by the natural language understanding model.

The intent object identification model 1324a is a model configured to identify an intent object from the input text by interpreting the input text. An intent object is an object related to the identified intent action. The intent object is the object targeted by the detected intent action, and may be, for example, a movie, a photo, sports, weather, or aviation. The processor 1200 may interpret the input text using the data and program code of the intent object identification model 1324a, and thereby identify, from the input text, the intent object related to the intent action.

The intent action identification model 1323a and the intent object identification model 1324a may identify the intent action and the intent object based on the result of training specialized for a specific domain.

For example, when the input text is "Search for the release date of the movie Avengers Endgame", the processor 1200 may interpret the input text using the intent action identification model 1323a, identify the intent action as 'search' as a result of the interpretation, and detect the intent object as 'movie' using the intent object identification model 1324a. As another example, when the input text is "Order a pepperoni pizza from Domino's Pizza", the processor 1200 may interpret the input text using the intent action identification model 1323a, identify the intent action as 'order' as a result of the interpretation, and detect the intent object as 'food' using the intent object identification model 1324a.

In an embodiment, the intent action identification model 1323a and the intent object identification model 1324a may identify the intent action and the intent object, respectively, from the input text based on a matching model. The relevance between each word or phrase parsed from the input text and the intent action and intent object may be calculated as a numerical value. In an embodiment, the relevance between a word or phrase of the input text and an intent (including the intent action and the intent object) may be calculated as a probability value. In an embodiment, the processor 1200 may apply the matching model of the intent action identification model 1323a to the input text to obtain a plurality of numerical values indicating the degree of relevance between the words parsed from the input text and pre-stored intent actions, and may identify the intent action having the maximum value among the plurality of numerical values as the intent action for the input text. For example, when the input text is "Order three pepperoni pizzas from Domino's Pizza", the processor 1200 may apply the matching model of the intent action identification model 1323a to the input text and calculate the degree of matching between the input text and the intent action 'order' as 0.9, and the degree of matching between the input text and 'search' as 0.1. The processor 1200 may determine the intent action with the maximum calculated value as the intent action related to the input text, but the present disclosure is not limited thereto. In an embodiment, the processor 1200 may determine the intent action for the input text based on a user input selecting one of a plurality of intents.
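The selection rule in the preceding paragraph (take the intent action with the maximum matching value, unless a user input selects one) can be sketched as below. The function name `select_intent_action` and its `user_choice` parameter are hypothetical; only the scores 0.9 for 'order' and 0.1 for 'search' come from the example in the text.

```python
def select_intent_action(match_scores, user_choice=None):
    """Pick an intent action from matching scores.

    match_scores: dict mapping an intent action to its matching value.
    user_choice: optional intent action selected by a user input; when
    given, it takes precedence over the automatic maximum-value rule.
    """
    if user_choice is not None:
        if user_choice not in match_scores:
            raise ValueError(f"unknown intent action: {user_choice!r}")
        return user_choice
    # Automatic rule: the intent action with the maximum matching value.
    return max(match_scores, key=match_scores.get)


scores = {"order": 0.9, "search": 0.1}
automatic = select_intent_action(scores)
manual = select_intent_action(scores, user_choice="search")
print(automatic, manual)
```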

The method by which the processor 1200 identifies the intent object from the input text is the same as the method of identifying the intent action, and a redundant description is therefore omitted.

Although FIG. 2 illustrates the intent action identification model 1323a and the intent object identification model 1324a as separate models, the present disclosure is not limited thereto. In an embodiment, the intent action identification model 1323a and the intent object identification model 1324a may be integrated into a single intent identification model.

The slot tagging model 1325a is a model configured to identify a slot from the input text and to perform slot tagging, which associates the identified slot with the domain, the intent action, and the intent object. A slot is variable information used to obtain detailed information related to the domain, intent action, and intent object from the input text, or to determine a detailed operation. A slot is information related to the intent action and the intent object, and a plurality of types of slots may correspond to one intent. Slot tagging refers to the operation of identifying a slot in the input text and mapping the identified slot to an element related to the domain, intent action, and intent object.

In an embodiment, the processor 1200 may use the slot tagging model 1325a to identify a slot from the input text and automatically map the identified slot to a slot element related to the domain, intent action, and intent object. For example, when the input text is "Search for the release date of the movie Avengers Endgame", the processor 1200 may use the data and program code of the slot tagging model 1325a to identify 'Avengers Endgame' and 'release date' as slots from the input text, tag 'Avengers Endgame' to 'movie title', which is a slot related to the movie domain and the movie search intent, and tag 'release date' to the 'movie release date' slot. As another example, when the input text is "Order three pepperoni pizzas from Domino's Pizza for delivery to Eonju-ro 30-gil", the processor 1200 may use the data and program code of the slot tagging model 1325a to identify 'Domino's Pizza', 'pepperoni pizza', 'three', and 'Eonju-ro 30-gil' as slots from the input text and, as slots related to the food domain and the food order intent, tag 'Domino's Pizza' as 'restaurant name', 'pepperoni pizza' as 'pizza topping', 'three' as 'order quantity', and 'Eonju-ro 30-gil' as 'place'.
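As an illustration of what a slot tagging result might look like as data, the sketch below tags the movie example with a simple phrase lookup. The lookup table `slot_lexicon`, the `TaggedSlot` and `NluResult` types, and the substring matching are assumptions for the example only; the disclosure describes a trained slot tagging model, which this sketch does not reproduce.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaggedSlot:
    text: str  # the word or phrase found in the input text
    slot: str  # the slot it is tagged to


@dataclass
class NluResult:
    domain: str
    intent_action: str
    intent_object: str
    slots: list = field(default_factory=list)


def tag_slots(input_text, slot_lexicon):
    """Tag every lexicon phrase that occurs in the input text.

    slot_lexicon: hypothetical dict mapping a phrase to a slot name.
    Results are returned in the order the phrases appear in the text.
    """
    found = []
    for phrase, slot_name in slot_lexicon.items():
        position = input_text.find(phrase)
        if position != -1:
            found.append((position, TaggedSlot(phrase, slot_name)))
    found.sort(key=lambda item: item[0])
    return [tagged for _, tagged in found]


text = "Search for the release date of the movie Avengers Endgame"
lexicon = {
    "Avengers Endgame": "movie title",
    "release date": "movie release date",
}
result = NluResult(
    domain="movie",
    intent_action="search",
    intent_object="movie",
    slots=tag_slots(text, lexicon),
)
print(result)
```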

The processor 1200 may automatically perform slot tagging on the input text using the slot tagging model 1325a, but the present disclosure is not limited thereto. The processor 1200 may receive, from the client device 2000 through the communication interface 1100, a user input that enters a specific slot or adds a new slot for a word or phrase in the input text. In this case, the processor 1200 may tag the word or phrase of the input text to the slot determined or added by the user input.

When the processor 1200 receives input text entered by the user from the client device 2000 through the communication interface 1100, the processor 1200 may calculate, for the first natural language understanding model 1320a through the third natural language understanding model 1320c, the probability that the input text corresponds to a specific domain, and may select one of the first natural language understanding model 1320a through the third natural language understanding model 1320c based on the calculated probability values. Using the selected natural language understanding model, the processor 1200 may perform intent detection and slot tagging on the input text, identify the replacement target text, and generate the replacement text. This will be described in detail with reference to FIG. 16.

The processor 1200 may use the communication interface 1100 to transmit the intent matched to the input text, the slot tagging information, and the calculated numerical values to the client device 2000. The transmitted intent, slot tagging information, and respective calculated numerical values may be displayed on the display unit 2510 (see FIG. 3) of the client device 2000. This will be described in detail with reference to FIG. 18.

The TTS (Text-to-Speech) model 1330 is a model configured to convert text into a sound signal (wave signal). In an embodiment, the processor 1200 may use the data and program code of the TTS model 1330 to convert text into a sound signal and output the converted sound signal as a signal in the form of a binary data stream.

In an embodiment, the TTS model 1330 may include a personalized TTS model that reflects a context including information about a region or a specific person. The personalized TTS model may convert text into a sound signal and output it while reflecting at least one of, for example, the manner of speech, intonation, and dialect of people according to a specific region, age, gender, and the like.

The replacement target text identification module 1340 is a module configured to identify, within the input text, replacement target text that needs to be replaced. In an embodiment, the processor 1200 may use the data and program code of the replacement target text identification module 1340 to identify replacement target text that needs to be replaced from among the one or more words included in the input text. 'Replacement target text' may refer to a word or phrase that is difficult to train on accurately when input to the natural language understanding model 1320, either because a user of an application including the language model mispronounces it when speaking, or because the text output as a result of ASR performed by the ASR model 1310 is inaccurate. The replacement target text may include, for example, a named entity, a keyword, a place, a place name, a movie title, or a game term included in the input text. The replacement target text may also be, for example, a slot.

The replacement target text identification module 1340 may be trained to identify one or more replacement target texts from the input text. For example, when the input text is "Order three pepperoni pizzas for delivery to Eonju-ro 30-gil", the processor 1200 may use the replacement target text identification module 1340 to identify 'pepperoni' or 'Eonju-ro 30-gil' from the input text.

The replacement target text identification module 1340 may be configured to obtain the interpretation result of the input text from the natural language understanding model 1320 and to automatically identify the replacement target text from the input text based on that interpretation result. In an embodiment, the processor 1200 may interpret the input text using a pre-trained natural language understanding model to identify the domain, intent action, intent object, and slot into which the input text is classified, tag the identified slot to a slot related to the domain, intent action, and intent object, and use the replacement target text identification module 1340 to determine the text corresponding to the tagged slot as the replacement target text. In this case, the replacement text that replaces the replacement target text may be tagged to the same slot as the slot to which the replacement target text is tagged.

However, the present disclosure is not limited thereto, and the replacement target text identification module 1340 may be configured to determine a word or phrase manually selected by a user input as the replacement target text. In an embodiment, the processor 1200 may receive, from the client device 2000 through the communication interface 1100, a user input selecting at least one word or phrase in the input text, use the replacement target text identification module 1340 to identify the at least one word or phrase selected based on the user input, and determine the identified at least one word or phrase as the replacement target text. In an embodiment, the replacement target text identification module 1340 may identify the at least one word or phrase selected based on the user input as a named entity.

In an embodiment, the replacement target text identification module 1340 may be configured to identify a named entity included in the input text by performing part-of-speech (POS) tagging on the words identified from the input text. However, embodiments of the present disclosure are not limited thereto, and the replacement target text identification module 1340 may be configured to identify named entities from the input text using any of various known named entity recognition (NER) methods.

In an embodiment, the replacement target text identification module 1340 may identify the replacement target text based on the result of searching for words pre-stored in the dictionary DB 1370. The dictionary DB (Dictionary Database) 1370 may be a database that stores phoneme sequence information for a plurality of words, or information about an embedding vector converted for each of the plurality of words. The dictionary DB 1370 is a database that stores phoneme sequence information about a plurality of words and embedding vectors for the phoneme sequences included in each of the plurality of words. In an embodiment, the processor 1200 may use the data and program code of the replacement target text identification module 1340 to parse the input text into units of words, morphemes, and phrases, search the dictionary DB 1370 for at least one parsed word, and, based on the search result of the dictionary DB 1370, determine a word that is not found or whose frequency of use is lower than a preset first threshold as the replacement target text. The first threshold is a setting for the frequency of use that is applied to the search result of the dictionary DB 1370 to determine the replacement target text. In an embodiment, the first threshold may be set based on a user input. For example, the first threshold may be set by a user input received through a UI displayed on the client device 2000. The UI for setting the first threshold will be described in detail with reference to FIG. 18.

For example, when the first threshold is set relatively low, a search of the dictionary DB 1370 finds fewer words whose frequency of use is lower than the first threshold, so the number of replacement target texts may be relatively small. Conversely, when the first threshold is set relatively high, more of the words found in the dictionary DB 1370 have a frequency of use lower than the first threshold and are therefore likely to be determined as replacement target text, so a relatively large number of replacement target texts may be determined.
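The effect of the first threshold can be sketched with a small lookup. The function name `find_replacement_targets`, the representation of the dictionary DB as a plain dict of usage frequencies, and the frequency values themselves are hypothetical; the rule applied (a word is a replacement target if it is not found or its frequency of use is below the first threshold) follows the description above.

```python
def find_replacement_targets(words, dictionary_freq, first_threshold):
    """Return the words treated as replacement target text.

    dictionary_freq: hypothetical stand-in for the dictionary DB, mapping
    a word to its frequency of use. Words missing from it count as
    "not found" and are always selected.
    """
    targets = []
    for word in words:
        frequency = dictionary_freq.get(word)
        if frequency is None or frequency < first_threshold:
            targets.append(word)
    return targets


dictionary_freq = {"order": 0.9, "pizza": 0.7, "delivery": 0.6, "pepperoni": 0.05}
parsed_words = ["order", "pepperoni", "pizza", "delivery", "Eonju-ro"]

low_threshold_targets = find_replacement_targets(parsed_words, dictionary_freq, 0.1)
high_threshold_targets = find_replacement_targets(parsed_words, dictionary_freq, 0.65)
print(low_threshold_targets)
print(high_threshold_targets)
```

With the lower threshold only the rare word and the unknown word are selected; raising the threshold also pulls in a moderately common word, which is the behavior the paragraph above describes.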

The replacement text generation module 1350 is a module configured to generate, for the replacement target text identified by the replacement target text identification module 1340, replacement text that the user is expected to utter and that has a high phonetic similarity to the replacement target text. In an embodiment, the processor 1200 may obtain the identified replacement target text from the replacement target text identification module 1340 and use the data and program code of the replacement text generation module 1350 to generate replacement text that can replace the replacement target text. In an embodiment, the processor 1200 may use the replacement text generation module 1350 to generate at least one replacement text.

'Replacement text' is an example of text that is phonetically similar to, or that a user is predicted to pronounce in place of, a specific named entity, a keyword, or a word or phrase that can be identified as a slot by the natural language understanding model 1320. The replacement text may also be an example of text expected to be output when a voice input from the user is converted by the ASR model 1310. For example, when the replacement target text is 'pepperoni pizza', the replacement text may be 'peparoni' (페파로니), 'paperoni' (파페로니), 'peporoni' (페포로니), or the like, which the user may utter with an inaccurate or similar pronunciation when pronouncing the replacement target text. As another example, when the replacement target text is 'Eonju-ro 30-gil', the replacement text may be 'Anju-ro 30-gil', 'Unju-ro 30-gil', 'Eonjo-ro 30-gil', or the like, which the user is predicted to pronounce for 'Eonju-ro 30-gil' and which have similar phoneme sequences.

In an embodiment, the processor 1200 may use the data and program code of the replacement text generation module 1350 to extract a phoneme sequence for the replacement target text, search the words pre-stored in the dictionary DB 1370 for text similar to the extracted phoneme sequence based on the phonetic relevance of the phoneme sequences, and, based on the search result, generate replacement text using at least one text having a high similarity to the extracted phoneme sequence. Here, the similarity may be calculated as a confidence score. A specific method by which the processor 1200 generates replacement text based on the similarity of phoneme sequences will be described in detail with reference to FIGS. 8 and 9.
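The phoneme-sequence search can be illustrated with a generic sequence-similarity measure. This sketch substitutes Python's `difflib.SequenceMatcher` ratio for the similarity (confidence score) computation, which the disclosure does not specify at this point; the function name `similar_by_phonemes`, the `min_score` cutoff, and the romanized phoneme lists are all hypothetical.

```python
from difflib import SequenceMatcher


def similar_by_phonemes(target_phonemes, dictionary_db, min_score=0.7, top_k=3):
    """Find dictionary words whose phoneme sequence resembles the target.

    dictionary_db: hypothetical stand-in for the dictionary DB, mapping a
    word to its phoneme sequence (a list of phoneme symbols).
    Returns (word, score) pairs, best first.
    """
    scored = []
    for word, phonemes in dictionary_db.items():
        if phonemes == target_phonemes:
            continue  # identical pronunciation is not a useful variant
        score = SequenceMatcher(None, target_phonemes, phonemes).ratio()
        if score >= min_score:
            scored.append((score, word))
    scored.sort(reverse=True)
    return [(word, score) for score, word in scored[:top_k]]


target = ["p", "e", "p", "eo", "r", "o", "n", "i"]  # 'pepperoni'
dictionary_db = {
    "peparoni": ["p", "e", "p", "a", "r", "o", "n", "i"],
    "paperoni": ["p", "a", "p", "e", "r", "o", "n", "i"],
    "macaroni": ["m", "a", "k", "a", "r", "o", "n", "i"],
    "pizza": ["p", "i", "j", "a"],
}
matches = similar_by_phonemes(target, dictionary_db)
print(matches)
```

A phoneme-aware distance (for example, one that weights confusable vowels as closer than unrelated consonants) would likely rank candidates more realistically than this symbol-level ratio.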

In an embodiment, the processor 1200 may use the data and program code of the replacement text generation module 1350 to convert the replacement target text into an embedding vector, use a neural network model to generate text having a vector value similar to the converted embedding vector, and generate replacement text using the generated text. The processor 1200 may convert the replacement target text into an embedding vector using the word embedding model 1380. The word embedding model 1380 is a model configured to quantify at least one word constituting the input text. The word embedding model 1380 is a model that quantifies words as vector values, and may include known embedding models such as, for example, a CBOW (Continuous Bag of Words) embedding model, a Skip-Gram embedding model, and word2vec. The processor 1200 may generate replacement text having a vector value similar to the embedding vector using, for example, a generative network. A specific method by which the processor 1200 generates replacement text using the generative model will be described in detail with reference to FIGS. 10 and 11.
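The notion of "text having a vector value similar to the embedding vector" can be illustrated with cosine similarity. Note the substitution: the paragraph above describes a generative network that produces new text, whereas this sketch only ranks existing candidates by similarity to the target vector. The three-dimensional vectors and the names `cosine_similarity` and `nearest_texts` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_texts(target_vector, candidate_vectors, top_k=2):
    """Rank candidate texts by cosine similarity to the target vector."""
    ranked = sorted(
        candidate_vectors.items(),
        key=lambda item: cosine_similarity(target_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


# Toy embedding vectors (hypothetical values, not from a trained model).
pepperoni_vector = [0.9, 0.1, 0.3]
candidate_vectors = {
    "peparoni": [0.88, 0.12, 0.31],
    "paperoni": [0.8, 0.2, 0.4],
    "weather": [0.0, 1.0, 0.1],
}
closest = nearest_texts(pepperoni_vector, candidate_vectors)
print(closest)
```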

In an embodiment, the processor 1200 may convert the replacement target text into a sound signal (wave signal) using the TTS model 1330, output the converted sound signal, and convert the output sound signal into output text using the ASR model 1310. The processor 1200 may use the data and program code of the replacement text generation module 1350 to generate replacement text by replacing the replacement target text with the output text. A specific method by which the processor 1200 generates replacement text using the ASR model 1310, the TTS model 1330, and the replacement text generation module 1350 will be described in detail with reference to FIGS. 12 and 13.
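The TTS-then-ASR round trip can be sketched as a composition of two functions. No real speech library is used here: `fake_tts` and `fake_asr` are stand-in stubs written only so the example runs, and the misrecognition they produce is contrived. The function name `round_trip_replacement` is likewise an assumption.

```python
from typing import Callable


def round_trip_replacement(
    target_text: str,
    tts: Callable[[str], bytes],
    asr: Callable[[bytes], str],
) -> str:
    """Convert text to a sound signal and back, returning the output text.

    When the ASR output differs from the target text, the output text can
    serve as replacement text for the replacement target text.
    """
    wave_signal = tts(target_text)  # text -> sound signal (binary data)
    output_text = asr(wave_signal)  # sound signal -> output text
    return output_text


def fake_tts(text: str) -> bytes:
    # Stub: a real TTS model would synthesize audio here.
    return text.encode("utf-8")


def fake_asr(wave: bytes) -> str:
    # Stub: imitates a recognizer that mishears one syllable.
    return wave.decode("utf-8").replace("ppe", "pa", 1)


replacement = round_trip_replacement("pepperoni", fake_tts, fake_asr)
print(replacement)
```

Passing the models in as callables keeps the sketch independent of any particular TTS or ASR engine, including a personalized TTS model.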

The training candidate text generation module 1360 is a module configured to generate training candidate text, which is input to training by the natural language understanding model 1320, by substituting the replacement text generated by the replacement text generation module 1350 for the replacement target text identified in the input text. In an embodiment, the processor 1200 may use the data and program code of the training candidate text generation module 1360 to generate at least one training candidate text, either by replacing the replacement target text in the input text with the replacement text or by performing paraphrasing that concatenates the replacement text with the remaining words or phrases of the input text other than the replacement target text. 'Training candidate text' means text that is a candidate for the training data used as input data for training by the natural language understanding model 1320.

For example, when the input text is "페퍼로니 피자 3판 언주로 30길로 배달 주문해 줘" ("Order three pepperoni pizzas for delivery to Eonju-ro 30-gil"), the processor 1200 may use the training candidate text generation module 1360 to replace '페퍼로니' (pepperoni), identified as the replacement target text, with '페파로니' (peparoni), '파페로니' (paperoni), and '페포로니' (peporoni), respectively, thereby generating three training candidate texts: "페파로니 피자 3판 언주로 30길로 배달 주문해 줘", "파페로니 피자 3판 언주로 30길로 배달 주문해 줘", and "페포로니 피자 3판 언주로 30길로 배달 주문해 줘". However, the disclosure is not limited thereto. When a plurality of replacement target texts are identified in the input text, the processor 1200 may generate the training candidate texts by using the replacement text for each of the plurality of replacement target texts. For example, when '페퍼로니' (pepperoni), a slot indicating the type of pizza topping, and '언주로 30길' (Eonju-ro 30-gil), a slot indicating an address, are both identified as replacement target texts in the input text above, the processor 1200 may use the replacement text generation module 1350 to generate replacement texts for each of them: '페파로니' (peparoni), '파페로니' (paperoni), and '페포로니' (peporoni) for '페퍼로니', and '안주로 30길' (Anju-ro 30-gil), '운주로 30길' (Unju-ro 30-gil), and '언조로 30길' (Eonjo-ro 30-gil) for '언주로 30길'. When three replacement texts are generated for each of the two replacement target texts as in this example, the processor 1200 may use the training candidate text generation module 1360 to combine the replacement texts, generating a total of nine training candidate texts.
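The count of nine in the example above follows from taking every combination of one replacement per slot (3 × 3). A minimal sketch, assuming plain string substitution is sufficient and that no replacement text contains another replacement target text; the function name `generate_training_candidates` is hypothetical and not part of the disclosure:

```python
from itertools import product


def generate_training_candidates(input_text, replacements):
    """Return one candidate per combination of replacement texts.

    `replacements` maps each replacement target text to the list of
    replacement texts generated for it.
    """
    targets = list(replacements)
    candidates = []
    # product() yields one replacement per target, covering all combinations.
    for combo in product(*(replacements[t] for t in targets)):
        text = input_text
        for target, replacement in zip(targets, combo):
            text = text.replace(target, replacement)
        candidates.append(text)
    return candidates


input_text = "페퍼로니 피자 3판 언주로 30길로 배달 주문해 줘"
replacements = {
    "페퍼로니": ["페파로니", "파페로니", "페포로니"],
    "언주로 30길": ["안주로 30길", "운주로 30길", "언조로 30길"],
}
candidates = generate_training_candidates(input_text, replacements)
```

With a single replacement target text the same function reduces to the three-candidate case described first.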

When the training candidate text is generated by replacing a replacement target text identified as a slot in the input text with the replacement text, the training candidate text generation module 1360 may provide the domain, intent, and slot information of the training candidate text to the natural language understanding model. In an embodiment, the training candidate text generation module 1360 may identify the slot of the replacement text substituted into the training candidate text as the same slot as that of the replacement target text, and may provide the slot information and the replacement target text to the natural language understanding model 1320.

In an embodiment, the training candidate text generation module 1360 may provide the replacement text, the slot information of the replacement text, and the replacement target text identified in the input text to the natural language understanding model. The slot information may include, for example, the type of the identified slot (slot type), a slot value, and a slot replacement value. Here, the slot replacement value refers to the replacement target text, that is, the original text, which has the same slot value as the slot identified for the replacement text. For example, when the training candidate texts are "페파로니 피자 3판 언주로 30길로 배달 주문해 줘", "파페로니 피자 3판 언주로 30길로 배달 주문해 줘", and "페포로니 피자 3판 언주로 30길로 배달 주문해 줘", the training candidate text generation module 1360 may provide to the natural language understanding model the slot type 'pizza topping' for each of the replacement texts '페파로니' (peparoni), '파페로니' (paperoni), and '페포로니' (peporoni), the slot value for the pizza topping, and '페퍼로니' (pepperoni), the original text that was replaced by the replacement texts.
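One way to picture the slot information passed along with each replacement text is a small record type. This is a sketch only: the class `SlotRecord`, its field names, the slot type string "pizza_topping", and the canonical slot value "pepperoni" are all assumptions made for illustration, since the text does not specify a data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotRecord:
    replacement_text: str        # text substituted into the candidate
    slot_type: str               # e.g. the pizza topping slot
    slot_value: str              # assumed: value shared with the original text
    slot_replacement_value: str  # the original (replacement target) text


def build_slot_records(slot_type, slot_value, original_text, replacement_texts):
    """Give every replacement text the same slot as the original text."""
    return [
        SlotRecord(r, slot_type, slot_value, original_text)
        for r in replacement_texts
    ]


records = build_slot_records(
    slot_type="pizza_topping",
    slot_value="pepperoni",
    original_text="페퍼로니",
    replacement_texts=["페파로니", "파페로니", "페포로니"],
)
```

Each record ties a mispronounced variant back to the original text, which is the information the rule table described later relies on.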

The processor 1200 may perform training by inputting the at least one training candidate text to the natural language understanding model 1320 as input data. In an embodiment, when the intent action, intent object, and slot of the input text have all been identified by the natural language understanding model 1320, the training candidate text may inherit all of this information and thus have the same intent action, intent object, and slot information as the input text. In an embodiment, the processor 1200 may train the natural language understanding model by using the replacement target text together with slot information that includes the slot type and slot value of the replacement text in the training candidate text. In this case, the processor 1200 may select, from among the plurality of pre-trained natural language understanding models 1320a to 1320c, the one that corresponds to the domain and intent of the at least one training candidate text. The processor 1200 may update the selected natural language understanding model by inputting the at least one training candidate text, the intent action, the intent object, and the slot information to it as training data and performing training.

When the processor 1200 has not interpreted the input text using the pre-trained natural language understanding model, the processor 1200 may interpret the at least one training candidate text using the pre-trained natural language understanding model, thereby identifying the domain, intent action, and intent object of the at least one training candidate text and performing slot tagging. In this case, the processor 1200 may use the identified domain, intent action, intent object, and slot information as training data for the natural language understanding model to perform training and update the model.

In an embodiment, the processor 1200 may generate a new natural language understanding model instead of updating a pre-trained one. The processor 1200 may count the number of input texts and training candidate texts and, when the counted number is greater than a preset third threshold, generate a new natural language understanding model by performing training that uses the input text, the at least one training candidate text, and the identified domain, intent, and slot information as input data. A specific method by which the processor 1200 updates any one of the plurality of pre-trained natural language understanding models 1320a to 1320c or generates a new natural language understanding model will be described in detail with reference to FIG. 16.
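The threshold check described above can be sketched as a simple predicate. The function name and the choice to count input texts and training candidate texts together are assumptions for illustration; the actual procedure is the one described with reference to FIG. 16.

```python
def should_create_new_model(input_texts, candidate_texts, third_threshold):
    """Return True when the combined text count exceeds the third threshold.

    A True result corresponds to training a new NLU model; otherwise an
    existing pre-trained model would be updated (assumed fallback).
    """
    counted = len(input_texts) + len(candidate_texts)
    return counted > third_threshold


# One input text plus nine candidates gives a count of 10.
create_new = should_create_new_model(["input"], ["candidate"] * 9, third_threshold=5)
```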

In an embodiment, the processor 1200 may perform training by selecting only some of the at least one training candidate text, rather than using all of it, and inputting the selected text to the natural language understanding model as input data. In this case, the processor 1200 may transmit the at least one training candidate text to the client device 2000 through the communication interface 1100 and receive, from the client device 2000, an identification value of the text selected by a user input that selects at least one of the training candidate texts. The processor 1200 may select at least one text based on the received identification value and perform training by inputting the selected text and the input text to the natural language understanding model as input data.
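The selection step above can be sketched as a lookup by identification value. The text does not say what form the identification value takes, so this sketch assumes it is the position of the candidate in the list sent to the client; `select_training_texts` is a hypothetical name.

```python
def select_training_texts(input_text, candidates, selected_ids):
    """Return the input text plus the candidates the user selected.

    Identification values are assumed to be list positions. Unknown
    identification values are ignored rather than raising an error.
    """
    by_id = dict(enumerate(candidates))
    chosen = [by_id[i] for i in selected_ids if i in by_id]
    return [input_text] + chosen


training_texts = select_training_texts(
    "페퍼로니 피자 3판 언주로 30길로 배달 주문해 줘",
    [
        "페파로니 피자 3판 언주로 30길로 배달 주문해 줘",
        "파페로니 피자 3판 언주로 30길로 배달 주문해 줘",
        "페포로니 피자 3판 언주로 30길로 배달 주문해 줘",
    ],
    selected_ids=[0, 2],
)
```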

In an embodiment, when performing training using the input text and the training candidate text, the processor 1200 may generate a text normalizer module. The text normalizer module may include a rule table that receives replacement text found in the training candidate text and outputs the replacement target text, which is the original text of that replacement text. For example, when the replacement texts in the training candidate texts are '페파로니' (peparoni), '파페로니' (paperoni), and '페포로니' (peporoni) and the replacement target text is '페퍼로니' (pepperoni), the text normalizer module may include a rule table that defines '페파로니', '파페로니', and '페포로니' as input texts and '페퍼로니' as the output text, as shown in Table 1 below.

Input                    Output
페파로니 (peparoni)      페퍼로니 (pepperoni)
파페로니 (paperoni)      페퍼로니 (pepperoni)
페포로니 (peporoni)      페퍼로니 (pepperoni)
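The rule table of Table 1 can be sketched as a dictionary lookup. This is a minimal illustration, assuming substring replacement is adequate; the names `RULE_TABLE` and `normalize` are hypothetical.

```python
# Rule table from Table 1: replacement text (input) -> original text (output).
RULE_TABLE = {
    "페파로니": "페퍼로니",
    "파페로니": "페퍼로니",
    "페포로니": "페퍼로니",
}


def normalize(text, rules=RULE_TABLE):
    """Map every known replacement text in `text` back to its original text."""
    for variant, original in rules.items():
        text = text.replace(variant, original)
    return text


normalized = normalize("파페로니 피자 3판 언주로 30길로 배달 주문해 줘")
```

Text that contains none of the table's input entries passes through unchanged.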

The text normalizer module may provide the input text and the output text to the dialog manager 3300 (see FIG. 22) or the response generator 3400 (see FIG. 22).

Although the text normalizer module has been described in the foregoing embodiment as including a rule table, it is not limited thereto. The text normalizer module may also be implemented as a neural network model. In that case, the neural network model may be trained by receiving the replacement text in the training candidate text as input data and the original text of the replacement text as output data serving as the ground truth. When replacement text is input to the trained neural network model, the original text may be output as a result of the training. For example, when the replacement texts '페파로니' (peparoni), '파페로니' (paperoni), and '페포로니' (peporoni) are input to the trained neural network model, the original text '페퍼로니' (pepperoni) may be output.

FIG. 3 is a block diagram illustrating the configuration of a client device 2000 according to an embodiment of the present disclosure. FIG. 3 shows only some components of the client device 2000, and the client device 2000 may further include components not shown in FIG. 3.

The client device 2000 may be a computing device that receives a user input entering input text and displays a graphical user interface (GUI) showing the result of training performed with the input text and at least one training candidate text. The client device 2000 may be, for example, a smartphone, a tablet PC, a PC, a laptop, a smart TV, a mobile phone, a personal digital assistant (PDA), a media player, a global positioning system (GPS) device, an e-book terminal, a digital broadcasting terminal, a navigation device, a kiosk, an MP3 player, a digital camera, a home appliance, or another mobile or non-mobile computing device, but is not limited thereto. The client device 2000 may also be a wearable device having a communication function and a data processing function, such as a watch, glasses, a hair band, or a ring. However, the disclosure is not limited thereto, and the client device 2000 may include any type of computing device capable of transmitting and receiving data to and from the server 1000 through a network for voice recognition.

Referring to FIG. 3, the client device 2000 may include a communication interface 2100, a processor 2200, a memory 2300, an input unit 2400, and an output unit 2500.

The communication interface 2100 may include one or more components for data communication with the server 1000 or an external device (not shown). Under the control of the processor 2200, the communication interface 2100 may transmit and receive data for voice recognition and a voice assistant service to and from the server 1000 or the external device (not shown). The communication interface 2100 may perform data communication with the server 1000 or another external device (not shown) by using at least one data communication method among, for example, wired LAN, wireless LAN, Wi-Fi, Bluetooth, Zigbee, Wi-Fi Direct (WFD), Infrared Data Association (IrDA) communication, Bluetooth Low Energy (BLE), Near Field Communication (NFC), Wireless Broadband Internet (WiBro), World Interoperability for Microwave Access (WiMAX), Shared Wireless Access Protocol (SWAP), Wireless Gigabit Alliance (WiGig), and RF communication.

The processor 2200 may execute one or more instructions and program code of a program stored in the memory 2300. The processor 2200 may be composed of hardware components that perform arithmetic, logic, and input/output operations and signal processing. The processor 2200 may be composed of at least one of, for example, a central processing unit (CPU), a microprocessor, a graphics processing unit (GPU), application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), and field programmable gate arrays (FPGAs), but is not limited thereto.

The memory 2300 may include, for example, at least one type of non-volatile memory among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (for example, SD or XD memory), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and an optical disk, as well as volatile memory such as random access memory (RAM) or static random access memory (SRAM).

The memory 2300 stores instructions and program code readable by the processor 2200. In the following embodiments, the operations of the processor 2200 may be implemented by executing the instructions or code of a program stored in the memory 2300.

The memory 2300 may include an operating system 2310 and an application driving module 2320. The operating system 2310 is software used to control the operations or functions of the client device 2000. The processor 2200 may execute applications on the operating system 2310.

The application driving module 2320 is a module used to run application programs. The application driving module 2320 may be used to run the language model training application 2330 and a plurality of applications 2332 and 2334. The processor 2200 may use the data and program code of the application driving module 2320 to load one or more application programs from the memory 2300 and execute them. In an embodiment, the processor 2200 may execute the language model training application 2330 by using the application driving module 2320.

The language model training application 2330 is a program provided to developers for training or developing a new language model or for updating a pre-trained language model. The language model training application 2330 may include a developer tool for interacting with the server 1000 to train a language model. The language model training application 2330 may be, for example, an application for training a natural language understanding model.

In an embodiment, by executing the language model training application 2330, the processor 2200 may display on the display unit 2510 a graphical user interface (GUI) for receiving a user input that enters input text for language model training. The processor 2200 may receive the input text through the user input unit 2410 and store it in the memory 2300. In an embodiment, the processor 2200 may control the communication interface 2100 to transmit the input text received through the graphical user interface to the server 1000.

In an embodiment, the processor 2200 may control the communication interface 2100 to receive the domain, intent, and slot tagging information that the server 1000 identified from the input text. By executing the language model training application 2330, the processor 2200 may display the identified domain, intent, and slot tagging information on the display unit 2510. In this case, the processor 2200 may also display a numerical value indicating a confidence score for each of the one or more domains and one or more intents. Likewise, the processor 2200 may display a numerical value indicating a confidence score for each of the one or more slots automatically tagged by the server 1000.

However, the disclosure is not limited thereto. The client device 2000 may itself store a pre-trained natural language understanding model in the memory 2300, and the processor 2200 may use that model to detect the domain and intent from the input text and perform slot tagging. The specific method by which the processor 2200 detects the domain, intent, and slot from the input text and performs slot tagging is the same as the method performed by the processor 1200 (see FIG. 2) of the server 1000, so a detailed description is omitted.

In an embodiment, by executing the language model training application 2330, the processor 2200 may display on the display unit 2510 a graphical user interface for receiving user inputs that select or determine the domain, intent action, and intent object corresponding to the input text. The user input unit 2410 may receive, through the graphical user interface, a user input that enters the domain, intent action, and intent object into which the input text is classified. The processor 2200 may determine the domain, intent action, and intent object of the input text based on the user input received through the user input unit 2410.

The language model training application 2330 may select one or more application program interfaces (APIs) based on the domain identified by the server 1000 or determined from a user input, for example, movies, photos, sports, weather, or aviation. In an embodiment, the language model training application 2330 may include example data (for example, example text) used not only to select or determine an existing domain but also to extend an existing domain or create a new domain. In an embodiment, by executing the language model training application 2330, the processor 2200 may receive a user input that selects one or more application program interfaces related to the domain identified or determined for the input text.

By executing the language model training application 2330, the processor 2200 may display on the display unit 2510 a graphical user interface configured to receive a user input that selects the replacement target text within the input text. In this case, the user input unit 2410 may receive any one of a mouse input, a touch input, and a drag input that selects a specific word or phrase in the input text. The processor 2200 may determine the word or phrase selected by the user input received through the user input unit 2410 as the replacement target text.

In an embodiment, the processor 2200 may receive, through the graphical user interface, a user input that selects a word or phrase corresponding to a slot related to the domain and intent detected from the input text. By executing the language model training application 2330, the processor 2200 may display at least one example slot for slot tagging that corresponds to the word or phrase determined as a slot, and may perform slot tagging based on a user input that selects one of the example slots. An embodiment in which a slot is determined from the input text based on a user input and slot tagging is performed will be described in detail with reference to FIG. 19A.

In an embodiment, when the replacement target text is selected, the processor 2200 may control the display unit 2510 to display a graphical user interface for receiving a user input that selects at least one item of context information to be considered in generating the replacement text for the replacement target text. The context information may include, for example, at least one of the age, gender, region, language, and dialect of the application user who will use the replacement text. In an embodiment, the processor 2200 may also receive a user input that selects a characteristic of the application in view of the context information. An embodiment in which the context information used to generate the replacement text is selected based on a user input will be described in detail with reference to FIG. 19B.

The processor 2200 may control the communication interface 2100 to receive, from the server 1000, at least one training candidate text generated by replacing the replacement target text in the input text with replacement text that a user is expected to utter. The at least one training candidate text is generated by the processor 1200 of the server 1000 and is the same as the at least one training candidate text described with reference to FIG. 2, so a redundant description is omitted.

Although FIG. 2 describes the server 1000 as the entity that identifies the replacement target text in the input text and generates at least one training candidate text by replacing the identified replacement target text with replacement text, the disclosure is not limited thereto. In an embodiment, the client device 2000 may identify the replacement target text in the input text and generate at least one training candidate text by replacing the identified replacement target text with replacement text.

By executing the language model training application 2330, the processor 2200 may display, on the display unit 2510, a graphical user interface for receiving a user input that selects at least one of the input text and the at least one training candidate text. In this case, the user input unit 2410 may receive, through the graphical user interface displayed on the display unit 2510, the user input selecting at least one of the input text and the at least one training candidate text. In an embodiment, by executing the language model training application 2330, the processor 2200 may display the words of the at least one training candidate text that correspond to replacement text substituted by the server 1000 so that they are visually distinguished from the other words.

The processor 2200 may select at least one text based on the received user input and, by controlling the communication interface 2100, transmit an identification value of the selected at least one text to the server 1000.

In an embodiment, by executing the language model training application 2330, the processor 2200 may display, on the display unit 2510, at least one training candidate text output through a personalized TTS model that reflects context information of a specific region or a specific person. The at least one training candidate text output through the personalized TTS model may be received from the server 1000. In an embodiment, the processor 2200 may output the at least one training candidate text received from the server 1000 as an audio signal through the sound output unit 2520, reflecting the context information of that text. A specific embodiment of displaying the at least one training candidate text output through the personalized TTS model, or outputting it as an audio signal, is described in detail with reference to FIG. 21.

The input unit 2400 may include a user input unit 2410 and a microphone 2420. The user input unit 2410 may be, for example, a mouse, a keyboard, a key pad, a dome switch, a touch pad (of a contact capacitive type, a pressure resistive film type, an infrared sensing type, a surface ultrasonic conduction type, an integral tension measurement type, a piezo effect type, or the like), a jog wheel, or a jog switch, but is not limited thereto. When the user input unit 2410 is a touch pad, it may be integrated with the display unit 2510 to form a touch screen. In this case, the user input unit 2410 may receive at least one of a touch input, a pinch input, a drag input, a swipe input, and a scroll input through the touch screen. In an embodiment, the user input unit 2410 may also receive a user gesture, for example, a stretch gesture, a pinch gesture, a select-and-hold gesture, a guide gesture, a swiping action, or a dragging action.

The microphone 2420 may receive a user's utterance and process it into an electrical voice signal. The processor 2200 may receive a voice input from the user through the microphone 2420 and obtain a voice signal from the received voice input. In an embodiment, the processor 2200 may convert sound received through the microphone 2420 into an acoustic signal and obtain the voice signal by removing noise (for example, non-voice components) from the acoustic signal.

The output unit 2500 may include a display unit 2510 and a sound output unit 2520.

Under the control of the processor 2200, the display unit 2510 may display information processed by the client device 2000. For example, the display unit 2510 may display a graphical user interface (GUI) for language model training that is output by the language model training application 2330. In an embodiment, the display unit 2510 may form a layer structure with a touch pad to constitute a touch screen. In this case, the display unit 2510 may be used as an input device as well as an output device.

The display unit 2510 may include at least one of a liquid crystal display, a thin film transistor-liquid crystal display, an organic light-emitting diode display, a flexible display, a three-dimensional (3D) display, and an electrophoretic display.

The sound output unit 2520 may output audio data. The sound output unit 2520 may include, for example, a speaker or a buzzer.

In the embodiments described with reference to FIGS. 1 to 3, the server 1000 replaces replacement target text with replacement text that is phonetically similar and that a user is predicted to pronounce. Replacement target text is text for which the output text of the ASR model 1310 (see FIG. 2) does not match text included in the plurality of pre-trained natural language understanding models 1320a to 1320c, text that a user utters incorrectly because the user knows it inaccurately, or text that is difficult to pronounce accurately. As a result of the replacement, the server 1000 generates at least one training candidate text that can serve as an alternative to the input text, and trains the natural language understanding model with it. This can provide a more accurate natural language understanding model than training with the input text alone.

The client device 2000 of the present disclosure provides a developer tool that lets a user, that is, a developer who trains a language model in order to generate a new natural language understanding model or update an existing one, perform domain, intent, and slot tagging on the input text through user inputs. The developer can thereby develop a language model specialized for a desired application. In addition, the client device 2000 of the present disclosure provides a developer tool for selecting at least one of the at least one training candidate text generated by the server 1000, so that the language model is trained only with the texts the developer wants, which can improve the accuracy of language model training.

FIG. 4 is a flowchart illustrating an embodiment in which the server 1000 of the present disclosure trains a natural language understanding model.

In step S410, the server 1000 receives, from the client device 2000, input text entered by a user. The client device 2000 may receive the input text as typed input through the user input unit 2410 (see FIG. 3) and transmit the input text to the server 1000. However, the disclosure is not limited thereto. In an embodiment, the client device 2000 may receive a voice input from the user through the microphone 2420 (see FIG. 3) and transmit a voice signal including the voice input to the server 1000. The processor 1200 (see FIG. 2) of the server 1000 may perform automatic speech recognition (ASR) on the received voice signal using the ASR model 1310 (see FIG. 2), thereby converting the voice signal into text and obtaining the input text from it.

In step S420, the server 1000 may identify, among the at least one word included in the input text, replacement target text that needs to be replaced. The "replacement target text" is a word or phrase that a user of an application, which includes the language model updated or newly generated through training, is likely to utter incorrectly when speaking a voice command or the like because the user knows it inaccurately, or that is difficult to pronounce accurately. For example, the replacement target text may include a named entity, a keyword, a place, a place name, a movie title, or a game term. In an embodiment, the replacement target text may be a slot identified by interpreting the input text using a pre-trained natural language understanding model. For example, when the input text is "페퍼로니 피자 3판 언주로 30길로 배달 주문해 줘" ("Order three pepperoni pizzas for delivery to Eonju-ro 30-gil"), the replacement target text may be '페퍼로니' (pepperoni) or '언주로 30길' (Eonju-ro 30-gil), each of which is identified as a named entity or a slot.

In an embodiment, the replacement target text may be determined based on a user input. In an embodiment, the client device 2000 may receive a user input selecting a word or phrase included in the input text and transmit identification information of the selected word or phrase to the server 1000. Based on the identification information received from the client device 2000, the processor 1200 of the server 1000 may identify the word or phrase selected by the user input and determine the identified word or phrase as the replacement target text.

In an embodiment, the server 1000 may identify the replacement target text based on a result of searching for words pre-stored in the dictionary DB 1370 (see FIG. 2). In an embodiment, the processor 1200 of the server 1000 may parse the input text into words, morphemes, and phrases, search the dictionary DB 1370 for at least one parsed word, and, based on the search result, determine as the replacement target text a word that is not found or whose frequency of use is lower than a preset first threshold.

In an embodiment, the processor 1200 of the server 1000 may identify a slot in the input text by interpreting the input text using a pre-trained natural language understanding model, and may determine the identified slot as the replacement target text.

Specific embodiments in which the server 1000 identifies the replacement target text are described in detail with reference to FIGS. 5 to 7.

In step S430, the server 1000 generates replacement text that a user is expected to utter in place of the replacement target text and that has high phonetic similarity to it. "Replacement text" is an example of text that is phonetically similar to, or that a user is predicted to pronounce for, a specific named entity, a keyword, or a word or phrase identified as a slot by a pre-trained natural language understanding model. For example, when the replacement target text is '페퍼로니' (pepperoni), the replacement text may be '페파로니', '파페로니', '페포로니', or the like, which a user may utter with an inaccurate or similar pronunciation when pronouncing the replacement target text.

The server 1000 generates replacement text that can replace the replacement target text identified in step S420. In an embodiment, the processor 1200 may generate the replacement text using the replacement text generation module 1350 (see FIG. 2).

When the replacement target text corresponds to a slot that was automatically identified by interpreting the input text using a pre-trained natural language understanding model, the replacement text may acquire the same slot information as the replacement target text. In other words, the replacement text not only replaces the replacement target text but also inherits its slot information. For example, when the slot identified from the input text is '페퍼로니' (pepperoni), meaning a pizza topping, the replacement texts '페파로니', '파페로니', and '페포로니' also acquire "pizza topping" as their slot information, the same as the replacement target text '페퍼로니'.
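As an illustration only, the slot inheritance described above might be sketched in Python as follows. The `SlotSpan` class, the `inherit_slot` function, and the `pizza_topping` label are hypothetical names introduced for this sketch; they do not appear in the disclosure.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class SlotSpan:
    """A word or phrase together with its slot label (hypothetical structure)."""
    text: str
    slot: str


def inherit_slot(target: SlotSpan, replacement_texts: List[str]) -> List[SlotSpan]:
    """Give every replacement text the same slot label as the replacement target."""
    return [replace(target, text=t) for t in replacement_texts]


# 'pizza_topping' is an assumed label for the "pizza topping" slot.
target = SlotSpan(text="페퍼로니", slot="pizza_topping")
replacements = inherit_slot(target, ["페파로니", "파페로니", "페포로니"])
```

Each element of `replacements` carries a different surface text but the same slot label as `target`, so the slot annotation does not need to be repeated for each variant.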

In an embodiment, the server 1000 may extract a phoneme sequence for the replacement target text, search the words pre-stored in the dictionary DB 1370 (see FIG. 2) for text similar to the extracted phoneme sequence based on the phonetic relevance of phoneme sequences, and, based on the search result, generate the replacement text using at least one text having high similarity to the extracted phoneme sequence.
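A minimal sketch of such a similarity search is given below, under stated assumptions. The disclosure does not specify how the phoneme sequence is extracted or how similarity is scored; here the Unicode decomposition of Hangul syllables into jamo stands in for a phoneme sequence, and the ratio from Python's `difflib.SequenceMatcher` stands in for the similarity measure. The function names, the word list, and the 0.7 cutoff are hypothetical.

```python
import unicodedata
from difflib import SequenceMatcher
from typing import List, Tuple


def to_jamo(text: str) -> str:
    """Decompose Hangul syllables into jamo (an assumed stand-in for a phoneme sequence)."""
    return unicodedata.normalize("NFD", text)


def phonetic_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between the jamo sequences of two texts."""
    return SequenceMatcher(None, to_jamo(a), to_jamo(b)).ratio()


def search_similar(target: str, dictionary_words: List[str],
                   min_similarity: float = 0.7) -> List[Tuple[str, float]]:
    """Return dictionary words at least min_similarity similar to target, best first."""
    scored = [(w, phonetic_similarity(target, w)) for w in dictionary_words if w != target]
    hits = [(w, s) for w, s in scored if s >= min_similarity]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical dictionary contents.
results = search_similar("페퍼로니", ["페파로니", "피자", "배달"])
```

With this toy word list, only '페파로니' passes the cutoff, because it differs from '페퍼로니' in a single vowel.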

In another embodiment, the server 1000 may convert the replacement target text into an embedding vector, use a neural network model to generate text having a vector value similar to the converted embedding vector, and generate the replacement text using the generated text.
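The idea of finding text whose vector is close to that of the replacement target text can be illustrated with the following sketch. It deliberately swaps the neural network model of the embodiment for a simple nearest-neighbor lookup by cosine similarity over a small table; the three-dimensional vectors, the table contents, and the function names are hypothetical and serve only to show the comparison step.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_texts(target_vec: List[float], table: Dict[str, List[float]],
                  top_k: int = 2) -> List[str]:
    """Return the top_k texts whose vectors are most similar to target_vec."""
    ranked = sorted(table, key=lambda text: cosine(target_vec, table[text]), reverse=True)
    return ranked[:top_k]


# Made-up embedding vectors for illustration only.
embedding_table = {
    "페파로니": [0.9, 0.1, 0.0],
    "파페로니": [0.8, 0.2, 0.1],
    "피자": [0.0, 0.1, 0.9],
}
target_vector = [1.0, 0.1, 0.0]  # assumed embedding of '페퍼로니'
nearest = nearest_texts(target_vector, embedding_table)
```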

In still another embodiment, the server 1000 may convert the replacement target text into a sound signal (wave signal) using the TTS model 1330 (see FIG. 2), output the converted sound signal, and convert the output sound signal into output text using the ASR model 1310 (see FIG. 2). The server 1000 may generate the replacement text using the output text.

Specific embodiments in which the server 1000 generates the replacement text are described in detail with reference to FIGS. 8 to 13.

In step S440, the server 1000 generates at least one training candidate text to be used for training by replacing the replacement target text with the replacement text. A "training candidate text" is text that is a candidate for the training data used as input data in language model training. In an embodiment, the processor 1200 may use the replacement text generation module 1350 to generate at least one training candidate text by replacing the replacement target text in the input text with the replacement text, or by paraphrasing, that is, concatenating the replacement text with the words or phrases of the input text other than the replacement target text.

For example, when the input text is "페퍼로니 피자 3판 언주로 30길로 배달 주문해 줘" ("Order three pepperoni pizzas for delivery to Eonju-ro 30-gil"), the server 1000 may replace '페퍼로니', identified as the replacement target text, with '페파로니', '파페로니', and '페포로니' respectively, thereby generating three training candidate texts: "페파로니 피자 3판 언주로 30길로 배달 주문해 줘", "파페로니 피자 3판 언주로 30길로 배달 주문해 줘", and "페포로니 피자 3판 언주로 30길로 배달 주문해 줘".
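The substitution in this example can be sketched in Python as follows. This is an illustrative sketch rather than the implementation of the replacement text generation module 1350; the function name `generate_candidates` is hypothetical, and plain string replacement is assumed to be sufficient for the example sentence.

```python
from typing import List


def generate_candidates(input_text: str, target: str, replacements: List[str]) -> List[str]:
    """Build one training candidate text per replacement text.

    Only the first occurrence of the replacement target text is substituted.
    Returns an empty list when the target does not occur in the input text.
    """
    if target not in input_text:
        return []
    return [input_text.replace(target, r, 1) for r in replacements]


input_text = "페퍼로니 피자 3판 언주로 30길로 배달 주문해 줘"
candidates = generate_candidates(input_text, "페퍼로니", ["페파로니", "파페로니", "페포로니"])
```

Here `candidates` holds the three training candidate texts listed above, each identical to the input text except for the substituted word.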

In step S450, the server 1000 trains the natural language understanding model using, as training data, at least one text among the input text and the at least one training candidate text.

Between step S440 and step S450, the server 1000 may identify a domain, an intent, and a slot from the input text and the at least one training candidate text by interpreting them using a pre-trained natural language understanding model. In an embodiment, the server 1000 may perform the training by using the information about the identified domain, intent, and slot as input data to the natural language understanding model.

In an embodiment, the server 1000 may select, based on the domain and intent identified from the input text and the at least one training candidate text, one of the plurality of natural language understanding models 1320a to 1320c pre-stored in the memory 1300 (see FIG. 2), and may perform the training using the selected natural language understanding model.

However, the disclosure is not limited thereto, and in another embodiment, the server 1000 may generate a new natural language understanding model through training that uses the input text and the at least one training candidate text.

In an embodiment, the server 1000 may perform the training by selecting only some of the at least one training candidate text, rather than using all of it, and inputting the selected text to the natural language understanding model as input data. The server 1000 may transmit the at least one training candidate text generated in step S440 to the client device 2000 and receive, from the client device 2000, an identification value of the text selected by a user input that selects at least one of the at least one training candidate text. Based on the received identification value, the server 1000 may select at least one text and perform the training by inputting the selected at least one text and the input text to the natural language understanding model as input data.
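The selection by identification value might look like the following sketch. The disclosure does not define the form of the identification value; this sketch assumes, purely for illustration, that it is the zero-based position of a training candidate text in the list sent to the client device 2000. The function name `build_training_set` is hypothetical.

```python
from typing import List


def build_training_set(input_text: str, candidates: List[str],
                       selected_ids: List[int]) -> List[str]:
    """Combine the input text with the candidates chosen by identification value.

    Identification values that do not match any candidate are ignored.
    """
    by_id = dict(enumerate(candidates))  # assumed: identification value == list index
    selected = [by_id[i] for i in selected_ids if i in by_id]
    return [input_text] + selected


candidate_texts = [
    "페파로니 피자 3판 언주로 30길로 배달 주문해 줘",
    "파페로니 피자 3판 언주로 30길로 배달 주문해 줘",
    "페포로니 피자 3판 언주로 30길로 배달 주문해 줘",
]
# The developer selected the first and third candidates; the value 7 is invalid.
training_texts = build_training_set(
    "페퍼로니 피자 3판 언주로 30길로 배달 주문해 줘", candidate_texts, [0, 2, 7]
)
```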

FIG. 5 is a flowchart of an embodiment in which the server 1000 of the present disclosure determines the replacement target text based on a user input. The steps of the method shown in FIG. 5 are a detailed implementation of step S420 of FIG. 4, and step S510 is performed after step S410 of FIG. 4. After step S530 of FIG. 5 is performed, step S430 of FIG. 4 is performed.

In step S510, the server 1000 receives, from the client device 2000, a user input selecting at least one word in the input text. In an embodiment, the client device 2000 may receive a user input selecting a word or phrase included in the input text. For example, the client device 2000 may receive a mouse click, a drag input, or a touch-and-drag input that selects a specific word or phrase in the input text. The client device 2000 may select the word or phrase in the input text based on the received user input. The word or phrase selected by the user input may include, for example, a named entity, a keyword, a place, a place name, a movie title, or a game term, but is not limited thereto.

In an embodiment, the client device 2000 may receive a user input that selects the replacement target text based on the development purpose of the application that uses the language model. For example, when the application is developed with a regional dialect, a language of use (for example, English or Spanish), or users of a specific age group or gender in mind, the client device 2000 may receive a user input that selects, as the replacement target text, a word in the input text that is likely to be mispronounced because of the dialect as a word not included in the natural language understanding model, or that is likely to be replaced depending on the language of use. In an embodiment, the client device 2000 may receive a user input selecting an application development purpose that takes into account the age, region, language, gender, and the like of the application's users, and may automatically recommend replacement target text according to the received user input. A specific embodiment in which the client device 2000 receives a user input selecting the replacement target text is described in detail with reference to FIG. 18.

The client device 2000 may transmit, to the server 1000, identification information of the word or phrase selected as the replacement target text. The server 1000 may receive, from the client device 2000, the identification information of the word or phrase selected by the user input.

In step S520, the server 1000 identifies the at least one word selected based on the user input. In an embodiment, the server 1000 may identify the word or phrase in the input text that was selected by the user input by analyzing the identification information received from the client device 2000.

In step S530, the server 1000 determines the identified word as the replacement target text. In an embodiment, the server 1000 may determine a phrase, as well as a word, as the replacement target text.

FIG. 6 is a flowchart of an embodiment in which the server 1000 of the present disclosure identifies the replacement target text based on a search result of the dictionary DB 1370 (see FIG. 2). The steps of the method shown in FIG. 6 are a detailed implementation of step S420 of FIG. 4, and step S610 is performed after step S410 of FIG. 4. After step S630 of FIG. 6 is performed, step S430 of FIG. 4 is performed.

In step S610, the server 1000 parses the input text to extract at least one word. In an embodiment, the server 1000 may parse the input text into words, morphemes, and phrases, and extract at least one word from the parsed text using linguistic features (for example, grammatical elements) of the morphemes, words, or phrases.

In step S620, the server 1000 searches for the at least one word extracted by the parsing in the dictionary DB 1370 (see FIG. 2), which stores phoneme sequence information or embedding vector information. The dictionary DB 1370 may be a database that stores phoneme sequence information for words and embedding vectors for the phoneme sequences included in each of the words. The dictionary DB 1370 may be, for example, a phoneme sequence dictionary. In an embodiment, the dictionary DB 1370 may include a look-up table containing the phoneme sequence information and embedding vector for each of the words.

In an embodiment, the server 1000 may convert the at least one word extracted by the parsing into a phoneme sequence or into an embedding vector. In an embodiment, the server 1000 may convert the at least one word into an embedding vector using a word embedding method. For example, the server 1000 may convert the at least one word into an embedding vector using an embedding model such as Bag of Words or word2vec.

The server 1000 may search the dictionary DB 1370 for the phoneme sequence information or embedding vector converted from the at least one word.

In step S630, the server 1000 determines, based on the search result of the dictionary DB 1370, a word that is not found or that has a low frequency of use as the replacement target text. In one embodiment, the server 1000 searches the dictionary DB 1370 for the at least one word extracted by parsing the input text, and may determine as the replacement target text a word that is not found because it is not included in the dictionary DB 1370, or a word whose frequency of use in the dictionary DB 1370 is lower than a preset first threshold.
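Steps S610 to S630 can be illustrated with a minimal sketch. It assumes the dictionary DB is a plain mapping from word to usage frequency and that whitespace splitting stands in for the parser; the name `find_replacement_targets`, the sample entries, and the frequency values are hypothetical and do not come from the disclosure.

```python
from typing import Dict, List

# Hypothetical dictionary DB: word -> usage frequency (illustrative values only).
DICTIONARY_DB: Dict[str, int] = {
    "please": 9100,
    "order": 5200,
    "snack": 830,
    "solomon": 12,
}


def find_replacement_targets(text: str, db: Dict[str, int], first_threshold: int) -> List[str]:
    """Return the words that are missing from the DB or rarer than the threshold."""
    # S610: whitespace splitting is a stand-in for word/morpheme/phrase parsing.
    words = text.lower().split()
    targets = []
    for word in words:
        # S620: look the extracted word up in the dictionary DB.
        frequency = db.get(word)
        # S630: a missing or low-frequency word becomes a replacement target.
        if frequency is None or frequency < first_threshold:
            targets.append(word)
    return targets


targets = find_replacement_targets("Please order snack Sulomon", DICTIONARY_DB, first_threshold=100)
```

In this sketch a word is flagged either because the lookup fails or because its stored frequency falls below the first threshold, which mirrors the two conditions described for step S630.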

FIG. 7 is a flowchart of an embodiment in which the server 1000 of the present disclosure automatically identifies the replacement target text using a natural language understanding model. Among the steps of the method shown in FIG. 7, steps S710 to S730 are performed between steps S410 and S420 of FIG. 4. Step S740 of FIG. 7 is a specific embodiment of step S420 shown in FIG. 4. After step S740 of FIG. 7 is performed, step S430 shown in FIG. 4 is performed.

In step S710, the server 1000 detects the domain into which the input text is classified by interpreting the input text using a natural language understanding model. In one embodiment, the server 1000 may interpret the input text using a domain identification model included in any one of the pre-trained first natural language understanding model 1320a (see FIG. 2) to third natural language understanding model 1320c (see FIG. 2), and thereby detect a domain, which is category information indicating the category to which the input text belongs or into which it can be classified. In one embodiment, the server 1000 may identify one or more domains from the input text. In one embodiment, the server 1000 may interpret the input text using the domain identification model and calculate the degree of relevance between the input text and a domain as a numerical value. In one embodiment, the server 1000 may calculate the degree of relevance between the input text and each of the one or more domains as a probability value, and determine a domain having a high probability value among the calculated probability values as the domain into which the input text is classified.
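The probability-based domain selection of step S710 can be sketched as follows. The sketch assumes the domain identification model emits one raw score per domain and that the scores are normalized with a softmax; the disclosure does not state how the probability values are obtained, and the function name `classify_domain` and the domain labels are hypothetical. Selecting the single highest probability is one reading of "a domain having a high probability value".

```python
import math
from typing import Dict, Tuple


def classify_domain(scores: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Turn per-domain scores into probabilities and pick the most probable domain."""
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores.values())
    exps = {domain: math.exp(score - max_score) for domain, score in scores.items()}
    total = sum(exps.values())
    probabilities = {domain: value / total for domain, value in exps.items()}
    # Assumption: the domain with the highest probability is the selected one.
    best_domain = max(probabilities, key=probabilities.get)
    return best_domain, probabilities


# Hypothetical scores for three candidate domains.
domain, probabilities = classify_domain({"shopping": 2.0, "music": 0.5, "weather": -1.0})
```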

In step S720, the server 1000 detects the intent of the input text by interpreting the input text using the natural language understanding model. In one embodiment, the server 1000 may detect the intent from the input text using an intent identification model of a pre-trained natural language understanding model. In one embodiment, the server 1000 may detect an intent action from the input text using an intent action identification model included in any one of the first natural language understanding model 1320a to the third natural language understanding model 1320c, and may detect an intent object from the input text using an intent object identification model.

The intent action refers to the action that the input text performs, for example, searching, posting, playing, purchasing, or ordering. The server 1000 may detect the intent action from the input text by performing syntactic analysis or semantic analysis using the intent action identification model. In one embodiment, the server 1000 may use the intent action identification model to parse the input text into units of morphemes, words, or phrases, and may infer the meaning of a word or phrase extracted from the parsed text using linguistic features (for example, grammatical elements) of the parsed morphemes, words, or phrases. The server 1000 may determine the intent action corresponding to the inferred meaning of the word or phrase by comparing that meaning with predefined intents provided by the natural language understanding model.

The intent object refers to an object related to the identified intent action. The intent object is the object targeted by the detected intent action, and may be, for example, a movie, a photo, sports, weather, or a flight. The server 1000 may detect the intent object related to the intent action from the input text by interpreting the input text using the intent object identification model.

In step S730, the server 1000 identifies a slot from the input text using the natural language understanding model and performs slot tagging. In one embodiment, the server 1000 may identify a slot from the input text using a slot tagging model included in any one of the pre-trained first natural language understanding model 1320a to third natural language understanding model 1320c, and may perform slot tagging that associates the identified slot with the domain, the intent action, and the intent object. A slot refers to variable information used to obtain, from the input text, detailed information related to the domain, the intent action, and the intent object, or to determine a detailed operation. For example, a slot may include a named entity, a keyword, a place, a place name, a movie title, or a game term. In one embodiment, a slot is information related to the intent action and the intent object, and a plurality of types of slots may correspond to one intent.

In one embodiment, the server 1000 may identify one or more slots from the input text.

In step S740, the server 1000 determines the text corresponding to the identified slot as the replacement target text. When a plurality of slots are identified from the input text, the server 1000 may determine any one of the slots as the replacement target text.

In one embodiment, the server 1000 may determine at least one selected slot as the replacement target text based on a user input that selects at least one of the plurality of slots. The user input selecting at least one of the plurality of slots may be received through the client device 2000. The client device 2000 may display a list including the plurality of slots identified from the input text, receive a user input selecting at least one of the slots included in the list, and select at least one slot based on the received user input. The client device 2000 may transmit identification information about the at least one selected slot to the server 1000. A specific embodiment in which the client device 2000 receives a user input selecting at least one slot to be determined as the replacement target text from among the plurality of slots is described in detail with reference to FIG. 18.

In one embodiment, the server 1000 may determine any one of the plurality of slots as the replacement target text based on the characteristics of the application that uses the replacement text. In one embodiment, the server 1000 may determine any one of the plurality of slots as the replacement target text based on characteristic information of the users of the application that uses the language understanding model, such as age, gender, region, language used, dialect, and intonation. In one embodiment, the client device 2000 may receive a user input selecting at least one of a plurality of pieces of application characteristic information, and may transmit the characteristic information selected based on the received user input to the server 1000. The server 1000 may determine any one of the plurality of slots as the replacement target text based on the application characteristic information received from the client device 2000. A specific embodiment in which the client device 2000 receives a user input selecting application characteristic information is described in detail with reference to FIG. 19B.

Although not shown in FIG. 7, the server 1000 may generate, for the replacement target text, replacement text that a user is expected to utter and that has a high phonetic similarity to the replacement target text (see step S430 of FIG. 4). In this case, the replacement text may be tagged with the same slot as the slot with which the replacement target text is tagged. In other words, the replacement text may inherit the slot information of the replacement target text as it is.
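The slot inheritance described above can be sketched as follows, assuming a slot is represented by a label plus a character span in the sentence. The `Slot` class, the label `product_name`, the function `substitute_slot`, and the sample sentence are hypothetical illustrations rather than structures defined in the disclosure.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Slot:
    label: str  # slot type, e.g. a product name (hypothetical label)
    text: str   # surface text covered by the slot
    start: int  # inclusive character offset in the sentence
    end: int    # exclusive character offset in the sentence


def substitute_slot(sentence: str, slot: Slot, replacement_text: str) -> Tuple[str, Slot]:
    """Swap the slot text for the replacement text and keep the slot label."""
    new_sentence = sentence[:slot.start] + replacement_text + sentence[slot.end:]
    # The replacement inherits the label; only the text and span end change.
    new_slot = replace(slot, text=replacement_text, end=slot.start + len(replacement_text))
    return new_sentence, new_slot


sentence = "order the snack sulomon"
start = sentence.index("sulomon")
target = Slot(label="product_name", text="sulomon", start=start, end=start + len("sulomon"))
candidate, inherited = substitute_slot(sentence, target, "solomon")
```

Because the label is copied unchanged, the generated training candidate needs no second slot-tagging pass for the substituted span.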

The domain detected in step S710, the intent detected in step S720, and the slot tagging information obtained in step S730 may be provided to the natural language understanding model as input data.

Although steps S710 to S730 of FIG. 7 are shown as being performed before step S420, in which the replacement target text is determined, embodiments of the present disclosure are not limited to FIG. 7. The steps of detecting the domain and the intent from the input text and performing slot tagging may also be performed after the training candidate text is generated. A specific embodiment of this is described in detail with reference to FIG. 14.

FIG. 8 is a diagram illustrating an embodiment in which the server 1000 of the present disclosure obtains replacement text 830 from replacement target text 810.

Referring to FIG. 8, the server 1000 may obtain the replacement text 830 from the replacement target text 810 using a dictionary DB 800. The dictionary DB 800 is a database that stores information about a phoneme sequence 804 for each of a plurality of words 802 and information about an embedding vector 806 obtained by converting each of the plurality of words 802 into a vector value. The dictionary DB 800 may be, for example, a pronunciation dictionary. In one embodiment, the dictionary DB 800 may be replaced with a look-up table containing the phoneme sequence 804 information and the embedding vector 806 for each of the plurality of words 802.

In one embodiment, the processor 1200 (see FIG. 2) of the server 1000 may divide the replacement target text 810 into word units or morpheme units, and may extract a phoneme sequence 820 from the corpus of word units and morpheme units that results from the division. In one embodiment, the processor 1200 may extract the phoneme sequence 820 from the replacement target text 810 using a rule-based model. Because methods of extracting the phoneme sequence 820 from the replacement target text 810 are known in the art, a detailed description is omitted. For example, when the replacement target text 810 is '설로몬' (rendered here as 'Sulomon'), the processor 1200 may divide '설로몬' into word units and extract a phoneme sequence 820 such as '[S_B OW_I L_I R_I OW_I M_I OW_I NN_E]'.

The server 1000 may obtain the replacement text 830 that is to replace the replacement target text 810 based on the phonetic relevance of the phoneme sequences. In one embodiment, the processor 1200 may search the dictionary DB 800 for the replacement target text 810 and determine the replacement text based on the similarity 808 of the phoneme sequences found by the search. In one embodiment, the processor 1200 may generate a list by extracting, from the words found in the dictionary DB 800, at least one word whose similarity 808 is equal to or greater than a preset second threshold, and may determine the word with the highest similarity among the words in the list as the replacement text 830. The second threshold is the reference value for judging the similarity between a word found in the dictionary DB 800 and the replacement target text 810. In one embodiment, the second threshold may be set by a user input received through the client device 2000. The UI for setting the second threshold is described in detail with reference to FIG. 18.

For example, when the second threshold is set relatively low, words similar to the replacement target text 810 are more likely to be found in the dictionary DB 800, so a relatively large number of replacement texts 830 may be determined. Conversely, when the second threshold is set relatively high, words similar to the replacement target text 810 are less likely to be found in the dictionary DB 800, so a relatively small number of replacement texts 830 may be determined.

In the embodiment shown in FIG. 8, searching the dictionary DB 800 for '설로몬' ('Sulomon') returns '솔로몬' ('Solomon') with a similarity of 0.7822. Because this similarity exceeds the preset second threshold, the processor 1200 may obtain '솔로몬' ('Solomon') as the replacement text 830 for '설로몬' ('Sulomon').
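The threshold-based selection in FIG. 8 can be sketched as follows. The disclosure does not specify how the similarity 808 is computed, so this sketch substitutes the sequence-matching ratio from Python's `difflib` as a stand-in measure. The phoneme symbols, the dictionary entries, and the function names are hypothetical, and the resulting scores are not expected to reproduce the 0.7822 value shown in FIG. 8.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

# Hypothetical pronunciation dictionary: word -> phoneme sequence.
PRONUNCIATION_DB: Dict[str, List[str]] = {
    "solomon": ["S", "AA", "L", "AH", "M", "AH", "N"],
    "salmon": ["S", "AE", "M", "AH", "N"],
    "lemon": ["L", "EH", "M", "AH", "N"],
}


def phonetic_similarity(a: List[str], b: List[str]) -> float:
    """Stand-in similarity in [0, 1] between two phoneme sequences."""
    return SequenceMatcher(None, a, b).ratio()


def find_replacement(
    target_phonemes: List[str],
    db: Dict[str, List[str]],
    second_threshold: float,
) -> Tuple[Optional[str], List[Tuple[str, float]]]:
    """Return the most similar word and the full list of words at or above the threshold."""
    scored = [(word, phonetic_similarity(target_phonemes, phonemes)) for word, phonemes in db.items()]
    # Keep only the words whose similarity reaches the second threshold.
    candidates = [(word, score) for word, score in scored if score >= second_threshold]
    candidates.sort(key=lambda item: item[1], reverse=True)
    best = candidates[0][0] if candidates else None
    return best, candidates


# Hypothetical phoneme sequence for the replacement target text.
sulomon = ["S", "AH", "L", "AH", "M", "AH", "N"]
best, candidates = find_replacement(sulomon, PRONUNCIATION_DB, second_threshold=0.7)
```

Lowering `second_threshold` lets more dictionary words into the candidate list, and raising it admits fewer, which matches the described effect of the second threshold on the number of replacement texts.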

FIG. 9 is a flowchart of an embodiment in which the server 1000 of the present disclosure obtains replacement text from replacement target text. The steps of the method shown in FIG. 9 are a specific embodiment of step S430 of FIG. 4, and step S910 is performed after step S420 shown in FIG. 4. After step S930 of FIG. 9 is performed, step S440 shown in FIG. 4 is performed.

In step S910, the server 1000 extracts a phoneme sequence for the replacement target text. In one embodiment, the server 1000 may divide the replacement target text into word units or morpheme units, and may extract the phoneme sequence from the corpus of word units and morpheme units that results from the division. In one embodiment, the server 1000 may extract the phoneme sequence from the replacement target text using a rule-based model, but is not limited thereto. The server 1000 may extract the phoneme sequence from the replacement target text using any known method.

In step S920, the server 1000 searches, based on the phonetic relevance of the phoneme sequences, for text whose phoneme sequence is similar to the extracted phoneme sequence among the plurality of words included in the dictionary DB.

In step S930, the server 1000 obtains the replacement text using at least one text, among the texts found, that has a high similarity to the extracted phoneme sequence. In one embodiment, the server 1000 may search the dictionary DB for the replacement target text and generate a list by extracting, from the words found in the dictionary DB, at least one word whose similarity is equal to or greater than the preset second threshold. The server 1000 may determine, as the replacement text, the word in the list that has the highest similarity to the replacement target text.

FIG. 10 is a diagram illustrating an embodiment in which the server 1000 of the present disclosure generates replacement text 1050 from replacement target text 1010 using a neural network model.

Referring to FIG. 10, the server 1000 may convert the replacement target text 1010 into an embedding vector 1030. In one embodiment, the processor 1200 (see FIG. 2) of the server 1000 may convert the replacement target text 1010 into the embedding vector 1030 using a word embedding model 1020. The word embedding model 1020 is a model configured to represent the words that make up a text as numerical vector values. For example, the word embedding model 1020 may convert the replacement target text 1010 into the embedding vector 1030 using an embedding model such as Bag of Words or word2vec. However, the present disclosure is not limited thereto, and the processor 1200 may convert the replacement target text 1010 into the embedding vector 1030 using any known embedding model. In the embodiment shown in FIG. 10, the processor 1200 may use the word embedding model 1020 to convert the replacement target text 1010 '설로몬' ('Sulomon') into an N-dimensional embedding vector 1030 having the vector values [0.00002645, 0.23621, 0.00499278, …].

The server 1000 may use a neural network model to generate replacement text 1050 having a vector value similar to the converted embedding vector 1030. In one embodiment, the processor 1200 may use a generative model (generative network) 1040 to generate text having a vector value similar to the embedding vector 1030 of the replacement target text 1010. The generative model 1040 may include a generator model 1044.

The generative model 1040 may be configured as, for example, a generative adversarial network (GAN). When the generative model 1040 is configured as a generative adversarial network, it may include a discriminator model 1042 and a generator model 1044. The discriminator model 1042 and the generator model 1044 may be trained adversarially against each other. The generator model 1044 is a model trained to generate, based on the embedding vector 1030 of the replacement target text 1010 that is input to the generative adversarial network, fake text capable of fooling the discriminator model 1042. The discriminator model 1042 is a model trained to identify the fake text generated by the generator model 1044 as fake. The discriminator model 1042 and the generator model 1044 may be trained until they reach an equilibrium point. The loss function of the generative adversarial network may be modified appropriately so that, while the discriminator model 1042 and the generator model 1044 reach the equilibrium point, the generator model 1044 sufficiently approximates the data probability distribution of the genuine text, that is, the replacement target text 1010.

In one embodiment, the probability that the discriminator model 1042 judges the fake texts generated by the generator model 1044 to be not fake, that is, to be texts that can replace the replacement target text 1010, may converge to 0.5. In this case, the processor 1200 may determine the text generated by the generator model 1044 as the replacement text 1050. In the embodiment shown in FIG. 10, the processor 1200 may generate '솔로몬' ('Solomon') as the replacement text 1050 for the replacement target text 1010 '설로몬' ('Sulomon').
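The convergence to 0.5 follows from the standard GAN formulation, which is reproduced here for reference. The disclosure states that the loss function may be modified, so the objective below is the textbook form and is not necessarily the one the disclosure uses. Here D and G denote the discriminator and the generator, p_data is the distribution of genuine text, p_g is the distribution of generated text, and z stands for the generator input, which in FIG. 10 would correspond to the embedding vector 1030 (an assumed mapping).

```latex
\[
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
\]

\[
D^{*}(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_{g}(x)},
\qquad
p_{g} = p_{\mathrm{data}} \;\Longrightarrow\; D^{*}(x) = \tfrac{1}{2}
\]
```

For a fixed generator the optimal discriminator is D*, and when the generated distribution matches the genuine distribution the discriminator can do no better than output 1/2 for every input, which is the equilibrium described above.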

Although FIG. 10 is described as using the generative model 1040 to generate replacement text 1050 similar to the embedding vector 1030 of the replacement target text 1010, the present disclosure is not limited thereto. In one embodiment, the server 1000 may generate the replacement text 1050 using at least one neural network model among a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), and deep Q-networks.

FIG. 11 is a flowchart of an embodiment in which the server 1000 of the present disclosure generates replacement text using a neural network model. The steps of the method shown in FIG. 11 are a specific embodiment of step S430 of FIG. 4, and step S1110 is performed after step S420 shown in FIG. 4. After step S1130 of FIG. 11 is performed, step S440 shown in FIG. 4 is performed.

In step S1110, the server 1000 converts the replacement target text into an embedding vector. In one embodiment, the server 1000 may convert the replacement target text into an embedding vector using a word embedding method, which represents a word of a text as a numerical vector value. For example, the server 1000 may convert the replacement target text into an embedding vector using an embedding model such as Bag of Words or word2vec.
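To make the idea of "similar vector values" concrete, the sketch below embeds a word as a bag of character bigrams and compares two embeddings with cosine similarity. This is a toy stand-in for the Bag of Words or word2vec models named above, and comparing fixed candidate words is a simplification of the neural text generation in the later steps; the function names `embed` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def embed(word: str) -> Counter:
    """Toy embedding: counts of character bigrams, with '#' marking word boundaries."""
    padded = f"#{word.lower()}#"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    # Counter returns 0 for missing keys, so absent bigrams contribute nothing.
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


target_vector = embed("sulomon")
similarity_to_solomon = cosine(target_vector, embed("solomon"))
similarity_to_lemon = cosine(target_vector, embed("lemon"))
```

In this toy setting 'solomon' shares more bigrams with 'sulomon' than 'lemon' does, so its vector lies closer to the target vector. A learned embedding would capture similarity differently, but the comparison step has the same shape.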

In step S1120, the server 1000 uses a neural network model to generate text having a vector value similar to the converted embedding vector. The server 1000 may generate text having a vector value similar to the embedding vector of the replacement target text using, for example, at least one neural network model among a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a generative adversarial network (GAN), a deep convolutional generative adversarial network (DCGAN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), and deep Q-networks.

In one embodiment, the server 1000 may use a generative model to generate text having a vector value similar to the embedding vector of the replacement target text. The generative model may be configured as, for example, a generative adversarial network. In one embodiment, the server 1000 may perform adversarial training in which the discriminator model 1042 (see FIG. 10) judges whether the text that the generator model 1044 (see FIG. 10) of the generative adversarial network generates from the embedding vector of the replacement target text is fake. When, as a result of the training, the probability that the discriminator model 1042 judges the fake texts generated by the generator model 1044 to be not fake, that is, to be texts that can replace the replacement target text, converges to 0.5, the server 1000 may output the generated text.

In step S1130, the server 1000 generates the replacement text using the generated text.

FIG. 12 is a diagram illustrating an embodiment in which the server 1000 of the present disclosure generates replacement text 1230 from replacement target text 1210 using an ASR model 1310 and a TTS model 1330.

Referring to FIG. 12, the processor 1200 (see FIG. 2) of the server 1000 may convert the replacement target text 1210 into an acoustic signal (wave signal) 1220 using the TTS model 1330. The TTS model 1330 is a model configured to convert text into an acoustic signal. In one embodiment, the processor 1200 may convert the replacement target text 1210 into the acoustic signal 1220 using the TTS model 1330 and output the converted acoustic signal 1220 as a signal in binary data streaming form. In the embodiment shown in FIG. 12, the processor 1200 may use the TTS model 1330 to convert '설로몬' ('Sulomon'), which is the replacement target text 1210 in the input text '과자 설로몬 주문해줘~' ('Order the snack Sulomon~'), into the acoustic signal 1220, and may encode the converted acoustic signal 1220 as a binary data streaming signal and output it.

In one embodiment, the processor 1200 may convert the replacement target text 1210 into a custom acoustic signal using a personalized TTS model. The personalized TTS model is a model that generates an acoustic signal from text while reflecting personalized characteristics of the speaker, such as age, gender, region, dialect, intonation, and pronunciation (e.g., British English or American English). The personalized TTS model may be generated by recording the acoustic signal of a specific user and training a deep neural network model with the recorded acoustic signal and the corresponding text as training data. The personalized TTS model may be stored in advance in the memory 1300 (see FIG. 2). In one embodiment, the server 1000 may receive, from the client device 2000, a personalized characteristic value selected based on a user input that enters information such as the application user's region of use, age, gender, intonation, dialect, and pronunciation. Using the personalized TTS model, the processor 1200 of the server 1000 may convert the replacement target text 1210 into the acoustic signal 1220 in a manner that reflects the personalized characteristic value received from the client device 2000.

A specific embodiment in which the client device 2000 receives the personalized characteristic value, selected based on a user input that enters information such as the application user's region of use, age, gender, intonation, dialect, and pronunciation, is described in detail with reference to FIG. 19B.

The server 1000 may convert the output acoustic signal 1220 into text using the ASR model 1310. In the embodiment shown in FIG. 12, the processor 1200 may convert the output acoustic signal 1220 into text using the ASR model 1310 and thereby generate 'Solomon' (솔로몬) as the replacement text 1230 that replaces 'Seollomon' (설로몬), the replacement target text 1210.

FIG. 13 is a flowchart of an embodiment in which the server 1000 of the present disclosure generates replacement text from replacement target text. The steps of the method shown in FIG. 13 detail step S430 of FIG. 4; step S1310 is performed after step S420 shown in FIG. 4 is performed. After step S1340 of FIG. 13 is performed, step S440 shown in FIG. 4 is performed.

In step S1310, the server 1000 converts the input text into an acoustic signal using the TTS model. The TTS model is a model that converts text into an acoustic signal (wave signal), and the server 1000 may use the TTS model to convert the replacement target text included in the input text into an acoustic signal.

In one embodiment, the server 1000 may convert the input text into an acoustic signal using a personalized TTS model. The personalized TTS model is a model that generates an acoustic signal from text while reflecting personalized characteristics of the speaker, such as age, gender, region, dialect, intonation, and pronunciation (e.g., British English or American English). In one embodiment, the server 1000 may receive, from the client device 2000, a personalized characteristic value selected based on a user input that enters information such as the application user's region of use, age, gender, intonation, dialect, and pronunciation, and may convert the replacement target text in the input text into an acoustic signal based on the received personalized characteristic value.

In step S1320, the server 1000 outputs the converted acoustic signal. In one embodiment, the server 1000 may encode the acoustic signal into a binary data streaming signal composed of binary data of 0s and 1s, and output the encoded binary data streaming signal.

In step S1330, the server 1000 converts the output acoustic signal into output text using the ASR model. The ASR model is a model that converts a voice input or a voice signal into text.

In step S1340, the server 1000 generates replacement text using the converted output text. In one embodiment, the replacement text may be text that reflects personalized characteristics of the speaker, such as age, gender, region, dialect, intonation, and pronunciation (e.g., British English or American English).
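Steps S1310 to S1340 can be pictured as a round trip through the two models. The following is a minimal sketch only: `round_trip`, `toy_tts`, and `toy_asr` are hypothetical names introduced for illustration, and the toy functions merely stand in for real TTS and ASR models, which the disclosure does not specify at this level.

```python
from typing import Callable


def round_trip(target_text: str,
               tts: Callable[[str], bytes],
               asr: Callable[[bytes], str]) -> str:
    """Return replacement text by passing target_text through TTS, then ASR."""
    wave = tts(target_text)  # S1310/S1320: text -> acoustic signal as a byte stream
    return asr(wave)         # S1330/S1340: acoustic signal -> output text


# Hypothetical stand-ins so the sketch runs without real speech models.
def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")


def toy_asr(wave: bytes) -> str:
    heard = wave.decode("utf-8")
    # Pretend the recognizer maps the unfamiliar word to a known one.
    return {"설로몬": "솔로몬"}.get(heard, heard)


replacement = round_trip("설로몬", toy_tts, toy_asr)
```

Passing the two models in as callables keeps the sketch independent of any particular speech library; a personalized TTS model would simply be a different `tts` argument.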

FIG. 14 is a flowchart of a method in which the server 1000 of the present disclosure interprets the input text and at least one training candidate text using a natural language understanding model and generates information about the interpretation result. The steps of the method shown in FIG. 14 may be performed between step S440 and step S450 of FIG. 4. Step S1410 of FIG. 14 may be performed after step S440 shown in FIG. 4, and step S450 of FIG. 4 may be performed after step S1440.

In step S1410, the server 1000 detects a domain by interpreting the input text and the at least one training candidate text using the natural language understanding model. In one embodiment, the server 1000 may interpret the input text and each of the at least one training candidate text using a domain identification model included in any one of the pre-trained first natural language understanding model 1320a (see FIG. 2) to third natural language understanding model 1320c (see FIG. 2), and thereby detect a domain, which is category information to which the input text and the at least one training candidate text belong or into which they can be classified. In one embodiment, the server 1000 may identify one or a plurality of domains from the input text and the at least one training candidate text. In one embodiment, the server 1000 may interpret the input text and the at least one training candidate text using the domain identification model and thereby calculate the degree of relevance between the text and a domain as a numerical value. In one embodiment, the server 1000 may calculate, as probability values, the degree of relevance between each of the one or the plurality of domains and the input text and the at least one training candidate text, and may determine a domain having a high probability value among the calculated probability values as the domain into which the input text is classified.
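The domain decision described for step S1410 can be sketched as follows. This is an illustration under stated assumptions: the disclosure does not say how relevance scores become probability values, so the softmax normalization is an assumption, and `domain_probabilities`, `pick_domain`, and the example scores are hypothetical.

```python
import math


def domain_probabilities(scores: dict[str, float]) -> dict[str, float]:
    """Turn raw relevance scores into probability values (softmax, an assumption)."""
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {domain: math.exp(s - peak) for domain, s in scores.items()}
    total = sum(exps.values())
    return {domain: e / total for domain, e in exps.items()}


def pick_domain(scores: dict[str, float]) -> str:
    """Choose the domain with the highest probability value."""
    probs = domain_probabilities(scores)
    return max(probs, key=probs.get)


# Hypothetical relevance scores for one text.
scores = {"movie": 2.0, "game": 0.5, "appliance_control": 0.1}
best = pick_domain(scores)
```

Selecting the single highest value is one possible reading of "a high probability value"; an embodiment that keeps several domains could instead return every domain above a cutoff.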

The same domain may be detected from the input text and the at least one training candidate text, but the disclosure is not limited thereto. In one embodiment, the input text and the at least one training candidate text may be classified into different domains, and the training candidate texts themselves may also be classified into domains different from one another.

In step S1420, the server 1000 detects an intent by interpreting the input text and the at least one training candidate text using the natural language understanding model. The server 1000 may detect the intent from the input text and the at least one training candidate text using an intent identification model of the pre-trained natural language understanding model. In one embodiment, the server 1000 may detect an intent action from the input text and the at least one training candidate text using an intent action identification model included in any one of the first natural language understanding model 1320a to the third natural language understanding model 1320c. In one embodiment, the server 1000 may detect an intent object from the input text and the at least one training candidate text using an intent object identification model included in any one of the first natural language understanding model 1320a to the third natural language understanding model 1320c.

An intent action is the action expressed by the input text, for example, search, post, play, purchase, or order. The server 1000 may detect the intent action from the input text and the at least one training candidate text by performing syntactic analysis or semantic analysis using the intent action identification model. In one embodiment, the server 1000 may use the intent action identification model to parse each of the input text and the at least one training candidate text into units of morphemes, words, or phrases, and may infer the meaning of a word or phrase extracted from the parsed text by using linguistic features (e.g., grammatical elements) of the parsed morphemes, words, or phrases. The server 1000 may determine the intent action corresponding to the inferred meaning of the word or phrase by comparing that meaning with predefined intents provided by the natural language understanding model.

An intent object is an object related to the identified intent action. The intent object is the object that is the target of the detected intent action, and may be, for example, a movie, a photo, sports, weather, or aviation. The server 1000 may interpret each of the input text and the at least one training candidate text using the intent object identification model and thereby detect the intent object related to the intent action from the input text and the at least one training candidate text.

The same intent may be detected from the input text and the at least one training candidate text, but the disclosure is not limited thereto. In one embodiment, different intents may be detected from the input text and the at least one training candidate text, and different intents may be detected from each of the training candidate texts.

In step S1430, the server 1000 uses the natural language understanding model to identify slots in the input text and the at least one training candidate text and performs slot tagging. In one embodiment, the server 1000 may identify slots from the input text and the at least one training candidate text using a slot tagging model included in any one of the pre-trained first natural language understanding model 1320a to third natural language understanding model 1320c, and may perform slot tagging that associates the identified slots with the domain, the intent action, and the intent object. A slot is variable information used to obtain detailed information related to the domain, the intent action, and the intent object from the input text, or to determine a detailed operation. For example, a slot may include a named entity, a keyword, a place, a place name, a movie title, or a game term. In one embodiment, a slot is information related to an intent action and an intent object, and a plurality of types of slots may correspond to one intent.

In one embodiment, the server 1000 may identify one or a plurality of slots from each of the input text and the at least one training candidate text.
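One way to picture the result of slot tagging is a list of typed spans over the text. The sketch below is illustrative only: it replaces the slot tagging model with a simple lexicon lookup, and `Slot`, `tag_slots`, the slot names, and the English example sentence are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str   # slot type, e.g. "food_name"
    value: str  # surface text of the slot
    start: int  # character offset where the slot begins
    end: int    # character offset one past the last character


def tag_slots(text: str, lexicon: dict[str, str]) -> list[Slot]:
    """Find each lexicon entry in the text and tag it with its slot name."""
    slots = []
    for value, name in lexicon.items():
        start = text.find(value)
        if start != -1:
            slots.append(Slot(name, value, start, start + len(value)))
    return sorted(slots, key=lambda s: s.start)


text = "order 3 pepperoni pizzas to Eonju-ro 30-gil"
lexicon = {"pepperoni": "food_name", "Eonju-ro 30-gil": "address"}
slots = tag_slots(text, lexicon)
```

A trained tagger would predict the spans instead of looking them up, but the output shape, several typed slots for one intent, would be the same.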

Steps S1410, S1420, and S1430 have been described as the server 1000 detecting the domain and the intent from the input text and performing slot tagging, but the disclosure is not limited thereto. In one embodiment, the server 1000 may determine the domain and the intent corresponding to the input text, and perform slot tagging, based on a user input received through the client device 2000. In one embodiment, the server 1000 may interpret the input text using the plurality of pre-trained natural language understanding models 1320a to 1320c, thereby detecting at least one intent candidate related to the input text and at least one slot for slot tagging, and may transmit the detected at least one intent candidate and at least one slot to the client device 2000. The client device 2000 may display a graphical user interface (GUI) related to the at least one intent candidate and the at least one slot candidate received from the server 1000, and may transmit to the server 1000 the intent and slot tagging information selected based on a user input received through the GUI. The server 1000 may determine the intent related to the input text and perform slot tagging based on the received intent and slot tagging information. A specific embodiment of this is described in detail with reference to FIG. 18.

In step S1440, the server 1000 provides the information about the domain, the intent, and the slots as input data for training the natural language understanding model. In one embodiment, the server 1000 may select, from among a plurality of pre-stored natural language understanding models, a natural language understanding model related to the input text and the at least one training candidate text, based on the detected domain and intent.

FIG. 15 is a flowchart of an embodiment in which the server 1000 of the present disclosure determines the text to be input as training data to the natural language understanding model, based on a user input received by the client device 2000. The steps of the method shown in FIG. 15 detail step S450 of FIG. 4; step S1510 is performed after step S440 shown in FIG. 4 is performed.

In step S1510, the server 1000 transmits the input text and the at least one training candidate text to the client device 2000.

In step S1520, the server 1000 receives, from the client device 2000, an identification value of at least one text selected by a user input from among the input text and the at least one training candidate text. In one embodiment, the client device 2000 may receive the input text and the at least one training candidate text from the server 1000, and may display a list including the at least one training candidate text together with a graphical user interface for receiving a user input that selects at least one text from the list. The client device 2000 may select at least one of the at least one training candidate text based on the user input received through the graphical user interface, and may transmit the identification value of the selected at least one training candidate text to the server 1000. A specific embodiment in which the client device 2000 receives a user input selecting at least one of the at least one training candidate text is described in detail with reference to FIG. 20.

In step S1530, the server 1000 selects at least one text based on the received identification value. The server 1000 may identify at least one of the training candidate texts included in the list based on the identification value received from the client device 2000, and may select the identified at least one training candidate text.
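The exchange in steps S1510 to S1530 amounts to looking up listed texts by the identification values returned from the client. A minimal sketch follows, assuming integer identification values and a dictionary as the list; `select_texts`, the variable names, and the example texts are hypothetical.

```python
def select_texts(listed: dict[int, str], selected_ids: list[int]) -> list[str]:
    """Return the listed texts whose identification values were selected, in list order."""
    chosen = set(selected_ids)
    return [text for ident, text in listed.items() if ident in chosen]


# Training candidate texts as listed on the client, keyed by identification value.
listed = {
    1: "order peparoni pizza",
    2: "order paperoni pizza",
    3: "order peporoni pizza",
}

# The client reports that the user picked entries 3 and 1.
selected = select_texts(listed, [3, 1])

# The input text is used for training together with the selected candidates.
training_texts = ["order pepperoni pizza"] + selected
```

Identification values that do not appear in the list are ignored here; an implementation could equally treat them as an error.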

In step S1540, the server 1000 trains the natural language understanding model using the selected at least one text as training data. The server 1000 may perform training by inputting the input text and the at least one text selected in step S1530 to the natural language understanding model as training data. In one embodiment, the server 1000 may be provided with the information about the domain, the intent, and the slots identified from the input text and the training candidate text in step S1440 (see FIG. 14), and may perform training using the provided domain, intent, and slot information together with the at least one text selected in step S1530 as input data. However, the disclosure is not limited thereto, and the server 1000 may be provided with the domain detected in step S710 (see FIG. 7), the intent detected in step S720 (see FIG. 7), and the slot tagging information from step S730 (see FIG. 7), and may perform training using the domain, intent, and slot tagging information together with the at least one text selected in step S1530 as input data.

A specific embodiment of training the natural language understanding model is described in detail with reference to FIG. 16.

FIG. 16 is a flowchart of an embodiment in which the server 1000 of the present disclosure trains a natural language understanding model based on the number of texts, counting the input text and the at least one training candidate text. The steps of the method shown in FIG. 16 detail step S450 shown in FIG. 4. In addition, steps S1610 to S1640 are performed between step S1530 and step S1540 shown in FIG. 15.

In step S1610, the server 1000 counts the number of texts, including the input text and the at least one training candidate text, and determines whether the counted number exceeds a preset third threshold (α). The third threshold (α) is a threshold on the total number of the input text and the at least one training candidate text, and may be set by a user input. The third threshold (α) may be set by a user input received through a UI displayed on the client device 2000. The UI for setting the third threshold (α) is described in detail with reference to FIG. 18.

The preset third threshold (α) may be, for example, 10, but is not limited thereto.

When the count in step S1610 shows that the number of the input text and the at least one training candidate text exceeds the preset third threshold (α) (step S1620), the server 1000 trains a new natural language understanding model using the detected domain, intent, and slot information. In one embodiment, the server 1000 may train the new natural language understanding model using, as training data, the label values for the domain, the intent, and the slots detected from the input text and the at least one training candidate text, together with the input text and the at least one training candidate text corresponding to each label value. The natural language understanding model may be trained through a rule-based system that uses, as input data, the input text and the training candidate text and the label values for the domain, intent, and slots corresponding to each, but is not limited thereto. The natural language understanding model may be, for example, a neural network-based system (e.g., a convolutional neural network (CNN) or a recurrent neural network (RNN)), a support vector machine (SVM), linear regression, logistic regression, naive Bayes classification, a random forest, a decision tree, or a k-nearest neighbor algorithm. Alternatively, the model may be trained using a combination of the foregoing or a different artificial intelligence model.

For example, suppose the input text is "Order three pepperoni (페퍼로니) pizzas for delivery to Eonju-ro 30-gil" and the plurality of training candidate texts are the same sentence with 'pepperoni' (페퍼로니) replaced by the variants '페파로니', '파페로니', and '페포로니'. When the number of texts including the input text and these training candidate texts exceeds the preset third threshold (α), the server 1000 may train a new natural language understanding model having a domain related to 'order' or 'delivery'. The server 1000 may train the new natural language understanding model by inputting 'food order' as the intent label value for the input text and the plurality of training candidate texts, and by inputting '페퍼로니', '페파로니', '파페로니', and '페포로니' as the slot label values for the input text and the plurality of training candidate texts.

When the count in step S1610 shows that the number of the input text and the at least one training candidate text is less than or equal to the preset third threshold (α) (step S1630), the server 1000 selects, from among the pre-trained natural language understanding models (pre-trained NLU models), the natural language understanding model corresponding to the detected domain and intent information. In one embodiment, the server 1000 may select any one of the plurality of pre-trained natural language understanding models 1320a to 1320c (see FIG. 2) stored in the memory 1300 (see FIG. 2), based on the domain detected from the input text and the at least one training candidate text. In one embodiment, each of the plurality of pre-trained natural language understanding models 1320a to 1320c is a model trained specifically for a particular domain, and the server 1000 may select, from among them, the natural language understanding model trained on a domain similar to the domain detected from the input text and the at least one training candidate text. The similarity between the domain detected from the input text and the at least one training candidate text and the domains on which the plurality of pre-trained natural language understanding models 1320a to 1320c were trained may be calculated as a probability value.

For example, the first natural language understanding model 1320a may be a model trained on a movie domain, the second natural language understanding model 1320b may be a model trained on a game domain, and the third natural language understanding model 1320c may be a model trained on a home appliance control domain. In one embodiment, the server 1000 may interpret the input text and the at least one training candidate text using a pre-trained natural language understanding model, thereby detecting an intent action and an intent object related to movies from the input text and the at least one training candidate text, identifying slots such as a movie title or a movie actor's name, and performing slot tagging. In this case, the server 1000 may calculate a first probability value that the input text and the at least one training candidate text are related to the movie domain, a second probability value that they are related to the game domain, and a third probability value that they are related to the home appliance control domain, and may determine that the first probability value is the maximum among the calculated probability values. Based on the first probability value, the server 1000 may select the first natural language understanding model 1320a from among the first natural language understanding model 1320a to the third natural language understanding model 1320c.
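The branch between steps S1620 and S1630 can be summarized in a few lines. This is a sketch under stated assumptions, not the disclosed implementation: `choose_training_target`, the string "new_model", and the model keys and probability values are hypothetical.

```python
def choose_training_target(num_texts: int,
                           alpha: int,
                           domain_probs: dict[str, float]) -> str:
    """Decide whether to train a new model or update a pre-trained one.

    num_texts    -- count of the input text plus the training candidate texts
    alpha        -- the third threshold
    domain_probs -- probability value per pre-trained model's domain
    """
    if num_texts > alpha:
        return "new_model"  # S1620: enough texts to train a new model
    # S1630: otherwise pick the pre-trained model with the highest probability value
    return max(domain_probs, key=domain_probs.get)


# Hypothetical probability values for the three pre-trained models.
probs = {"1320a_movie": 0.7, "1320b_game": 0.2, "1320c_appliance": 0.1}

many = choose_training_target(11, 10, probs)
few = choose_training_target(4, 10, probs)
```

A count exactly equal to α falls into the pre-trained branch, matching the "less than or equal to" condition of step S1630.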

In step S1640, the server 1000 updates the natural language understanding model by inputting the input text and the at least one training candidate text to the selected natural language understanding model as training data and performing training. In one embodiment, the server 1000 may update the first natural language understanding model 1320a, selected from among the first natural language understanding model 1320a to the third natural language understanding model 1320c, by inputting to it the input text, the at least one text, and the label values for the domain, the intent, and the slots, and performing training with the input data. In one embodiment, the four models included in the first natural language understanding model 1320a may be updated: the domain identification model 1322a (see FIG. 2), the intent action identification model 1323a (see FIG. 2), the intent object identification model 1324a (see FIG. 2), and the slot tagging model 1325a (see FIG. 2).

In an embodiment, to support any new intent action not yet included in the intent action identification model 1323a of the first natural language understanding model 1320a, the server 1000 may update the intent action identification model 1323a by inputting the input text and the at least one training candidate text, together with the label value of the corresponding new intent action, and training the model. In an embodiment, an intent action associated with a new domain may already be supported by the intent action identification model 1323a before the update.

In an embodiment, to support any new intent object not yet included in the intent object identification model 1324a of the first natural language understanding model 1320a, the server 1000 may update the intent object identification model 1324a by inputting the input text and the at least one training candidate text, together with the label value of the corresponding new intent object, and training the model. In an embodiment, an intent object associated with a new domain may already be supported by the intent object identification model 1324a before the update.

In an embodiment, the server 1000 may update the slot tagging model 1325a by inputting the slots identified from the input text and the at least one training candidate text, together with the label values for those slots, and training the model.

For example, when the input text is "영화 어벤저스 엔드게임 개봉일 검색해줘" ("Search for the release date of the movie Avengers Endgame") and the plurality of training candidate texts are "영화 어벤쟈스 엔드게임 개봉일 검색해줘", "영화 아밴저스 엔드게임 개봉일 검색해줘", and "영화 어벤주스 엔드게임 개봉일 검색해줘" (the same request with '어벤저스' (Avengers) written as it might be mispronounced), the server 1000 may detect the 'movie' domain from the input text and the plurality of training candidate texts, calculate probability values indicating the degree of association with the domains of the first natural language understanding model 1320a through the third natural language understanding model 1320c, and select the first natural language understanding model 1320a. In an embodiment, the server 1000 may update the intent action identification model 1323a by training it with the label value for 'search', a new intent action not yet included in the intent action identification model 1323a of the first natural language understanding model 1320a, together with the input text and the plurality of training candidate texts. In an embodiment, the server 1000 may update the intent object identification model 1324a by training it with the label value for 'movie release date', a new intent object not yet included in the intent object identification model 1324a of the first natural language understanding model 1320a, together with the input text and the plurality of training candidate texts. In an embodiment, the server 1000 may update the slot tagging model 1325a by training it with the label values for '어벤저스 엔드게임', '어벤쟈스 엔드게임', '아밴저스 엔드게임', and '어벤주스 엔드게임', slot elements not yet included in the slot tagging model 1325a of the first natural language understanding model 1320a, together with the input text and the plurality of training candidate texts.
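The example above can be pictured as one labeled training record per text, each carrying the domain, intent action, intent object, and slot labels. The following is a minimal sketch, assuming a hypothetical record type `LabeledExample`, a hypothetical helper `build_examples`, and a hypothetical slot label name `movie_title`; none of these names come from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LabeledExample:
    """One training record (hypothetical structure)."""
    text: str
    domain: str
    intent_action: str
    intent_object: str
    slots: Dict[str, str]  # slot element -> slot label


def build_examples(input_text: str, slot_value: str, variants: List[str],
                   domain: str, intent_action: str, intent_object: str,
                   slot_label: str) -> List[LabeledExample]:
    """Create a record for the input text and for each substituted variant."""
    examples = []
    for value in [slot_value, *variants]:
        # Substituting the original slot value by itself leaves the text as-is.
        text = input_text.replace(slot_value, value)
        examples.append(LabeledExample(text, domain, intent_action,
                                       intent_object, {value: slot_label}))
    return examples


examples = build_examples(
    input_text="영화 어벤저스 엔드게임 개봉일 검색해줘",
    slot_value="어벤저스 엔드게임",
    variants=["어벤쟈스 엔드게임", "아밴저스 엔드게임", "어벤주스 엔드게임"],
    domain="movie",
    intent_action="search",
    intent_object="movie release date",
    slot_label="movie_title",  # assumed label name
)
```

Each record in `examples` could then be fed to the corresponding sub-model; how the records are actually encoded for training is not specified here.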

FIG. 17 is a flowchart illustrating an operation of the client device 2000 of the present disclosure.

In step S1710, the client device 2000 displays a first graphical user interface (GUI) for receiving a user input that enters input text. The processor 2200 (see FIG. 3) of the client device 2000 may display the first GUI on the display unit 2510 (see FIG. 3) and receive the input text from the user through the user input unit 2410 (see FIG. 3).

In step S1720, the client device 2000 transmits the input text received through the first GUI to the server 1000. In an embodiment, the processor 2200 may control the communication interface 2100 (see FIG. 3) to transmit the input text received from the user to the server 1000.

In step S1730, the client device 2000 displays a second GUI for receiving a user input that selects at least one of the at least one replacement target text identified from the input text. In an embodiment, the at least one replacement target text is identified by the server 1000, and the client device 2000 may receive an identification value for the at least one replacement target text from the server 1000 using the communication interface 2100. In an embodiment, the processor 2200 may display a list including the at least one replacement target text on the display unit 2510. In an embodiment, the processor 2200 may display, on the display unit 2510, the second GUI for receiving a user input that selects at least one of the replacement target texts included in the list. The processor 2200 may receive that user input through the user input unit 2410.

However, the disclosure is not limited thereto, and the client device 2000 may itself identify the replacement target text from the input text. The method by which the processor 2200 of the client device 2000 identifies the replacement target text from the input text is the same as the method performed by the processor 1200 (see FIG. 2) of the server 1000, so a redundant description is omitted.

In step S1740, the client device 2000 receives, from the server 1000, at least one training candidate text generated by replacing the at least one replacement target text selected through the second GUI in the input text with text that the user is expected to utter. In an embodiment, the at least one training candidate text is generated by the server 1000, and the processor 2200 may receive an identification value for the at least one training candidate text from the server 1000 using the communication interface 2100.

However, the disclosure is not limited thereto, and the client device 2000 may itself generate the at least one training candidate text by replacing the replacement target text with replacement text that the user is expected to utter. The specific method by which the processor 2200 of the client device 2000 generates the at least one training candidate text by replacing the replacement target text with the replacement text is the same as the method performed by the processor 1200 of the server 1000, so a redundant description is omitted.

In step S1750, the client device 2000 displays a third GUI for receiving a user input that selects at least one of the at least one training candidate text. In an embodiment, the processor 2200 may display a list including the at least one training candidate text on the display unit 2510. In an embodiment, the processor 2200 may display, on the display unit 2510, the third GUI for receiving a user input that selects at least one of the training candidate texts included in the list.

The processor 2200 may receive, through the user input unit 2410, a user input that selects at least one of the training candidate texts included in the list.

FIG. 18 is a diagram illustrating an example of a GUI displayed by the client device 2000 of the present disclosure.

Referring to FIG. 18, the client device 2000 may display a first GUI 2511, a second GUI 2512, a third GUI 2513-1, a fourth GUI 2514, a fifth GUI 2515, and a sixth GUI 2516 on the display unit 2510. The first GUI 2511 through the sixth GUI 2516 are graphical interfaces that receive user input, where the user may be a developer who wants to train a language model for inclusion in an application under development.

The first GUI 2511 is a graphical user interface for receiving a user input that enters input text. The processor 2200 (see FIG. 3) of the client device 2000 may display the first GUI 2511 on the display unit 2510 and receive the input text from the developer through the user input unit 2410 (see FIG. 3). In the embodiment shown in FIG. 18, the processor 2200 may receive from the developer the input text "페퍼로니 피자 3판 언주로 30길로 배달해 줘" ("Deliver three pepperoni pizzas to Eonju-ro 30-gil").

The second GUI 2512 is a graphical user interface that presents information about an intent detected from the input text. In an embodiment, the intent is detected from the input text by the server 1000, and the processor 2200 of the client device 2000 may receive information about the detected intent through the communication interface 2100 (see FIG. 3). The processor 2200 may display, on the display unit 2510, the second GUI 2512 presenting the information about the intent received from the server 1000. However, the disclosure is not limited thereto, and the processor 2200 may itself detect the intent from the input text by interpreting the input text using a pre-trained natural language understanding model. The intent may include an intent action and an intent object.

The second GUI 2512 may include at least one intent candidate, a confidence score for each of the at least one intent candidate, and a selection option GUI 2512a. The confidence score is a numerical value indicating how accurately the intent was detected for the input text. In an embodiment, the confidence score may be a value between 0 and 1, and a higher confidence score may mean that the intent is more closely related to the utterance intention of the input text. The selection option GUI 2512a presents the developer with the at least one intent candidate detected from the input text, and the processor 2200 may receive, through the selection option GUI 2512a, an input from the developer that selects or confirms one of the at least one intent candidate. In the embodiment shown in FIG. 18, the second GUI 2512 may display 'food order' and 'food search' as the intent candidates detected from the input text, with a confidence score of 0.9 for 'food order' and 0.1 for 'food search'. The client device 2000 may receive, through the selection option GUI 2512a, an input that determines 'food order' as the intent from between the two intent candidates.

Although not shown in FIG. 18, the second GUI 2512 may further include a graphical user interface for adding a new intent.

The third GUI 2513-1 may include at least one slot detected from the input text, a confidence score for the at least one slot, and selection option GUIs 2513a, 2513b, and 2513c. The confidence score is a numerical value indicating the accuracy of the slot tagging for a slot detected from the input text. The selection option GUIs 2513a, 2513b, and 2513c present the developer with the at least one slot detected from the input text, and the processor 2200 may receive, through the selection option GUIs 2513a, 2513b, and 2513c, an input from the developer that tags each of the at least one slot. In the embodiment shown in FIG. 18, the third GUI 2513-1 may display 'pepperoni', '3', and 'Eonju-ro 30-gil' as the slots detected from the input text, and may display 'topping' and 'person name' as the slot tagging results for 'pepperoni', 'count' as the slot tagging result for '3', and 'address' and 'transportation' as the slot tagging results for 'Eonju-ro 30-gil'. In the third GUI 2513-1, the confidence scores for tagging 'pepperoni' may be 0.8 for 'topping' and 0.2 for 'person name'; the confidence score for tagging '3' may be 1.0 for 'count'; and the confidence scores for tagging 'Eonju-ro 30-gil' may be 0.9 for 'address' and 0.1 for 'transportation'. The processor 2200 may receive from the developer, through the selection option GUI 2513a, an input that selects either 'topping' or 'person name' as the slot tag for 'pepperoni'. The processor 2200 may receive from the developer, through the selection option GUI 2513b, an input that selects 'count' as the slot tag for '3'. The client device 2000 may receive from the developer, through the selection option GUI 2513c, an input that selects either 'address' or 'transportation' as the slot tag for 'Eonju-ro 30-gil'.
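The slot tagging candidates and confidence scores of FIG. 18 can be represented as a nested mapping from each slot to its candidate tags. The following is a minimal sketch; the function name `resolve_slot_tags` is hypothetical, and falling back to the highest-confidence tag when the developer makes no selection is an assumption added for illustration, not a behavior stated in the disclosure.

```python
from typing import Dict, Optional

# Slot -> {candidate tag -> confidence score}, using the values of FIG. 18.
SLOT_CANDIDATES: Dict[str, Dict[str, float]] = {
    "pepperoni": {"topping": 0.8, "person name": 0.2},
    "3": {"count": 1.0},
    "Eonju-ro 30-gil": {"address": 0.9, "transportation": 0.1},
}


def resolve_slot_tags(candidates: Dict[str, Dict[str, float]],
                      developer_choices: Optional[Dict[str, str]] = None
                      ) -> Dict[str, str]:
    """Apply developer selections; otherwise use the top-scoring tag (assumed)."""
    developer_choices = developer_choices or {}
    resolved = {}
    for slot, scores in candidates.items():
        choice = developer_choices.get(slot)
        if choice is None:
            choice = max(scores, key=scores.get)
        elif choice not in scores:
            raise ValueError(f"{choice!r} is not a candidate tag for {slot!r}")
        resolved[slot] = choice
    return resolved


default_tags = resolve_slot_tags(SLOT_CANDIDATES)
overridden_tags = resolve_slot_tags(SLOT_CANDIDATES, {"pepperoni": "person name"})
```

A developer selection made through a selection option GUI would correspond to an entry in `developer_choices` in this sketch.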

Although not shown in FIG. 18, the third GUI 2513-1 may further include a graphical user interface for adding a new slot.

The fourth GUI 2514 may include a list of replacement target texts and a selection option GUI 2514a for selecting each of the plurality of replacement target texts included in the list. The replacement target text displayed through the fourth GUI 2514 may be a slot identified from the input text by the server 1000, but is not limited thereto. In an embodiment, the replacement target text may be a slot identified from the input text by the processor 2200 of the client device 2000. In another embodiment, the processor 2200 may receive from the developer, through the user input unit 2410 (see FIG. 3), an input that selects a word or phrase included in the input text, and may determine the selected word or phrase as the replacement target text based on the received input. For example, the developer may select 'pepperoni' in the input text through a touch-and-drag input, or through a click-and-drag input with a mouse.

In the embodiment shown in FIG. 18, the processor 2200 may receive from the developer, through the selection option GUI 2514a, an input that selects each of 'pepperoni', '3', and 'Eonju-ro 30-gil' in the replacement target text list as replacement target text.

The fifth GUI 2515 may include a list of application characteristics and a selection option GUI 2515a that receives an input selecting at least one of the application characteristics included in the list. An application characteristic may indicate the development purpose for which the developer intends to build an application that includes the trained language model. The development purpose, for example developing an application that takes regional dialects into account, an application that takes foreign users into account, or an application that needs to take children into account, may be determined based on context information including at least one of the application user's age, gender, region, language, accent, and dialect. The processor 2200 may receive from the developer, through the selection option GUI 2515a, an input that selects one of the application development purposes.

The client device 2000 may determine the replacement target text and the development purpose of the application based on the input received from the developer, and may generate replacement text in consideration of the determined replacement target text and the context information corresponding to the development purpose. In an embodiment, the processor 2200 of the client device 2000 may determine the replacement target text based on the application development purpose selected by the user input. The processor 2200 may determine the replacement target text corresponding to the selected application development purpose either by using a deep neural network model pre-trained with application development purposes and replacement target texts as training data, or by using a rule-based model that defines pairings between application development purposes and replacement target texts. For example, when an application that needs to take foreign users into account is selected as the application development purpose through the fifth GUI 2515, the processor 2200 may use the pre-trained deep neural network model or the rule-based model to determine "pepperoni", the word in the replacement target text list that is relatively likely to be pronounced inaccurately, as the replacement target text.

The client device 2000 may generate replacement text in consideration of the determined replacement target text and the context information corresponding to the application development purpose. For example, when the replacement target text is '페퍼로니' (pepperoni) and the application development purpose is determined to be an application that needs to take foreign users into account, the processor 2200 may consider context information such as the application user's region and language and generate, as replacement text for '페퍼로니', forms that foreign speakers often produce, such as '페파로니' (peparoni) and '페포로니' (peporoni).

The sixth GUI 2516 is a user interface for receiving user inputs that set thresholds. In an embodiment, the sixth GUI 2516 may include a first threshold UI 2516a that receives a user input setting a first threshold, a second threshold UI 2516b that receives a user input setting a second threshold, and a third threshold UI 2516c that receives a user input setting a third threshold.

The first threshold UI 2516a is a user interface that receives a user input for setting a usage-frequency threshold. This threshold is the reference value for determining replacement target text from the results of looking up, in the dictionary DB 1370 (see FIG. 2), each of the plurality of words included in the input text entered through the first GUI 2511. The first threshold UI 2516a may include, for example, a bar divided into Low, Medium, and High levels and a cursor for adjusting the threshold. The user may set or adjust the threshold higher or lower by, for example, moving the cursor. However, the disclosure is not limited thereto, and the first threshold UI 2516a may instead be a UI for directly entering a numerical usage-frequency value.

For example, when the first threshold is set relatively low through the first threshold UI 2516a, the search of the dictionary DB 1370 finds fewer words whose usage frequency is lower than the first threshold, so the number of replacement target texts may be relatively small. In the embodiment shown in FIG. 18, when the first threshold is set low, only 'pepperoni' in the input text may be determined as replacement target text.

Conversely, when the first threshold is set relatively high through the first threshold UI 2516a, more of the words found in the dictionary DB 1370 have a usage frequency lower than the first threshold and are therefore likely to be determined as replacement target text, so a relatively large number of replacement target texts may be determined. In the embodiment shown in FIG. 18, when the first threshold is set relatively high, 'pepperoni', '3', and 'Eonju-ro 30-gil' in the input text may all be determined as replacement target text.
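The effect of the first threshold can be illustrated by filtering words on their dictionary usage frequency. The following is a minimal sketch under stated assumptions: the function name `find_replacement_targets` is hypothetical, the frequency counts are invented for illustration, and treating a word missing from the dictionary as having frequency 0 is an assumption.

```python
from typing import Dict, List


def find_replacement_targets(words: List[str],
                             usage_frequency: Dict[str, int],
                             first_threshold: int) -> List[str]:
    """Return the words whose usage frequency is below the first threshold."""
    # Words absent from the dictionary are treated as frequency 0 (assumed).
    return [w for w in words if usage_frequency.get(w, 0) < first_threshold]


# Invented frequency counts, chosen only to reproduce the FIG. 18 example.
frequency = {"pepperoni": 5, "3": 50, "Eonju-ro 30-gil": 20, "pizza": 900}
words = ["pepperoni", "pizza", "3", "Eonju-ro 30-gil"]

low_setting = find_replacement_targets(words, frequency, first_threshold=10)
high_setting = find_replacement_targets(words, frequency, first_threshold=100)
```

With these assumed counts, the low setting yields only 'pepperoni', while the high setting yields 'pepperoni', '3', and 'Eonju-ro 30-gil'.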

The second threshold UI 2516b is a user interface that receives a user input for setting a similarity threshold. This threshold is the criterion for determining replacement text from the results of searching the dictionary DB 1370 for the replacement target text identified from the input text. The second threshold is the reference value for judging the similarity between a word found in the dictionary DB 1370 and the replacement target text. In an embodiment, the second threshold UI 2516b may, like the first threshold UI 2516a, include a bar divided into low, medium, and high levels and a cursor for adjusting the threshold. The user may set or adjust the second threshold higher or lower by, for example, moving the cursor.

For example, when the second threshold is set relatively low through the second threshold UI 2516b, words in the dictionary DB 1370 are more likely to qualify as similar to the replacement target text, so a relatively large number of replacement texts may be determined. Conversely, when the second threshold is set relatively high, words in the dictionary DB 1370 are less likely to qualify as similar to the replacement target text, so a relatively small number of replacement texts may be determined.
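The effect of the second threshold can be illustrated by keeping only the dictionary words whose similarity to the replacement target text meets the threshold. The following is a minimal sketch under stated assumptions: it uses the character-level ratio of Python's `difflib.SequenceMatcher` as a stand-in for the phonetic similarity of the disclosure, operates on romanized strings, and uses an invented dictionary word list; the function name `find_replacement_texts` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for phonetic similarity."""
    return SequenceMatcher(None, a, b).ratio()


def find_replacement_texts(target: str, dictionary_words: List[str],
                           second_threshold: float) -> List[str]:
    """Return dictionary words at least `second_threshold` similar to the target."""
    return [w for w in dictionary_words
            if w != target and similarity(target, w) >= second_threshold]


# Invented dictionary entries (romanized) for illustration only.
dictionary_words = ["peparoni", "peporoni", "macaroni", "pizza"]

low_setting = find_replacement_texts("pepperoni", dictionary_words, 0.4)
high_setting = find_replacement_texts("pepperoni", dictionary_words, 0.7)
```

Lowering the threshold admits more loosely related words, and raising it keeps only the close variants, which matches the behavior described above.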

The third threshold UI 2516c is a user interface for receiving a user input that sets a reference value for the number of texts, among the input text and the at least one training candidate text, on which training of the natural language understanding model is performed. The third threshold is the total number of the input text and the at least one training candidate text, and may be set by the user input received through the third threshold UI 2516c. The third threshold UI 2516c may, like the first threshold UI 2516a and the second threshold UI 2516b, include a bar divided into low, medium, and high levels and a cursor for adjusting the threshold. The user may set or adjust the third threshold higher or lower by, for example, moving the cursor. However, the disclosure is not limited thereto, and the third threshold UI 2516c may include a UI for directly entering the number of training candidate texts to be used as input data for training the natural language understanding model.
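The third threshold can be illustrated as a cap on the total number of texts used for training. The following is a minimal sketch under stated assumptions: the function name `select_training_texts` is hypothetical, and both always keeping the input text and taking candidates in their given order are assumptions, since the disclosure does not state how candidates are prioritized.

```python
from typing import List


def select_training_texts(input_text: str, candidates: List[str],
                          third_threshold: int) -> List[str]:
    """Keep the input text plus candidates, up to `third_threshold` texts in total."""
    if third_threshold < 1:
        raise ValueError("the third threshold must allow at least the input text")
    # The input text counts toward the total, so one slot is reserved for it.
    return [input_text, *candidates[: third_threshold - 1]]


training_texts = select_training_texts(
    "Deliver three pepperoni pizzas to Eonju-ro 30-gil",
    [
        "Deliver three peparoni pizzas to Eonju-ro 30-gil",
        "Deliver three peporoni pizzas to Eonju-ro 30-gil",
        "Deliver three pepeloni pizzas to Eonju-ro 30-gil",
    ],
    third_threshold=3,
)
```

Here the cap of 3 keeps the input text and the first two candidates; the English sentences are illustrative glosses of the Korean example.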

FIG. 19A is a diagram illustrating an example of a GUI displayed by the client device 2000 of the present disclosure.

Referring to FIG. 19A, the client device 2000 may display the first GUI 2511 and the slot tagging GUI 2513-2 on the display unit 2510. The client device 2000 may receive input text from a developer through the first GUI 2511. Through the slot tagging GUI 2513-2, the client device 2000 may receive an input for tagging a slot detected in, or selected from, the input text.

In an embodiment, the slot tagging GUI 2513-2 may list slot candidates that can be tagged to a slot detected in or selected from the input text, and may receive from the developer an input selecting one of the listed slot candidates. In the embodiment shown in FIG. 19A, for the slot element '페퍼로니' (pepperoni), detected in or selected from the input text "페퍼로니 피자 3판 언주로 30길로 배달해줘" ("Deliver three pepperoni pizzas to Eonju-ro 30-gil"), the slot tagging GUI 2513-2 may display PizzaTopping, Location, and Number as slot candidates to be tagged. The processor 2200 of the client device 2000 may receive from the developer, through the slot tagging GUI 2513-2, an input selecting which of PizzaTopping, Location, and Number is to be tagged to '페퍼로니'.
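One possible way to record the result of such a tagging input is sketched below. The `SlotTag` structure and the `tag_slot` helper are hypothetical; only the three slot candidate names and the example sentence come from the embodiment of FIG. 19A.

```python
from dataclasses import dataclass

# Slot candidates displayed in the embodiment of FIG. 19A
SLOT_CANDIDATES = ("PizzaTopping", "Location", "Number")


@dataclass(frozen=True)
class SlotTag:
    start: int  # character offset where the slot element begins
    end: int    # character offset one past the slot element
    text: str   # the slot element itself
    slot: str   # the slot name chosen by the developer


def tag_slot(input_text, element, slot):
    """Tag the first occurrence of `element` in `input_text` with `slot`."""
    if slot not in SLOT_CANDIDATES:
        raise ValueError(f"unknown slot candidate: {slot}")
    start = input_text.find(element)
    if start < 0:
        raise ValueError("slot element not found in input text")
    return SlotTag(start, start + len(element), element, slot)


tag = tag_slot("페퍼로니 피자 3판 언주로 30길로 배달해줘", "페퍼로니", "PizzaTopping")
print(tag)
```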

FIG. 19B is a diagram illustrating an example of a GUI displayed by the client device 2000 of the present disclosure.

Referring to FIG. 19B, the client device 2000 may display the first GUI 2511 and the application characteristic selection GUI 2517 on the display unit 2510. The processor 2200 may receive input text from a developer through the first GUI 2511. The processor 2200 may display characteristic information of at least one application through the application characteristic selection GUI 2517 and receive an input selecting at least one of the characteristics of the at least one application.

The application characteristic selection GUI 2517 may be a graphical user interface configured to list context information of a user of an application that uses the trained language model and to receive an input selecting at least one item of the listed context information. The context information may include, for example, at least one of the application user's age, gender, region, language, accent, and dialect. In the embodiment shown in FIG. 19B, the application characteristic selection GUI 2517 displays a graphical user interface for selecting the application user's region of use, age, and gender, and the processor 2200 may receive from the developer, through the application characteristic selection GUI 2517, an input selecting the application user's region of use, age, and gender.

In an embodiment, the processor 2200 of the client device 2000 may obtain context information of the application user based on the input received through the application characteristic selection GUI 2517, and may generate, based on the context information, replacement text to replace the replacement target text. For example, when the application user's region is a foreign country, the replacement text may be generated using a personalized TTS model and an ASR model specialized for the pronunciation of each country. Similarly, replacement text may be generated using a personalized TTS model and an ASR model specialized for the application user's age and gender. The method of generating replacement text using the personalized TTS model and the ASR model is the same as the method described with reference to FIGS. 12 and 13, and a redundant description is therefore omitted.

In an embodiment, the processor 2200 of the client device 2000 may transmit the context information of the application user selected through the application characteristic selection GUI 2517 to the server 1000. The server 1000 may generate replacement text by selecting a personalized TTS model based on the context information, converting the replacement target text into an acoustic signal using the selected personalized TTS model, outputting the acoustic signal, and converting the output acoustic signal back into text using the ASR model 1310 (see FIG. 2). The server 1000 may transmit the generated replacement text to the client device 2000. In another embodiment, the server 1000 may generate replacement text by converting the replacement target text into an acoustic signal using a general TTS model 1330 (see FIG. 2) and then using Vocal Tract Length Perturbation (VTLP) to convert the acoustic signal output by the TTS model 1330 into replacement text that reflects the context information. For example, VTLP may be used to transform the frequency of the acoustic signal output by the TTS model 1330 so that it reflects a child's voice, an adult's voice, a grandmother's voice, or the like, and ASR may then be performed on the frequency-transformed acoustic signal using the ASR model 1310, thereby generating replacement text that reflects context information including at least one of age, gender, region, language, accent, and dialect.
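The frequency transformation mentioned above can be sketched with the piecewise-linear warping function commonly used for VTLP in the speech recognition literature. This is an assumption: the disclosure does not say which warping function is used, and the parameter names `alpha` (warp factor), `f_hi` (boundary frequency), and the default values below are chosen only for illustration.

```python
def vtlp_warp(freq_hz, alpha, sample_rate=16000, f_hi=4800.0):
    """Map a frequency to its warped position with a piecewise-linear VTLP warp.

    Frequencies below the boundary are scaled by `alpha`; frequencies above it
    are mapped linearly so that the Nyquist frequency stays fixed.
    """
    nyquist = sample_rate / 2
    boundary = f_hi * min(alpha, 1.0) / alpha
    if freq_hz <= boundary:
        return freq_hz * alpha
    # Linear segment joining (boundary, alpha * boundary) to (nyquist, nyquist)
    slope = (nyquist - f_hi * min(alpha, 1.0)) / (nyquist - boundary)
    return nyquist - slope * (nyquist - freq_hz)


# Warp a few frequencies with two warp factors (values near 1.0 are typical)
for alpha in (0.9, 1.1):
    print(alpha, [round(vtlp_warp(f, alpha), 1) for f in (0, 1000, 4000, 8000)])
```

In a full pipeline, a warp of this kind would be applied to the spectrum (or the filterbank centre frequencies) of the TTS output before the signal is passed to the ASR model; that surrounding plumbing is omitted here.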

However, the disclosure is not limited thereto, and the client device 2000 may itself use a personalized TTS model to generate replacement text that reflects the context information.

In an embodiment, the application characteristic selection GUI 2517 may be displayed in the form of a pop-up window.

FIG. 20 is a diagram illustrating an example of a GUI displayed by the client device 2000 of the present disclosure.

Referring to FIG. 20, the client device 2000 may display the first GUI 2511, the sixth GUI 2516, and a new sentence list GUI 2518 on the display unit 2510. The client device 2000 may receive input text from a developer through the first GUI 2511. The client device 2000 may display the sixth GUI 2516 for receiving a user input that sets a threshold. The client device 2000 may display a list including at least one training candidate text through the new sentence list GUI 2518.

The sixth GUI 2516 may include at least one of: the first threshold UI 2516a, which receives a user input setting a first threshold that is a reference value of usage frequency for determining the replacement target text; the second threshold UI 2516b, which receives a user input setting a second threshold that is a reference value of similarity for obtaining the replacement text; and the third threshold UI 2516c, which receives a user input setting a third threshold that is a reference value for the number of training candidate texts to be used as input data for training the natural language understanding model. The first threshold UI 2516a, the second threshold UI 2516b, and the third threshold UI 2516c are the same as the UIs shown in FIG. 18, and a redundant description is therefore omitted. For example, when the sixth GUI 2516 is a UI for setting the first threshold on word usage frequency in the dictionary DB 1370, which is used to determine the replacement target text, then depending on the first threshold setting, only '페퍼로니' (pepperoni) in the input text may be determined as the replacement target text, or '페퍼로니', '3판' (three pizzas), and '언주로 30길' (Eonju-ro 30-gil) may all be determined as replacement target texts.
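The example above can be sketched as a simple frequency filter. The sketch assumes that a phrase becomes a replacement target when its usage frequency in the dictionary DB falls below the first threshold; the frequency values and the function name `find_replacement_targets` are hypothetical and chosen only to reproduce the two outcomes described.

```python
def find_replacement_targets(phrases, usage_frequency, first_threshold):
    """Return phrases whose usage frequency is below the first threshold."""
    return [p for p in phrases if usage_frequency.get(p, 0) < first_threshold]


phrases = ["페퍼로니", "피자", "3판", "언주로 30길", "배달해줘"]

# Hypothetical usage frequencies in the dictionary DB
usage_frequency = {
    "페퍼로니": 12,
    "피자": 9000,
    "3판": 340,
    "언주로 30길": 85,
    "배달해줘": 5000,
}

strict = find_replacement_targets(phrases, usage_frequency, first_threshold=50)
loose = find_replacement_targets(phrases, usage_frequency, first_threshold=500)
print(strict)
print(loose)
```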

In an embodiment, the new sentence list GUI 2518 may include a list 2518a and a selection option GUI 2518b. The list 2518a is a list including at least one training candidate text. In an embodiment, the processor 2200 of the client device 2000 may receive from the server 1000 at least one training candidate text generated by the server 1000, and may display the list 2518a including the received at least one training candidate text. However, the disclosure is not limited thereto, and the processor 2200 may itself generate at least one training candidate text by identifying the replacement target text in the input text and replacing the identified replacement target text with replacement text. The processor 2200 of the client device 2000 may display, on the display unit 2510, the list 2518a including the at least one training candidate text that it generated.

The selection option GUI 2518b may be a graphical user interface for receiving an input selecting at least one of the at least one training candidate text. The processor 2200 may receive from the developer, through the selection option GUI 2518b, an input selecting at least one of the at least one training candidate text.

In the embodiment shown in FIG. 20, the client device 2000 may display, through the new sentence list GUI 2518, the list 2518a including three training candidate texts: "페파로니 피자 3판 언주로 30길로 배달해줘", "파페로니 피자 3판 언주로 30길로 배달해줘", and "페포로니 피자 3판 언주로 30길로 배달해줘". Each is the input sentence "Deliver three pepperoni pizzas to Eonju-ro 30-gil" with '페퍼로니' (pepperoni) replaced by the similar-sounding '페파로니', '파페로니', or '페포로니', respectively. The client device 2000 may receive from the developer, through the selection option GUI 2518b, an input selecting the first training candidate text, "페파로니 피자 3판 언주로 30길로 배달해줘", and the third training candidate text, "페포로니 피자 3판 언주로 30길로 배달해줘".
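The candidate list of FIG. 20 can be reproduced with a short sketch: each replacement text is substituted for the replacement target text in the input text, and the developer's selection is modelled as a list of chosen indices. The function name `build_training_candidates` and the index-based selection are assumptions for illustration.

```python
def build_training_candidates(input_text, target, replacements):
    """Substitute each replacement text for the target to form candidate texts."""
    return [input_text.replace(target, r) for r in replacements]


input_text = "페퍼로니 피자 3판 언주로 30길로 배달해줘"
candidates = build_training_candidates(
    input_text, "페퍼로니", ["페파로니", "파페로니", "페포로니"]
)

# The developer picks the first and third candidates (0-based indices 0 and 2)
selected = [candidates[i] for i in (0, 2)]

# Texts handed to training: the original input text plus the selected candidates
training_texts = [input_text] + selected
for text in training_texts:
    print(text)
```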

FIG. 21 is a diagram illustrating an example of a GUI displayed by the client device 2000 of the present disclosure.

Referring to FIG. 21, the client device 2000 may display the first GUI 2511, the sixth GUI 2516, and a new sentence list GUI 2519 on the display unit 2510. The processor 2200 of the client device 2000 may receive input text from a developer through the first GUI 2511. The processor 2200 may display, on the display unit 2510, the sixth GUI 2516 for receiving a user input that sets a threshold, and may receive the threshold from the developer through the sixth GUI 2516. The processor 2200 may display a list including at least one training candidate text through the new sentence list GUI 2519.

The sixth GUI 2516 may include at least one of: the first threshold UI 2516a, which receives a user input setting a first threshold that is a reference value of usage frequency for determining the replacement target text; the second threshold UI 2516b, which receives a user input setting a second threshold that is a reference value of similarity for obtaining the replacement text; and the third threshold UI 2516c, which receives a user input setting a third threshold that is a reference value for the number of training candidate texts to be used as input data for training the natural language understanding model. The first threshold UI 2516a, the second threshold UI 2516b, and the third threshold UI 2516c are the same as the UIs shown in FIG. 18, and a redundant description is therefore omitted.

In an embodiment, the new sentence list GUI 2519 may include a replacement text GUI 2519a, a context GUI 2519b, and a voice output GUI 2519c. The replacement text GUI 2519a is a graphical user interface that displays the replacement text that replaces the replacement target text detected in or selected from the input text. The replacement text GUI 2519a may display the replacement text in a training candidate text so that it is distinguished from the other words or phrases. For example, the replacement text GUI 2519a may make the replacement text visually distinct from the other words or phrases in the training candidate text by displaying it in a different color, in bold, or in a different font, or by highlighting the area surrounding it. In the embodiment shown in FIG. 21, the replacement texts '등킨 드나쓰', '던킹 도너츠', and '덩긴 도나츠' (similar-sounding variants of '던킨 도너츠', Dunkin' Donuts) may be displayed with yellow highlighting, unlike the other words.

The context GUI 2519b may display the context information taken into account in generating the replacement text. The context information may include, for example, at least one of the application user's age, gender, region, language, accent, and dialect. In the embodiment shown in FIG. 21, the context GUI 2519b may indicate information about the application user's region that was taken into account in generating the replacement text. For example, because the replacement text '등킨 드나쓰' is Busan regional dialect, the context information is the Busan region, where the application user uses the application, and the context GUI 2519b may display the information 'Busan region'.

The voice output GUI 2519c is a graphical user interface configured to convert the replacement text into an acoustic signal using a TTS model and to output the converted acoustic signal. In an embodiment, the client device 2000 may select a personalized TTS model based on the context information and convert the replacement text into an acoustic signal using the selected personalized TTS model. For example, when the context information is the Busan region or the Busan regional dialect, the client device 2000 may convert '등킨 드나쓰' into an acoustic signal using a personalized TTS model trained on pre-recorded Busan-dialect speech, and may output the converted acoustic signal.

FIG. 22 is a block diagram illustrating a configuration of a language understanding service 3000 according to an embodiment of the present disclosure.

Referring to FIG. 22, the language understanding service 3000 may include an ASR model 3100, a natural language understanding model 3200, a dialog manager 3300, and a response generator model 3400. The language understanding service 3000 shown in FIG. 22 may be pre-stored in the memory 1300 (see FIG. 2) of the server 1000 shown in FIG. 2, but is not limited thereto. All or some of the components of the language understanding service 3000 may instead be pre-stored in the memory 2300 (see FIG. 3) of the client device 2000.

The application 3500 may be configured to interact with the language understanding service 3000. In an embodiment, the application 3500 may include a Natural User Interface (NUI) for interacting with the language understanding service 3000. A combination of natural language dialog through the application 3500 and other non-verbal modalities that express intent, such as gesture, touch, gaze, image, or video, may be used to interact with the language understanding service 3000. In an embodiment, the language understanding service 3000 may be configured to hold a dialog with the user by receiving a voice input, such as a user's utterance, through the application 3500, understanding the received voice input, and responding to it.

The ASR model 3100 is a model that converts the user's voice input received through the application 3500 into text. The ASR model 3100 may provide the text converted from the voice input to the natural language understanding model 3200.

Although FIG. 22 shows the ASR model 3100 as the component that provides text to the natural language understanding model 3200, the disclosure is not limited thereto. In another embodiment, the language understanding service 3000 may include an input unit that receives a type of user input other than voice input and provides the received user input to the natural language understanding model 3200. The input unit may, for example, receive at least one of a user's touch input, gesture input, and text input from the application, and provide text information corresponding to the received input to the natural language understanding model 3200.

The natural language understanding model 3200 is a model configured to interpret or analyze the text obtained from the ASR model 3100. The natural language understanding model 3200 may be an artificial intelligence model trained to detect a domain and an intent from the text and to identify information about slots by tagging the text and performing semantic analysis. The natural language understanding model 3200 may be a model trained according to any of the embodiments of FIGS. 2 to 16 of the present disclosure. In an embodiment, the natural language understanding model 3200 may be a model generated or updated as a result of: identifying at least one replacement target text included in the input text; generating, for the replacement target text, replacement text that the user is expected to utter and that has high phonetic similarity; generating at least one training candidate text by replacing the replacement target text with the generated replacement text; and performing training using the input text and the at least one training candidate text.

The natural language understanding model 3200 may provide the information about the domain, intent, and slots detected from the text to the dialog manager 3300.

Although not shown in the drawings, the language understanding service 3000 may further include a text normalization module (text normalizer module). The text normalization module may include a rule table that takes as input a replacement text in a training candidate text and outputs the replacement target text, which is the original text of that replacement text. For example, when the input text is "던킨 도너츠의 신제품 검색해줘" ("Search for new Dunkin' Donuts products"), the replacement target text identified in the input text is '던킨 도너츠' (Dunkin' Donuts), and the replacement texts are '등킨 드나쓰', '던킹 도너츠', and '덩긴 도나츠', the text normalization module may include a rule table, as in Table 2 below, that defines the replacement texts '등킨 드나쓰', '던킹 도너츠', and '덩긴 도나츠' as input text and the replacement target text '던킨 도너츠' as output text.

Table 2
Input | Output
등킨 드나쓰 | 던킨 도너츠 (Dunkin' Donuts)
던킹 도너츠 | 던킨 도너츠 (Dunkin' Donuts)
덩긴 도나츠 | 던킨 도너츠 (Dunkin' Donuts)

In an embodiment, the text normalization module may be placed between the natural language understanding model 3200 and the dialog manager 3300. The natural language understanding model 3200 may provide the slot information of the replacement text and the replacement target text to the text normalization module, and the text normalization module may output the replacement target text, which is the original text of the replacement text, and provide the output replacement target text to the dialog manager 3300.

However, the disclosure is not limited thereto, and the text normalization module may instead provide the replacement target text to the response generator model 3400.

With the text normalization module, even when '등킨 드나쓰', '던킹 도너츠', or '덩긴 도나츠' is input, the original text '던킨 도너츠' (Dunkin' Donuts) is output, so the dialog manager 3300 can determine a machine action for the original text '던킨 도너츠'.
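A rule table of this kind can be sketched as a plain lookup that maps each replacement text back to its original text. The dictionary contents follow Table 2; the function name `normalize_slot_value` and the choice to pass unknown values through unchanged are assumptions for this sketch.

```python
# Rule table following Table 2: replacement text -> replacement target text
RULE_TABLE = {
    "등킨 드나쓰": "던킨 도너츠",
    "던킹 도너츠": "던킨 도너츠",
    "덩긴 도나츠": "던킨 도너츠",
}


def normalize_slot_value(slot_value):
    """Return the original text for a known replacement text.

    Values with no rule are returned unchanged (an assumed fallback).
    """
    return RULE_TABLE.get(slot_value, slot_value)


for value in ("등킨 드나쓰", "던킹 도너츠", "덩긴 도나츠", "던킨 도너츠"):
    print(value, "->", normalize_slot_value(value))
```

Because every variant resolves to the same canonical string, a downstream component such as the dialog manager only has to handle '던킨 도너츠'.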

In an embodiment, the text normalization module may be stored in a third-party (3rd party) server rather than in the language understanding service 3000.

The dialog manager 3300 is a model configured to manage the dialog by determining a response action in response to the intent and slots output from the natural language understanding model 3200. In an embodiment, the dialog manager 3300 may determine a machine action that responds to the intent and slots of the obtained text, based on predefined question-answer pairs.

The response generator model (Response Generator) 3400 is a model that determines the response to be provided to the user and outputs the determined response. In an embodiment, the response generator model 3400 may include a natural language generation model (Natural Language Generator; NLG) and a Text-to-Speech (TTS) model.

In an embodiment, the response generator model 3400 may determine the type of response to output based on the characteristics of the application 3500 or the characteristics of the device running the application 3500. For example, when the application 3500 accepts text responses, the response generator model 3400 may generate response text using the NLG model and provide the generated response text to the application 3500 in text form. When the application 3500 runs on a device with a voice output function, for example an AI speaker, the response generator model 3400 may convert the response text into a voice signal using the TTS model and output it.
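The choice of response type can be sketched as a small dispatch on the output capability of the device. Everything here is illustrative: `fake_tts` is a hypothetical stand-in that returns placeholder bytes rather than real audio, and the boolean flag `device_has_voice_output` is an assumed simplification of the application and device characteristics.

```python
def fake_tts(text):
    """Hypothetical stand-in for a TTS model: returns placeholder bytes, not audio."""
    return text.encode("utf-8")


def generate_response(response_text, device_has_voice_output, tts=fake_tts):
    """Return (response_type, payload) according to the output capability."""
    if device_has_voice_output:
        # e.g. an AI speaker: convert the response text to a voice signal
        return "speech", tts(response_text)
    # Otherwise hand the response text to the application as text
    return "text", response_text


text_kind, text_payload = generate_response("Here are the new products.", False)
speech_kind, speech_payload = generate_response("Here are the new products.", True)
print(text_kind, speech_kind)
```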

Unlike a conventional voice assistant service, the language understanding service 3000 of the present disclosure can interpret the user input using the natural language understanding model 3200 and output a response message using the dialog manager 3300 and the response generator model 3400, even when the user mispronounces a voice command or utterance, or pronounces a name inaccurately because the user does not know it exactly.

In an embodiment, the natural language understanding model 3200 of the language understanding service 3000 may be trained using context information including at least one of the user's age, gender, region, language, and dialect. For example, when the natural language understanding model 3200 is a model trained to reflect the context information of a user who speaks the Busan regional dialect, and the user inputs the voice command "등킨 드나쓰 신제품 검색해 줘" ("Search for new Dunkin' Donuts products", with the brand name pronounced in Busan dialect) through the application 3500, the language understanding service 3000 may interpret the voice command using the natural language understanding model 3200, thereby detecting the intent 'product search' and the slot '던킨 도너츠' (Dunkin' Donuts) from the voice command, and may output the result of searching for new Dunkin' Donuts products using the dialog manager 3300 and the response generator model 3400. A conventional voice assistant service cannot accurately recognize what '등킨 드나쓰' means and must output a message such as "Please say that again". In contrast, the language understanding service 3000 according to an embodiment of the present disclosure can accurately grasp the user's utterance intent and personalized context information, thereby improving user satisfaction and providing an advanced user experience (UX).

The program executed by the server 1000 or the client device 2000 described herein may be implemented as a hardware component, a software component, and/or a combination of hardware and software components. The program may be executed by any system capable of executing computer-readable instructions.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively.

The software may be implemented as a computer program including instructions stored in computer-readable storage media. Computer-readable recording media include, for example, magnetic storage media (e.g., read-only memory (ROM), random-access memory (RAM), floppy disks, hard disks) and optically readable media (e.g., CD-ROM, DVD (Digital Versatile Disc)). The computer-readable recording medium may be distributed among computer systems connected through a network, so that computer-readable code is stored and executed in a distributed manner. The medium may be read by a computer, stored in a memory, and executed by a processor.

The computer-readable storage medium may be provided in the form of a non-transitory storage medium. Here, 'non-transitory' means only that the storage medium does not include a signal and is tangible; it does not distinguish between data being stored semi-permanently and data being stored temporarily in the storage medium.

In addition, the program according to the embodiments disclosed herein may be provided as part of a computer program product. A computer program product may be traded between a seller and a buyer as a commodity.

The computer program product may include a software program and a computer-readable storage medium in which the software program is stored. For example, the computer program product may include a product in the form of a software program (e.g., a downloadable application) that is distributed electronically through the manufacturer of a device or through an electronic market (e.g., Google Play Store, App Store). For electronic distribution, at least a portion of the software program may be stored in a storage medium or may be generated temporarily. In this case, the storage medium may be a storage medium of a server of the manufacturer, a server of the electronic market, or a relay server that temporarily stores the software program.

In a system consisting of the server 1000 and the client device 2000, the computer program product may include a storage medium of the server or a storage medium of the device. Alternatively, when there is a third device (e.g., a smartphone) communicatively connected to the server 1000 or the client device 2000, the computer program product may include a storage medium of the third device. Alternatively, the computer program product may include the software program itself, transmitted from the server 1000 to the device or the third device, or transmitted from the third device to the device.

In this case, one of the server 1000, the client device 2000, and the third device may execute the computer program product to perform the method according to the disclosed embodiments. Alternatively, two or more of the server 1000, the client device 2000, and the third device may execute the computer program product to perform the method according to the disclosed embodiments in a distributed manner.

For example, the server 1000 may execute a computer program product stored in the memory 1300 (see FIG. 2) to control the client device 2000, which is communicatively connected to the server 1000, to perform the method according to the disclosed embodiments.

As another example, the third device may execute the computer program product to control a device communicatively connected to the third device to perform the method according to the disclosed embodiments.

When the third device executes the computer program product, the third device may download the computer program product from the server 1000 and execute the downloaded computer program product. Alternatively, the third device may execute a computer program product provided in a preloaded state to perform the method according to the disclosed embodiments.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, those of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the described components, such as computer systems or modules, are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Claims

1. A method for a server to train a language model using text, the method comprising:
receiving, from a client device, input text entered by a user;
identifying replacement target text requiring replacement from among at least one word included in the input text;
generating, for the identified replacement target text, replacement text that the user is expected to utter and that has a high phonological similarity to the replacement target text;
generating at least one learning candidate text by replacing the replacement target text in the input text with the generated replacement text; and
training a natural language understanding model using the input text and the at least one learning candidate text as training data.
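For illustration only, the candidate-generation and training-data steps of this claim can be sketched as follows. This is a minimal sketch under stated assumptions: the names `TrainingExample`, `build_learning_candidates`, and `build_training_data` are hypothetical, plain substring replacement stands in for whatever substitution the server actually performs, and the second replacement string is an invented example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingExample:
    """One training sentence; is_original marks the user's own input text."""
    text: str
    is_original: bool


def build_learning_candidates(input_text: str, target: str, replacements: list[str]) -> list[str]:
    """Replace the replacement target text with each replacement text."""
    if target not in input_text:
        return []
    candidates: list[str] = []
    for replacement in replacements:
        candidate = input_text.replace(target, replacement)
        # Skip no-op replacements and duplicates.
        if candidate != input_text and candidate not in candidates:
            candidates.append(candidate)
    return candidates


def build_training_data(input_text: str, target: str, replacements: list[str]) -> list[TrainingExample]:
    """Combine the input text and its learning candidate texts as training data."""
    examples = [TrainingExample(input_text, True)]
    for candidate in build_learning_candidates(input_text, target, replacements):
        examples.append(TrainingExample(candidate, False))
    return examples


# "Deungkin Denatsu" follows the description; "Dunkin Donat" is a hypothetical variant.
data = build_training_data(
    "Search for new products at Dunkin Donuts",
    "Dunkin Donuts",
    ["Deungkin Denatsu", "Dunkin Donat"],
)
for example in data:
    print(example)
```

The resulting list would then be passed to whatever training routine the natural language understanding model uses; that routine is outside the scope of this sketch.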

2. The method of claim 1,
wherein identifying the replacement target text comprises:
receiving, from the client device, a user input selecting at least one word or phrase from the input text;
identifying, in the input text, the at least one word or phrase selected by the user input; and
determining the identified at least one word or phrase as the replacement target text.

3. The method of claim 1,
wherein identifying the replacement target text comprises:
parsing the received input text into words, morphemes, and phrases;
searching for at least one parsed word in a dictionary database (dictionary DB) that includes phoneme sequence information or embedding vector information for a plurality of words; and
determining, as the replacement target text, a word that is not found in the dictionary DB or whose frequency of use is lower than a preset threshold, based on a result of the search.
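For illustration only, the dictionary lookup in this claim can be sketched as below. The dictionary contents, the phoneme notation, the frequency counts, and the threshold value are all hypothetical, and whitespace splitting stands in for real word, morpheme, and phrase parsing.

```python
from __future__ import annotations

# Hypothetical dictionary DB: word -> (phoneme sequence, frequency of use).
DICTIONARY_DB: dict[str, tuple[str, int]] = {
    "search": ("S ER CH", 9500),
    "new": ("N UW", 12000),
    "products": ("P R AA D AH K T S", 4300),
    "donuts": ("D OW N AH T S", 800),
    "dunkin": ("D AH NG K IH N", 40),
}

FREQUENCY_THRESHOLD = 100  # assumed preset threshold


def find_replacement_targets(
    input_text: str,
    dictionary: dict[str, tuple[str, int]] = DICTIONARY_DB,
    threshold: int = FREQUENCY_THRESHOLD,
) -> list[str]:
    """Return words that are missing from the dictionary or rarely used."""
    targets: list[str] = []
    for word in input_text.lower().split():  # simplistic stand-in for parsing
        entry = dictionary.get(word)
        if entry is None or entry[1] < threshold:
            targets.append(word)
    return targets


print(find_replacement_targets("Search new Dunkin Donuts products"))
```

With this toy data, only "dunkin" falls below the threshold, so it alone is flagged as replacement target text.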

4. The method of claim 1,
wherein identifying the replacement target text comprises:
detecting a domain into which the input text is classified by analyzing the input text using a pre-trained natural language understanding model;
detecting an intent from the input text by interpreting the input text using the pre-trained natural language understanding model;
identifying a slot in the input text and performing slot tagging by interpreting the input text using the pre-trained natural language understanding model; and
determining the text corresponding to the slot as the replacement target text.

5. The method of claim 4,
wherein generating the at least one learning candidate text comprises:
determining the generated replacement text to belong to the same slot as the slot identified from the input text.
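For illustration only, carrying the slot label over to the replacement text can be sketched as follows. The `TaggedText` structure, the slot name `brand`, and the domain `shopping` are hypothetical; the intent 'product search' and the slot text 'Dunkin Donuts' follow the example in the description.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedText:
    """Text together with its domain, intent, and slot tags (hypothetical layout)."""
    text: str
    domain: str
    intent: str
    slots: dict[str, str]  # slot name -> text filling that slot


def replace_slot_value(tagged: TaggedText, slot_name: str, replacement: str) -> TaggedText:
    """Build a learning candidate whose replacement text keeps the same slot."""
    original_value = tagged.slots[slot_name]
    new_slots = dict(tagged.slots)
    new_slots[slot_name] = replacement
    return TaggedText(
        text=tagged.text.replace(original_value, replacement),
        domain=tagged.domain,
        intent=tagged.intent,
        slots=new_slots,
    )


tagged = TaggedText(
    text="Search for new Dunkin Donuts products",
    domain="shopping",
    intent="product search",
    slots={"brand": "Dunkin Donuts"},
)
candidate = replace_slot_value(tagged, "brand", "Deungkin Denatsu")
print(candidate)
```

Because the candidate inherits the domain, intent, and slot name unchanged, it can serve as a labeled training example without a second tagging pass.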

6. The method of claim 1,
wherein generating the replacement text comprises:
extracting a phoneme sequence for the replacement target text;
searching, among words previously stored in a dictionary DB, for text similar to the extracted phoneme sequence based on the phonetic relevance of the phoneme sequence; and
generating the replacement text using at least one text having a high similarity to the extracted phoneme sequence, based on a result of the search.
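For illustration only, one way to search for phonetically similar words is to compare phoneme sequences by edit distance. The claim does not specify the similarity measure, so Levenshtein distance is an assumption here, as are the ARPAbet-style phoneme strings, the dictionary entries, and the `min_similarity` cutoff.

```python
from __future__ import annotations


def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    previous = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        current = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def phonetic_similarity(a: list[str], b: list[str]) -> float:
    """Similarity in [0, 1]; 1.0 means identical phoneme sequences."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


# Hypothetical dictionary DB: word -> phoneme sequence.
PRONUNCIATION_DB: dict[str, list[str]] = {
    "donuts": ["D", "OW", "N", "AH", "T", "S"],
    "denatsu": ["D", "EH", "N", "AH", "T", "S", "UW"],
    "peanuts": ["P", "IY", "N", "AH", "T", "S"],
    "search": ["S", "ER", "CH"],
}


def similar_words(target: str, db: dict[str, list[str]] = PRONUNCIATION_DB,
                  min_similarity: float = 0.6) -> list[tuple[str, float]]:
    """Return dictionary words ranked by phonetic similarity to the target."""
    target_phonemes = db[target]
    scored = [
        (word, phonetic_similarity(target_phonemes, phonemes))
        for word, phonemes in db.items()
        if word != target
    ]
    kept = [pair for pair in scored if pair[1] >= min_similarity]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


print(similar_words("donuts"))
```

A production system would more likely weight substitutions by acoustic confusability rather than treating every phoneme mismatch as equally costly.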

7. The method of claim 1,
wherein generating the replacement text comprises:
converting the replacement target text into an embedding vector using a word embedding model;
generating, using a neural network model, text having a vector value similar to the converted embedding vector; and
generating the replacement text using the generated text.
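For illustration only, the notion of "text having a similar vector value" can be sketched with a nearest-neighbor lookup by cosine similarity. This swaps in a simple table lookup for the neural network model named in the claim, and the three-dimensional vectors and vocabulary are toy values invented for the example.

```python
from __future__ import annotations

import math

# Toy embedding table (hypothetical values); a real word embedding model
# would produce these vectors.
EMBEDDINGS: dict[str, list[float]] = {
    "donuts": [0.9, 0.1, 0.2],
    "denatsu": [0.85, 0.15, 0.25],
    "doughnut": [0.8, 0.2, 0.1],
    "search": [0.1, 0.9, 0.3],
}


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def nearest_texts(target: str, embeddings: dict[str, list[float]] = EMBEDDINGS,
                  top_k: int = 2) -> list[str]:
    """Return the top_k vocabulary entries closest to the target's embedding."""
    target_vector = embeddings[target]
    scored = [
        (word, cosine_similarity(target_vector, vector))
        for word, vector in embeddings.items()
        if word != target
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:top_k]]


print(nearest_texts("donuts"))
```

A generative model, as recited in the claim, could produce strings that are not in any fixed vocabulary; the lookup above only ranks entries that already exist in the table.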

8. The method of claim 1,
wherein generating the replacement text comprises:
converting the replacement target text into a sound signal (wave signal) using a text-to-speech (TTS) model;
outputting the converted sound signal;
converting the output sound signal into output text using an automatic speech recognition (ASR) model; and
generating the replacement text by replacing the replacement target text with the converted output text.
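For illustration only, the TTS-to-ASR round trip can be sketched with the two models passed in as callables. The functions `fake_tts` and `fake_asr` are hypothetical stand-ins that do no real speech processing; they exist only so the control flow can run, and the misrecognition they simulate is taken from the example in the description.

```python
from __future__ import annotations

from typing import Callable


def round_trip_replacements(
    target: str,
    tts: Callable[[str], bytes],
    asr: Callable[[bytes], str],
) -> list[str]:
    """Synthesize the target text, recognize it again, and keep any differing result."""
    audio = tts(target)   # replacement target text -> sound signal
    heard = asr(audio)    # sound signal -> output text
    return [heard] if heard and heard != target else []


def fake_tts(text: str) -> bytes:
    """Stand-in TTS: encodes the text instead of synthesizing audio."""
    return text.encode("utf-8")


def fake_asr(audio: bytes) -> str:
    """Stand-in ASR: simulates one known misrecognition."""
    text = audio.decode("utf-8")
    simulated_errors = {"Dunkin Donuts": "Deungkin Denatsu"}
    return simulated_errors.get(text, text)


print(round_trip_replacements("Dunkin Donuts", fake_tts, fake_asr))
print(round_trip_replacements("search", fake_tts, fake_asr))
```

Injecting the models as parameters keeps the sketch independent of any particular speech library.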

9. The method of claim 1,
wherein training the natural language understanding model comprises:
transmitting the input text and the at least one learning candidate text to the client device;
receiving, from the client device, an identification value for at least one text selected by a user input from among the input text and the at least one learning candidate text;
selecting at least one of the input text and the at least one learning candidate text based on the received identification value; and
training the natural language understanding model using the selected at least one text as training data.
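For illustration only, selecting texts by the identification values returned from the client can be sketched as below. The claim does not say what form an identification value takes; treating it as the position of the text in the transmitted list is an assumption, and `select_training_texts` is a hypothetical name.

```python
from __future__ import annotations


def select_training_texts(texts: list[str], selected_ids: list[int]) -> list[str]:
    """Keep only the texts whose identification value the client sent back.

    Identification values are assumed to be indices into the list that the
    server transmitted; unknown values are ignored.
    """
    by_id = dict(enumerate(texts))
    return [by_id[i] for i in selected_ids if i in by_id]


sent_to_client = [
    "Search for new Dunkin Donuts products",     # input text
    "Search for new Deungkin Denatsu products",  # learning candidate text
    "Search for new Dunkin Donat products",      # hypothetical candidate
]
chosen = select_training_texts(sent_to_client, [0, 1, 7])
print(chosen)
```

Ignoring unknown identification values, rather than raising an error, is a design choice made for this sketch only.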

10. A method for a client device to provide an application for training a language model, the method comprising:
displaying, on a display unit of the client device, a first graphical user interface (GUI) for receiving a user input entering input text for training the language model;
transmitting the input text received through the first GUI to a server;
displaying a second GUI for receiving a user input selecting at least one of at least one replacement target text identified from the input text;
receiving, from the server, at least one learning candidate text generated by replacing, in the input text, the at least one replacement target text selected through the second GUI with replacement text that a user is expected to utter; and
displaying, on the display unit, a third GUI for receiving a user input selecting at least one of the at least one learning candidate text.

11. The method of claim 10, further comprising:
displaying a fourth GUI for receiving a user input selecting at least one item of context information to be considered in generating the replacement text, the context information including at least one of the user's age, gender, region, spoken language, and dialect.

12. A server for training a language model using text, the server comprising:
a communication interface for performing data communication with a client device;
a memory storing a program including one or more instructions; and
a processor executing the one or more instructions of the program stored in the memory,
wherein the processor is configured to:
receive, from the client device through the communication interface, input text entered by a user;
identify replacement target text requiring replacement from among at least one word included in the input text;
generate, for the identified replacement target text, replacement text that the user is expected to utter and that has a high phonological similarity to the replacement target text;
generate at least one learning candidate text by replacing the replacement target text in the input text with the generated replacement text; and
train a natural language understanding (NLU) model using the input text and the at least one learning candidate text as training data.

13. The server of claim 12,
wherein the processor is configured to:
receive, from the client device using the communication interface, a user input selecting at least one word or phrase from the input text;
identify, in the input text, the at least one word or phrase selected by the user input; and
determine the identified at least one word or phrase as the replacement target text.

14. The server of claim 12,
wherein the memory stores a dictionary database (dictionary DB) including phoneme sequence information or embedding vector information for a plurality of words, and
wherein the processor is configured to:
parse the received input text into words, morphemes, and phrases, and search for at least one parsed word in the dictionary DB; and
determine, as the replacement target text, a word that is not found in the dictionary DB or whose frequency of use is lower than a preset threshold, based on a result of the search.

15. The server of claim 12,
wherein the memory stores at least one pre-trained natural language understanding model, and
wherein the processor is configured to:
detect a domain into which the input text is classified, and an intent, by analyzing the input text using any one of the natural language understanding models pre-stored in the memory;
identify a slot in the input text and perform slot tagging by interpreting the input text using the pre-trained natural language understanding model; and
determine the text corresponding to the slot as the replacement target text.

16. The server of claim 12,
wherein the processor is configured to:
extract a phoneme sequence for the replacement target text, and search, among words included in a dictionary DB pre-stored in the memory, for text similar to the extracted phoneme sequence based on the phonetic relevance of the phoneme sequence; and
generate the replacement text using at least one text having a high similarity to the extracted phoneme sequence, based on a result of the search.

17. The server of claim 12,
wherein the memory stores a word embedding model, and
wherein the processor is configured to:
convert the replacement target text into an embedding vector using the word embedding model;
generate, using a neural network model, text having a vector value similar to the converted embedding vector; and
generate the replacement text using the generated text.

18. The server of claim 12,
wherein the memory stores a text-to-speech (TTS) model that converts text into a sound signal (wave signal), and
wherein the processor is configured to:
convert the replacement target text into a sound signal using the TTS model, output the converted sound signal, and convert the output sound signal into output text using an automatic speech recognition (ASR) model pre-stored in the memory; and
generate the replacement text by replacing the replacement target text with the converted output text.

19. The server of claim 12,
wherein the processor is configured to:
transmit the input text and the at least one learning candidate text to the client device using the communication interface, and receive, from the client device, an identification value for at least one text selected by a user input from among the input text and the at least one learning candidate text;
select at least one text from among the input text and the at least one learning candidate text based on the received identification value; and
train the natural language understanding model using the selected at least one text as training data.

20. A client device that provides an application for training a language model, the client device comprising:
a display unit;
a user input unit configured to receive a user input entering input text;
a communication interface for performing data communication with a server;
a memory storing a program including one or more instructions; and
a processor executing the one or more instructions of the program stored in the memory,
wherein the processor is configured to:
control the display unit to display a first graphical user interface (GUI) for receiving a user input entering the input text;
control the display unit to display a second GUI for receiving a user input selecting at least one of at least one replacement target text identified from the input text;
control the communication interface to receive, from the server, at least one learning candidate text generated by replacing, in the input text, the at least one replacement target text selected through the second GUI with replacement text that the user is expected to utter; and
control the display unit to display a third GUI for receiving a user input selecting at least one of the at least one learning candidate text.

21. The client device of claim 20,
wherein the processor is configured to:
control the display unit to display a fourth GUI for receiving a user input selecting at least one item of context information to be considered in generating the replacement text, the context information including at least one of the user's age, gender, region, spoken language, and dialect.

22. A computer-readable recording medium having recorded thereon a program for executing the method of claim 1 on a computer.

23. A computer-readable recording medium having recorded thereon a program for executing the method of claim 10 on a computer.