KR20220064948A

KR20220064948A - Apparatus for generating text and method thereof

Info

Publication number: KR20220064948A
Application number: KR1020220058383A
Authority: KR
Inventors: 최병주; 홍지민
Original assignee: 휴멜로 주식회사
Priority date: 2020-02-25
Filing date: 2022-05-12
Publication date: 2022-05-19
Also published as: KR102398993B9; KR20210108293A; KR102398993B1; KR102173382B1

Abstract

Provided is a method for generating output text from input text using a text generation model. A text generation method according to one embodiment of the present invention includes the steps of: acquiring a latent variable by inputting data indicating at least one word included in input text into the text generation model; predicting a target cluster to which a first target word to be included in the output text belongs using the latent variable; and predicting the first target word using the target cluster and the latent variable.

Description

Apparatus and method for generating text

The present invention relates to an apparatus and method for generating text. More particularly, it relates to a method and apparatus for promoting diversity in the text produced by a neural network-based text generation model, and to a method for constructing such a text generation model.

Text generation technology is used in various fields, such as machine translation, article summarization, and chatbots.

Rule-based text generation, one of the traditional text generation techniques, generates output text corresponding to an input sentence or word on the basis of preset rules. In this approach, a person has to create every rule manually in advance. Because human language has numerous exceptions and uncertainties, rule-based text generation has inherent limitations.

As machine learning technology advances and is applied to various fields, text generation methodologies based on artificial neural networks are being studied. In neural network-based text generation, a text generation neural network model can be trained in an unsupervised manner (unsupervised learning) to output text corresponding to an input text, for example by using a learning target corpus composed of a vast amount of existing example texts of various kinds.

Depending on the intended use of the text generation model, the learning target corpus may be any of various corpora with different contents and characteristics, for example a corpus of sentences extracted from an online encyclopedia, a corpus of sentences from existing news articles, a corpus of sentences from existing song lyrics, or a corpus of everyday questions and answers between people.

However, the distribution of words in most corpora that can be used for training is not uniform. Consequently, conventional neural network-based text generation models tend to generate the words that appear with high frequency in the learning target corpus far too often. For example, a text generation model trained on a corpus of song lyrics is likely to overproduce words that are common in lyrics, such as "I love you" ("사랑해") and "I miss you" ("그리워"), and a model trained on a corpus of everyday questions and answers is likely to generate text dominated by expressions such as "That's right" ("그렇습니다") and "I don't know" ("모르겠어요"). In other words, text generated by artificial neural networks is less diverse than text produced by humans.

Accordingly, there is a need for a text generation method that can generate diverse and natural words evenly, without being biased toward particular words.

Korean Patent Registration No. 10-2017229 (announced 2019.09.02)

A technical problem to be solved through some embodiments of the present invention is to provide an apparatus for generating a sequence with increased diversity and a method performed by the apparatus.

Another technical problem to be solved through some embodiments of the present invention is to provide an apparatus for constructing a neural network-based sequence generation model capable of generating a sequence with increased diversity, and a method performed by the apparatus.

Yet another technical problem to be solved through some embodiments of the present invention is to provide an apparatus for clustering learning target sequences in order to construct a neural network-based sequence generation model capable of generating a sequence with increased diversity, and a method performed by the apparatus.

Yet another technical problem to be solved through some embodiments of the present invention is to provide an apparatus for generating text with increased diversity and a method performed by the apparatus.

Yet another technical problem to be solved through some embodiments of the present invention is to provide an apparatus for constructing a neural network-based text generation model capable of generating text with increased diversity, and a method performed by the apparatus.

Yet another technical problem to be solved through some embodiments of the present invention is to provide an apparatus for clustering a learning target corpus in order to construct a neural network-based text generation model capable of generating text with increased diversity, and a method performed by the apparatus.

The technical problems of the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art from the following description.

To solve the above technical problem, a text generation method according to an embodiment of the present invention includes: acquiring a latent variable by inputting data indicating at least one word included in an input text into a text generation model; predicting, using the latent variable, a target cluster to which a first target word to be included in an output text belongs; and predicting the first target word using the target cluster and the latent variable.

In an embodiment, the text generation model is trained using a learning target corpus having a plurality of clusters, and the plurality of clusters may be obtained by dividing the learning target corpus such that the relative frequencies of the words included in each cluster are uniform within that cluster.

In some embodiments, the plurality of clusters are obtained by dividing the learning target corpus such that the average of the normalized entropy values of the clusters is maximized, and the normalized entropy value of each cluster may be a value calculated from the relative frequencies of the words included in that cluster.
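The criterion above can be illustrated with a small sketch. The text does not give the exact formula, so the following assumes that the normalized entropy of a cluster is the Shannon entropy of the within-cluster relative frequencies divided by the logarithm of the number of words in the cluster. The function names and the toy word counts are hypothetical.

```python
import math

def normalized_entropy(counts):
    """Entropy of within-cluster relative frequencies, scaled to [0, 1].

    Assumption: normalization divides by log(number of words in the cluster).
    """
    if len(counts) < 2:
        return 1.0  # assumption: a one-word cluster is treated as uniform
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(counts))

def mean_normalized_entropy(clusters):
    """Average of the normalized entropy values over all clusters."""
    return sum(normalized_entropy(c) for c in clusters) / len(clusters)

# Hypothetical word counts from a lyrics-like corpus.
counts = {"love": 900, "miss": 850, "you": 800,
          "harbor": 12, "lantern": 10, "tide": 9}

# Partition A groups words of similar frequency together.
by_frequency = [
    {w: counts[w] for w in ("love", "miss", "you")},
    {w: counts[w] for w in ("harbor", "lantern", "tide")},
]
# Partition B mixes frequent and rare words in the same cluster.
mixed = [
    {w: counts[w] for w in ("love", "harbor", "tide")},
    {w: counts[w] for w in ("miss", "you", "lantern")},
]

score_a = mean_normalized_entropy(by_frequency)
score_b = mean_normalized_entropy(mixed)
```

Under this assumed formula, the partition that groups words of similar frequency scores higher, which is the kind of partition the criterion would favor.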

In some embodiments, the plurality of clusters may be obtained by dividing the learning target corpus such that the sum of the relative frequencies of the words included in each cluster is uniform across the clusters.

In an embodiment, the learning target corpus has a plurality of clusters, and the text generation model is trained such that, when a second word sequence, obtained by excluding the last word from a first word sequence selected from the learning target corpus, is input to the text generation model, a first probability that the text generation model predicts the last word is maximized. The first probability may be the product of a second probability that the cluster to which the last word belongs is predicted from the second word sequence, and a third probability that the last word is predicted from the second word sequence and the cluster to which the last word belongs.
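The factorization described above can be written compactly. The notation is an assumption introduced here for illustration: $w_{1:T}$ denotes the first word sequence, $w_{1:T-1}$ the second word sequence, $w_T$ the last word, and $C(w_T)$ the cluster to which the last word belongs.

```latex
% First probability = second probability x third probability
P(w_T \mid w_{1:T-1})
  = P\bigl(C(w_T) \mid w_{1:T-1}\bigr)\,
    P\bigl(w_T \mid C(w_T),\, w_{1:T-1}\bigr)

% Maximizing the first probability is equivalent to maximizing its logarithm,
% which separates into a cluster term and a word term:
\log P(w_T \mid w_{1:T-1})
  = \log P\bigl(C(w_T) \mid w_{1:T-1}\bigr)
  + \log P\bigl(w_T \mid C(w_T),\, w_{1:T-1}\bigr)
```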

In an embodiment, predicting the first target word using the target cluster and the latent variable may include sampling the first target word using the latent variable while assigning a probability of 0 to those words of the learning target corpus that do not belong to the target cluster.
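A minimal sketch of this masked sampling step follows. It assumes that the latent variable has already been turned into one score (logit) per vocabulary word, and that the word probabilities come from a softmax restricted to the target cluster. The function name, the array names, and the toy values are hypothetical and are not taken from the source.

```python
import numpy as np

def sample_word(word_logits, word_to_cluster, target_cluster, rng):
    """Sample a word id; words outside target_cluster get probability 0."""
    in_cluster = word_to_cluster == target_cluster
    if not in_cluster.any():
        raise ValueError("target cluster contains no words")
    # Setting a logit to -inf makes its softmax probability exactly 0.
    masked = np.where(in_cluster, word_logits, -np.inf)
    masked = masked - masked.max()  # numerical stability; the max is finite
    probs = np.exp(masked)
    probs /= probs.sum()
    word_id = int(rng.choice(len(probs), p=probs))
    return word_id, probs

rng = np.random.default_rng(0)
# Hypothetical per-word scores derived from the latent variable.
word_logits = np.array([2.0, 1.5, 0.3, -0.2, 0.1])
# Hypothetical cluster assignment: words 0-1 in cluster 0, words 2-4 in cluster 1.
word_to_cluster = np.array([0, 0, 1, 1, 1])

word_id, probs = sample_word(word_logits, word_to_cluster,
                             target_cluster=1, rng=rng)
```

Although words 0 and 1 have the highest raw scores, they can never be sampled here because the predicted target cluster is cluster 1.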

In an embodiment, acquiring the latent variable may include acquiring a latent variable that reflects attention information indicating which part of the plurality of words included in the input text to focus on when predicting the first target word.

In an embodiment, the method may further include: predicting a second target word by inputting the predicted first target word into the text generation model; and providing output text that includes the predicted first target word and the predicted second target word.

A method for training a text generation model according to another embodiment of the present invention for solving the above technical problem includes training the text generation model such that, when a second word sequence, obtained by excluding the last word from a first word sequence selected from a learning target corpus, is input to the text generation model, a first probability that the text generation model predicts the last word is maximized. Here, the learning target corpus is a corpus having a plurality of clusters, and the first probability is the product of a second probability that the cluster to which the last word belongs is predicted from the second word sequence, and a third probability that the last word is predicted from the second word sequence and the cluster to which the last word belongs.

A text generation apparatus according to yet another embodiment of the present invention for solving the above technical problem includes a text generation model. The text generation model includes an encoder that computes a latent variable from data representing an input text, and a decoder that uses the latent variable to predict, from among the words included in a learning target corpus having a plurality of clusters, a target word to be included in an output text. The decoder includes a cluster predictor that uses the latent variable to predict, from among the plurality of clusters, a target cluster to which the target word belongs, and a word predictor that predicts the target word from the latent variable and the target cluster.

In an embodiment, the text generation model may further include an attention module that determines which part of the plurality of words included in the input text to focus on when the decoder predicts the target word.

A sequence generation method according to yet another embodiment of the present invention for solving the above technical problem includes: acquiring a latent variable by inputting at least some segments of an input sequence into a sequence generation model; predicting, using the latent variable, a target cluster to which a target segment to be included in an output sequence belongs; and predicting the target segment using the target cluster and the latent variable.

FIG. 1 is a diagram for explaining the input and output of a text generation apparatus according to an embodiment of the present invention.
FIG. 2 is an exemplary block diagram illustrating a text generation apparatus according to an embodiment of the present invention.
FIG. 3 is an exemplary flowchart illustrating a series of processes for constructing a text generation model using a learning target corpus and generating text according to an embodiment of the present invention.
FIG. 4 is an exemplary flowchart illustrating a method of clustering a corpus used for training a text generation model according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining the method of clustering a learning target corpus described with reference to FIG. 4.
FIG. 6 is an exemplary block diagram illustrating a text generation model of a text generation apparatus according to an embodiment of the present invention.
FIG. 7 is an exemplary block diagram for explaining the encoder and decoder of the text generation model described with reference to FIG. 6.
FIG. 8 is an exemplary flowchart illustrating a method of generating text using a text generation model according to an embodiment of the present invention.
FIG. 9 is a diagram illustrating the operation of a text generation apparatus according to an embodiment of the present invention and the neural network structure inside the text generation model.
FIG. 10 is a diagram for explaining application fields to which various embodiments of the present invention can be applied.
FIG. 11 is a diagram for explaining an exemplary computing device that can implement a text generation apparatus according to some embodiments of the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below in conjunction with the accompanying drawings. However, the technical idea of the present invention is not limited to the following embodiments and may be implemented in various different forms. The following embodiments are provided only to make the disclosure of the technical idea complete and to fully convey the scope of the present invention to those of ordinary skill in the art to which the present invention belongs, and the technical idea of the present invention is defined only by the scope of the claims.

In adding reference numerals to the components of each drawing, it should be noted that the same components are given the same reference numerals wherever possible, even when they appear in different drawings. In addition, in describing the present invention, a detailed description of a related known configuration or function is omitted when it is determined that it may obscure the gist of the present invention.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the meaning commonly understood by those of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive manner unless expressly and specifically defined. The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural form unless the context specifically states otherwise.

In addition, terms such as first, second, A, B, (a), and (b) may be used in describing the components of the present invention. These terms serve only to distinguish one component from another, and they do not limit the nature, sequence, or order of the components. When a component is described as being "connected", "coupled", or "joined" to another component, the component may be directly connected or joined to that other component, but it should be understood that yet another component may also be "connected", "coupled", or "joined" between the two.

As used herein, "comprises" and/or "comprising" does not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those mentioned.

Before proceeding with the description, some terms used in this specification are clarified.

In this specification, text is a concept that includes phrases, sentences, and the like composed of one or more words.

In this specification, a token is a meaningful unit of sentence construction and, depending on how an embodiment of the present invention is implemented, may correspond to an eojeol (a space-delimited word unit in Korean), a word, a morpheme, or an even smaller unit. In this specification, token and word may be used interchangeably insofar as they refer to a unit constituting text.

In this specification, a sequence means a collection of data having an order, or a collection of data that can be arranged linearly. A segment refers to a unit constituting a sequence. For example, a phrase, sentence, or text in which words are arranged in order may be understood as a text sequence, and a token or word may be understood as a segment. As another example, the score of a piece of music may correspond to a sequence, and a phrase, a measure, or a note constituting the score may correspond to a segment. Likewise, the progression of chords in a piece of music may correspond to a sequence, and a single chord or a group of several chords may correspond to a segment.

Hereinafter, some embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram for explaining the input and output of a text generation apparatus according to an embodiment of the present invention.

As shown in FIG. 1, the text generation apparatus 10 is a computing device that acquires an input text 1, generates text based on it, and provides an output text 3. The text generation apparatus 10 may be, for example, an apparatus that automatically generates a fictitious newspaper article when main keywords are input, an apparatus that composes song lyrics when a song title is input, an apparatus that writes the remainder of a novel when its opening is input, or an unattended conversation apparatus or chatbot that outputs a response when a user's utterance is input.

The computing device may be a notebook, a desktop, a laptop, or the like, but is not limited thereto and may include any kind of device equipped with computing capability. For an example of the computing device, refer further to FIG. 11.

Although FIG. 1 shows, by way of example, the text generation apparatus 10 implemented as a single computing device, a first function of the text generation apparatus 10 may be implemented in a first computing device and a second function may be implemented in a second computing device.

According to various embodiments of the present invention, the text generation apparatus 10 may construct a neural network-based text generation model in order to generate diverse text, and may generate the output text 3 through the text generation model.

The text generation model may be trained in an unsupervised manner using a learning target corpus composed of existing example texts.

For example, a text generation model that outputs a summary when the full text of an article is input may be trained using a corpus composed of numerous pairs of a learning target article and its corresponding learning target summary. Specifically, the text generation model that summarizes articles may be trained by adjusting the parameters of the text generation model so that the learning target summary is output when the full text of the learning target article is input.

As another example, a text generation model that composes song lyrics may be trained using a corpus composed of learning target lyrics. Specifically, the model may be trained by adjusting its parameters so that the word immediately following the input text is predicted: when the first word of a learning target lyric is input, the second word is output; when the first and second words are input, the third word is output; and so on.
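The next-word training scheme described above can be sketched as follows. The example only shows how (input prefix, expected next word) pairs might be derived from one tokenized lyric. The function name and the sample lyric are hypothetical, and the parameter update itself is not shown.

```python
def next_word_examples(tokens):
    """Build (prefix, next word) pairs from one tokenized training text.

    The prefix made of the first i words is paired with word i + 1.
    """
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# Hypothetical tokenized lyric from a learning target corpus.
lyric = ["stars", "fall", "over", "the", "harbor"]
examples = next_word_examples(lyric)

for prefix, target in examples:
    print(" ".join(prefix), "->", target)
```

Each pair would serve as one training example: the model receives the prefix and its parameters are adjusted so that the target word becomes more likely.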

A method of training the text generation model according to various embodiments of the present invention is described later with reference to FIG. 6.

Hereinafter, the functional configuration of the text generation apparatus 10 according to an embodiment of the present invention is described with reference to FIG. 2.

As shown in FIG. 2, the text generation apparatus 10 may include an input unit 100, a text generation model 200, an output unit 300, and a storage unit 400. Only the components related to the embodiment of the present invention are shown in FIG. 2; those of ordinary skill in the art will understand that other general-purpose components may be included in addition to those shown. Note also that the components of the text generation apparatus 10 shown in FIG. 2 represent functionally distinguished elements, and that a plurality of components may be implemented in integrated form in an actual physical environment. Each component is described in detail below.

The input unit 100 receives text and preprocesses it. The preprocessing includes dividing the received text into words (or tokens). The input unit 100 may split the received text into words and provide data representing the tokenized text 71 to the text generation model 200. In some embodiments in which the text generation apparatus 10 performs clustering of the learning target corpus, the input unit 100 may additionally receive the learning target corpus.

The text generation model 200 generates a new word based on the input text provided word by word from the input unit 100. The word generated by the text generation model 200 is provided to the output unit 300, and may also be fed back into the text generation model 200 to be used in generating the word that follows it.
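The feedback loop described above can be sketched as a simple autoregressive routine. The model itself is replaced by a hypothetical stand-in function, `toy_predict_next`, which looks up the next word in a small table. The end-of-sequence marker, the length limit, and joining the words with spaces are also assumptions made for illustration.

```python
def generate(prompt, predict_next, max_new_words, end_token="<eos>"):
    """Generate words one at a time, feeding each new word back as input."""
    context = list(prompt)
    generated = []
    for _ in range(max_new_words):
        word = predict_next(context)
        if word == end_token:
            break
        generated.append(word)  # would be handed to the output unit
        context.append(word)    # and fed back in to predict the next word
    return generated

# Hypothetical stand-in for the text generation model.
NEXT_WORD = {"the": "tide", "tide": "rises", "rises": "<eos>"}

def toy_predict_next(context):
    return NEXT_WORD.get(context[-1], "<eos>")

words = generate(["watch", "the"], toy_predict_next, max_new_words=10)
output_text = " ".join(words)  # simple concatenation of the generated words
```

In an actual embodiment, `predict_next` would correspond to a forward pass of the text generation model 200 over the words seen so far.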

텍스트 생성 모델(200)은, 입력 텍스트를 저차원의 벡터로 임베딩하고 이로부터 잠재 변수를 계산하는 인코더 및 잠재 변수에 기초하여 출력 텍스트를 샘플링하는 디코더로 구성될 수 있다. 텍스트 생성 모델(200)의 인코더 및 디코더는 하나 또는 그 이상의 인코더들과 디코더들을 포함하는 트랜스포머(transformer) 모델 등을 활용하여 구성될 수 있는데, 본 발명이 그러한 실시예로 한정되는 것은 아니다. 예를 들어 텍스트 생성 모델(200)의 인코더 및 디코더는 순환 신경망(RNN: Recurrent Neural Network) 또는 장단기 메모리(LSTM: Long Short-Term Memory) 모델 등을 활용하여 구성될 수도 있다. 텍스트 생성 모델(200)의 세부 사항에 대해서는 도 6 및 도 7을 참조하여 후술하기로 한다.The text generation model 200 may consist of an encoder that embeds input text into a low-dimensional vector and computes latent variables therefrom, and a decoder that samples the output text based on the latent variables. The encoder and decoder of the text generation model 200 may be configured using a transformer model including one or more encoders and decoders, and the like, but the present invention is not limited to such an embodiment. For example, the encoder and decoder of the text generation model 200 may be configured using a Recurrent Neural Network (RNN) or Long Short-Term Memory (LSTM) model. Details of the text generation model 200 will be described later with reference to FIGS. 6 and 7 .

출력부(300)는 텍스트 생성 모델(200)이 생성한 단어들에 대한 후처리를 수행한다. 예를 들어 출력부(300)는 텍스트 생성 모델(200)이 순차적으로 생성한 단어들을 이어 붙여서(concatenate) 구절 또는 문장 단위의 출력 텍스트를 출력할 수 있다.The output unit 300 performs post-processing on the words generated by the text generation model 200 . For example, the output unit 300 may output the output text in units of phrases or sentences by concatenating the words sequentially generated by the text generation model 200 .

The storage unit 400 stores and manages various data. In particular, the storage unit 400 may store the training corpus 410 used to train the text generation model 200. The storage unit 400 may also store and manage various parameters and settings of the neural network constituting the text generation model 200.

The functional configuration and the inputs and outputs of the text generating apparatus 10 according to an embodiment of the present invention have been described above with reference to FIGS. 1 and 2. The following describes, according to another embodiment of the present invention, a series of processes for building the text generation model 200 of the text generating apparatus 10 and generating output text from input text.

FIG. 3 is an exemplary flowchart illustrating a series of processes for building a text generation model using a training corpus and generating text according to an embodiment of the present invention. This is only a preferred embodiment for achieving the object of the present invention, and some steps may of course be added or omitted as needed.

Each step of the text generation method shown in FIG. 3 may be performed by a computing device such as the text generating apparatus 10. In other words, each step of the text generation method may be implemented as one or more instructions executed by a processor of a computing device. All steps of the method may be executed by a single physical computing device; alternatively, first steps of the method may be performed by a first computing device and second steps of the method may be performed by a second computing device. For example, the cluster identification step (S100), the text generation model training step (S200), and the text generation step (S300) shown in FIG. 3 may be performed by different computing devices. The description below assumes that each step of the text generation method is performed by the text generating apparatus 10. For convenience of description, however, the subject performing each step may be omitted.

As shown in FIG. 3, the text generation method according to the present embodiment may consist of a process of dividing the corpus to be learned by the text generation model into a plurality of clusters, a training process of building the text generation model using the training corpus, and a process of generating output text from input text through the text generation model.

Unlike conventional methods of building a text generation model, in some embodiments of the present invention the training corpus is first clustered before the text generation model is trained on it (step S100). More specifically, the training corpus may be divided into a plurality of clusters in such a way that the distribution of the words within each cluster is as uniform as possible. The training corpus need not be physically split into a plurality of data sets corresponding to the clusters; it is sufficient that the cluster to which each word or token in the training corpus belongs can be identified. The clustering of the training corpus in step S100 may be performed by the text generating apparatus 10, or it may be performed by a separate computing device and the result provided to the text generating apparatus 10.

The clustering of the training corpus is described in more detail with reference to FIG. 4.

In step S200, the text generation model 200 is trained using the training corpus in which the plurality of clusters were identified in step S100. Note that this does not mean that a separate text generation model 200 is trained for each of the clusters identified in the training corpus. Rather, a single text generation model 200 is trained so that, given input text selected from existing sentences in the training corpus, it predicts the target cluster to which a target word belongs and then predicts the target word within the predicted cluster. Here, the target word may be, for example, the word that immediately follows the input text in the sentence of the training corpus from which the input text was selected.
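As a minimal sketch of how such training examples might be assembled, each prefix of a sentence can be paired with the word that follows it and with that word's cluster. The function name `make_training_examples` and the dictionary `word_to_cluster` are hypothetical and are used here only for illustration.

```python
from typing import Dict, List, Tuple

def make_training_examples(
    sentence_tokens: List[str],
    word_to_cluster: Dict[str, int],
) -> List[Tuple[List[str], str, int]]:
    """Pair every sentence prefix with its next word and that word's cluster."""
    examples = []
    for i in range(1, len(sentence_tokens)):
        prefix = sentence_tokens[:i]   # input text
        target = sentence_tokens[i]    # target word: the word right after the prefix
        examples.append((prefix, target, word_to_cluster[target]))
    return examples

# Hypothetical cluster assignment produced by step S100.
word_to_cluster = {"the": 0, "cat": 1, "sat": 1}
examples = make_training_examples(["the", "cat", "sat"], word_to_cluster)
```

A single model is trained on all such triples, regardless of which cluster the target word belongs to.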

The training of the text generation model 200 according to various embodiments of the present invention is described later with reference to FIG. 6.

In step S300, text is input to the trained text generation model and the corresponding output text is generated. Step S300 may include sub-steps such as pre-processing the input text, inputting the pre-processed text into the text generation model 200, and processing and outputting the word or text produced by the text generation model. The text generation process according to various embodiments of the present invention is described later with reference to FIGS. 8 and 9.

A series of processes for building a text generation model using a training corpus and generating text according to an embodiment of the present invention has been described above with reference to FIG. 3. The method of clustering the training corpus that may be performed in step S100 is described in more detail below with reference to FIGS. 4 and 5.

FIG. 4 is an exemplary flowchart illustrating a process of clustering the training corpus on which the text generation model 200 is trained, according to an embodiment of the present invention. This is only a preferred embodiment for achieving the object of the present invention, and some steps may of course be added or omitted as needed.

As described above, according to an embodiment of the present invention, before the text generation model 200 is trained on the training corpus, the training corpus is divided into a plurality of clusters in such a way that the distribution of the words within each cluster is as uniform as possible.

As shown in FIG. 4, the method of clustering the training corpus begins with step S110, in which the training corpus is tokenized. More specifically, each word or token constituting the sentences or texts in the training corpus is identified.

In step S120, the frequency of occurrence (or number of occurrences) of each word is determined. The words may also be sorted in descending order of frequency.
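Steps S110 and S120 may be sketched, for illustration only, as follows. Whitespace tokenization, lowercasing, and the alphabetical tie-break are assumptions of this sketch; the embodiment does not prescribe a particular tokenizer.

```python
from collections import Counter
from typing import List, Tuple

def rank_by_frequency(sentences: List[str]) -> List[Tuple[str, int]]:
    """Tokenize the corpus, count each token, and sort by descending count."""
    # Assumed tokenizer: lowercase and split on whitespace (step S110).
    tokens = [tok for s in sentences for tok in s.lower().split()]
    counts = Counter(tokens)  # step S120: occurrences per word
    # Ties are broken alphabetically so that the ranking is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

corpus = ["the cat sat on the mat", "the dog sat"]
ranked = rank_by_frequency(corpus)
```

The resulting ranked list corresponds to the kind of data plotted in graph 41 of FIG. 5, with rank on the x-axis and count on the y-axis.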

Graph 41 shown in FIG. 5 is an exemplary graph showing the frequency of occurrence of each word in an exemplary training corpus, ordered from the most frequent word to the least frequent. The x-axis of graph 41 represents the frequency rank and the y-axis represents the frequency of occurrence.

In everyday language, words are not used with uniform frequency. For example, in Korean, postpositional particles are used more often than other words, and in English, articles, pronouns, prepositions, and conjunctions are used far more often than other words. In addition, the frequencies with which individual words are used generally differ greatly from one another. Therefore, when the occurrences of the words in a commonly available training corpus are counted, the distribution is likely to be one in which certain words occur far more often than others, as shown in graph 41 of FIG. 5. In other words, the frequencies of the words in the training corpus are likely to be non-uniform. As described above, when the text generation model 200 is trained on such a typical corpus, in which certain words account for a very large share, the text generated by the text generation model 200 also becomes excessively biased toward those words.

In step S130, the training corpus may be divided into a plurality of clusters. The clusters may be determined in such a way that the relative frequencies of the words belonging to each cluster are as uniform as possible. As noted above, however, the training corpus does not actually have to be split into a plurality of data sets. It is sufficient that the cluster to which each word or token in the training corpus belongs can be identified.

FIG. 5 shows an example in which the training corpus C has been clustered into a plurality of clusters C₁ to C₄ in step S130. Referring to graphs 42a to 42d of FIG. 5, the corpus C has been clustered so that cluster C₁ contains words with relatively high frequencies and cluster C₄ contains words with relatively low frequencies. That is, because words with similar frequencies are grouped into the same cluster, the distribution of the relative frequencies of the words within each of the clusters C₁ to C₄ is more uniform than the distribution of the relative frequencies of the words in the corpus C as a whole.

In some embodiments, dividing the training corpus so that the relative frequencies of the words belonging to each cluster are uniform may be performed by maximizing the average of the normalized entropy values computed from the relative frequencies of the words in each cluster. The normalized entropy of each cluster may be calculated by Equation 1 below.

[Equation 1]

$$\text{NormalizedEntropy}(C) = -\frac{\sum_{i=1}^{n} p(x_i)\,\log p(x_i)}{\log n}$$

(where $n$ is the number of words in the cluster and $p(x_i)$ is the relative frequency of the word $x_i$)
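For illustration, the normalized entropy of a cluster may be computed from raw word counts as in the sketch below. The function name `normalized_entropy` and the convention that a cluster containing a single word has a normalized entropy of 1 are assumptions of this sketch; relative frequencies are taken within the cluster so that they sum to 1.

```python
import math
from typing import Sequence

def normalized_entropy(counts: Sequence[int]) -> float:
    """Shannon entropy of the within-cluster relative frequencies, divided by log n."""
    n = len(counts)
    if n < 2:
        return 1.0  # assumed convention: log(1) = 0 would otherwise divide by zero
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(n)

uniform = normalized_entropy([5, 5, 5, 5])  # all words equally frequent
skewed = normalized_entropy([97, 1, 1, 1])  # one word dominates the cluster
```

A value close to 1 indicates that the words of the cluster occur about equally often, while a value close to 0 indicates that a few words dominate.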

In some embodiments, dividing the training corpus so that the relative frequencies of the words belonging to each cluster are uniform may also include dividing the training corpus so that the sum of the relative frequencies of the words in each cluster is uniform across the clusters. For example, when there are 4 clusters, the clusters may be determined so that the sum of the relative frequencies of the words in each cluster is approximately 1/4, and when there are 10 clusters, the clusters may be determined so that the sum is approximately 1/10.
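One simple way to approximate such an equal split, offered only as a sketch, is to walk through the words in descending order of frequency and start a new cluster whenever the running total reaches the next multiple of 1/k of the total count. The function name `assign_clusters` and this greedy rule are assumptions; the result is a word-to-cluster mapping, which suffices because the corpus itself need not be split.

```python
from typing import Dict, List, Tuple

def assign_clusters(ranked: List[Tuple[str, int]], k: int) -> Dict[str, int]:
    """Greedily map frequency-ranked words to k clusters of roughly equal total count."""
    total = sum(count for _, count in ranked)
    mapping: Dict[str, int] = {}
    cumulative = 0
    cluster = 0
    for word, count in ranked:
        mapping[word] = cluster
        cumulative += count
        # Move on once this cluster has reached its share of the total mass.
        if cluster < k - 1 and cumulative >= total * (cluster + 1) / k:
            cluster += 1
    return mapping

ranked = [("a", 4), ("b", 2), ("c", 1), ("d", 1)]
mapping = assign_clusters(ranked, k=2)
```

Here the single frequent word "a" and the three rarer words each account for half of the total count. When one word alone exceeds its cluster's share, the split can only be approximately equal.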

In some embodiments, the process of dividing the corpus into an appropriate number of clusters, such that the average of the normalized entropy values of the clusters is maximized and the sum of the relative frequencies of the words in each cluster is uniform across the clusters, may be performed by executing the algorithm expressed in the pseudocode below.
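The pseudocode itself is not reproduced in this text. The following is therefore not the algorithm of the embodiment but a minimal sketch of one possible search under stated assumptions: for each candidate number of clusters up to a hypothetical limit `max_clusters`, the frequency-ranked words are split into clusters of roughly equal total count, and the candidate with the highest mean normalized entropy is kept, with ties resolved in favor of fewer clusters. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Ranked = List[Tuple[str, int]]

def normalized_entropy(counts: List[int]) -> float:
    if len(counts) < 2:
        return 1.0  # assumed convention for single-word clusters
    total = sum(counts)
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return h / math.log(len(counts))

def split_by_equal_mass(ranked: Ranked, k: int) -> List[Ranked]:
    total = sum(c for _, c in ranked)
    clusters: List[Ranked] = [[] for _ in range(k)]
    cumulative, idx = 0, 0
    for word, count in ranked:
        clusters[idx].append((word, count))
        cumulative += count
        if idx < k - 1 and cumulative >= total * (idx + 1) / k:
            idx += 1
    return [c for c in clusters if c]  # drop clusters that received no word

def choose_clustering(ranked: Ranked, max_clusters: int) -> Tuple[float, List[Ranked]]:
    best_score, best = -1.0, []
    for k in range(1, max_clusters + 1):
        clusters = split_by_equal_mass(ranked, k)
        score = sum(normalized_entropy([c for _, c in cl]) for cl in clusters) / len(clusters)
        if score > best_score + 1e-9:  # ties keep the smaller number of clusters
            best_score, best = score, clusters
    return best_score, best

ranked = [("a", 8), ("b", 8), ("c", 2), ("d", 2), ("e", 2), ("f", 2)]
score, clusters = choose_clustering(ranked, max_clusters=3)
```

In this toy example the two frequent words end up in one cluster and the four rare words in another, so each cluster is internally uniform and both carry a comparable share of the total count.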

Referring again to FIG. 5, the normalized entropy calculated from the relative frequencies of the words in the entire training corpus is 0.932. When the training corpus is divided into four clusters C₁ to C₄, the normalized entropies calculated for clusters C₁ to C₄ are 0.986, 0.993, 0.986, and 0.995, respectively. That is, the normalized entropy of each cluster is greater than the normalized entropy of the entire training corpus (0.932), which indicates that the relative frequencies of the words within each cluster are distributed more uniformly than those of the words in the training corpus as a whole.

A method has been described above, with reference to FIGS. 4 and 5, of dividing the corpus to be used for training the text generation model 200 and for text generation into a plurality of clusters, where the clusters are determined so that the relative frequencies of the words within each cluster are as uniform as possible and so that the sum of the relative frequencies of the words in each cluster is uniform across the clusters. By organizing the training corpus into clusters with uniform distributions in this way, and by building the text generation model described below using the training corpus divided into such clusters, the problem that the generated text is biased toward particular words that occur with high frequency in the training corpus can be addressed.

The structure of the text generation model 200, a method of training the text generation model 200 using the corpus divided into a plurality of clusters, and a method of generating text using the text generation model 200 are described in detail below with reference to FIG. 6 and the subsequent figures.

FIG. 6 is an exemplary block diagram illustrating the text generation model 200 of the text generating apparatus 10 according to an embodiment of the present invention.

The text generation model 200 receives data representing tokenized text 71 from the input unit 100, and generates and outputs a new word 73. The text generation model 200 may consist of an encoder 210 and a decoder 230. The encoder 210 and the decoder 230 may be implemented as neural networks.

The encoder 210 embeds the tokenized input text into one or more low-dimensional vectors and computes a latent variable from them. The embedding of the input text into vectors by the encoder 210 may be performed using neural-network-based text encoding techniques well known in the art.

The decoder 230 receives the latent variable, predicts the target cluster to which the target word to be newly generated will belong, predicts the target word based on the latent variable and the predicted target cluster, and outputs the predicted target word. The decoder 230 may predict the target cluster by inputting the latent variable provided by the encoder 210 into a first conditional probability model that predicts a target cluster given a latent variable. The decoder 230 may predict the target word by inputting the latent variable provided by the encoder 210 and the predicted target cluster into a second conditional probability model that predicts a target word given a latent variable and a cluster, and sampling the target word from that model.
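The two-stage prediction may be sketched as follows. In this sketch the dictionaries `cluster_scores` and `word_scores` are hypothetical stand-ins for quantities that the decoder would compute from the latent variable, and the use of a softmax to turn scores into probabilities is an assumption; the embodiment only requires two conditional probability models.

```python
import math
import random
from typing import Dict, List, Tuple

def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    m = max(scores.values())  # subtract the maximum for numerical stability
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

def decode_step(cluster_scores: Dict[str, float],
                word_scores: Dict[str, float],
                clusters: Dict[str, List[str]],
                rng: random.Random) -> Tuple[str, str]:
    # Stage 1: predict the target cluster (first conditional probability model).
    p_cluster = softmax(cluster_scores)
    target_cluster = max(p_cluster, key=p_cluster.get)
    # Stage 2: sample the target word among the members of that cluster only
    # (second conditional probability model).
    members = clusters[target_cluster]
    p_word = softmax({w: word_scores[w] for w in members})
    word = rng.choices(members, weights=[p_word[w] for w in members], k=1)[0]
    return target_cluster, word

clusters = {"frequent": ["the", "a"], "rare": ["zebra", "quartz"]}
cluster_scores = {"frequent": 0.1, "rare": 2.0}
word_scores = {"the": 5.0, "a": 4.0, "zebra": 1.0, "quartz": 0.5}
cluster, word = decode_step(cluster_scores, word_scores, clusters, random.Random(0))
```

Although "the" has the highest raw word score, it cannot be chosen here because the first stage selected the cluster of rare words.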

The decoder 230 of the text generation model 200 according to some embodiments of the present invention differs from conventional neural-network-based decoders in that it first predicts the cluster of the target word from the latent variable and then predicts the target word taking that result into account.

It has been described above that the text generation model 200 may consist of the encoder 210 and the decoder 230. A method of training the text generation model 200 by adjusting the parameters of the encoder 210 and the decoder 230 using the training corpus, according to some embodiments of the present invention, is described below.

The text generation model 200 is trained using the training corpus. In particular, the text generation model 200 is trained using a training corpus that has been clustered so that the relative frequencies of the words in each cluster are as uniform as possible.

Models that generate text using artificial neural networks may be trained so that, given input text selected from existing sentences in the training corpus, they predict the next word of the sentence to which the input text belongs. For example, when the text $(x_1, x_2, x_3, \ldots, x_{i-1}, x_i)$ exists in the training corpus and the text without its last word, $(x_1, x_2, x_3, \ldots, x_{i-1})$, is input to the text generation model 200, the parameters of the encoder and decoder neural networks of the text generation model may be adjusted so as to maximize the probability that the last word $x_i$ is predicted.

The text generation model 200 according to some embodiments of the present invention is trained so that, given input text selected from existing sentences in the training corpus, the probability of predicting the target cluster to which the next word (the target word) of the sentence belongs, and of predicting the target word within the predicted cluster, is maximized.

[Equation 2]

$$P(x_i \mid x_{1:i-1}) = P_1\big(C(x_i) \mid x_{1:i-1}\big) \times P_2\big(x_i \mid C(x_i),\, x_{1:i-1}\big)$$

(where $x_{1:i-1}$ is the input text, $x_i$ is the target word, and $C(x_i)$ is the cluster to which the target word belongs)

In Equation 2 above, $P_1$ denotes the conditional probability that, given the input text $x_{1:i-1}$, the cluster $C(x_i)$ containing the target word $x_i$ (the next word of the sentence to which the input text belongs) is predicted. $P_2$ denotes the conditional probability that the target word $x_i$ is predicted, given the input text $x_{1:i-1}$ and the cluster $C(x_i)$ of the target word.

The text generation model 200 according to the present embodiment may be trained by repeatedly inputting into it texts selected from the training corpus, which has been clustered so that the relative frequencies of the words in each cluster are as uniform as possible, and adjusting the parameters of the encoder 210 and the decoder 230 so that the product of the probability $P_1$ and the probability $P_2$ is maximized.
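Maximizing the product of the two probabilities is equivalent to minimizing its negative logarithm, which separates into one term per stage. The sketch below shows this quantity for a single training example; expressing the objective as a negative log-likelihood and the function name `factorized_nll` are assumptions made for illustration, since the embodiment states only that the product is maximized.

```python
import math

def factorized_nll(p_cluster: float, p_word_in_cluster: float) -> float:
    """Negative log of P1 * P2, written as the sum of two per-stage terms."""
    # -log(P1 * P2) = -(log P1 + log P2)
    return -(math.log(p_cluster) + math.log(p_word_in_cluster))

# Hypothetical probabilities assigned to the correct cluster and the correct word.
loss_early = factorized_nll(0.5, 0.5)  # model still uncertain
loss_late = factorized_nll(0.9, 0.9)   # model more confident in the right answers
```

Because the two terms are summed, an error in either the cluster prediction or the within-cluster word prediction increases the loss.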

The structure of the text generation model 200 including the encoder 210 and the decoder 230, and then the method of training the text generation model 200, have been described above with reference to FIG. 6. The encoder 210 and the decoder 230 of the text generation model 200 are described in more detail below.

FIG. 7 is a block diagram for explaining the functional configuration of the encoder 210 and the decoder 230 of the text generation model described with reference to FIG. 6. The encoder 210 and the decoder 230 may be implemented as neural networks.

The encoder 210 may include a word embedding module 211 and a latent variable calculation module 215. In some embodiments, the encoder 210 may further include an attention module 213 between the word embedding module 211 and the latent variable calculation module 215. In some other embodiments, the word embedding module 211 and the attention module 213 may be implemented as a single module.

The word embedding module 211 embeds data representing the input text 1, or the text 71 obtained by tokenizing the input text, into low-dimensional vectors. More specifically, the word embedding module 211 may convert an input word or text into a vector in a latent space that represents the meaning of the word or text. The embedding performed by the word embedding module 211 may be learned and carried out using techniques well known in the art (for example, word2vec).

The latent variable calculation module 215 calculates the latent variable from the vectors embedded by the word embedding module 211. The latent variable may be a variable selected within a latent vector space representing the meaning of a word or text. Various methods well known in the art may be used to calculate the latent variable from the embedding vectors, and further description is omitted so as not to obscure the subject matter of the present invention.

The attention module 213 is a module that applies attention information indicating which of the words in the text input to the text generation model 200 should be focused on when predicting the target word. For example, the attention module 213 may take the vectors into which the word embedding module 211 has embedded the words of the input text 1, apply to each vector a weight (that is, the attention information) indicating how much that word should be focused on when predicting the target word, and output the resulting vector. The attention module 213 provides the attention-weighted vector to the latent variable calculation module 215, so that the latent variable calculation module 215 can calculate a latent variable that reflects the attention information.
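The weighting of the embedding vectors may be sketched as follows. Scoring each word by its dot product with a query vector and normalizing the scores with a softmax is one common choice and is an assumption here, as are the names `attend` and `query`; the embodiment does not specify how the attention weights are obtained.

```python
import math
from typing import List, Tuple

def attend(embeddings: List[List[float]],
           query: List[float]) -> Tuple[List[float], List[float]]:
    """Return attention weights and the weighted sum of the embedding vectors."""
    # Assumed scoring rule: dot product between each embedding and the query.
    scores = [sum(e_d * q_d for e_d, q_d in zip(e, query)) for e in embeddings]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [x / z for x in exps]  # attention information, sums to 1
    dim = len(query)
    context = [sum(w * e[d] for w, e in zip(weights, embeddings)) for d in range(dim)]
    return weights, context

embeddings = [[1.0, 0.0], [0.0, 1.0]]  # two hypothetical word embeddings
weights, context = attend(embeddings, query=[2.0, 0.0])
```

The first word aligns with the query and therefore receives the larger weight, so the vector passed on to the latent variable calculation is dominated by that word.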

In some other embodiments, instead of providing a separate attention module 213, a positional encoding scheme may be used in which information about the position of a word is encoded together with the word's embedding vector.

The word embedding module 211, the attention module 213, and the latent variable calculation module 215 of the encoder 210 described above may be implemented as layers of a single integrated encoder neural network.

The decoder 230 may consist of a cluster prediction unit 231 and a word prediction unit 233.

The cluster prediction unit 231 predicts, from the latent variable provided by the encoder 210, the target cluster to which the target word belongs. The cluster prediction unit 231 may predict the target cluster by inputting the latent variable provided by the encoder 210 into the first conditional probability model, which predicts a target cluster given a latent variable.

The word prediction unit 233 predicts the target word from the latent variable provided by the encoder 210 and the target cluster predicted by the cluster prediction unit 231. The word prediction unit 233 may predict the target word by inputting the latent variable provided by the encoder 210 and the predicted target cluster into the second conditional probability model, which predicts a target word given a latent variable and a cluster, and sampling the target word from that model.

In FIG. 7, the cluster prediction unit 231 and the word prediction unit 233 are shown separately for ease of understanding, but they may be implemented as layers of a single integrated decoder neural network.

Equation 3 above represents an exemplary second conditional probability model by which the word prediction unit 233 predicts a target word from a given latent variable and cluster. As can be seen from Equation 3, the word prediction unit 233 uses a probability model that assigns probability values only to words belonging to the cluster predicted by the cluster prediction unit 231 and fixes the probability value to 0 for words that do not belong to the predicted cluster. By using a probability model such as that shown in Equation 3, the word prediction unit 233 can sample the target word from within the range of words belonging to the cluster predicted by the cluster prediction unit 231.
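The cluster-restricted sampling described above can be illustrated with a minimal sketch. It is not the claimed implementation: the function names, the toy vocabulary, the logit values, and the use of a softmax over the in-cluster words are all assumptions made for illustration.

```python
import math
import random


def masked_word_distribution(logits, word_to_cluster, target_cluster):
    """Softmax over the words in target_cluster; all other words get probability 0."""
    in_cluster = [w for w in logits if word_to_cluster[w] == target_cluster]
    if not in_cluster:
        raise ValueError("target cluster contains no words")
    peak = max(logits[w] for w in in_cluster)  # subtract the max for numerical stability
    exps = {w: math.exp(logits[w] - peak) for w in in_cluster}
    total = sum(exps.values())
    # Words outside the predicted cluster are fixed to probability 0.
    return {w: exps.get(w, 0.0) / total for w in logits}


def sample_word(probs, rng):
    """Draw one word according to the (masked) probabilities."""
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]


# Hypothetical toy values: cluster 0 holds frequent words, cluster 1 rarer ones.
logits = {"love": 3.0, "you": 2.5, "moonlight": 0.5, "harbor": 0.1}
word_to_cluster = {"love": 0, "you": 0, "moonlight": 1, "harbor": 1}

probs = masked_word_distribution(logits, word_to_cluster, target_cluster=1)
word = sample_word(probs, random.Random(0))
```

In this sketch, once cluster 1 is predicted, the high-scoring words "love" and "you" cannot be sampled no matter how large their scores are, which is the effect the paragraph above attributes to Equation 3.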

By using the text generation model 200, which is built from a learning target corpus divided into uniform clusters in the manner described above and which predicts target words by the probability model described above, the problem of text generation results being biased toward particular words that appear with high frequency in the learning target corpus can be solved.
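One way to check whether a clustering is "uniform" in this sense is to compare how evenly token frequencies are spread inside each cluster against the whole corpus. The sketch below is only an illustration under stated assumptions: normalized entropy is chosen here as the uniformity measure, and grouping tokens by sorted frequency is a hypothetical stand-in, not the clustering procedure of the described embodiments.

```python
import math
from collections import Counter


def normalized_entropy(counts):
    """Entropy of the relative-frequency distribution divided by its maximum, log(n).

    A value of 1.0 means the relative frequencies are perfectly uniform.
    """
    if len(counts) < 2:
        return 1.0
    total = sum(counts)
    probs = [c / total for c in counts]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(counts))


def cluster_by_frequency(freq, num_clusters):
    """Sort tokens by frequency and cut them into contiguous groups (illustrative only)."""
    ranked = sorted(freq, key=freq.get, reverse=True)
    size = math.ceil(len(ranked) / num_clusters)
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


# Hypothetical toy corpus with a heavily skewed frequency distribution.
corpus = ("the " * 50 + "a " * 40 + "love " * 6 + "night " * 5
          + "harbor " * 2 + "lantern " * 1).split()
freq = Counter(corpus)

clusters = cluster_by_frequency(freq, num_clusters=3)
whole = normalized_entropy(list(freq.values()))
per_cluster = [normalized_entropy([freq[t] for t in c]) for c in clusters]
```

With this toy data, every entry of `per_cluster` is closer to 1.0 than `whole`, meaning each cluster's relative frequencies are more even than those of the full corpus.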

So far, the details of the encoder 210 and the decoder 230 of the text generation model 200 have been described with reference to FIG. 7. Hereinafter, a method of generating text with the text generation apparatus 10 described so far will be described with reference to FIGS. 8 and 9.

FIG. 8 is an exemplary flowchart illustrating a method of generating text using a text generation model according to an embodiment of the present invention. Each step of the text generation method shown in FIG. 8 may be performed by a computing device such as the text generation apparatus 10. In the following, the description assumes that each step of the text generation method is performed by the text generation apparatus 10. For convenience of description, however, the subject performing each step may be omitted.

First, input text is acquired in step S310, and the input text is preprocessed in step S320. Preprocessing of the input text may include dividing the input text into a plurality of words or tokens. This division may be performed, for example, by the input unit 100 of the text generation apparatus 10.
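As a minimal sketch of the splitting performed in step S320, the snippet below divides text into word and punctuation tokens with a regular expression. The `tokenize` function and its pattern are assumptions for illustration; the description does not specify which tokenizer the input unit 100 uses.

```python
import re


def tokenize(text):
    """Split text into word tokens and single-character punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


tokens = tokenize("Hello, world!")
```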

In step S330, a latent variable of the input text is calculated. The calculation of the latent variable may be performed, for example, by inputting data representing the input text into the encoder 210 in the text generation model 200 of the text generation apparatus 10. The calculation of the latent variable may include a series of steps of converting the data representing the input text into an embedding vector and calculating the latent variable from the embedding vector.

If the input text includes a plurality of words, an embedding vector may be obtained for each of the words, and the latent variable may be calculated based on the embedding vectors corresponding to those words. In this case, attention information indicating which of the words to focus on when predicting the target word may additionally be reflected in the calculation of the latent variable. For example, the latent variable may be calculated based on a vector obtained by applying, to each of the embedding vectors of the words, a weight (that is, attention information) indicating the parts to focus on and the parts not to focus on when predicting the target word.
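The weighting of embedding vectors by attention information can be sketched as follows. This is an illustration under assumptions: dot-product scoring against a hypothetical `query` vector and softmax normalization are choices made here, since the description does not fix how the attention weights are computed, and the step that maps the weighted vector to the latent variable is omitted.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(embeddings, query):
    """Weight each word embedding by softmax(dot(query, embedding)) and sum the results."""
    scores = [sum(q * e for q, e in zip(query, emb)) for emb in embeddings]
    weights = softmax(scores)
    dim = len(embeddings[0])
    context = [sum(w * emb[d] for w, emb in zip(weights, embeddings))
               for d in range(dim)]
    return weights, context


# Hypothetical 2-dimensional embeddings for three input words.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]

weights, context = attention_context(embeddings, query)
```

Here the second word receives the smallest weight because its embedding is orthogonal to the query, so it contributes least to the vector from which the latent variable would be computed.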

In step S340, the cluster to which the target word belongs is predicted based on the latent variable. Prediction of the cluster of the target word may be performed, for example, by the cluster prediction unit 231 in the decoder 230 of the text generation model 200.

In step S350, the target word is predicted based on the latent variable calculated in step S330 and the cluster predicted in step S340. Here, the target word is predicted from among the words belonging to the predicted cluster. Predicting the target word within the predicted cluster may be achieved by using the probability model described above with reference to Equation 3. Prediction of the target word may be performed, for example, by the word prediction unit 233 in the decoder 230 of the text generation model 200.

Although not shown in FIG. 8, the word predicted in step S350 may be input to the text generation model 200 again, so that the process of predicting the word that follows the target word is performed repeatedly through steps S330 to S350.

In step S360, output text is generated and provided based on the target word predicted in step S350. For example, output text in units of phrases or sentences may be provided by concatenating the words generated sequentially while steps S330 to S350 are performed repeatedly.
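The repetition of steps S330 to S350 followed by step S360 can be sketched as a simple loop. The `encode`, `predict_cluster`, and `predict_word` callables, the `end_token` marker, and the `max_new_tokens` limit are hypothetical placeholders introduced for this example; the description does not state a stopping condition.

```python
def generate(input_tokens, encode, predict_cluster, predict_word,
             max_new_tokens, end_token="<eos>"):
    """Repeat S330-S350, feeding each predicted word back in, then join the words (S360)."""
    tokens = list(input_tokens)
    generated = []
    for _ in range(max_new_tokens):
        z = encode(tokens)               # S330: latent variable from the current tokens
        cluster = predict_cluster(z)     # S340: cluster of the next target word
        word = predict_word(z, cluster)  # S350: word chosen within that cluster
        if word == end_token:
            break
        generated.append(word)
        tokens.append(word)              # the predicted word becomes part of the input
    return " ".join(generated)           # S360: concatenate into output text


# Deterministic toy stand-ins for the model components (assumptions, not a trained model).
toy_clusters = {0: ["rain", "falls"], 1: ["softly", "<eos>"]}


def toy_encode(tokens):
    return len(tokens)


def toy_predict_cluster(z):
    return z % 2


def toy_predict_word(z, cluster):
    words = toy_clusters[cluster]
    return words[(z // 2) % len(words)]


output_text = generate(["the"], toy_encode, toy_predict_cluster,
                       toy_predict_word, max_new_tokens=10)
```

In a real system the three callables would be backed by the encoder 210, the cluster prediction unit 231, and the word prediction unit 233; the toy versions only make the control flow executable.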

FIG. 9 is a diagram illustrating, according to an embodiment of the present invention, the operation of the text generation apparatus 10 that generates target words (T₁, T₂, T₃, ..., T_m) from a plurality of words (W₁, W₂, W₃, ..., W_n) included in the input text 1, together with the neural network structure inside the text generation model 200.

The text 1 input to the text generation apparatus 10 is divided by the input unit 100 into a plurality of tokens or words (W₁, W₂, W₃, ..., W_n), which are passed to the encoder 210 of the text generation model 200.

The word embedding module 211 of the encoder 210 converts each of the input words (W₁, W₂, W₃, ..., W_n) into an embedding vector and passes it to the attention module 213.

The attention module 213 passes to the latent variable calculation module 215 a vector reflecting attention information that indicates the parts to focus on when predicting the target word.

The latent variable calculation module 215 calculates latent variables (Z₁, Z₂, Z₃, ..., Z_m) for generating the target words (T₁, T₂, T₃, ..., T_m), and passes the latent variables (Z₁, Z₂, Z₃, ..., Z_m) to the decoder 230.

The cluster prediction unit 231 of the decoder 230 predicts, based on each latent variable (Z₁, Z₂, Z₃, ..., Z_m), the cluster to which the corresponding target word (T₁, T₂, T₃, ..., T_m) will belong, and passes the predicted clusters (C₁, C₂, C₃, ..., C_m) to the word prediction unit 233.

The word prediction unit 233 predicts each target word (T₁, T₂, T₃, ..., T_m) using the latent variables (Z₁, Z₂, Z₃, ..., Z_m) calculated by the latent variable calculation module 215 of the encoder 210 and the clusters (C₁, C₂, C₃, ..., C_m) predicted by the cluster prediction unit 231. Here, each target word (T₁, T₂, T₃, ..., T_m) is predicted from among the words belonging to its predicted cluster (C₁, C₂, C₃, ..., C_m).

The target words (T₁, T₂, T₃, ..., T_m) predicted by the word prediction unit 233 are processed by the output unit 300 and provided as the output text 3.

So far, a method of generating text using the detailed components of the text generation apparatus 10 according to an embodiment of the present invention has been described with reference to FIGS. 8 and 9. Hereinafter, exemplary application fields to which some embodiments of the present invention may be applied will be described with reference to FIG. 10.

Reference numeral 901 of FIG. 10 denotes exemplary input and output of lyric generation software to which the text generation method according to some embodiments of the present invention may be applied. Some embodiments of the present invention may be applied to artificial intelligence lyric generation software that, given the first bar or the title of a song as input, automatically generates the rest of the lyrics. In this case, the text generation model may be trained using text data that collects the lyrics of existing songs. A text generation model trained on the lyrics of existing songs can create lyrics that read as if written by a person. Here, by uniformly clustering the learning target lyrics (corpus) according to embodiments of the present invention, the text generation model can generate varied lyrics that are not biased toward the particular words frequently used in the lyrics of existing songs.

Reference numeral 903 of FIG. 10 denotes exemplary input and output of fictional news article generation software to which the text generation method according to some embodiments of the present invention may be applied. Some embodiments of the present invention may be applied to news article generation software that, given the title or first paragraph of a news article as input, automatically generates the rest of a fictional news article. In this case, the text generation model may be trained using text data that collects existing news articles, and the text generation model can automatically generate a fictional news article that reads as if written by a person.

Reference numeral 905 of FIG. 10 denotes exemplary input and output of an unattended conversation system to which the text generation method according to some embodiments of the present invention may be applied. Some embodiments of the present invention may be applied to an artificial intelligence speaker, a chatbot, or the like that can provide conversation as natural as talking with a person. In this case, the text generation model may be trained using corpus data composed of everyday questions and answers between people. Here, by uniformly clustering the learning target corpus according to embodiments of the present invention, it is possible to provide conversation with increased diversity that is not biased toward particular expressions such as "That's right" or "I don't know".

So far, exemplary application fields to which some embodiments of the present invention may be applied have been described with reference to FIG. 10.

In addition to the application fields shown in FIG. 10, embodiments of the present invention may be applied to any text generation field in which training data exists (for example, document summarization and machine translation).

Furthermore, embodiments of the present invention are not limited to generating text sequences composed of words, and may also be applied to methods and apparatuses for generating various other types of sequences. In other words, the present invention may be used to generate other types of sequences that, like text, are expressed as an ordered series of data items or as a series of data items that can be arranged linearly. A person skilled in the art will be able to apply the technical idea of the present invention to the generation of other types of sequences by referring to the embodiments of the text generation method described in this specification.

For example, embodiments of the present invention may be applied to an automatic composition method using an artificial neural network, to increase the diversity of the structure and progression of the music that the artificial neural network generates automatically. In this case, a sequence generation model may be trained using a collection of data representing the chord progressions of existing songs, and subsequent chords may be generated automatically by inputting a few chords into the trained sequence generation model. Alternatively, a sequence generation model may be trained using a collection of data representing the melodic progressions of existing songs, and the melody of the rest of a piece may be completed automatically by inputting the melody of the first few bars of the piece into the trained sequence generation model.

So far, the text generation method and apparatus according to some embodiments of the present invention, and their application fields, have been described with reference to FIGS. 1 to 10. Hereinafter, an exemplary computing device 1500 capable of implementing the text generation apparatus 10 according to some embodiments of the present invention will be described.

FIG. 11 is a hardware configuration diagram illustrating an exemplary computing device 1500 capable of implementing the text generation apparatus 10 according to some embodiments of the present invention.

As shown in FIG. 11, the computing device 1500 may include one or more processors 1510, a bus 1550, a communication interface 1570, a memory 1530 into which a computer program 1591 executed by the processor 1510 is loaded, and a storage 1590 that stores the computer program 1591. However, FIG. 11 shows only the components related to the embodiments of the present invention. Accordingly, a person of ordinary skill in the art to which the present invention pertains will understand that other general-purpose components may be included in addition to the components shown in FIG. 11.

The processor 1510 controls the overall operation of each component of the computing device 1500. The processor 1510 may include a CPU (Central Processing Unit), an MPU (Micro Processor Unit), an MCU (Micro Controller Unit), a GPU (Graphic Processing Unit), or any type of processor well known in the technical field of the present invention. In addition, the processor 1510 may perform operations for at least one application or program for executing the methods according to embodiments of the present invention. The computing device 1500 may include one or more processors.

The memory 1530 stores various data, commands, and/or information. The memory 1530 may load one or more programs 1591 from the storage 1590 in order to execute the text generation method according to embodiments of the present invention. For example, when the computer program 1591 is loaded into the memory 1530, modules such as those shown in FIG. 2 may be implemented on the memory 1530. The memory 1530 may be implemented as a volatile memory such as RAM, but the technical scope of the present invention is not limited thereto.

The bus 1550 provides a communication function between the components of the computing device 1500. The bus 1550 may be implemented as various types of buses, such as an address bus, a data bus, and a control bus.

The communication interface 1570 supports wired and wireless Internet communication of the computing device 1500. The communication interface 1570 may also support various communication methods other than Internet communication. To this end, the communication interface 1570 may include a communication module well known in the technical field of the present invention.

According to some embodiments, the communication interface 1570 may be omitted.

The storage 1590 may non-transitorily store the one or more programs 1591 and various data. For example, when the text generation apparatus 10 is implemented through the computing device 1500, the various data may include data managed by the storage unit 400.

The storage 1590 may include a non-volatile memory such as ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), or flash memory, a hard disk, a removable disk, or any form of computer-readable recording medium well known in the technical field to which the present invention pertains.

The computer program 1591 may include one or more instructions that, when loaded into the memory 1530, cause the processor 1510 to perform methods and operations according to various embodiments of the present invention. That is, the processor 1510 may perform the methods and operations according to various embodiments of the present invention by executing the one or more instructions.

In this case, the text generation apparatus 10 according to some embodiments of the present invention may be implemented through the computing device 1500.

So far, various embodiments of the present invention and the effects of those embodiments have been described with reference to FIGS. 1 to 11. The effects of the technical idea of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

The technical idea of the present invention described so far with reference to FIGS. 1 to 11 may be implemented as computer-readable code on a computer-readable medium. The computer-readable recording medium may be, for example, a removable recording medium (CD, DVD, Blu-ray disc, USB storage device, removable hard disk) or a fixed recording medium (ROM, RAM, hard disk built into a computer). The computer program recorded on the computer-readable recording medium may be transmitted to another computing device through a network such as the Internet and installed on that computing device, so that it can be used on that computing device.

Although all the components constituting the embodiments of the present invention have been described above as being combined into one or as operating in combination, the technical idea of the present invention is not necessarily limited to these embodiments. That is, within the scope of the object of the present invention, one or more of the components may be selectively combined and operated.

Although operations are shown in a particular order in the drawings, this should not be understood to mean that the operations must be performed in the particular order shown or in sequential order, or that all of the illustrated operations must be performed, in order to obtain a desired result. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of the various components in the embodiments described above should not be understood as requiring such separation, and it should be understood that the described program components and systems may generally be integrated together into a single software product or packaged into multiple software products.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, a person of ordinary skill in the art to which the present invention pertains will understand that the present invention may be practiced in other specific forms without changing its technical idea or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. The scope of protection of the present invention should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be interpreted as falling within the scope of rights of the technical idea defined by the present invention.

Claims

A method for generating output text from input text using a text generation model, the method being performed by a computing device and comprising:
inputting a corpus divided into a plurality of clusters into the text generation model, and predicting, by using an output of the text generation model, a target cluster to which a first target token to be included in the output text belongs; and
predicting the first target token by using the target cluster,
wherein, in the plurality of clusters, the distribution of the relative frequencies of the tokens included in each cluster is more uniform than the distribution of the relative frequencies of the tokens included in the entire learning target corpus.

The method of claim 1, further comprising:
predicting a second target token by inputting the predicted first target token into the text generation model; and
providing output text comprising the predicted first target token and the predicted second target token.

The method of claim 1, wherein
the input text is the title or first paragraph of a news article, and
the output text is a fictional news article.

The method of claim 1, wherein
the input text is the title or first bar of a song, and
the output text is fictional song lyrics.

The method of claim 1, wherein
the input text is the introductory part of a novel, and
the output text is the rest of the novel.

The method of claim 1, wherein
the target token is the token that immediately follows the input text in the sentence to which the input text, selected from the corpus, belongs.

The method of claim 1, wherein
the text generation model is trained to predict the target cluster to which the next word of the sentence containing the input text belongs, and to maximize the probability of predicting the target token within the predicted cluster.