KR20230146398A

KR20230146398A - Sequence text summary processing device using BART model and control method thereof

Info

Publication number: KR20230146398A
Application number: KR1020220045399A
Authority: KR
Inventors: 김태균; 이정하; 김지현; 김우주; 조정제
Original assignee: 주식회사 엘지유플러스
Priority date: 2022-04-12
Filing date: 2022-04-12
Publication date: 2023-10-19

Abstract

The present invention relates to a sequential text summary processing device using a BART model and a control method thereof. The sequential text summary processing device according to the present invention, which uses a BART model including an encoder and a decoder, comprises: a document topic embedding calculation unit that inputs sequential text into a topic model to generate topic modeling data and calculates a document topic embedding value for the generated topic modeling data; and a topic attention layer disposed between the encoder of the BART model and an input layer that transmits an input value to the encoder, wherein the topic attention layer receives the document topic embedding value calculated by the document topic embedding calculation unit and performs cross attention processing based on the received document topic embedding value and the input value input from the input layer.

Description

Sequential text summary processing device and control method using BART model {SEQUENCE TEXT SUMMARY PROCESSING DEVICE USING BART MODEL AND CONTROL METHOD THEREOF}

The present invention relates to a sequential text summary processing device and a control method thereof, and more specifically, to a sequential text summary processing device using a BART model and a control method thereof.

Deep learning, recently introduced as one approach to machine learning, has shown outstanding results, and attempts are being made to apply artificial intelligence in various fields.

In particular, natural language processing is a meaningful and important field in that it enables machines to understand human language.

Natural language refers to the language used in daily life, and natural language processing is a general term for analyzing the meaning of natural language so that a computer can process it.

Natural language processing can be used in countless fields, including speech recognition, content summarization, translation, user sentiment analysis, text classification (spam email classification, news article category classification), question answering systems, and chatbots.

Various models for natural language processing have been proposed, a representative one being the seq2seq (sequence-to-sequence) model using a recurrent neural network (RNN).

The seq2seq model converts one sentence (sequence) into another sentence (sequence), and internally consists of an encoder and a decoder.

Here, the encoder encodes the input data and the decoder decodes the encoded data. For example, as a given sentence passes through the encoder, a context vector containing condensed information about the meaning of the sentence is created, and as this context vector passes through the decoder, an output sentence corresponding to the given sentence (for example, a translated sentence) is produced.

In this seq2seq model, however, the information that the encoder passes to the decoder is a fixed-length vector. 'Fixed length' means that a sentence is always converted into a vector of the same length no matter how long it is. As a result, all of the necessary information cannot be fully captured in a fixed vector of limited length; that is, necessary information is not properly included in the context vector. In other words, the longer the input sentence, the less effective the learning.

To solve this, the attention mechanism was introduced. With attention, at every time step at which the decoder predicts an output word, the decoder refers once more to the entire input sentence in the encoder.

At this time, rather than attending to all parts of the input sentence equally, greater weight is given to the input words that are related to the word to be predicted at that time step.

Accordingly, even if the input sentence becomes long, the condensed meaning of the input sentence can be delivered to the output stage, which resolves the problem of information loss.
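The weighting described above can be illustrated with a minimal dot-product attention sketch in plain Python. The function name `attend` and the toy vectors are assumptions made for illustration and do not come from the patent; real models also apply learned projections, which are omitted here.

```python
import math


def softmax(scores):
    """Normalize scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoder time step.

    Each encoder state is scored by its dot product with the decoder state,
    so input positions that resemble the current decoder state get more weight.
    """
    scores = [sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context


# Toy example: three input positions with 2-dimensional states (made-up values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
```

In this toy case the first and third input positions score higher than the second, so they contribute more to the context vector used for the next prediction.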

However, even with the attention mechanism applied to the seq2seq model, natural language processing performance did not reach a satisfactory level, and a new model called the 'Transformer' was therefore proposed.

The Transformer is a model from the paper "Attention is all you need" published by Google in 2017. It follows the encoder-decoder structure of the existing seq2seq model but, as the title of the paper suggests, is implemented with attention alone.

Although this model builds the encoder-decoder structure without using an RNN, it outperformed RNNs in translation.

Various models that build on and extend this Transformer structure include BERT (Bidirectional Encoder Representations from Transformers), GPT (Generative Pre-trained Transformer), and BART (Bidirectional and Auto-Regressive Transformers).

Here, the BERT model is a pre-trained model released by Google in 2018, and its basic structure is a stack of Transformer encoders.

The GPT model uses only the decoder structure of the Transformer, excluding the encoder. In contrast to the BERT model, whose strength lies in extracting the meaning of sentences, GPT is strong at sentence generation.

The BART model has a form similar to BERT and GPT combined into one: it is a model in which the existing sequence-to-sequence Transformer is trained with a new pre-training objective.

Use of the BART model has brought considerable progress in natural language processing across various fields, but its performance in generating summaries of sequential text remains somewhat inadequate.

Here, sequential text is text for which analysis of the time-series context is essential, such as human conversation or consultation. Generating a summary from such sequential text is very important for manual or automatic analysis of conversation or consultation content.

Therefore, there is a demand for a modified model that uses the BART model, which has recently stood out in natural language processing, while improving summary generation performance, or for a device that includes such a modified model.

Registered Patent No. 10-2256007

The present invention was devised to meet the above demand, and its purpose is to provide a sequential text summary processing device, and a control method thereof, with improved automatic summarization performance for sequential text containing content such as conversation or consultation.

To achieve the above object, a sequential text summary processing device according to the present invention, which uses a BART model including an encoder and a decoder, may include: a document topic embedding calculation unit that inputs sequential text into a topic model to generate topic modeling data and calculates a document topic embedding value for the generated topic modeling data; and a topic attention layer disposed between the encoder of the BART model and an input layer that transmits an input value to the encoder, wherein the topic attention layer receives the document topic embedding value calculated by the document topic embedding calculation unit and performs cross attention processing based on the received document topic embedding value and the input value input from the input layer.

Also, to achieve the above object, a control method of a sequential text summary processing device according to the present invention, which uses a BART model including an encoder and a decoder, may include: inputting sequential text into a topic model to generate topic modeling data and calculating a document topic embedding value for the generated topic modeling data; disposing a topic attention layer between the encoder of the BART model and an input layer that transmits an input value to the encoder; and delivering the calculated document topic embedding value to the topic attention layer so that the topic attention layer performs cross attention processing based on the document topic embedding value and the input value input from the input layer.

As described above, according to the present invention, an improved summary can be automatically generated for sequential data.

FIG. 1 is a functional block diagram of a sequential text summary processing device according to an embodiment of the present invention;
FIGS. 2 to 5 are examples of the form of data generated at each step of the processing performed in the document topic embedding calculation unit of FIG. 1; and
FIG. 6 is a detailed functional block diagram of the topic attention layer of FIG. 1.

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings.

Each embodiment described below is only an example to aid understanding of the present invention, and the present invention is not limited to these embodiments. In particular, the present invention may be composed of a combination of at least one of the individual components, individual functions, or individual steps included in each embodiment.

In particular, for convenience, some claims include letters such as '(a)', but these letters do not define the order of the steps.

As shown in FIG. 1, the sequential text summary processing device 100 according to an embodiment of the present invention includes an encoder input unit 110, an encoder unit 120, a decoder input unit 130, a decoder unit 140, a decoder output unit 150, a document topic embedding calculation unit 160, and a topic attention layer 170.

The encoder input unit 110 receives sequential text, processes it into a form that the encoder can handle, and passes it to the encoder.

Here, the sequential text is text that contains conversation content in a given order, such as the content of a consultation phone call. In this embodiment, it may be the conversation content itself, or it may be only words of a specific part of speech (for example, nouns) extracted in order from that content.

Furthermore, the sequential text may include tokens marking the points at which each speaker speaks. That is, when two parties converse, the text may include information about the point at which each party speaks.
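As a rough illustration of such an input, the sketch below flattens dialogue turns into one sequence and marks each turn with a speaker token. The patent does not specify a token format, so the `<speaker>` strings, the function name `build_sequential_text`, and the `keep_word` predicate (a stand-in for a real part-of-speech filter such as noun extraction) are all hypothetical.

```python
def build_sequential_text(turns, keep_word=None):
    """Flatten dialogue turns into one sequence with speaker tokens.

    turns: list of (speaker, utterance) pairs in conversation order.
    keep_word: optional predicate standing in for a part-of-speech filter
               (for example, "is this word a noun?"); hypothetical here.
    """
    pieces = []
    for speaker, utterance in turns:
        words = utterance.split()
        if keep_word is not None:
            words = [w for w in words if keep_word(w)]
        # The "<speaker>" token format is an assumption for illustration.
        pieces.append(f"<{speaker}>")
        pieces.extend(words)
    return " ".join(pieces)


turns = [
    ("customer", "my internet bill looks wrong"),
    ("agent", "let me check the bill"),
]
text = build_sequential_text(turns)

# Keeping only words from a toy noun list mimics part-of-speech extraction.
nouns = {"internet", "bill"}
nouns_only = build_sequential_text(turns, keep_word=lambda w: w in nouns)
```

Because the turns are processed in order, the resulting sequence preserves both who spoke and when, which is the information the speaker tokens are meant to carry.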

In particular, the encoder input unit 110 may perform embedding processing on the input sequential text.

In natural language processing, embedding means representing each word as a vector, and such a vector may consist of numbers in tens to thousands of dimensions.

Furthermore, the encoder input unit 110 may reflect information about the position of each word in the sequential text before passing the data to the encoder unit 120. Reflecting word position information in this way is called positional encoding.
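One well-known way to reflect position is the sinusoidal scheme of the original Transformer, sketched below. This is an illustrative assumption: the patent does not state which positional scheme is used, and BART implementations commonly use learned position embeddings instead.

```python
import math


def positional_encoding(num_positions, dim):
    """Sinusoidal position vectors: sin on even indices, cos on odd indices."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** ((2 * (i // 2)) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positions(embeddings):
    """Add the position vector to each token embedding, element-wise."""
    pe = positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(emb, pos)] for emb, pos in zip(embeddings, pe)]


# The same word embedding at two positions (made-up values).
token_embeddings = [[0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.4]]
with_positions = add_positions(token_embeddings)
```

After the addition, the two identical word embeddings differ, so later layers can distinguish the first occurrence of a word from the second.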

Meanwhile, the document topic embedding calculation unit 160 includes a topic model. It inputs the sequential text into the topic model to generate topic modeling data, calculates a document topic embedding value for the generated topic modeling data, and passes that value to the topic attention layer 170 described later.

Here, a topic model performs statistical processing to discover the abstract "topics" of a document collection in the fields of machine learning and natural language processing. Examples include latent semantic analysis (LSA), probabilistic latent semantic indexing (PLSI), and latent Dirichlet allocation (LDA).

In this embodiment, however, the topic model is a latent Dirichlet allocation model as an example.

The technique of generating topic modeling data by applying sequential text to a latent Dirichlet allocation model is itself known, so a more detailed description is omitted.

That is, as shown in FIG. 2, per-document topic proportion data (FIG. 2(a)) and per-topic word distribution data (FIG. 2(b)) can be extracted from given data by the latent Dirichlet allocation model, and this processing algorithm is a known technique.

In calculating the document topic embedding value, the document topic embedding calculation unit 160 may consider both the data generated by the latent Dirichlet allocation model described above (that is, the per-document topic proportion data and the per-topic word distribution data) and word embedding data based on the sequential text.

Here, word embedding means representing each word as a dense vector. Because this dense vector is the result of the word embedding process, it is also called an embedding vector.

Word embedding methods include LSA, Word2Vec, FastText, and GloVe; the present invention uses FastText as an example.

FastText was developed by Facebook, and its mechanism can be regarded as an extension of Word2Vec. Whereas Word2Vec treats a word as an indivisible unit, FastText regards a single word as containing several internal words, that is, subwords, and takes them into account during training.
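The subword idea can be sketched as character n-gram extraction. The function name `char_ngrams` is an assumption for illustration; the boundary markers `<` and `>` and the 3 to 6 n-gram range follow the description in the FastText paper, and FastText additionally keeps the whole wrapped word as its own unit, which this sketch leaves out.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of a word wrapped in boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


trigrams = char_ngrams("where", min_n=3, max_n=3)
# trigrams == ["<wh", "whe", "her", "ere", "re>"]
```

A word vector is then built from the vectors of its n-grams, which lets rare or unseen words share information with words that contain the same fragments.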

As a specific example of calculating the document topic embedding value, the document topic embedding calculation unit 160 takes the topic modeling data generated with the latent Dirichlet allocation model and adjusts the per-topic word distribution data using the per-document topic proportion data as weights (for example, by multiplying each document's topic proportion value by each topic's word proportion value) (FIG. 3). It then calculates the document topic embedding value by taking the vector product of the adjusted per-topic word distribution data and the word embedding data generated by FastText (FIG. 4).

Here, the vector product may correspond to multiplication between the matrices.
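The two steps above can be sketched for a single document as follows. The matrix shapes are an assumed reading of the description (topics by vocabulary for the word distributions, vocabulary by embedding dimension for the word embeddings), and the function names and toy values are illustrative rather than taken from the patent.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def document_topic_embedding(doc_topic_row, topic_word, word_embeddings):
    """Compute one document's topic embeddings (num_topics x embed_dim).

    doc_topic_row:   topic proportions of the document, length num_topics
    topic_word:      word distribution per topic, num_topics x vocab_size
    word_embeddings: one vector per vocabulary word, vocab_size x embed_dim
    """
    # Step 1: scale each topic's word distribution by that topic's share in the document.
    adjusted = [[weight * p for p in row] for weight, row in zip(doc_topic_row, topic_word)]
    # Step 2: matrix-multiply the adjusted distributions with the word embeddings.
    return matmul(adjusted, word_embeddings)


# Toy values: 2 topics, 3 vocabulary words, 2-dimensional word embeddings.
doc_topic_row = [0.75, 0.25]
topic_word = [[0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0]]
word_embeddings = [[1.0, 0.0],
                   [0.0, 1.0],
                   [2.0, 2.0]]
embedding = document_topic_embedding(doc_topic_row, topic_word, word_embeddings)
```

Under this reading, each row of the result is an embedding of one topic, weighted by how much that topic matters in the document.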

An example of the document topic embedding finally generated through this process is shown in FIG. 5.

The topic attention layer 170 processes the document topic embedding value received from the document topic embedding calculation unit 160 together with the input value input from the input layer.

Specifically, the topic attention layer 170 performs cross attention processing using the document topic embedding value received from the document topic embedding calculation unit 160 and the input value input from the input layer, then performs residual connection and layer normalization (also known as "Add & Norm"), passes the result through a feed-forward neural network, and performs residual connection and layer normalization once more.

The encoder unit 120 processes the input data received from the topic attention layer 170 and generates information that condenses the meaning of the sequential text.

To this end, the encoder unit 120 may include at least one each of a multi-head attention module, a residual connection and layer normalization (Add & Norm) module, and a feed forward module.

A plurality of such encoder units 120 with the same structure may be provided.

The decoder input unit 130 receives data and, like the encoder input unit 110 described above, performs embedding processing and positional encoding processing, and transmits the processed data to the decoder unit 140.

The data input to the decoder input unit 130 may be a document summary in the machine learning (training) stage, and may be control text that controls the start of decoding in the stage of generating a summary for actual new sequential text.

The decoder unit 140 processes the data input from the decoder input unit 130 together with the data input from the encoder, and generates summary data corresponding to the sequential text.

To this end, the decoder unit 140 may include at least one each of a masked multi-head attention module, a multi-head attention module, a residual connection and layer normalization (Add & Norm) module, and a feed forward module.

A plurality of such decoder units 140 with the same structure may be provided.

The decoder output unit 150 processes the data generated by the decoder unit 140 so that it takes a desired form or falls within a desired range.

For example, the decoder output unit 150 may include a dense layer or a linear layer, and may also apply an activation function such as softmax so that the input values are all normalized to output values between 0 and 1.
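A minimal sketch of such an output stage is shown below: a linear layer produces one score per vocabulary entry, and softmax turns the scores into values between 0 and 1 that sum to 1. The vocabulary, weights, and hidden vector are made-up values for illustration, not parameters of the described device.

```python
import math


def linear(x, weights, bias):
    """Dense layer: one output score (logit) per vocabulary entry."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Map logits to probabilities between 0 and 1 that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


vocab = ["bill", "refund", "router"]   # hypothetical output vocabulary
hidden = [1.0, -1.0]                   # decoder output for one position
weights = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.0, 0.0, 0.0]

probs = softmax(linear(hidden, weights, bias))
predicted = vocab[probs.index(max(probs))]
```

Picking the highest-probability entry, as done here, corresponds to greedy decoding; other decoding strategies would use the same probabilities differently.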

As described above, the sequential text summary processing device 100 according to an embodiment of the present invention includes several functional units. Of these, the structure and inherent functions of the encoder input unit 110, the encoder unit 120, the decoder input unit 130, the decoder unit 140, and the decoder output unit 150 are the same as in the conventional BART model.

The present invention differs from the prior art in that the document topic embedding calculation unit 160 is additionally provided and the topic attention layer 170 is disposed between the encoder input unit 110 and the encoder unit 120.

That is, conventionally there was no separate cross attention function on the encoder unit 120 side, whereas in the present invention the topic attention layer 170 is additionally disposed so that cross attention processing is performed on the document topic embedding value input from the document topic embedding calculation unit 160 and the input value input from the input layer.

The specific structure of the topic attention layer 170 is shown in FIG. 6.

As shown in the figure, the topic attention layer 170 may consist of, from the bottom, a cross attention processing unit 171, a residual connection and layer normalization processing unit 172, a feed forward processing unit 173, and a residual connection and layer normalization processing unit 174 arranged in sequence.
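The order of units 172 to 174 can be sketched for a single token as below, taking the output of the cross attention processing unit 171 as a given input. This is a simplified sketch under stated assumptions: layer normalization has no learned scale or shift, the feed-forward network uses ReLU with biases omitted (the patent does not name the activation), and all weights are toy values.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add_and_norm(x, sublayer_out):
    """Residual connection followed by layer normalization."""
    return layer_norm([a + b for a, b in zip(x, sublayer_out)])


def feed_forward(x, w1, w2):
    """Position-wise feed-forward network with a ReLU in between (biases omitted)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def topic_attention_block(x, cross_attention_out, w1, w2):
    """Apply units 172 to 174 to one token, given the output of unit 171."""
    h = add_and_norm(x, cross_attention_out)            # unit 172
    return add_and_norm(h, feed_forward(h, w1, w2))     # units 173 and 174


x = [1.0, 2.0, 3.0, 4.0]                     # token embedding from the input layer
cross_attention_out = [0.5, 0.5, 0.5, 0.5]   # placeholder for the cross-attention result
w1 = [[0.1] * 4 for _ in range(8)]           # 4 -> 8, toy weights
w2 = [[0.1] * 8 for _ in range(4)]           # 8 -> 4, toy weights
out = topic_attention_block(x, cross_attention_out, w1, w2)
```

The output has the same dimension as the input token embedding, which is what allows the layer to sit between the encoder input unit 110 and the encoder unit 120 without changing either.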

Here, the residual connection and layer normalization processing units and the feed forward processing unit perform the same functions as those included in the encoder unit 120 and the decoder unit 140, and the cross attention processing unit may perform processing according to the following formula.

Here, Query is the embedding data received from the input unit, Key is the document topic embedding value (token embedding value) input from the document topic embedding calculation unit, Value is the document topic embedding value input from the document topic embedding calculation unit, and the remaining symbol in the formula denotes the dimension of the token and document topic embeddings.
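The formula itself does not survive in this text. Given the definitions of Query, Key, Value, and an embedding dimension, one plausible reading is the standard scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, which the sketch below implements. This is an assumption, not the patent's confirmed formula; the symbol `d`, the function names, and the toy values are illustrative, and learned projection matrices are omitted.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d)) V, computed row by row.

    queries: token embeddings from the input layer, num_tokens x d
    keys, values: document topic embeddings, num_topics x d
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


token_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 tokens (made-up values)
topic_embeddings = [[4.0, 0.0], [0.0, 4.0]]                # 2 topics (made-up values)
attended = cross_attention(token_embeddings, topic_embeddings, topic_embeddings)
```

Each token thus receives a mixture of topic embeddings, weighted by how closely the token matches each topic; the third token matches both topics equally and receives their average.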

When the results of the present invention and those of the conventional BART model were evaluated using the ROUGE score, a representative performance metric for summarization models, the ROUGE score of the present invention was confirmed to be 10% to 15% higher than that of the conventional model.
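For orientation, a minimal ROUGE-1 computation is sketched below. The patent does not state which ROUGE variant or tokenization was used, so unigram overlap with whitespace tokenization is an assumption, and the example sentences are made up; they are not data from the experiment.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Return (precision, recall, f1) based on clipped unigram overlap."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum((cand & ref).values())  # each word counted at most as often as in both
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


reference = "customer asked for a refund of the bill"
candidate = "customer asked for a bill refund"
precision, recall, f1 = rouge_1(candidate, reference)
```

Here every candidate word appears in the reference, so precision is 1.0, while recall is 0.75 because two of the eight reference words are missing from the candidate.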

The experiment was conducted on 12,000 customer-center conversation records, and the reference summaries were labeled directly by counselors.

Meanwhile, the process of carrying out each of the embodiments described above can of course be performed by a program or application stored in a recording medium (for example, a computer-readable one). Here, recording media include electronic recording media such as RAM (Random Access Memory), magnetic recording media such as hard disks, and optical recording media such as CDs (Compact Disks).

The program stored in the recording medium can be executed on hardware such as a computer or smartphone to carry out each of the embodiments described above. In particular, at least one of the functional blocks of the sequential text summary processing device according to the present invention may be implemented by such a program or application.

In addition, the present invention is not limited to the specific embodiments described above and can be implemented with various changes and modifications without departing from the gist of the present invention. Such changes and modifications are included in the present invention insofar as they fall within the scope of the appended claims.

100: Sequential text summary processing device 110: Encoder input unit
120: Encoder unit 130: Decoder input unit
140: Decoder unit 150: Decoder output unit
160: Document topic embedding calculation unit 170: Topic attention layer

Claims

A sequential text summary processing device using a BART model including an encoder and a decoder, the device comprising:
a document topic embedding calculation unit that inputs sequential text into a topic model to generate topic modeling data, and calculates a document topic embedding value for the generated topic modeling data; and
a topic attention layer disposed between the encoder of the BART model and an input layer that transmits an input value to the encoder,
wherein the topic attention layer receives the document topic embedding value calculated by the document topic embedding calculation unit, and performs cross attention processing based on the received document topic embedding value and the input value input from the input layer.

The sequential text summary processing device according to claim 1,
wherein the topic model is a Latent Dirichlet Allocation (LDA) model.

The sequential text summary processing device according to claim 2,
wherein the document topic embedding calculation unit adjusts the distribution data of words for each topic, using the topic proportion data for each document as a weight, from among the topic modeling data generated using the Latent Dirichlet Allocation model; performs word embedding based on the sequential text; and calculates the document topic embedding value by vector-multiplying the adjusted word distribution data for each topic and the generated word embedding data.
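
As an illustration of the calculation recited above, the following minimal Python sketch assumes that the topic proportions, the per-topic word distributions, and the word embeddings are already available as plain lists, and that "vector multiplying" denotes a matrix product of the weighted distributions with the word embeddings. The function name `document_topic_embedding` and the variables `theta`, `beta`, and `word_vectors` are hypothetical and do not come from the specification.

```python
def document_topic_embedding(theta, beta, word_vectors):
    """Return one embedding per topic for a single document.

    theta        : K topic proportions for the document (hypothetical name)
    beta         : K x V word distribution for each topic (hypothetical name)
    word_vectors : V x D word embeddings, e.g. produced by fastText
    """
    num_topics = len(beta)
    vocab_size = len(word_vectors)
    dim = len(word_vectors[0])
    # Weight each topic's word distribution by the document's proportion
    # of that topic.
    adjusted = [[theta[k] * beta[k][v] for v in range(vocab_size)]
                for k in range(num_topics)]
    # Assumed reading of "vector multiplying": (K x V) times (V x D),
    # giving a K x D document topic embedding.
    return [[sum(adjusted[k][v] * word_vectors[v][d] for v in range(vocab_size))
             for d in range(dim)]
            for k in range(num_topics)]


# Toy data: 2 topics, 3 vocabulary words, 2-dimensional word embeddings.
theta = [0.75, 0.25]
beta = [[0.5, 0.5, 0.0],
        [0.0, 0.0, 1.0]]
word_vectors = [[1.0, 0.0],
                [0.0, 1.0],
                [2.0, 2.0]]
embedding = document_topic_embedding(theta, beta, word_vectors)
```

In this toy example each row of `embedding` corresponds to one topic, so the result has one vector per topic with the same dimension as the word embeddings.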

The sequential text summary processing device according to claim 3,
wherein the word embedding processing uses the fastText method.

The sequential text summary processing device according to claim 1,
wherein the sequential text is extracted from conversation text based on a specific part of speech, and includes a speaker's utterance point token.
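
A rough sketch of how such sequential text might be assembled is given below. It assumes the conversation has already been split into utterances and tagged with parts of speech by some external tagger, and that the utterance point token is a marker placed where each speaker's utterance begins. The function `build_sequential_text`, the tag set, the default of keeping nouns, and the `<turn>` marker are all hypothetical choices made for illustration only.

```python
def build_sequential_text(turns, keep_pos=("NOUN",), turn_token="<turn>"):
    """Flatten tagged dialogue turns into one sequential text.

    turns      : list of utterances, each a list of (word, pos_tag) pairs
    keep_pos   : part-of-speech tags to keep (hypothetical default)
    turn_token : marker inserted where each utterance begins (hypothetical)
    """
    pieces = []
    for turn in turns:
        # Mark the point at which a speaker's utterance starts.
        pieces.append(turn_token)
        # Keep only words whose tag is in the selected set.
        pieces.extend(word for word, pos in turn if pos in keep_pos)
    return " ".join(pieces)


dialogue = [
    [("I", "PRON"), ("lost", "VERB"), ("my", "PRON"), ("phone", "NOUN")],
    [("Which", "DET"), ("plan", "NOUN"), ("do", "AUX"), ("you", "PRON"), ("use", "VERB")],
]
sequential_text = build_sequential_text(dialogue)
```

The resulting string keeps the order of the utterances, which is what makes the text "sequential" in this sketch.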

The sequential text summary processing device according to claim 1,
wherein the topic attention layer performs cross attention processing using the document topic embedding value and the input value input from the input layer, performs residual connection and layer normalization processing on the result, passes the normalized result through a feed forward neural network, and then performs residual connection and layer normalization processing again.

The sequential text summary processing device according to claim 6,
wherein the cross attention processing is performed using the following formula:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where Query ($Q$) is the embedding data received from the input unit,
Key ($K$) is the document topic embedding value (token embedding value) input from the document topic embedding calculation unit,
Value ($V$) is the document topic embedding value input from the document topic embedding calculation unit, and
$d_k$ is the dimension of the token and document topic embeddings.
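
The topic attention layer recited in the two preceding claims can be sketched as follows. This is a minimal pure-Python illustration, assuming the cross attention is standard scaled dot-product attention, softmax(QK^T / sqrt(d))V, with the token embeddings as the query and the document topic embeddings as both key and value. The ReLU activation, the omission of learned projection, gain, and bias parameters, and every function and variable name are assumptions made for this sketch and are not taken from the specification.

```python
import math


def matmul(a, b):
    # Plain list-of-lists matrix product: (n x k) times (k x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    peak = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, key, value):
    # softmax(Q K^T / sqrt(d)) V, with d the shared embedding dimension.
    scale = math.sqrt(len(query[0]))
    key_t = [list(col) for col in zip(*key)]
    scores = matmul(query, key_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, value)


def add_and_norm(x, sublayer_out, eps=1e-5):
    # Residual connection followed by layer normalization over each row
    # (learned gain and bias omitted for brevity).
    out = []
    for x_row, s_row in zip(x, sublayer_out):
        summed = [a + b for a, b in zip(x_row, s_row)]
        mean = sum(summed) / len(summed)
        var = sum((v - mean) ** 2 for v in summed) / len(summed)
        out.append([(v - mean) / math.sqrt(var + eps) for v in summed])
    return out


def feed_forward(x, w1, w2):
    # Two linear maps with a ReLU in between (the activation is an assumption).
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)


def topic_attention_layer(tokens, topic_emb, w1, w2):
    # Cross attention -> add & norm -> feed forward -> add & norm.
    attended = cross_attention(tokens, topic_emb, topic_emb)
    x = add_and_norm(tokens, attended)
    return add_and_norm(x, feed_forward(x, w1, w2))


# Toy data: 3 tokens and 2 topics, all with embedding dimension 4.
tokens = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 0.0, 3.0], [2.0, 2.0, 1.0, 0.0]]
topic_emb = [[0.5, 0.1, 0.0, 0.2], [0.0, 0.3, 0.4, 0.1]]
w1 = [[0.1 * (i + j) for j in range(8)] for i in range(4)]   # 4 x 8
w2 = [[0.05 * (i - j) for j in range(4)] for i in range(8)]  # 8 x 4
encoder_input = topic_attention_layer(tokens, topic_emb, w1, w2)
```

The output has the same shape as the token embeddings, so in this sketch it could be handed to the encoder in place of the original input values.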

A control method of a sequential text summary processing device using a BART model including an encoder and a decoder, the method comprising:
(a) inputting sequential text into a topic model to generate topic modeling data, and calculating a document topic embedding value for the generated topic modeling data;
(b) placing a topic attention layer between the encoder of the BART model and an input layer that transmits an input value to the encoder; and
(c) transmitting the document topic embedding value calculated in step (a) to the topic attention layer, so that the topic attention layer performs cross attention processing based on the document topic embedding value and the input value input from the input layer.

The control method according to claim 8,
wherein the topic model is a Latent Dirichlet Allocation (LDA) model.

The control method according to claim 9,
wherein step (a) comprises:
(a1) adjusting the distribution data of words for each topic, using the topic proportion data for each document as a weight, from among the topic modeling data generated using the Latent Dirichlet Allocation model;
(a2) performing word embedding based on the sequential text; and
(a3) calculating the document topic embedding value by vector-multiplying the word distribution data for each topic adjusted in step (a1) and the word embedding data generated in step (a2).

The control method according to claim 10,
wherein, in step (a2), the word embedding is performed using the fastText method.

The control method according to claim 8,
wherein the sequential text is extracted from conversation text based on a specific part of speech, and includes a speaker's utterance point token.

The control method according to claim 8,
wherein, in step (c), cross attention processing is performed using the document topic embedding value and the input value input from the input layer, residual connection and layer normalization processing are performed on the result, the normalized result is passed through a feed forward neural network, and residual connection and layer normalization processing are then performed again.

The control method according to claim 13,
wherein the cross attention processing is performed using the following equation:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where Query ($Q$) is the embedding data received from the input unit,
Key ($K$) is the document topic embedding value (token embedding value) input from the document topic embedding calculation unit,
Value ($V$) is the document topic embedding value input from the document topic embedding calculation unit, and
$d_k$ is the dimension of the token and document topic embeddings.

A computer-readable recording medium recording a program for executing the method of any one of claims 8 to 14.

An application program stored in a computer-readable recording medium to execute, in combination with hardware, the method of any one of claims 8 to 14.