KR102476492B1

KR102476492B1 - Deep Learning based Document Summarization Method and System

Info

Publication number: KR102476492B1
Application number: KR1020200042790A
Authority: KR
Inventors: 박소희; 김영철; 정이안
Original assignee: 에스케이 주식회사
Priority date: 2020-04-08
Filing date: 2020-04-08
Publication date: 2022-12-09
Also published as: KR20210125275A

Abstract

A deep learning-based method and system for generating a document summary are provided. A method for generating a document summary according to an embodiment of the present invention preprocesses the sentences included in a document; analyzes, based on a deep learning model, an extraction probability, which is the probability that a sentence will be extracted as an important sentence of the document; extracts important sentences based on the extraction probabilities of the analyzed sentences and the similarities between sentences; and generates a summary of the document from the extracted important sentences using deep learning. By optimally selecting important sentences based on both the probability of being extracted as an important sentence and the similarity to previously extracted important sentences, the quality of the automatic summary can be improved.

Description

Deep Learning based Document Summarization Method and System

The present invention relates to artificial intelligence technology and, more particularly, to a method and system for automatically generating a summary of an input document using deep learning.

People today live in a flood of information and inevitably encounter enormous amounts of it through various media; absorbing this information takes a great deal of time and effort.

To ease this burden, techniques for automatically summarizing documents have emerged. They condense the key content of a document and present it to the user.

One such technique summarizes documents by measuring TF-IDF similarity, but it has the problem that the summary frequently contains unnecessary information. A way to improve the quality of automatic summaries is therefore needed.

The present invention has been devised to solve the above problem. Its object is to provide a method and system for generating a document summary that improve the quality of the automatic summary by extracting an appropriate set of important sentences using deep learning and performing abstractive summarization on them.

To achieve the above object, a method for generating a document summary according to an embodiment of the present invention includes: preprocessing the sentences included in a document; analyzing, based on a deep learning model, an extraction probability, which is the probability that a sentence will be extracted as an important sentence of the document, and extracting important sentences based on the extraction probabilities of the analyzed sentences and the similarities between sentences; and generating, based on deep learning, a summary of the document using the extracted important sentences.

The preprocessing step may include: dividing the document into sentences and performing morphological analysis; and applying Byte-Pair Encoding (BPE) to the morphologically analyzed sentences.

The extraction step may include: a first acquisition step of analyzing the preprocessed sentences individually to obtain sentence representations of the sentences; a second acquisition step of analyzing each preprocessed sentence in relation to its preceding and following sentences to obtain the extraction probabilities; and a step of selecting important sentences from among the sentences using the extraction probabilities and the inter-sentence similarities calculated from the sentence representations.

The first acquisition step may obtain the sentence representations using a deep learning model pretrained for natural language processing.

The first acquisition step may also obtain, as the sentence representations, the CLS (CLaSs) tokens output when the preprocessed sentences are input to BERT (Bi-directional Encoder Representations from Transformers).

The second acquisition step may obtain the extraction probabilities of the sentences using a model composed of Bi-LSTM (Bi-directional Long Short-Term Memory) networks trained to extract important sentences from the sentences constituting a document.

The selection step may calculate scores for the sentences and select as important sentences those whose calculated score exceeds a threshold. The score may be proportional to the extraction probability of the sentence and inversely proportional to the maximum of the similarities between the sentence and the previously selected important sentences.

The selection step may calculate scores starting from the sentence with the highest extraction probability.

The selection step may also calculate the score of a sentence using the following equation,

where g_i is the score of the i-th sentence, α is a weight, h_i is the extraction probability of the i-th sentence, E_k is the sentence representation of the k-th previously selected important sentence, C_i is the sentence representation of the i-th sentence, sim() is a similarity function, and max[] denotes the maximum value.

The selection step may keep selecting important sentences as long as the number of important sentences does not exceed a predetermined number, or as long as the total length of the important sentences does not exceed a predetermined length.

The generation step may generate a summary that reflects only the important content of the extracted important sentences.

The generation step may include: combining the extracted important sentences into one and inputting them to a BERT encoder; and inputting the encoding result of the BERT encoder to a GRU (Gated Recurrent Unit) decoder through an Attention module to generate the summary.

To allow words it has never learned to be used in the summary, the GRU decoder may find the words needed for the output in the input and copy them.

The GRU decoder may reflect the cumulative value of the attention distribution of used words in the loss.

The GRU decoder may also limit the maximum and minimum length of the summary.

Meanwhile, a document summary generation system according to another embodiment of the present invention includes: a preprocessing unit that preprocesses the sentences included in a document; an extraction unit that analyzes, based on a deep learning model, an extraction probability, which is the probability that a sentence will be extracted as an important sentence of the document, and extracts important sentences based on the extraction probabilities of the analyzed sentences and the similarities between sentences; and a generation unit that generates, based on deep learning, a summary of the document using the extracted important sentences.

As described above, according to embodiments of the present invention, the quality of the automatic summary can be improved by optimally selecting important sentences based on the probability of being extracted as an important sentence and the similarity to previously extracted important sentences.

In addition, according to embodiments of the present invention, the extracted important sentences are not used verbatim but are summarized abstractively, so that only the important content, with unnecessary content removed, is reflected in the summary, yielding a highly readable, high-quality summary.

FIG. 1 is a block diagram of a document summary generation system according to an embodiment of the present invention;
FIG. 2 is a block diagram showing the detailed structure of the document preprocessing unit shown in FIG. 1;
FIG. 3 is a block diagram showing the detailed structure of the important sentence extraction unit shown in FIG. 1;
FIG. 4 is a block diagram showing the detailed structure of the document summary generation unit shown in FIG. 1; and
FIG. 5 is a flowchart provided to explain a method for generating a document summary according to another embodiment of the present invention.

Hereinafter, the present invention is described in more detail with reference to the drawings.

FIG. 1 is a block diagram of a document summary generation system according to an embodiment of the present invention. The document summary generation system according to an embodiment of the present invention uses deep learning to extract important sentences from a document and to generate a summary from the extracted important sentences.

As shown in FIG. 1, the document summary generation system according to an embodiment of the present invention includes a document preprocessing unit 110, an important sentence extraction unit 120, and a document summary generation unit 130.

The document preprocessing unit 110 is a means for preprocessing the sentences included in an input document and, as shown in FIG. 2, includes a morphological analysis module 111 and a BPE (Byte-Pair Encoding) module 112. FIG. 2 is a block diagram showing the detailed structure of the document preprocessing unit 110.

The morphological analysis module 111 divides the input document into sentences, performs morphological analysis and part-of-speech tagging on each sentence, and passes the result to the BPE module 112. This allows the deep learning stages described below to understand the meaning of each sentence more accurately.

The BPE module 112 applies BPE to the sentences analyzed by the morphological analysis module 111, in order to reduce the frequency of OOV (Out Of Vocabulary) occurrences.
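How BPE reduces OOV occurrences can be illustrated with a small sketch. The patent does not specify the BPE variant or vocabulary size; the code below assumes the common character-level merge-learning procedure, and `merge_pair`, `learn_bpe`, and `apply_bpe` are hypothetical helper names introduced only for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn merge rules from a toy corpus (assumed procedure, not taken from the patent)."""
    vocab = Counter(tuple(w) for w in words)  # each word starts as a sequence of characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Most frequent pair; ties are broken by the pair itself so the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Segment a word, possibly unseen during learning, with the learned merges."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(apply_bpe("lowly", merges))
```

A word that never appeared in the toy corpus is still segmented into known subword units rather than being mapped to a single unknown token, which is the property the BPE module 112 relies on.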

The description now returns to FIG. 1.

The important sentence extraction unit 120 extracts important sentences from the sentences included in the document based on deep learning and MMR (Maximal Marginal Relevance). Important sentence extraction is performed on the basis of 'the probability that a sentence will be extracted as an important sentence of the document' (hereinafter, 'extraction probability') and the similarity to other important sentences.

FIG. 3 is a block diagram showing the detailed structure of the important sentence extraction unit 120. As shown, the important sentence extraction unit 120 includes BERTs (Bi-directional Encoder Representations from Transformers) 121-1, 121-2, and 121-3, Bi-LSTM (Bi-directional Long Short-Term Memory) networks 122-1, 122-2, and 122-3, and an important sentence selection unit 123.

The BERTs 121-1, 121-2, and 121-3 are modules that receive the sentences preprocessed by the document preprocessing unit 110 and that bidirectionally pretrain language representations for natural language processing. In this embodiment, the CLS (CLaSs) token generated by the BERTs 121-1, 121-2, and 121-3 is used as the sentence representation, which can be expressed as follows.

$C_i = \mathrm{BERT}(S_i)$

Here, S_i is the i-th sentence of the document, and C_i is the CLS token of S_i generated by BERT.
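The idea of using the CLS position as a sentence representation can be sketched without a real BERT model. In the toy below, `toy_encoder` is a hypothetical stand-in that only mimics the fact that every output position mixes in information from the whole sentence; it is not BERT, and `embed` and `sentence_representation` are likewise illustrative names.

```python
def embed(token, dim=4):
    """Deterministic toy embedding (assumption; a real model uses learned embeddings)."""
    code = sum(ord(ch) for ch in token)
    return [((code * (j + 1)) % 17) / 17.0 for j in range(dim)]


def toy_encoder(tokens, dim=4):
    """Stand-in for BERT: each output vector mixes the token's own embedding
    with the mean embedding of the whole sequence."""
    vectors = [embed(t, dim) for t in tokens]
    mean = [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]
    return [[0.5 * v[j] + 0.5 * mean[j] for j in range(dim)] for v in vectors]


def sentence_representation(sentence_tokens, dim=4):
    """C_i = encoder output at the [CLS] position (index 0) for sentence S_i."""
    outputs = toy_encoder(["[CLS]"] + list(sentence_tokens), dim)
    return outputs[0]


c1 = sentence_representation(["the", "cat"])
c2 = sentence_representation(["a", "dog"])
print(c1 != c2)  # the same [CLS] input yields sentence-specific outputs
```

Although the [CLS] input is identical for every sentence, its output differs from sentence to sentence because it absorbs context, which is why it can serve as C_i.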

The Bi-LSTM networks 122-1, 122-2, and 122-3 are networks trained to extract important sentences from the sentences constituting a document, and they obtain the extraction probabilities of the sentences preprocessed by the document preprocessing unit 110. The extraction probability h_i of sentence S_i can be expressed by the following equation.

The Bi-LSTM networks 122-1, 122-2, and 122-3 differ from the BERTs 121-1, 121-2, and 121-3, which analyze sentences individually, in that they analyze each sentence in relation to its neighbors by referring to information about the preceding and following sentences.
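The contrast with per-sentence analysis can be made concrete with a deliberately simplified sketch. It replaces the LSTM cell with a plain tanh recurrence, uses one scalar feature per sentence, and uses fixed, untrained weights; all of these are assumptions made for brevity, and `recurrent_pass` and `extraction_probabilities` are hypothetical names.

```python
import math


def recurrent_pass(inputs, w_in=1.0, w_rec=0.5):
    """Plain tanh recurrence over a sequence (simplified stand-in for an LSTM)."""
    state, states = 0.0, []
    for x in inputs:
        state = math.tanh(w_in * x + w_rec * state)
        states.append(state)
    return states


def extraction_probabilities(features, w_fwd=1.0, w_bwd=1.0, bias=0.0):
    """Combine a forward and a backward pass, then squash to (0, 1) with a sigmoid."""
    fwd = recurrent_pass(features)
    bwd = recurrent_pass(features[::-1])[::-1]  # read the document from the end
    return [1.0 / (1.0 + math.exp(-(w_fwd * f + w_bwd * b + bias)))
            for f, b in zip(fwd, bwd)]


# The middle sentence has the same feature in both documents, but its
# probability changes because the sentence that follows it is different.
print(extraction_probabilities([1.0, 0.0, 1.0])[1])
print(extraction_probabilities([1.0, 0.0, -1.0])[1])
```

Because the backward pass carries information from later sentences, the value for a sentence depends on its neighbors on both sides, which is the property the description attributes to the Bi-LSTM networks.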

The important sentence selection unit 123 extracts important sentences from the sentences constituting the document. To this end, it calculates scores for the sentences and selects as important sentences those whose calculated score exceeds a threshold.

The score of a sentence is proportional to its extraction probability and inversely proportional to the maximum of the similarities between the sentence and the previously selected important sentences. This avoids selecting a sentence that is highly similar to a previously selected important sentence, that is, a sentence whose content overlaps with one already selected. The specific formula is as follows.

g_i : score of the i-th sentence

α : weight, 0 < α < 1

h_i : extraction probability of the i-th sentence, obtained by the Bi-LSTM networks 122-1, 122-2, and 122-3

E_k : sentence representation of the k-th previously selected important sentence

D : sentences included in the document

C_i : sentence representation of the i-th sentence, obtained by the BERTs 121-1, 121-2, and 121-3

sim() : similarity function based on C_i and E_k

max[] : maximum value

α is a weight that can be set by the user: the larger it is, the more the extraction probability influences the score; the smaller it is, the more the maximum similarity influences the score.
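The score equation itself does not appear in this text. One form that is consistent with the symbol definitions above, with the described role of α, and with the standard MMR criterion is sketched below. It is a reconstruction under those assumptions, not the patent's verbatim formula; in particular, the subtractive penalty and the (1 - α) factor are taken from the usual MMR formulation.

```latex
g_i = \alpha \, h_i - (1 - \alpha) \, \max_{E_k} \operatorname{sim}(C_i, E_k),
\qquad 0 < \alpha < 1
```

Under this reading, the candidates range over the sentences in D that have not yet been selected, and the max term would be taken as 0 while no important sentence has been selected.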

When calculating sentence scores, the important sentence selection unit 123 starts from the sentences with the highest extraction probability, that is, the sentences with the largest h_i.

The important sentence selection unit 123 also keeps extracting important sentences as long as the number of important sentences does not exceed a predetermined number, or as long as the total length of the important sentences (the length when all important sentences are listed together) does not exceed a predetermined length.
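The selection procedure described above can be sketched as a greedy loop. The sketch assumes the MMR-style score α·h_i - (1 - α)·max sim, cosine similarity for sim(), and a max term of 0 while nothing has been selected; none of these details is confirmed by the text. `cosine` and `select_important` are hypothetical names, and the default parameter values are arbitrary.

```python
import math


def cosine(u, v):
    """Cosine similarity (assumed choice for sim(); the patent leaves it open)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_important(probs, reps, lengths, alpha=0.5, threshold=0.2,
                     max_count=3, max_total_length=None):
    """Greedy selection: visit sentences in descending extraction probability,
    score each one against the already selected set, keep it if the score
    exceeds the threshold, and stop at the count or length limit."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    selected, total_length = [], 0
    for i in order:
        if len(selected) >= max_count:
            break
        if max_total_length is not None and total_length + lengths[i] > max_total_length:
            break
        max_sim = max((cosine(reps[i], reps[k]) for k in selected), default=0.0)
        score = alpha * probs[i] - (1 - alpha) * max_sim  # assumed MMR-style form
        if score > threshold:
            selected.append(i)
            total_length += lengths[i]
    return selected  # indices in selection order


probs = [0.9, 0.85, 0.6]                       # h_i
reps = [[1.0, 0.0], [1.0, 0.05], [0.0, 1.0]]   # C_i
print(select_important(probs, reps, lengths=[10, 10, 10]))
```

In the example, the second sentence has a high extraction probability but is nearly identical to the first, so its score falls below the threshold and the less probable but dissimilar third sentence is chosen instead.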

The document summary generation unit 130 generates a summary of the document, based on deep learning, from the important sentences extracted by the important sentence extraction unit 120. In doing so, it does not use the extracted important sentences verbatim but performs abstractive summarization, in which only the important content, with unnecessary content removed, is reflected in the summary, improving readability.

FIG. 4 is a block diagram showing the detailed structure of the document summary generation unit 130. As shown, the document summary generation unit 130 includes a BERT encoder 131, an Attention module 132, and a GRU (Gated Recurrent Unit) decoder 133.

In this structure, the important sentences extracted by the important sentence extraction unit 120 are combined into one and input to the BERT encoder 131, and the encoding result of the BERT encoder 131 is input through the Attention module 132 to the GRU decoder 133, which generates the summary.
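The role of the Attention module between encoder and decoder can be sketched as follows. The patent does not state which attention variant is used; the code assumes simple dot-product attention, and `softmax` and `attend` are hypothetical helper names.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Dot-product attention (assumed variant): weight each encoder output by its
    match with the current decoder state and return the weighted sum as context."""
    scores = [sum(q * k for q, k in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[j] for w, h in zip(weights, encoder_states))
               for j in range(dim)]
    return weights, context


encoder_states = [[1.0, 0.0], [0.0, 1.0]]  # one vector per input token
weights, context = attend([2.0, 0.0], encoder_states)
print(weights, context)
```

At each decoding step the decoder would receive such a context vector, so it can focus on different parts of the combined important sentences as the summary is generated.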

When generating the summary, the GRU decoder 133 may apply a Copying Mechanism, finding the words needed for the output in the input and copying them, so that words it has never learned can be used in the summary.
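One common way to realize such a copying mechanism is a pointer-generator style mixture, in which the output distribution blends the decoder's vocabulary distribution with the attention weights over the source tokens. The patent does not state that this particular formulation is used, so the sketch below is an assumption; `final_distribution` and `p_gen` are illustrative names, and "Qorvex" is a made-up out-of-vocabulary word.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generation and copying (assumed pointer-generator style):
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on source positions holding w."""
    dist = {word: p_gen * prob for word, prob in vocab_dist.items()}
    for weight, token in zip(attention, source_tokens):
        dist[token] = dist.get(token, 0.0) + (1.0 - p_gen) * weight
    return dist


vocab_dist = {"the": 0.6, "report": 0.4}   # decoder vocabulary lacks "Qorvex"
source_tokens = ["the", "Qorvex"]          # tokens of the encoder input
attention = [0.1, 0.9]                     # attention over the source tokens
dist = final_distribution(0.5, vocab_dist, attention, source_tokens)
print(max(dist, key=dist.get))
```

The out-of-vocabulary source word receives probability mass purely through the attention weights, so it can be emitted even though the decoder never learned it.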

In addition, to keep the same word from being generated repeatedly, the GRU decoder 133 applies a Coverage Mechanism, reflecting the cumulative value of the attention distribution of used words in the loss.
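A widely used formulation of a coverage penalty accumulates past attention distributions into a coverage vector and penalizes, at each step, the overlap between the current attention and that vector. Whether the patent uses exactly this formulation is not stated; the sketch assumes it, and `coverage_loss` is a hypothetical name.

```python
def coverage_loss(attention_steps):
    """Sum over decoding steps of sum_i min(a_i^t, c_i^t), where c^t is the
    attention accumulated over all previous steps (assumed formulation)."""
    coverage = [0.0] * len(attention_steps[0])
    total = 0.0
    for attention in attention_steps:
        total += sum(min(a, c) for a, c in zip(attention, coverage))
        coverage = [c + a for c, a in zip(coverage, attention)]
    return total


print(coverage_loss([[1.0, 0.0], [1.0, 0.0]]))  # attends to the same token twice
print(coverage_loss([[1.0, 0.0], [0.0, 1.0]]))  # attends to a new token each step
```

Attending to an already covered source position raises the loss, so training with this term discourages the repetition the description mentions.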

Furthermore, to prevent the summary from becoming too long or too short, the GRU decoder 133 applies a Beam Search (Min/Max Length) technique that limits the maximum and minimum length of the summary.
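A length-constrained beam search can be sketched as follows. The way the limits are enforced here (blocking the end-of-sequence token before the minimum length and stopping expansion at the maximum length) is an assumption about a typical implementation, and `beam_search`, `step_fn`, and the `</s>` marker are hypothetical names; `step_fn` stands in for the decoder's next-token distribution.

```python
import math


def beam_search(step_fn, beam_size=2, min_len=2, max_len=5, eos="</s>"):
    """Beam search that forbids ending before min_len tokens and stops at max_len.
    step_fn(tokens) returns a dict mapping each next token to its probability."""
    beams = [([], 0.0)]  # (tokens so far, log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, logp in beams:
            for token, prob in step_fn(tokens).items():
                if prob <= 0.0:
                    continue
                if token == eos and len(tokens) < min_len:
                    continue  # minimum length: do not allow the summary to end yet
                candidates.append((tokens + [token], logp + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, logp in candidates[:beam_size]:
            if tokens[-1] == eos:
                finished.append((tokens[:-1], logp))
            else:
                beams.append((tokens, logp))
        if not beams:
            break
    finished.extend(beams)  # maximum length: unfinished hypotheses are cut off here
    if not finished:
        return []
    return max(finished, key=lambda c: c[1])[0]


def eager_to_stop(tokens):
    """Toy decoder that always prefers to end the summary immediately."""
    return {"</s>": 0.6, "a": 0.4}


print(beam_search(eager_to_stop, min_len=2, max_len=5))
```

Even though the toy decoder prefers to stop at once, the minimum-length rule forces at least two tokens, while a decoder that never stops would be cut off at five.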

FIG. 5 is a flowchart provided to explain a method for generating a document summary according to another embodiment of the present invention.

To summarize a document, as shown in FIG. 5, the morphological analysis module 111 of the document summary generation system first divides the input document into sentences and performs morphological analysis and part-of-speech tagging (S210).

Next, the BPE module 112 applies BPE to the sentences morphologically analyzed in step S210 (S220).

The BERTs 121-1, 121-2, and 121-3 then receive the sentences preprocessed in steps S210 and S220, and the CLS tokens they generate are obtained as sentence representations (S230).

The Bi-LSTM networks 122-1, 122-2, and 122-3 obtain the extraction probabilities of the sentences constituting the document (S240).

Next, the important sentence selection unit 123 calculates sentence scores using the extraction probabilities obtained in step S240 and the sentence representations obtained in step S230, and extracts as important sentences those whose calculated score exceeds the threshold (S250).

As described above, the sentence score calculated in step S250 is proportional to the extraction probability of the sentence and inversely proportional to the similarity between the sentence and the previously selected important sentences. Also in step S250, scores are calculated starting from the sentences with the highest extraction probability, and important sentences are extracted as long as their number does not exceed a predetermined number or as long as their total length does not exceed a predetermined length.

The document summary generation unit 130 then combines the important sentences extracted in step S250 into one, inputs them to the BERT encoder 131, and inputs the encoding result through the Attention module 132 to the GRU decoder 133 to generate the summary of the document (S260).

The deep learning-based method and system for generating a document summary have been described above in detail with reference to preferred embodiments.

The technical idea of the present invention can, of course, also be applied to a computer-readable recording medium containing a computer program that performs the functions of the apparatus and method according to the present embodiment. The technical ideas according to various embodiments of the present invention may also be implemented as computer-readable code recorded on a computer-readable recording medium. The computer-readable recording medium may be any data storage device that can be read by a computer and can store data, for example a ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, or hard disk drive. Computer-readable code or programs stored on a computer-readable recording medium may also be transmitted over a network connecting computers.

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Various modifications can be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical idea or outlook of the present invention.

110: document preprocessing unit
111: morphological analysis module
112: BPE (Byte-Pair Encoding) module
120: important sentence extraction unit
121-1, 121-2, 121-3: BERT (Bi-directional Encoder Representations from Transformers)
122-1, 122-2, 122-3: Bi-LSTM (Bi-directional Long Short-Term Memory) network
123: important sentence selection unit
130: document summary generation unit
131: BERT encoder
132: Attention module
133: GRU (Gated Recurrent Unit) decoder

Claims

A method for generating a document summary, comprising:
preprocessing, by a document summary generation system, sentences included in a document;
analyzing, by the document summary generation system and based on a deep learning model, an extraction probability, which is the probability that a sentence will be extracted as an important sentence of the document, and extracting important sentences based on the extraction probabilities of the analyzed sentences and the similarities between sentences; and
generating, by the document summary generation system and based on deep learning, a summary of the document using the extracted important sentences,
wherein the extracting comprises calculating, by the document summary generation system, scores of the sentences using the extraction probabilities of the sentences and the inter-sentence similarities calculated from sentence representations, and selecting as important sentences those sentences whose calculated score exceeds a threshold, and
wherein the score is proportional to the extraction probability of the sentence and inversely proportional to the maximum of the similarities between the sentence and previously selected important sentences.

The method of claim 1,
wherein the preprocessing comprises:
dividing the document into sentences and performing morphological analysis; and
performing, by the document summary generation system, Byte-Pair Encoding (BPE) on the morphologically analyzed sentences.

The method of claim 1,
wherein the extracting further comprises:
a first acquisition step in which the document summary generation system analyzes the preprocessed sentences individually to obtain sentence representations of the sentences; and
a second acquisition step in which the document summary generation system analyzes each preprocessed sentence in relation to its preceding and following sentences to obtain the extraction probabilities.

The method of claim 3,
wherein the first acquisition step
obtains the sentence representations of the sentences using a deep learning model pretrained for natural language processing.

The method of claim 4,
wherein the first acquisition step
obtains, as the sentence representations of the sentences, the CLS (CLaSs) tokens output when the preprocessed sentences are input to BERT (Bi-directional Encoder Representations from Transformers).

The method of claim 3,
wherein the second acquisition step
obtains the extraction probabilities of the sentences using a model composed of Bi-LSTM (Bi-directional Long Short-Term Memory) networks trained to extract important sentences from the sentences constituting a document.

(Deleted)

The method of claim 1,
wherein the selecting
calculates scores starting from the sentences with the highest extraction probability.

The method of claim 8,
wherein the selecting
calculates the score of a sentence using the following equation,

where g_i is the score of the i-th sentence, α is a weight, h_i is the extraction probability of the i-th sentence, E_k is the sentence representation of the k-th previously selected important sentence, C_i is the sentence representation of the i-th sentence, sim() is a similarity function, and max[] denotes the maximum value.

The method of claim 8,
wherein the selecting
keeps selecting important sentences as long as the number of important sentences does not exceed a predetermined number, or as long as the total length of the important sentences does not exceed a predetermined length.

The method of claim 1,
wherein the generating
generates a summary that reflects only the important content of the extracted important sentences.

The method of claim 11,
wherein the generating comprises:
combining, by the document summary generation system, the extracted important sentences into one and inputting them to a BERT (Bi-directional Encoder Representations from Transformers) encoder; and
generating, by the document summary generation system, the summary by inputting the encoding result of the BERT encoder to a GRU (Gated Recurrent Unit) decoder through an Attention module.

The method of claim 12,
wherein the GRU decoder,
in order to use words it has not learned in the summary, finds the words needed for the output in the input and copies them.

The method of claim 12,
wherein the GRU decoder
reflects the cumulative value of the attention distribution of used words in the loss.

The method of claim 12,
wherein the GRU decoder
limits the maximum length and minimum length of the summary.

A document summary generation system, comprising:
a preprocessing unit that preprocesses sentences included in a document;
an extraction unit that analyzes, based on a deep learning model, an extraction probability, which is the probability that a sentence will be extracted as an important sentence of the document, and extracts important sentences based on the extraction probabilities of the analyzed sentences and the similarities between sentences; and
a generation unit that generates, based on deep learning, a summary of the document using the extracted important sentences,
wherein the extraction unit calculates scores of the sentences using the extraction probabilities of the sentences and the inter-sentence similarities calculated from sentence representations, and selects as important sentences those sentences whose calculated score exceeds a threshold, and
wherein the score is proportional to the extraction probability of the sentence and inversely proportional to the maximum of the similarities between the sentence and previously selected important sentences.