KR20220099690A - Apparatus, method and computer program for summarizing document

Info

Publication number: KR20220099690A
Application number: KR1020210001891A
Authority: KR
Inventors: 류휘정
Original assignee: 주식회사 케이티
Priority date: 2021-01-07
Filing date: 2021-01-07
Publication date: 2022-07-14

Abstract

A device for summarizing a document comprises: a registration unit that registers a document summary rule; an input unit that receives a summary target document; a summary candidate sentence extraction unit that extracts at least one summary candidate sentence from the input summary target document using the document summary rule; an order adjustment unit that calculates a score for the extracted at least one summary candidate sentence based on an order adjustment criterion and adjusts the order of the at least one summary candidate sentence based on the calculated score; and a summary result providing unit that provides a summary result for the summary target document based on the adjusted order of the at least one summary candidate sentence.

Description

APPARATUS, METHOD AND COMPUTER PROGRAM FOR SUMMARIZING DOCUMENT

The present invention relates to an apparatus, method, and computer program for summarizing documents.

Automatic summarization refers to using software to shorten original data in order to produce summary data that condenses its main points. It is mainly used for document summarization, image collection summarization, video summarization, and the like. Among these, document summarization refers to the task of finding the sentences that carry the most information and generating a summary or outline that can represent the entire document.

In relation to such document summarization technology, Korean Patent Registration No. 10-1508260, a prior art reference, discloses an apparatus and method for generating a summary that reflects document characteristics.

Using document summarization technology requires knowledge-building work for extracting sentences from the original document. This is time-consuming, because the knowledge must be selected by a person with expertise in the relevant field, and the knowledge-building work for sentence extraction must then be carried out based on the selected knowledge.

In addition, when a document is summarized by extracting certain parts of the original document verbatim, the result may differ from the summary that the user actually wants.

An object of the present invention is to provide an apparatus, method, and computer program that register a document summary rule, receive a summary target document, and extract at least one summary candidate sentence from the input summary target document using the document summary rule.

Another object of the present invention is to provide an apparatus, method, and computer program that calculate a score for the at least one summary candidate sentence based on an order adjustment criterion, adjust the order of the at least one summary candidate sentence based on the calculated score, and provide a summary result for the summary target document based on the adjusted order of the at least one summary candidate sentence.

However, the technical problems to be solved by the present embodiment are not limited to those described above, and other technical problems may exist.

As a means for achieving the above-described technical objects, an embodiment of the present invention may provide a document summarizing apparatus including: a registration unit that registers a document summary rule; an input unit that receives a summary target document; a summary candidate sentence extraction unit that extracts at least one summary candidate sentence from the input summary target document using the document summary rule; an order adjustment unit that calculates a score for the extracted at least one summary candidate sentence based on an order adjustment criterion and adjusts the order of the at least one summary candidate sentence based on the calculated score; and a summary result providing unit that provides a summary result for the summary target document based on the adjusted order of the at least one summary candidate sentence.

Another embodiment of the present invention may provide a document summarizing method including: registering a document summary rule; receiving a summary target document; extracting at least one summary candidate sentence from the input summary target document using the document summary rule; calculating a score for the extracted at least one summary candidate sentence based on an order adjustment criterion; adjusting the order of the at least one summary candidate sentence based on the calculated score; and providing a summary result for the summary target document based on the adjusted order of the at least one summary candidate sentence.

Yet another embodiment of the present invention may provide a computer program stored in a medium, the computer program including a sequence of instructions that, when executed by a computing device, cause the computing device to: register a document summary rule; receive a summary target document; extract at least one summary candidate sentence from the input summary target document using the document summary rule; calculate a score for the extracted at least one summary candidate sentence based on an order adjustment criterion; adjust the order of the at least one summary candidate sentence based on the calculated score; and provide a summary result for the summary target document based on the adjusted order of the at least one summary candidate sentence.

The above-described means for solving the problems are merely exemplary and should not be construed as limiting the present invention. In addition to the exemplary embodiments described above, additional embodiments may be described in the drawings and the detailed description.

According to any one of the above-described means for solving the problems, it is possible to provide an apparatus, method, and computer program that automatically derive a summary result from a summary target document using a document summary rule.

It is possible to provide an apparatus, method, and computer program that extract at least one summary candidate sentence from a summary target document using a document summary rule and derive a summary result by adjusting the order of the extracted summary candidate sentences.

It is possible to provide an apparatus, method, and computer program that improve the quality of the summary result by performing pre-processing on the summary target document before extracting summary candidate sentences and performing post-processing before providing the summary result.

FIG. 1 is a block diagram of a document summarizing apparatus according to an embodiment of the present invention.
FIG. 2 is an exemplary diagram illustrating document summary rules according to an embodiment of the present invention.
FIG. 3 is a flowchart of a method of analyzing a service log and updating a document summary rule according to the error state of a summary result, according to an embodiment of the present invention.
FIG. 4 is an exemplary diagram for explaining a document summarization process according to an embodiment of the present invention.
FIG. 5 is a flowchart of a method of summarizing a document in a document summarizing apparatus according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily carry them out. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to describe the present invention clearly, and similar reference numerals denote similar parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element interposed therebetween. In addition, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise, and it should be understood that the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof is not precluded.

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware.

In this specification, some of the operations or functions described as being performed by a terminal or device may instead be performed by a server connected to that terminal or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal or device connected to that server.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a document summarizing apparatus according to an embodiment of the present invention. Referring to FIG. 1, the document summarizing apparatus 100 may include a registration unit 110, an input unit 120, a pre-processing unit 130, a summary candidate sentence extraction unit 140, an order adjustment unit 150, a post-processing unit 160, a summary result providing unit 170, and a recording unit 180.

The registration unit 110 may register document summary rules. For example, the registration unit 110 may register a document summary rule in which an administrator has defined the content to be extracted from a summary target document. Here, the document summary rules may be formalized by pairing a number of summary target documents with the content or sentences to be summarized. To obtain highly accurate summary results from summary target documents, a sufficient number of document summary rules must be defined, and the rules must be continuously managed and supplemented as the summary target documents change. The document summary rules are described in detail with reference to FIG. 2.

FIG. 2 is an exemplary diagram illustrating document summary rules according to an embodiment of the present invention. Referring to FIG. 2, the document summary rules may include an extraction rule 200, a replacement rule 210, a deletion rule 220, a stopword rule 230, and the like.

The extraction rule 200 may refer to a rule for extracting important sentences from the summary target document. Here, the extraction rule 200 may be defined in various forms, such as a regular expression, whether a specific keyword is included in a sentence, or a parsing tree, and matching sentences may be extracted by comparing the rule against each sentence or against specific sentences in the summary target document. For example, if the extraction rule 200 is [요금제*변경*할라|할려|하려], the sentence "요금제 좀 변경을 할라 하는데요" ("I'd like to change my rate plan") may be extracted from the summary target document using the extraction rule 200. In this rule, the '*' symbol denotes the order of the words and the '|' symbol denotes that any one of the listed words may appear. The rule therefore describes a sentence in which the word '요금제' (rate plan) is followed by the word '변경' (change), which is in turn followed by '할라', '할려', or '하려' (colloquial forms meaning "intend to").
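As an illustration only (the disclosure does not specify an implementation), the rule notation can be sketched by compiling it into a Python regular expression. The helper names `compile_rule` and `extract_candidates`, and the choice to treat '*' as "any text in between", are assumptions made for this sketch.

```python
import re

def compile_rule(rule: str) -> re.Pattern:
    """Compile a rule such as '[A*B*C1|C2]' into a regular expression.

    Assumption: '*' separates keywords that must appear in order, and
    '|' separates alternatives within one keyword slot.
    """
    body = rule.strip().strip("[]")
    groups = []
    for segment in body.split("*"):
        alternatives = [re.escape(alt) for alt in segment.split("|") if alt]
        groups.append("(?:" + "|".join(alternatives) + ")")
    # Keywords may be separated by arbitrary text, matched lazily.
    return re.compile(".*?".join(groups))

def extract_candidates(sentences, rule):
    """Return the sentences that satisfy the extraction rule."""
    pattern = compile_rule(rule)
    return [s for s in sentences if pattern.search(s)]

RULE = "[요금제*변경*할라|할려|하려]"
sentences = ["여보세요", "요금제 좀 변경을 할라 하는데요", "네 감사합니다"]
candidates = extract_candidates(sentences, RULE)
```

Here only the second sentence contains the three keyword slots in the required order, so it is the only summary candidate returned.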

The deletion rule 210 may refer to a rule for deleting unnecessary sentences during pre-processing of the summary target document and a rule for deleting unnecessary parts (expressions of two or more word units) from the extracted summary candidate sentences. Here, the deletion rule 210 may be defined as a regular expression or the like, and may be used in pre-processing of the summary target document or in post-processing of the summary candidate sentences. For example, if the deletion rule 210 is [추가*문의사항*없습니까], the deletion rule 210 may be used so that, in the sentence "네 해지 신청 완료되었구요 추가 문의사항 없습니까" ("Yes, your cancellation request has been completed; do you have any further questions?") included in the summary target document, the part '추가 문의사항 없습니까' ("do you have any further questions?") is deleted.
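A minimal sketch of how such a deletion rule might be applied, assuming the same '*' and '|' notation is compiled into a regular expression; `compile_rule` and `apply_deletion` are hypothetical helper names, not part of the disclosure.

```python
import re

def compile_rule(rule: str) -> re.Pattern:
    """Compile '[A*B*C1|C2]' into a regex (ordered keywords, '|' alternatives)."""
    body = rule.strip().strip("[]")
    groups = ["(?:" + "|".join(re.escape(a) for a in seg.split("|")) + ")"
              for seg in body.split("*")]
    return re.compile(".*?".join(groups))

def apply_deletion(sentence: str, rule: str) -> str:
    """Remove the span matched by the deletion rule and tidy the whitespace."""
    without_span = compile_rule(rule).sub("", sentence)
    return " ".join(without_span.split())

cleaned = apply_deletion("네 해지 신청 완료되었구요 추가 문의사항 없습니까",
                         "[추가*문의사항*없습니까]")
```

The matched closing phrase is removed and the rest of the sentence is kept unchanged.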

The replacement rule 220 may refer to a rule that matches a sentence extracted from the summary target document and converts it into a more concise summary candidate sentence. Here, the replacement rule 220 may be defined so that a sentence matching an expression such as a regular expression is changed into a user-friendly sentence or phrase. For example, if the replacement rule 220 is ([링고*서비*할려|할라|하려|신청할|신청하|가입할|가입하], 링고 서비스 문의), the replacement rule 220 may be used so that the sentence "링고 서비스 신청할려고요" ("I'd like to sign up for the Ringo service") included in the summary target document is replaced with the phrase '링고 서비스 문의' ("Ringo service inquiry").
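The replacement rule can be sketched as a (pattern, replacement) pair. This is an assumed representation for illustration; `compile_rule` and `apply_replacement` are hypothetical names, and replacing the whole sentence when the pattern matches is one reading of the example above.

```python
import re

def compile_rule(rule: str) -> re.Pattern:
    """Compile '[A*B*C1|C2]' into a regex (ordered keywords, '|' alternatives)."""
    body = rule.strip().strip("[]")
    groups = ["(?:" + "|".join(re.escape(a) for a in seg.split("|")) + ")"
              for seg in body.split("*")]
    return re.compile(".*?".join(groups))

def apply_replacement(sentence: str, rule: tuple) -> str:
    """Return the replacement phrase if the sentence matches, else the sentence."""
    pattern_text, replacement = rule
    if compile_rule(pattern_text).search(sentence):
        return replacement
    return sentence

REPLACEMENT_RULE = ("[링고*서비*할려|할라|하려|신청할|신청하|가입할|가입하]",
                    "링고 서비스 문의")
replaced = apply_replacement("링고 서비스 신청할려고요", REPLACEMENT_RULE)
```

Note that the keyword '서비' matches as a substring of '서비스', which is why the rule can use a shortened form.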

The stopword rule 230 may refer to a rule for deleting unnecessary parts (a single word unit) from the summary target document. Here, the stopword rule 230 may be defined as a specific word unit or a monosyllabic expression, and may be used mainly in pre-processing of the summary target document. For example, if the stopword rule 230 is '여보세요' ("hello"), the stopword rule 230 may be used so that the stopword '여보세요' is excluded from the sentence "여보세요 해지 신청 하고 싶은데요" ("Hello, I'd like to request a cancellation") included in the summary target document.
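A stopword rule of this kind can be sketched as a set lookup over whitespace-separated word units. Splitting on whitespace and the name `remove_stopwords` are assumptions made for this illustration.

```python
STOPWORDS = {"여보세요"}

def remove_stopwords(sentence: str, stopwords: set) -> str:
    """Drop every whitespace-separated word unit that is a registered stopword."""
    return " ".join(tok for tok in sentence.split() if tok not in stopwords)

result = remove_stopwords("여보세요 해지 신청 하고 싶은데요", STOPWORDS)
```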

The characteristics of the document summary rules are described below.

First, the components that make up the document summary rules may be organized in a tree structure. When an administrator defines and registers document summary rules, the rules may be separated into parent rules and child rules according to their mutual inclusion relationships. For example, assume that there is a document summary rule A defined as [요금제*변경*하려고|할려고|하고싶] (rate plan, change, "intend to" or "want to"). Here, '*' may denote the order of the keywords and '|' may denote 'OR'. If there is a document summary rule B, [인터넷*요금제*변경*하려고|할려고|하고싶], which is derived from rule A, rule A may be classified as the parent rule, and rule B, which additionally requires the keyword '인터넷' (Internet) to be present, may be classified as a child rule. Likewise, if there is a document summary rule C, [모바일*요금제*변경*하려고|할려고|하고싶], rule A may be classified as the parent rule, and rule C, which is applied only when the keyword '모바일' (mobile) is present, may be classified as a child rule.

In this way, document summary rule A contains rules B and C, and may also cover potential rules that are contained in neither B nor C, such as [핸드폰*요금제*변경*하려고|할려고|하고싶] (where '핸드폰' means cell phone).

When the document summary rules are updated, for example by adding, modifying, or deleting a rule, the tree structure may be reorganized.
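One way to sketch the parent/child relationship, under the assumption that a child rule is a rule whose keyword slots contain all of the parent's slots in the same order plus at least one extra slot. The functions `parse`, `is_specialization`, and `build_tree` are hypothetical; the disclosure does not state how the inclusion relationship is computed.

```python
def parse(rule: str) -> list:
    """Split '[A*B*C1|C2]' into ordered keyword slots (sets of alternatives)."""
    return [frozenset(seg.split("|")) for seg in rule.strip("[]").split("*")]

def is_specialization(child: list, parent: list) -> bool:
    """True if every parent slot appears, in order, among the child's slots."""
    if len(child) <= len(parent):
        return False
    remaining = iter(child)
    # 'slot in remaining' advances the iterator, so order is respected.
    return all(slot in remaining for slot in parent)

def build_tree(rules: list) -> dict:
    """Map each rule to its closest parent rule (None for a root rule)."""
    parsed = {r: parse(r) for r in rules}
    parent_of = {}
    for r in rules:
        ancestors = [p for p in rules if is_specialization(parsed[r], parsed[p])]
        parent_of[r] = max(ancestors, key=lambda p: len(parsed[p]), default=None)
    return parent_of

A = "[요금제*변경*하려고|할려고|하고싶]"
B = "[인터넷*요금제*변경*하려고|할려고|하고싶]"
C = "[모바일*요금제*변경*하려고|할려고|하고싶]"
tree = build_tree([A, B, C])
```

In this sketch, reorganizing the tree after a rule is added, modified, or deleted amounts to calling `build_tree` again on the updated rule list.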

Second, each document summary rule may be assigned a weight. The weight may be used to decide which sentence to provide as the summary result when sentences matching document summary rules are found in many parts of the summary target document. For example, suppose the summary target document contains sentences matching both document summary rule A, [상담사*연결*부탁|요청] (agent, connect, "please" or "request"), and document summary rule B, [요금제*변경*하려고|할려고|하고싶], and that the weight of rule A is 10 and the weight of rule B is 20. In this case, the sentence matching rule B may be selected based on the weights and provided as the summary result. Accordingly, a high weight may be assigned to document summary rules that match important sentences that must be extracted, and a relatively low weight to rules that match less important sentences.
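The weight-based selection can be sketched as follows. The two input sentences are invented for this illustration, and `compile_rule` and `best_sentence` are hypothetical helpers; only the rules and the weights 10 and 20 come from the example above.

```python
import re

def compile_rule(rule: str) -> re.Pattern:
    """Compile '[A*B*C1|C2]' into a regex (ordered keywords, '|' alternatives)."""
    body = rule.strip().strip("[]")
    groups = ["(?:" + "|".join(re.escape(a) for a in seg.split("|")) + ")"
              for seg in body.split("*")]
    return re.compile(".*?".join(groups))

WEIGHTED_RULES = {
    "[상담사*연결*부탁|요청]": 10,
    "[요금제*변경*하려고|할려고|하고싶]": 20,
}

def best_sentence(sentences, weighted_rules):
    """Return the sentence matched by the highest-weighted rule, if any."""
    best, best_weight = None, float("-inf")
    for sentence in sentences:
        for rule, weight in weighted_rules.items():
            if weight > best_weight and compile_rule(rule).search(sentence):
                best, best_weight = sentence, weight
    return best

# Hypothetical sentences written for this sketch.
document = ["상담사 연결 부탁드립니다", "요금제 변경 하려고 하는데요"]
selected = best_sentence(document, WEIGHTED_RULES)
```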

Returning to FIG. 1, the input unit 120 may receive a summary target document. The summary target document may include various kinds of documents, such as newspaper articles and call center consultation transcripts.

The pre-processing unit 130 may use at least one of the document summary rules to remove, from the summary target document, sentences that end incompletely, sentences composed only of stopwords, and sentences in which the portion matching a deletion rule accounts for at least a predetermined ratio of the total sentence length. In addition, the pre-processing unit 130 may perform commonly used sentence cleanup tasks.
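Two of these pre-processing filters can be sketched as below. The threshold value 0.5, the plain regular expression standing in for a deletion rule, and all function names are assumptions for this illustration; detection of incompletely terminated sentences is omitted because the text does not define it.

```python
import re

STOPWORDS = {"여보세요", "네"}
# Hypothetical deletion rule written directly as a regular expression.
DELETION_PATTERNS = [re.compile("추가.*?문의사항.*?없습니까")]
MAX_DELETED_RATIO = 0.5  # assumed value for the "predetermined ratio"

def only_stopwords(sentence: str) -> bool:
    """True if the sentence is non-empty and every word unit is a stopword."""
    tokens = sentence.split()
    return bool(tokens) and all(t in STOPWORDS for t in tokens)

def deleted_ratio(sentence: str) -> float:
    """Fraction of the sentence length covered by deletion-rule matches."""
    if not sentence:
        return 0.0
    covered = sum(len(m.group(0))
                  for p in DELETION_PATTERNS for m in p.finditer(sentence))
    return min(1.0, covered / len(sentence))

def preprocess(sentences):
    """Keep sentences that pass both filters."""
    return [s for s in sentences
            if not only_stopwords(s) and deleted_ratio(s) < MAX_DELETED_RATIO]

kept = preprocess(["여보세요", "추가 문의사항 없습니까", "해지 신청 하고 싶은데요"])
```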

The summary candidate sentence extraction unit 140 may extract at least one summary candidate sentence from the input summary target document using the document summary rules. Here, the summary candidate sentence extraction unit 140 may extract, as summary candidate sentences, those sentences among the plurality of sentences constituting the summary target document that satisfy an extraction rule. For example, the summary candidate sentence extraction unit 140 may use an extraction rule defined as a regular expression to determine whether the sentences in the summary target document satisfy the regular expression, and may use an extraction rule defined in another form to determine whether the sentences satisfy that form. In doing so, the summary candidate sentence extraction unit 140 may extract not only whether an extraction rule is satisfied but also a value indicating the degree to which it is satisfied.

The order adjustment unit 150 may calculate a score for the extracted at least one summary candidate sentence based on an order adjustment criterion. For example, the order adjustment unit 150 may calculate a score for each of the at least one summary candidate sentence based on an order adjustment criterion related to at least one of: the position of each summary candidate sentence in the summary target document; the degree of match between the document summary rules and each summary candidate sentence; the number of document summary rules that each summary candidate sentence matches; and the weight set for each document summary rule.

The order adjustment unit 150 may adjust the order of the at least one summary candidate sentence based on the calculated scores. Here, the order adjustment unit 150 may adjust the order using the average of the numerical values or a weighted average.
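The averaging step can be sketched as follows, assuming each criterion has already been reduced to a number between 0 and 1. The criterion names, the example values, and the function `combine` are hypothetical.

```python
def combine(scores: dict, weights: dict = None) -> float:
    """Plain mean of the criterion scores, or a weighted mean if weights are given."""
    if weights is None:
        return sum(scores.values()) / len(scores)
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight

# Hypothetical per-criterion scores for two candidate sentences.
candidates = {
    "A": {"position": 0.9, "coverage": 0.2},
    "B": {"position": 0.5, "coverage": 0.8},
}

ranked_plain = sorted(candidates, key=lambda c: combine(candidates[c]), reverse=True)

criterion_weights = {"position": 3, "coverage": 1}
ranked_weighted = sorted(
    candidates, key=lambda c: combine(candidates[c], criterion_weights), reverse=True
)
```

With a plain average, candidate B ranks first; when position is weighted three times as heavily as coverage, candidate A ranks first, which shows how the choice of averaging changes the final order.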

For example, in summary target documents such as newspaper articles and call center consultation transcripts, important sentences and expressions tend to appear toward the beginning of the text. The order adjustment unit 150 may therefore calculate a score from a value quantifying how close to the beginning a sentence is located relative to the length of the entire text, and adjust the order of the summary candidate sentences based on the calculated score.
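A possible quantification of "how close to the beginning" is sketched below. The formula (one minus the character offset divided by the total length) and the name `position_scores` are assumptions; the disclosure does not give a formula.

```python
def position_scores(sentences: list) -> list:
    """Score each sentence by how early it starts, relative to total text length."""
    total = sum(len(s) for s in sentences)
    if total == 0:
        return []
    scores, offset = [], 0
    for sentence in sentences:
        scores.append(1.0 - offset / total)  # 1.0 for the first sentence
        offset += len(sentence)
    return scores

scores = position_scores(["aaaa", "bb", "cc"])
```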

As another example, suppose there is a document summary rule [요금제*변경*하고] (rate plan, change, "do"), and the summary target document contains sentence A, "제가 요금제를 어제 바꿨는데 그게 생각보다 너무 데이터가 적어서 변경하고 싶은데 하루 만에 가능할지 모르겠어요" ("I changed my rate plan yesterday, but it has far less data than I expected, so I'd like to change it, and I'm not sure whether that's possible after just one day"), and sentence B, "요금제 변경하고 싶어서 전화했어요" ("I called because I want to change my rate plan"). In sentence B, the portion matching the document summary rule (underlined in the original publication) makes up a larger share of the total sentence length. The order adjustment unit 150 may therefore calculate a score from a value quantifying this share and adjust the order so that a clearly expressed sentence such as sentence B is ranked higher.
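One possible way to quantify this share is the total length of the matched keywords divided by the sentence length. This measure and the helper names are assumptions for illustration; the original underlining that defined the matching portion is not available in the text.

```python
import re

def compile_rule(rule: str) -> re.Pattern:
    """Compile '[A*B*C1|C2]' into a regex with one capturing group per keyword slot."""
    body = rule.strip().strip("[]")
    groups = ["(" + "|".join(re.escape(a) for a in seg.split("|")) + ")"
              for seg in body.split("*")]
    return re.compile(".*?".join(groups))

def coverage(sentence: str, rule: str) -> float:
    """Share of the sentence length taken up by the matched keywords."""
    match = compile_rule(rule).search(sentence)
    if match is None:
        return 0.0
    return sum(len(keyword) for keyword in match.groups()) / len(sentence)

RULE = "[요금제*변경*하고]"
sentence_a = ("제가 요금제를 어제 바꿨는데 그게 생각보다 너무 데이터가 적어서 "
              "변경하고 싶은데 하루 만에 가능할지 모르겠어요")
sentence_b = "요금제 변경하고 싶어서 전화했어요"
ranked = sorted([sentence_a, sentence_b],
                key=lambda s: coverage(s, RULE), reverse=True)
```

Both sentences match the rule, but the same seven keyword characters account for a much larger share of the shorter sentence B, so it is ranked first.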

As yet another example, assume there are document summary rule A, [요금제*변경*하고], and document summary rule B, [요금제*변경*전화] (rate plan, change, call). Sentence A, "제가 요금제를 어제 바꿨는데 그게 생각보다 너무 데이터가 적어서 변경하고 싶은데 하루 만에 가능할지 모르겠어요", matches only rule A, whereas sentence B, "요금제 변경하고 싶어서 전화했어요", matches both rule A and rule B. The order adjustment unit 150 may therefore adjust the order of the summary candidate sentences so that sentence B has the higher priority.
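The count of matching rules can be sketched directly; `compile_rule` and `matched_rule_count` are hypothetical helper names.

```python
import re

def compile_rule(rule: str) -> re.Pattern:
    """Compile '[A*B*C1|C2]' into a regex (ordered keywords, '|' alternatives)."""
    body = rule.strip().strip("[]")
    groups = ["(?:" + "|".join(re.escape(a) for a in seg.split("|")) + ")"
              for seg in body.split("*")]
    return re.compile(".*?".join(groups))

RULES = ["[요금제*변경*하고]", "[요금제*변경*전화]"]

def matched_rule_count(sentence: str, rules: list) -> int:
    """Number of rules that the sentence satisfies."""
    return sum(1 for rule in rules if compile_rule(rule).search(sentence))

sentence_a = ("제가 요금제를 어제 바꿨는데 그게 생각보다 너무 데이터가 적어서 "
              "변경하고 싶은데 하루 만에 가능할지 모르겠어요")
sentence_b = "요금제 변경하고 싶어서 전화했어요"
ranked = sorted([sentence_a, sentence_b],
                key=lambda s: matched_rule_count(s, RULES), reverse=True)
```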

As yet another example, when a high weight is assigned to document summary rules that match important sentences that must be extracted and a relatively low weight to rules that match less important sentences, the order adjustment unit 150 may adjust the order of the summary candidate sentences so that candidates matching a highly weighted rule are ranked higher.

The post-processing unit 160 may use at least one of the document summary rules to select among duplicated summary candidate sentences for the summary result, to remove lower-ranked summary candidate sentences based on a preset number of sentences, or to remove or replace summary candidate sentences that fall under the stopword rule or the deletion rule. In addition, the post-processing unit 160 may perform commonly used sentence refinement and conversion as part of post-processing.
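A minimal post-processing sketch, assuming that duplicates are resolved by keeping the first (highest-ranked) occurrence and that the preset sentence count is a simple cut-off. Both choices, and the name `postprocess`, are assumptions rather than details given in the text.

```python
def postprocess(ranked: list, max_sentences: int, stopwords: set) -> list:
    """Strip stopwords, drop duplicates, and keep only the top-ranked sentences."""
    seen, result = set(), []
    for sentence in ranked:
        cleaned = " ".join(t for t in sentence.split() if t not in stopwords)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result[:max_sentences]

ranked_candidates = [
    "여보세요 요금제 변경하고 싶어서 전화했어요",
    "요금제 변경하고 싶어서 전화했어요",
    "링고 서비스 문의",
    "네 감사합니다",
]
summary = postprocess(ranked_candidates, max_sentences=2, stopwords={"여보세요"})
```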

The summary result providing unit 170 may provide a summary result for the summary target document based on the adjusted order of the at least one summary candidate sentence. Here, the summary result may be provided by excerpting some of the summary sentences considered important from the summary target document and converting and processing them into a specific format, or by providing the excerpted text as it is.

The recording unit 180 may record a rule usage log for the document summary rules used to derive the summary result, and a service log for the summary target document and the summary result.

The rule usage log may be statistically analyzed to determine, for example, which rules were used frequently, which rules were never used, which rules frequently contributed to correct summaries, and which rules frequently contributed to incorrect summaries. The results may be used to update the document summary rules.

The service log may later be used by an administrator to analyze which summary target documents were analyzed inadequately and what summary results were provided for them, and, for an inadequate summary result, which summary result would have been more appropriate. The outcome of this analysis may be used to update the document summary rules.

The registration unit 110 may extract the number of uses of each rule from the rule usage log and, based on the extracted counts, analyze rules that were not used during a predetermined period in order to update the document summary rules. Here, a rule that has remained unused for a long time may become a candidate for deletion.
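The per-rule counting described above can be sketched as follows, assuming the rule usage log is available as (timestamp, rule identifier) pairs. That log layout and the names `unused_rules`, `usage_log`, and `since` are assumptions made for illustration; the description does not define a log format.

```python
from collections import Counter
from datetime import datetime

def unused_rules(all_rule_ids, usage_log, since):
    """Return rules with no recorded use at or after `since`.

    usage_log is an iterable of (timestamp, rule_id) pairs (assumed layout).
    """
    counts = Counter(rule_id for timestamp, rule_id in usage_log if timestamp >= since)
    # A Counter returns 0 for keys it has never seen.
    return [rule_id for rule_id in all_rule_ids if counts[rule_id] == 0]
```

The returned rules are only candidates for deletion; per the description, the final decision may still rest with the administrator.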

A rule may go unused because 1) the rule is defined so specifically that it rarely matches sentences in the summary target documents, or 2) other rules are given greater weights, so the rule is pushed down in priority.

In case 1), the cause may be identified from the number of uses of the specific rule and of its parent rule in the rule tree structure. For example, if the parent rule is used many times while a specific rule derived from it is used very rarely, it may be concluded that the condition added to derive the child rule from the parent rule almost never matches the summary target documents in practice.
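A minimal sketch of this parent and child comparison is given below. The thresholds `min_parent_uses` and `max_ratio`, and the `parent_of` and `use_counts` mappings, are hypothetical; the description only says the parent is used "many times" and the child "very rarely".

```python
def overly_specific_rules(parent_of, use_counts, min_parent_uses=100, max_ratio=0.01):
    """Flag child rules that are rarely used although their parent rule is used often.

    parent_of maps a child rule id to its parent rule id in the rule tree.
    use_counts maps a rule id to its number of uses in the analysis period.
    """
    flagged = []
    for child, parent in parent_of.items():
        parent_uses = use_counts.get(parent, 0)
        child_uses = use_counts.get(child, 0)
        if parent_uses >= min_parent_uses and child_uses <= parent_uses * max_ratio:
            flagged.append(child)
    return flagged
```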

In this case, the registration unit 110 may change the condition of the specific rule, or may present the parent rule and the specific rule to the administrator together with a suggested rule that is expected to match more often, and thereby receive from the administrator an additional rule to replace the specific rule.

In case 2), the cause may be identified by searching the full text of the summary target documents in the service log for text to which the specific rule (a rule that is rarely used because it is pushed down in priority) could be applied, and then obtaining the rules and weights that were actually used when that text was presented as a summary result.

In this case, comparing the weights of the specific rule and the rules actually used identifies the rules that were applied first and displaced the specific rule, and the registration unit 110 may receive from the administrator input to reset the weights or to modify or delete the rules applied first.

The registration unit 110 may extract the number of uses of each rule from the rule usage log and, based on the extracted counts, analyze rules that were overused during a predetermined period in order to update the document summary rules. In doing so, the registration unit 110 may sort the rules in descending order of use count and perform a fine-grained analysis of a predetermined number of top-ranked rules. The purpose of this fine-grained analysis is to break the top-ranked rules down by type, for example into rules that are overused and rules that are used at an ordinary rate.

For an overused rule, the registration unit 110 may present to the administrator keywords that frequently appear within or adjacent to the span matched by the rule, and receive subdivided document summary rules from the administrator.

For example, suppose that the document summary rule [요금제*변경*하려|할려|하고] (roughly "rate plan", then "change", then one of three verb forms expressing the intent to do so) is applied far more often than other rules across the summary target documents and has therefore become a candidate overused rule. Searching the rule usage log for sentences to which this rule was applied may return sentences such as "인터넷 요금제 변경 하려고 하는데요" ("I'd like to change my Internet plan"), "인터넷 요금제 변경 하고 싶은데요" ("I want to change my Internet plan"), "핸드폰 요금제 변경 하고 싶어서 전화했어요" ("I called because I want to change my mobile phone plan"), and "제 핸드폰 요금제 변경 하려고요" ("I'd like to change my mobile phone plan"). In these sentences, keywords such as '인터넷' (Internet) and '핸드폰' (mobile phone) frequently appear adjacent to the part that matches the document summary rule (the span from '요금제' through '하려' or '하고').

In this case, the registration unit 110 may present to the administrator the keywords that frequently appear in front of the text matched by the document summary rule, and receive subdivided document summary rules from the administrator. Such subdivided rules are useful for managing the document summary rules and for analyzing summary results.
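The keyword analysis described above can be sketched as follows, using the four example sentences. Rendering the rule as a regular expression in which "*" means "any characters" is an assumption of this sketch, as are the names `RULE` and `preceding_keywords`; the description does not define the rule syntax at this level of detail.

```python
import re
from collections import Counter

# Assumed regex rendering of the rule [요금제*변경*하려|할려|하고].
RULE = re.compile(r"요금제.*변경.*(하려|할려|하고)")

def preceding_keywords(sentences, rule=RULE):
    """Count the whitespace-delimited token directly before each rule match."""
    counts = Counter()
    for sentence in sentences:
        match = rule.search(sentence)
        if match is None:
            continue
        before = sentence[:match.start()].split()
        if before:
            counts[before[-1]] += 1
    return counts

matched_sentences = [
    "인터넷 요금제 변경 하려고 하는데요",
    "인터넷 요금제 변경 하고 싶은데요",
    "핸드폰 요금제 변경 하고 싶어서 전화했어요",
    "제 핸드폰 요금제 변경 하려고요",
]
keyword_counts = preceding_keywords(matched_sentences)
```

With these inputs, '인터넷' and '핸드폰' are each counted twice, which is the kind of result that could be shown to the administrator when proposing subdivided rules.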

The registration unit 110 may analyze the service log to classify the summary result for each summary target document as either a normal state or an error state, and may update the document summary rules based on correct-answer sentence similarity for the summary target documents and summary results in the error state. Here, the service log may be used to define new document summary rules or to modify existing ones.

For example, the registration unit 110 may analyze the service log to select correct-answer sentences and text that should have been extracted, analyze the corresponding summary results, and classify each as a normal state or an error state by type.

The normal state may include cases where 1) the sentence to be extracted is clear and was extracted in an appropriate form, and 2) the sentence to be extracted is unclear and no sentence was extracted.

The error state may include cases where 1) the sentence to be extracted is clear but was not extracted, 2) the sentence to be extracted is clear but a different sentence was extracted, 3) the sentence to be extracted is clear and was extracted, but unnecessary sentences were extracted along with it, so the result is not judged to be an accurate summary, and 4) the sentence to be extracted is unclear and an irrelevant sentence was extracted.

The registration unit 110 may classify all cases in the error state based on correct-answer sentence similarity. The similarity between correct-answer sentences may be calculated using Jaccard similarity, after which a clustering process groups the sentences into a plurality of clusters. Jaccard similarity is one method of measuring similarity between sets; any other method of calculating similarity between sentences may be used instead. When the number of error-state cases in a cluster reaches a predetermined number or more, the correct-answer sentences in that cluster may be analyzed and automatically converted into a rule. The automatically generated rule may take into account the order of the words it contains, whether the same words are included, the positions at which the same words appear, and the like.
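For two word sets A and B, Jaccard similarity is |A ∩ B| / |A ∪ B|. The sketch below computes it over whitespace-separated tokens and groups sentences with a simple greedy pass. The description does not name a clustering algorithm, so the greedy grouping, the 0.5 threshold, and the function names are assumptions chosen only to keep the example short.

```python
def jaccard(a, b):
    """Jaccard similarity between the word sets of two sentences."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a and not set_b:
        return 1.0  # two empty sentences are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)

def greedy_clusters(sentences, threshold=0.5):
    """Assign each sentence to the first cluster whose first member is similar enough."""
    clusters = []
    for sentence in sentences:
        for cluster in clusters:
            if jaccard(sentence, cluster[0]) >= threshold:
                cluster.append(sentence)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([sentence])
    return clusters
```

A cluster whose size reaches the predetermined number would then be a candidate for automatic rule generation. For Korean text, a morphological tokenizer would likely be a better fit than whitespace splitting.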

For example, suppose that the correct-answer sentences in one cluster are the following four: '핸드폰 요금제 변경하려고요' ("I'd like to change my mobile phone plan"), '요금제 변경하려는데 어떻게 해요' ("I want to change my plan; how do I do that?"), '요금제 바꾸고 싶은데 전화로 신청할 수 있나요' ("I want to switch my plan; can I apply by phone?"), and '요금제 변경이 가능한지 문의드리려고요' ("I'd like to ask whether I can change my plan"). If an existing language model confirms that '요금제' (rate plan), '바꾸고' (switch), and '변경' (change) appear in common and that '바꾸고' and '변경' are similar words, a new document summary rule [요금제*변경|바꾸고] may be generated and presented to the administrator, and registered after the administrator's final review.

Through this process, the designated correct-answer sentences and text may be added to a test set. The test set may be used to evaluate the document summary rules quantitatively by measuring how closely the summary results match the test set. Comparing results before and after an update of the document summary rules then shows how much the newly added rules contribute to the accuracy of the summary results.
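One simple way to make the before-and-after comparison concrete is shown below. It scores a summarizer by the fraction of test documents whose expected sentence appears in the summary result. Exact containment is a stand-in chosen for this sketch; the description only says that similarity between the test set and the summary results is measured. The names `accuracy`, `test_set`, and `summarize` are hypothetical.

```python
def accuracy(test_set, summarize):
    """Fraction of (document, expected_sentence) pairs found in the summary result.

    summarize is any callable that maps a document to a list of summary sentences.
    """
    if not test_set:
        return 0.0
    hits = sum(1 for document, expected in test_set if expected in summarize(document))
    return hits / len(test_set)
```

Calling `accuracy` once with the summarizer built from the old rules and once with the one built from the updated rules gives two numbers whose difference indicates how much the new rules contribute.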

FIG. 3 is a flowchart of a method of analyzing a service log and updating the document summary rules according to the error state of summary results, according to an embodiment of the present invention. Referring to FIG. 3, the document summarizing apparatus 100 may analyze, from the service log, whether each summary result is in an error state based on the summary target document and the summary result, and classify each error state by type (S310).

The document summarizing apparatus 100 may convert the error states classified by type into rules based on correct-answer sentence similarity (S320).

The document summarizing apparatus 100 may register a new extraction rule (S331), modify an existing extraction rule (S332), register a new deletion rule (S333), and register a new replacement rule (S334).

The document summarizing apparatus 100 may verify the rules registered or modified for each type (S340), and add them to the test set to measure performance (S350).

In the above description, steps S310 to S350 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.

The document summarizing apparatus 100 may be implemented by a computer program that is stored in a medium and includes a sequence of instructions for summarizing a document. The computer program may include a sequence of instructions that, when executed by a computing device, cause the device to receive registration of a document summary rule, receive a summary target document, extract at least one summary candidate sentence from the input summary target document by using the document summary rule, calculate a score for the extracted at least one summary candidate sentence based on an order adjustment criterion, adjust the order of the at least one summary candidate sentence based on the calculated score, and provide a summary result for the summary target document based on the adjusted order of the at least one summary candidate sentence.

FIG. 4 is an exemplary diagram for explaining a document summarization process according to an embodiment of the present invention. Referring to FIG. 4, the document summarizing apparatus 100 may receive a summary request from a user terminal (S400). Here, the summary request may include the number of summary sentences to extract (expressed as TOP K, the maximum number of summary sentences the requester expects to receive), the summary target document, the classification scheme of the summary target document, and the like.

The document summarizing apparatus 100 may preprocess the summary target document using document summary rules, including keyword-based rules whose classification scheme matches that of the document and default rules (S410). For example, the document summarizing apparatus 100 may apply exception-handling logic to the summary target document, remove null and blank sentences, remove sentences consisting only of stopwords, and remove sentences in which the part corresponding to a deletion rule accounts for at least a specific ratio (set in the configuration) of the total sentence length.
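The preprocessing filters of step S410 can be sketched as below, under the assumption that stopwords are whitespace-delimited tokens and deletion rules are plain substrings. The function name, parameter names, and the default ratio of 0.5 are hypothetical; the description only states that the ratio is configurable.

```python
def preprocess(sentences, stopwords, deletion_phrases, max_deleted_ratio=0.5):
    """Filter out null, blank, stopword-only, and mostly-deletable sentences."""
    kept = []
    for sentence in sentences:
        if sentence is None or not sentence.strip():
            continue  # null or blank sentence
        if all(token in stopwords for token in sentence.split()):
            continue  # sentence made up only of stopwords
        # Characters covered by deletion phrases; overlapping phrases would be
        # counted more than once in this simplified version.
        deleted = sum(len(p) * sentence.count(p) for p in deletion_phrases)
        if deleted / len(sentence) >= max_deleted_ratio:
            continue
        kept.append(sentence)
    return kept
```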

The document summarizing apparatus 100 may extract at least one summary candidate sentence using the document summary rules (S420).

The document summarizing apparatus 100 may calculate a score for each of the extracted summary candidate sentences, adjust their order based on the calculated scores, and perform post-processing on the reordered summary candidate sentences (S430).

The document summarizing apparatus 100 may convert the post-processed summary candidate sentences into an inter-device message format such as JSON and provide the summary result to the user terminal (S440).
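A conversion to JSON in step S440 might look like the following sketch. The field names `summary`, `rank`, and `sentence` are invented for illustration; the description does not define a message schema.

```python
import json

def to_json_response(sentences):
    """Serialize ranked summary sentences into a JSON string (hypothetical schema)."""
    payload = {
        "summary": [
            {"rank": index + 1, "sentence": sentence}
            for index, sentence in enumerate(sentences)
        ]
    }
    # ensure_ascii=False keeps Korean text readable instead of \u escapes.
    return json.dumps(payload, ensure_ascii=False)
```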

In the above description, steps S410 to S440 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.

FIG. 5 is a flowchart of a method of summarizing a document in a document summarizing apparatus according to an embodiment of the present invention. The method illustrated in FIG. 5 includes steps that are processed in time series by the document summarizing apparatus 100 according to the embodiments illustrated in FIGS. 1 to 4. Accordingly, even where omitted below, the description given for the embodiments of FIGS. 1 to 4 also applies to the method of summarizing a document illustrated in FIG. 5.

In step S510, the document summarizing apparatus 100 may receive registration of a document summary rule.

In step S520, the document summarizing apparatus 100 may receive a summary target document.

In step S530, the document summarizing apparatus 100 may extract at least one summary candidate sentence from the input summary target document by using the document summary rule.

In step S540, the document summarizing apparatus 100 may calculate a score for the extracted at least one summary candidate sentence based on an order adjustment criterion.

In step S550, the document summarizing apparatus 100 may adjust the order of the at least one summary candidate sentence based on the calculated score.

In step S560, the document summarizing apparatus 100 may provide a summary result for the summary target document based on the adjusted order of the at least one summary candidate sentence.

In the above description, steps S510 to S560 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.
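Purely as an illustration, steps S510 to S560 can be strung together in a few lines. The sketch assumes that rules are keyword-to-weight pairs, that sentences are split on periods, and that the score is the sum of matched weights minus a small position penalty. All of these are simplifications introduced here and are not the scoring method of the disclosure.

```python
def summarize(document, rules, top_k):
    """Toy pipeline: rules maps keyword -> weight (stand-in for S510 registration)."""
    # S520: take the document and split it into sentences.
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    candidates = []
    for position, sentence in enumerate(sentences):
        # S530: a sentence is a candidate if at least one rule keyword occurs in it.
        matched = [weight for keyword, weight in rules.items() if keyword in sentence]
        if matched:
            # S540: assumed score, matched weights minus a small position penalty.
            score = sum(matched) - 0.01 * position
            candidates.append((score, sentence))
    # S550: order candidates by score, highest first.
    candidates.sort(key=lambda item: item[0], reverse=True)
    # S560: return at most top_k sentences as the summary result.
    return [sentence for _, sentence in candidates[:top_k]]
```

For instance, with rules `{"change": 2.0, "cancel": 1.0}`, a sentence containing "change" outranks one containing only "cancel", and sentences matching no rule are never returned.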

The method of summarizing a document in the document summarizing apparatus described with reference to FIGS. 1 to 5 may also be implemented in the form of a computer program stored in a medium and executed by a computer, or in the form of a recording medium including instructions executable by a computer.

A computer-readable medium may be any available medium that can be accessed by a computer, and includes both volatile and nonvolatile media and both removable and non-removable media. The computer-readable medium may also include a computer storage medium. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The foregoing description of the present invention is for illustration, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present invention is indicated by the following claims rather than by the above detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be interpreted as being included in the scope of the present invention.

100: document summarizing apparatus
110: registration unit
120: input unit
130: preprocessing unit
140: summary candidate sentence extraction unit
150: order adjustment unit
160: post-processing unit
170: summary result providing unit
180: recording unit

Claims

A device for summarizing a document, comprising:
a registration unit that receives registration of a document summary rule;
an input unit that receives a summary target document;
a summary candidate sentence extraction unit that extracts at least one summary candidate sentence from the input summary target document by using the document summary rule;
an order adjustment unit that calculates a score for the extracted at least one summary candidate sentence based on an order adjustment criterion and adjusts the order of the at least one summary candidate sentence based on the calculated score; and
a summary result providing unit that provides a summary result for the summary target document based on the adjusted order of the at least one summary candidate sentence.

The device of claim 1,
wherein the document summary rule includes at least one of an extraction rule, a replacement rule, a deletion rule, and a stopword rule,
and each component constituting the document summary rule is organized in a tree structure.

The device of claim 2, further comprising
a preprocessing unit that uses at least one of the document summary rules to perform at least one of: removing incomplete sentences from the summary target document, removing sentences composed only of stopwords, and removing sentences in which the part corresponding to the deletion rule makes up a predetermined ratio or more of the total sentence length.

The device of claim 2,
wherein the summary candidate sentence extraction unit extracts, as the summary candidate sentence, a sentence satisfying the extraction rule from among a plurality of sentences constituting the summary target document.

The device of claim 1,
wherein the order adjustment unit calculates a score for each of the at least one summary candidate sentence based on an order adjustment criterion related to at least one of: the position of each summary candidate sentence in the summary target document, the degree of correspondence between the document summary rule and each summary candidate sentence, the number of document summary rules matching each summary candidate sentence, and the weight value set for each document summary rule.

The device of claim 1, further comprising
a post-processing unit that uses at least one of the document summary rules to select duplicated summary candidate sentences among the at least one summary candidate sentence as the summary result, to remove summary candidate sentences that fall in the lower ranks based on a preset number of sentences, or to remove or replace summary candidate sentences corresponding to the stopword rule and the deletion rule.

The device of claim 1, further comprising
a recording unit that records a rule usage log for the rules used to derive the summary result and a service log for the summary target document and the summary result.

The device of claim 7,
wherein the registration unit extracts the number of uses of each rule from the rule usage log, and updates the document summary rule by analyzing, based on the extracted number of uses of each rule, rules that were unused and rules that were overused during a predetermined period.

The device of claim 8,
wherein the registration unit analyzes the service log to classify the summary result for the summary target document as either a normal state or an error state, and updates the document summary rule based on correct-answer sentence similarity for the summary target document and the summary result corresponding to the error state.

A method of summarizing a document in a document summarizing device, the method comprising:
receiving registration of a document summary rule;
receiving a summary target document;
extracting at least one summary candidate sentence from the input summary target document by using the document summary rule;
calculating a score for the extracted at least one summary candidate sentence based on an order adjustment criterion;
adjusting the order of the at least one summary candidate sentence based on the calculated score; and
providing a summary result for the summary target document based on the adjusted order of the at least one summary candidate sentence.

The method of claim 10,
wherein the document summary rule includes at least one of an extraction rule, a replacement rule, a deletion rule, and a stopword rule,
and each component constituting the document summary rule is organized in a tree structure.

The method of claim 11, further comprising
using at least one of the document summary rules to perform at least one of: removing incomplete sentences from the summary target document, removing sentences composed only of stopwords, and removing sentences in which the part corresponding to the deletion rule makes up a predetermined ratio or more of the total sentence length.

The method of claim 11,
wherein the extracting of the summary candidate sentence comprises
extracting, as the summary candidate sentence, a sentence satisfying the extraction rule from among a plurality of sentences constituting the summary target document.

The method of claim 10,
wherein the calculating of the score comprises
calculating a score for each of the at least one summary candidate sentence based on an order adjustment criterion related to at least one of: the position of each summary candidate sentence in the summary target document, the degree of correspondence between the document summary rule and each summary candidate sentence, the number of document summary rules matching each summary candidate sentence, and the weight value set for each document summary rule.

The method of claim 10, further comprising
using at least one of the document summary rules to select duplicated summary candidate sentences among the at least one summary candidate sentence as the summary result, to remove summary candidate sentences that fall in the lower ranks based on a preset number of sentences, or to remove or replace summary candidate sentences corresponding to the stopword rule and the deletion rule.

The method of claim 10, further comprising
recording a rule usage log for the rules used to derive the summary result and a service log for the summary target document and the summary result.

The method of claim 16,
wherein the receiving of registration of the document summary rule comprises:
extracting the number of uses of each rule from the rule usage log; and
updating the document summary rule by analyzing, based on the extracted number of uses of each rule, rules that were unused and rules that were overused during a predetermined period.

The method of claim 17,
wherein the receiving of registration of the document summary rule comprises:
analyzing the service log to classify the summary result for the summary target document as either a normal state or an error state; and
updating the document summary rule based on correct-answer sentence similarity for the summary target document and the summary result corresponding to the error state.

A computer program stored in a medium and comprising a sequence of instructions for summarizing a document,
wherein the computer program, when executed by a computing device, comprises a sequence of instructions that cause the computing device to:
receive registration of a document summary rule;
receive a summary target document;
extract at least one summary candidate sentence from the input summary target document by using the document summary rule;
calculate a score for the extracted at least one summary candidate sentence based on an order adjustment criterion, and adjust the order of the at least one summary candidate sentence based on the calculated score; and
provide a summary result for the summary target document based on the adjusted order of the at least one summary candidate sentence.