KR101828051B1

KR101828051B1 - Method, system and computer readable recording medium for extracting of summary e-mail of same subject

Info

Publication number: KR101828051B1
Application number: KR1020110064753A
Authority: KR
Inventors: 강태기; 박호민; 박민식
Original assignee: 네이버 주식회사
Priority date: 2011-06-30
Filing date: 2011-06-30
Publication date: 2018-02-09
Also published as: KR20130007244A

Abstract

The present invention relates to a method and an apparatus for generating an e-mail summary. The method comprises: (a) clustering a plurality of e-mails having the same subject into one group, based on seed keywords extracted from the e-mails; (b) determining one of the e-mails in the group as a basic e-mail; (c) calculating a degree of similarity between the basic e-mail and the other e-mails in the group; and (d) based on the similarity calculated for each e-mail, determining the basic e-mail as the e-mail summary when the basic e-mail has a similarity equal to or greater than a preset value with all e-mails in the group.

Description

METHOD, SYSTEM AND COMPUTER READABLE RECORDING MEDIUM FOR EXTRACTING OF SUMMARY E-MAIL OF SAME SUBJECT

The present invention relates to a method, an apparatus, and a computer-readable recording medium for generating an e-mail summary, and more particularly to a method, an apparatus, and a computer-readable medium for generating a new e-mail summary by merging a plurality of e-mails that have the same user information or the same subject.

With the recent development of information and communication technology, many people use the Internet, and e-mail for commercial or personal use has become commonplace. It is now common for an e-mail to involve multiple senders, recipients, or CC recipients. These users reply to one another and also forward messages to third parties. When many people reply and forward repeatedly, the content on a single subject ends up scattered across messages. For example, the original sender may exchange replies with a first recipient and a second recipient, a third recipient may become involved midway, and a CC recipient may also comment on the same subject to the first through third recipients and the sender. In such a case the original sender has to check multiple e-mails several times, and because the inbox usually also holds e-mails on other subjects, this is inconvenient for the user.

Therefore, a method or apparatus for extracting an e-mail summary of the same subject is needed to solve the above problems.

An object of the present invention is to solve the above-mentioned problems of the prior art.

One object of the present invention is to generate an e-mail summary by merging a plurality of e-mails having the same subject.

Another object of the present invention is to generate an e-mail summary so that the user can easily grasp the content of the e-mails.

Yet another object of the present invention is to update the e-mail summary continuously, so that a newly received e-mail can be viewed while being compared and contrasted with the existing e-mails.

To achieve the objects of the present invention described above and the characteristic effects described below, the characteristic configuration of the present invention is as follows.

According to one aspect of the present invention, there is provided a method of generating an e-mail summary, performed by an e-mail summary generation apparatus, the method comprising: (a) clustering a plurality of e-mails having the same subject into one group; (b) determining a basic e-mail among the e-mails in the group; (c) calculating a degree of similarity between the basic e-mail and the other e-mails in the group; and (d) based on the similarity calculated for each e-mail, determining the basic e-mail as the e-mail summary when the similarity is equal to or greater than a preset value.

According to another aspect of the present invention, there is provided an apparatus for generating an e-mail summary, comprising: a clustering unit that clusters a plurality of e-mails having the same subject into one group; an extraction unit that determines and extracts a basic e-mail among the e-mails in the group; an evaluation unit that calculates a degree of similarity between the basic e-mail and the other e-mails in the group; and a generation unit that, based on the similarity calculated for each e-mail, determines the basic e-mail as the e-mail summary when the similarity is equal to or greater than a preset value.
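Steps (b) through (d) can be illustrated with a minimal Python sketch. The source does not fix a similarity measure, so the `containment` function below (the share of the other e-mail's distinct words that also occur in the basic e-mail) is an assumption, as are the function names; the default threshold of 0.95 borrows the example preset value that the description gives later for re-clustering.

```python
from typing import Callable, Optional, Sequence


def containment(base: str, other: str) -> float:
    """Assumed similarity: fraction of the other e-mail's distinct words found in the base."""
    base_words = set(base.lower().split())
    other_words = set(other.lower().split())
    if not other_words:
        return 1.0
    return len(base_words & other_words) / len(other_words)


def decide_summary(
    group: Sequence[str],
    threshold: float = 0.95,  # example preset value; an assumption for this step
    similarity: Callable[[str, str], float] = containment,
) -> Optional[str]:
    """Return the basic e-mail if it is similar enough to every other e-mail, else None."""
    if not group:
        return None
    base = max(group, key=len)  # step (b): the e-mail with the largest volume
    scores = [similarity(base, m) for m in group if m is not base]  # step (c)
    return base if all(s >= threshold for s in scores) else None  # step (d)
```

When `decide_summary` returns `None`, the basic e-mail does not cover the group well enough on its own, which is the case the later embodiments address by merging several priority e-mails.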

As described above, the present invention makes it possible to generate an e-mail summary by merging a plurality of e-mails having the same subject, so that the user can easily grasp the content of the e-mails.

In addition, according to the present invention, the generated e-mail summary can be updated periodically.

FIG. 1 is a schematic diagram of the overall system including an e-mail management server according to an embodiment of the present invention.
FIG. 2 is a detailed block diagram of the e-mail management server according to an embodiment of the present invention.
FIG. 3 is a diagram of an e-mail summary generation apparatus according to an embodiment of the present invention.
FIGS. 4 and 5 show examples of e-mails that can be clustered into one group according to an embodiment of the present invention.
FIG. 6 shows an example of an e-mail summary extracted from FIGS. 4 and 5 according to an embodiment of the present invention.
FIGS. 7 to 9 are flowcharts of a procedure for extracting an e-mail summary according to an embodiment of the present invention.

The following detailed description of the invention refers to the accompanying drawings, which show, by way of illustration, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the invention, although different, are not necessarily mutually exclusive. For example, a particular feature, structure, or characteristic described herein in connection with one embodiment may be implemented in other embodiments without departing from the spirit and scope of the invention. It should also be understood that the position or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the invention, if properly described, is limited only by the appended claims together with the full range of equivalents to which those claims are entitled. In the drawings, like reference numerals refer to the same or similar functions throughout the several views.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings, so that those of ordinary skill in the art can readily practice the invention.

[Preferred Embodiments of the Present Invention]

In this specification, the term "user" refers to a person who receives the mailing service through the e-mail management server. Users may include, for example, senders, recipients, and CC recipients.

In this specification, the term "priority" is a numerical value indicating how likely an e-mail, among the e-mails clustered into the same group, is to have the same content as the e-mail summary. The priority may be determined, for example, from one or more of the e-mail's length, its reception time, and the number of users.

In this specification, the term "similarity" indicates the degree to which the contents of e-mails clustered into the same group resemble one another. The similarity may be calculated, for example, from the detailed content of the e-mails.

The above definitions are not meant to limit the scope of the present invention to their literal wording; they also cover what those skilled in the art can readily understand from the description in this specification.

Overall System Configuration

FIG. 1 is a schematic diagram of the overall system including an e-mail management server according to an embodiment of the present invention.

As shown in FIG. 1, in the overall system according to an embodiment of the present invention, an e-mail management server 100, which includes an e-mail summary generation apparatus connected to a database 300, is connected to one or more user terminal devices 400 through a communication network 200.

According to an embodiment of the present invention, the e-mail management server 100 is a server for a mail service. At a user's request made through the terminal device 400, it can provide functions such as sending e-mail, receiving e-mail, blocking spam, and notifying the user of sent and received e-mail. Each user's information (user ID) and the e-mails that the users send and receive may be stored in the database 300.

According to another embodiment of the present invention, the e-mail management server 100 and the e-mail summary generation apparatus 130 described below may be separated by function and physically implemented in a plurality of devices. The detailed components of the e-mail management server 100 are described below.

According to an embodiment of the present invention, the communication network 200 may be configured regardless of communication mode, wired or wireless, and may be any of various networks such as a personal area network (PAN), a local area network (LAN), a metropolitan area network (MAN), or a wide area network (WAN).

According to an embodiment of the present invention, the database 300 may contain various data, such as documents entered by users of the mailing service that can be provided through web documents, attachments such as photos or videos that can be attached together with a document or on their own, each user's ID, information related to the users of the e-mails they send and receive, and metadata about these, but it is not limited thereto. It will be apparent to those of ordinary skill in the art that the database may further contain various data usable for implementing the present invention, such as the user authentication information required by a known e-mail management server 100.

Meanwhile, the user terminal device 400 according to an embodiment of the present invention is an input/output device that includes a function allowing the user to connect to the e-mail management server 100 through the communication network 200 in order to use the various services and functions the server provides. Any digital device that is equipped with memory and a microprocessor and therefore has computing capability, such as a desktop computer, a notebook computer, a workstation, a palmtop computer, a personal digital assistant (PDA), a web pad, or a mobile communication terminal including a smartphone, may be adopted as the user terminal device 400 according to the present invention. The user terminal device 400 may further include a function for connecting the user to the e-mail management server 100 through the communication network 200. Preferably, a web browser or application on the user terminal device 400 is run to connect to the e-mail management server 100 and to receive and use its various services, although the invention is not necessarily limited to this.

E-mail Management Server

FIG. 2 is a detailed block diagram of the e-mail management server 100 according to an embodiment of the present invention. The diagram in FIG. 2 includes only the minimum components needed to describe the functions of the specific service that the e-mail management server 100 can provide through web documents. For ease of explanation, components and descriptions of the server's well-known functions are omitted. The functions provided by the e-mail management server 100 are not, however, limited to the following embodiments.

Referring to FIG. 2, the e-mail management server 100 according to an embodiment of the present invention may include various e-mail management devices and includes an e-mail summary generation apparatus.

E-mail Summary Generation Apparatus

FIG. 3 is a diagram of an apparatus for generating a same-subject e-mail summary according to an embodiment of the present invention.

Referring to FIG. 3, the e-mail summary generation apparatus 130 according to an embodiment of the present invention may include a clustering unit 131, an extraction unit 132, an evaluation unit 133, and a generation unit 134.

<Clustering unit>

The clustering unit 131 according to the present invention uses a clustering technique to classify the stored e-mails by subject, which makes it possible to generate a summary of a plurality of e-mails having the same subject. The detailed configuration and functions of the clustering unit 131 are described below.

First, the clustering unit 131 according to an embodiment of the present invention uses one or more seed keywords to classify, within a database of e-mails organized by user, a plurality of e-mails having the same subject into one group, and stores the result back in the database. The seed keywords used for clustering preferably comprise title information extracted from the e-mail's title and user information extracted from the e-mail's users (sender ID, recipient ID, CC recipient ID). Because an e-mail's title is likely to encapsulate its content, extracting at least one noun through morphological analysis of the title and using it as a seed keyword makes it possible to extract and classify highly similar e-mails. Moreover, because the clustering unit 131 uses not only the title information but also the user information contained in the e-mails (sender ID, recipient ID, CC recipient ID) as clustering seed keywords, the e-mails exchanged on the same subject can be extracted and classified into one group.
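The seed-keyword clustering described above might be sketched as follows. This is only an illustration under stated assumptions: the `Email` class, the `title_keywords` and `cluster` functions, and the stopword list are hypothetical; simple tokenization stands in for the morphological noun extraction that the description calls for; and requiring both a title match and a user match is one possible reading, since the source does not say how the two kinds of seed keyword are combined.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Email:
    subject: str
    sender: str
    recipients: List[str] = field(default_factory=list)
    cc: List[str] = field(default_factory=list)

    def users(self) -> Set[str]:
        """User information: sender, recipient, and CC recipient IDs."""
        return {self.sender, *self.recipients, *self.cc}


# Reply/forward prefixes and stopwords are illustrative choices, not from the source.
PREFIXES = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)
STOPWORDS = {"the", "a", "an", "of", "for", "on", "and", "to"}


def title_keywords(subject: str) -> Set[str]:
    """Stand-in for morphological analysis: strip prefixes, tokenize, drop stopwords."""
    words = re.findall(r"\w+", PREFIXES.sub("", subject).lower())
    return {w for w in words if w not in STOPWORDS}


def cluster(emails: List[Email], seed_titles: Set[str], seed_users: Set[str]) -> List[Email]:
    """Keep e-mails sharing at least one title keyword and one user with the seeds."""
    return [
        e for e in emails
        if title_keywords(e.subject) & seed_titles and e.users() & seed_users
    ]
```

Seeding from one e-mail, for example with `title_keywords(first.subject)` and `first.users()`, then groups the replies and forwards that share its title nouns and participants.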

Meanwhile, the clustering unit 131 according to the present invention may be configured to exclude e-mails received or sent before a preset period from the group set for a given subject. For e-mails exchanged between the same recipient and sender, once a long time has passed an e-mail is likely to concern a different subject even if it has the same or a similar title. To prevent misanalysis, the clustering unit 131 may therefore be configured to treat an e-mail dated more than the preset period after the first e-mail was received or sent as having a different subject, and to classify it into another group. Depending on the embodiment, it may instead be configured to cluster into one group only the e-mails received or sent within the preset period measured back from the time the summary is generated. The preset period may preferably be one week, one month, or three months, and may be adjusted according to the user's selection.
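Both time-window variants can be sketched in a few lines of Python. The function names and the representation of each e-mail by its timestamp alone are assumptions made for illustration; the window length is passed in, since the description leaves it to configuration or user choice.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def split_by_first_mail(
    stamps: Iterable[datetime], window: timedelta
) -> Tuple[List[datetime], List[datetime]]:
    """Variant 1: keep e-mails dated within `window` of the first one.

    Returns (kept, moved); the `moved` e-mails would go to another group.
    """
    ordered = sorted(stamps)
    if not ordered:
        return [], []
    limit = ordered[0] + window
    kept = [t for t in ordered if t <= limit]
    moved = [t for t in ordered if t > limit]
    return kept, moved


def recent_only(
    stamps: Iterable[datetime], window: timedelta, now: datetime
) -> List[datetime]:
    """Variant 2: keep only e-mails dated within `window` before summary generation."""
    return [t for t in sorted(stamps) if now - window <= t <= now]
```

The first variant anchors the window at the start of the thread, so a thread that resumes months later is split; the second anchors it at summary time, so only current discussion is summarized.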

In addition, the user may perform clustering alone through the user terminal device 400. In this case the user can arbitrarily exclude specific e-mails from, or add them to, a clustered group.

The user can also monitor, through the user terminal device 400, the seed keywords used by the clustering unit 131, and can add a specific seed keyword, or delete or change an existing one.

Meanwhile, the clustering unit 131 according to an embodiment of the present invention may be configured to perform clustering again, reflecting the added user information, when a new user joins while e-mails on a specific subject are being exchanged. Specifically, the clustering unit 131 takes the e-mails that the primary clustering judged to share a subject and set as one group, forms the union of the user information they contain, and compares it with the user information contained in the seed keywords used for the primary clustering. If the union contains user information that is not in those seed keywords, the clustering unit adds it to the seed keywords and performs secondary clustering. This addition of user information is repeated until the seed keywords used for clustering include the entire union of the user information contained in all e-mails in the resulting group, so that the clustering unit 131 can cluster based on all user information related to the e-mails on the same subject.
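The repeated addition of user information amounts to a fixed-point loop, sketched below. Each e-mail is reduced to a hypothetical pair of sets (title keywords, user IDs), and membership is again assumed to require both a title match and a user match; neither detail is specified in the source.

```python
from typing import List, Set, Tuple

Mail = Tuple[Set[str], Set[str]]  # (title keywords, user IDs); an illustrative shape


def expand_seed_users(
    emails: List[Mail], seed_titles: Set[str], seed_users: Set[str]
) -> Tuple[List[Mail], Set[str]]:
    """Re-cluster until the seed users cover every user found in the group."""
    seed_users = set(seed_users)  # copy so the caller's set is not modified
    while True:
        group = [m for m in emails if m[0] & seed_titles and m[1] & seed_users]
        union = set().union(*(users for _, users in group))
        missing = union - seed_users
        if not missing:
            return group, seed_users
        seed_users |= missing  # add the new users and cluster again
```

The loop terminates because the seed set only grows and is bounded by the finite set of users in the mailbox.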

On the other hand, the clustering unit 131 according to another embodiment of the present invention may be configured to work together with the extraction unit 132 and the evaluation unit 133 to perform additional clustering more efficiently. Among the e-mails set as one group by the primary clustering, the clustering unit 131 considers those that have a meaningful correlation with a high-priority e-mail, that is, those whose similarity to the priority e-mail is equal to or greater than a preset value. It compares the union of the user information contained in those e-mails with the user information contained in the high-priority e-mail, and decides accordingly whether to add user information. Depending on the embodiment, the clustering unit 131 may perform this process based on the single top-priority e-mail or based on a plurality of high-priority e-mails. Because a user may delete the existing mail thread when writing a message, or part of the content of earlier messages may be missing owing to differences in reply timing, it is more preferable to evaluate similarity based on a plurality of priority e-mails in order to prevent misanalysis. The priority of each e-mail in the group is calculated by the extraction unit 132, and the similarity between e-mails is calculated by the evaluation unit 133. The detailed configuration and functions of the extraction unit 132 and the evaluation unit 133 are described below.

Meanwhile, when a plurality of e-mails having the same subject are clustered into one group, the clustering unit 131 stores the group in the database 300, where it can be accessed by the e-mail management server 100.

In addition, the clustering unit 131 according to the present invention may be configured to add a separate marker (for example, a color) to the e-mails belonging to the same group, so that the group can be identified intuitively. For example, e-mails in group A may be marked in blue and e-mails in group B in yellow. It will of course be apparent to those skilled in the art that the e-mails may instead be classified by separating directories by group.

<Extraction unit>

The extraction unit 132 according to an embodiment of the present invention accesses the database 300, determines the largest e-mail in a clustered group, measured by the volume of each e-mail, as the basic e-mail, and outputs it to the generation unit 134. Here, the basic e-mail is the e-mail that serves as the basis for the summary of the subject. Since a larger e-mail contains more of the related content, the basic e-mail is in principle chosen by volume. Depending on the embodiment, the top-priority e-mail, that is, the one with the highest "priority" described below, may instead be selected as the basic e-mail.

In addition, the extraction unit 132 according to the present invention calculates a priority for each e-mail in a clustered group. The "priority" is information indicating the degree to which an e-mail clustered into the group may match the content of the desired e-mail summary. It may be calculated, for example, from one or more of the e-mail's volume, its time of receipt or sending, and the number of users. The variables used to calculate the priority may include others readily apparent to those skilled in the art. The priority calculated for each e-mail can be used for the addition of user information and the additional clustering in the clustering unit 131 described above. For example, when the priority is to be calculated according to volume among the e-mails received or sent within a preset period from the time of clustering (a period within which mail is considered recent), the extraction unit 132 sorts the e-mails in the group by time of receipt or sending and then re-sorts those falling within the preset period by volume, thereby obtaining each e-mail's priority. In other words, the extraction unit 132 can use the calculated priority to identify and extract the largest of the recent e-mails.
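The two-stage ordering in this example (sort by time, then re-sort the recent e-mails by volume) might look like the following sketch. The `rank_by_priority` name, the (timestamp, body) tuple shape, the seven-day default for "recent", and placing older e-mails after the recent ones are all assumptions; the source only says that the period is preset.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Mail = Tuple[datetime, str]  # (time sent or received, body); an illustrative shape


def rank_by_priority(
    emails: List[Mail], now: datetime, recent: timedelta = timedelta(days=7)
) -> List[Mail]:
    """Order e-mails by priority: recent ones first, largest body first among them."""
    by_time = sorted(emails, key=lambda m: m[0], reverse=True)  # newest first
    fresh = [m for m in by_time if now - m[0] <= recent]
    stale = [m for m in by_time if now - m[0] > recent]
    # Python's sort is stable, so equally long bodies stay in newest-first order.
    fresh.sort(key=lambda m: len(m[1]), reverse=True)
    return fresh + stale
```

The first element of the result would then be the top-priority e-mail, and the first three would be the priority e-mails used in the multi-mail embodiments.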

Meanwhile, consider the embodiment described above, in which user information is compared using a plurality of (for example, three) priority e-mails and, if necessary, seed keywords are added and clustering is performed again. In this embodiment, the extraction unit 132 extracts, based on the calculated priority, the three largest recent e-mails as priority e-mails, and the evaluation unit 133 calculates the similarity of each e-mail in the group to each of the three priority e-mails. The clustering unit 131 compares the union of the user information of all e-mails whose similarity to at least one of the three priority e-mails is equal to or greater than a preset value (for example, 0.95) with the union of the user information of the three priority e-mails. If there is user information not included in the union for the three priority e-mails, the clustering unit adds that user information to the seed keywords and performs clustering again based on the updated seed keywords. An embodiment configured in this way can reflect the information of users newly added while e-mails on the same subject are being exchanged, and therefore has the advantage that broader and more accurate clustering can be performed.
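The comparison of the two unions can be sketched as below. The `missing_users` name and the (body, user IDs) tuple shape are hypothetical, and the word-containment similarity is an assumed measure, since the source does not define how the evaluation unit 133 computes similarity; only the 0.95 threshold comes from the example in the text.

```python
from typing import Callable, List, Set, Tuple

Mail = Tuple[str, Set[str]]  # (body, user IDs); an illustrative shape


def containment(base: str, other: str) -> float:
    """Assumed similarity: fraction of the other body's distinct words found in the base."""
    base_words, other_words = set(base.lower().split()), set(other.lower().split())
    return len(base_words & other_words) / len(other_words) if other_words else 1.0


def missing_users(
    priority_mails: List[Mail],
    group: List[Mail],
    similarity: Callable[[str, str], float] = containment,
    threshold: float = 0.95,
) -> Set[str]:
    """Users of sufficiently similar e-mails who appear in none of the priority e-mails."""
    similar_users: Set[str] = set()
    for body, users in group:
        if any(similarity(p_body, body) >= threshold for p_body, _ in priority_mails):
            similar_users |= users
    priority_users = set().union(*(users for _, users in priority_mails))
    return similar_users - priority_users
```

A non-empty result would be added to the seed keywords before clustering is run again.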

On the other hand, depending on the embodiment, the extracting unit 132 may be configured to determine, based on the calculated priority, the highest-priority e-mail as the basic e-mail. As described above, the higher the priority of a mail, the more likely it is to match the content of the e-mail summary that the apparatus according to the present invention seeks to generate, so the extracting unit 132 may be configured to determine the highest-priority e-mail as the basic e-mail. Since the priority is calculated from the length of the mail and the time of receipt or sending, the longest mail among those recently received or sent will be selected as the highest-priority e-mail and determined as the basic e-mail.

However, when sending a reply, a user may delete the existing mail thread before composing the mail. For example, when recipient B replies to sender A's mail and sender A then replies again, sender A may delete the content of his or her own earlier mail before replying to recipient B. In such a case, the highest-priority e-mail cannot contain all of the content related to the mails in the group. Therefore, to prevent erroneous analysis, the extracting unit 132 may be configured to extract a plurality of priority e-mails in order of priority based on the calculated priority. In an embodiment configured to extract a plurality of priority e-mails in order to generate the basic e-mail, the extracting unit 132 may be configured to merge the extracted priority e-mails and determine the result as the basic e-mail. For example, when the apparatus is set to generate the basic e-mail from three priority e-mails, the extracting unit 132 selects, based on the priority of each e-mail, the three longest e-mails among those recently received or sent in the group, and merges the contents of the three priority e-mails so that no content is duplicated, thereby generating the basic e-mail.
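One possible way to merge several priority e-mails without duplicating content is sketched below. It is an assumption of this sketch, not a statement of the specification, that duplicates are detected line by line and that quoted thread text carries a leading ">" marker; `merge_priority_mails` is a hypothetical name.

```python
def merge_priority_mails(bodies):
    """Merge mail bodies into one text, keeping each distinct line once."""
    seen = set()
    merged = []
    for body in bodies:
        for line in body.splitlines():
            # Drop reply-quote markers so quoted text matches its original.
            text = line.lstrip("> ").strip()
            if text and text not in seen:
                seen.add(text)
                merged.append(text)
    return "\n".join(merged)
```

A line quoted in one reply and present verbatim in another mail therefore appears only once in the merged basic e-mail.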

In addition, when there are three or more users, two or more replies may be repeated for one user. For example, sender C may send an e-mail to recipients D and E, D and E may each reply to C, and C may then reply again to D and to E separately. Two or more mail threads then arise for the same subject, and it is preferable that the e-mail summary include the contents of all of these threads. In such a case, it is preferable to select a plurality of mails, namely the longest mail in each thread. Preferably, when there are N users, up to N-1 threads can arise, so N-1 high-priority mails can be extracted. For example, if the first e-mail of a group has one sender, three recipients, and two CC recipients, there are six users in total, so five e-mails can be extracted. Alternatively, the predetermined number may be fixed at three, four, or the like at the user's option. The extracting unit 132 may merge the N e-mails selected in order of priority to generate the basic e-mail. This basic e-mail may also be used to generate the e-mail summary through similarity comparison with the other mails in the group.
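The count of priority e-mails in the example above can be reproduced with a short sketch. The function name and parameters are hypothetical, and the lower bound of one mail is an added assumption for the degenerate single-user case.

```python
def priority_mail_count(sender, recipients, cc, fixed=None):
    """Number of priority e-mails to extract for a group."""
    if fixed is not None:
        # The user has fixed the number (e.g. 3 or 4).
        return fixed
    # N distinct users can give rise to at most N-1 threads.
    users = {sender, *recipients, *cc}
    # Assumption: always extract at least one mail.
    return max(len(users) - 1, 1)
```

For one sender, three recipients, and two CC recipients this gives six users and therefore five e-mails, matching the example in the text.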

The extracting unit 132 stores the mails extracted in this way in the database 300 so that they can be accessed by the e-mail management server 100.

Meanwhile, when the content of a particular e-mail is an image file, the extracting unit 132 according to the present invention may be configured to extract the text data in the image through image processing and thereby recognize the content of the mail. Such image-based text extraction relies on already known techniques, so further detailed description is omitted.

<Evaluation unit>

Next, the evaluation unit 133 according to an embodiment of the present invention evaluates the similarity among the predetermined number of extracted mails. In this specification, the term "similarity" denotes the degree to which the contents of mails clustered into the same group resemble one another. The similarity may preferably be calculated from the detailed content of the e-mails.

Calculating the similarity from the detailed content may mean comparing the contents of the e-mails with each other and distinguishing the identical portions from the non-identical portions. For example, when evaluating the similarity of two mails having 100 words (mail X) and 80 words (mail Y), if the identical portion contains 40 words, the similarity of mail X and mail Y can be calculated as 40/100 = 0.4. Here, 100 is the word count of the longer of mail X and mail Y. That is, similarity = (number of words in the portion common to both mails) / (number of words in the longer of the two mails). It will be apparent to those skilled in the art that various other known similarity calculation techniques may be applied, such as using the similarity of the conditional probability distributions of the words in each mail. The similarity calculation method described above is merely illustrative, should not be construed as limiting the scope of the present invention, and includes calculation methods that those of ordinary skill in the art could readily conceive.
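The ratio above can be sketched in a few lines. How the "identical portion" is found is not specified; this sketch assumes, for illustration, that it is the multiset overlap of whitespace-separated words, and the function name `similarity` is hypothetical.

```python
from collections import Counter


def similarity(mail_x, mail_y):
    """Shared word count divided by the word count of the longer mail."""
    words_x = mail_x.split()
    words_y = mail_y.split()
    longer = max(len(words_x), len(words_y))
    if longer == 0:
        return 0.0
    # Counter & Counter keeps, per word, the smaller of the two counts.
    shared = sum((Counter(words_x) & Counter(words_y)).values())
    return shared / longer
```

For a 100-word mail and an 80-word mail sharing 40 words, this yields 40/100 = 0.4, as in the example in the text.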

As described above, in the embodiment configured to compare user information using a plurality of (for example, three) priority e-mails and, where necessary, to add seed keywords and perform clustering again, the evaluation unit 133 calculates the similarity of each e-mail in the group to each of the three priority e-mails extracted by the extracting unit 132.

In addition, to generate the final e-mail summary, the evaluation unit 133 according to the present invention calculates the similarity between the basic e-mail determined (generated) by the extracting unit 132 and the e-mails included in the clustered group. Using the per-mail similarity thus calculated, the apparatus according to the present invention determines whether the basic e-mail contains all of the content of every e-mail in the group and, if there is content not contained in the basic e-mail, merges that content into the basic e-mail to generate the e-mail summary.

<Generation unit>

Next, the generation unit 134 according to an embodiment of the present invention generates the e-mail summary based on the per-mail similarities within the group, which are calculated by the evaluation unit 133 and stored in the database 300. The highest-priority e-mail (mail P) or the longest e-mail is selected as the basic e-mail, and for any mail whose similarity to the basic e-mail is at or above a certain level, the basic e-mail is judged to contain that mail's content. In addition, the generation unit 134 according to the present invention may be configured to add to the generated e-mail summary a separate marker (for example, a color marker) that makes it intuitively distinguishable from other e-mails, or to store the summary in a location (folder) that is predetermined or designated by the user so that it can be distinguished from other e-mails. Steps S1011, S1013, S1015, and S1017 of FIGS. 8 and 9 show more specifically how the e-mail summary is generated using the similarity. These steps are described in detail later in the description of FIGS. 8 and 9.

In addition, the generation unit 134 according to the present invention may be configured to add all attachments attached to the e-mails in the group as attachments of the generated e-mail summary. In this case, the user can check the contents of all mails on the same subject through the e-mail summary and, at the same time, conveniently refer to the attachments.

The clustering unit 131, the extracting unit 132, the evaluation unit 133, and the generation unit 134 in FIG. 3 may be physically implemented in a single machine, or some or each of them may be implemented in physically separate machines. It is thus apparent to those of ordinary skill in the art to which the present invention pertains that the present invention is not limited by the physical number or location of the machines in which the components are installed and may be modified in design in various ways.

FIGS. 4 and 5 illustrate examples of e-mails that can be clustered into one group according to an embodiment of the present invention. FIG. 6 illustrates an example of the same-subject e-mail summary extracted from FIGS. 4 and 5 according to an embodiment of the present invention.

Referring to FIG. 4, it can be seen that <Content 1>, <Content 2>, and <Content 3> were exchanged among Mr.A, Mr.K, and Mr.B regarding a summer product list. The CC recipients of these mails are Mr.D and Mr.R. In the first, original message, Mr.B sent <Content 3> to Mr.A and Mr.K. In response, Mr.K sent <Content 2> to Mr.A and Mr.B. In response to that, Mr.A sent <Content 1> to Mr.K and Mr.B.

Referring to FIG. 5, a mail containing <Content 4> regarding the summer product list was sent with Mr.P and Mr.X as recipients.

Referring to FIG. 6, the e-mails of FIGS. 4 and 5 can be processed by the apparatus of FIG. 3 described above to extract a same-subject e-mail summary such as that shown in FIG. 6. Because the e-mail summary of FIG. 6 contains <Content 1> through <Content 4> arranged in order of sending date and reception time, the user can easily grasp the content of the mails exchanged regarding the summer product list.

Example of a method for generating an e-mail summary

Next, a method for generating an e-mail summary performed by the e-mail management server 100 according to an embodiment of the present invention is described with reference to FIGS. 7 to 9.

According to the embodiment of the present invention shown in FIG. 7, the e-mail summary generating apparatus 130 collects e-mails from the database 300 and sets seed keywords for performing clustering (S1001). As described above, a title-related seed keyword including one or more nouns extracted from the e-mail title, and a seed keyword relating to the user information contained in the e-mail, are extracted and set as the clustering seed keywords.

Once the seed keywords are set, the e-mail summary generating apparatus 130 clusters the e-mails using the set seed keywords (S1003). Through clustering, the e-mail summary generating apparatus 130 forms groups by subject, classifies the e-mails having the same subject into one and the same group, and stores the result in the database 300.
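Steps S1001 and S1003 might be sketched as follows, with several loudly stated assumptions. Noun extraction by morphological analysis is replaced here by a crude tokenizer with a small stop-word list; the matching rule (a mail joins the group when its title shares a keyword with the seed and its users overlap the seed users) is only one possible reading; and the names `title_keywords`, `cluster_by_seed`, and `STOPWORDS` are hypothetical.

```python
import re

# Hypothetical stop-word list standing in for morphological analysis.
STOPWORDS = {"re", "fw", "fwd", "the", "a", "an", "of", "for", "and"}


def title_keywords(title):
    """Crude stand-in for extracting nouns from an e-mail title."""
    words = re.findall(r"\w+", title.lower())
    return {w for w in words if w not in STOPWORDS}


def cluster_by_seed(seed_mail, mails):
    """Return the mails that fall into the same group as the seed mail."""
    seed_terms = title_keywords(seed_mail["title"])   # title seed keywords
    seed_users = set(seed_mail["users"])              # user seed keywords
    group = []
    for mail in mails:
        same_topic = bool(seed_terms & title_keywords(mail["title"]))
        shared_user = bool(seed_users & set(mail["users"]))
        # Assumed rule: both kinds of seed keyword must match.
        if same_topic and shared_user:
            group.append(mail)
    return group
```

Under this rule a reply titled "RE: Summer product list" among the same users joins the group, while a mail on another subject, or one on the same subject among entirely different users, does not.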

After the e-mails are classified into groups by subject, the e-mail summary generating apparatus 130 calculates a priority for the e-mails in the same group and sets a priority for each mail (S1005). As described above, the priority may be calculated based on information such as the length of the mail, its word count, and the time of receipt and/or sending.

Once priorities have been calculated for all e-mails in the group, the e-mail summary generating apparatus 130 extracts a predetermined number of priority e-mails in descending order of priority (S1007). As described above, the e-mail summary generating apparatus 130 may generate the basic e-mail by merging the contents of the extracted priority e-mails into a single mail without duplication. Alternatively, as explained above, depending on the embodiment the apparatus may be configured to determine the highest-priority e-mail as the basic e-mail, or simply to determine the longest mail as the basic e-mail.

After the basic e-mail is determined, the e-mail summary generating apparatus 130 calculates the similarity between the basic e-mail and the e-mails belonging to the group and, based on the calculated similarity, determines whether the basic e-mail contains all of the content of the e-mails in the group (S1009).

If the determination shows that the basic e-mail contains all of the content of the e-mails in the group, the e-mail summary generating apparatus 130 determines the basic e-mail as the e-mail summary and stores it. If the basic e-mail does not contain all of the content of the e-mails in the group, the e-mail summary generating apparatus 130 generates and stores the e-mail summary by merging into the basic e-mail the content of the mails not contained in it (S1019).

FIGS. 8 and 9 are flowcharts illustrating a specific method of generating the e-mail summary on the basis of similarity when the e-mail summary is generated by the method of FIG. 7.

Referring to FIG. 8, it is determined whether, among the user information (IDs) of the e-mails having a similarity equal to or greater than a set value to the highest-priority e-mail, there is user information not included in the user information of the highest-priority e-mail (S1011). If such non-included user information exists, it is added to the seed keywords, and steps S1003 through S1009 are repeated.

Referring to FIG. 9, it is determined whether the highest-priority e-mail (basic e-mail) contains all of the content of the e-mails in the group (S1015). If it does, the basic e-mail is generated as the e-mail summary (S1019); otherwise, the content of the non-included mail is merged (S1017) and step S1015 is executed again.
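The check-and-merge loop of FIG. 9 might look like the following sketch. In place of the similarity measure of the specification, the sketch substitutes a simpler coverage ratio (the fraction of a mail's paragraphs already present in the summary) so that the loop is guaranteed to finish; that substitution, the blank-line paragraph split, and the name `build_summary` are assumptions made for illustration.

```python
def paragraphs(text):
    """Split a mail body into non-empty paragraphs on blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def build_summary(base, group, threshold=1.0):
    """Merge into the basic e-mail any content it does not yet contain."""
    summary = paragraphs(base)
    for mail in group:
        parts = paragraphs(mail)
        if not parts:
            continue
        # S1015: is this mail's content already included?
        covered = sum(p in summary for p in parts) / len(parts)
        if covered >= threshold:
            continue
        # S1017: merge only the paragraphs that are missing.
        summary.extend(p for p in parts if p not in summary)
    # S1019: the result is the e-mail summary.
    return "\n\n".join(summary)
```

After the loop every mail in the group is fully covered, which corresponds to the point at which step S1015 succeeds and the summary is generated.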

FIGS. 7 to 9 merely illustrate a method for generating an e-mail summary according to one embodiment of the present invention. It should be recognized that, according to the level of skill in the relevant field and the common technical knowledge of those skilled in the art, one or more of the examples of FIGS. 7 to 9 may be combined, or may be modified in expression or use so as to perform the same or similar functions.

Embodiments according to the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

As described above, the present invention has been described with reference to specific matters such as particular components and to limited embodiments and drawings, but these are provided only to aid a more general understanding of the present invention. The present invention is not limited to the above embodiments, and those of ordinary skill in the art to which the present invention pertains can make various modifications and variations from this description.

Accordingly, the spirit of the present invention should not be limited to the described embodiments, and not only the claims set forth below but also everything equal or equivalent to those claims falls within the scope of the spirit of the present invention.

Claims

A method for generating an e-mail summary, performed by an e-mail summary generating apparatus, the method comprising:
(a) clustering a plurality of e-mails having the same subject into one group;
(b) determining a basic e-mail among the e-mails set as the group;
(c) calculating a degree of similarity between the basic e-mail and the other e-mails in the group; and
(d) determining the basic e-mail as the e-mail summary if, based on the similarity calculated for each e-mail in the group, the degree of similarity is equal to or greater than a predetermined value,
wherein the step (b)
determines the basic e-mail based on any one of i) the e-mail having the largest amount of content among the e-mails set as the group, and ii) at least one e-mail extracted based on a priority calculated for each e-mail from the time of receipt or sending and the amount of content of each e-mail set as the group.

The method according to claim 1,
wherein the step (a) performs clustering based on a seed keyword extracted from an e-mail, and the seed keyword includes at least one of an e-mail title, a sender ID, a recipient ID, and a CC recipient ID.

The method of claim 2,
Wherein the seed keyword extracted from the e-mail subject includes at least one noun extracted through morphological analysis of the e-mail subject.

The method according to claim 1,
wherein the step (a) comprises:
(a1) extracting, through morphological analysis, title information including one or more nouns from the title of the e-mail;
(a2) extracting user information including at least one of recipient information, sender information, and CC recipient information of the e-mail; and
(a3) classifying a plurality of e-mails having the same subject using the title information and the user information as seed keywords, and clustering them into one group.

The method of claim 4,
Wherein the step (a3) excludes e-mails received or transmitted before a preset period from the group.

A method for generating an e-mail summary, performed by an e-mail summary generating apparatus, the method comprising:
(a) clustering a plurality of e-mails having the same subject into one group;
(b) determining a basic e-mail among the e-mails set as the group;
(c) calculating a degree of similarity between the basic e-mail and the other e-mails in the group; and
(d) determining the basic e-mail as the e-mail summary if, based on the similarity calculated for each e-mail in the group, the degree of similarity is equal to or greater than a predetermined value,
wherein the step (a) comprises:
(a1) extracting, through morphological analysis, title information including one or more nouns from the title of the e-mail;
(a2) extracting user information including at least one of recipient information, sender information, and CC recipient information of the e-mail;
(a3) classifying a plurality of e-mails having the same subject using the title information and the user information as seed keywords, and clustering them into one group; and
(a4) comparing the union of the user information of all e-mails set as the group with the seed keywords and, if there is user information not included in the seed keywords, adding the non-included user information to the seed keywords and returning to the step (a3).

The method of claim 6,
wherein the step (a4) comprises:
(a4-1) calculating a priority of each e-mail based on the time of receipt or sending and the amount of content of each e-mail set as the group;
(a4-2) selecting the highest-priority e-mail based on the calculated priority;
(a4-3) calculating the similarity between the highest-priority e-mail and each e-mail set as the group; and
(a4-4) comparing the union of the user information of all e-mails having a degree of similarity to the highest-priority e-mail equal to or greater than a predetermined value with the user information of the highest-priority e-mail and, if there is user information not included in the user information of the highest-priority e-mail, adding the non-included user information to the seed keywords and then returning to the step (a3).

The method of claim 7,
wherein the step (a4-2) selects, based on the calculated priority, a set number of priority e-mails in descending order of priority,
wherein the step (a4-3) calculates the similarity between each of the plurality of priority e-mails and each e-mail set as the group, and
wherein the step (a4-4) compares the union of the user information of all e-mails having a degree of similarity equal to or greater than a preset value to any one of the plurality of priority e-mails with the union of the user information of the plurality of priority e-mails and, if there is user information not included in the union of the user information of the plurality of priority e-mails, adds the non-included user information to the seed keywords and then returns to the step (a3).

delete

The method according to claim 1,
Wherein the step (b) determines the highest priority e-mail having the highest priority as the basic e-mail.

The method according to claim 1,
Wherein the step (b) extracts a plurality of priority e-mails in descending order of priority, and merges the extracted plurality of priority e-mails to determine a basic e-mail.

The method according to claim 1,
wherein the step (d) includes, if there is an e-mail in the group having a degree of similarity to the basic e-mail less than the predetermined value, generating the e-mail summary by reflecting the content of that e-mail in the basic e-mail.

delete

The method according to claim 1,
The method of generating an electronic mail summary,
And outputting the same to the group set in the e-mail clustered into the group and outputting the same.

The method according to claim 1,
wherein the method of generating an e-mail summary further comprises
extracting text data from the image file when the e-mail is an image file.

The method according to claim 1,
Wherein the electronic mail summary includes all of the attached files attached to the electronic mail belonging to the group as an attached file.

A computer-readable recording medium having recorded thereon a program for carrying out the method of any one of claims 1 to 8, claims 10 to 12, and claims 14 to 16.

An apparatus for generating an e-mail summary, comprising:
a clustering unit for clustering a plurality of e-mails having the same subject into one group;
an extracting unit for determining and extracting a basic e-mail among the e-mails set as the group;
an evaluation unit for calculating a degree of similarity between the basic e-mail and the other e-mails in the group; and
a generation unit for determining the basic e-mail as the e-mail summary if, based on the degree of similarity calculated for each e-mail in the group, the degree of similarity is equal to or greater than a predetermined value,
wherein the extracting unit
determines the basic e-mail based on any one of i) the e-mail having the largest amount of content among the e-mails set as the group, and ii) at least one e-mail extracted based on a priority calculated for each e-mail from the time of receipt or sending and the amount of content of each e-mail set as the group.

The apparatus according to claim 18,
wherein the clustering unit performs clustering based on a seed keyword extracted from an e-mail, and the seed keyword includes at least one of an e-mail title, a sender ID, a recipient ID, and a CC recipient ID.

The apparatus according to claim 19,
wherein the seed keyword extracted from the e-mail title includes at least one noun extracted through morphological analysis of the e-mail title.

The apparatus according to claim 18,
wherein the clustering unit extracts, through morphological analysis, title information including one or more nouns from the title of the e-mail, extracts user information including at least one of recipient information, sender information, and CC recipient information of the e-mail, classifies a plurality of e-mails having the same subject using the extracted title information and user information as seed keywords, and clusters them into one group.

The apparatus according to claim 18,
wherein the clustering unit excludes from the group e-mails received or sent before a preset period.

An apparatus for generating an e-mail summary, comprising:
a clustering unit for clustering a plurality of e-mails having the same subject into one group;
an extracting unit for determining and extracting a basic e-mail from among the e-mails set as the group;
an evaluation unit for calculating a degree of similarity between the basic e-mail and each other e-mail in the group; and
a generation unit configured to determine the basic e-mail as the e-mail summary if, based on the degree of similarity calculated for each e-mail in the group, the degree of similarity is equal to or greater than a predetermined value,
wherein the clustering unit extracts title information including at least one noun through morphological analysis of the title of the e-mail, extracts user information including at least one of recipient information, sender information, and reference information of the e-mail, and extracts a plurality of e-mails having the same subject by using the extracted title information and user information as seed keywords, and
wherein the clustering unit compares the union of user information of all e-mails set as the group with the seed keywords and, if user information not included in the seed keywords exists, adds the non-included user information to the seed keywords and performs clustering again.
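The re-clustering loop recited above can be sketched as follows, purely as an illustration. Each e-mail is represented here by a hypothetical dictionary with a `"keywords"` set (title information) and a `"users"` set (user information); the membership rule is an assumption, as in any concrete matching criterion the claims leave open.

```python
def cluster_with_expansion(mails, seed_titles, seed_users):
    # Repeat clustering until the group introduces no new user information.
    seed_users = set(seed_users)
    while True:
        # Assumed rule: share a title keyword AND a user with the seeds.
        group = [
            m for m in mails
            if m["keywords"] & seed_titles and m["users"] & seed_users
        ]
        # Union of user information of all e-mails set as the group.
        union = set().union(*(m["users"] for m in group))
        missing = union - seed_users
        if not missing:
            return group, seed_users
        # Add the non-included user information to the seeds and re-cluster.
        seed_users |= missing
```

The loop terminates because the seed set only grows and is bounded by the users appearing in the finite set of e-mails.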

The apparatus of claim 21,
wherein the extracting unit calculates a priority of each e-mail set as the group on the basis of the time and date of receipt and the amount of each e-mail, and selects the e-mail having the highest calculated priority as a highest-priority e-mail,
wherein the evaluation unit calculates the degree of similarity between the highest-priority e-mail and each e-mail set as the group, and
wherein the clustering unit compares the union of user information of all e-mails whose degree of similarity to the highest-priority e-mail is equal to or greater than a predetermined value with the user information of the highest-priority e-mail and, if there is user information not included in the user information of the highest-priority e-mail, adds the non-included user information to the seed keyword and then performs clustering again.
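One way the priority calculation might look is sketched below. The claim states only that priority is based on the time and date of receipt and the amount of each e-mail; the equal weights, the normalization, and the assumption that more recent and longer e-mails rank higher are all choices made for this example. The keys `"received"` and `"body"` are hypothetical.

```python
from datetime import datetime


def rank_by_priority(mails, w_time=0.5, w_size=0.5):
    # Return the e-mails sorted from highest to lowest priority.
    if not mails:
        return []
    oldest = min(m["received"] for m in mails)
    newest = max(m["received"] for m in mails)
    span = (newest - oldest).total_seconds() or 1.0  # avoid division by zero
    max_len = max(len(m["body"]) for m in mails) or 1

    def priority(m):
        # Assumed scoring: normalized recency plus normalized body length.
        recency = (m["received"] - oldest).total_seconds() / span
        amount = len(m["body"]) / max_len
        return w_time * recency + w_size * amount

    return sorted(mails, key=priority, reverse=True)
```

The first element of the returned list would play the role of the highest-priority e-mail, and the first few elements the role of a plurality of priority e-mails.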

The apparatus of claim 21,
wherein the extracting unit calculates a priority of each e-mail set as the group on the basis of the time and date of receipt and the amount of each e-mail, and selects a plurality of priority e-mails based on the calculated priority,
wherein the evaluation unit calculates the degree of similarity between each of the plurality of priority e-mails and each of the e-mails set as the group, and
wherein the clustering unit compares the union of user information of all e-mails whose degree of similarity to any one of the plurality of priority e-mails is equal to or greater than a predetermined value with the union of user information of the plurality of priority e-mails and, if there is user information not included in the latter union, adds the non-included user information to the seed keyword and then performs clustering again.

delete

The apparatus of claim 18,
wherein the extracting unit determines the e-mail having the highest priority as the basic e-mail.

The apparatus of claim 18,
wherein the extracting unit extracts a plurality of priority e-mails in descending order of priority and merges the extracted plurality of priority e-mails into the basic e-mail.

The apparatus of claim 18,
wherein, when an e-mail in the group has a degree of similarity to the basic e-mail that is less than the preset value, the generation unit generates the e-mail summary by reflecting the contents of that e-mail in the basic e-mail.
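As a non-limiting sketch of the summary generation recited above, the following example treats e-mail bodies as plain strings. The claims do not specify a similarity measure or how contents are "reflected"; Jaccard similarity over word sets and appending lines not already present in the basic e-mail are assumptions chosen for this illustration, as is the threshold of 0.5.

```python
import re


def tokens(text):
    return set(re.findall(r"\w+", text.lower()))


def similarity(a, b):
    # Jaccard similarity of word sets (an assumed measure).
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def make_summary(base, others, threshold=0.5):
    # If every other e-mail is similar enough, the basic e-mail is the summary.
    lines = base.splitlines()
    for body in others:
        if similarity(base, body) >= threshold:
            continue  # treated as already covered by the basic e-mail
        # Reflect dissimilar contents: append lines not already present.
        for line in body.splitlines():
            if line and line not in lines:
                lines.append(line)
    return "\n".join(lines)
```

When all e-mails in the group meet the threshold, `make_summary` returns the basic e-mail unchanged, which corresponds to determining the basic e-mail itself as the summary.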

delete

The apparatus of claim 18,
wherein the e-mail summary generation apparatus marks each e-mail clustered into a group with an indication of the group to which it belongs and outputs the marked e-mail.

The apparatus of claim 18,
wherein the e-mail summary generation apparatus extracts text data from an image file when the e-mail is an image file.

The apparatus of claim 18,
wherein the e-mail summary generated by the generation unit includes, as attachments, all files attached to the e-mails belonging to the group.
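The attachment handling recited above might be illustrated as follows. Each e-mail is a hypothetical dictionary whose optional `"attachments"` entry is a list of `(filename, data)` pairs. Collapsing attachments that are identical in both name and content is an assumption of this sketch; the claim itself only requires that all attached files be included.

```python
def collect_attachments(group):
    # Gather every attachment in the group, in order of first appearance.
    seen = set()
    merged = []
    for mail in group:
        for name, data in mail.get("attachments", []):
            key = (name, data)
            if key not in seen:  # skip exact duplicates only (assumed)
                seen.add(key)
                merged.append((name, data))
    return merged
```

Files that share a name but differ in content are both kept, so no attached file is lost from the summary.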