KR100483602B1

KR100483602B1 - Method and system for monitoring e-mail

Info

Publication number: KR100483602B1
Application number: KR10-2001-0063063A
Authority: KR
Inventors: 이복주; 최순규
Original assignee: (주)이캐빈
Priority date: 2001-10-12
Filing date: 2001-10-12
Publication date: 2005-04-15
Also published as: KR20030030720A; WO2003032107A2; WO2003032107A3; AU2002362631A1

Abstract

The present invention relates to an e-mail monitoring method and system for monitoring e-mail leaving a given group. The monitoring method comprises the steps of: classifying a plurality of documents in the group into confidential documents and general documents according to the required security level; extracting the words contained in the documents; converting each extracted word into a predetermined value; representing each document, using the converted words, as a vector to which an SVM algorithm can be applied; a document learning step of training on the plurality of documents with the SVM algorithm to calculate a hyperplane (Hyper-Plane), which is the boundary separating the confidential documents from the general documents, and support vectors, which are the vectors of the documents closest to the hyperplane; sniffing e-mail sent from within the group to the outside; extracting the words contained in each sniffed e-mail; converting each extracted word into a predetermined value; representing the e-mail, using the converted words, as a vector to which the SVM algorithm can be applied; and determining whether the sniffed e-mail is a confidential document by applying the support vectors obtained from the learning and the vectorized e-mail to the SVM algorithm. By automatically learning the concepts of general and confidential documents and analyzing outgoing e-mail on the basis of the learning result, the invention provides an e-mail monitoring method and system that can effectively monitor the leakage of the group's confidential documents through e-mail.

Description

Email monitoring method and system {METHOD AND SYSTEM FOR MONITORING E-MAIL}

The present invention relates to an e-mail monitoring method and, more particularly, to an e-mail monitoring method and system that automatically learn the concepts of general documents and confidential documents and analyze outgoing e-mail on the basis of the learning result, so that the leakage of a group's confidential documents through e-mail can be monitored effectively.

E-mail, which sends and receives mail over a network, provides not only basic mail functions but also file transfer. It is widely used because messages reach the recipient quickly, can be sent to many people at once, and can be stored as data.

If an executive or employee sends a confidential company document by e-mail, whether intentionally or not, the company's confidential information may leak immediately. Companies have therefore deployed systems that check whether outgoing e-mail contains confidential content. In such a conventional security system, an administrator manually extracts the words contained in confidential documents and builds a separate database; the system then decides whether an outgoing e-mail contains a confidential document according to whether the e-mail contains words stored in that database. E-mails found to contain confidential documents are then classified and managed separately.

However, in such a conventional security system the administrator must review the company's vast body of documents and extract the important words by hand, so word extraction is very difficult and time-consuming, which raises management costs.

It is also very difficult for the administrator to decide which words to extract when reviewing confidential documents.

Accordingly, an object of the present invention is to provide an e-mail monitoring method and system that automatically learn the concepts of general documents and confidential documents and analyze outgoing e-mail on the basis of the learning result, so that the leakage of a group's confidential documents through e-mail can be monitored effectively.

According to the present invention, the above object is achieved by an e-mail monitoring method for monitoring e-mail leaving a given group, the method comprising the steps of: classifying a plurality of documents in the group into confidential documents and general documents according to the required security level; extracting the words contained in the documents; converting each extracted word into a predetermined value; representing each document, using the converted words, as a vector to which an SVM algorithm can be applied; a document learning step of training on the plurality of documents with the SVM algorithm to calculate a hyperplane, which is the boundary separating the confidential documents from the general documents, and support vectors, which are the vectors of the documents closest to the hyperplane; sniffing e-mail sent from within the group to the outside; extracting the words contained in each sniffed e-mail; converting each extracted word into a predetermined value; representing the e-mail, using the converted words, as a vector to which the SVM algorithm can be applied; and determining whether the sniffed e-mail is a confidential document by applying the support vectors obtained from the learning and the vectorized e-mail to the SVM algorithm.

Here, the step of converting into a form applicable to the SVM algorithm may further comprise the steps of: extracting the words contained in the documents and e-mails; converting each extracted word into a predetermined value; and representing the documents and e-mails as vectors using the converted words.

Most preferably, the method further comprises a step of analyzing whether the sniffed e-mail is a confidential document and reporting the result, so that outgoing e-mail can be monitored in real time.

According to another aspect of the present invention, the above object is also achieved by an e-mail monitoring system for monitoring e-mail leaving a given group, comprising: a document database in which a plurality of documents in the group are stored, classified into confidential documents and general documents according to the required security level; a sniffer that sniffs e-mail sent from within the group to the outside; an e-mail database in which the sniffed e-mail is stored; a vector converter that extracts the words contained in the document database and the e-mail database and converts them into vectors to which an SVM algorithm can be applied; a vector database in which the converted vectors are stored; a document learner that applies the SVM algorithm to the documents from the document database, as vectorized by the vector converter, to calculate a hyperplane separating confidential documents from general documents and support vectors, which are the vectors of the documents closest to the hyperplane; a learning result database in which the hyperplane and support vectors produced by the document learner are stored; a discriminator that determines whether the sniffed e-mail is a confidential document by applying the support vectors calculated by the document learner and the e-mail vectorized by the vector converter; and a determination result database in which the results determined by the discriminator are stored.

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings.

As shown in FIGS. 1 and 2, the e-mail monitoring system includes a corporate internal network 1 on which an internal network is built, and a mail server 5 connected to the company through an external network. Here, the external network is the Internet, but it may also include other networks such as a LAN, a WAN, a PSTN (Public Switched Telephone Network), a PSDN (Public Switched Data Network), a cable network, or a wireless communication network.

The corporate internal network 1 includes a plurality of employee terminals 3 used by employees, and an e-mail monitoring server 2 that is connected to the internal network and monitors whether e-mail sent from the employee terminals 3 contains confidential documents.

The e-mail monitoring server 2 applies the SVM (Support Vector Machine) algorithm in both the learning process and the determination process used to identify confidential documents. The SVM algorithm is a statistical learning method introduced by V. Vapnik; it is now a standard concept in learning theory, and its theoretical understanding and analysis have been applied in many fields.

Document classification using the SVM algorithm is described in numerous publications, including research papers such as T. Joachims, "Text Categorization with Support Vector Machines: Learning with Many Relevant Features," LS-8 Report 23, University of Dortmund, 27 November 1997 (revised 19 April 1998); T. Joachims, "A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization," International Conference on Machine Learning (ICML), 1997; G. Salton and M. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, New York, 1983; and J. Platt, "Fast Training of SVMs Using Sequential Minimal Optimization," to be published in Advances in Kernel Methods: Support Vector Learning, B. Schölkopf, C. Burges, and A. Smola, eds., MIT Press, Cambridge, Mass., 1998.

The following illustrates how documents are classified into two classes with the SVM algorithm. First, the words contained in a plurality of documents belonging to two classes are extracted and converted into predetermined values, and each document is represented as a vector using the converted words. Because each document contains many words, the coordinate system in which document vectors are represented is high-dimensional, and the more documents are learned, the more distinct words appear and the more dimensions the space has. When the documents are placed in this coordinate system according to their vector values, one can calculate a hyperplane that separates the vectorized documents of the two classes, together with the support vectors, which are the vectors of the documents closest to the hyperplane. This series of steps is carried out by software implementing the SVM algorithm, and the usefulness of SVM theory is confirmed by the experimental results of SVM-based document classification reported in the literature.
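The two-class procedure just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the software the patent refers to: the whitespace tokenizer, the term-frequency weighting, and all function names are hypothetical, and training uses dual coordinate descent for a linear soft-margin SVM, a standard solver chosen here for brevity rather than one named in the text. The support vectors are the training documents whose dual coefficient alpha ends up nonzero.

```python
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def build_vocabulary(docs):
    # Map each distinct word to a column index (the word's "predetermined value").
    words = sorted({w for d in docs for w in tokenize(d)})
    return {w: i for i, w in enumerate(words)}

def to_vector(text, vocab):
    # Term-frequency vector; the extra last component is a constant 1 acting as the bias.
    vec = [0.0] * (len(vocab) + 1)
    for word, count in Counter(tokenize(text)).items():
        if word in vocab:  # words unseen during training are ignored
            vec[vocab[word]] = float(count)
    vec[-1] = 1.0
    return vec

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, C=1.0, epochs=200, tol=1e-6):
    """Dual coordinate descent for a soft-margin linear SVM (hinge loss).

    X: list of vectors, y: labels in {+1, -1} (+1 = confidential).
    Returns the hyperplane normal w and the indices of the support vectors.
    """
    n, dim = len(X), len(X[0])
    alpha = [0.0] * n
    w = [0.0] * dim
    q = [dot(x, x) for x in X]  # diagonal of the Gram matrix (>= 1 thanks to the bias feature)
    for _ in range(epochs):
        max_pg = 0.0
        for i in range(n):
            g = y[i] * dot(w, X[i]) - 1.0  # gradient of the dual objective w.r.t. alpha_i
            if alpha[i] == 0.0:
                pg = min(g, 0.0)
            elif alpha[i] == C:
                pg = max(g, 0.0)
            else:
                pg = g
            max_pg = max(max_pg, abs(pg))
            if pg != 0.0:
                old = alpha[i]
                alpha[i] = min(max(old - g / q[i], 0.0), C)  # clipped Newton step on alpha_i
                step = (alpha[i] - old) * y[i]
                w = [wj + step * xj for wj, xj in zip(w, X[i])]
        if max_pg < tol:  # projected gradient small enough: converged
            break
    support = [i for i, a in enumerate(alpha) if a > 0.0]
    return w, support

def decide(w, x):
    # Signed score relative to the hyperplane w.x = 0; > 0 means "confidential".
    return dot(w, x)
```

Appending a constant feature lets the bias be learned as an ordinary weight; it also regularizes the bias slightly, a common simplification in linear SVM solvers.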

As shown in FIG. 2, the e-mail monitoring server 2 of the e-mail monitoring system according to the present invention includes: a document register 11 with which executives or employees register documents classified into confidential and general documents according to the required security level; a document database 13 in which the documents classified via the document register 11 are stored; a sniffer 19 that sniffs and stores e-mail transmitted from the employee terminals 3 to the mail server 5; an e-mail database 21 in which the sniffed e-mail is stored; a vector converter 23 that converts the words contained in documents and e-mails into vector form; a vector database 25 in which the vectorized documents and e-mails are stored; a document learner 15 that learns from the documents vectorized by the vector converter 23; a learning result database 17 in which the learning results of the document learner 15 are stored; a discriminator 27 that determines whether a sniffed e-mail is a confidential document by applying the support vectors obtained from the SVM learning and the vectorized e-mail to the SVM algorithm; a determination result database 29 in which the results of determining whether e-mails are confidential documents are stored; a reporter 31 that displays the determination results; and a control unit 10 that controls these components.

The document register 11 registers, in the document database 13, documents that executives or employees have classified into confidential and general documents. The document register 11 is web-based software for document registration. When documents are registered with the document register 11, subdividing them by department or by type of work can improve the accuracy of learning. In particular, when the organization is large and its confidential documents cover diverse content, it is preferable to classify and register documents by department. In that case, each department may register only its confidential documents and no general documents; for a given department, every document not registered by it as confidential is then treated as a general document. For example, if departments A, B, and C each register only confidential documents, the confidential documents of department A are those registered by department A, and the general documents of department A are the confidential documents registered by departments B and C. Likewise, the general documents of department B are the confidential documents registered by departments A and C. With this approach, each department can maintain the document database 13 without the effort of registering general documents separately.
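The department-wise labeling rule above can be expressed as a small function that builds one training set per department. This is a minimal sketch; the function name and the input layout (a mapping from department name to the list of documents it registered as confidential) are assumptions made for illustration.

```python
def department_training_sets(registered):
    """Build one labeled training set per department.

    registered: hypothetical mapping {department: [documents it registered as confidential]}.
    For department D, D's own documents are labeled +1 (confidential) and every
    other department's confidential documents are labeled -1 (general).
    """
    sets = {}
    for dept, own_docs in registered.items():
        examples = [(doc, 1) for doc in own_docs]
        for other, other_docs in registered.items():
            if other != dept:
                examples.extend((doc, -1) for doc in other_docs)
        sets[dept] = examples
    return sets
```

For instance, with departments A, B, and C, the set for A contains A's documents as positives and the documents of B and C as negatives, so no department ever has to register general documents explicitly.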

The document learner 15 receives the document vectors produced by the vector converter 23 and, using the SVM algorithm, calculates the hyperplane separating confidential documents from general documents and the support vectors, which are the vectors of the documents closest to the hyperplane, and stores them in the learning result database 17. The document learner 15 may be run by the administrator of the e-mail monitoring server 2 once a certain number of documents has been collected, or it may run automatically at regular intervals.

The sniffer 19 sniffs outgoing e-mail and stores it in the e-mail database 21. The sniffer 19 preferably monitors the network communication packets on the network and extracts only the packets that carry e-mail. Most preferably, it combines two methods, TCP-based sniffing, which is a simple wiretap, and ARP-based sniffing, in which the sniffer 19 acts as a logical gateway, so that, depending on the company's network structure, changes to the network structure and the load on the network are minimized. The sniffer 19 can extract all e-mail sent using protocols such as SMTP, POP3, and HTTP (including webmail), and it can extract attachments as well as the message body.
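Packet capture, TCP stream reassembly, and ARP-level redirection are outside the scope of a short example. Once the sniffer has reconstructed a raw message, however (for example, the DATA portion of an SMTP session), separating the body from the attachments is straightforward with Python's standard `email` package. This sketch assumes the raw message bytes are already available; `split_message` is a hypothetical name.

```python
import email
from email import policy

def split_message(raw_bytes):
    """Split a reassembled raw e-mail into (subject, text body, attachments).

    attachments is a list of (filename, decoded_bytes) pairs.
    """
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    # Prefer a plain-text body; fall back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    attachments = []
    for part in msg.iter_attachments():
        attachments.append((part.get_filename(), part.get_payload(decode=True)))
    return msg["Subject"], body, attachments
```

Both the body text and any text extracted from the attachments could then be handed to the vector converter.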

The vector converter 23 stores a numeric code table (not shown) that assigns a numeric code to each word. It extracts words from the documents and e-mails stored in the document database 13 and the e-mail database 21, converts each extracted word into the corresponding numeric code based on the numeric code table, and uses the numerically coded words to convert each document and e-mail into a vector that can be input to the SVM.
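A minimal sketch of the numeric code table, assuming codes are assigned in order of first appearance and that words never seen during learning are simply dropped when e-mails are vectorized. The class and function names and the regex tokenizer are hypothetical. The output is a sparse list of (code, frequency) pairs, which can be written in the SVMlight-style `label code:value` text format.

```python
import re
from collections import Counter

class NumericCodeTable:
    """Hypothetical word -> numeric code table; codes 1, 2, 3, ... in order of first sight."""

    def __init__(self):
        self.codes = {}

    def code(self, word, grow=True):
        if word not in self.codes:
            if not grow:
                return None  # unseen word at detection time: ignore it
            self.codes[word] = len(self.codes) + 1
        return self.codes[word]

def tokenize(text):
    # Assumed tokenizer: lowercase runs of word characters (\w also matches Hangul).
    return re.findall(r"\w+", text.lower())

def to_sparse_vector(text, table, grow=True):
    """Return [(code, term_frequency), ...] sorted by code."""
    counts = Counter()
    for word in tokenize(text):
        c = table.code(word, grow)
        if c is not None:
            counts[c] += 1
    return sorted(counts.items())

def svmlight_line(label, vector):
    # SVMlight-style line: "<label> <code>:<value> ..." with codes in ascending order.
    return " ".join([str(label)] + [f"{c}:{v}" for c, v in vector])
```

Training documents would be vectorized with `grow=True` to populate the table, and sniffed e-mails with `grow=False` so the feature space stays the one the model was trained on.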

Using the SVM algorithm, the discriminator 27 applies the e-mail vectorized by the vector converter 23 against the support vectors calculated by the document learner 15, determines whether the sniffed e-mail is a confidential document, and stores the result in the determination result database 29. When each department registers its own confidential and general documents, however, there may be several learning models, each serving as a criterion for deciding whether a document is confidential. In that case, the discriminator 27 applies the sniffed e-mail to each department's learning model and classifies the e-mail as a confidential document if at least one of the models identifies it as confidential.
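The per-department rule (flag the e-mail if any department's model classifies it as confidential) might look like this. The sketch assumes each learned model has been reduced to a linear decision function, one weight per numeric code plus a bias, which is possible for a linear SVM; the model layout and function names are hypothetical.

```python
def score(model, vector):
    # model: {"weights": {code: weight}, "bias": b}; vector: [(code, value), ...]
    return sum(model["weights"].get(c, 0.0) * v for c, v in vector) + model["bias"]

def is_confidential(vector, models):
    """Apply the e-mail vector to every department's model.

    Returns (True, [flagging departments]) if at least one model scores it > 0,
    otherwise (False, []).
    """
    flagged = [dept for dept, m in models.items() if score(m, vector) > 0.0]
    return bool(flagged), flagged
```

Returning the list of flagging departments, not just a boolean, gives the reporter a hint about whose confidential material the e-mail resembles.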

The control unit 10 reads, by type and as needed, the confidential and general documents stored in the document database 13 through the document register, converts them via the vector converter 23 into a form that can be input to the SVM of the document learner 15, and provides them to the document learner 15, so that the learning result of the document learner 15 is stored as a file in the learning result database 17. The learning result is expressed as the hyperplane, calculated with the SVM, that separates confidential documents from general documents, and the support vectors, which are the documents closest to it.
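The text only says that the learning result is stored as a file. One hypothetical layout, assuming a linear model over numeric word codes, is a JSON record holding the hyperplane (weights and bias) and the support vectors as sparse (code, value) lists. The function names and field names below are illustrative only.

```python
import json

def save_learning_result(path, weights, bias, support_vectors):
    """Write the hyperplane and support vectors to a JSON file.

    weights: {code: weight}; support_vectors: list of sparse vectors [(code, value), ...].
    """
    record = {
        # JSON object keys must be strings, so weights are stored as [code, weight] pairs.
        "hyperplane": {"weights": sorted(weights.items()), "bias": bias},
        "support_vectors": support_vectors,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f)

def load_learning_result(path):
    """Read the file back into (weights, bias, support_vectors)."""
    with open(path, encoding="utf-8") as f:
        record = json.load(f)
    weights = {int(c): w for c, w in record["hyperplane"]["weights"]}
    # JSON turns tuples into lists; convert the (code, value) pairs back to tuples.
    svs = [[tuple(pair) for pair in sv] for sv in record["support_vectors"]]
    return weights, record["hyperplane"]["bias"], svs
```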

The control unit 10 also converts the e-mail sniffed by the sniffer 19 and stored in the e-mail database 21, via the vector converter 23, into the form that can be input to the SVM, and provides it to the discriminator 27 together with the hyperplane and support vectors stored in the learning result database 17, so that the discriminator 27 can analyze whether the sniffed e-mail is classified as a confidential document.

Through the reporter 31, the control unit 10 displays to the user the results of the discriminator 27's analysis of whether each e-mail is a confidential document, as stored in the determination result database 29, so that outgoing e-mail can be monitored for confidential documents.

The e-mail monitoring process using this e-mail monitoring system will now be described with reference to FIG. 3.

First, the words of the confidential and general documents are extracted and converted into the numeric codes corresponding to each word, and each document is converted, using the numerically coded words, into a vector that can be input to the SVM. The document learner 15 applies the SVM algorithm to the vectorized confidential and general documents to calculate the hyperplane separating confidential documents from general documents and the support vectors, which are the vectors of the documents closest to the hyperplane (S10), and the calculated support vectors are stored in a database (S20).

E-mail sent outside the group is sniffed by the sniffer 19 and stored in the e-mail database 21 (S30). The vector converter 23 converts each word of the sniffed e-mail into the corresponding numeric code and uses the numerically coded words to convert the e-mail into a vector that can be input to the SVM (S40). The discriminator 27 applies the support vectors obtained from the SVM learning to the e-mail vectorized by the vector converter 23 and determines whether the e-mail is a confidential document (S50). If the discriminator 27 determines that the e-mail is a confidential document, that result is stored (S60); otherwise, the result classifying it as a general document is stored (S70). The control unit 10 then operates the reporter 31 to present the results in various charts and the like (S80).
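Steps S40 to S80 reduce to a simple loop once vectorization and classification are available. In this sketch, `vectorize` and `classify` are hypothetical callables supplied by the caller (in the described system, the vector converter 23 and the trained SVM decision function), and the summary counts are just one example of what a reporter could chart.

```python
from collections import Counter

def monitor(emails, vectorize, classify):
    """Classify sniffed e-mails (S40-S70) and summarize them for the reporter (S80).

    emails: iterable of (message_id, text) already sniffed and stored (S30).
    vectorize: text -> vector (S40); classify: vector -> bool, True = confidential (S50).
    """
    results = []
    for msg_id, text in emails:
        confidential = classify(vectorize(text))
        # S60 / S70: record the determination for this e-mail.
        results.append((msg_id, "confidential" if confidential else "general"))
    summary = Counter(label for _, label in results)  # counts a reporter could chart (S80)
    return results, summary
```

Keeping the classifier as a parameter means the same loop works for a single model or for the per-department "any model flags it" rule.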

As described above, the present invention applies the SVM, a statistical learning theory, to automatically learn the concepts of general and confidential documents, sniffs outgoing e-mail, and determines, on the basis of the previously learned data, whether each sniffed e-mail is a confidential document.

As explained above, the present invention automatically learns the concepts of general and confidential documents and analyzes outgoing e-mail on the basis of the learning result, thereby providing an e-mail monitoring method and system that can effectively monitor the leakage of a group's confidential documents through e-mail.

FIG. 1 is a configuration diagram of an e-mail monitoring system according to the present invention;

FIG. 2 is a detailed configuration diagram of the monitoring server of FIG. 1; and

FIG. 3 is a flowchart illustrating the e-mail monitoring process performed by the e-mail monitoring system of FIG. 1.

* Explanation of reference numerals for the main parts of the drawings

1: corporate internal network   2: e-mail monitoring server

3: employee terminal   5: mail server

10: control unit   11: document register

13: document database   15: document learner

17: learning result database   19: sniffer

21: e-mail database   23: vector converter

25: vector database   27: discriminator

29: determination result database   31: reporter

Claims

An email monitoring method for monitoring email leaking from a predetermined group, the method comprising:

Classifying a plurality of documents in the group into confidential documents and general documents according to security requirements;

Extracting a word included in the document;

Converting each of the extracted words into a predetermined value;

Representing the document as a vector applicable to an SVM algorithm using the words converted into values;

A document learning step of training on the plurality of documents with the SVM algorithm to calculate a hyperplane, which is the boundary separating the confidential documents from the general documents, and a support vector, which is the vector of the document closest to the hyperplane;

Sniffing email sent from within the group to the outside;

Extracting words contained in the sniffed email document;

Converting each of the extracted words into a predetermined value;

Representing the email as a vector applicable to the SVM algorithm using the words converted into values;

And determining whether the sniffed email is a confidential document by applying the support vector obtained as the learning result and the email converted into vector form to the SVM algorithm.

The method of claim 1, wherein the step of converting into a form applicable to the SVM algorithm comprises:

Extracting words contained in the documents and emails;

Converting each of the extracted words into a predetermined value;

And representing the documents and the emails as vectors using the words converted into values.

The method of claim 1, further comprising:

Analyzing whether the sniffed email is a confidential document and reporting the result.

An email monitoring system for monitoring email leaking from a given group, the system comprising:

A document database in which a plurality of documents in the group are stored, classified into confidential documents and general documents according to security requirements;

A sniffer that sniffs email sent from within the group to the outside;

An email database in which the sniffed email is stored;

A vector converter for extracting a word included in the document database and the email database and converting the word into a vector form applicable to an SVM algorithm;

A vector database storing the converted vector;

A document learner that applies the SVM algorithm to the documents from the document database, as converted into vector form by the vector converter, to calculate a hyperplane separating confidential documents from general documents and a support vector, which is the vector of the document closest to the hyperplane;

A learning result database in which the hyperplane and the support vector, which are the learning results of the document learner, are stored;

A discriminator for determining whether the sniffed e-mail is a confidential document by applying a support vector calculated by the document learner as a learning result of the SVM algorithm and an e-mail converted into a vector form by the vector converter;

And a determination result database in which the result value determined from the discriminator is stored.

The system of claim 4, further comprising:

A reporter that displays the result of determining whether the email analyzed by the discriminator is a confidential document.

The method of claim 1, wherein

the step of classifying the confidential documents and the general documents comprises:

Registering confidential documents for each of a plurality of departments;

And registering, as general documents for a given department, the documents registered as confidential documents by the departments other than that department.

The method of claim 6, wherein

the step of determining whether the email is a confidential document comprises:

Applying the support vectors learned from the confidential documents registered by each of the plurality of departments, together with the sniffed email converted into vector form, to make a comparison for each department, and determining the sniffed email to be a confidential document when the comparison identifies it as a confidential document of at least one of the departments.