WO2003032107A2

WO2003032107A2 - Method and system for monitoring e-mail

Info

Publication number: WO2003032107A2
Application number: PCT/KR2002/001882
Authority: WO
Inventors: Bog-Ju Lee; Soon-Kyu Choi
Original assignee: Ecabin Inc.
Priority date: 2001-10-12
Filing date: 2002-10-09
Publication date: 2003-04-17
Also published as: KR100483602B1; AU2002362631A1; KR20030030720A; WO2003032107A3

Abstract

The e-mail monitoring method comprises the steps of: classifying documents of the group into confidential documents or general documents; converting the documents into a form applicable to an SVM algorithm; calculating a Hyper-Plane and Support Vectors by learning the documents with the SVM algorithm; sniffing e-mail sent out from the group; converting the sniffed e-mail into the form applicable to the SVM algorithm; and applying the SVM algorithm to both the Support Vectors obtained by learning and the e-mail converted into vector form, to determine whether the sniffed e-mail includes confidential documents. The present invention thereby provides an e-mail monitoring method and system that can efficiently monitor whether the confidential documents of a group are being sent out by e-mail, by automatically learning the concept of confidential documents and general documents and then classifying each e-mail on the basis of the learning results.

Description

METHOD AND SYSTEM FOR MONITORING E-MAIL

FIELD OF THE INVENTION

The present invention relates in general to a method and a system for monitoring e-mail, and more particularly to an e-mail monitoring method and system that can efficiently monitor whether confidential documents of a group are sent out through e-mail, by automatically learning the concept of confidential documents and general documents and classifying each e-mail on the basis of the learning results.

BACKGROUND ART

E-mail sent over a network is used not only for posting messages but also for sending files. E-mail takes little time to reach a recipient, can be posted to many persons at once, and has the advantage that it can be stored as data. For these reasons, e-mail is widely used. If executives or employees of an enterprise send confidential documents by e-mail, whether intentionally or not, the enterprise runs the risk of leaking its secrets. Accordingly, enterprises install systems for monitoring outgoing e-mail that contains confidential information. Under conventional security systems, administrators had no choice but to read the words contained in confidential documents and build a database of them, and then to decide whether a sent e-mail included confidential documents according to whether it contained the words stored in the database. The e-mails found to include confidential documents were then classified separately and managed.

Under these conventional security systems, however, administrators must review an enormous volume of enterprise documents by hand to pick out the principal words. This is very tedious and time-consuming, and it increases management costs.

It is also difficult for administrators to decide which words to extract from confidential documents when reviewing them.

DISCLOSURE OF INVENTION

Accordingly, the present invention has been made keeping in mind the above-described shortcomings and users' needs. An object of the present invention is to provide a method and a system for monitoring e-mail, and more particularly an e-mail monitoring method and system that can efficiently monitor whether the confidential documents of a group are sent out through e-mail, by automatically learning the concept of confidential documents and general documents and classifying e-mails on the basis of the learning results.

This and other objects of the present invention may be accomplished by the provision of an e-mail monitoring method for monitoring e-mail sent out from a predetermined group, comprising the steps of: classifying documents of the group into confidential documents or general documents according to the level of security demanded; converting the documents into a form applicable to a Support Vector Machine (SVM) algorithm; calculating, by learning the documents with the SVM algorithm, a Hyper-Plane that classifies the documents into the confidential documents or the general documents and a Support Vector, which is the vector of the document nearest to the Hyper-Plane; sniffing e-mail sent from inside the group to the outside; converting the sniffed e-mail into the form applicable to the SVM algorithm; and applying the SVM algorithm to both the Support Vector calculated as a learning result and the e-mail converted into vector form, to determine whether the sniffed e-mail includes the confidential documents.

Herein, the step of converting the documents into a form applicable to the SVM algorithm can comprise the steps of: reading the words included in the documents and the e-mail; converting the read words into prescribed values; and representing the documents and the e-mail as vectors composed of the words converted into the prescribed values.
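The three sub-steps can be illustrated with a short sketch. The description leaves the "prescribed values" unspecified; this example assumes TF*IDF weighting (the scheme discussed in the Joachims and Salton works cited in the detailed description), a naive lowercase word tokenizer, and unit-length normalization. The function names are hypothetical.

```python
import math
import re
from collections import Counter

def read_words(text):
    # Step 1: read (extract) words; a naive lowercase tokenizer is assumed.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocabulary(documents):
    # Count, for each word, how many registered documents contain it.
    df = Counter()
    for doc in documents:
        df.update(set(read_words(doc)))
    vocab = sorted(df)
    # Inverse document frequency: rarer words receive larger values.
    idf = {w: math.log(len(documents) / df[w]) for w in vocab}
    return vocab, idf

def to_vector(text, vocab, idf):
    # Steps 2 and 3: convert each word to a TF*IDF value and lay the values
    # out as one coordinate per vocabulary word (unseen words are dropped).
    tf = Counter(read_words(text))
    vec = [tf[w] * idf[w] for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec
```

For example, `vocab, idf = build_vocabulary(["merger plan draft", "lunch menu today", "merger budget"])` followed by `to_vector("Merger plan attached", vocab, idf)` yields a 7-dimensional unit vector in which "plan" outweighs the more common "merger". The same vocabulary and IDF table built from the registered documents would be reused for sniffed e-mails, so that documents and e-mails share one coordinate system.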

Preferably, the e-mail monitoring method further comprises the step of reporting the result after analyzing whether the sniffed e-mail is a confidential document, so that sent e-mail is monitored in real time.

Also, this and other objects of the present invention may be accomplished by the provision of a monitoring system for monitoring e-mail sent out from a predetermined group, comprising: a document database for storing documents of the group, classified into confidential documents or general documents according to the level of security demanded; a sniffer for sniffing e-mail being sent out from inside the group; an e-mail database for storing the sniffed e-mail; a vector generator for converting the words included in the document database and the e-mail database into a form applicable to a Support Vector Machine (SVM) algorithm; a vector database for storing the vectors converted by the vector generator; a learner for learning, with the SVM algorithm, the documents of the document database converted by the vector generator; a learning result database for storing a Hyper-Plane and a Support Vector, which are the learning results of the learner; a discriminator for determining whether the sniffed e-mail is a confidential document by applying the SVM algorithm to the Support Vector calculated from the learning results and the e-mail converted into vector form; and a report generator for notifying a user of the result determined by the discriminator.

BRIEF DESCRIPTION OF DRAWINGS

The present invention will be better understood, and its various objects and advantages will be more fully appreciated, from the following description taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a block diagram of an e-mail monitoring system according to the present invention;

FIG. 2 is a detailed block diagram of the monitoring server in FIG. 1; and

FIG. 3 is a flow chart describing an e-mail monitoring method based on the e-mail monitoring system.

MODES FOR CARRYING OUT THE INVENTION

Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the accompanying drawings, in which like components are designated by like reference numerals.

Referring to FIG. 1 and FIG. 2, an e-mail monitoring system comprises an enterprise intranet 1 and a mail server 5, which is connected to each of the client terminals 3 in the enterprise intranet 1 via an outside network. The outside network includes not only the Internet but also other networks such as a LAN, a WAN, a PSTN (Public Switched Telephone Network), a PSDN (Public Switched Data Network), a cable network, and a wireless communication network.

The enterprise intranet 1 comprises an e-mail monitoring server 2, which monitors whether e-mails sent by the client terminals 3 via the enterprise intranet 1 or another network include any confidential document. The e-mail monitoring server 2 applies a Support Vector Machine (SVM) algorithm to both the learning process and the discriminating process for classifying confidential documents. The SVM is a new learning method introduced by V. Vapnik. It is well founded in computational learning theory and very open to theoretical understanding and analysis.
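As background, and not as text from the patent, the standard soft-margin linear SVM formulation makes the terms Hyper-Plane and Support Vector concrete. Here x_i is a document vector, y_i = +1 for a confidential document and -1 for a general document, and C is a penalty constant chosen by the implementer:

```latex
\begin{aligned}
&\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{n}\xi_i
  \quad\text{s.t.}\quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\\
&\mathbf{w} = \sum_{i=1}^{n}\alpha_i y_i\,\mathbf{x}_i,\qquad 0 \le \alpha_i \le C,\qquad \sum_{i=1}^{n}\alpha_i y_i = 0,\\
&\text{Hyper-Plane: } \mathbf{w}\cdot\mathbf{x} + b = 0,\qquad \text{Support Vectors: } \{\mathbf{x}_i : \alpha_i > 0\},\\
&f(\mathbf{x}) = \operatorname{sign}\Big(\sum_{i:\,\alpha_i > 0}\alpha_i y_i\,\mathbf{x}_i\cdot\mathbf{x} + b\Big).
\end{aligned}
```

Only the Support Vectors (alpha_i > 0) enter the decision function f, which is why storing the Hyper-Plane and the Support Vectors suffices to classify a new e-mail vector. In the separable case the Support Vectors lie on the margin, that is, they are the documents nearest to the Hyper-Plane. With a nonlinear kernel the inner product x_i · x would be replaced by K(x_i, x).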

Text categorization with the SVM is described in an extensive literature, for example: T. Joachims, "Text Categorization with Support Vector Machines: Learning with Many Relevant Features," LS-8 Report 23, University of Dortmund, 27 November 1997 (revised 19 April 1998); T. Joachims, "A Probabilistic Analysis of the Rocchio Algorithm with TF*IDF for Text Categorization," in International Conference on Machine Learning (ICML), 1997; G. Salton and M. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, New York, 1983; and J. Platt, "Fast Training of SVMs Using Sequential Minimal Optimization," to be published in Advances in Kernel Methods: Support Vector Learning, B. Schölkopf, C. Burges and A. Smola, eds., MIT Press, Cambridge, Mass., 1998.

According to the text categorization method with the SVM algorithm, documents can be categorized into two types, for example, as follows. First, words are read from the categorized documents and converted into prescribed values, and each document is represented as a vector composed of the words converted into the prescribed values. Since each document has many words, the coordinate system in which the document vectors are represented is multidimensional, and the more documents are learned, the higher its dimension becomes. When the documents are placed in this coordinate system according to their vector values, a Hyper-Plane classifying the documents into two categories and Support Vectors, which are the vectors of the documents nearest to the Hyper-Plane, are calculated. This series of processes is carried out by application software implementing the SVM algorithm. The utility of the SVM theory is supported by the empirical results on SVM-based text categorization reported in the literature cited above.

The e-mail monitoring server 2 of the e-mail monitoring system according to the present invention, as shown in FIG. 2, comprises: a document indexer 11 for registering documents classified into general documents or confidential documents according to the level of security demanded by employees and executives; a document database 13 for storing the documents classified by the document indexer 11; a sniffer 19 for sniffing the e-mails sent from each of the client terminals 3 in the enterprise to the mail server 5; an e-mail database 21 for storing the sniffed e-mails; a vector generator 23 for converting the words included in the e-mails or the documents into vector form; a vector database 25 for storing the documents or e-mails converted into vector form; a learner 15 for learning the documents converted into vector form by the vector generator 23; a learning result database 17 for storing the learning result of the learner 15; a discriminator 27 for determining whether the sniffed e-mails are confidential documents by applying the SVM algorithm to the Support Vector calculated by learning and the vectorized e-mails; a discrimination result database 29 for storing the discrimination result; a report generator 31 for reporting the discrimination result for confidential documents; and a controller 10 for controlling all of the above-described devices.

The document indexer 11 registers the documents classified into general documents or confidential documents in the document database 13. The document indexer 11 is implemented as web-based software for registering documents. If documents are subdivided by division or by job characteristics when they are registered through the indexer 11, the accuracy of learning may increase.

In particular, when the contents of the confidential documents are diverse because the organization is large, it is desirable to classify and register documents for each division. In this case, a scheme in which only confidential documents, and not general documents, are registered can be used: all documents other than those classified as confidential are treated as general documents. For example, if divisions A, B and C each register only their confidential documents, the confidential documents of A are the documents registered in A as confidential, and the general documents of A can be the confidential documents of B and C. Likewise, the general documents of B can be the confidential documents of A and C. In this way, each division can manage the document database 13 without registering general documents separately.
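The per-division registration scheme can be sketched as follows, assuming each division's registered confidential documents are kept in a dictionary keyed by division name (this layout and the function name are hypothetical). For each division, its own documents form the confidential class (+1) and every other division's confidential documents form the general class (-1):

```python
def division_training_sets(registered):
    """Build one labelled training set per division.

    registered: dict mapping division name -> list of that division's
    confidential documents (general documents are never registered).
    Returns dict mapping division -> list of (document, label) pairs.
    """
    sets = {}
    for division, own_docs in registered.items():
        # The division's own registrations are its confidential documents.
        examples = [(doc, +1) for doc in own_docs]
        # Other divisions' confidential documents serve as its general documents.
        for other, other_docs in registered.items():
            if other != division:
                examples.extend((doc, -1) for doc in other_docs)
        sets[division] = examples
    return sets
```

Each resulting set would train a separate model for its division, which is what allows the discriminator to apply one learning model per division.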

The learner 15 learns the documents converted into vector form by the vector generator 23. That is, the learner 15 applies the SVM to the documents converted into vector form by the vector generator 23, calculates the Hyper-Plane and the Support Vector, and stores them in the learning result database 17, wherein the Hyper-Plane classifies the vectorized documents into confidential documents or general documents and the Support Vector is the vector of the document nearest to the Hyper-Plane. The learner 15 can be run by an administrator of the e-mail monitoring server 2 once more than a predetermined number of documents have been collected, and it can also be run automatically at every predetermined period. The sniffer 19 sniffs outgoing e-mails and stores them in the e-mail database 21. Here, the sniffer 19 preferably uses a technique that monitors network communication packets and reads only the packets corresponding to e-mail. Most desirably, the sniffer 19 is designed, according to the network architecture of the enterprise, to minimize changes to the network architecture and the network load by combining TCP-based sniffing in the form of a simple wiretap with ARP-based sniffing, in which the sniffer assumes the role of a logical gateway. The sniffer 19 can read all e-mails sent by protocols such as SMTP, POP3 and HTTP (including web mail). In addition, the sniffer 19 can read not only the body of an e-mail but also its attached files.
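Packet capture and TCP stream reassembly do not fit in a short example, but the sniffer's port filter and its final stage, turning a reassembled message into body text and attachments, can be sketched with Python's standard `email` package. The port table is an assumption: it lists only the conventional well-known ports for the three protocols named above, and real mail traffic may also use other ports (for example submission on 587 or encrypted web mail). The function names are hypothetical.

```python
from email import message_from_bytes, policy

# Conventional well-known ports for the protocols named in the text (assumed mapping).
MAIL_PORTS = {25: "SMTP", 110: "POP3", 80: "HTTP"}

def is_mail_traffic(src_port, dst_port):
    # Keep only packets travelling to or from a mail-related port.
    return src_port in MAIL_PORTS or dst_port in MAIL_PORTS

def extract_message(raw: bytes):
    """Split a reassembled RFC 5322 message into subject, body text and attachments."""
    msg = message_from_bytes(raw, policy=policy.default)
    # Prefer a plain-text body, falling back to HTML.
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body is not None else ""
    # Attachments are returned as (filename, content) pairs for later vectorization.
    attachments = [(part.get_filename(), part.get_content())
                   for part in msg.iter_attachments()]
    return str(msg["Subject"] or ""), text, attachments
```

Text attachments come back as `str` and binary ones as `bytes`; a real system would still need format-specific text extraction (for example from word-processor files) before the attachment can be vectorized.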

The vector generator 23 reads words from the documents stored in the document database 13 and from the e-mails stored in the e-mail database 21, and converts the read words into the prescribed values. The vector generator 23 then converts the words, once converted into prescribed values, into a vector form applicable to the SVM algorithm.

The discriminator 27 determines whether the sniffed e-mails are confidential documents by applying the SVM algorithm to the Support Vector calculated from the learning result and the vectorized e-mails, and then stores the result in the discrimination result database 29. When each division registers different confidential documents and general documents, there can be several learning models serving as criteria for confidential documents. In this case, the discriminator 27 applies the learning model of each division to the sniffed e-mail and determines that the e-mail is a confidential document if it is classified as confidential under at least one of those models.
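Assuming each division's learning result is stored as a linear Hyper-Plane (a weight vector w and bias b, with a positive decision value meaning confidential), the "at least one division" rule reduces to a short loop. The model layout and names are illustrative, not taken from the patent.

```python
def decision_value(w, b, x):
    # Signed score w.x + b; a value > 0 lies on the confidential side of the Hyper-Plane.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def discriminate(email_vector, models):
    """models: dict mapping division -> (w, b).

    Returns the divisions whose model classifies the e-mail as confidential;
    the e-mail is treated as a confidential document if the list is non-empty.
    """
    return [division for division, (w, b) in models.items()
            if decision_value(w, b, email_vector) > 0]
```

For a linear kernel, w equals the weighted sum of the Support Vectors, so scoring with (w, b) is equivalent to applying the Support Vectors directly. The sketch also assumes all divisions share one vocabulary; with per-division vocabularies the e-mail would be vectorized once per model.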

The controller 10, for its part, selectively reads, as needed, the confidential documents and the general documents registered through the indexer 11 in the document database 13 as learning samples, converts them into a form applicable to the SVM algorithm of the learner 15, and provides the converted documents to the learner 15. The controller 10 then stores the learning result of the learner 15 in the learning result database 17 as a file. The learning result consists of the Hyper-Plane, which classifies the vectorized documents into confidential documents or general documents, and the Support Vectors, which are the vectors of the documents nearest to the Hyper-Plane. The controller 10 also converts the e-mails sniffed by the sniffer 19 and stored in the e-mail database 21 into a form applicable to the SVM algorithm of the learner 15. Thereafter, the controller 10 provides the converted e-mails to the discriminator 27 together with the Hyper-Plane and the Support Vectors stored in the learning result database 17, and has the discriminator 27 analyze whether the sniffed e-mails are classified as confidential documents.

The controller 10 causes the report generator 31 to notify a user of the analysis result, that is, whether the e-mails include confidential documents, as determined by the discriminator 27 and stored in the discrimination result database 29. In this way, the controller 10 can monitor whether sent e-mails include confidential documents.

The e-mail monitoring process performed by the e-mail monitoring system will be fully appreciated from the following description and FIG. 3. First, the learner 15 calculates the Hyper-Plane and the Support Vector by learning the confidential documents and the general documents with the SVM algorithm (S10), wherein the Hyper-Plane classifies the vectorized documents into confidential documents or general documents and the Support Vector is the vector of the document nearest to the Hyper-Plane. The Hyper-Plane and the Support Vectors are then stored in the learning result database 17 (S20).
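Step S10 can be sketched with a simplified variant of Platt's Sequential Minimal Optimization (cited above), restricted to a linear kernel and plain Python lists; unlike Platt's full algorithm, the second multiplier is chosen at random rather than by his selection heuristics. This is a teaching-scale sketch under stated assumptions (labels +1 for confidential and -1 for general, a penalty constant C, a tolerance tol), not the patent's implementation. It returns the Hyper-Plane as (w, b) together with the indices of the Support Vectors, which is what step S20 stores.

```python
import random

def dot(u, v):
    # Inner product of two equal-length vectors (linear kernel).
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(xs, ys, C=1.0, tol=1e-3, max_passes=10, max_iter=10_000, seed=0):
    """Simplified SMO for a linear kernel.

    xs: document vectors; ys: +1 (confidential) or -1 (general).
    Returns (w, b, support): w.x + b = 0 is the Hyper-Plane and support
    lists the indices of the Support Vectors (alpha > 0).
    """
    m = len(xs)
    rng = random.Random(seed)
    alpha = [0.0] * m
    b = 0.0

    def f(x):
        # Decision value: sum_k alpha_k * y_k * <x_k, x> + b.
        return sum(a * y * dot(xk, x) for a, y, xk in zip(alpha, ys, xs) if a > 0) + b

    passes = iters = 0
    while passes < max_passes and iters < max_iter:
        iters += 1
        changed = 0
        for i in range(m):
            e_i = f(xs[i]) - ys[i]
            # Only optimize alpha_i if it violates the KKT conditions by more than tol.
            if not ((ys[i] * e_i < -tol and alpha[i] < C) or (ys[i] * e_i > tol and alpha[i] > 0)):
                continue
            j = rng.randrange(m - 1)
            if j >= i:
                j += 1  # second multiplier chosen at random, j != i
            e_j = f(xs[j]) - ys[j]
            ai_old, aj_old = alpha[i], alpha[j]
            # Bounds keeping both multipliers in [0, C] with sum(alpha * y) unchanged.
            if ys[i] != ys[j]:
                lo, hi = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
            else:
                lo, hi = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
            k_ii, k_jj, k_ij = dot(xs[i], xs[i]), dot(xs[j], xs[j]), dot(xs[i], xs[j])
            eta = 2 * k_ij - k_ii - k_jj
            if lo == hi or eta >= 0:
                continue
            aj = min(hi, max(lo, aj_old - ys[j] * (e_i - e_j) / eta))
            if abs(aj - aj_old) < 1e-5:
                continue
            ai = ai_old + ys[i] * ys[j] * (aj_old - aj)
            alpha[i], alpha[j] = ai, aj
            # Recompute the threshold b from a multiplier strictly inside (0, C).
            b1 = b - e_i - ys[i] * (ai - ai_old) * k_ii - ys[j] * (aj - aj_old) * k_ij
            b2 = b - e_j - ys[i] * (ai - ai_old) * k_ij - ys[j] * (aj - aj_old) * k_jj
            b = b1 if 0 < ai < C else b2 if 0 < aj < C else (b1 + b2) / 2
            changed += 1
        passes = passes + 1 if changed == 0 else 0

    w = [sum(alpha[k] * ys[k] * xs[k][d] for k in range(m)) for d in range(len(xs[0]))]
    support = [k for k in range(m) if alpha[k] > 1e-8]
    return w, b, support
```

Given the document vectors and their labels, `train_linear_svm` yields the Hyper-Plane (w, b) and the Support Vector indices. A production system would normally use an optimized SVM library instead, since this pure-Python loop scales poorly to the high-dimensional document vectors described above.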

E-mails sent outside the enterprise are sniffed by the sniffer 19 and stored in the e-mail database 21 (S30). The sniffed e-mails are converted into a form applicable to the SVM algorithm by the vector generator 23 (S40). The discriminator 27 determines whether the sniffed e-mails are confidential documents by applying the SVM algorithm to the Support Vector calculated from the learning result and the vectorized e-mails (S50). If an e-mail is determined to be a confidential document according to the analysis result of the discriminator 27, the result is stored in the discrimination result database 29 (S60); otherwise, the result indicating a general document is stored in the discrimination result database 29 (S70). The controller 10 operates the report generator 31 to display the results in various kinds of graphs (S80).
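Steps S30 through S80 amount to a loop over sniffed messages. This sketch assumes the learning result of S10/S20 is already available as a linear Hyper-Plane (w, b) and that `vectorize` stands for whatever conversion the vector generator 23 applies. The in-memory lists standing in for the e-mail database 21 and the discrimination result database 29, and all names, are hypothetical.

```python
from collections import Counter

def monitor(sniffed_emails, vectorize, w, b):
    """Run steps S30-S80 over a batch of sniffed e-mails.

    sniffed_emails: iterable of (message_id, text) pairs.
    vectorize: function mapping text to a vector in the learned space.
    (w, b): stored Hyper-Plane from the learning result database.
    """
    email_db = list(sniffed_emails)                          # S30: stand-in for e-mail database 21
    results = []                                             # stand-in for result database 29
    for message_id, text in email_db:
        x = vectorize(text)                                  # S40: convert to vector form
        score = sum(wi * xi for wi, xi in zip(w, x)) + b     # S50: signed distance score
        label = "confidential" if score > 0 else "general"   # S60 / S70
        results.append((message_id, label, score))
    summary = Counter(label for _, label, _ in results)      # S80: counts for the report graphs
    return results, summary
```

In practice `vectorize` must reuse the vocabulary fixed at training time, so that e-mail vectors live in the same coordinate system as the learned Hyper-Plane. The `summary` counter is one simple input for the graphs produced in S80.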

Thus, the present invention provides a method and a system that automatically learn the concept of confidential documents and general documents with the SVM, sniff sent e-mails, and determine whether the sniffed e-mails are confidential documents on the basis of the learning results.

As stated above, the present invention can provide an e-mail monitoring method and system that can efficiently monitor whether the confidential documents of a group are sent out through e-mail, by automatically learning the concept of confidential documents and general documents and then classifying each e-mail on the basis of the learning results.

Although the preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible without departing from the scope and spirit of the invention as disclosed in the accompanying claims.

Claims

WHAT IS CLAIMED IS:

1. An e-mail monitoring method for monitoring an e-mail sent out from a predetermined group, comprising the steps of: classifying documents of the group into confidential documents or general documents according to a level of security demands; converting the document into a form applicable to a Support Vector Machine (SVM) algorithm; calculating a Hyper-Plane classifying the documents into the confidential documents or the general documents and a Support Vector which is a vector of a nearest document to the Hyper-Plane by learning the documents with the SVM algorithm; sniffing the e-mail sent from an inside of the group to an outside; converting the sniffed e-mail into the form applicable to the SVM algorithm; and applying the SVM algorithm to both the Support Vector calculated as a result of learning and the e-mail converted to a vector type, and discriminating whether the sniffed e-mail includes the confidential documents.

2. The e-mail monitoring method according to claim 1, wherein the step of converting the document into a form applicable to the SVM algorithm comprises the steps of: reading words included in the document and the e-mails; converting the read words into prescribed values; and indicating the document and the e-mail as a vector type with the words converted into the prescribed values.

3. The e-mail monitoring method according to claim 1 further comprising the step of reporting an analyzed result after analyzing if the sniffed e-mail is the confidential document.

4. A monitoring system for monitoring an e-mail sent out from a predetermined group, comprising: a document database for storing documents in the group which are classified into confidential documents or general documents according to a level of security demands; a sniffer for sniffing the e-mail which is being sent out from an inside of the group; an e-mail database for storing the sniffed e-mail; a vector generator for converting words included in the document database and the e-mail database into a form applicable to a Support Vector Machine (SVM) algorithm; a vector database for storing vectors converted by the vector generator; a learner for learning the document of the document database converted by the vector generator with the SVM algorithm; a learning result database for storing a Hyper-Plane and a Support Vector which are learning results of the learner; a discriminator for discriminating whether the sniffed e-mail is the confidential document by applying the SVM algorithm to the Support Vector calculated from the learning result and the e-mail converted to a vector type; and a discrimination result database for storing a discriminated result of the discriminator.

5. The e-mail monitoring system according to claim 4 further comprising a report generator notifying a user of a discriminated result analyzed by the discriminator if the e-mail includes the confidential document.

6. The e-mail monitoring method according to claim 1, wherein the step of classifying documents of the group into confidential documents or general documents comprises the steps of: registering the confidential document respectively for each of the divisions; and registering, as the general document, the document classified as the confidential document in a division other than a pertinent division.

7. The e-mail monitoring method according to claim 6, wherein the step of discriminating whether the sniffed e-mail includes the confidential document comprises the step of discriminating the sniffed e-mail as the confidential document in the case that, when the SVM algorithm is applied to the Support Vector learned from the confidential document registered for each of the divisions and to the e-mail converted into the vector type, the sniffed e-mail is analyzed and, on the basis of an analysis result, is discriminated as the confidential document of at least one division among the divisions.