(1) Technical field: The present invention relates to an email restoration method, and in particular to a web mail restoration method based on HTTP protocol analysis.
(2) Background: Email was one of the earliest network applications to appear on the Internet and, as the Internet has developed, it has remained an indispensable service and one of the most frequently used. While email brings convenience, it also poses a serious challenge to information security: illegal mail and leaks of confidential information over the network cause direct economic losses to enterprises. Email is currently sent in two main ways: through the web and through SMTP. Because mail sent through the web (web mail for short) is handled in a browser, requires no configuration, is simple to operate, and needs no additional software, it has gradually become the main way people use email. By analyzing the HTTP protocol and applying the features of each mail service, web mail can be monitored, restored, and recorded while it is in transit, so that network supervisors can promptly and directly see the details of the mail a monitored person sends. Existing mail restoration methods are complex to implement, are easily detected, and can monitor only a single type of mail, so they cannot effectively monitor mail in today's complex network service environment.
(3) Summary of the invention:
The technical problem to be solved by the present invention is to overcome the defects of the prior art and provide a web mail restoration method based on HTTP protocol analysis that monitors web mailbox services effectively and has little impact on network performance.
Technical solution of the present invention:
A web mail restoration method based on HTTP protocol analysis comprises the following steps:
Step a: Analyze the mailboxes of the different email services separately and build, for each kind of mailbox, a mail information feature library for its HTTP-based web mail. The information in the feature library covers the sender, recipient, subject, content, and attachment. According to the length of each item, a begin feature string and an end feature string are set for it; the begin and end feature strings are matched as a pair.
The major email service providers currently implement web mail in different ways, so no single method can extract the items of every mail; a detailed analysis of each web mail service is therefore essential.
Step b: Match the begin and end feature strings with the BMH (Boyer-Moore-Horspool) algorithm to locate the sender, recipient, subject, content, and attachment within the mail-sending session, then put the located information into the mail record data structure, one mail per record.
Step c: Register a netfilter hook function in a Linux kernel module; reassemble and buffer the mail-sending connections; perform application protocol analysis; and mark the different mail types according to the mail information feature libraries built for the various HTTP-based web mail services. For example, Sina mail is marked 1, NetEase mail is marked 2, and Yahoo mail is marked 3.
Step d: Create a kernel thread that restores mail from the reassembled, buffered mail session data according to the mail type mark. To restore a mail, the valid mail information is extracted from the data reassembled and buffered in step c, using the feature library of step a, and saved in the mail record data structure; the mail in the mail record data structure is then written to the mail record file, one mail per line.
Step e: A user-space program processes the mail record file periodically: it reads the file, decodes each record according to its encoding, converts it to UTF-8, and inserts the decoded records into the log database.
Step f: The system front end displays the mail in the log database, completing the restoration of the mail.
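The matching in step b can be sketched as follows. This is a minimal user-space illustration in Python, not the kernel implementation of the invention; the function names bmh_find and extract_field are hypothetical. It searches with the Boyer-Moore-Horspool bad-character shift and returns the bytes lying between a begin feature string and the next end feature string.

```python
def bmh_find(haystack: bytes, needle: bytes, start: int = 0) -> int:
    """Return the index of the first match of needle at or after start, or -1."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return start if start <= n else -1
    # Bad-character table: for every byte of the pattern except the last one,
    # the distance from its last occurrence to the end of the pattern.
    shift = {b: m - 1 - i for i, b in enumerate(needle[:-1])}
    pos = start
    while pos + m <= n:
        if haystack[pos:pos + m] == needle:
            return pos
        # Shift by the table entry for the byte aligned with the pattern's end;
        # bytes that do not occur in the pattern allow a full-length shift.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1


def extract_field(data: bytes, begin: bytes, end: bytes):
    """Return the bytes between the begin and end feature strings, or None."""
    b = bmh_find(data, begin)
    if b < 0:
        return None
    value_start = b + len(begin)
    e = bmh_find(data, end, value_start)
    if e < 0:
        return None
    return data[value_start:e]
```

Each item in the feature library would be extracted by one call to extract_field with that item's pair of feature strings.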
In step a, the mailboxes of different email services include different versions of the mailbox of the same email service provider.
In step b, the mail record data structure contains the session ID, recipient, sender, CC, BCC, subject, content, and attachment.
HTTP-based web mail is identified as follows. A mail-sending connection is made with the HTTP POST method, and every POST connection carries a Host: line or a Referer: line; the content of these two lines shows whether the connection carries HTTP-based web mail. This avoids recording and processing POST data that is not web mail and improves processing efficiency. Specifically, the Host: or Referer: line of the captured POST connection is matched against the mail information feature library obtained in step a, which determines whether the connection is a web mail sending connection and which web mail protocol it uses.
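The identification just described can be sketched in Python as below. This is an illustration under stated assumptions: the pattern "mail.163.com" comes from the embodiment, whereas the Sina and Yahoo host patterns and the names MAIL_TYPES, header_value, and classify_request are hypothetical placeholders. The marks 1, 2, and 3 follow the example in step c, and 0 is used here to mean "not web mail".

```python
# Host/Referer substrings mapped to mail type marks.
MAIL_TYPES = {
    b"mail.sina.com": 1,   # Sina (hypothetical pattern)
    b"mail.163.com": 2,    # NetEase (pattern used in the embodiment)
    b"mail.yahoo.com": 3,  # Yahoo (hypothetical pattern)
}


def header_value(request: bytes, name: bytes):
    """Return the value of the named header in a raw HTTP request, or None."""
    for line in request.split(b"\r\n")[1:]:  # skip the request line
        if not line:                          # blank line ends the headers
            break
        key, sep, value = line.partition(b":")
        if sep and key.strip().lower() == name.lower():
            return value.strip()
    return None


def classify_request(request: bytes) -> int:
    """Return the mail type mark, or 0 if this is not a web mail POST."""
    if not request.startswith(b"POST "):
        return 0
    for name in (b"host", b"referer"):
        value = header_value(request, name)
        if value is None:
            continue
        for pattern, mark in MAIL_TYPES.items():
            if pattern in value.lower():
                return mark
    return 0
```

Checking the request method first means that non-POST traffic is rejected before any header is scanned, which matches the efficiency argument above.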
Web mail is transmitted over HTTP. Sending a mail through the web generally uses 2 to 3 HTTP connections, which together usually deliver the entire mail content to the server. These are the user login connection, the mail sending connection, and, when attachments are sent separately, the attachment upload connection. The mail sending connection carries the sender, recipient, subject, body text, and so on; the attachment upload connection carries the file name and file content of the attachment.
All HTTP connections belonging to one session carry the same session identifier (the sid value). To associate the different pieces of mail information, "sid=" is therefore matched in each HTTP connection and its value is extracted and saved.
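A minimal Python sketch of the sid extraction follows. The text specifies only that "sid=" is matched; treating '&', a space, or a line break as the end of the value is an assumption made for this illustration, and the function name extract_sid is hypothetical.

```python
def extract_sid(request_line: str):
    """Return the value that follows 'sid=' in a request line, or None."""
    marker = "sid="
    start = request_line.find(marker)
    if start < 0:
        return None
    start += len(marker)
    end = start
    # Assumed terminators: next query parameter, end of the URL, or end of line.
    while end < len(request_line) and request_line[end] not in "& \r\n":
        end += 1
    return request_line[start:end]
```

Connections that yield the same value can then be grouped under one key, for example in a dictionary keyed by the sid, so that an attachment upload is attached to the mail it belongs to.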
In step c, reassembly and buffering work as follows: the datagrams belonging to one mail-sending connection are buffered; when the TCP connection-termination message for that connection is received, buffering is considered complete and the buffered data are reassembled. Reassembly sorts and positions the buffered packets by their sequence number (seq) and length: the sequence number of the next packet equals the sequence number of the packet immediately before it plus that packet's length.
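The ordering rule, next seq = previous seq + previous length, can be sketched in Python as follows. This is a user-space illustration, not the kernel code of the invention; the function name reassemble is hypothetical, and ignoring retransmitted segments and stopping at the first gap are simplifying assumptions.

```python
def reassemble(segments, initial_seq: int) -> bytes:
    """Rebuild the byte stream of one connection.

    segments: iterable of (seq, payload) pairs in arrival order.
    initial_seq: sequence number of the first payload byte.
    """
    by_seq = {}
    for seq, payload in segments:
        # Keep the first copy of each segment; skip empty ones (pure ACKs).
        if payload and seq not in by_seq:
            by_seq[seq] = payload
    stream = bytearray()
    expected = initial_seq
    # Follow the chain: the next segment starts where the previous one ended.
    while expected in by_seq:
        payload = by_seq.pop(expected)
        stream += payload
        expected = (expected + len(payload)) & 0xFFFFFFFF  # seq wraps at 2**32
    return bytes(stream)
```

Looking segments up by their expected sequence number avoids an explicit sort and handles sequence-number wraparound with a single mask.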
In step d, each mail is recorded in the following format: sender, recipient, CC, BCC, subject, content, source IP address, destination IP address, session ID, sending time.
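One way to serialize such a record as a single line is sketched below in Python. The field order follows the format just listed. The backtick separator is taken from the mail.log sample in the embodiment; writing empty CC and BCC values as empty fields is an assumption, and the names MailRecord and format_record are hypothetical.

```python
from dataclasses import dataclass, astuple

SEPARATOR = "`"  # field separator seen in the mail.log sample


@dataclass
class MailRecord:
    sender: str = ""
    recipient: str = ""
    cc: str = ""
    bcc: str = ""
    subject: str = ""
    content: str = ""
    src_ip: str = ""
    dst_ip: str = ""
    session_id: str = ""
    send_time: int = 0  # seconds since the Unix epoch


def format_record(record: MailRecord) -> str:
    """Join the fields in declaration order into one log line (no newline)."""
    return SEPARATOR.join(str(value) for value in astuple(record))
```

Because the extracted values are still percent-encoded at this stage, they contain no raw line breaks, which is what makes a one-mail-per-line file workable.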
The result of mail restoration is shown in the following table:
Beneficial effects of the present invention:
1. The present invention makes the monitoring of HTTP-based web mail highly extensible. Through protocol analysis of HTTP-based web mail, the monitoring requirements of new web mailbox services, which appear as Internet technology develops rapidly, can easily be met, and the security risks posed by internal mail can be effectively controlled.
2. The main restoration processing of the present invention is performed in the Linux kernel, and packets are not subjected to full protocol reassembly, which effectively reduces the impact on network performance in high-speed network environments.
(4) Embodiment:
The web mail restoration method based on HTTP protocol analysis comprises steps a to f, together with the supplementary definitions, exactly as set out in the summary of the invention above.
In this embodiment, version 4.0 of the NetEase 163 mailbox is analyzed and the mail feature library of this mailbox is built. A mail sent with version 4.0 of the NetEase 163 mailbox through a network gateway device is then captured and restored.
1. The analysis of version 4.0 of the NetEase 163 mailbox gives the following results:
Host: cg1a134.mail.163.com
Session identifier: begin feature (sid=), end feature (r)
Attachment name: begin feature (Content-Disposition: form-data; name="file1"; filename="), end feature (" r :)
Content: begin feature (r), end feature (r)
Sender: begin feature (%3Cstring%20name%3D%22account%22%3E), end feature (%3C%2Fstring%3E)
Recipient: begin feature (%3Carray%20name%3D%22to%22%3E%3Cstring%3E), end feature (%3C%2Fstring%3E)
CC: begin feature (%3Carray%20name%3D%22cc%22%3E%3Cstring%3E), end feature (%3C%2Fstring%3E)
BCC: begin feature (%3Carray%20name%3D%22bcc%22%3E%3Cstring%3E), end feature (%3C%2Fstring%3E)
Subject: begin feature (%3C%2Farray%3E%3Cstring%20name%3D%22subject%22%3E), end feature (%3C%2Fstring%3E)
Content: begin feature (%3Cstring%20name%3D%22content%22%3E), end feature (%26gt%3B%3C%2Fstring%3E)
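The feature library above could be held as a table of begin and end strings and applied to a request body in one pass, as in the following Python sketch. The names NETEASE_163_V4 and extract_all are hypothetical, only the percent-encoded body features are included, and Python's built-in str.find stands in for the BMH matcher of step b.

```python
END = "%3C%2Fstring%3E"

# (begin feature, end feature) per item, copied from the analysis results.
NETEASE_163_V4 = {
    "sender": ("%3Cstring%20name%3D%22account%22%3E", END),
    "recipient": ("%3Carray%20name%3D%22to%22%3E%3Cstring%3E", END),
    "cc": ("%3Carray%20name%3D%22cc%22%3E%3Cstring%3E", END),
    "bcc": ("%3Carray%20name%3D%22bcc%22%3E%3Cstring%3E", END),
    "subject": ("%3C%2Farray%3E%3Cstring%20name%3D%22subject%22%3E", END),
    "content": ("%3Cstring%20name%3D%22content%22%3E", "%26gt%3B" + END),
}


def extract_all(body: str, features: dict) -> dict:
    """Return {item: value or None} for every item in the feature library."""
    record = {}
    for name, (begin, end) in features.items():
        start = body.find(begin)  # stand-in for the BMH search
        if start < 0:
            record[name] = None   # item absent, e.g. an empty CC list
            continue
        start += len(begin)
        stop = body.find(end, start)
        record[name] = body[start:stop] if stop >= 0 else None
    return record
```

Keeping the features as data rather than code means that supporting another mailbox, or another version of the same mailbox, only requires adding a new table.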
2. The Linux kernel continuously listens for HTTP POST requests and matches the string "mail.163.com" in the Host: line. When it matches, the session connection is reassembled and saved. The saved data are as follows:
Mail sending connection
POST /js4/s?sid=zCyCfpVVqNwShaSRjwVVDaycbwvTPSvV&func=mbox:compose&l=compose&action=deliver HTTP/1.1
Host: cg1a134.mail.163.com
Content-Length: 1717
var=%3C%3Fxml%20version%3D%221.0%22%3F%3E%3Cobject%3E%3Cstring%20name%3D%22id%22%3Ec%3A1311325276573%3C%2Fstring%3E%3Cobject%20name%3D%22attrs%22%3E%3Cstring%20name%3D%22account%22%3E%22erniusias%22%26lt%3Bximotechacg%40163.com%26gt%3B%3C%2Fstring%3E%3Cboolean%20name%3D%22showOneRcpt%22%3Efalse%3C%2Fboolean%3E%3Carray%20name%3D%22to%22%3E%3Cstring%3Eerniusias%40sina.com%3C%2Fstring%3E%3C%2Farray%3E%3Carray%20name%3D%22cc%22%20%2F%3E%3Carray%20name%3D%22bcc%22%20%2F%3E%3Cstring%20name%3D%22subject%22%3E163mailtestandfujian%3C%2Fstring%3E%3Cboolean%20name%3D%22isHtml%22%3Etrue%3C%2Fboolean%3E%3Cstring%20name%3D%22content%22%3E%26lt%3Bdiv%20style%3D′line-height%3A1.7%3Bcolor%3A%23000000%3Bfont-size%3A14px%3Bfont-family%3Aarial′%26gt%3B%26lt%3Bdiv%26gt%3B111111111111111111111111111111111111111%26lt%3B%2Fdiv%26gt%3B%26lt%3Bdiv%26gt%3B22222222222222222222222222222222222222222%26lt%3B%2Fdiv%26gt%3B%26lt%3Bdiv%26gt%3B3333333333333333333333333333333333333333%26lt%3B%2Fdiv%26gt%3B%26lt%3Bdiv%26gt%3B44444444444444444444444444444444444444444%26lt%3B%2Fdiv%26gt%3B%26lt%3Bdiv%26gt%3B5555555555555555555555555555555555555555%26lt%3B%2Fdiv%26gt%3B%26lt%3Bdiv%26gt%3C%2Fstring%3E%3Cboolean%20name%3D%22saveSentCopy%22%3Etrue%3C%2Fboolean%3E%3Cstring%20name%3D%22charset%22%3EGBK%3C%2Fstring%3E%3C%2Fobject%3E%3Cboolean%20name%3D%22returnInfo%22%3Efalse%3C%2Fboolean%3E%3Cstring%20name%3D%22action%22%3Edeliver%3C%2Fstring%3E%3Cint%20name%3D%22saveSentLimit%22%3E1%3C%2Fint%3E%3C%2Fobject%3E
3. The feature strings are matched against the data of the mail sending connection, and the valid information is extracted and buffered:
Sender: %22erniusias%22%26lt%3Bximotechacg%40163.com%26gt%3B
Recipient: erniusias%40sina.com
Subject: 163mailtestandfujian
Session ID: zCyCfpVVqNwShaSRjwVVDaycbwvTPSvV
Content:
%3Cboolean%20name%3D%22isHtml%22%3Etrue%3C%2Fboolean%3E%3Cstring%20name%3D%22content%22%3E%26lt%3Bdiv%20style%3D′line-height%3A1.7%3Bcolor%3A%23000000%3Bfont-size%3A14px%3Bfont-family%3Aarial′%26gt%3B%26lt%3Bdiv%26gt%3B111111111111111111111111111111111111111%26lt%3B%2Fdiv%26gt%3B%26lt%3Bdiv%26gt%3B22222222222222222222222222222222222222222%26lt%3B%2Fdiv%26gt%3B%26lt%3Bdiv%26gt%3B3333333333333333333333333333333333333333%26lt%3B%2Fdiv%26gt%3B%26lt%3Bdiv%26gt%3B44444444444444444444444444444444444444444%26lt%3B%2Fdiv%26gt%3B%26lt%3Bdiv%26gt%3B5555555555555555555555555555555555555555%26lt%3B%2Fdiv%26gt%3B%26lt%3B%2Fdiv%26gt%3B%26lt%3Bdiv%26gt%3B%26amp%3Bnbsp%3B%26lt%3B%2Fdiv%26gt%3B%0A%26lt%3B%2Fdiv%26gt%3B
Source IP address: 192.168.1.25
4. The data are written to the file mail.log:
%22erniusias%22%26lt%3Bximotechacg%40163.com%26gt%3B`erniusias%40sina.com`163mailtestandfujian`%3Cboolean%20name%3D%22isHtml%22%3Etrue%3C%2Fboolean%3E%3Cstring%20name%3D%22content%22%3E%26lt%3Bdiv%20style%3D′line-height%3A1.7%3Bcolor%3A%23000000%3Bfont-size%3A14px%3Bfont-family%3Aarial′%26gt%3B%26lt%3Bdiv%26gt%3B111111111111111111111111111111111111111%26lt%3B%2Fdiv%26gt%3B%26lt%3Bdiv%26gt%3B22222222222222222222222222222222222222222%26lt%3B%2Fdiv%26gt%3B%26lt%3Bdiv%26gt%3B3333333333333333333333333333333333333333%26lt%3B%2Fdiv%26gt%3B%26lt%3Bdiv%26gt%3B44444444444444444444444444444444444444444%26lt%3B%2Fdiv%26gt%3B%26lt%3Bdiv%26gt%3B5555555555555555555555555555555555555555%26lt%3B%2Fdiv%26gt%3B%26lt%3Bdiv%26gt%3B%26amp%3Bnbsp%3B%26lt%3B%2Fdiv%26gt%3B%0A%26lt%3B%2Fdiv%26gt%3B`192.168.1.25`134.35.24.78`zCyCfpVVqNwShaSRjwVVDaycbwvTPSvV`1320455590
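A user-space program in the sense of step e could parse and decode such a record line roughly as follows. This Python sketch rests on several assumptions: every line carries the ten fields of the record format with empty CC and BCC written as empty fields (the sample above omits them), the character set of the mail (GBK in this capture) is passed in by the caller, and HTML entities such as &lt; are unescaped after percent-decoding. The names FIELDS, decode_field, and parse_record are hypothetical.

```python
import html
from urllib.parse import unquote

FIELDS = ("sender", "recipient", "cc", "bcc", "subject", "content",
          "src_ip", "dst_ip", "session_id", "send_time")
TEXT_FIELDS = ("sender", "recipient", "cc", "bcc", "subject", "content")


def decode_field(value: str, charset: str = "utf-8") -> str:
    """Percent-decode using the mail's charset, then unescape HTML entities."""
    text = unquote(value, encoding=charset, errors="replace")
    return html.unescape(text)


def parse_record(line: str, charset: str = "utf-8") -> dict:
    """Split one mail.log line on backticks and decode its text fields."""
    parts = line.rstrip("\n").split("`")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    for name in TEXT_FIELDS:
        record[name] = decode_field(record[name], charset)
    record["send_time"] = int(record["send_time"])
    return record
```

The decoded values are ordinary Unicode strings, so encoding them as UTF-8 when they are inserted into the log database gives the unified encoding that step e requires.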
5. The data in the file are parsed, stored in the database, and displayed on the front-end page, as shown in the following table: