CN107453973B - Method and device for discriminating identity characteristics of e-mail sender - Google Patents

Info

Publication number
CN107453973B
Authority
CN
China
Prior art keywords
address
credible
sending
mail
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610373221.2A
Other languages
Chinese (zh)
Other versions
CN107453973A (en
Inventor
沈朝阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201610373221.2A priority Critical patent/CN107453973B/en
Publication of CN107453973A publication Critical patent/CN107453973A/en
Application granted granted Critical
Publication of CN107453973B publication Critical patent/CN107453973B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0876 Network architectures or network communication protocols for network security for authentication of entities based on the identity of the terminal or configuration, e.g. MAC address, hardware or software configuration or device fingerprint
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/42 Mailbox-related aspects, e.g. synchronisation of mailboxes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Power Engineering (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Transfer Between Computers (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The method establishes a preset feature matching model and uses it as the basis for judgment: the features in a preset feature set extracted from an e-mail to be screened are tested against the model's feature matching conditions, and when the corresponding features satisfy those conditions, the identity of the e-mail's sender is deemed real and trusted. The method therefore does not require the mail sender to configure anything. It relies entirely on collecting and comparing mail-sending behavior information to match and screen the identity characteristics of the sender, and in this way can accurately confirm whether the sender's identity is real and trusted.

Description

Method and device for discriminating identity characteristics of e-mail sender
Technical Field
The application relates to the technical field of electronic mail systems, in particular to a method and a device for discriminating identity characteristics of an electronic mail sender.
Background
With the development of network technology, traditional modes of communication have largely been replaced by networked ones; e-mail, for example, has gradually replaced traditional letter delivery. Now that networks are ubiquitous, users increasingly rely on e-mail to transmit files and information, both in daily life and at work.
When an e-mail is transmitted within a single e-mail system, that is, when the receiver and the sender use the same system, both parties must authenticate (for example, with a password) when logging in. The sender's identity has therefore already been verified by that system, and the receiver needs no further authentication. When an e-mail is transmitted between two different e-mail systems, however, the receiver's system cannot be certain how the sender's system authenticated the sender, so the system's own authentication mechanism is of limited use. Two commonly used identification methods for this case are SPF and DKIM, introduced as follows:
SPF (Sender Policy Framework): the owner of a domain declares the legitimate sending IP addresses of that domain by publishing an SPF or TXT record in the DNS (Domain Name System) of the sending domain. All mail from the domain should therefore come from the declared IP addresses; otherwise it may be forged.
DKIM (DomainKeys Identified Mail): an identity authentication mechanism based on digital signatures. The sender publishes a signing public key in advance in a DNS record under a specified domain name, signs each outgoing mail with the corresponding private key, and inserts the result into the mail header. The receiver obtains the public key from the DNS according to the information in the mail header and uses it to verify the signature.
SPF and DKIM are relatively effective mechanisms for verifying the sending domain of e-mail exchanged between two e-mail systems, but in practice they have the following problems:
Both mechanisms require the owner of the sending domain to configure the corresponding DNS records, and DKIM additionally requires a signature field to be added to the mail header. Neither setting is mandated by the e-mail protocol, so in practice many senders do not configure them, and the receiver cannot use these mechanisms for identity screening. Even when a domain is configured accordingly, in many cases recipients do not explicitly reject mail that fails the SPF or DKIM checks, so these mechanisms alone still cannot reliably confirm whether the sender's identity is real.
Disclosure of Invention
The application provides a method for discriminating identity characteristics of an e-mail sender, which aims to solve the problems in the prior art.
The application further provides a device for discriminating the identity characteristics of the e-mail sender.
The application provides a method for discriminating identity characteristics of an e-mail sender, comprising the following steps:
extracting a preset feature set from the received e-mail to be screened;
testing, according to a feature matching condition in a pre-trained feature matching model, the degree of match between the features to be screened in the preset feature set and the corresponding features in the feature matching condition, and judging whether the features to be screened satisfy the feature matching condition;
and if so, identifying the sender of the e-mail to be screened as a trusted mail sender.
Optionally, the pre-trained feature matching model is trained as follows:
obtaining feature matching conditions and trusted IP addresses in a preset manner, based on data from a preset time period in an offline data system, and storing the trusted IP addresses in a trusted IP address set;
obtaining the associated features corresponding to each trusted IP address;
and storing the obtained feature matching conditions, the trusted IP address set, and the associated features corresponding to the trusted IP addresses in the feature matching model.
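The training step above can be pictured as filling a small container with three things: the matching conditions, the trusted IP address set, and the associated features per trusted IP address. The following is a minimal sketch under stated assumptions, not the patented implementation: the class `FeatureMatchingModel`, the function `train_model`, the `is_trusted` predicate, and the string encoding of features are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple


@dataclass
class FeatureMatchingModel:
    """Hypothetical container for what the training step stores."""
    # Names of the feature matching conditions, in the order they are tried.
    matching_conditions: List[str] = field(default_factory=list)
    # The trusted IP address set.
    trusted_ips: Set[str] = field(default_factory=set)
    # Associated features (e.g. sender system identifier, header features)
    # recorded for each trusted IP address.
    associated_features: Dict[str, Set[str]] = field(default_factory=dict)


def train_model(
    records: Iterable[Tuple[str, Set[str]]],
    is_trusted: Callable[[str], bool],
    conditions: List[str],
) -> FeatureMatchingModel:
    """Build the model from offline (sending_ip, features) records.

    `is_trusted` stands in for whichever preset manner decides trust.
    """
    model = FeatureMatchingModel(matching_conditions=list(conditions))
    for sending_ip, features in records:
        if is_trusted(sending_ip):
            model.trusted_ips.add(sending_ip)
            model.associated_features.setdefault(sending_ip, set()).update(features)
    return model
```

Keeping the trust decision behind a predicate lets any of the preset manners described in the text be plugged in without changing the container.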
Optionally, in the step of obtaining the trusted IP addresses in a preset manner, the preset manner includes at least one of the following:
judging whether the sending IP address is a trusted IP address according to the sending domain and the sending IP address;
judging whether the sending IP address is a trusted IP address according to the sender system identifier and the sending IP address;
judging whether the sending IP address is a trusted IP address according to the similarity between the associated features of the sending IP address and the associated features corresponding to trusted IP addresses;
judging whether the sending IP address is a trusted IP address according to the reply rate of the mail sent from the sending IP address;
and judging whether the sending IP address is a trusted IP address according to the associated features of the sending IP address and the mail open rate.
Optionally, judging whether the sending IP address is a trusted IP address according to the sending domain and the sending IP address includes:
obtaining the sending IP address of the e-mail from the offline data system;
resolving the sending domain of the e-mail to obtain a resolved IP address;
judging whether the sending IP address matches the resolved IP address;
if so, taking the sending IP address as a trusted IP address;
the feature matching condition obtained in this preset manner is whether the sending IP address matches the IP address obtained by resolving the sending domain.
Optionally, judging whether the sending IP address matches the resolved IP address includes:
judging whether the sending IP address and the resolved IP address have a preset number of identical bits;
if so, the sending IP address matches the resolved IP address;
if not, the sending IP address does not match the resolved IP address.
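The bit comparison above can be sketched as follows. The text only says that the two addresses must share "a preset number of identical bits"; reading this as the leading bits of an IPv4 address (that is, a shared network prefix) is an assumption, as are the function name `ips_match` and the default of 24 bits.

```python
import ipaddress


def ips_match(sending_ip: str, resolved_ip: str, prefix_bits: int = 24) -> bool:
    """Return True if both IPv4 addresses share the same leading `prefix_bits` bits.

    Assumption: "a preset number of identical bits" means a common prefix.
    """
    if not 0 <= prefix_bits <= 32:
        raise ValueError("prefix_bits must be between 0 and 32")
    a = int(ipaddress.IPv4Address(sending_ip))
    b = int(ipaddress.IPv4Address(resolved_ip))
    shift = 32 - prefix_bits
    # Dropping the low-order bits leaves only the prefix to compare.
    return (a >> shift) == (b >> shift)
```

A prefix comparison tolerates a sender that uses several outbound hosts in the same address block, which an exact equality test would reject.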
Optionally, in the step of resolving the sending domain of the e-mail, the sending domain is resolved in either of the following ways:
resolving the sending domain through its IP address (A) record;
and resolving the sending domain through its mail exchange (MX) record.
Optionally, judging whether the sending IP address is a trusted IP address according to the sender system identifier and the sending IP address includes:
obtaining the sending IP address and the sender system identifier of the e-mail's sending domain from the offline data system;
judging whether the sender system identifier matches the sending IP address;
if so, taking the sending IP address as a trusted IP address;
the feature matching condition obtained in this preset manner is whether the sender system identifier of the e-mail matches the sending IP address.
Optionally, judging whether the sender system identifier matches the sending IP address includes:
judging whether the first-level domain name of the e-mail's sending domain is the same as the first-level domain name of the sender system identifier;
if so, judging whether the IP address resolved from the sender system identifier matches the sending IP address;
when the resolved IP address and the sending IP address have a preset number of identical bits, the sender system identifier matches the sending IP address;
when the resolved IP address and the sending IP address do not have a preset number of identical bits, the sender system identifier does not match the sending IP address.
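The two-stage check above (same first-level domain, then a bitwise IP comparison) might look like the sketch below. Several things here are assumptions rather than statements from the text: that the sender system identifier is a host name (such as the name announced in the SMTP greeting), that "first-level domain name" can be approximated by the last two labels of a host name, and that identical bits means a shared leading prefix. The `resolve` callable is a hypothetical stand-in for a DNS lookup so the example runs without network access.

```python
import ipaddress
from typing import Callable, Iterable


def first_level_domain(host: str) -> str:
    """Naive approximation: keep the last two labels of the host name."""
    labels = host.strip(".").lower().split(".")
    return ".".join(labels[-2:])


def same_prefix(ip_a: str, ip_b: str, prefix_bits: int = 24) -> bool:
    """Compare the leading `prefix_bits` bits of two IPv4 addresses."""
    shift = 32 - prefix_bits
    a = int(ipaddress.IPv4Address(ip_a))
    b = int(ipaddress.IPv4Address(ip_b))
    return (a >> shift) == (b >> shift)


def identifier_matches_ip(
    sending_domain: str,
    system_identifier: str,
    sending_ip: str,
    resolve: Callable[[str], Iterable[str]],
    prefix_bits: int = 24,
) -> bool:
    """Stage 1: domains must agree. Stage 2: a resolved IP must match."""
    if first_level_domain(sending_domain) != first_level_domain(system_identifier):
        return False
    return any(
        same_prefix(sending_ip, resolved_ip, prefix_bits)
        for resolved_ip in resolve(system_identifier)
    )
```

A production version would need a public-suffix list to handle domains such as `example.co.uk`, where the last two labels do not identify the registrant.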
Optionally, judging whether the sending IP address is a trusted IP address according to the similarity between the associated features of the sending IP address and the associated features corresponding to trusted IP addresses includes:
obtaining the sending IP address of the e-mail and the associated features of the sending IP address from the offline data system, the associated features including the sender system identifier and mail header features;
judging whether the associated features of the sending IP address match the associated features corresponding to the trusted IP addresses in the trusted IP address set;
if so, taking the sending IP address as a trusted IP address;
the feature matching condition obtained in this preset manner is whether the associated features of the sending IP address match the associated features corresponding to the trusted IP addresses in the trusted IP address set.
Optionally, judging whether the associated features of the sending IP address match the associated features corresponding to the trusted IP addresses in the trusted IP address set includes:
calculating, with a feature vector similarity measurement method, the similarity between the associated features of the sending IP address and the feature vector of the associated features corresponding to the trusted IP address set;
judging whether the similarity is greater than or equal to a preset similarity threshold;
if so, the associated features of the sending IP address match the associated features corresponding to the trusted IP addresses in the trusted IP address set;
if not, the associated features of the sending IP address do not match the associated features corresponding to the trusted IP addresses in the trusted IP address set.
Optionally, the feature vector similarity measurement method specifically includes:
obtaining the associated features corresponding to the trusted IP address set and the associated features of the sending IP address;
forming a feature vector from the features in the associated features corresponding to the trusted IP address set and the weight of each feature;
identifying those associated features of the sending IP address that also appear among the associated features corresponding to the trusted IP address set;
calculating the weight of these shared features;
and comparing the weight of the shared features with the total weight of all features to obtain the similarity between the associated features of the sending IP address and the feature vector of the associated features corresponding to the trusted IP address set.
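Read literally, the measurement above is a weighted overlap: the summed weight of the features shared with the trusted set, divided by the total weight of all features in the trusted feature vector. The sketch below implements that reading. Treating "comparing" as a ratio is an assumption, and the function name, the feature strings, and the weights are hypothetical.

```python
from typing import Dict, Set


def weighted_similarity(trusted_weights: Dict[str, float],
                        candidate_features: Set[str]) -> float:
    """Weight of shared features divided by total weight of trusted features.

    `trusted_weights` maps each feature seen for the trusted IP set to its
    weight; `candidate_features` are the features of the sending IP address.
    """
    total = sum(trusted_weights.values())
    if total == 0:
        return 0.0  # no trusted features to compare against
    shared = sum(weight for feature, weight in trusted_weights.items()
                 if feature in candidate_features)
    return shared / total


def features_match(trusted_weights: Dict[str, float],
                   candidate_features: Set[str],
                   threshold: float = 0.8) -> bool:
    """Apply the preset similarity threshold (greater than or equal)."""
    return weighted_similarity(trusted_weights, candidate_features) >= threshold
```

Features present only on the candidate side do not lower the score in this reading, because the denominator is the total weight of the trusted feature vector.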
Optionally, judging whether the sending IP address is a trusted IP address according to the reply rate of the mail sent from the sending IP address includes:
obtaining, from the offline data system, the number of e-mails sent from the sending IP address;
obtaining the number of e-mails received by the sending IP address within the preset time period;
calculating the reply rate of the sending IP address from the number of e-mails sent and the number of e-mails received;
judging whether the reply rate is greater than or equal to a preset reply rate threshold;
if so, taking the sending IP address as a trusted IP address.
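A minimal sketch of the reply-rate test follows. The text states only that the rate is derived from the two counts; computing it as received divided by sent is an assumption, and the function names and the 0.2 threshold are hypothetical.

```python
def reply_rate(sent_count: int, received_count: int) -> float:
    """Assumed definition: e-mails received per e-mail sent in the period."""
    if sent_count <= 0:
        return 0.0  # nothing sent, so no meaningful rate
    return received_count / sent_count


def is_trusted_by_reply_rate(sent_count: int, received_count: int,
                             threshold: float = 0.2) -> bool:
    """Trusted when the reply rate reaches the preset threshold."""
    return reply_rate(sent_count, received_count) >= threshold
```

The intuition is that bulk or forged senders rarely receive replies, while addresses used for genuine correspondence do.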
Optionally, judging whether the sending IP address is a trusted IP address according to the associated features of the sending IP address and the mail open rate includes:
obtaining, from the offline data system, a sending IP address whose frequency of use within a preset time period is higher than a preset frequency;
judging whether the sender system identifier and the mail header features corresponding to the sending IP address are stable;
if so, judging whether the open rate of the mail sent from the sending IP address is higher than a preset open rate threshold;
and when the open rate is higher than the preset open rate threshold and the sending IP address has no bad record, taking the sending IP address as a trusted IP address.
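The open-rate criterion chains four tests: usage frequency, stability of the identifier and header features, open rate, and absence of bad records. The sketch below is one possible reading. The text does not define "stable", so treating it as exactly one distinct identifier and one distinct header profile in the period is an assumption; the `IpStats` fields and both default thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IpStats:
    """Hypothetical per-IP aggregates from the offline data system."""
    messages_sent: int             # mails sent from this IP in the period
    distinct_identifiers: int      # distinct sender system identifiers seen
    distinct_header_profiles: int  # distinct mail header feature profiles seen
    messages_opened: int           # mails from this IP that recipients opened
    has_bad_record: bool           # any bad record on file for this IP


def is_trusted_by_open_rate(stats: IpStats,
                            min_messages: int = 1000,
                            min_open_rate: float = 0.3) -> bool:
    # 1. Only frequently used sending IP addresses are considered.
    if stats.messages_sent <= min_messages:
        return False
    # 2. Assumed meaning of "stable": a single identifier and header profile.
    if stats.distinct_identifiers != 1 or stats.distinct_header_profiles != 1:
        return False
    # 3 and 4. Open rate strictly above the threshold and no bad record.
    open_rate = stats.messages_opened / stats.messages_sent
    return open_rate > min_open_rate and not stats.has_bad_record
```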
Optionally, the feature to be screened in the preset feature set extracted from the received e-mail to be screened is the sending IP address to be screened;
correspondingly, the step of testing the degree of match between the features to be screened in the preset feature set and the corresponding features in the feature matching condition, and judging whether the features to be screened satisfy the feature matching condition, includes:
the feature matching condition in the pre-trained feature matching model is whether a given IP address belongs to the trusted IP address set;
judging, according to this feature matching condition, whether the sending IP address to be screened belongs to the trusted IP address set;
if so, the sending IP address to be screened satisfies the feature matching condition.
Optionally, when the sending IP address to be screened does not belong to the trusted IP address set, a further feature matching condition in the feature matching model is evaluated, namely whether the sending IP address matches the IP address obtained by resolving the sending domain;
the specific judgment method comprises the following steps:
resolving the sending domain of the e-mail to obtain a resolved IP address;
judging, according to the feature matching condition, whether the sending IP address to be screened matches the resolved IP address;
if so, the sending IP address to be screened satisfies the feature matching condition.
Optionally, judging whether the sending IP address to be screened matches the resolved IP address includes:
judging whether the sending IP address to be screened and the resolved IP address have a preset number of identical bits.
Optionally, when the sending IP address to be screened does not match the resolved IP address, a further feature matching condition in the feature matching model is evaluated, namely whether the sender system identifier of the e-mail matches the sending IP address;
the specific judgment method comprises the following steps:
judging, according to the feature matching condition, whether the sending IP address to be screened matches the sender system identifier;
if so, the sending IP address to be screened satisfies the feature matching condition.
Optionally, judging whether the sending IP address to be screened matches the sender system identifier includes:
judging whether the first-level domain name of the mail sending domain is the same as the first-level domain name of the sender system identifier;
if so, judging whether the IP address resolved from the sender system identifier matches the sending IP address to be screened.
Optionally, when the sending IP address to be screened does not match the sender system identifier, a further feature matching condition in the feature matching model is evaluated, namely whether the associated features of the sending IP address match the associated features corresponding to the trusted IP addresses in the trusted IP address set, the associated features including the sender system identifier and mail header features;
the specific judgment method comprises the following steps:
judging, according to the feature matching condition, whether the associated features of the sending IP address to be screened match the associated features corresponding to the trusted IP addresses in the trusted IP address set;
if so, the sending IP address to be screened satisfies the feature matching condition.
Optionally, judging whether the associated features of the sending IP address to be screened match the associated features corresponding to the trusted IP addresses in the trusted IP address set includes:
judging whether the similarity between the associated features of the sending IP address to be screened and the associated features corresponding to the trusted IP addresses is greater than or equal to a preset similarity threshold.
Optionally, when the sending IP address to be screened satisfies the feature matching condition, the following steps are further performed:
judging whether the associated features of the sending IP address to be screened match the associated features corresponding to the trusted IP addresses in the trusted IP address set;
if so, the sending IP address to be screened satisfies the feature matching condition.
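Taken together, the optional screening steps above form a cascade: each feature matching condition is tried only if the previous one fails, in the order trusted IP set, sending domain, sender system identifier, associated features. The sketch below shows that control flow only. The dictionary keys, the trusted set, and the simplified predicates (exact membership in place of the bitwise comparison, a precomputed similarity score) are hypothetical placeholders.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Email = Dict[str, Any]
Condition = Tuple[str, Callable[[Email], bool]]

# Hypothetical stand-in for the trusted IP address set in the trained model.
TRUSTED_IPS = {"203.0.113.10"}


def in_trusted_set(email: Email) -> bool:
    return email["sending_ip"] in TRUSTED_IPS


def matches_sending_domain(email: Email) -> bool:
    # Placeholder: exact membership instead of comparing a preset number of bits.
    return email["sending_ip"] in email.get("domain_ips", [])


def matches_system_identifier(email: Email) -> bool:
    # Placeholder: IPs resolved from the sender system identifier.
    return email["sending_ip"] in email.get("identifier_ips", [])


def similar_to_trusted(email: Email) -> bool:
    # Placeholder: similarity assumed to be computed elsewhere.
    return email.get("similarity", 0.0) >= 0.8


CASCADE: List[Condition] = [
    ("trusted_ip_set", in_trusted_set),
    ("sending_domain", matches_sending_domain),
    ("sender_system_identifier", matches_system_identifier),
    ("associated_features", similar_to_trusted),
]


def screen_sender(email: Email,
                  conditions: List[Condition] = CASCADE) -> Optional[str]:
    """Return the name of the first condition that holds, or None if none does."""
    for name, condition in conditions:
        if condition(email):
            return name
    return None
```

Returning the name of the satisfied condition, rather than a bare boolean, makes it easy to log why a given sender was treated as trusted.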
Optionally, in the step of extracting the preset feature set of the received e-mail to be screened, the e-mail to be screened is received by a mail transfer agent (MTA) system.
Optionally, the feature information included in the preset feature matching model includes: the sending IP address, the sending domain information, the sender system identifier, and the mail header features.
The application also provides a device for discriminating identity characteristics of an e-mail sender, comprising:
a preset feature set extracting unit, configured to extract a preset feature set from the received e-mail to be screened;
a feature matching condition judging unit, configured to test, according to a feature matching condition in a pre-trained feature matching model, the degree of match between the features to be screened in the preset feature set and the corresponding features in the feature matching condition, and to judge whether the features to be screened satisfy the feature matching condition;
and an identity characteristic screening unit, configured to identify the sender of the e-mail to be screened as a trusted mail sender if the judgment result of the feature matching condition judging unit is positive.
Optionally, the feature matching condition judging unit further includes a feature matching model training subunit;
the feature matching model training subunit further includes:
a feature matching condition and trusted IP address obtaining subunit, configured to obtain feature matching conditions and trusted IP addresses in a preset manner, based on data from a preset time period in an offline data system, the trusted IP addresses being stored in a trusted IP address set;
an associated feature obtaining subunit, configured to obtain the associated features corresponding to each trusted IP address;
and a storage subunit, configured to store the obtained feature matching conditions, the trusted IP address set, and the associated features corresponding to the trusted IP addresses in the feature matching model.
Optionally, the feature matching condition and trusted IP address obtaining subunit includes at least one of the following subunits, each implementing one preset manner:
a sending domain judging subunit, configured to judge whether the sending IP address is a trusted IP address according to the sending domain and the sending IP address;
a sender system identifier judging subunit, configured to judge whether the sending IP address is a trusted IP address according to the sender system identifier and the sending IP address;
an associated feature similarity judging subunit, configured to judge whether the sending IP address is a trusted IP address according to the similarity between the associated features of the sending IP address and the associated features corresponding to trusted IP addresses;
a reply rate judging subunit, configured to judge whether the sending IP address is a trusted IP address according to the reply rate of the mail sent from the sending IP address;
and an open rate judging subunit, configured to judge whether the sending IP address is a trusted IP address according to the associated features of the sending IP address and the mail open rate.
Optionally, the sending domain judging subunit includes:
a sending IP address obtaining subunit, configured to obtain the sending IP address of the e-mail from the offline data system;
a resolving subunit, configured to resolve the sending domain of the e-mail to obtain a resolved IP address;
a judging subunit, configured to judge whether the sending IP address matches the resolved IP address;
a trusted IP address generating subunit, configured to take the sending IP address as a trusted IP address if the judgment result of the judging subunit is positive;
the feature matching condition obtained in this preset manner is whether the sending IP address matches the IP address obtained by resolving the sending domain.
Optionally, the judging subunit includes:
a bit judging subunit, configured to judge whether the sending IP address and the resolved IP address have a preset number of identical bits;
if so, the sending IP address matches the resolved IP address;
if not, the sending IP address does not match the resolved IP address.
Optionally, the sender system identifier judging subunit includes:
a sending IP address and sender system identifier obtaining subunit, configured to obtain the sending IP address and the sender system identifier of the e-mail's sending domain from the offline data system;
a judging subunit, configured to judge whether the sender system identifier matches the sending IP address;
if so, the sending IP address is taken as a trusted IP address;
the feature matching condition obtained in this preset manner is whether the sender system identifier of the e-mail matches the sending IP address.
Optionally, the judging subunit includes:
a first-level domain name judging subunit, configured to judge whether the first-level domain name of the e-mail's sending domain is the same as the first-level domain name of the sender system identifier;
a matching judging subunit, configured to judge whether the IP address resolved from the sender system identifier matches the sending IP address if the judgment result of the first-level domain name judging subunit is positive;
when the resolved IP address and the sending IP address have a preset number of identical bits, the sender system identifier matches the sending IP address;
when the resolved IP address and the sending IP address do not have a preset number of identical bits, the sender system identifier does not match the sending IP address.
Optionally, the associated feature similarity judging subunit includes:
an associated feature obtaining subunit, configured to obtain the sending IP address of the e-mail and the associated features of the sending IP address from the offline data system, the associated features including the sender system identifier and mail header features;
a judging subunit, configured to judge whether the associated features of the sending IP address match the associated features corresponding to the trusted IP addresses in the trusted IP address set;
if so, the sending IP address is taken as a trusted IP address;
the feature matching condition obtained in this preset manner is whether the associated features of the sending IP address match the associated features corresponding to the trusted IP addresses in the trusted IP address set.
Optionally, the judging subunit includes:
a similarity obtaining subunit, configured to calculate, with a feature vector similarity measurement method, the similarity between the associated features of the sending IP address and the feature vector of the associated features corresponding to the trusted IP address set;
a threshold judging subunit, configured to judge whether the similarity is greater than or equal to a preset similarity threshold;
if so, the associated features of the sending IP address match the associated features corresponding to the trusted IP addresses in the trusted IP address set;
if not, the associated features of the sending IP address do not match the associated features corresponding to the trusted IP addresses in the trusted IP address set.
Optionally, the similarity obtaining subunit includes:
an associated feature acquiring subunit, configured to acquire the associated features corresponding to the trusted IP address set and the associated features of the sending IP address;
a feature vector generating subunit, configured to form a feature vector from the features among the associated features corresponding to the trusted IP address set and the weight corresponding to each feature;
a shared feature obtaining subunit, configured to obtain the features, among the associated features of the sending IP address, that are the same as the associated features corresponding to the trusted IP address set;
a weight calculation subunit, configured to calculate the weight of the shared features;
and a similarity obtaining subunit, configured to take the ratio of the weight of the shared features to the total weight of all features, thereby obtaining the similarity between the associated features of the sending IP address and the feature vector of the associated features corresponding to the trusted IP address set.
Optionally, the reply rate determining subunit includes:
a sent e-mail quantity acquiring subunit, configured to acquire, from the offline data system, the number of e-mails sent from the sending IP address;
a received mail quantity acquiring subunit, configured to acquire the number of mails received by the sending IP address within the preset time period;
a reply rate calculating subunit, configured to obtain the reply rate of the sending IP address from the number of sent e-mails and the number of received mails;
a threshold judgment subunit, configured to judge whether the reply rate is greater than or equal to a preset reply rate threshold;
if yes, the sending IP address is taken as a trusted IP address.
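The reply rate check can be illustrated with a short Python sketch. The text only states that the reply rate is obtained from the two counts; the sketch assumes it is the number of received mails divided by the number of sent e-mails. The function names and the threshold value 0.1 are hypothetical.

```python
def reply_rate(sent_count: int, received_count: int) -> float:
    """Assumed definition: mails received back divided by e-mails sent."""
    if sent_count <= 0:
        return 0.0  # no sent mail in the period, so no evidence of replies
    return received_count / sent_count


def is_trusted_by_reply_rate(sent_count: int, received_count: int,
                             threshold: float = 0.1) -> bool:
    """True when the reply rate reaches the (hypothetical) preset threshold."""
    return reply_rate(sent_count, received_count) >= threshold
```

For instance, 200 sent e-mails and 50 received mails give a reply rate of 0.25 under this assumed definition.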
Optionally, the opening rate determining subunit includes:
a sending IP address obtaining subunit, configured to obtain, from the offline data system, a sending IP address whose usage frequency is higher than a preset frequency within a preset time period;
a stability judging subunit, configured to judge whether the sender system identifier and the mail header feature corresponding to the sending IP address are stable;
a preset opening rate threshold judging subunit, configured to, if the judgment result of the stability judging subunit is yes, judge whether the opening rate of the email sent by the sending IP address is higher than a preset opening rate threshold;
and a trusted IP address generating subunit, configured to take the sending IP address as a trusted IP address when the opening rate of the mail sent from the sending IP address is higher than the preset opening rate threshold and the sending IP address has no bad record.
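The opening rate decision combines four conditions, which the following Python sketch makes explicit. The record type `IpStats`, its field names, and the default threshold values are hypothetical; how stability and bad records are determined is outside the sketch and is passed in as booleans.

```python
from dataclasses import dataclass


@dataclass
class IpStats:
    """Hypothetical per-IP statistics mined from the offline data system."""
    usage_count: int        # times the IP was used in the preset period
    features_stable: bool   # sender system identifier and headers are stable
    open_rate: float        # fraction of sent mail that was opened
    has_bad_record: bool    # any bad record for this IP


def is_trusted_by_open_rate(stats: IpStats, min_usage: int = 100,
                            min_open_rate: float = 0.2) -> bool:
    """All four conditions must hold for the IP to be taken as trusted."""
    return (stats.usage_count > min_usage
            and stats.features_stable
            and stats.open_rate > min_open_rate
            and not stats.has_bad_record)
```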
Optionally, the features in the preset feature set extracted by the preset feature set extracting unit include the feature to be screened, namely the sending IP address to be screened;
correspondingly, the feature matching condition judgment unit includes:
a certain IP address judging subunit, configured to judge whether a certain IP address belongs to the set of trusted IP addresses according to a feature matching condition in the pre-trained feature matching model;
a credible IP address set judging subunit, configured to judge, according to the set feature matching condition, whether the originating IP address to be screened belongs to the credible IP address set;
if yes, the sending IP address to be screened meets the feature matching condition.
Optionally, when the determination result of the trusted IP address set determining subunit is negative, the method further includes:
the analyzed IP address acquisition subunit is used for analyzing the sending domain of the e-mail to obtain an analyzed IP address;
an IP address matching judgment subunit, configured to judge whether the originating IP address to be screened matches the resolved IP address according to the feature matching condition;
if yes, the sending IP address to be screened meets the feature matching condition.
Optionally, when the determination result of the IP address matching determination subunit is negative, the method further includes:
a sender system identifier matching judgment subunit, configured to judge, according to the feature matching condition, whether the sending IP address to be screened matches the sender system identifier;
if yes, the sending IP address to be screened meets the feature matching condition.
Optionally, when the determination result of the sender system identifier matching determination subunit is negative, the method further includes:
the associated feature matching judgment subunit is used for judging whether the associated features of the originating IP address to be screened are matched with the associated features corresponding to the credible IP addresses in the credible IP address set or not according to the feature matching conditions;
if yes, the sending IP address to be screened meets the feature matching condition.
Optionally, after the trusted IP address generating subunit, the apparatus further includes:
the associated feature matching judgment subunit is used for judging whether the associated features of the originating IP address to be screened are matched with the associated features corresponding to the credible IP addresses in the credible IP address set or not;
if yes, the sending IP address to be screened meets the feature matching condition.
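The optional subunits above form a fallback chain: each later check runs only when the earlier one answers no, and the first check that answers yes means the feature matching condition is met. A minimal Python sketch of that control flow follows; the function name, the check labels, and the stand-in checks are all hypothetical.

```python
def screen_sending_ip(sending_ip, checks):
    """Run the checks in order and return the label of the first that passes.

    `checks` is a list of (label, predicate) pairs; None means no check
    accepted the sending IP address.
    """
    for label, check in checks:
        if check(sending_ip):
            return label
    return None


# Illustrative stand-ins for the four checks described above.
trusted_set = {"192.0.2.10"}
checks = [
    ("trusted_set", lambda ip: ip in trusted_set),
    ("resolved_ip", lambda ip: ip.startswith("198.51.100.")),
    ("sender_system_identifier", lambda ip: False),
    ("associated_features", lambda ip: False),
]
```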
Compared with the prior art, the method has the following advantages:
The application provides a method for discriminating identity characteristics of an e-mail sender, which comprises the following steps: extracting a preset feature set of the received e-mail to be screened; according to a feature matching condition in a pre-trained feature matching model, testing the degree of matching between the features to be screened in the preset feature set and the corresponding features in the feature matching condition, and judging whether the features to be screened in the preset feature set meet the feature matching condition; and if so, determining that the sender of the e-mail to be screened is a trusted mail sender. The method establishes a feature matching model in advance and uses it as the basis for judging whether the features extracted into the preset feature set match; when the corresponding features in the preset feature set meet the feature matching condition, the identity of the sender of the e-mail to be screened is real and trusted. The method therefore does not require the mail sender to make any settings; it relies entirely on collecting and comparing mail sending behavior information to match and discriminate the identity characteristics of the mail sender, and can thereby accurately confirm whether the identity of the sender is real and trusted.
Drawings
Fig. 1 is a flowchart of a method for screening identity of an email sender according to a first embodiment of the present application.
Fig. 2 is a flowchart of a feature matching model training method according to a first embodiment of the present application.
Fig. 3 is a schematic structural diagram of an apparatus for screening identity of an email sender according to a second embodiment of the present application.
Detailed Description
The first embodiment of the present application provides a method for screening identity characteristics of an e-mail sender, which is used to identify whether the sender of an e-mail has a real identity. When an e-mail is transmitted between two different e-mail systems, the receiving system needs to verify the authenticity of the domain name identity of the sending address of the received e-mail, in order to determine whether the e-mail indeed comes from an e-mail system authorized by the domain name owner, or was sent from a sending IP address authorized by the domain name owner. The method can further identify suspicious e-mails sent by counterfeiting or spoofing the domain names of others.
Here, an e-mail system is a standalone system with its own server. An e-mail system may contain multiple domains, and each domain may contain multiple users. For example, a party that purchases a commercial e-mail system from an e-mail vendor, or uses an open-source e-mail system, can configure it as needed and keeps its data on its own server.
In addition, the method provided by the first embodiment of the present application can be mainly applied to an anti-spam system of an email, and by the method, spam in the email is identified, and the spam is further effectively intercepted or shielded through other relevant manners.
The method provided in the first embodiment of the present application is described in detail below. Fig. 1 is a flowchart of the method for screening identity characteristics of an e-mail sender provided in the first embodiment of the present application. Referring to Fig. 1, the method includes the following steps:
Step S101: extracting a preset feature set of the received e-mail to be screened.
This step operates on a received e-mail: it screens whether the sender of that e-mail has a real identity, so the received e-mail may be referred to as the e-mail to be screened. The method provided in the first embodiment generally relies on several features of the e-mail to be screened, so the relevant features obtained from it are referred to as a preset feature set.
When an e-mail is transmitted between two different mail systems, the mail system used by the recipient generally uses a mail transfer agent (MTA) to receive the e-mail sent from the external system.
Specifically, the MTA is the module of an e-mail system that interacts with users and is a front-end module of the mail server system. Each e-mail client is configured to send e-mail to the mail system, and to retrieve e-mail addressed to a given user address, through the MTA. An e-mail account therefore needs to be set up on the mail server, and standard Internet protocols can be used, whether the e-mail is processed offline (using POP3) or left on the server (using IMAP). The protocol used to send mail between a mail client and an MTA, and between one MTA and another, is the Simple Mail Transfer Protocol (SMTP).
After the MTA receives the e-mail to be screened, the relevant features need to be extracted from it; the extracted features form the preset feature set. The preset feature set may include the sending domain information, the sending IP address, the sender system identifier, and the mail header features.
The sending domain information is the domain name part of the mail sending address. The mail sending address generally refers to the address of the mail source given during the SMTP exchange used for standard e-mail transmission, and may take the form user@example.com. For that mail sending address, the sending domain information is the example.com part. Every mail carries the domain name part of its sending address when it is sent.
The mail sending IP address refers to an IP address used when a mail sending domain sends a mail. Generally, the domain name and the IP address can be resolved and transformed by a corresponding resolution method, and the resolution process of the domain name can be completed by a Domain Name System (DNS).
The sender system identifier (HELO) identifies the domain of the sender's mail system and may be completely different from the sending domain. The sender system identifier can be associated with an IP address through DNS resolution. For a well-configured mail system, the DNS PTR resolution of the sending IP address yields the HELO domain, and the DNS A resolution of the HELO domain yields the sending IP address; however, not all mail systems are configured this way.
The mail header features may include header fields (headers). The header characteristics reflect to some extent some of the characteristics associated with the originating mail system.
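As an illustration of how the four features might be gathered at the receiving MTA, here is a minimal Python sketch. It assumes the envelope sender address, the connecting client IP address, and the HELO value are already available from the SMTP session, and uses the standard library `email` parser only to list header field names. The function name and dictionary keys are hypothetical.

```python
from email import message_from_string


def extract_feature_set(mail_from, client_ip, helo, raw_message):
    """Collect the preset feature set for one received e-mail.

    mail_from, client_ip and helo are assumed to come from the SMTP session;
    raw_message is the message text including its headers.
    """
    msg = message_from_string(raw_message)
    # The sending domain is the part after the last "@" in the sender address.
    address = mail_from.strip().strip("<>")
    sending_domain = address.rsplit("@", 1)[-1].lower()
    return {
        "sending_domain": sending_domain,
        "sending_ip": client_ip,
        "sender_system_identifier": helo.lower(),
        "header_names": msg.keys(),  # header field names, in message order
    }
```

The sketch records only header names; which header properties actually serve as mail header features is left open here.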
This completes the process of acquiring the preset feature set from the e-mail to be screened; the identity of the sender can then be screened through the following steps.
Step S102: according to the feature matching condition in the pre-trained feature matching model, testing the degree of matching between the features to be screened in the preset feature set and the corresponding features in the feature matching condition, and judging whether the features to be screened in the preset feature set meet the feature matching condition.
This step matches and screens the extracted preset feature set against the trained feature matching model.
First, the method used for training the feature matching model will be described.
The feature matching model mainly comprises a trusted IP address set obtained through training in various manners. It may also hold the associated features of the trusted IP addresses in that set, and it further comprises the matching conditions for the features in the preset feature set.
The trusted IP address set stores IP addresses that are trusted and authentic. The trusted IP addresses are obtained by training on offline data and can be used as a fixed, existing feature set.
Fig. 2 is a flowchart of a feature matching model training method according to a first embodiment of the present application, please refer to fig. 2, where the training process of the feature matching model is as follows:
Step S102-1: acquiring a feature matching condition and trusted IP addresses in a preset manner, based on data of an offline data system within a preset time period, and storing the trusted IP addresses into a trusted IP address set.
The offline data system can periodically mine the behavior log of each mail sending domain in a preset time period, and the sending IP address can be obtained from the behavior log in the preset time period.
The time in the preset time period can be one month or three months, and can be set according to specific actual conditions. The length of the preset time period does not influence the implementation of the subsequent steps.
This step obtains the corresponding feature matching condition in a preset manner, obtains trusted IP addresses, and stores the trusted IP addresses so obtained together as the trusted IP address set.
Specifically, the preset manner may be one of several manners or any combination of them; each ultimately yields a feature matching condition and a trusted IP address set.
The preset manners are described below by way of example.
The first manner judges whether the sending IP address is a trusted IP address based on the sending domain and the sending IP address.
The first manner is specifically as follows:
First, the sending IP address of the e-mail is obtained from the offline data system.
Next, the sending domain of the e-mail is resolved to obtain a resolved IP address.
That is, after the sending IP address of the e-mail has been obtained and recorded, the sending domain is resolved into IP address form.
Several resolution methods are possible; in general, MX (mail exchange) resolution and A (address) resolution of the sending domain are performed, which yield the MX record (mail exchange record) and the A record (IP address record) respectively.
Resolving the sending domain in this way yields the resolved IP address.
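The MX and A resolution of the sending domain can be sketched as follows. To keep the example self-contained and free of network access, DNS is abstracted behind a hypothetical `lookup(name, record_type)` callable that returns a list of strings; in practice it would be backed by a DNS client. Resolving each MX host through its own A lookup is an assumption of this sketch.

```python
def resolve_sending_domain(domain, lookup):
    """Return the set of IP addresses obtained from the domain's A and MX records.

    `lookup(name, record_type)` is a hypothetical resolver: for "A" it returns
    IP address strings, for "MX" it returns mail exchanger host names.
    """
    resolved = set(lookup(domain, "A"))
    for mx_host in lookup(domain, "MX"):
        # Each mail exchanger host is itself resolved to its IP addresses.
        resolved.update(lookup(mx_host, "A"))
    return resolved


# A stand-in resolver backed by a fixed table, for illustration only.
records = {
    ("example.com", "A"): ["192.0.2.1"],
    ("example.com", "MX"): ["mx.example.com"],
    ("mx.example.com", "A"): ["192.0.2.25"],
}


def fake_lookup(name, record_type):
    return records.get((name, record_type), [])
```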
The resolved IP address is explained here. For some large domain name providers, the IP address used for outgoing mail may differ from the IP address used for incoming mail; that is, for some sending domains the authentic outgoing IP address cannot be determined directly. The incoming IP address, however, is generally genuine, so the IP address obtained here by resolving the sending domain is the incoming IP address corresponding to the sending domain.
With the resolved IP address corresponding to the sending domain obtained, it is next necessary to judge whether the sending IP address matches the resolved IP address.
This judgment compares the sending IP address with the resolved IP address to see whether the two are the same or adjacent.
If all fields of the two IP addresses are the same, the sending IP address is trusted and authentic. The two IP addresses may also be adjacent: when most of their fields are the same and the remaining fields are close to each other, the two IP addresses belong to the same domain, and the sending IP address can likewise be regarded as authentic.
Specifically, in the step of judging whether the sending IP address matches the resolved IP address, the manner of judging includes:
judging whether the sending IP address and the resolved IP address have a preset number of identical bits.
This is the judgment, described above, of whether all or some of the fields of the two IP addresses are the same.
The preset number of identical bits refers to the number of identical fields of the IP addresses. If all fields of the two IP addresses are the same, the two addresses are identical; if only some fields are the same, the number of identical fields is counted. In general, two IP addresses in the same class C address segment can be regarded as adjacent. For example, if one IP address is 192.168.0.1 and the other is 192.168.0.2, only the last fields differ; the two IP addresses belong to the same class C address segment and can therefore be regarded as adjacent.
If the sending IP address and the resolved IP address have the preset number of identical bits, the sending IP address matches the resolved IP address.
Otherwise, the sending IP address does not match the resolved IP address.
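The field comparison can be sketched in Python as below. The sketch assumes dotted-decimal IPv4 addresses and counts identical fields from the left, which agrees with the class C example (three identical leading fields); the function names and the default of three fields are assumptions.

```python
def identical_leading_fields(ip_a: str, ip_b: str) -> int:
    """Count how many leading dotted-decimal fields two IPv4 addresses share."""
    count = 0
    for field_a, field_b in zip(ip_a.split("."), ip_b.split(".")):
        if field_a != field_b:
            break
        count += 1
    return count


def ip_addresses_match(sending_ip: str, resolved_ip: str,
                       required_fields: int = 3) -> bool:
    """Match when at least `required_fields` leading fields are identical."""
    return identical_leading_fields(sending_ip, resolved_ip) >= required_fields
```

With the default of three fields, 192.168.0.1 and 192.168.0.2 match, while 192.168.0.1 and 192.168.1.1 do not.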
If the judgment result is yes, the sending IP address is taken as a trusted IP address and stored into the trusted IP address set in the feature matching model.
A positive result indicates that the sending IP address is the same as, or adjacent to, the incoming IP address of the genuine sending domain, and therefore belongs to the same domain as that genuine IP address. The sending IP address can thus be judged to be a trusted IP address and stored in the trusted IP address set in the feature matching model.
In addition, the judgment made during model training, namely whether the sending IP address matches the IP address resolved from the sending domain, serves as a matching condition in the feature matching model.
When it is later judged online whether an IP address meets the feature matching condition, the feature matching model is applied: this matching condition is used to judge whether the sending IP address to be screened matches the resolved IP address, and the matching result determines whether the e-mail sent from the sending IP address to be screened is genuine.
In this manner, trusted IP addresses are determined by matching the sending IP address against the resolved IP address, and together they form the trusted IP address set, which is stored as a set in the trained feature matching model.
The first preset manner has been described above; the second is described below. The second preset manner judges whether the sending IP address is a trusted IP address according to the sender system identifier and the sending IP address.
The specific mode is as follows:
First, the sending IP address and the sender system identifier of the e-mail sending domain are obtained from the offline data system.
The sender system identifier is given in a command sent by the sender to the recipient's mail server to identify the sender. It may be denoted HELO, and it can be resolved into IP address form in a similar way. The domain name given in the sender system identifier needs to match the sending IP address, so this step first obtains these relevant features.
After the relevant features are obtained, it is necessary to judge from them whether the sender system identifier matches the sending IP address.
Specifically, in the step of judging whether the sender system identifier matches the sending IP address, the manner of judging includes:
judging whether the first-level domain name of the e-mail sending domain is the same as the first-level domain name of the sender system identifier.
A first-level domain name consists of a top-level domain, such as com, com.cn, or org, preceded by a "." and one further label to its left, for example example.com. Domain names are always registered at the first level, and domain names of the second level and deeper all belong to the owner of the first-level domain name; for example, www.example.com also belongs to the owner of example.com.
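Extracting the first-level domain name can be sketched as below. The suffix set contains only the three top-level domains named in the text and is purely illustrative; a real deployment would need a complete suffix list such as the Public Suffix List. The function name is hypothetical.

```python
# Illustrative only: the three suffixes mentioned in the text.
KNOWN_SUFFIXES = {"com", "com.cn", "org"}


def first_level_domain(hostname: str):
    """Return the known suffix plus one label to its left, or None."""
    labels = hostname.lower().rstrip(".").split(".")
    for size in (2, 1):  # try the longer suffix (e.g. com.cn) first
        suffix = ".".join(labels[-size:])
        if suffix in KNOWN_SUFFIXES and len(labels) > size:
            return ".".join(labels[-(size + 1):])
    return None
```

Two names belong to the same owner, in the sense used above, when this function returns the same non-empty value for both.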
First, it is judged whether the first-level domain name of the sending domain is the same as the first-level domain name of the sender system identifier. If they are the same, the identification information of the sender is consistent with the sending domain information, which indicates that the sender system identifier is genuine. The sending IP address is then compared with the IP address resolved from the sender system identifier to determine whether the sending IP address is real and trusted.
Therefore, when the first-level domain name of the sending domain is the same as that of the sender system identifier, it is further judged whether the IP address resolved from the sender system identifier matches the sending IP address.
To do so, the sender system identifier is first resolved into IP address form through A resolution, and the resolved IP address is compared with the sending IP address for a matching judgment.
The matching judgment may compare the number of identical fields of the two IP addresses; this comparison is described above and is not repeated here.
When the resolved IP address and the sending IP address have the preset number of identical bits, the sender system identifier matches the sending IP address.
Conversely, when the resolved IP address and the sending IP address do not have the preset number of identical bits, the sender system identifier does not match the sending IP address.
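The two-stage judgment of the second manner can be sketched as follows. The sketch assumes the first-level domain names and the A records of the sender system identifier have already been obtained, and that IPv4 fields are compared from the left; all names and the default of three fields are hypothetical.

```python
def shared_leading_fields(ip_a: str, ip_b: str) -> int:
    """Count identical leading dotted-decimal fields of two IPv4 addresses."""
    count = 0
    for field_a, field_b in zip(ip_a.split("."), ip_b.split(".")):
        if field_a != field_b:
            break
        count += 1
    return count


def identifier_matches_sending_ip(sending_domain_fld, identifier_fld,
                                  identifier_a_records, sending_ip,
                                  required_fields=3):
    """Stage 1: first-level domain names must be equal.
    Stage 2: some A record of the identifier must be close to the sending IP."""
    if sending_domain_fld != identifier_fld:
        return False
    return any(shared_leading_fields(ip, sending_ip) >= required_fields
               for ip in identifier_a_records)
```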
The above describes how to judge whether the sender system identifier matches the sending IP address; making the judgment in this way yields a corresponding result.
When the result is yes, the sending IP address is stored as a trusted IP address into the trusted IP address set in the feature matching model. This step is the same as the last step of the first manner: a trusted IP address is determined and stored in the trusted IP address set in the feature matching model.
For this manner of model training, the judgment of whether the sender system identifier matches the sending IP address serves as a matching condition in the feature matching model. When the feature matching model is applied in subsequent steps, the sending IP address to be screened is matched against the sender system identifier, and the result determines whether the sending IP address to be screened is authentic.
In addition to the two preset manners described above, there is a third manner, which can be combined with one or both of them or used on its own.
The third manner relies on the associated features of the sending IP address and of the trusted IP addresses, and therefore requires a second step, S102-2, of training the feature matching model.
Specifically, step S102-2 is to acquire the associated feature corresponding to the trusted IP address.
The associated features include the sender system identifier and the mail header features. Each trusted IP address in the trusted IP address set has its own associated features, and the associated features corresponding to the trusted IP address set are matched against the associated features of the sending IP address.
Correspondingly, the third manner judges whether the sending IP address is a trusted IP address according to the similarity between the associated features of the sending IP address and the associated features corresponding to the trusted IP addresses. It specifically comprises the following:
The sending IP address of the e-mail and the associated features of the sending IP address (the sender system identifier and the mail header features) are obtained from the offline data system.
Obtaining the sending IP address and the sender system identifier of the e-mail sending domain within the preset time period was introduced in the second manner and is not repeated here; the following description focuses on obtaining the mail header features.
The mail header features may include header fields (headers). The header characteristics reflect to some extent some of the characteristics associated with the originating mail system.
In addition, the sender system identifier of a normal mail system, and the mail header features that the system generates, usually do not change frequently, so these two kinds of feature information can serve as a basis for screening the identity characteristics of the e-mail sender.
The associated features of the sending IP address are obtained through the above steps, and it can then be judged whether they match the sender system identifier and mail header features corresponding to the trusted IP addresses.
Specifically, in the step of judging whether the associated features of the sending IP address match the sender system identifier and mail header features corresponding to the trusted IP addresses, the manner of judging includes:
calculating, by a similarity measurement method for feature vectors, the similarity between the associated features of the sending IP address and the feature vector of the associated features (the sender system identifier and the mail header features) corresponding to the trusted IP address set.
In this step, a method for measuring similarity of feature vectors is mainly introduced, which includes the following steps:
First, the associated features corresponding to the trusted IP address set and the associated features of the sending IP address are obtained.
The associated features here are typically features related to the sender system, such as the HELO domain feature and the mail header features.
Second, the features among the associated features corresponding to the trusted IP address set, together with the weight of each feature, form a feature vector.
Assume that the associated features (the sender system identifier, the mail header features, and so on) of the known trusted IP address set of a sending domain are as follows:
the mail header features include, in order: X-aliyun-ambient (weight may be set to 1), X-Alimail-AntiSpam (weight may be set to 1), X-aliyun-fingerprint (weight may be set to 1), and X-aliyun-cluster header (weight may be set to 1);
the suffix form of the HELO domain is: com (weight set to 3).
Here, the forward or reverse resolution of the HELO domain is required to exist and to match.
And secondly, acquiring the same characteristics in the associated characteristics corresponding to the credible IP address set in the associated characteristics of the transmitting IP address.
Checking whether the calling IP address is matched with the header or the hello domain according to the characteristics corresponding to the calling IP address, and if so, having corresponding weight. For example, if the same feature is X-Alimail-AntiSpam, X-aliyun-clusteddate header, and hello dmain, then the same feature has weights of 1, and 3, respectively.
Then, the weight of the same features is calculated.
The weights of the same features are added (1 + 1 + 3), giving a matched weight of 5.
Finally, the weight of the same features is compared with the total weight of all features to obtain the similarity between the associated features of the sending IP address and the feature vector of the associated features corresponding to the trusted IP address set.
Since the total feature weight is 7, the similarity in this case is 5/7 ≈ 0.71.
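The calculation above can be sketched as follows. This is only a minimal illustration, assuming the trusted feature vector is held as a mapping from feature name to weight; the function name `weighted_similarity` and the key `helo_domain_suffix` are hypothetical, and the header names are copied from the example as printed.

```python
def weighted_similarity(trusted_features, observed_features):
    """Share of the trusted set's total feature weight matched by the observed features."""
    total = sum(trusted_features.values())
    if total == 0:
        return 0.0
    matched = sum(
        weight
        for name, weight in trusted_features.items()
        if name in observed_features
    )
    return matched / total


# Weights from the example: four headers of weight 1 and the HELO domain of weight 3.
# "helo_domain_suffix" is a hypothetical key standing for the HELO domain feature.
trusted = {
    "X-aliyun-ambient": 1,
    "X-Alimail-AntiSpam": 1,
    "X-aliyun-fingerprint": 1,
    "X-aliyun-cluster": 1,
    "helo_domain_suffix": 3,
}
# Features found on the sending IP address being examined.
observed = {"X-Alimail-AntiSpam", "X-aliyun-cluster", "helo_domain_suffix"}

similarity = weighted_similarity(trusted, observed)
print(round(similarity, 2))  # 0.71
```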
In addition, the associated features of the trusted IP set of the .com domain are generated by offline mining. The domain may have a large number of headers, and only the headers present in most of the mails of the domain need to be extracted.
Only X- type headers may be selected, and each weight is derived from the representativeness of the header. For example, a header that appears in all mails of the sending domain has a weight of 1, and a header that appears in 70% of the mails has a weight of 0.7; only headers whose weight exceeds a certain threshold (e.g., 0.6) are taken.
The HELO domain may be given a weight ratio of 0.3. For example, if the total weight of the other header features is 7, the weight of the HELO domain is at least 3. This is an empirical value and may be adjusted for particular sending domains in practical applications.
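The offline derivation of the weights might look like the sketch below. It assumes each mined mail is available as a list of header names, and it reads the "ratio of 0.3" as the HELO domain's share of the total feature weight, which reproduces the 7 and 3 of the example; `mine_header_weights` and `helo_weight` are hypothetical names, not part of the described method.

```python
from collections import Counter


def mine_header_weights(mails, min_weight=0.6):
    """Weight each X- header by the fraction of the domain's mails that contain it."""
    if not mails:
        return {}
    counts = Counter()
    for headers in mails:
        # Count a header once per mail; compare names case-insensitively.
        counts.update({h.lower() for h in headers if h.lower().startswith("x-")})
    n = len(mails)
    # Keep only headers whose weight exceeds the threshold.
    return {h: c / n for h, c in counts.items() if c / n > min_weight}


def helo_weight(header_weights, share=0.3):
    """HELO weight that makes up `share` of the total weight (assumed reading)."""
    others = sum(header_weights.values())
    return others * share / (1 - share)


# Ten hypothetical mails: X-A in all, X-B in 7, X-C in 5, Subject in all.
mails = [["X-A", "Subject"] for _ in range(10)]
for i in range(7):
    mails[i].append("X-B")
for i in range(5):
    mails[i].append("X-C")

weights = mine_header_weights(mails)
```

With header features totalling a weight of 7, `helo_weight` returns approximately 3, in line with the empirical value given above.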
The above is the similarity calculated by the feature-vector similarity method; the similarity may also be calculated by other similarity methods.
Once the similarity has been obtained, it is judged whether the similarity is greater than or equal to a preset similarity threshold.
If the preset similarity threshold is set to 0.7, the similarity of 0.71 calculated above is greater than the threshold, so it can be judged that the associated features of the sending IP address match the associated features corresponding to the trusted IP address set.
The preset similarity threshold is only an illustrative example, and the specific threshold can be set according to the actual situation. Similarly, when the preset similarity threshold is set to 0.8, the similarity of 0.71 calculated above is smaller than the threshold, so it can be judged that the associated features of the sending IP address do not match the associated features corresponding to the trusted IP address set.
Whether the associated features of the sending IP address match the associated features corresponding to the trusted IP address set is judged by the feature-vector similarity method; if so, the sending IP address is stored as a trusted IP address in the trusted IP address set of the feature matching model. This step is the same as the last step of the first preset mode, namely, a trusted IP address is determined and stored in the trusted IP address set of the feature matching model.
Judging whether the associated features of the sending IP address match the associated features corresponding to the trusted IP addresses in the trusted IP address set is used as a matching condition in the feature matching model.
In addition to the three preset modes described above, there is a fourth mode, which may be used in any combination with the three modes, or on its own, when training the model.
Specifically, the fourth mode is a mode of determining whether the originating IP address is a trusted IP address according to a reply rate corresponding to a mail sent by the originating IP address. The specific method comprises the following steps:
and acquiring the quantity of the e-mails sent by adopting the sending IP address from the offline data system.
When a certain IP address of the mail sending domain is used for sending a mail, the number of the mails sent by the IP address can be recorded to form the number of the sent e-mails.
And acquiring the number of the mails received by the sending IP address in the preset time period.
For the sending IP address, the number of replies sent by different recipients to its mails is recorded to form the number of received mails.
And acquiring the reply rate of the sending IP address according to the quantity of the sent e-mails and the quantity of the received e-mails.
And comparing the number of the received mails with the number of the sent e-mails to obtain the reply rate of the IP address.
And judging whether the reply rate is greater than or equal to a preset reply rate threshold value.
For example, the reply rate threshold may be set to 80%, and it is judged whether the calculated reply rate reaches the threshold of 80%.
And when the reply rate exceeds a preset reply rate threshold value, storing the sending IP address serving as a credible IP address into a credible IP address set in the characteristic matching model.
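A minimal sketch of the fourth mode, assuming the sent and reply counts for the preset time period have already been read from the offline data system; the function names are hypothetical and the 80% default is the example value given above.

```python
def reply_rate(sent_count, reply_count):
    """Replies received divided by mails sent; 0.0 when nothing was sent."""
    if sent_count <= 0:
        return 0.0
    return reply_count / sent_count


def is_trusted_by_reply_rate(sent_count, reply_count, threshold=0.8):
    """True when the reply rate is greater than or equal to the preset threshold."""
    return reply_rate(sent_count, reply_count) >= threshold


# Hypothetical counts for one sending IP address.
print(is_trusted_by_reply_rate(100, 85))  # True
print(is_trusted_by_reply_rate(100, 50))  # False
```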
In the fifth mode, whether the sending IP address is a trusted IP address is determined according to the associated features corresponding to the sending IP address and the mail opening rate. The specific mode is as follows:
and acquiring the sending IP address with the use frequency higher than the preset frequency in the preset time period from the offline data system.
And then, judging whether the system identification of the sender and the characteristics of the mail header corresponding to the IP address are stable.
If yes, judging whether the opening rate of the mail sent by the sending IP address is higher than a preset opening rate threshold value.
The preset opening rate threshold here may be set to about 30% to 40%.
And when the opening rate of the mail sent by the sending IP address is higher than a preset opening rate threshold value and no bad record exists in the sending IP address, the sending IP address is used as a credible IP address and stored into a credible IP address set in the characteristic matching model.
If a mail sending domain has used certain IP addresses for sending over a long period, with the first-level domain name of the HELO domain and the mail header features remaining stable, the mail not being treated as junk mail, and the mail opening rate being maintained, this indicates the absence of complaints about counterfeiting or poor quality, and the IP address can therefore be considered a trusted IP address.
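The fifth mode can be sketched as a chain of checks. The record type `IpStats`, its field names, and the `min_sends` frequency threshold are assumptions made only for illustration; how stability and bad records are determined is left to the offline data system.

```python
from dataclasses import dataclass


@dataclass
class IpStats:
    """Hypothetical per-IP summary taken from the offline data system."""
    send_count: int        # mails sent within the preset time period
    features_stable: bool  # sender system identifier and headers stayed stable
    open_rate: float       # fraction of sent mails that were opened
    has_bad_record: bool   # any bad record for this IP address


def is_trusted_by_open_rate(stats, min_sends=1000, open_rate_threshold=0.3):
    """Apply the checks of the fifth mode in the order described."""
    if stats.send_count <= min_sends:
        return False  # usage frequency not higher than the preset frequency
    if not stats.features_stable:
        return False
    return stats.open_rate > open_rate_threshold and not stats.has_bad_record


good = IpStats(send_count=5000, features_stable=True, open_rate=0.35, has_bad_record=False)
print(is_trusted_by_open_rate(good))  # True
```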
The above describes obtaining the trusted IP address set and the feature matching conditions in five preset modes. Preferably, the trusted IP address sets and feature matching conditions obtained in all five preset modes are collected together; that is, the trusted IP address set and the feature matching conditions included in the feature matching model are obtained through the five preset modes.
And S102-3, storing the acquired feature matching conditions, the credible IP address set and the associated features corresponding to the credible IP addresses in the feature matching model.
And finally, in the process of training the feature matching model, storing the acquired feature matching conditions, the credible IP address set and the associated features corresponding to the credible IP addresses in the feature matching model to form the feature matching model.
This completes the introduction of the feature matching model. The following describes the specific method and process of screening an e-mail to be screened with the feature matching model.
As described in the model training process above, the feature matching model contains a trusted IP address set and the matching conditions for the corresponding features. A matching degree test is performed between the features to be screened in the preset feature set and the corresponding features in the feature matching conditions, so as to judge whether the features to be screened in the preset feature set satisfy the feature matching conditions.
The specific matching judgment mode can be introduced and explained correspondingly according to different features to be screened.
In the first screening method, the feature to be screened in the preset feature set extracted in the step of extracting the preset feature set of the received e-mail to be screened is the sending IP address to be screened.
Correspondingly, the step of performing matching degree test on the features to be screened in the preset feature set and the corresponding features in the feature matching conditions based on the feature matching conditions in the pre-trained feature matching model, and judging whether the features to be screened in the preset feature set meet the feature matching conditions includes:
and setting the feature matching condition in the pre-trained feature matching model to judging whether a given IP address belongs to the trusted IP address set in the feature matching model.
That is, the matching condition is first set to whether the address belongs to the trusted IP address set in the feature matching model.
And judging, according to the set feature matching condition, whether the sending IP address to be screened in the preset feature set belongs to the trusted IP address set.
If yes, the sending IP address to be screened satisfies the feature matching condition, and the screening process may end at this point. If not, the sending IP address to be screened does not satisfy this feature matching condition, and judgment continues with the other feature matching conditions.
The discrimination method is that when the mail sending IP address belongs to the credible IP address set, the identity of the mail sender can be considered to be credible, so that the sending IP address to be discriminated meets the characteristic matching condition, otherwise, the sending IP address to be discriminated does not meet the characteristic matching condition.
In addition, a second screening method can be adopted. It can be performed when the first screening method judges that the sending IP address to be screened does not satisfy the feature matching condition; it can of course also be used to screen the sending IP address again when the address does satisfy that condition.
The second screening method is as follows:
and the characteristics in the preset characteristic set received in the step of extracting the preset characteristic set of the received electronic mail to be screened are sending domain information and a sending IP address to be screened.
Correspondingly, the step of performing matching degree test on the features to be screened in the preset feature set and the corresponding features in the feature matching conditions based on the feature matching conditions in the pre-trained feature matching model, and judging whether the features to be screened in the preset feature set meet the feature matching conditions includes:
and resolving the sending domain of the e-mail to obtain a resolved IP address.
The feature matching condition in the pre-trained feature matching model is to judge whether the sending IP address matches the resolved IP address.
The sending IP address to be screened is taken as the sending IP address in the feature matching condition, and whether it matches the resolved IP address is judged according to the feature matching condition.
Judging whether the sending IP address to be screened matches the resolved IP address includes:
judging whether the sending IP address to be screened and the resolved IP address share a preset number of identical digits.
If yes, the sending IP address to be screened meets the feature matching condition.
If not, the sending IP address to be screened does not meet the feature matching condition.
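One possible reading of "a preset number of identical digits" is that the two addresses agree on a preset number of leading bits. The sketch below assumes that reading and assumes the sending domain has already been resolved to a list of addresses; `share_prefix`, `matches_resolved`, and the 24-bit default are hypothetical choices.

```python
import ipaddress


def share_prefix(ip_a, ip_b, prefix_bits=24):
    """Return True if both addresses agree on their first `prefix_bits` bits."""
    a = ipaddress.ip_address(ip_a)
    b = ipaddress.ip_address(ip_b)
    if a.version != b.version:
        return False
    if not 0 <= prefix_bits <= a.max_prefixlen:
        raise ValueError("prefix_bits out of range for this address family")
    shift = a.max_prefixlen - prefix_bits
    return (int(a) >> shift) == (int(b) >> shift)


def matches_resolved(sending_ip, resolved_ips, prefix_bits=24):
    """True if the sending IP shares the prefix with any resolved IP address."""
    return any(share_prefix(sending_ip, r, prefix_bits) for r in resolved_ips)


# Documentation-range addresses used purely as sample data.
print(matches_resolved("192.0.2.10", ["198.51.100.1", "192.0.2.200"]))  # True
```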
In addition, a third screening method can be adopted. It can be performed when the second screening method judges that the sending IP address to be screened does not satisfy the feature matching condition; it can of course also be used to screen and verify the sending IP address again when the address does satisfy that condition.
The third screening method is as follows:
and the characteristics in the preset characteristic set received in the step of extracting the preset characteristic set of the received electronic mail to be screened are sending domain information, a sending IP address to be screened and a sender system identification.
Correspondingly, the step of performing matching degree test on the features to be screened in the preset feature set and the corresponding features in the feature matching conditions based on the feature matching conditions in the pre-trained feature matching model, and judging whether the features to be screened in the preset feature set meet the feature matching conditions includes:
and the characteristic matching condition in the pre-trained characteristic matching model is to judge whether the sender system identification is matched with the sending IP address.
The sending IP address to be screened is taken as the sending IP address in the feature matching condition, and whether it matches the sender system identifier is judged according to the feature matching condition.
Judging whether the sending IP address to be screened matches the sender system identifier includes:
judging whether the first-level domain name of the sending domain is the same as the first-level domain name of the sender system identifier; if yes, judging whether the IP address resolved from the sender system identifier matches the sending IP address to be screened.
If yes, the sending IP address to be screened meets the feature matching condition; if not, the sending IP address to be screened does not meet the feature matching condition.
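The two-stage check of the third screening method can be sketched as below. As simplifying assumptions, the first-level domain name is taken to be the last two labels of a host name (which does not handle suffixes such as .com.cn), the addresses resolved from the sender system identifier are passed in rather than looked up, and the address comparison defaults to exact equality; all names are hypothetical.

```python
def first_level_domain(hostname, labels=2):
    """Naive first-level domain: the last `labels` labels of the host name."""
    parts = hostname.rstrip(".").lower().split(".")
    return ".".join(parts[-labels:])


def same_address(ip_a, ip_b):
    """Default comparison; a prefix comparison could be supplied instead."""
    return ip_a == ip_b


def helo_matches_sending_ip(sending_domain, helo_domain, helo_resolved_ips,
                            sending_ip, ip_match=same_address):
    """First compare first-level domains, then compare the resolved addresses."""
    if first_level_domain(sending_domain) != first_level_domain(helo_domain):
        return False
    return any(ip_match(sending_ip, r) for r in helo_resolved_ips)


# Sample data only: example.com and documentation-range addresses.
print(helo_matches_sending_ip(
    "example.com", "mx1.mail.example.com", ["192.0.2.10"], "192.0.2.10"))  # True
```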
In addition, a fourth screening method can be adopted, and the method can be performed under the condition that the sending IP address to be screened is judged not to meet the feature matching condition in the step of the third or second screening method, and of course, the sending IP address can also be screened and verified again when the sending IP address to be screened meets the feature matching condition.
The fourth screening method is as follows:
the characteristics in the preset characteristic set received in the step of extracting the preset characteristic set of the received electronic mail to be screened are the originating IP address to be screened and the correlation characteristics (the system identifier of the originator and the mail header characteristics) of the originating IP address.
Correspondingly, the step of performing matching degree test on the features to be screened in the preset feature set and the corresponding features in the feature matching conditions based on the feature matching conditions in the pre-trained feature matching model, and judging whether the features to be screened in the preset feature set meet the feature matching conditions includes:
and judging whether the associated characteristics of the transmitting IP address to be screened are matched with the associated characteristics corresponding to the credible IP addresses in the credible IP address set.
Judging whether the associated features of the sending IP address to be screened match the associated features corresponding to the trusted IP addresses in the trusted IP address set includes:
and judging whether the similarity between the associated features of the transmitting IP address to be screened and the associated features corresponding to the credible IP address is greater than or equal to a preset similarity threshold value.
If yes, the sending IP address to be screened meets the feature matching condition; if not, the sending IP address to be screened does not meet the feature matching condition.
In addition, in the screening methods above, after the first screening method is completed, the second to fourth screening methods can be used to further screen the sending IP address regardless of whether it was judged to belong to the trusted IP address set, so as to ensure the accuracy of the credibility judgment.
For example, the features in the preset feature set extracted in the step of extracting the preset feature set of the received e-mail to be screened further include: the sending domain information, the sender system identifier, and the mail header features.
Correspondingly, when the judgment result of the step of judging whether the sending IP address to be screened in the preset feature set information belongs to the credible IP address set is negative, the following steps are executed:
and resolving the sending domain information to obtain a resolved IP address.
And judging whether the sending IP address to be screened and the resolved IP address share a preset number of identical digits.
When the sending IP address to be screened and the resolved IP address do not share a preset number of identical digits, the following step is executed:
judging whether the similarity between the associated features (the sender system identifier and the mail header features) of the sending IP address to be screened and the associated features corresponding to the trusted IP addresses is greater than or equal to a preset similarity threshold.
If yes, the sending IP address to be screened meets the feature matching condition.
In another mode, the features in the preset feature set extracted in the step of extracting the preset feature set of the received e-mail to be screened further include: the sending domain information, the sender system identifier, and the mail header features.
Correspondingly, after the step in which the sending IP address to be screened satisfies the feature matching condition, the method includes:
resolving the sending domain information to obtain a resolved IP address.
And judging whether the sending IP address to be screened and the resolved IP address share a preset number of identical digits.
When the sending IP address to be screened and the resolved IP address share a preset number of identical digits, the following step is executed:
judging whether the similarity between the associated features (the sender system identifier and the mail header features) of the sending IP address to be screened and the associated features corresponding to the trusted IP addresses is greater than or equal to a preset similarity threshold.
If yes, the sending IP address to be screened meets the feature matching condition.
In the above manner, even when the sending IP address belongs to the trusted IP address set, further judgment is still performed so that a stricter criterion is applied; the more feature matching conditions the same sending IP address satisfies at the same time, the higher the credibility of that IP address.
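The two combined flows described above can be summarised in one sketch. It assumes the individual results (membership in the trusted set, the digit comparison, and the feature similarity) have been computed beforehand. The behaviour of the stricter flow when the digit comparison fails is not stated in the text; rejecting in that case is an assumption, as are the function name `screen` and the 0.7 default threshold.

```python
def screen(in_trusted_set, prefix_match, similarity, threshold=0.7, strict=False):
    """Combine the screening results into a single pass/fail decision."""
    if in_trusted_set:
        if not strict:
            return True  # first screening method alone is sufficient
        # Stricter flow: a trusted IP must also pass both further checks.
        # Rejecting when prefix_match is False is an assumption.
        return prefix_match and similarity >= threshold
    if prefix_match:
        return True  # second screening method satisfied
    # Otherwise fall back to the associated-feature similarity.
    return similarity >= threshold


print(screen(False, False, 0.71))              # True
print(screen(True, True, 0.50, strict=True))   # False
```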
And step S103, identifying the identity characteristics of the sender of the electronic mail to be screened as a credible mail sender.
The matching degree of the extracted preset feature set is tested against each of the matching conditions in the pre-trained feature matching model to judge whether the features to be screened in the preset feature set satisfy the feature matching conditions; when they do, the identity characteristic of the sender of the e-mail to be screened is identified as a trusted mail sender.
In summary, the method provided by the first embodiment of the present application does not require the mail sender to configure anything; instead, it discriminates the authenticity and credibility of the mail sender's identity by collecting and matching information on the sending behavior of the mail. The method therefore does not depend on any sender-side configuration, is applicable to e-mail sent by any sender, and can discriminate the identity characteristics of the sender more accurately.
In addition, in practical applications, the method provided in the first embodiment of the present application can also be applied in combination with the SPF method or DKIM method in the prior art.
Corresponding to the method for screening identity characteristics of an e-mail sender provided in the first embodiment of the present application, a second embodiment of the present application further provides an apparatus for screening identity characteristics of an e-mail sender. Fig. 3 is a schematic structural diagram of the apparatus provided in the second embodiment of the present application. Referring to fig. 3, the apparatus includes:
a preset feature set extracting unit 201, configured to extract a preset feature set of a received email to be screened;
a feature matching condition determining unit 202, configured to perform a matching degree test on the features to be screened in the preset feature set and the corresponding features in the feature matching conditions according to feature matching conditions in a pre-trained feature matching model, and determine whether the features to be screened in the preset feature set satisfy the feature matching conditions;
and the identity characteristic screening unit 203 is configured to, if the judgment result of the characteristic matching condition judgment unit is yes, screen the identity characteristic of the sender of the electronic mail to be screened as a trusted mail sender.
Preferably, the feature matching condition determining unit further includes: a feature matching model training subunit;
the feature matching model training subunit further includes:
a feature matching condition and trusted IP address obtaining subunit, configured to obtain the feature matching condition and the trusted IP address in a preset mode based on data of the offline data system within a preset time period; the trusted IP address is stored into a trusted IP address set;
the associated feature acquiring subunit is used for acquiring the associated feature corresponding to the trusted IP address;
and the storage subunit is used for storing the acquired feature matching conditions, the credible IP address set and the associated features corresponding to the credible IP addresses in the feature matching model.
Preferably, the preset mode in the feature matching condition and trusted IP address obtaining subunit includes at least one of the following units:
a sending domain judging subunit, configured to judge, according to the sending domain and the originating IP address, whether the originating IP address is a trusted IP address;
a sender system identifier judging subunit, configured to judge, according to the sender system identifier and the sending IP address, whether the sending IP address is a trusted IP address;
an associated feature similarity judging subunit, configured to judge whether the sending IP address is a trusted IP address according to the similarity between the associated features of the sending IP address and the associated features corresponding to the trusted IP address;
a reply rate judging subunit, configured to judge whether the sending IP address is a trusted IP address according to the reply rate corresponding to the mail sent by the sending IP address;
and an opening rate judging subunit, configured to judge whether the sending IP address is a trusted IP address according to the associated features corresponding to the sending IP address and the mail opening rate.
Preferably, the sending domain determining subunit includes:
a sending IP address obtaining subunit, configured to obtain a sending IP address of the email from the offline data system;
the analysis subunit is used for analyzing the sending domain of the e-mail to obtain an analyzed IP address;
a judging subunit, configured to judge whether the originating IP address matches the resolved IP address;
the credible IP address generating subunit is used for taking the IP address corresponding to the transmitting IP address as a credible IP address if the judgment result of the judging subunit is positive;
the feature matching condition obtained in this preset mode is to judge whether the sending IP address matches the IP address resolved from the sending domain.
Preferably, the judging subunit includes:
a digit judging subunit, configured to judge whether there is a preset number of identical digits between the two addresses of the originating IP address and the resolved IP address;
if yes, the sending IP address is matched with the analyzed IP address information;
if not, the sending IP address is not matched with the analyzed IP address information.
Preferably, the originator system identification determining subunit includes:
a sending IP address and sender system identification obtaining subunit, configured to obtain a sending IP address and a sender system identification of an e-mail sending domain from an offline data system;
a judging subunit, configured to judge whether the originator system identifier matches the originating IP address;
if yes, the IP address corresponding to the transmitting IP address is used as a credible IP address;
the feature matching condition obtained by adopting the preset mode is to judge whether the system identification of the sender of the electronic mail is matched with the IP address of the sender.
Preferably, the judging subunit includes:
a first-level domain name judging subunit, configured to judge whether a first-level domain name of an email sending domain is the same as a first-level domain name of the originator system identifier;
a matching judgment subunit, configured to judge whether the IP address resolved by the originator system identifier matches the originating IP address if the judgment result of the primary domain name judgment subunit is yes;
when the resolved IP address and the sending IP address share a preset number of identical digits, the sender system identifier matches the sending IP address;
when the resolved IP address and the sending IP address do not share a preset number of identical digits, the sender system identifier does not match the sending IP address.
Preferably, the association characteristic similarity determination subunit includes:
the associated characteristic acquiring subunit is used for acquiring the sending IP address of the email and the associated characteristic of the sending IP address from the offline data system; the associated characteristics comprise sender system identification and mail header characteristics;
the judging subunit is used for judging whether the associated characteristics of the sending IP address are matched with the associated characteristics corresponding to the credible IP addresses in the credible IP address set;
if yes, the IP address corresponding to the transmitting IP address is used as a credible IP;
the feature matching condition obtained in the preset mode is to judge whether the associated feature of the sending IP address is matched with the associated feature corresponding to the credible IP address in the credible IP address set.
Preferably, the judging subunit includes:
a similarity obtaining subunit, configured to calculate, by using a similarity measurement method for feature vectors, a similarity between the associated feature of the originating IP address and a feature vector corresponding to the associated feature in the trusted IP address set;
a threshold judgment subunit, configured to judge whether the similarity is greater than or equal to a preset similarity threshold;
if so, the correlation characteristic of the transmitting IP address is matched with the correlation characteristic corresponding to the credible IP address in the credible IP address set;
if not, the associated characteristics of the transmitting IP address are not matched with the associated characteristics corresponding to the credible IP addresses in the credible IP address set.
Preferably, the similarity obtaining subunit includes:
an associated feature acquiring subunit, configured to acquire the associated features corresponding to the trusted IP address set and the associated features of the sending IP address;
the characteristic vector generating subunit is used for forming a characteristic vector by the characteristics in the associated characteristics corresponding to the credible IP address set and the weight corresponding to each characteristic;
a same feature obtaining subunit, configured to obtain a same feature in association features of the originating IP address, where the association features correspond to the trusted IP address set;
the weight calculation subunit is used for calculating the weight of the same characteristic;
and the similarity obtaining subunit is configured to compare the weight of the same feature with a total weight of all features, and obtain a similarity between the associated feature of the originating IP address and a feature vector of the associated feature corresponding to the trusted IP address set.
Preferably, the reply rate judging subunit includes:
the sending electronic mail quantity obtaining subunit is used for obtaining the quantity of the electronic mails sent by adopting the sending IP address from the offline data system;
a received mail quantity acquiring subunit, configured to acquire the quantity of the mails received by the originating IP address within the preset time period;
a reply rate calculating subunit, configured to obtain a reply rate of the outgoing IP address according to the number of the sent emails and the number of the received emails;
a threshold judgment subunit, configured to judge whether the reply rate is greater than or equal to a preset reply rate threshold;
if yes, the sending IP address is used as a credible IP address.
Preferably, the opening rate judging subunit includes:
a sending IP address obtaining subunit, configured to obtain, from the offline data system, a sending IP address whose usage frequency is higher than a preset frequency within a preset time period;
a stability judging subunit, configured to judge whether the sender system identifier and the mail header feature corresponding to the sending IP address are stable;
a preset opening rate threshold judging subunit, configured to, if the judgment result of the stability judging subunit is yes, judge whether the opening rate of the email sent by the sending IP address is higher than a preset opening rate threshold;
and the credible IP address generating subunit is used for taking the sending IP address as the credible IP address when the opening rate of the mail sent by the sending IP address is higher than a preset opening rate threshold value and no bad record exists in the sending IP address.
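The opening rate criteria above can be sketched as a chain of checks. This is only an illustrative sketch: the `IpStats` record, its field names, and both default thresholds are assumptions introduced here, and the opening rate is assumed to be opened mails divided by sent mails.

```python
from dataclasses import dataclass


@dataclass
class IpStats:
    """Hypothetical per-IP summary derived from the offline data system."""
    use_count: int         # how often the IP was used in the preset period
    features_stable: bool  # sender system identifier and header features stable
    sent: int              # mails sent from this IP
    opened: int            # of those, mails that were opened
    has_bad_record: bool   # any bad record associated with the IP


def is_trusted_by_open_rate(stats: IpStats,
                            min_use_count: int = 1000,
                            open_rate_threshold: float = 0.2) -> bool:
    # Step 1: only frequently used IP addresses are considered.
    if stats.use_count <= min_use_count:
        return False
    # Step 2: identifier and header features must be stable.
    if not stats.features_stable:
        return False
    # Step 3: the opening rate must exceed the threshold, with no bad record.
    if stats.sent == 0:
        return False
    open_rate = stats.opened / stats.sent
    return open_rate > open_rate_threshold and not stats.has_bad_record
```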
Preferably, the features to be screened in the preset feature set extracted by the preset feature set extracting unit are the originating IP addresses to be screened;
correspondingly, the feature matching condition judgment unit includes:
a certain IP address judging subunit, configured to judge whether a certain IP address belongs to the set of trusted IP addresses according to a feature matching condition in the pre-trained feature matching model;
a credible IP address set judging subunit, configured to judge, according to the set feature matching condition, whether the originating IP address to be screened belongs to the credible IP address set;
if yes, the sending IP address to be screened meets the feature matching condition.
Preferably, when the determination result of the trusted IP address set determining subunit is negative, the method further includes:
the analyzed IP address acquisition subunit is used for analyzing the sending domain of the e-mail to obtain an analyzed IP address;
an IP address matching judgment subunit, configured to judge whether the originating IP address to be screened matches the resolved IP address according to the feature matching condition;
if yes, the sending IP address to be screened meets the feature matching condition.
Preferably, when the judgment result of the IP address matching judgment subunit is negative, the method further includes:
the transmitting IP address and the transmitting party system identification matching judgment subunit is used for judging whether the transmitting IP address to be screened is matched with the transmitting party system identification according to the characteristic matching condition;
if yes, the sending IP address to be screened meets the feature matching condition.
Preferably, when the judgment result of the originating IP address and the originator system identifier matching judgment subunit is negative, the method further includes:
the associated feature matching judgment subunit is used for judging whether the associated features of the originating IP address to be screened are matched with the associated features corresponding to the credible IP addresses in the credible IP address set or not according to the feature matching conditions;
if yes, the sending IP address to be screened meets the feature matching condition.
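The judgment subunits above form a cascade: membership in the trusted IP address set is tested first, and each later check runs only when the previous one fails. A minimal sketch of that control flow follows. The function name, the stage labels, and the representation of each later check as a callable are assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence, Set, Tuple

Check = Tuple[str, Callable[[str], bool]]


def first_matching_stage(ip: str,
                         trusted_ips: Set[str],
                         fallback_checks: Sequence[Check]) -> Optional[str]:
    """Return the label of the first stage the IP satisfies, or None."""
    # Stage 1: direct membership in the trusted IP address set.
    if ip in trusted_ips:
        return "trusted_set"
    # Later stages (sending-domain match, sender system identifier match,
    # associated-feature match) are tried in order, stopping at the first hit.
    for label, check in fallback_checks:
        if check(ip):
            return label
    return None
```

A caller would treat any non-None result as "the originating IP address to be screened meets the feature matching condition".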
Preferably, after the trusted IP address generating subunit, the apparatus further includes:
the associated feature matching judgment subunit is used for judging whether the associated features of the originating IP address to be screened are matched with the associated features corresponding to the credible IP addresses in the credible IP address set or not;
if yes, the sending IP address to be screened meets the feature matching condition.
Although the present application has been described with reference to the preferred embodiments, these embodiments are not intended to limit the present application. Those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application; therefore, the scope of protection of the present application should be determined by the claims that follow.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims (38)

1. A method for screening the identity of an e-mail sender, comprising:
extracting a preset feature set of the received e-mail to be screened;
according to a feature matching condition in a pre-trained feature matching model, carrying out matching degree test on the features to be screened in the preset feature set and the corresponding features in the feature matching condition, and judging whether the features to be screened in the preset feature set meet the feature matching condition; the feature matching model comprises the feature matching conditions, a credible IP address set and associated features corresponding to credible IP addresses in the credible IP address set; if so, the identity characteristics of the sender of the electronic mail to be screened are screened as a credible mail sender;
the training mode of the pre-trained feature matching model comprises the following steps:
acquiring a characteristic matching condition and a credible IP address in a preset mode based on data in a preset time period of an offline data system; the credible IP address is stored into a credible IP address set;
acquiring the associated characteristics corresponding to the credible IP address;
storing the obtained feature matching conditions, the credible IP address set and the associated features corresponding to the credible IP addresses in the feature matching model;
the trusted IP address set refers to an address set storing IP addresses that are trusted and authentic, the trusted IP addresses being obtained through training and learning on offline data.
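The training steps of claim 1 can be pictured as filling a small container with three parts: the feature matching conditions, the trusted IP address set, and the associated features of each trusted IP address. The sketch below is one possible shape under stated assumptions. The class name, the record keys (`ip`, `system_id`, `header`), and the `is_trusted` predicate standing in for the "preset manner" are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set

Record = Dict[str, str]


@dataclass
class FeatureMatchingModel:
    conditions: List[str] = field(default_factory=list)
    trusted_ips: Set[str] = field(default_factory=set)
    associated_features: Dict[str, Dict[str, str]] = field(default_factory=dict)


def train_model(records: Iterable[Record],
                is_trusted: Callable[[Record], bool],
                condition_name: str) -> FeatureMatchingModel:
    """Build a model from offline records of the preset time period."""
    model = FeatureMatchingModel(conditions=[condition_name])
    for rec in records:
        if is_trusted(rec):
            ip = rec["ip"]
            # Store the trusted IP and the associated features observed with it.
            model.trusted_ips.add(ip)
            model.associated_features[ip] = {
                "system_id": rec["system_id"],
                "header": rec["header"],
            }
    return model
```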
2. The method for screening identity characteristics of an email sender according to claim 1, wherein in the step of obtaining the trusted IP address in a preset manner, the preset manner includes at least one of the following manners:
judging whether the transmitting IP address is a credible IP address or not according to the transmitting domain and the transmitting IP address;
judging whether the originating IP address is a trusted IP address according to the sender system identifier and the originating IP address;
judging whether the originating IP address is a credible IP address or not according to the similarity of the associated characteristics of the originating IP address and the associated characteristics corresponding to the credible IP address;
judging whether the sending IP address is a credible IP address or not according to the reply rate corresponding to the mail sent by the sending IP address;
and judging whether the IP address is a credible IP address or not according to the correlation characteristics corresponding to the IP address and the mail opening rate.
3. The method for screening identity characteristics of an e-mail sender according to claim 2, wherein the manner of determining whether the originating IP address is a trusted IP address according to the sending domain and the originating IP address comprises:
acquiring a sending IP address of the e-mail from an offline data system;
analyzing a sending domain of the e-mail to obtain an analyzed IP address;
judging whether the transmitting IP address is matched with the analyzed IP address;
if yes, the IP address corresponding to the transmitting IP address is used as a credible IP address;
the feature matching condition obtained in the preset manner is to judge whether the originating IP address matches the IP address obtained by resolving the sending domain.
4. The method for screening identity characteristics of an e-mail sender according to claim 3, wherein in the step of determining whether the originating IP address matches the analyzed IP address, the manner of determining whether the originating IP address matches the analyzed IP address includes:
judging whether the originating IP address and the resolved IP address have a preset number of identical bits;
if yes, the sending IP address is matched with the analyzed IP address information;
if not, the sending IP address is not matched with the analyzed IP address information.
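One way to read the matching rule of claim 4 is as a prefix comparison. The sketch below assumes that the "preset number of identical bits" means the leading bits of the two addresses, and uses 24 as a hypothetical default. The claim itself fixes neither the position of the bits nor their number, so both are assumptions.

```python
import ipaddress


def ips_match(ip_a: str, ip_b: str, prefix_bits: int = 24) -> bool:
    """True when both addresses share the same leading prefix_bits bits."""
    a = ipaddress.ip_address(ip_a)
    b = ipaddress.ip_address(ip_b)
    # An IPv4 address never matches an IPv6 address in this sketch.
    if a.version != b.version:
        return False
    total_bits = a.max_prefixlen  # 32 for IPv4, 128 for IPv6
    if not 0 <= prefix_bits <= total_bits:
        raise ValueError("prefix_bits out of range for this address family")
    shift = total_bits - prefix_bits
    # Dropping the low-order bits leaves only the prefix to compare.
    return (int(a) >> shift) == (int(b) >> shift)
```

With the default of 24 bits, `ips_match("192.0.2.10", "192.0.2.200")` holds because both addresses lie in the same /24 range.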
5. The method for screening the identity of the sender of the e-mail according to claim 3, wherein in the step of analyzing the sending domain of the e-mail, the sending domain analyzing method comprises any one of the following methods:
resolving the sending domain by means of an IP address record;
and resolving the sending domain by means of a mail exchange record.
6. The method for screening identity characteristics of an e-mail sender according to claim 2, wherein the step of determining whether the originating IP address is a trusted IP address according to the originator system identifier and the originating IP address comprises:
acquiring a transmitting IP address and a transmitting party system identification of an e-mail transmitting domain from an offline data system;
judging whether the sender system identification is matched with the sender IP address;
if yes, the IP address corresponding to the transmitting IP address is used as a credible IP address;
the feature matching condition obtained by adopting the preset mode is to judge whether the system identification of the sender of the electronic mail is matched with the IP address of the sender.
7. The method of screening email senders' identity of claim 6, wherein in said step of determining if said sender system identification matches said sender IP address, determining if said sender system identification matches said sender IP address comprises:
judging whether the first-level domain name of the e-mail sending domain is the same as the first-level domain name of the sender system identification;
if yes, judging whether the IP address analyzed by the sender system identification is matched with the sending IP address;
when the resolved IP address and the originating IP address have a preset number of identical bits, the sender system identifier matches the originating IP address;
when the resolved IP address and the originating IP address do not have a preset number of identical bits, the sender system identifier does not match the originating IP address.
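The two-step check of claim 7 can be sketched as below. Several points are assumptions rather than statements from the source: the "first-level domain name" is approximated naively as the last two labels of a host name (which is wrong for suffixes such as co.uk), the resolver is passed in as a hypothetical callable so that no network access is needed, only IPv4 is handled, and the 24-bit default is illustrative.

```python
import ipaddress
from typing import Callable, Iterable


def first_level_domain(host: str) -> str:
    """Naive approximation: the last two labels of the host name."""
    labels = host.strip(".").lower().split(".")
    return ".".join(labels[-2:])


def same_prefix(ip_a: str, ip_b: str, bits: int = 24) -> bool:
    """IPv4-only comparison of the leading bits of two addresses."""
    shift = 32 - bits
    a = int(ipaddress.IPv4Address(ip_a))
    b = int(ipaddress.IPv4Address(ip_b))
    return (a >> shift) == (b >> shift)


def identifier_matches_ip(sending_domain: str,
                          system_id: str,
                          originating_ip: str,
                          resolve: Callable[[str], Iterable[str]],
                          bits: int = 24) -> bool:
    # Step 1: both names must share the same first-level domain.
    if first_level_domain(sending_domain) != first_level_domain(system_id):
        return False
    # Step 2: some address resolved from the identifier must share the prefix.
    return any(same_prefix(originating_ip, ip, bits) for ip in resolve(system_id))
```

In practice `resolve` would wrap a DNS lookup; here it is left to the caller so the sketch stays self-contained.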
8. The method for screening identity characteristics of an e-mail sender according to claim 2, wherein the manner of judging whether the originating IP address is a trusted IP address according to the similarity between the associated characteristics of the originating IP address and the associated characteristics corresponding to the trusted IP address comprises:
acquiring a sending IP address of an E-mail and the correlation characteristics of the sending IP address from an offline data system; the associated characteristics comprise sender system identification and mail header characteristics;
judging whether the correlation characteristics of the transmitting IP address are matched with the correlation characteristics corresponding to the credible IP addresses in the credible IP address set or not;
if yes, the IP address corresponding to the transmitting IP address is used as a credible IP;
the feature matching condition obtained in the preset mode is to judge whether the associated feature of the sending IP address is matched with the associated feature corresponding to the credible IP address in the credible IP address set.
9. The method for screening identity characteristics of email senders according to claim 8, wherein in said step of determining whether the association characteristic of the originating IP address matches the association characteristic corresponding to the trusted IP address in the set of trusted IP addresses, the manner of determining whether the association characteristic of the originating IP address matches the association characteristic corresponding to the trusted IP address in the set of trusted IP addresses includes:
calculating the similarity of the associated feature of the transmitting IP address and the feature vector corresponding to the associated feature corresponding to the credible IP address set by adopting a feature vector similarity measurement method;
judging whether the similarity is greater than or equal to a preset similarity threshold value or not;
if so, the correlation characteristic of the transmitting IP address is matched with the correlation characteristic corresponding to the credible IP address in the credible IP address set;
if not, the associated characteristics of the transmitting IP address are not matched with the associated characteristics corresponding to the credible IP addresses in the credible IP address set.
10. The method for screening identity characteristics of an email sender according to claim 9, wherein the similarity measurement method of the feature vector specifically comprises:
acquiring the associated features corresponding to the trusted IP address set and the associated features of the originating IP address;
forming a feature vector from the features in the associated features corresponding to the trusted IP address set and the weight corresponding to each feature;
acquiring, from the associated features of the originating IP address, the features that are identical to the associated features corresponding to the trusted IP address set;
calculating the weight of the identical features;
and comparing the weight of the identical features with the total weight of all the features to obtain the similarity between the associated features of the originating IP address and the feature vector of the associated features corresponding to the trusted IP address set.
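The similarity measure of claim 10 amounts to the weight of the shared features divided by the total weight of all features in the trusted feature vector. A minimal sketch follows. Representing features as strings and the feature vector as a feature-to-weight mapping is an assumption, as are the function name and the example feature labels.

```python
from typing import Dict, Set


def weighted_similarity(candidate_features: Set[str],
                        trusted_weights: Dict[str, float]) -> float:
    """Weight of features shared with the trusted set over the total weight."""
    total = sum(trusted_weights.values())
    if total == 0:
        # An empty or zero-weight vector gives no basis for similarity.
        return 0.0
    shared = sum(weight for feature, weight in trusted_weights.items()
                 if feature in candidate_features)
    return shared / total


# Hypothetical example: the identifier feature carries three times the weight
# of the header feature, and only the identifier is shared.
weights = {"system_id:mx.example.com": 3.0, "x-mailer:Foo": 1.0}
candidate = {"system_id:mx.example.com", "x-mailer:Bar"}
similarity = weighted_similarity(candidate, weights)  # 3.0 / 4.0
```

The result would then be compared against the preset similarity threshold of claim 9.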
11. The method for screening identity characteristics of e-mail senders according to claim 2, wherein the manner of judging whether the sending IP address is a trusted IP address according to the reply rate corresponding to the e-mail sent by the sending IP address comprises:
acquiring the quantity of the e-mails sent by adopting the sending IP address from an offline data system;
acquiring the number of the mails received by the sending IP address in the preset time period;
acquiring the reply rate of the sending IP address according to the quantity of the sent e-mails and the quantity of the received e-mails;
judging whether the reply rate is greater than or equal to a preset reply rate threshold value;
if yes, the sending IP address is used as a credible IP address.
12. The method for screening identity characteristics of an e-mail sender according to claim 2, wherein the manner of judging whether the originating IP address is a trusted IP address according to the correlation characteristics corresponding to the originating IP address and the mail opening rate comprises:
acquiring, from an offline data system, an originating IP address whose usage frequency is higher than a preset frequency within a preset time period;
judging whether the sender system identification and the mail header characteristic corresponding to the sending IP address are stable or not;
if yes, judging whether the opening rate of the mail sent by the sending IP address is higher than a preset opening rate threshold value or not;
and when the opening rate of the mail sent by the sending IP address is higher than a preset opening rate threshold value and no bad record exists in the sending IP address, taking the sending IP address as a credible IP address.
13. The method for screening the identity characteristics of the sender of the electronic mail according to claim 2, wherein the characteristics to be screened in the preset characteristic set received in the step of extracting the preset characteristic set of the received electronic mail to be screened are the originating IP addresses to be screened;
correspondingly, the step of performing matching degree test on the features to be screened in the preset feature set and the corresponding features in the feature matching conditions based on the feature matching conditions in the pre-trained feature matching model, and judging whether the features to be screened in the preset feature set meet the feature matching conditions includes:
the characteristic matching condition in the pre-trained characteristic matching model is to judge whether a certain IP address belongs to the credible IP address set;
judging whether the sending IP address to be screened belongs to the credible IP address set or not according to the set characteristic matching condition;
if yes, the sending IP address to be screened meets the feature matching condition.
14. The method for screening identity characteristics of an email sender according to claim 1, wherein when the determination result that whether the originating IP address to be screened belongs to the trusted IP address set is no, the determination of the feature matching condition in the feature matching model is performed again, and the feature matching condition is to determine whether the originating IP address matches the IP address after the sending domain analysis;
the specific judgment method comprises the following steps:
analyzing a sending domain of the e-mail to obtain an analyzed IP address;
judging whether the sending IP address to be discriminated is matched with the analyzed IP address according to the characteristic matching condition;
if yes, the sending IP address to be screened meets the feature matching condition.
15. The method for screening the identity characteristics of an e-mail sender according to claim 12, wherein the judging means for judging whether the originating IP address to be screened matches the resolved IP address in the step of judging whether the originating IP address to be screened matches the resolved IP address comprises:
and judging whether the sending IP address to be screened and the analyzed IP address have the same number of preset digits.
16. The method for screening identity characteristics of an email sender according to claim 14, wherein when the determination result in the step of determining whether the originating IP address to be screened matches the resolved IP address is negative, a determination of a feature matching condition in a feature matching model is performed, the feature matching condition being a determination of whether the originating system identifier of the email matches the originating IP address;
the specific judgment method comprises the following steps:
judging whether the originating IP address to be screened matches the sender system identifier according to the feature matching condition;
if yes, the sending IP address to be screened meets the feature matching condition.
17. The method for screening the identity of an e-mail sender according to claim 16, wherein the determining whether the originating IP address to be screened matches the originator system identifier comprises:
judging whether the first-level domain name of the mail sending domain is the same as the first-level domain name of the sender system identification;
if yes, judging whether the IP address analyzed by the sender system identification is matched with the sending IP address to be screened.
18. The method for screening identity characteristics of an e-mail sender according to claim 16, wherein in the step of determining whether the originating IP address to be screened matches the originator system identifier, if not, performing a determination of a feature matching condition in a feature matching model, the feature matching condition being a determination of whether an associated feature of the originating IP address matches an associated feature corresponding to a trusted IP address in a set of trusted IP addresses; the associated characteristics comprise sender system identification and mail header characteristics;
the specific judgment method comprises the following steps:
judging whether the correlation characteristics of the transmitting IP address to be screened are matched with the correlation characteristics corresponding to the credible IP addresses in the credible IP address set or not according to the characteristic matching condition;
if yes, the sending IP address to be screened meets the feature matching condition.
19. The method for screening identity characteristics of an email sender according to claim 18, wherein the step of determining whether the correlation characteristics of the originating IP address to be screened match the correlation characteristics corresponding to the trusted IP addresses in the set of trusted IP addresses includes:
and judging whether the similarity between the associated features of the transmitting IP address to be screened and the associated features corresponding to the credible IP address is greater than or equal to a preset similarity threshold value.
20. The method for screening the identity of an e-mail sender according to claim 13, wherein after the step of judging that the originating IP address to be screened meets the feature matching condition is executed, the method further comprises:
judging whether the correlation characteristics of the transmitting IP address to be screened are matched with the correlation characteristics corresponding to the credible IP addresses in the credible IP address set;
if yes, the sending IP address to be screened meets the feature matching condition.
21. The method for screening identity characteristics of email senders according to claim 1, wherein the receiving mode of the email to be screened in the step of extracting the preset feature set of the received email to be screened comprises: and receiving the e-mail to be screened by adopting a mail transmission agent system.
22. The method for screening identity of an e-mail sender according to claim 1, wherein the feature information included in the preset feature matching model comprises: the sender IP address, the sending domain information, the sender system identification and the mail header characteristics.
23. An apparatus for screening the identity of an e-mail sender, comprising:
the preset feature set extracting unit is used for extracting a preset feature set of the received electronic mail to be screened;
the characteristic matching condition judging unit is used for carrying out matching degree test on the characteristics to be screened in the preset characteristic set and the corresponding characteristics in the characteristic matching conditions according to the characteristic matching conditions in the pre-trained characteristic matching model, and judging whether the characteristics to be screened in the preset characteristic set meet the characteristic matching conditions or not;
the identity characteristic screening unit is used for screening the identity characteristic of the sender of the electronic mail to be screened as a credible mail sender if the judgment result of the characteristic matching condition judgment unit is positive;
the feature matching model comprises the feature matching conditions, a credible IP address set and associated features corresponding to credible IP addresses in the credible IP address set;
wherein, the feature matching condition judging unit further comprises: a feature matching model training subunit;
the feature matching model training subunit further includes:
a feature matching condition and trusted IP address obtaining subunit, configured to obtain a feature matching condition and a trusted IP address in a preset manner based on data of an offline data system within a preset time period; the trusted IP address is stored into a trusted IP address set;
the associated feature acquiring subunit is used for acquiring the associated feature corresponding to the trusted IP address;
the storage subunit is used for storing the acquired feature matching conditions, the credible IP address set and the associated features corresponding to the credible IP addresses in the feature matching model;
the trusted IP address set refers to an address set storing IP addresses that are trusted and authentic, the trusted IP addresses being obtained through training and learning on offline data.
24. The apparatus for screening identity of an e-mail sender according to claim 23, wherein the preset manner in the feature matching condition and trusted IP address obtaining subunit includes at least one of the following units:
a sending domain judging subunit, configured to judge, according to the sending domain and the originating IP address, whether the originating IP address is a trusted IP address;
a sender system identifier judging subunit, configured to judge whether the originating IP address is a trusted IP address according to the sender system identifier and the originating IP address;
an associated feature similarity judging subunit, configured to judge whether the originating IP address is a trusted IP address according to the similarity between the associated features of the originating IP address and the associated features corresponding to the trusted IP address;
a reply rate judging subunit, configured to judge whether the originating IP address is a trusted IP address according to the reply rate corresponding to the mail sent by the originating IP address;
and an opening rate judging subunit, configured to judge whether the originating IP address is a trusted IP address according to the associated features corresponding to the originating IP address and the mail opening rate.
25. The apparatus for screening the identity of an e-mail sender according to claim 24, wherein the sending domain determining subunit comprises:
a sending IP address obtaining subunit, configured to obtain a sending IP address of the email from the offline data system;
the analysis subunit is used for analyzing the sending domain of the e-mail to obtain an analyzed IP address;
a judging subunit, configured to judge whether the originating IP address matches the resolved IP address;
the credible IP address generating subunit is used for taking the IP address corresponding to the transmitting IP address as a credible IP address if the judgment result of the judging subunit is positive;
the feature matching condition obtained in the preset manner is to judge whether the originating IP address matches the IP address obtained by resolving the sending domain.
26. The apparatus for screening the identity of an e-mail sender according to claim 25, wherein the judging subunit comprises:
a digit judging subunit, configured to judge whether there is a preset number of identical digits between the two addresses of the originating IP address and the resolved IP address;
if yes, the sending IP address is matched with the analyzed IP address information;
if not, the sending IP address is not matched with the analyzed IP address information.
27. The apparatus for screening the identity of an e-mail sender according to claim 24, wherein the originator system identification determining subunit comprises:
a sending IP address and sender system identification obtaining subunit, configured to obtain a sending IP address and a sender system identification of an e-mail sending domain from an offline data system;
a judging subunit, configured to judge whether the originator system identifier matches the originating IP address;
if yes, the IP address corresponding to the transmitting IP address is used as a credible IP address;
the feature matching condition obtained by adopting the preset mode is to judge whether the system identification of the sender of the electronic mail is matched with the IP address of the sender.
28. The apparatus for screening the identity of an e-mail sender according to claim 27, wherein the judging subunit comprises:
a first-level domain name judging subunit, configured to judge whether a first-level domain name of an email sending domain is the same as a first-level domain name of the originator system identifier;
a matching judgment subunit, configured to judge whether the IP address resolved by the originator system identifier matches the originating IP address if the judgment result of the primary domain name judgment subunit is yes;
when the resolved IP address and the originating IP address have a preset number of identical bits, the sender system identifier matches the originating IP address;
when the resolved IP address and the originating IP address do not have a preset number of identical bits, the sender system identifier does not match the originating IP address.
29. The apparatus for screening identity of an e-mail sender according to claim 24, wherein the correlation characteristic similarity determining subunit comprises:
the associated characteristic acquiring subunit is used for acquiring the sending IP address of the email and the associated characteristic of the sending IP address from the offline data system; the associated characteristics comprise sender system identification and mail header characteristics;
the judging subunit is used for judging whether the associated characteristics of the sending IP address are matched with the associated characteristics corresponding to the credible IP addresses in the credible IP address set;
if yes, the IP address corresponding to the transmitting IP address is used as a credible IP;
the feature matching condition obtained in the preset mode is to judge whether the associated feature of the sending IP address is matched with the associated feature corresponding to the credible IP address in the credible IP address set.
30. The apparatus for screening the identity of an e-mail sender according to claim 29, wherein the judging subunit comprises:
a similarity obtaining subunit, configured to calculate, by using a similarity measurement method for feature vectors, the similarity between the associated features of the sending IP address and the feature vector of the associated features corresponding to the trusted IP address set;
a threshold judgment subunit, configured to judge whether the similarity is greater than or equal to a preset similarity threshold;
if yes, the associated features of the sending IP address match the associated features corresponding to the trusted IP addresses in the trusted IP address set;
if not, the associated features of the sending IP address do not match the associated features corresponding to the trusted IP addresses in the trusted IP address set.
31. The apparatus for screening the identity of an e-mail sender according to claim 30, wherein the similarity obtaining subunit comprises:
an associated feature acquiring subunit, configured to acquire the associated features corresponding to the trusted IP address set and the associated features of the sending IP address;
a feature vector generating subunit, configured to form a feature vector from the features in the associated features corresponding to the trusted IP address set and the weight corresponding to each feature;
a same feature obtaining subunit, configured to obtain the features that the associated features of the sending IP address have in common with the associated features corresponding to the trusted IP address set;
a weight calculation subunit, configured to calculate the weight of the same features;
and a similarity obtaining subunit, configured to compare the weight of the same features with the total weight of all features, to obtain the similarity between the associated features of the sending IP address and the feature vector of the associated features corresponding to the trusted IP address set.
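For illustration only, the weighted similarity and the threshold test of claims 30 and 31 can be sketched in Python. The sketch assumes that "comparing" the weight of the same features with the total weight means taking their ratio; the feature strings, the weights, the function names and the threshold of 0.8 are all hypothetical.

```python
def weighted_similarity(trusted_weights: dict, sending_features: set) -> float:
    """Ratio of the weight of shared features to the total weight of all features.

    `trusted_weights` maps each associated feature of the trusted IP address
    set to its weight (the "feature vector" of the claim).
    """
    total = sum(trusted_weights.values())
    if total <= 0:
        return 0.0  # no usable weights, so nothing can be similar
    shared = sum(w for feature, w in trusted_weights.items()
                 if feature in sending_features)
    return shared / total


def features_match(trusted_weights: dict, sending_features: set,
                   threshold: float = 0.8) -> bool:
    """Threshold judgment: similarity greater than or equal to the preset threshold."""
    return weighted_similarity(trusted_weights, sending_features) >= threshold


# Hypothetical data: a sender system identifier and two mail header features.
trusted = {"helo:mail.example.com": 3.0, "x-mailer:ExampleMailer": 1.0, "header-order:A": 1.0}
sending = {"helo:mail.example.com", "header-order:A", "x-mailer:Other"}
similarity = weighted_similarity(trusted, sending)  # shared weight 4.0 of 5.0
```

Giving the sender system identifier a larger weight than individual header features is a design choice of this sketch, not something the claims prescribe.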
32. The apparatus for screening the identity of an e-mail sender according to claim 24, wherein the reply rate determining subunit comprises:
a sent e-mail quantity obtaining subunit, configured to obtain, from the offline data system, the quantity of e-mails sent from the sending IP address;
a received mail quantity acquiring subunit, configured to acquire the quantity of mails received by the sending IP address within the preset time period;
a reply rate calculating subunit, configured to obtain the reply rate of the sending IP address according to the quantity of sent e-mails and the quantity of received mails;
a threshold judgment subunit, configured to judge whether the reply rate is greater than or equal to a preset reply rate threshold;
if yes, the sending IP address is taken as a trusted IP address.
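For illustration only, the reply rate test can be sketched in Python. The claim does not give the formula; the sketch assumes the reply rate is the quantity of received mails divided by the quantity of sent e-mails. The function names and the threshold of 0.3 are hypothetical.

```python
def reply_rate(sent_count: int, received_count: int) -> float:
    """Assumed definition: mails received divided by e-mails sent."""
    if sent_count <= 0:
        return 0.0  # nothing was sent, so no reply rate can be established
    return received_count / sent_count


def is_trusted_by_reply_rate(sent_count: int, received_count: int,
                             threshold: float = 0.3) -> bool:
    """Trusted when the reply rate is greater than or equal to the preset threshold."""
    return reply_rate(sent_count, received_count) >= threshold
```

For example, an address that sent 200 e-mails and received 80 mails in the period has a reply rate of 0.4 under this definition and passes a threshold of 0.3.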
33. The apparatus for screening the identity of an e-mail sender according to claim 24, wherein the opening rate determining subunit comprises:
a sending IP address obtaining subunit, configured to obtain, from the offline data system, a sending IP address whose usage frequency is higher than a preset frequency within a preset time period;
a stability judging subunit, configured to judge whether the sender system identifier and the mail header feature corresponding to the sending IP address are stable;
a preset opening rate threshold judging subunit, configured to, if the judgment result of the stability judging subunit is yes, judge whether the opening rate of the email sent by the sending IP address is higher than a preset opening rate threshold;
and a trusted IP address generating subunit, configured to take the sending IP address as a trusted IP address when the opening rate of the mail sent from the sending IP address is higher than the preset opening rate threshold and the sending IP address has no bad record.
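For illustration only, the opening rate test of claim 33 can be sketched in Python as a sequence of filters. The sketch assumes the opening rate is opened mails divided by sent mails and that usage frequency is measured as the number of mails sent in the period; the record fields, function name and thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SendingIpStats:
    ip: str
    send_count: int           # mails sent from this IP in the preset time period
    opened_count: int         # how many of those mails were opened
    identifiers_stable: bool  # sender system identifier and mail header features stable
    has_bad_record: bool      # any bad record for this IP


def is_trusted_by_open_rate(stats: SendingIpStats,
                            min_send_count: int = 100,
                            open_rate_threshold: float = 0.2) -> bool:
    """Apply the frequency, stability, opening rate and bad-record checks in turn."""
    if stats.send_count <= 0 or stats.send_count <= min_send_count:
        return False  # usage frequency not higher than the preset frequency
    if not stats.identifiers_stable:
        return False
    if stats.has_bad_record:
        return False
    open_rate = stats.opened_count / stats.send_count
    return open_rate > open_rate_threshold
```

With these assumptions, an address with 1000 sent mails, 350 opened, stable identifiers and no bad record is accepted, and the same address with a bad record is rejected.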
34. The apparatus for screening identity characteristics of an e-mail sender according to claim 24, wherein the features in the preset feature set received by the preset feature set extracting unit are the features to be screened, namely the sending IP address to be screened;
correspondingly, the feature matching condition judgment unit includes:
an IP address judging subunit, configured to judge, according to the feature matching condition in the pre-trained feature matching model, whether a given IP address belongs to the trusted IP address set;
a trusted IP address set judging subunit, configured to judge, according to the set feature matching condition, whether the sending IP address to be screened belongs to the trusted IP address set;
if yes, the sending IP address to be screened meets the feature matching condition.
35. The apparatus for screening the identity of an e-mail sender according to claim 34, wherein when the determination result of the trusted IP address set determination subunit is negative, the apparatus further comprises:
a resolved IP address acquisition subunit, configured to resolve the sending domain of the e-mail to obtain a resolved IP address;
an IP address matching judgment subunit, configured to judge, according to the feature matching condition, whether the sending IP address to be screened matches the resolved IP address;
if yes, the sending IP address to be screened meets the feature matching condition.
36. The apparatus for screening the identity of an e-mail sender according to claim 35, wherein when the determination result of the IP address matching determination subunit is negative, the apparatus further comprises:
a sending IP address and sender system identifier matching judgment subunit, configured to judge, according to the feature matching condition, whether the sending IP address to be screened matches the sender system identifier;
if yes, the sending IP address to be screened meets the feature matching condition.
37. The apparatus for screening the identity of an e-mail sender according to claim 36, wherein when the determination result of the sending IP address and sender system identifier matching judgment subunit is negative, the apparatus further comprises:
an associated feature matching judgment subunit, configured to judge, according to the feature matching condition, whether the associated features of the sending IP address to be screened match the associated features corresponding to the trusted IP addresses in the trusted IP address set;
if yes, the sending IP address to be screened meets the feature matching condition.
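For illustration only, claims 34 to 37 describe a cascade in which each later check runs only when the previous one fails. This can be sketched in Python as an ordered list of predicates. The function name, the check labels and the stand-in lambdas are hypothetical placeholders for the actual judgment subunits.

```python
from typing import Callable, List, Optional, Tuple

Check = Callable[[str], bool]


def first_satisfied_check(sending_ip: str,
                          checks: List[Tuple[str, Check]]) -> Optional[str]:
    """Run the checks in order; return the label of the first that passes, else None."""
    for label, check in checks:
        if check(sending_ip):
            return label  # later checks are skipped once one condition is met
    return None


# Hypothetical stand-ins for the four judgment subunits.
trusted_ips = {"198.51.100.7"}
resolved_ips = {"203.0.113.10"}  # pretend result of resolving the sending domain

checks: List[Tuple[str, Check]] = [
    ("trusted_set", lambda ip: ip in trusted_ips),    # claim 34
    ("resolved_ip", lambda ip: ip in resolved_ips),   # claim 35
    ("system_identifier", lambda ip: False),          # claim 36 (placeholder)
    ("associated_features", lambda ip: False),        # claim 37 (placeholder)
]
```

A non-`None` result corresponds to the sending IP address to be screened meeting the feature matching condition; returning the label rather than a bare boolean is a choice of this sketch that makes it visible which stage accepted the address.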
38. The apparatus for screening the identity of an e-mail sender according to claim 25, wherein the trusted IP address generation sub-unit is followed by:
an associated feature matching judgment subunit, configured to judge whether the associated features of the sending IP address to be screened match the associated features corresponding to the trusted IP addresses in the trusted IP address set;
if yes, the sending IP address to be screened meets the feature matching condition.
CN201610373221.2A 2016-05-31 2016-05-31 Method and device for discriminating identity characteristics of e-mail sender Active CN107453973B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610373221.2A CN107453973B (en) 2016-05-31 2016-05-31 Method and device for discriminating identity characteristics of e-mail sender

Publications (2)

Publication Number Publication Date
CN107453973A (en) 2017-12-08
CN107453973B (en) 2021-04-13

Family

ID=60484987

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610373221.2A Active CN107453973B (en) 2016-05-31 2016-05-31 Method and device for discriminating identity characteristics of e-mail sender

Country Status (1)

Country Link
CN (1) CN107453973B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109768916A (en) * 2018-12-29 2019-05-17 论客科技(广州)有限公司 A kind of processing method and system of mail
CN110060150A (en) * 2019-04-28 2019-07-26 宜人恒业科技发展(北京)有限公司 Credit cards Electronic bill method of discrimination and device
CN111182172A (en) * 2020-01-03 2020-05-19 北京中电飞华通信有限公司 Fax service processing method and system and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1367595A (en) * 2001-01-23 2002-09-04 联想(北京)有限公司 Method for filtering electronic mail contents in interconnection network
CN1573782A (en) * 2003-06-23 2005-02-02 微软公司 Advanced spam detection techniques
CN1614607A (en) * 2004-11-25 2005-05-11 中国科学院计算技术研究所 Filtering method and system for e-mail refuse
CN101494546A (en) * 2009-01-05 2009-07-29 东南大学 Method for preventing collaboration type junk mail

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2420391C (en) * 2003-02-28 2014-08-26 Internet Light And Power Inc. Email message filtering system and method

Also Published As

Publication number Publication date
CN107453973A (en) 2017-12-08

Similar Documents

Publication Publication Date Title
US8194564B2 (en) Message filtering method
US8135780B2 (en) Email safety determination
US7660865B2 (en) Spam filtering with probabilistic secure hashes
AU2004202268B2 (en) Origination/destination features and lists for spam prevention
US8549081B2 (en) Recognizing spam email
AU2010263086B2 (en) Real-time spam look-up system
US20080177843A1 (en) Inferring email action based on user input
CN107453973B (en) Method and device for discriminating identity characteristics of e-mail sender
JP2006344197A (en) Junk mail determination device and method
CN108683589B (en) Junk mail detection method and device and electronic equipment
KR20120049194A (en) Reducing unwanted and unsolicited electronic messages
CN110061981A (en) A kind of attack detection method and device
Gupta et al. Forensic analysis of E-mail address spoofing
Jayan et al. Detection of spoofed mails
KR102460497B1 (en) Managing method for noreply mail and unknown sender mail and system thereof
Msongaleli et al. Electronic mail forensic algorithm for crime investigation and dispute settlement
JP6266487B2 (en) Mail information extraction device, mail judgment list creation device, mail information extraction method, mail judgment list creation method, and computer program
CN113938311A (en) Mail attack tracing method and system
CN110661750B (en) Mail sender identity detection method, system, equipment and storage medium
JP7453886B2 (en) Detection device, detection method and detection program
KR102684949B1 (en) Method of detecting for mail attacks sent through accounts created by social engineering techniques and mail system accordingly
CN116866057A (en) DMARC detection method, system, equipment and storage medium based on score
KR20230089766A (en) Email reception confirmation and denial prevention system using blockchain
Obino et al. Analysis of email headers
Johansen Email Communities of Interest and Their Application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant