CN107547552B - Website reputation degree evaluation method and device based on website feature identification and relationship topology

Info

Publication number
CN107547552B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710803281.8A
Other languages
Chinese (zh)
Other versions
CN107547552A (en)
Inventor
金立峰
范渊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dbappsecurity Technology Co Ltd
Original Assignee
Hangzhou Dbappsecurity Technology Co Ltd

Landscapes

  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a website reputation evaluation method and device based on website feature identification and relationship topology, relating to the technical field of website reputation evaluation. The method comprises: obtaining a first reputation of a website to be evaluated at the current moment, wherein the first reputation is obtained from the website features of the website to be evaluated; obtaining a second reputation of the website to be evaluated at the current moment, wherein the second reputation is obtained from the topological relation of the website to be evaluated, and the topological relation is constructed from the domain name and the IP address of the website to be evaluated; and determining the target reputation of the website to be evaluated according to the first reputation and the second reputation. The invention addresses the technical problem that the prior art lacks a unified method for accurately assessing website reputation.

Description

Website reputation degree evaluation method and device based on website feature identification and relationship topology
Technical Field
The invention relates to the technical field of website credibility assessment, in particular to a website credibility assessment method and device based on website feature identification and relationship topology.
Background
Network security and informatization are two wings of one body and two wheels of one drive for the healthy and stable development of China's economy and society. Network security has been elevated to a national strategy and has become a core element of building a strong network nation. Websites on the public network play an important role across industries, particularly in content publication, interactive communication and online services, so evaluating and analysing website reputation is necessary, important and urgent for website security.
However, websites suffer from four problems: a low construction threshold, low domain name registration cost, lax content publishing requirements and weak security capability. These problems create a grey zone in which privately built websites, or legitimate websites that have been illegally intruded, fill the network with negative content such as pornography, gambling, drugs, cults, violence, anti-party and anti-social material, and illegal advertisements. Such content can induce criminal behaviour, undermine social stability and harmony, and seriously harm network security.
How to comprehensively assess the reputation of websites on the Internet remains an industry problem. On the one hand, website reputation is an intricate problem that may stem from the operator or from illegal intrusion; on the other hand, websites differ greatly in nature, standards and content. In view of this, the prior art lacks a unified method for accurately assessing website reputation.
Disclosure of Invention
In view of the above, the present invention provides a website reputation evaluation method and apparatus based on website feature identification and relationship topology, so as to alleviate the technical problem that a uniform method for accurately considering website reputation is not available in the prior art.
In a first aspect, an embodiment of the present invention provides a website reputation degree evaluation method based on website feature identification and relationship topology, including:
acquiring a first credibility of a website to be evaluated at the current moment, wherein the first credibility is the credibility obtained according to website characteristics of the website to be evaluated;
acquiring a second credit degree of the website to be evaluated at the current moment, wherein the second credit degree is the credit degree obtained according to the topological relation of the website to be evaluated, and the topological relation is the topological relation constructed according to the domain name and the IP address of the website to be evaluated;
and determining the target reputation of the website to be evaluated according to the first reputation and the second reputation.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation manner of the first aspect, where obtaining a first reputation of a website to be evaluated at a current time includes:
acquiring attribute information of the website to be evaluated, wherein the attribute information comprises: content attribute, docket attribute and link attribute;
credit degree scoring is carried out on the content attribute to obtain a first score;
scoring the record attribute with credit degree to obtain a second score;
credit degree scoring is carried out on the link attribute to obtain a third score;
and performing weighted average calculation on the first score, the second score and the third score to obtain a first weighted average, and determining the first weighted average as the first credit.
With reference to the first possible implementation manner of the first aspect, an embodiment of the present invention provides a second possible implementation manner of the first aspect, where the scoring the reputation degree of the content attribute to obtain a first score includes:
according to the content attribute, determining illegal content and interval duration, wherein the interval duration is the interval duration between the time when the illegal content appears in the website content to be evaluated at the latest time and the current time;
determining the influence weight of the illegal content on the reputation evaluation of the website to be evaluated according to the illegal content;
calculating the first score by a first formula $C_1 = C_{1tmp} \cdot a^{x}$, wherein $C_1$ represents the first score, $x$ represents the interval duration, $C_{1tmp}$ represents the influence weight, and $a$ represents a first preset parameter.
With reference to the first possible implementation manner of the first aspect, an embodiment of the present invention provides a third possible implementation manner of the first aspect, where the scoring the reputation degree of the docket attribute to obtain a second score includes:
acquiring a preset score of the record attribute according to the record attribute, wherein the preset score is a credit score preset for the record attribute;
and determining the preset score as the second score.
With reference to the first possible implementation manner of the first aspect, an embodiment of the present invention provides a fourth possible implementation manner of the first aspect, where the scoring the reputation of the link attribute to obtain a third score includes:
extracting hyperlink interfaces contained in the web pages of the websites to be evaluated from the link attributes, and determining at least one link website of the websites to be evaluated according to the hyperlink interfaces;
obtaining a reputation evaluation score of each link website in the at least one link website;
extracting a target reputation evaluation score from the plurality of reputation evaluation scores, wherein the target reputation evaluation score is the maximum value of the plurality of reputation evaluation scores;
determining the target reputation evaluation score as the third score.
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation manner of the first aspect, where the obtaining a second reputation of the website to be evaluated at the current time includes:
obtaining at least one first sub-score, and calculating a fourth score according to the at least one first sub-score, wherein each first sub-score is a reputation evaluation score of a first target website, the first target website is a website corresponding to the IP address, and the first target website is different from the website to be evaluated;
acquiring at least one second sub-score, and calculating a fifth score according to the at least one second sub-score, wherein each second sub-score is a credit evaluation score of an IP address of the website to be evaluated;
obtaining at least one third sub-score, and calculating a sixth score according to the at least one third sub-score, wherein each third sub-score is a reputation evaluation score of a second target website, the second target website is a website having the same main domain name as the website to be evaluated, and the second target website is different from the website to be evaluated;
and performing weighted average calculation on the fourth score, the fifth score and the sixth score to obtain a second weighted average, and determining the second weighted average as the second credit.
With reference to the fifth possible implementation manner of the first aspect, this embodiment of the present invention provides a sixth possible implementation manner of the first aspect, where calculating a fourth score according to the at least one first sub-score includes:
performing weighted average calculation on the at least one first sub-score to obtain a first sub-weighted average, and determining the first sub-weighted average as the fourth score.
With reference to the fifth possible implementation manner of the first aspect, this embodiment of the present invention provides a seventh possible implementation manner of the first aspect, where calculating a fifth score according to the at least one second sub-score includes:
performing weighted average calculation on the at least one second sub-score to obtain a second sub-weighted average, and determining the second sub-weighted average as the fifth score.
With reference to the fifth possible implementation manner of the first aspect, the embodiment of the present invention provides an eighth possible implementation manner of the first aspect, where calculating a sixth score according to the at least one third sub-score includes:
calculating the sixth score by a second formula (the formula itself is not reproduced in this text), wherein $C_6$ represents the sixth score, $L$ represents the number of second target websites, $C_{ni}$ represents the third sub-score of the $i$-th second target website, $P$ is a second preset parameter obtained according to the domain name type of the website to be evaluated, $\lambda_{11}$ is a first preset weight, and $\lambda_{12}$ is a second preset weight.
In a second aspect, an embodiment of the present invention further provides a website reputation degree evaluation apparatus based on website feature identification and relationship topology, including:
a first obtaining module, configured to obtain a first reputation of a website to be evaluated at the current moment, wherein the first reputation is obtained according to website features of the website to be evaluated;
the second obtaining module is used for obtaining a second credit degree of the website to be evaluated at the current moment, wherein the second credit degree is the credit degree obtained according to the topological relation of the website to be evaluated, and the topological relation is the topological relation constructed according to the domain name and the IP address of the website to be evaluated;
and the determining module is used for determining the target reputation of the website to be evaluated according to the first reputation and the second reputation.
The embodiment of the invention has the following beneficial effects: the website reputation degree evaluation method based on website feature identification and relationship topology comprises the following steps: and acquiring a first credit degree and a second credit degree of the website to be evaluated at the current moment, and then determining the target credit of the website to be evaluated according to the first credit degree and the second credit degree.
The first reputation is obtained from the website features of the website to be evaluated, that is, from characteristics of the website itself. The second reputation is obtained from the topological relation of the website to be evaluated, which is constructed from its domain name and IP address. Every website has a domain name and an IP address, and illegal content is often spread through domain names or IP addresses that are identical to, similar to or associated with those of a healthy website. Determining the target reputation from both the first reputation and the second reputation is therefore generally applicable, takes the factors that influence website reputation into account more comprehensively, and evaluates the website more accurately, which alleviates the technical problem that the prior art lacks a unified method for accurately assessing website reputation.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a website reputation evaluation method based on website feature identification and relationship topology according to an embodiment of the present invention;
Fig. 2 is a flowchart of another website reputation degree evaluation method based on website feature identification and relationship topology according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a website reputation degree evaluation apparatus based on website feature identification and relationship topology according to a second embodiment of the present invention;
Fig. 4 is a schematic diagram of another website reputation degree evaluation apparatus based on website feature identification and relationship topology according to a second embodiment of the present invention.
Reference numerals: 1 - first obtaining module; 2 - second obtaining module; 3 - determining module.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Website reputation is an intricate problem: on the one hand, it may stem from the operator or from illegal intrusion; on the other hand, websites differ greatly in nature, standards and content. In view of this, the prior art lacks a unified method for accurately assessing website reputation. The website reputation evaluation method and device based on website feature identification and relationship topology provided by the embodiments of the present invention can alleviate this technical problem.
Example one
The website reputation evaluation method based on website feature identification and relationship topology provided by the embodiment of the invention, as shown in fig. 1, includes:
step S102, obtaining a first credit degree of a website to be evaluated at the current moment, wherein the first credit degree is the credit degree obtained according to website characteristics of the website to be evaluated;
step S104, obtaining a second credit degree of the website to be evaluated at the current moment, wherein the second credit degree is the credit degree obtained according to the topological relation of the website to be evaluated, and the topological relation is the topological relation constructed according to the domain name and the IP address of the website to be evaluated;
and step S106, determining the target reputation of the website to be evaluated according to the first reputation and the second reputation.
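Steps S102 to S106 can be sketched as a small pipeline. This is only a minimal illustration: the text does not state how the first and second reputations are combined in step S106, so the combination rule is passed in by the caller, and the arithmetic mean used in the example (`mean_rule`) is a hypothetical placeholder rather than the method of the invention. All function names and numeric values are illustrative assumptions.

```python
from typing import Callable


def target_reputation(
    first_reputation: float,
    second_reputation: float,
    combine: Callable[[float, float], float],
) -> float:
    """Step S106: derive the target reputation from the two partial reputations.

    The combination rule is not specified in the text, so it is injected.
    """
    return combine(first_reputation, second_reputation)


def mean_rule(v1: float, v2: float) -> float:
    """Hypothetical combination rule (assumption): plain arithmetic mean."""
    return (v1 + v2) / 2


v1 = 0.2  # result of step S102 (illustrative value)
v2 = 0.6  # result of step S104 (illustrative value)
print(target_reputation(v1, v2, mean_rule))
```

Keeping the rule injectable makes it easy to swap in whatever combination an actual implementation chooses.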
Specifically, the global website information can be continuously crawled by using a crawler system to ensure the real-time performance of the obtained first credibility and the second credibility, wherein the first credibility and the second credibility at the current moment refer to the latest acquired credibility information, and the real-time performance of the first credibility and the second credibility can be ensured by setting the crawling rate of the crawler.
Whether a website is a bad website depends on whether it can adversely affect Internet users. A website's influence on users comes mainly from its own content and from the content of websites associated with it, where association includes association through hyperlinks and association through an IP address or domain name.
In the embodiment of the invention, the first reputation is obtained from the website features of the website to be evaluated, that is, from characteristics of the website itself. The second reputation is obtained from the topological relation of the website to be evaluated, which is constructed from its domain name and IP address. Every website has a domain name and an IP address, and illegal content is often spread through domain names or IP addresses that are identical to, similar to or associated with those of a healthy website. Determining the target reputation from both the first reputation and the second reputation is therefore generally applicable, takes the factors that influence website reputation into account more comprehensively, and evaluates the website more accurately, which alleviates the technical problem that the prior art lacks a unified method for accurately assessing website reputation.
In an optional implementation manner of the embodiment of the present invention, obtaining a first reputation of a website to be evaluated at a current time includes:
acquiring attribute information of a website to be evaluated, wherein the attribute information comprises: content attribute, docket attribute and link attribute;
credit rating is carried out on the content attribute to obtain a first score;
scoring the reliability of the record attributes to obtain a second score;
credit degree scoring is carried out on the link attribute to obtain a third score;
and performing weighted average calculation on the first score, the second score and the third score to obtain a first weighted average, and determining the first weighted average as a first credit.
The content attribute refers to the text and images published directly on the website's pages, from which a user obtains information directly by reading; the link attribute refers to the hyperlink interfaces in the web pages; and the record attribute refers to the attributes recorded when the website was registered and filed.
In addition, the same web page content may have different functions in different record attributes of the web sites. The embodiment of the invention determines the first credit through the attribute information comprising the content attribute, the record attribute and the link attribute, is more comprehensive and more reasonable, and is beneficial to ensuring the accuracy of the first credit.
Specifically, the first reputation can be calculated by $V_1 = \lambda_1 C_1 + \lambda_2 C_2 + \lambda_3 C_3$, wherein $V_1$ represents the first reputation, $C_1$ represents the first score, $C_2$ represents the second score, $C_3$ represents the third score, and $\lambda_1$, $\lambda_2$, $\lambda_3$ are the weight factors of the first score, the second score and the third score respectively. Here, a smaller value of the first reputation indicates a higher reputation of the website to be evaluated. According to the influence of the content attribute, the record attribute and the link attribute on the reputation of the website to be evaluated, $\lambda_1$, $\lambda_2$, $\lambda_3$ can be 0.6, 0.2 and 0.2 in sequence.
It is emphasized that whether the content posted directly on a website's pages is healthy and legitimate is the most direct factor in determining the website's influence on users, so the weight factor of the first score is larger than those of the second score and the third score. The values of $\lambda_1$, $\lambda_2$, $\lambda_3$ given here are one optional choice of the invention, not the only one.
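The weighted combination of the three attribute scores can be sketched as follows. The function name `first_reputation` and the input scores are illustrative assumptions; only the example weights 0.6, 0.2 and 0.2 come from the text.

```python
from typing import Tuple


def first_reputation(
    c1: float,
    c2: float,
    c3: float,
    weights: Tuple[float, float, float] = (0.6, 0.2, 0.2),
) -> float:
    """Weighted sum of the content (c1), record (c2) and link (c3) scores.

    The default weights are the example values from the text and sum to 1,
    so the weighted sum is also a weighted average. Smaller is better.
    """
    l1, l2, l3 = weights
    return l1 * c1 + l2 * c2 + l3 * c3


# Illustrative inputs: intruded site (0.5), enterprise record (0.2), worst link 0.4
print(round(first_reputation(0.5, 0.2, 0.4), 4))
```

Because content carries the largest weight, a change in the first score moves the result three times as much as the same change in either of the other two scores.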
In another optional implementation manner of the embodiment of the present invention, according to the content attribute, an illegal content and an interval duration are determined, where the interval duration is an interval duration between a time when the illegal content appears in the content of the website to be evaluated last time and a current time;
determining the influence weight of the illegal content on the reputation evaluation of the website to be evaluated according to the illegal content;
calculating a first score by a first formula $C_1 = C_{1tmp} \cdot a^{x}$, wherein $C_1$ represents the first score, $x$ represents the interval duration, $C_{1tmp}$ represents the influence weight, and $a$ represents a first preset parameter.
Specifically, $a$ is a value less than 1 and may be 0.95; $C_{1tmp}$ is the weight of the influence of the illegal content type on the website reputation score, and may take the following values:
(1) if the main content of the website to be evaluated is illegal content, such as pornography, gambling, drugs or cults, $C_{1tmp} = 1$;
(2) if the website to be evaluated has been intruded and the resulting content carries illegal information such as pornography, gambling, drugs or cults, $C_{1tmp} = 0.5$;
(3) if the content of the website to be evaluated is legal and healthy, $C_{1tmp} = 0$.
When $a$ and $C_{1tmp}$ are fixed, the larger $x$ is, the smaller $C_1$ is; that is, the longer the interval duration, the higher the reputation of the website to be evaluated. Considering the importance of the content attribute to website reputation and of the interval duration to the content attribute, $x$ is used as the exponent of $a$ to influence the first score.
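The decay of the first score over time can be sketched as below. The text does not state the unit of the interval duration, so treating `interval` as a count of days is an assumption, as are the function name and the input validation.

```python
def content_score(impact_weight: float, interval: float, a: float = 0.95) -> float:
    """First score C1 = C1tmp * a**x.

    impact_weight: C1tmp (1, 0.5 or 0 in the example values of the text).
    interval: x, time since illegal content was last seen (unit assumed: days).
    a: decay base, a value less than 1 (0.95 in the text).
    """
    if not 0 < a < 1:
        raise ValueError("a must lie strictly between 0 and 1")
    if interval < 0:
        raise ValueError("interval must be non-negative")
    return impact_weight * a ** interval


# The score shrinks as the last violation recedes into the past.
for days in (0, 10, 30):
    print(days, round(content_score(1.0, days), 4))
```

A website whose content is legal and healthy has an impact weight of 0, so its first score stays at 0 regardless of the interval.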
In another optional implementation manner of the embodiment of the present invention, the scoring the reputation degree of the record attribute to obtain a second score includes:
acquiring a preset score of the record attribute according to the record attribute, wherein the preset score is a credit score preset for the record attribute;
the preset score is determined as a second score.
Specifically, considering that the probability that the content is healthy content is higher in a website with regular filing attributes, the second score may be selected from the following values:
(1) the website to be evaluated is filed as a government, education or public institution website: $C_2 = 0$;
(2) the website to be evaluated is filed as an enterprise website: $C_2 = 0.2$;
(3) the website to be evaluated is filed as a type other than government, education, public institution or enterprise: $C_2 = 0.4$;
(4) the website to be evaluated has no filing record: $C_2 = 1$.
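The preset scores above amount to a lookup table. In the sketch below the category labels (`"government"`, `"education"`, `"public_institution"`, `"enterprise"`) and the use of `None` for a website without a filing record are hypothetical encodings chosen for illustration; only the numeric values come from the text.

```python
from typing import Optional

# Example preset scores from the text; the keys are hypothetical labels.
RECORD_SCORES = {
    "government": 0.0,
    "education": 0.0,
    "public_institution": 0.0,
    "enterprise": 0.2,
}
OTHER_TYPE_SCORE = 0.4  # filed, but as some other type
NO_RECORD_SCORE = 1.0   # no filing record at all


def record_score(record_type: Optional[str]) -> float:
    """Second score C2: the preset score for the website's filing type."""
    if record_type is None:
        return NO_RECORD_SCORE
    return RECORD_SCORES.get(record_type, OTHER_TYPE_SCORE)


print(record_score("enterprise"), record_score("personal"), record_score(None))
```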
In another optional implementation manner of the embodiment of the present invention, the scoring the reputation degree of the link attribute to obtain a third score includes:
extracting hyperlink interfaces contained in the web pages of the websites to be evaluated from the link attributes, and determining at least one link website of the websites to be evaluated according to the hyperlink interfaces;
obtaining a reputation evaluation score of each link website in at least one link website;
extracting a target reputation evaluation score from the plurality of reputation evaluation scores, wherein the target reputation evaluation score is the maximum value of the plurality of reputation evaluation scores;
the target reputation evaluation score is determined as a third score.
Because each website in a plurality of websites corresponding to the hyperlink interfaces of the websites to be evaluated has the content of a complete website, the influence is great, and in order to ensure that websites with poor reputation can be found, the reputation evaluation score of the website with the worst reputation is used as a third score.
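Since a larger score means a worse reputation, taking the worst linked website reduces to a maximum. A minimal sketch, assuming the reputation evaluation scores of the linked websites are already available; the function name and the choice to raise an error for an empty input are assumptions.

```python
from typing import Iterable


def link_score(linked_site_scores: Iterable[float]) -> float:
    """Third score C3: the largest (worst) score among the linked websites."""
    scores = list(linked_site_scores)
    if not scores:
        # The text assumes at least one linked website; raising here is an assumption.
        raise ValueError("at least one linked website score is required")
    return max(scores)


# The worst-rated linked website determines the result.
print(link_score([0.1, 0.7, 0.3]))
```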
In another optional implementation manner of the embodiment of the present invention, obtaining the second reputation of the website to be evaluated at the current time includes:
obtaining at least one first sub-score, and calculating a fourth score according to the at least one first sub-score, wherein each first sub-score is a reputation evaluation score of a first target website, the first target website is a website corresponding to an IP address, and the first target website is different from the website to be evaluated;
acquiring at least one second sub-score, and calculating a fifth score according to the at least one second sub-score, wherein each second sub-score is a credit evaluation score of an IP address of the website to be evaluated;
obtaining at least one third sub-score, and calculating a sixth score according to the at least one third sub-score, wherein each third sub-score is a reputation evaluation score of a second target website, the second target website is a website having the same main domain name as the website to be evaluated, and the second target website is different from the website to be evaluated;
and performing weighted average calculation on the fourth score, the fifth score and the sixth score to obtain a second weighted average, and determining the second weighted average as a second credit.
Specifically, the second reputation can be calculated by $V_2 = \lambda_4 C_4 + \lambda_5 C_5 + \lambda_6 C_6$, wherein $V_2$ represents the second reputation, $C_4$ represents the fourth score, $C_5$ represents the fifth score, $C_6$ represents the sixth score, and $\lambda_4$, $\lambda_5$, $\lambda_6$ are the weight factors of the fourth score, the fifth score and the sixth score respectively. Here, a smaller value of the second reputation indicates a higher reputation of the website to be evaluated. $\lambda_4$, $\lambda_5$, $\lambda_6$ can be 0.3, 0.3 and 0.4 in sequence.
It should be noted that a website corresponds to one domain name, a website may correspond to multiple IP addresses, and an IP address may correspond to multiple domain names.
Assume that the website to be evaluated is website W0, that W0 has two IP addresses, IP(A) and IP(B), and that the domain name of W0 is Y1. IP(A) corresponds to three websites, W0, W1 and W2, while IP(B) corresponds only to W0. The main domain name of Y1 is Y0, and Y0 has the sub-domain names Y1 and Y2. Then:
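As a purely illustrative worked example, assume hypothetical scores $C_4 = 0.2$, $C_5 = 0.1$ and $C_6 = 0.5$ (these values are not from the text) together with the example weights 0.3, 0.3 and 0.4:

```latex
V_2 = 0.3 \times 0.2 + 0.3 \times 0.1 + 0.4 \times 0.5
    = 0.06 + 0.03 + 0.20
    = 0.29
```

With these inputs the sixth score contributes most of the total, which reflects its slightly larger example weight.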
(1) the first target websites are the W1 website and the W2 website.
(2) The second sub-score is the reputation evaluation score of the IP(A) or IP(B) address. Note that the reputation evaluation score of IP(A) reflects the adverse effect on a user of accessing that address. It may be determined from the reputation evaluation scores calculated at the moment preceding the current moment, namely those of the websites associated with IP(A). The reputation evaluation score of IP(B) is calculated on the same principle.
(3) The second destination web site is the web site of domain name Y2.
In the embodiment of the invention, the websites associated with the website to be evaluated and their reputation evaluation scores are obtained through reverse lookup of the IP address, and the second reputation is determined from these related websites.
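The W0 example can be traced with two small lookups. The dictionaries below encode only the relationships stated in the example; the name `W3` for the website at domain Y2 is hypothetical (the text does not name that website), and the domains of W1 and W2 are left out because the text does not give them.

```python
from typing import Dict, Set

# Topology from the example: IP address -> websites hosted on it.
sites_by_ip: Dict[str, Set[str]] = {
    "IP(A)": {"W0", "W1", "W2"},
    "IP(B)": {"W0"},
}
# Sub-domain -> main domain.
main_domain_of: Dict[str, str] = {"Y1": "Y0", "Y2": "Y0"}
# Website -> domain name; "W3" is a hypothetical name for the site at Y2.
domain_of_site: Dict[str, str] = {"W0": "Y1", "W3": "Y2"}


def first_targets(site: str) -> Set[str]:
    """Other websites that share at least one IP address with `site`."""
    shared: Set[str] = set()
    for sites in sites_by_ip.values():
        if site in sites:
            shared |= sites
    shared.discard(site)
    return shared


def second_targets(site: str) -> Set[str]:
    """Other websites whose domain has the same main domain as `site`."""
    domain = domain_of_site[site]
    main = main_domain_of.get(domain, domain)
    return {
        other
        for other, d in domain_of_site.items()
        if other != site and main_domain_of.get(d, d) == main
    }


print(sorted(first_targets("W0")))
print(sorted(second_targets("W0")))
```

In practice the two mappings would be filled from reverse IP lookups and domain data gathered by the crawler; here they are hard-coded to mirror the example.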
In another optional implementation manner of the embodiment of the present invention, calculating the fourth score according to the at least one first sub-score includes:
and performing weighted average calculation on at least one first sub-score to obtain a first sub-weighted average, and determining the first sub-weighted average as a fourth score.
It should be noted that, when the weighted average of at least one first sub-score is calculated, if the ratio of the weighting factors is 1, the fourth score is the average of at least one first sub-score. The specific value of the ratio of the weighting factors used in the weighted average calculation may be adjusted herein due to the influence of the first target web site on the reputation score.
In another optional implementation of the embodiment of the present invention, calculating the fifth score according to the at least one second sub-score includes:
performing a weighted average calculation on the at least one second sub-score to obtain a second sub-weighted average, and determining the second sub-weighted average as the fifth score.
Similarly, it should be noted that, when the weighted average of the at least one second sub-score is calculated, if the ratio of the weighting factors is 1, the fifth score is the arithmetic mean of the at least one second sub-score. The specific ratio of the weighting factors used in the weighted average calculation may be adjusted according to the influence of each IP address on the reputation score.
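The weighted average used for the fourth and fifth scores can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name weighted_average is hypothetical, and the source does not specify how the weighting factors are chosen, so they are passed in by the caller. When no weights are given, all weights are equal (ratio 1) and the result is the plain arithmetic mean.

```python
from typing import Optional, Sequence


def weighted_average(scores: Sequence[float],
                     weights: Optional[Sequence[float]] = None) -> float:
    """Weighted average of sub-scores; equal weights reduce to the arithmetic mean."""
    if not scores:
        raise ValueError("at least one sub-score is required")
    if weights is None:
        weights = [1.0] * len(scores)  # ratio of weighting factors is 1
    if len(weights) != len(scores):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total


# Hypothetical sub-scores of two first target websites.
print(weighted_average([60.0, 80.0]))               # equal weights
print(weighted_average([60.0, 80.0], [3.0, 1.0]))   # first site weighted 3:1
```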
In another optional implementation of the embodiment of the present invention, the calculating the sixth score according to at least one third sub-score includes:
calculating the sixth score by a second formula [equation image not reproduced in the text], wherein $C_6$ denotes the sixth score, $L$ denotes the number of second target websites, $C_{ni}$ denotes the third sub-score of second target website $i$ among the second target websites, $P$ is a second preset parameter obtained according to the domain name type of the website to be evaluated, $\lambda_{11}$ is a first preset weight, and $\lambda_{12}$ is a second preset weight.
Specifically, if the domain name of the website to be evaluated is a main domain name, $P$ is 0; if the domain name of the website to be evaluated is a sub-domain name, $P$ is 1. Further, $\lambda_{11}$ and $\lambda_{12}$ may be taken as 0.7 and 0.3, respectively.
In another optional implementation manner of the embodiment of the present invention, as shown in fig. 2, after determining the target reputation of the website to be evaluated according to the first reputation and the second reputation, the website reputation evaluation method based on website feature identification and relationship topology further includes:
Step S107: judging whether the score corresponding to the target reputation is greater than a preset score threshold; if so, step S108 is executed; if not, the reputation evaluation of the website to be evaluated ends.
Step S108: sending notification information to an administrator, wherein the notification information indicates that the website to be evaluated is a bad website.
Specifically, the administrator may be an administrator of a service party such as an audit organization or a protection platform. Through this real-time feedback mechanism, downstream parties such as audit organizations and protection platforms can deal with problematic websites in a timely manner.
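Steps S107 and S108 amount to a threshold check followed by a notification. The sketch below is only an illustration: the function name check_and_notify, the message wording, and the use of a callback for delivery are assumptions, since the source does not say how the notification is sent. Note that, per step S107, a score greater than the threshold marks the website as bad.

```python
from typing import Callable, List


def check_and_notify(site: str,
                     target_score: float,
                     threshold: float,
                     notify: Callable[[str], None]) -> bool:
    """Step S107: compare the score with the threshold.
    Step S108: notify the administrator when the score exceeds it."""
    if target_score > threshold:
        notify(f"Website {site} is a bad website "
               f"(score {target_score:.1f} > threshold {threshold:.1f})")
        return True
    return False  # evaluation ends without notification


# Usage with an in-memory list standing in for the administrator's inbox.
inbox: List[str] = []
check_and_notify("W0", 85.0, 60.0, inbox.append)  # notified
check_and_notify("W1", 40.0, 60.0, inbox.append)  # not notified
print(inbox)
```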
In the process of determining the target reputation of the website to be evaluated, addition, multiplication and exponentiation are applied according to the weight with which each factor influences the reputation: the factors corresponding to the addends of an addition have the same influence weight, multiplication introduces weight factors that determine the influence weights of the corresponding factors, and the exponent of an exponentiation corresponds to a factor with a larger influence weight. By jointly considering the website features, the website topological relation, and the influence weights of the various factors within them, the embodiment of the invention gives the target reputation of the website to be evaluated comprehensively and reasonably. Moreover, the target reputation is given in the form of a score, which is more intuitive and easier to compare and evaluate.
Example two
The website reputation degree evaluation device based on website feature identification and relationship topology provided by the embodiment of the invention, as shown in fig. 3, includes:
a first obtaining module 1, configured to obtain a first reputation of a website to be evaluated at the current moment, wherein the first reputation is a reputation obtained according to website features of the website to be evaluated;
a second obtaining module 2, configured to obtain a second reputation of the website to be evaluated at the current moment, wherein the second reputation is a reputation obtained according to a topological relation of the website to be evaluated, and the topological relation is constructed according to the domain name and the IP address of the website to be evaluated;
and a determining module 3, configured to determine the target reputation of the website to be evaluated according to the first reputation and the second reputation.
In the embodiment of the invention, the first obtaining module 1 and the second obtaining module 2 respectively obtain the first reputation and the second reputation of the website to be evaluated at the current moment, and the determining module 3 then determines the target reputation of the website to be evaluated according to the first reputation and the second reputation.
The first reputation is obtained according to the website features of the website to be evaluated, that is, according to the characteristics of the website itself. The second reputation is obtained according to the topological relation of the website to be evaluated, and the topological relation is constructed according to the domain name and the IP address of the website to be evaluated. Every website has a domain name and an IP address, and illegal content is often spread on the network through a domain name or IP address that is the same as, similar to, or associated with that of a healthy website. Determining the target reputation by combining the first reputation with the second reputation is therefore a generally applicable approach: it considers the factors affecting website reputation more comprehensively and allows the reputation of the website to be evaluated more accurately, which alleviates the technical problem that the prior art lacks a unified method for accurately measuring website reputation.
In an optional implementation manner of the embodiment of the present invention, as shown in fig. 4, the first obtaining module 1 includes:
an obtaining unit, configured to obtain attribute information of the website to be evaluated, wherein the attribute information includes: a content attribute, a filing attribute and a link attribute;
a first scoring unit, configured to score the reputation of the content attribute to obtain a first score;
a second scoring unit, configured to score the reputation of the filing attribute to obtain a second score;
a third scoring unit, configured to score the reputation of the link attribute to obtain a third score;
and a first calculating unit, configured to perform a weighted average calculation on the first score, the second score and the third score to obtain a first weighted average, and to determine the first weighted average as the first reputation.
In another optional implementation manner of the embodiment of the present invention, the first scoring unit is configured to:
determining, according to the content attribute, illegal content and an interval duration, wherein the interval duration is the time elapsed between the most recent appearance of the illegal content in the content of the website to be evaluated and the current moment;
determining, according to the illegal content, the influence weight of the illegal content on the reputation evaluation of the website to be evaluated;
calculating the first score by a first formula $C_1 = C_{1tmp} \cdot a^{x}$, wherein $C_1$ denotes the first score, $x$ denotes the interval duration, $C_{1tmp}$ denotes the influence weight, and $a$ denotes a first preset parameter.
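The first formula, C1 = C1tmp * a^x, can be sketched in a few lines of Python. The source gives neither the value of the preset parameter a nor the unit of the interval duration x; the sketch assumes a positive a, with 0 < a < 1 used in the example so that the score decays as the illegal content recedes into the past. The function name first_score and the sample numbers are hypothetical.

```python
def first_score(impact_weight: float, interval: float, a: float) -> float:
    """First formula: C1 = C1tmp * a**x.

    impact_weight -- C1tmp, influence weight of the illegal content
    interval      -- x, time since the illegal content last appeared (unit assumed)
    a             -- first preset parameter (assumed positive)
    """
    if interval < 0:
        raise ValueError("interval duration must be non-negative")
    if a <= 0:
        raise ValueError("preset parameter a must be positive")
    return impact_weight * a ** interval


# Hypothetical values: the score halves with each elapsed unit when a = 0.5.
print(first_score(80.0, 0, 0.5))
print(first_score(80.0, 2, 0.5))
```

With a = 0.5, an influence weight of 80 stays at 80 when the content was seen just now (x = 0) and drops to 20 after two units.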
In another optional implementation manner of the embodiment of the present invention, the second scoring unit is configured to:
acquiring a preset score of the filing attribute according to the filing attribute, wherein the preset score is a reputation score preset for the filing attribute;
the preset score is determined as a second score.
In another optional implementation manner of the embodiment of the present invention, the third scoring unit is configured to:
extracting hyperlink interfaces contained in the web pages of the websites to be evaluated from the link attributes, and determining at least one link website of the websites to be evaluated according to the hyperlink interfaces;
obtaining a reputation evaluation score of each link website in at least one link website;
extracting a target reputation evaluation score from the plurality of reputation evaluation scores, wherein the target reputation evaluation score is the maximum value of the plurality of reputation evaluation scores;
the target reputation evaluation score is determined as a third score.
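The third score can be illustrated with the sketch below, which extracts the linked hosts from a list of hyperlinks and takes the maximum of their reputation evaluation scores. Several points are assumptions rather than statements of the source: a link website is identified by its host name, relative links are ignored, linked sites without a known score are skipped, and the names link_sites and third_score are hypothetical. Since step S107 treats a higher score as worse, taking the maximum means the worst linked site determines the third score.

```python
from typing import Iterable, Mapping, Set
from urllib.parse import urlparse


def link_sites(hyperlinks: Iterable[str]) -> Set[str]:
    """Host names of the websites referenced by absolute hyperlinks."""
    hosts: Set[str] = set()
    for url in hyperlinks:
        host = urlparse(url).hostname  # None for relative links such as "/about"
        if host:
            hosts.add(host)
    return hosts


def third_score(hosts: Iterable[str], reputation: Mapping[str, float]) -> float:
    """Maximum reputation evaluation score among the linked websites."""
    scores = [reputation[h] for h in hosts if h in reputation]
    if not scores:
        raise ValueError("no linked website has a reputation evaluation score")
    return max(scores)


# Hypothetical hyperlinks and previously computed scores.
links = ["https://a.example/x", "/local", "http://b.example"]
known = {"a.example": 10.0, "b.example": 75.0}
print(third_score(link_sites(links), known))
```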
In another optional implementation manner of the embodiment of the present invention, as shown in fig. 4, the second obtaining module 2 includes:
a first obtaining and calculating unit, configured to obtain at least one first sub-score and to calculate a fourth score according to the at least one first sub-score, wherein each first sub-score is a reputation evaluation score of a first target website, the first target website is a website corresponding to the IP address, and the first target website is different from the website to be evaluated;
a second obtaining and calculating unit, configured to obtain at least one second sub-score and to calculate a fifth score according to the at least one second sub-score, wherein each second sub-score is a reputation evaluation score of an IP address of the website to be evaluated;
a third obtaining and calculating unit, configured to obtain at least one third sub-score and to calculate a sixth score according to the at least one third sub-score, wherein each third sub-score is a reputation evaluation score of a second target website, the second target website is a website having the same main domain name as the website to be evaluated, and the second target website is different from the website to be evaluated;
and a second calculating unit, configured to perform a weighted average calculation on the fourth score, the fifth score and the sixth score to obtain a second weighted average, and to determine the second weighted average as the second reputation.
In another optional implementation manner of the embodiment of the present invention, the first obtaining and calculating unit is configured to:
and performing weighted average calculation on at least one first sub-score to obtain a first sub-weighted average, and determining the first sub-weighted average as a fourth score.
In another optional implementation manner of the embodiment of the present invention, the second obtaining and calculating unit is configured to:
and performing weighted average calculation on at least one second sub-score to obtain a second sub-weighted average, and determining the second sub-weighted average as a fifth score.
In another optional implementation manner of the embodiment of the present invention, the third obtaining and calculating unit is configured to:
calculating the sixth score by a second formula [equation image not reproduced in the text], wherein $C_6$ denotes the sixth score, $L$ denotes the number of second target websites, $C_{ni}$ denotes the third sub-score of second target website $i$ among the second target websites, $P$ is a second preset parameter obtained according to the domain name type of the website to be evaluated, $\lambda_{11}$ is a first preset weight, and $\lambda_{12}$ is a second preset weight.
The computer program product of the website reputation evaluation method and device based on website feature identification and relationship topology provided by the embodiments of the present invention includes a computer readable storage medium storing a program code, and instructions included in the program code may be used to execute the method in the foregoing method embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly. For example, a connection may be a fixed connection, a removable connection, or an integral connection; it may be a mechanical or an electrical connection; and it may be a direct connection, an indirect connection through an intermediate medium, or an internal communication between two elements. Those skilled in the art can understand the specific meanings of the above terms in the present invention according to the specific circumstances.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed herein, anyone skilled in the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A website reputation degree evaluation method based on website feature identification and relationship topology is characterized by comprising the following steps:
acquiring a first reputation of a website to be evaluated at the current moment, wherein the first reputation is a reputation obtained according to website features of the website to be evaluated;
acquiring a second reputation of the website to be evaluated at the current moment, wherein the second reputation is a reputation obtained according to the topological relation of the website to be evaluated, and the topological relation is constructed according to the domain name and the IP address of the website to be evaluated;
determining the target reputation of the website to be evaluated according to the first reputation and the second reputation;
wherein acquiring the first reputation of the website to be evaluated at the current moment comprises:
acquiring attribute information of the website to be evaluated, wherein the attribute information comprises: a content attribute, a filing attribute and a link attribute;
scoring the reputation of the content attribute to obtain a first score;
scoring the reputation of the filing attribute to obtain a second score;
scoring the reputation of the link attribute to obtain a third score;
performing weighted average calculation on the first score, the second score and the third score to obtain a first weighted average, and determining the first weighted average as the first reputation;
wherein acquiring the second reputation of the website to be evaluated at the current moment comprises:
obtaining at least one first sub-score, and calculating a fourth score according to the at least one first sub-score, wherein each first sub-score is a reputation evaluation score of a first target website, the first target website is a website corresponding to the IP address, and the first target website is different from the website to be evaluated;
obtaining at least one second sub-score, and calculating a fifth score according to the at least one second sub-score, wherein each second sub-score is a reputation evaluation score of an IP address of the website to be evaluated;
obtaining at least one third sub-score, and calculating a sixth score according to the at least one third sub-score, wherein each third sub-score is a reputation evaluation score of a second target website, the second target website is a website having the same main domain name as the website to be evaluated, and the second target website is different from the website to be evaluated;
and performing weighted average calculation on the fourth score, the fifth score and the sixth score to obtain a second weighted average, and determining the second weighted average as the second reputation.
2. The method of claim 1, wherein scoring the reputation of the content attribute to obtain the first score comprises:
determining, according to the content attribute, illegal content and an interval duration, wherein the interval duration is the time elapsed between the most recent appearance of the illegal content in the content of the website to be evaluated and the current moment;
determining, according to the illegal content, the influence weight of the illegal content on the reputation evaluation of the website to be evaluated;
calculating the first score by a first formula $C_1 = C_{1tmp} \cdot a^{x}$, wherein $C_1$ denotes the first score, $x$ denotes the interval duration, $C_{1tmp}$ denotes the influence weight, and $a$ denotes a first preset parameter.
3. The method of claim 1, wherein scoring the reputation of the filing attribute to obtain the second score comprises:
acquiring a preset score of the filing attribute according to the filing attribute, wherein the preset score is a reputation score preset for the filing attribute;
and determining the preset score as the second score.
4. The method of claim 1, wherein scoring the link attribute for reputation, resulting in a third score, comprises:
extracting hyperlink interfaces contained in the web pages of the websites to be evaluated from the link attributes, and determining at least one link website of the websites to be evaluated according to the hyperlink interfaces;
obtaining a reputation evaluation score of each link website in the at least one link website;
extracting a target reputation evaluation score from the plurality of reputation evaluation scores, wherein the target reputation evaluation score is the maximum value of the plurality of reputation evaluation scores;
determining the target reputation evaluation score as the third score.
5. The method according to claim 1, wherein calculating the fourth score according to the at least one first sub-score comprises:
performing weighted average calculation on the at least one first sub-score to obtain a first sub-weighted average, and determining the first sub-weighted average as the fourth score.
6. The method according to claim 1, wherein calculating the fifth score according to the at least one second sub-score comprises:
performing weighted average calculation on the at least one second sub-score to obtain a second sub-weighted average, and determining the second sub-weighted average as the fifth score.
7. The method of claim 1, wherein calculating a sixth score based on the at least one third sub-score comprises:
calculating the sixth score by a second formula [equation image not reproduced in the text], wherein $C_6$ denotes the sixth score, $L$ denotes the number of the second target websites, $C_{ni}$ denotes the third sub-score of second target website $i$ among the second target websites, $P$ is a second preset parameter obtained according to the domain name type of the website to be evaluated, $\lambda_{11}$ is a first preset weight, and $\lambda_{12}$ is a second preset weight.
8. A website reputation degree evaluation device based on website feature identification and relationship topology, characterized by comprising:
a first obtaining module, configured to obtain a first reputation of a website to be evaluated at the current moment, wherein the first reputation is a reputation obtained according to website features of the website to be evaluated;
a second obtaining module, configured to obtain a second reputation of the website to be evaluated at the current moment, wherein the second reputation is a reputation obtained according to the topological relation of the website to be evaluated, and the topological relation is constructed according to the domain name and the IP address of the website to be evaluated;
a determining module, configured to determine the target reputation of the website to be evaluated according to the first reputation and the second reputation;
wherein the first obtaining module comprises:
an obtaining unit, configured to obtain attribute information of the website to be evaluated, wherein the attribute information comprises: a content attribute, a filing attribute and a link attribute;
a first scoring unit, configured to score the reputation of the content attribute to obtain a first score;
a second scoring unit, configured to score the reputation of the filing attribute to obtain a second score;
a third scoring unit, configured to score the reputation of the link attribute to obtain a third score;
a first calculating unit, configured to perform weighted average calculation on the first score, the second score and the third score to obtain a first weighted average, and to determine the first weighted average as the first reputation;
and the second obtaining module comprises:
a first obtaining and calculating unit, configured to obtain at least one first sub-score and to calculate a fourth score according to the at least one first sub-score, wherein each first sub-score is a reputation evaluation score of a first target website, the first target website is a website corresponding to the IP address, and the first target website is different from the website to be evaluated;
a second obtaining and calculating unit, configured to obtain at least one second sub-score and to calculate a fifth score according to the at least one second sub-score, wherein each second sub-score is a reputation evaluation score of an IP address of the website to be evaluated;
a third obtaining and calculating unit, configured to obtain at least one third sub-score and to calculate a sixth score according to the at least one third sub-score, wherein each third sub-score is a reputation evaluation score of a second target website, the second target website is a website having the same main domain name as the website to be evaluated, and the second target website is different from the website to be evaluated;
and a second calculating unit, configured to perform weighted average calculation on the fourth score, the fifth score and the sixth score to obtain a second weighted average, and to determine the second weighted average as the second reputation.
CN201710803281.8A 2017-09-07 2017-09-07 Website reputation degree evaluation method and device based on website feature identification and relationship topology Active CN107547552B (en)

Publications (2)

Publication Number Publication Date
CN107547552A CN107547552A (en) 2018-01-05
CN107547552B true CN107547552B (en) 2020-02-21

Family

ID=60957605

Country Status (1)

Country Link
CN (1) CN107547552B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103327029A (en) * 2013-07-09 2013-09-25 腾讯科技(深圳)有限公司 Malicious URL (Uniform Resource Locator) detection method and malicious URL detection device
CN103685307A (en) * 2013-12-25 2014-03-26 北京奇虎科技有限公司 Method, system, client and server for detecting phishing fraud webpage based on feature library
CN104598595A (en) * 2015-01-23 2015-05-06 安一恒通(北京)科技有限公司 Fraud webpage detection method and corresponding device
CN104615760A (en) * 2015-02-13 2015-05-13 北京瑞星信息技术有限公司 Phishing website recognizing method and phishing website recognizing system
CN105323210A (en) * 2014-06-10 2016-02-10 腾讯科技(深圳)有限公司 Method, apparatus and cloud server for detecting website security

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 310052 188 Lianhui street, Xixing street, Binjiang District, Hangzhou, Zhejiang Province

Applicant after: Hangzhou Annan information technology Limited by Share Ltd

Address before: Zhejiang Zhongcai Building No. 68 Binjiang District road Hangzhou City, Zhejiang Province, the 310051 and 15 layer

Applicant before: Dbappsecurity Co.,ltd.

GR01 Patent grant