CN109543118B - Web landmark reliability assessment method and device based on multi-layer decision - Google Patents

Web landmark reliability assessment method and device based on multi-layer decision

Info

Publication number
CN109543118B
Authority
CN
China
Prior art keywords
candidate
landmark
landmarks
web
filtering
Legal status
Active
Application number
CN201811338745.3A
Other languages
Chinese (zh)
Other versions
CN109543118A (en)
Inventor
尹美娟
杨文�
刘晓楠
陈静
罗向阳
孙志豪
Current Assignee
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Application filed by Information Engineering University of PLA Strategic Support Force
Priority to CN201811338745.3A
Publication of CN109543118A
Application granted
Publication of CN109543118B
Legal status: Active

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention belongs to the technical field of network security applications, and in particular relates to a Web landmark reliability assessment method and device based on multi-layer decision. The method comprises: resolving the IP addresses of candidate landmarks; filtering the resolved candidate landmarks with filters to delete invalid data; and evaluating the filtered candidate landmarks to obtain their credibility values. Without relying on path probing, the method makes full use of public Internet services to classify and filter invalid Web landmarks with different characteristics and to quantitatively evaluate landmark reliability, enabling automatic evaluation of large-scale Web landmarks. It addresses the low accuracy and low efficiency of current methods and their inability to evaluate large numbers of Web landmarks automatically and quantitatively. By filtering out invalid landmarks and quantitatively evaluating the credibility of valid ones, it effectively improves the accuracy of landmark acquisition and of geolocation results, and is of significant guiding value for techniques that accurately acquire entity landmarks such as network servers.

Description

Web landmark reliability assessment method and device based on multi-layer decision
Technical Field
The invention belongs to the technical field of network security application, and particularly relates to a Web landmark reliability assessment method and device based on multi-layer decision-making.
Background
Network entity geolocation, i.e., IP geolocation, determines the geographic location of a network entity from its IP address and is widely used in targeted advertising, region-based content customization, network performance optimization and similar applications. Landmark-based geolocation is widely adopted because of its higher accuracy and reliability: a large number of dense, high-precision network landmarks is the key foundation of IP geolocation, and the stability of those landmarks directly affects geolocation results. Web servers are numerous and distributed all over the world, and the mapping between a Web server's IP address and its geographic location is relatively stable, which makes Web servers an ideal choice of network landmark; such landmarks are called Web landmarks for short. Existing Web landmark mining methods take the geographic location of the organization that owns a website as the location of the landmark. However, because of widespread server hosting, shared hosting and CDNs, and especially the rapid growth of cloud services in recent years, the location provided by a Web landmark is not necessarily the real location of the Web server, and such landmarks cannot effectively support IP geolocation. An efficient algorithm is therefore needed to assess the reliability of the geographic location information of Web landmarks.
The methods currently used to evaluate Web landmarks are mainly the following. The first is based on homepage redirection (LVM for short). It filters out CDNs and shared hosts by checking homepage redirection and comparing postal zone information, and can remove some invalid landmarks. However, a landmark that is not redirected is not necessarily credible, and many websites do not support access by IP address, so the Web landmarks it yields are of limited accuracy; moreover, each landmark requires two webpage accesses plus a redirection check, so evaluation is slow and the method is unsuitable for large-scale landmark evaluation. The second is a street-level landmark evaluation method based on the nearest common router (SLE for short). It groups landmarks by their access routes and estimates credibility according to whether the landmarks in a group satisfy the constraint between network delay and geographic distance, which greatly improves landmark accuracy, but it relies on repeated path probing, which is time-consuming.
Disclosure of Invention
To this end, the invention provides a Web landmark reliability assessment method and device based on multi-layer decision, which address the low accuracy, low efficiency and lack of automation of current landmark acquisition.
According to the scheme provided by the invention, the Web landmark reliability assessment method based on multi-layer decision comprises:
resolving the IP addresses of the candidate landmarks;
filtering the resolved candidate landmarks with filters and deleting invalid data; and
evaluating the filtered candidate landmarks to obtain their credibility values.
In the above method, parsing the candidate landmarks comprises the following steps:
grouping the candidate landmarks by domain name and deleting non-standard data, namely candidate landmarks that provide no domain name or are otherwise unqualified;
performing DNS holographic resolution: querying each domain name with multiple DNS servers distributed around the world and merging the records they return into the list of IP addresses to which the domain name maps;
for each domain name's IP address list: if the domain name maps to only one IP address, assigning that address to the candidate landmark corresponding to the domain name; if it maps to n IP addresses (n an integer greater than 1), copying the corresponding candidate landmark n times and assigning one IP address from the list to each copy.
Filtering the resolved candidate landmarks with filters comprises the following steps:
grouping and filtering the resolved candidate landmarks first by domain name and then by IP address; for the candidate landmarks remaining after this filtering, sending HTTP requests separately to the domain name provided in the Web landmark and to its IP address, filtering out landmarks whose two responses are inconsistent, and setting the initial credibility value of each candidate landmark according to the responses.
Preferably, the grouping and filtering by domain name and IP address proceeds as follows. First, the candidate landmarks are grouped by domain name, the declared positions in each group are extracted, the distribution radius of those positions is computed, and any group whose distribution radius exceeds a preset value is deleted. Next, the candidate landmarks are grouped by IP address, the list of domain names for each IP group is extracted, subdomains of the same website are merged, the domain names are counted, and any group whose domain name is not unique is deleted. Finally, each candidate landmark is traversed and, based on the resolved IP addresses, candidate landmarks whose IP addresses are spread over more than two /24 network segments are deleted.
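The /24 criterion in the last step can be checked with Python's standard `ipaddress` module. The sketch below is illustrative only: it assumes IPv4 addresses and takes a precomputed mapping from each domain name to the IP list produced by DNS holographic resolution; `slash24_segments` and `same_domain_filter` are hypothetical names, not part of the patent.

```python
import ipaddress


def slash24_segments(ips):
    """Return the distinct IPv4 /24 networks that the given addresses fall into."""
    # strict=False masks off the host bits, e.g. 1.2.3.4/24 -> 1.2.3.0/24
    return {ipaddress.ip_network(f"{ip}/24", strict=False) for ip in ips}


def same_domain_filter(domain_ips, max_segments=2):
    """Keep domains whose resolved IPs span at most `max_segments` /24 networks.

    domain_ips maps a domain name to the IP list from DNS holographic
    resolution; domains spread over more /24 networks (as CDN and cloud
    front ends tend to be) are dropped.
    """
    return {
        domain: ips
        for domain, ips in domain_ips.items()
        if len(slash24_segments(ips)) <= max_segments
    }
```

Exactly two segments are still kept, matching the "more than two" wording.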
Preferably, evaluating the filtered candidate landmarks comprises the following steps:
traversing each candidate landmark, determining the number of domain names carried by its IP address, and correcting its initial credibility value according to that number; and
comparing the Whois registration information of the Web landmark and of its IP address, and comparing the country, province and city information of the candidate landmark with that extracted from a third-party database, adjusting the corrected credibility accordingly, and writing the adjusted credibility value into the candidate landmark.
Preferably, adjusting the corrected credibility comprises: comparing the Whois registration information of the Web landmark with that of its IP address to obtain an information similarity, and adjusting the credibility by weighting according to that similarity; and matching the country, province and city information of the candidate landmark against a third-party database and adjusting the credibility value according to the degree of match.
Further, the Whois registration information includes at least the organization name, administrative division and contact address.
Preferably, traversing each candidate landmark and determining the number of domain names carried by its IP address comprises:
traversing each candidate landmark to obtain its IP address;
querying several reverse-lookup websites for the domain names carried by that IP address and merging their result lists; performing DNS holographic queries on the domain names in the merged list to obtain their IP address lists, and deleting domain names whose IP address lists do not contain the candidate landmark's IP address, which yields the list of domain names carried by that IP address; and
merging subdomains of the same website in the domain name list and counting the total number of domain names.
A Web landmark reliability assessment device based on multi-layer decision comprises a parsing module, a filtering module and an evaluation module, wherein:
the parsing module is configured to resolve the IP addresses of the candidate landmarks;
the filtering module is configured to filter the resolved candidate landmarks with filters and delete invalid data; and
the evaluation module is configured to evaluate the filtered candidate landmarks to obtain their credibility values.
In the above device, the filtering module comprises a first filtering submodule, a second filtering submodule and an initial-value acquisition submodule, wherein:
the first filtering submodule is configured to group the resolved candidate landmarks by domain name, extract the declared positions in each group, compute the distribution radius of those positions, and delete groups whose distribution radius exceeds the preset value;
the second filtering submodule is configured to group the candidate landmarks by IP address, extract the domain name list for each IP group, merge subdomains of the same website, count the domain names and delete groups whose domain name is not unique; and to traverse each candidate landmark and delete those whose resolved IP addresses are spread over more than two /24 network segments; and
the initial-value acquisition submodule is configured to send HTTP requests separately to the domain name provided in each remaining Web landmark and to its IP address, filter out landmarks whose responses are inconsistent, and set the initial credibility value of each candidate landmark according to the responses.
Beneficial effects of the invention:
Without relying on path probing, the invention makes full use of public Internet services to classify and filter invalid Web landmarks with different characteristics and to quantitatively evaluate landmark reliability, enabling automatic evaluation of large-scale Web landmarks and addressing the low accuracy, low efficiency and lack of automation of current methods. Furthermore, based on the characteristic mappings between the domain names and IP addresses of invalid candidate landmarks, it filters out candidate landmarks served by shared hosts, CDNs and cloud servers. It then infers the reliability of the landmark information by combining homepage redirection, the Whois service, third-party IP databases and similar sources, and quantifies each landmark's credibility value. This filters out invalid landmarks, quantitatively evaluates the credibility of valid ones, solves the problem of automatic quantitative evaluation of large-scale Web landmarks, effectively improves the accuracy of landmark acquisition and of geolocation results, and is of significant guiding value for techniques that accurately acquire entity landmarks such as network servers.
Description of the drawings:
FIG. 1 is a schematic flow chart of an evaluation method in an embodiment;
FIG. 2 is a schematic flow chart of the candidate landmark parsing process in an embodiment;
FIG. 3 is a schematic diagram of a sub-process of candidate landmark evaluation in an embodiment;
FIG. 4 is a flowchart illustrating the domain name counting process in an embodiment;
FIG. 5 is a schematic view of an evaluation apparatus according to an embodiment;
FIG. 6 is a schematic diagram of the filtering module in an embodiment;
FIG. 7 is a diagram of the evaluation framework in an embodiment;
FIG. 8 is a schematic diagram of geolocation accuracy verification based on 146 IP addresses in Zhengzhou in an embodiment;
FIG. 9 is a schematic diagram of geolocation accuracy verification based on 119 IP addresses in Beijing in an embodiment.
Detailed description of the embodiments:
To make the objectives, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings and technical solutions.
Current Web landmark reliability evaluation suffers from low accuracy and slow evaluation and is unsuitable for large-scale landmark evaluation. To address this, an embodiment of the invention, as shown in FIG. 1, provides a Web landmark reliability assessment method based on multi-layer decision, comprising:
S101, resolving the IP addresses of the candidate landmarks;
S102, filtering the resolved candidate landmarks with filters and deleting invalid data; and
S103, evaluating the filtered candidate landmarks to obtain their credibility values.
Without relying on path probing, the method makes full use of public Internet services to classify and filter invalid Web landmarks with different characteristics and to quantitatively evaluate landmark reliability, enabling automatic evaluation of large-scale Web landmarks and addressing the low accuracy, low efficiency and lack of automation of current methods.
For the preprocessing step of parsing the candidate landmarks, referring to FIG. 2, a further embodiment of the invention comprises the following steps:
S1001, grouping the candidate landmarks by domain name and deleting non-standard data, namely candidate landmarks that provide no domain name or are otherwise unqualified;
S1002, querying each domain name with multiple DNS servers distributed around the world and merging the records they return into the list of IP addresses to which the domain name maps;
S1003, for each domain name's IP address list: if the domain name maps to only one IP address, assigning that address to the corresponding candidate landmark; if it maps to n IP addresses (n an integer greater than 1), copying the corresponding candidate landmark n times and assigning one IP address from the list to each copy.
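A minimal sketch of steps S1001 to S1003, assuming the per-resolver answers have already been collected (for example with a DNS client library) and are passed in as plain dictionaries. The `Landmark` record, the function names and the resolver labels are hypothetical; only the missing-domain case of non-standard data is handled, and a domain for which no resolver returns an address produces no copies.

```python
from collections import defaultdict
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Landmark:
    domain: Optional[str]     # domain name declared by the Web landmark (may be missing)
    location: str             # declared location, e.g. the owning organization's address
    ip: Optional[str] = None  # filled in by the parsing step


def merge_dns_answers(answers: Dict[str, List[str]]) -> List[str]:
    """Union of the IPs returned by several resolvers, keeping first-seen order."""
    merged: Dict[str, None] = {}
    for ips in answers.values():
        for ip in ips:
            merged.setdefault(ip, None)
    return list(merged)


def parse_landmarks(landmarks: List[Landmark],
                    dns_answers: Dict[str, Dict[str, List[str]]]) -> List[Landmark]:
    """Group by domain, drop landmarks without a domain, emit one copy per mapped IP.

    dns_answers maps domain -> {resolver label -> IPs returned by that resolver}.
    """
    by_domain: Dict[str, List[Landmark]] = defaultdict(list)
    for lm in landmarks:
        if lm.domain:  # landmarks that provide no domain name are non-standard data
            by_domain[lm.domain.lower()].append(lm)
    parsed: List[Landmark] = []
    for domain, group in by_domain.items():
        ips = merge_dns_answers(dns_answers.get(domain, {}))
        for lm in group:
            parsed.extend(replace(lm, ip=ip) for ip in ips)  # n IPs -> n copies
    return parsed
```

Keeping first-seen order in the merge makes the output deterministic, which helps when the same candidate set is re-evaluated.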
Taking the evaluation of Web candidate landmarks in Beijing as an example, 23 DNS servers distributed around the world (listed in Table 1) may be used: each domain name is queried on every server, and the A records returned by the servers are merged to generate the list of IP addresses to which the domain name maps.
Table 1: Geographic distribution of the DNS servers used for DNS holographic queries
In another embodiment of the invention, filtering the candidate landmarks with filters comprises: grouping and filtering the resolved candidate landmarks first by domain name and then by IP address; for the candidate landmarks remaining after this filtering, sending HTTP requests separately to the domain name provided in the Web landmark and to its IP address, filtering out landmarks whose responses are inconsistent, and setting the initial credibility value of each candidate landmark according to the responses. Filtering layer by layer according to the characteristics of invalid landmarks improves evaluation efficiency and accuracy.
In another embodiment of the invention, the hierarchical filtering by domain name and then IP address is designed as follows. First, the candidate landmarks are grouped by domain name, the declared positions in each group are extracted, the distribution radius of those positions is computed, and any group whose distribution radius exceeds a preset value is deleted. Next, the candidate landmarks are grouped by IP address, the list of domain names for each IP group is extracted, subdomains of the same website are merged, the domain names are counted, and any group whose domain name is not unique is deleted. Finally, each candidate landmark is traversed and, based on the resolved IP addresses, candidate landmarks whose IP addresses are spread over more than two /24 network segments are deleted.
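The patent does not define how the distribution radius is computed. The sketch below assumes it is the largest great-circle distance from the arithmetic centroid of the declared (latitude, longitude) positions, and treats $R_D$ as a caller-supplied threshold in kilometres; the helper names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def distribution_radius_km(points):
    """Assumed definition: maximum distance from the arithmetic lat/lon centroid."""
    lat_c = sum(lat for lat, _ in points) / len(points)
    lon_c = sum(lon for _, lon in points) / len(points)
    return max(haversine_km((lat_c, lon_c), p) for p in points)


def domain_position_filter(groups, r_d_km):
    """groups: domain -> declared (lat, lon) positions. Drop groups wider than R_D."""
    return {d: pts for d, pts in groups.items() if distribution_radius_km(pts) <= r_d_km}
```

Averaging latitudes and longitudes is only a good centroid for groups confined to a small region; that is adequate here because widely spread groups are deleted either way.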
Specifically, invalid landmarks are filtered hierarchically through domain-name position filtering, same-IP filtering, same-domain filtering and homepage redirection. Domain-name position filtering groups the candidate landmarks by domain name, extracts the declared positions in each group, computes the distribution radius of those positions, and deletes groups whose distribution radius exceeds $R_D$. Same-IP filtering groups the candidate landmarks by IP address, extracts the domain name list for each IP group, counts the domain names after merging subdomains of the same website, and deletes groups whose domain name is not unique. Same-domain filtering traverses each candidate landmark, looks up the IP addresses obtained for its domain name by DNS holographic resolution, and deletes candidate landmarks whose IP addresses are spread over more than two /24 network segments. Homepage redirection traverses each candidate landmark, sends HTTP requests separately to the domain name provided in the Web landmark and to its IP address, filters out landmarks whose returned HTML is inconsistent, and generates the initial credibility value $r_0$ from the returned results:
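The same-IP step can be sketched as follows. Merging subdomains of the same website requires knowing where the registrable domain begins; the `site_key` helper below is a hypothetical stand-in that simply keeps the last two labels, which is wrong for suffixes such as `.com.cn`. The input is assumed to be a list of (domain, IP) pairs produced by the parsing step.

```python
from collections import defaultdict


def site_key(domain):
    """Hypothetical heuristic: the last two labels identify the website.

    This mis-groups names under multi-label suffixes such as 'example.com.cn';
    a production system would consult the Public Suffix List instead.
    """
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def same_ip_filter(landmarks):
    """landmarks: (domain, ip) pairs. Keep pairs whose IP hosts exactly one website."""
    sites_per_ip = defaultdict(set)
    for domain, ip in landmarks:
        sites_per_ip[ip].add(site_key(domain))  # subdomains collapse onto one site
    return [(d, ip) for d, ip in landmarks if len(sites_per_ip[ip]) == 1]
```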
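[The formula defining $r_0$ in terms of $res_{IP}$ and $res_{domain}$ appears only as an image in the source and is not reproduced here.]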
where $res_{IP}$ is the HTML result of an HTTP request to the IP address of the Web landmark, $res_{domain}$ is the HTML result of a request to its domain name, null means the response has no content, and delete means the candidate landmark is filtered out. In this way, candidate landmarks served by shared hosts, CDNs and cloud servers are filtered out according to the characteristic mappings between the domain names and IP addresses of invalid candidate landmarks; the reliability of the landmark information is then inferred by combining homepage redirection, the Whois service, third-party IP databases and similar sources to quantify each landmark's credibility, so that invalid landmarks are filtered out and the credibility of valid landmarks is evaluated quantitatively.
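Because the formula for $r_0$ survives only as an image, its case split and values cannot be recovered here. The following sketch therefore assumes four non-deleting outcomes (consistent HTML, empty IP response, empty domain response, both empty), treats differing non-empty HTML as the delete case, and leaves the actual $r_0$ values to a caller-supplied table. Comparing trimmed HTML for exact equality is a deliberately naive consistency test; all names are hypothetical.

```python
from typing import Dict, Optional


def redirect_outcome(res_ip: Optional[str], res_domain: Optional[str]) -> str:
    """Classify the pair of homepage responses (assumed case split)."""
    ip_null, domain_null = not res_ip, not res_domain
    if ip_null and domain_null:
        return "both_null"
    if ip_null:
        return "ip_null"
    if domain_null:
        return "domain_null"
    # naive consistency test: identical HTML after trimming surrounding whitespace
    return "consistent" if res_ip.strip() == res_domain.strip() else "inconsistent"


def initial_credibility(res_ip: Optional[str], res_domain: Optional[str],
                        r0_table: Dict[str, float]) -> Optional[float]:
    """Return r0 from a caller-supplied table, or None ('delete') for inconsistent HTML."""
    outcome = redirect_outcome(res_ip, res_domain)
    if outcome == "inconsistent":
        return None
    return r0_table[outcome]
```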
In yet another embodiment of the invention, referring to FIG. 3, the evaluation comprises the following steps:
S3001, traversing each candidate landmark, determining the number of domain names carried by its IP address, and correcting its initial credibility value according to that number;
S3002, comparing the Whois registration information of the Web landmark and of its IP address, and comparing the country, province and city information of the candidate landmark with that extracted from a third-party database, adjusting the corrected credibility accordingly, and writing the adjusted credibility value into the candidate landmark.
Preferably, adjusting the corrected credibility comprises: comparing the Whois registration information of the Web landmark with that of its IP address to obtain an information similarity, and adjusting the credibility by weighting according to that similarity; and matching the country, province and city information of the candidate landmark against a third-party database and adjusting the credibility value according to the degree of match. The Whois registration information includes at least the organization name, administrative division and contact address.
In a further embodiment of the invention, as shown in FIG. 4, traversing each candidate landmark and determining the number of domain names carried by its IP address comprises:
S3101, traversing each candidate landmark to obtain its IP address;
S3102, querying several reverse-lookup websites for the domain names carried by that IP address and merging their result lists; performing DNS holographic queries on the domain names in the merged list to obtain their IP address lists, and deleting domain names whose IP address lists do not contain the candidate landmark's IP address, which yields the list of domain names carried by that IP address;
S3103, merging subdomains of the same website in the domain name list and counting the total number of domain names.
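Steps S3101 to S3103 reduce to set operations once the reverse-lookup results and the DNS holographic resolution results are available. The sketch below assumes both are passed in as dictionaries (the reverse-lookup site labels are placeholders), also counts the landmark's own domain as the credibility correction step does, and uses a hypothetical last-two-labels `site_key` heuristic for merging subdomains.

```python
def site_key(domain):
    """Hypothetical heuristic: last two labels identify the website (ignores .com.cn etc.)."""
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def carried_domain_count(landmark_ip, landmark_domain, reverse_results, dns_map):
    """Count the websites hosted on landmark_ip.

    reverse_results: reverse-lookup site -> domains it reports for landmark_ip
    dns_map: domain -> IPs obtained by DNS holographic resolution
    """
    merged = set()
    for domains in reverse_results.values():  # merge all reverse-lookup lists
        merged.update(d.lower().rstrip(".") for d in domains)
    # keep only domains that really resolve to the landmark IP (drops stale entries)
    verified = {d for d in merged if landmark_ip in dns_map.get(d, ())}
    verified.add(landmark_domain.lower().rstrip("."))  # count the landmark's own domain
    return len({site_key(d) for d in verified})        # merge subdomains, then count
```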
IP reverse-lookup inference: each candidate landmark is traversed, the number of domain names carried by its IP address is determined by reverse lookup, and the initial credibility is corrected according to that number to obtain the credibility $r_1$. Specifically, the IP address of each candidate landmark is obtained; several reverse-lookup websites (see Table 2) are queried for the domain names carried by that IP address and their result lists are merged; DNS holographic queries are then performed on the domain names in the merged list, domain names whose resolved IP address lists do not contain the landmark IP are deleted, and the remaining list is the set of domain names carried by the landmark IP.
Table 2: Test results of reverse-lookup websites
The subdomains of the same website in the domain name list are merged, the domain name of the landmark itself is added to the list, and the total number of domain names $n$ is counted; the credibility of the Web landmark is then corrected to $r_1$:
$$r_1 = (1 - p_d)\, r_0 + p_d\, f(n) \tag{7}$$
$$f(n) = e^{1-n}, \quad n = 1, 2, \ldots \tag{8}$$
where $p_d$ is the credibility weight of the IP reverse-lookup information. Because reverse lookup is not guaranteed to return every domain name hosted on an IP address, its result is used as an evaluation reference rather than as a filtering criterion.
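Equations (7) and (8) translate directly into code. In this sketch $p_d$ is an input because the patent gives no value for it. The correction term rewards an IP address that carries only the landmark's own website ($f(1) = 1$) and already drops to $e^{-1} \approx 0.37$ when a second website shares the address.

```python
import math


def f(n: int) -> float:
    """Equation (8): f(n) = e^(1-n) for n = 1, 2, ..."""
    if n < 1:
        raise ValueError("n includes the landmark's own domain, so n >= 1")
    return math.exp(1 - n)


def reverse_lookup_correction(r0: float, n: int, p_d: float) -> float:
    """Equation (7): r1 = (1 - p_d) * r0 + p_d * f(n)."""
    return (1 - p_d) * r0 + p_d * f(n)
```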
The Whois registration information of the Web landmark and of its IP address is compared, and the credibility is adjusted by weighting according to the similarity of information such as organization name, administrative division and contact information, giving the credibility $r_2$:
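$$r_2 = (1 - p_w)\, r_1 + p_w\, w_r \tag{2}$$
$$w_r = k_c w_c + k_o w_o + (1 - k_c - k_o)\, w_d \tag{3}$$
$$w_c = k_{co} w_{co} + k_{pr} w_{pr} + (1 - k_{co} - k_{pr})\, w_{ci} \tag{4}$$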
where $p_w$ is the credibility weight of the Whois information; $k_c$ and $k_o$ are the weights of the Whois-registered administrative division and registration organization, respectively; $w_c$, $w_o$ and $w_d$ are the matching indices of the Whois-registered administrative division, registration organization and registered domain name, each in the range 0 to 1, with $w_o$ and $w_d$ calculated by the longest common subsequence (LCS) method; $k_{co}$ and $k_{pr}$ are the weights of the country-level and province-level administrative divisions, respectively; and $w_{co}$, $w_{pr}$ and $w_{ci}$ are the country-, province- and city-level matching granularities between the Whois registration information and the landmark information, assigned as shown in Table 3.
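Equations (2) to (4) can be sketched as below. The patent says the matching indices are computed with the LCS method but does not give the normalization; dividing the LCS length by the length of the longer string is an assumption. The Table 3 granularity values $w_{co}$, $w_{pr}$, $w_{ci}$ and all weights are inputs, since their concrete values are not reproduced here.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Assumed normalization to [0, 1]: LCS length over the longer string's length."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def admin_match(w_co, w_pr, w_ci, k_co, k_pr):
    """Equation (4): w_c from country/province/city matching granularities."""
    return k_co * w_co + k_pr * w_pr + (1 - k_co - k_pr) * w_ci


def whois_match(w_c, w_o, w_d, k_c, k_o):
    """Equation (3): w_r from division, organization and domain matching indices."""
    return k_c * w_c + k_o * w_o + (1 - k_c - k_o) * w_d


def whois_adjust(r1, w_r, p_w):
    """Equation (2): r2 = (1 - p_w) * r1 + p_w * w_r."""
    return (1 - p_w) * r1 + p_w * w_r
```

In use, `w_o` would be `lcs_similarity` applied to the two organization names and `w_d` to the two registered domain names.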
Table 3: Assignment rules for administrative-division matching degree
The country, province and city information of the landmark is compared with the location of its IP address looked up in public data such as a third-party free database (here, IP2Location DB9); a correction coefficient $l_r$ is calculated from the degree of match, the final credibility $r$ of the landmark is obtained, and the credibility value is written into the landmark:
$$r = (1 - p_l)\, r_2 + p_l\, l_r \tag{5}$$
$$l_r = k_{co} w_{Lco} + k_{pr} w_{Lpr} + (1 - k_{co} - k_{pr})\, w_{Lci} \tag{6}$$
where $p_l$ is the credibility weight of the IP location database, and $w_{Lco}$, $w_{Lpr}$ and $w_{Lci}$ are the country-, province- and city-level matching granularities between the IP location database information and the landmark information, assigned as shown in Table 3.
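Equations (5) and (6) mirror the Whois step. The sketch below adds a range check on the weights, which is an illustrative safeguard rather than something the patent specifies (a small tolerance absorbs floating-point round-off); $p_l$ and the Table 3 granularities are again inputs.

```python
EPS = 1e-9  # tolerance for round-off in values such as 1 - k_co - k_pr


def _check_unit(**values):
    """Raise if any weight or score lies outside [0, 1] (illustrative safeguard)."""
    for name, value in values.items():
        if not -EPS <= value <= 1 + EPS:
            raise ValueError(f"{name}={value} is outside [0, 1]")


def ip_db_match(w_lco, w_lpr, w_lci, k_co, k_pr):
    """Equation (6): l_r = k_co*w_Lco + k_pr*w_Lpr + (1 - k_co - k_pr)*w_Lci."""
    _check_unit(w_lco=w_lco, w_lpr=w_lpr, w_lci=w_lci,
                k_co=k_co, k_pr=k_pr, k_ci=1 - k_co - k_pr)
    return k_co * w_lco + k_pr * w_lpr + (1 - k_co - k_pr) * w_lci


def final_credibility(r2, l_r, p_l):
    """Equation (5): r = (1 - p_l)*r2 + p_l*l_r, the value written into the landmark."""
    _check_unit(r2=r2, l_r=l_r, p_l=p_l)
    return (1 - p_l) * r2 + p_l * l_r
```

Because each stage is a convex combination, the final $r$ always lies between $r_2$ and $l_r$.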
Targeting the domain-name-to-IP mapping characteristics of invalid Web landmarks, the scheme uses DNS holographic queries to resolve all IP addresses of a domain name and reverse lookups to obtain as many of the domain names carried by an IP address as possible, screens the landmarks layer by layer in the manner of a decision tree, and evaluates landmark credibility by combining open data and services. The result is a set of credible landmarks with quantified credibility, which overcomes the shortcomings of existing evaluation methods, yields highly credible landmarks, and significantly improves landmark accuracy and geolocation accuracy.
Based on the above reliability assessment method, an embodiment of the invention further provides a Web landmark reliability assessment device based on multi-layer decision, which, as shown in FIG. 5, comprises a parsing module 101, a filtering module 102 and an evaluation module 103, wherein:
the parsing module 101 is configured to resolve the IP addresses of the candidate landmarks;
the filtering module 102 is configured to filter the resolved candidate landmarks with filters and delete invalid data; and
the evaluation module 103 is configured to evaluate the filtered candidate landmarks to obtain their credibility values.
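One way to mirror the three-module structure of FIG. 5 in code is to treat each module as a stage that maps a list of candidates to a new list. The `Candidate` record and `LandmarkAssessor` class below are hypothetical; in practice the stage bodies would be the parsing, filtering and evaluation steps described above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Candidate:
    domain: str
    ip: Optional[str] = None
    credibility: Optional[float] = None


Stage = Callable[[List[Candidate]], List[Candidate]]


@dataclass
class LandmarkAssessor:
    """Hypothetical composition of modules 101 to 103 as pluggable stages."""
    parse: Stage         # parsing module 101: resolve IPs (may duplicate candidates)
    filter_stage: Stage  # filtering module 102: drop invalid landmarks, set r0
    evaluate: Stage      # evaluation module 103: refine r0 into the final credibility

    def run(self, candidates: Iterable[Candidate]) -> List[Candidate]:
        return self.evaluate(self.filter_stage(self.parse(list(candidates))))
```

Keeping the stages as plain callables lets each module be tested on its own.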
In the above device, as shown in FIG. 6, the filtering module 102 comprises a first filtering submodule 201, a second filtering submodule 202 and an initial-value acquisition submodule 203, wherein:
the first filtering submodule 201 is configured to group the resolved candidate landmarks by domain name, extract the declared positions in each group, compute the distribution radius of those positions, and delete groups whose distribution radius exceeds the preset value;
the second filtering submodule 202 is configured to group the candidate landmarks by IP address, extract the domain name list for each IP group, merge subdomains of the same website, count the domain names and delete groups whose domain name is not unique; and to traverse each candidate landmark and delete those whose resolved IP addresses are spread over more than two /24 network segments; and
the initial-value acquisition submodule 203 is configured to send HTTP requests separately to the domain name provided in each remaining Web landmark and to its IP address, filter out landmarks whose responses are inconsistent, and set the initial credibility value of each candidate landmark according to the responses.
In this embodiment of the invention, as shown in fig. 7, invalid landmarks are filtered layer by layer according to their characteristics, and the reliability of the remaining Web landmarks is evaluated using publicly available data such as public services and free third-party databases. The method enables automatic, quantitative evaluation of Web landmarks at scale: it filters out invalid Web landmarks, quantifies the credibility of the valid ones, and thereby improves both landmark accuracy and positioning accuracy.
To verify its effectiveness, the method is evaluated by cross-validation and by positioning comparison.
Cross-validation judges the landmark evaluation effect by comparing the landmarks against a third-party IP location database whose data source differs from that of the landmark mining and evaluation methods, and measuring the proportion of overlap between them. Candidate landmarks for 5 cities were evaluated with the Evaluator, LVM, and SLE methods, and the landmarks for each city were divided into 5 groups: (1) the candidate landmarks; (2) the landmarks retained by the LVM method; (3) the landmarks retained by the Evaluator framework; (4) the landmarks rated by the Evaluator framework with credibility greater than 0.5; (5) the landmarks rated by the Evaluator framework with credibility greater than 0.8. Each group was compared with the query results of the Maxmind database; a higher overlap rate indicates, to a certain extent, higher landmark accuracy.
TABLE 4 Statistics of landmark entries obtained by the two methods
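The overlap-rate comparison used in the cross-validation can be sketched as follows. This is a minimal sketch assuming a city-level comparison; `reference_city` is a hypothetical callable standing in for the Maxmind query (with MaxMind's `geoip2` package it might wrap `geoip2.database.Reader(path).city(ip).city.name`, but that wiring is an assumption, not part of the source).

```python
def overlap_rate(landmarks, reference_city):
    """Share of (ip, city) landmarks whose city agrees with the reference database."""
    landmarks = list(landmarks)
    if not landmarks:
        return 0.0
    hits = sum(1 for ip, city in landmarks if reference_city(ip) == city)
    return hits / len(landmarks)


def split_by_credibility(scored, thresholds=(0.5, 0.8)):
    """Build the 'credibility > t' groups (groups 4 and 5) from (ip, city, credibility) triples."""
    return {t: [(ip, city) for ip, city, r in scored if r > t] for t in thresholds}
```

Computing `overlap_rate` once per group makes the effect of the credibility threshold directly comparable across groups (1) to (5).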
As Table 4 shows, both evaluation methods substantially improve landmark accuracy, and the evaluation scheme of the invention performs better, with clearly higher accuracy.

Positioning verification checks landmark effectiveness by geolocating IP addresses whose positions are known. First, landmark credibility is evaluated for the candidate landmarks of Zhengzhou and Beijing using, respectively, the scheme of the invention, the LVM method based on homepage redirection, and the SLE method based on the nearest common router. Next, IP addresses with reliable positions in the two cities (146 in Zhengzhou and 119 in Beijing) are labeled manually. Finally, these known-position IP addresses are geolocated using the credible landmarks obtained from each evaluation, and the positioning errors are recorded. The cumulative distribution curves of the positioning errors of the three methods in Zhengzhou and Beijing are shown in FIG. 8 and FIG. 9.

In Zhengzhou, the average error of the scheme of the invention is 9.1 km, close to that of the SLE method (8.6 km) and far better than that of the LVM method (19.7 km). In Beijing, the average errors are 7.3 km for the scheme of the invention, 6.6 km for SLE, and 23.4 km for LVM. The scheme therefore greatly improves positioning accuracy over LVM and comes close to SLE, currently the most accurate method, while avoiding the time overhead of SLE's repeated probing; this demonstrates the effectiveness of the credible landmarks produced by the invention.
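The error statistics reported above (average error and cumulative distribution) can be reproduced from per-target results with a short helper. This is a sketch under the assumption that each estimate and ground truth is a (latitude, longitude) pair and that error is great-circle distance; the positioning algorithm itself is not shown, and the function names are hypothetical.

```python
import math
from bisect import bisect_right


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (mean Earth radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def positioning_errors(estimates, truths):
    """Distance in km between each estimated and true (lat, lon) position."""
    return [haversine_km(*est, *true) for est, true in zip(estimates, truths)]


def mean_error(errors):
    return sum(errors) / len(errors)


def empirical_cdf(errors):
    """(error, fraction of targets with error <= that value) for each distinct error."""
    xs = sorted(errors)
    n = len(xs)
    return [(x, bisect_right(xs, x) / n) for x in sorted(set(xs))]
```

Plotting `empirical_cdf` for each method gives curves of the kind shown in FIG. 8 and FIG. 9.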
Based on the foregoing method, an embodiment of the present invention further provides a server, including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described above.
Based on the above method, the embodiment of the present invention further provides a computer readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the above method.
Unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the values set forth in these embodiments do not limit the scope of the present invention.
The device provided by the embodiment of the present invention has the same implementation principle and technical effects as the method embodiments; for brevity, where the device embodiments are silent, refer to the corresponding content in the method embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In all examples shown and described herein, any particular value should be construed as merely exemplary, and not as a limitation, and thus other examples of example embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented as software functional units and sold or used as an independent product, may be stored in a processor-executable non-volatile computer-readable storage medium. On this understanding, the technical solution of the present invention may be embodied as a software product stored in a storage medium and including instructions that cause a computer device (a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall be covered by the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. A Web landmark reliability assessment method based on multi-layer decision, characterized by comprising the following steps:
resolving the IP addresses of the candidate landmarks;
filtering the parsed candidate landmarks with a filter and deleting invalid data;
evaluating the filtered candidate landmarks to obtain a credibility value of the candidate landmarks;
the candidate landmark resolving process comprises the following steps:
grouping the candidate landmarks by domain name and deleting non-standard data, the non-standard data comprising candidate landmarks that provide no domain name and candidate landmarks that are otherwise unqualified;
performing DNS queries on each domain name through a plurality of globally distributed DNS servers, merging the records returned by the DNS servers, and generating the list of IP addresses to which the domain name maps;
for the IP address list mapped by each domain name: if the domain name maps to only one IP address, assigning that IP address to the candidate landmark corresponding to the domain name; if the domain name maps to n IP addresses, making n copies of the corresponding candidate landmark and assigning one IP address from the list to each copy, wherein n is an integer greater than 1.
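The resolution step of claim 1 can be sketched as below. Each resolver is injected as a hypothetical callable `domain -> list of IPv4 strings` so the merging logic can be tested offline; in practice each callable might wrap a dnspython `dns.resolver.Resolver` pointed at one specific nameserver, but that wiring is an assumption. The domain-syntax check standing in for "unqualified" data is likewise a hypothetical placeholder.

```python
import re
from dataclasses import dataclass, replace
from typing import Optional

# One DNS label: letters, digits, hyphens; no leading/trailing hyphen (assumed validity rule).
_LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


@dataclass(frozen=True)
class Candidate:  # hypothetical record layout
    domain: Optional[str]
    lat: float
    lon: float
    ip: Optional[str] = None


def is_valid_domain(domain):
    """Rough syntactic check: two or more well-formed labels."""
    labels = domain.lower().rstrip(".").split(".")
    return len(labels) >= 2 and all(_LABEL.match(label) for label in labels)


def resolve_all(domain, resolvers):
    """Merge the A records returned by several resolvers, keeping first-seen order."""
    merged = []
    for resolve in resolvers:
        for ip in resolve(domain):
            if ip not in merged:
                merged.append(ip)
    return merged


def expand_with_ips(candidates, resolvers):
    """Drop non-standard candidates, then emit one copy per resolved IP address."""
    cache = {}  # resolve each domain once, as in the group-by-domain step
    out = []
    for cand in candidates:
        if not cand.domain or not is_valid_domain(cand.domain):
            continue  # non-standard data: missing or malformed domain name
        if cand.domain not in cache:
            cache[cand.domain] = resolve_all(cand.domain, resolvers)
        for ip in cache[cand.domain]:
            out.append(replace(cand, ip=ip))
    return out
```

The single-IP and n-IP cases of the claim collapse into the same loop: one IP yields one landmark, n IPs yield n copies. A domain that resolves to no address simply produces no landmark, which is an assumption the claim does not address.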
2. The Web landmark reliability assessment method based on multi-layer decision according to claim 1, wherein filtering the parsed candidate landmarks with a filter comprises:
grouping and filtering the parsed candidate landmarks first by domain name and then by IP address; for the candidate landmarks remaining after filtering, sending HTTP requests separately to the domain name provided in the Web landmark and to its IP address, filtering out the landmarks whose responses are inconsistent, and setting the initial reliability value of the candidate landmarks according to the responses.
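The HTTP consistency check of claim 2 might look like the sketch below. The claim does not say how responses are compared or how they map to an initial value, so the status-plus-body-hash comparison and the `ok_value`/`other_value` mapping are hypothetical. `fetch` is injected as a callable `url -> (status, body)`; wrapping `urllib.request.urlopen` would be one option, keeping in mind that it raises `HTTPError` for 4xx/5xx responses. IPv4 addresses are assumed (IPv6 would need brackets in the URL).

```python
import hashlib


def fingerprint(response):
    """Reduce an (HTTP status, body bytes) pair to something cheap to compare."""
    status, body = response
    return status, hashlib.sha256(body).hexdigest()


def initial_reliability(domain, ip, fetch, ok_value=1.0, other_value=0.5):
    """Return None to filter the landmark out, else a hypothetical initial value r0."""
    by_domain = fingerprint(fetch(f"http://{domain}/"))
    by_ip = fingerprint(fetch(f"http://{ip}/"))  # bare IP, no Host override
    if by_domain != by_ip:
        return None  # inconsistent responses: delete the landmark
    status, _ = by_domain
    return ok_value if status == 200 else other_value
```

Exact body hashing is strict: pages with timestamps or session tokens will look inconsistent, so a real system would likely compare a normalized feature such as the page title instead.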
3. The Web landmark reliability assessment method based on multi-layer decision according to claim 2, wherein grouping and filtering the parsed candidate landmarks first by domain name and then by IP address comprises: first, grouping the candidate landmarks by domain name, extracting the declared positions in each group, computing the distribution radius of those positions, and deleting any candidate landmark group whose distribution radius exceeds a preset value; then, grouping the candidate landmarks by IP address, extracting the domain name list corresponding to each IP group, merging subdomains of the same website, counting the domain names, and deleting any candidate landmark group whose domain-name count is not unique; and traversing each candidate landmark and, based on the resolved IP addresses, deleting the candidate landmarks whose IP addresses are distributed across more than two network segments.
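The IP-based filters of claim 3 can be sketched on plain `(domain, ip)` pairs. Two assumptions are labelled here: "network segment" is taken to mean a /24 prefix, and subdomains are merged with a deliberately tiny, hypothetical suffix table rather than the full Public Suffix List.

```python
import ipaddress
from collections import defaultdict

# Hypothetical, deliberately tiny suffix table; a real system should use the Public Suffix List.
MULTI_LABEL_SUFFIXES = {"com.cn", "edu.cn", "gov.cn", "net.cn", "org.cn"}


def site_of(domain):
    """Collapse a host name to its website, e.g. www.zzu.edu.cn -> zzu.edu.cn."""
    labels = domain.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def filter_shared_ips(landmarks):
    """landmarks: (domain, ip) pairs. Drop every IP group that hosts more than one website."""
    sites_per_ip = defaultdict(set)
    for domain, ip in landmarks:
        sites_per_ip[ip].add(site_of(domain))
    return [(d, ip) for d, ip in landmarks if len(sites_per_ip[ip]) == 1]


def filter_multi_segment(landmarks, prefix_len=24, max_segments=2):
    """Drop landmarks whose domain resolves into more than max_segments /prefix_len networks."""
    segments = defaultdict(set)
    for domain, ip in landmarks:
        segments[domain].add(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))
    return [(d, ip) for d, ip in landmarks if len(segments[d]) <= max_segments]
```

The first filter removes shared (virtual) hosting, where the IP's location says little about any one site; the second removes domains spread across many networks, a typical sign of CDN or cloud hosting.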
4. The Web landmark reliability assessment method based on multi-layer decision according to claim 2, wherein evaluating the filtered candidate landmarks comprises the following steps:
traversing each candidate landmark and determining the number of domain names hosted on its IP address; according to this number, correcting the initial credibility value of the candidate landmark with the formula $r_1 = (1-p_d)\,r_0 + p_d\,f(n)$, wherein $p_d$ is the confidence weight of the reverse IP lookup information, $f(n) = e^{1-n}$, $n = 1, 2, \ldots$ is the total number of domain names counted, and $r_0$ is the initial credibility of the landmark;
extracting the country, province, and city information of the candidate landmark from the Whois registration information of the Web landmark and of its IP address and from a third-party library, adjusting the corrected credibility with the formula $r = (1-p_l)\,r_2 + p_l\,l_r$, and writing the adjusted credibility into the candidate landmark, wherein:
$r_2 = (1-p_w)\,r_1 + p_w\,w_r$, $p_w$ being the confidence weight of the Whois information;
$w_r = k_c w_c + k_o w_o + (1-k_c-k_o)\,w_d$, $k_c$ and $k_o$ being the weights of the Whois registered administrative division and registration organization, and $w_c$, $w_o$, $w_d$ being the match scores of the Whois registered administrative division, registration organization, and registered domain, respectively;
$w_c = k_{co} w_{co} + k_{pr} w_{pr} + (1-k_{co}-k_{pr})\,w_{ci}$, $k_{co}$ and $k_{pr}$ being the weights of the country-level and province-level administrative divisions, and $w_{co}$, $w_{pr}$, $w_{ci}$ being the granularity of the administrative-division match between the Whois registration information and the landmark information;
$l_r = k_{co} w^{L}_{co} + k_{pr} w^{L}_{pr} + (1-k_{co}-k_{pr})\,w^{L}_{ci}$, $w^{L}_{co}$, $w^{L}_{pr}$, $w^{L}_{ci}$ being the granularity of the administrative-division match between the IP location library information and the landmark information, and $p_l$ weighting the IP location library term.
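The claim-4 chain from $r_0$ through $r_1$ and $r_2$ to $r$ can be written as a few lines of arithmetic. This is a sketch only: the source gives no numeric weights, so every weight in the usage example is hypothetical, and match scores are assumed to be numbers in $[0, 1]$ (for instance 1 for a match at that administrative level and 0 otherwise).

```python
import math


def f(n):
    """f(n) = e^(1-n): 1 for an IP carrying a single domain, shrinking as n grows."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.exp(1 - n)


def mix3(a, b, c, k_a, k_b):
    """k_a*a + k_b*b + (1 - k_a - k_b)*c, the three-term weighting used throughout claim 4."""
    return k_a * a + k_b * b + (1 - k_a - k_b) * c


def evaluate(r0, n, whois_admin, w_o, w_d, lib_admin, *, p_d, p_w, p_l, k_c, k_o, k_co, k_pr):
    """Apply r1, r2 and r in order.

    whois_admin = (w_co, w_pr, w_ci); lib_admin = (wL_co, wL_pr, wL_ci).
    """
    r1 = (1 - p_d) * r0 + p_d * f(n)        # reverse-IP (domain count) correction
    w_c = mix3(*whois_admin, k_co, k_pr)    # Whois administrative-division match
    w_r = mix3(w_c, w_o, w_d, k_c, k_o)     # overall Whois match
    r2 = (1 - p_w) * r1 + p_w * w_r
    l_r = mix3(*lib_admin, k_co, k_pr)      # IP location library match
    return (1 - p_l) * r2 + p_l * l_r
```

Because every step is a convex combination, a landmark that matches on every source keeps a credibility of 1, while one that matches on nothing decays to $(1-p_l)(1-p_w)\,r_1$.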
5. The Web landmark reliability assessment method based on multi-layer decision according to claim 4, wherein adjusting the corrected credibility comprises: comparing the Whois registration information of the Web landmark and of its IP address to obtain the information similarity, and adjusting the credibility by weighting according to the similarity; and matching the extracted country, province, and city information of the candidate landmark against a third-party library, and adjusting the credibility value according to the degree of match.
6. The Web landmark reliability assessment method based on multi-layer decision according to claim 5, wherein the Whois registration information includes at least the organization name, the administrative division, and the contact address.
7. The Web landmark reliability assessment method based on multi-layer decision according to claim 4, wherein traversing each candidate landmark and determining the number of domain names hosted on its IP address comprises the following steps:
traversing each candidate landmark to obtain an IP address of the candidate landmark;
querying the domain names hosted on the IP address through a plurality of reverse IP lookup websites and merging the resulting domain name lists; performing a full DNS query on each domain name in the merged list to obtain its IP address list, and deleting the domain names whose IP address lists do not contain the candidate landmark's IP address, thereby obtaining the list of domain names hosted on the candidate landmark's IP address;
and combining the sub domain names of the same website in the domain name list, and counting the total number of the domain names.
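Claim 7's domain count, the $n$ in $f(n)$, can be sketched as follows. The reverse-lookup services and the DNS query are injected as hypothetical callables (`ip -> domains` and `domain -> IPs`), and the suffix table used to merge subdomains is a deliberately tiny placeholder for the Public Suffix List.

```python
# Hypothetical, deliberately tiny suffix table; use the Public Suffix List in practice.
MULTI_LABEL_SUFFIXES = {"com.cn", "edu.cn", "gov.cn", "net.cn", "org.cn"}


def site_of(domain):
    """Collapse a host name to its website, e.g. www.a.com -> a.com."""
    labels = domain.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def count_hosted_sites(ip, reverse_lookups, resolve):
    """Count the distinct websites whose domains verifiably resolve to ip."""
    merged = set()
    for lookup in reverse_lookups:  # several reverse-IP services, results merged
        merged.update(d.lower().rstrip(".") for d in lookup(ip))
    # Forward-confirm: keep only domains whose current DNS answer contains ip.
    verified = {d for d in merged if ip in resolve(d)}
    return len({site_of(d) for d in verified})
```

The forward confirmation matters because reverse-IP services are often stale; without it, a domain that moved away long ago would still inflate $n$ and unfairly lower the landmark's credibility.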
8. A Web landmark reliability assessment device based on multi-layer decision, characterized in that it is implemented on the basis of the Web landmark reliability assessment method based on multi-layer decision of claim 1 and comprises a parsing module, a filtering module, and an assessment module, wherein
the resolving module is used for resolving the IP addresses in the candidate landmarks;
the filtering module is used for filtering the parsed candidate landmarks with a filter and deleting invalid data;
and the evaluation module is used for evaluating the filtered candidate landmarks to obtain the credibility values of the candidate landmarks.
9. The Web landmark reliability assessment device based on multi-layer decision according to claim 8, wherein the filtering module comprises a first filtering submodule, a second filtering submodule, and an initial value acquisition submodule, wherein
the first filtering submodule is used for grouping the parsed candidate landmarks by domain name, extracting the declared positions in each group, computing the distribution radius of those positions, and deleting any candidate landmark group whose distribution radius exceeds a preset value;
the second filtering submodule is used for grouping the candidate landmarks by IP address, extracting the domain name list corresponding to each IP group, merging subdomains of the same website, counting the domain names, and deleting any candidate landmark group whose domain-name count is not unique; and for traversing each candidate landmark and, based on the resolved IP addresses, deleting the candidate landmarks whose IP addresses are distributed across more than two network segments;
and the initial value acquisition submodule is used for sending HTTP requests separately to the domain name provided in each remaining Web landmark and to its IP address, filtering out the landmarks whose responses are inconsistent, and setting the initial reliability value of the candidate landmarks according to the responses.
CN201811338745.3A 2018-11-12 2018-11-12 Web landmark reliability assessment method and device based on multi-layer decision Active CN109543118B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811338745.3A CN109543118B (en) 2018-11-12 2018-11-12 Web landmark reliability assessment method and device based on multi-layer decision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811338745.3A CN109543118B (en) 2018-11-12 2018-11-12 Web landmark reliability assessment method and device based on multi-layer decision

Publications (2)

Publication Number Publication Date
CN109543118A CN109543118A (en) 2019-03-29
CN109543118B true CN109543118B (en) 2020-06-12

Family

ID=65846850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811338745.3A Active CN109543118B (en) 2018-11-12 2018-11-12 Web landmark reliability assessment method and device based on multi-layer decision

Country Status (1)

Country Link
CN (1) CN109543118B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110119437B (en) * 2019-04-03 2021-04-23 Information Engineering University of PLA Strategic Support Force Network entity landmark evaluation method and device with error upper limit
CN110188954A (en) * 2019-05-31 2019-08-30 Information Engineering University of PLA Strategic Support Force Landmark reliability estimation method and device based on POP network
CN111970262B (en) * 2020-08-07 2023-02-28 Hangzhou Anheng Information Technology Co., Ltd. Method and device for detecting third-party service enabling state of website and electronic device
CN113783855B (en) * 2021-08-30 2023-07-21 Beijing Baidu Netcom Science and Technology Co., Ltd. Site evaluation method, apparatus, electronic device, storage medium, and program product
CN114896522B (en) * 2022-04-14 2023-04-07 Beihang University Multi-platform information epidemic situation risk assessment method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103051717A (en) * 2012-12-25 2013-04-17 Beijing Xiaomi Technology Co., Ltd. Method, device and equipment for processing http request
CN104168341A (en) * 2014-08-15 2014-11-26 Beijing Baidu Netcom Science and Technology Co., Ltd. IP address locating method and CDN dispatching method and device
CN104333609A (en) * 2014-10-15 2015-02-04 Beijing Baidu Netcom Science and Technology Co., Ltd. IP address positioning method and device thereof
CN104537105A (en) * 2015-01-14 2015-04-22 PLA Information Engineering University Automatic network physical landmark excavating method based on Web maps

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research and Implementation of IP-Address-Based Geolocation Technology for Network Entities (基于IP地址的网络实体地理位置定位技术研究与实现); Zhu Bin; 2015-08-31; pp. 19-30 *

Also Published As

Publication number Publication date
CN109543118A (en) 2019-03-29

Similar Documents

Publication Publication Date Title
CN109543118B (en) Web landmark reliability assessment method and device based on multi-layer decision
US20200026721A1 (en) Method and system for generating a geocode trie and facilitating reverse geocode lookups
Koukoletsos et al. Assessing data completeness of VGI through an automated matching procedure for linear data
JP6091736B2 (en) Method and system for evaluating the quality of location content
US9569522B2 (en) Classifying uniform resource locators
CN108628811B (en) Address text matching method and device
US11681927B2 (en) Analyzing geotemporal proximity of entities through a knowledge graph
CN101313300B (en) Local search
CN106547770B (en) User classification and user identification method and device based on user address information
US20110119268A1 (en) Method and system for segmenting query urls
CN104537105B (en) A kind of network entity terrestrial reference automatic mining method based on Web maps
RU2702048C1 (en) Method of analyzing a source and destination of internet traffic
Christen et al. A probabilistic geocoding system based on a national address file
Goldberg Improving geocoding match rates with spatially‐varying block metrics
Yin et al. A deep learning approach for rooftop geocoding
Li et al. Street‐Level Landmark Evaluation Based on Nearest Routers
CN111026829B (en) Street-level landmark obtaining method based on service identification and domain name association
CN112835877A (en) Epidemic situation big data cleaning method for public burst transactions
KR101773910B1 (en) Location based big data system
CN109783521B (en) IP home location determination method, device and computer storage medium
CN107135281B (en) IP region feature extraction method based on multi-data source fusion
CN108737592A (en) A method of verification IP address resources bank precision
WO2021246954A1 (en) Processing apparatus and method for determining road names
Yin et al. Evaluator: A Multilevel Decision Approach for Web‐Based Landmark Evaluation
CN110866611A (en) Malicious domain name detection method based on SVM machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant