CN111026829A - Street-level landmark obtaining method based on service identification and domain name association

Info

Publication number: CN111026829A
Application number: CN201911264591.2A
Authority: CN
Inventors: 罗向阳; 李瑞祥; 尹美娟; 徐锐; 杨文�; 郭鑫淼; 杨春芳; 朱玛
Original assignee: Individual
Current assignee: Information Engineering University of PLA Strategic Support Force
Priority date: 2019-12-11
Filing date: 2019-12-11
Publication date: 2020-04-17
Anticipated expiration: 2039-12-11
Also published as: CN111026829B

Abstract

The invention discloses a street-level landmark obtaining method based on service identification and domain name association. First, features are extracted and reduced from the scan results of IPs (Internet Protocol addresses) of known service types to obtain training features; a classifier is trained on these features to obtain an IP classifier, which identifies the services carried by IPs of unknown service type and thereby yields the server IPs. Next, based on the statistically observed relationship between organization information and domain names, the domain name keywords of each organization are estimated from its name, and an organization information base is built for the target area, realizing the mapping between organization geographic locations and domain names. Finally, each identified server IP is converted into a domain name, and the geographic location of the domain name is obtained through strategies such as database query, online map search and organization information base matching, yielding street-level landmarks; the reliability of these landmarks is then evaluated to obtain reliable street-level landmarks. The invention increases the number of street-level landmarks obtained.

Description

Street-level landmark obtaining method based on service identification and domain name association

Technical Field

The invention relates to the technical field of landmark acquisition, in particular to a street-level landmark acquisition method based on service identification and domain name association.

Background

At present, IP positioning has broad application prospects in determining network space boundaries, tracking the sources of network attacks, locating hidden communication parties, and the like. Landmark-based IP positioning is a common method with comparatively accurate results, and the number, precision and reliability of the landmarks directly affect the reliability of the positioning result. How to obtain abundant, reliable landmarks is therefore an urgent problem in IP positioning. According to the source of the landmarks, existing landmark obtaining methods are mainly divided into methods based on IP location database query, methods based on Web pages, and methods based on social networks.

Currently, some data service companies maintain IP location databases (e.g., Baidu, IPIP, IP.cn, MaxMind, etc.) that map IPs to geographic locations. The landmark obtaining method based on location database query looks up the geographic location corresponding to an IP in an existing location database, thereby obtaining a landmark. This method can acquire a large number of landmarks in a short time, but the highest precision of the landmarks provided by existing databases is only city level, and the reliability of the databases is not high. Therefore, it is difficult to obtain a large number of reliable street-level landmarks with this method.

Web pages contain rich geographic location information; associating the geographic location found in a Web page with the IP address corresponding to the Web domain name yields a landmark. Based on this idea, Guo C et al propose the Structon method, which for the first time takes Web pages, with their wide sources and huge numbers, as the landmark acquisition source, and expands the number of landmarks with a location inference algorithm. The Structon method achieves the acquisition of street-level landmarks, but because of limited network bandwidth, the difficulty of acquiring URL sources, the diversity of Web page structures and the like, it has difficulty acquiring a large number of street-level landmarks.

Wang Y et al use the organization data included in online maps (which usually includes the organization name, organization address, organization domain name, etc.) to obtain, by query, an organization directory within a specific range, and associate each organization's location with the IP of its domain name to acquire landmarks. Jiang H et al, based on the information on colleges and universities included in Wikipedia, associate the IP address of each university's Web server with the geographic location of the university and establish a landmark library.

The online map-based landmark acquisition method and the navigation page-based landmark acquisition method can acquire a plurality of landmarks in one visit or query, and the efficiency of acquiring street-level landmarks is higher, but the number of the acquired landmarks is limited by the amount of the included data.

After analyzing the characteristics of different types of internet forums, zhugue et al propose an internet forum-based city landmark mining method, which infers the geographic location of a forum's user base from semantic information in the forum name and associates it with the access IPs of the forum users, thereby acquiring landmarks. A further method extracts the location nouns searched by users from search engine logs and, based on the relationship between those location nouns and the users, associates the location nouns with the IPs used for searching, thereby acquiring landmarks. Compared with database query and online map search methods, these two methods can obtain more landmarks, but the landmark location granularity is coarse, reaching only the city level, so a large number of street-level landmarks are difficult to obtain.

In addition, other landmark acquisition methods exist, such as acquiring landmarks based on a target cooperation mode, acquiring longitude and latitude data of equipment through a GPS, and associating the longitude and latitude data with an IP address of the equipment to realize landmark acquisition. The method can obtain the high-precision reliable landmarks, but needs the support of hardware, and has high cost for obtaining the landmarks in a large batch.

Therefore, a method for rapidly acquiring a large number of street-level landmarks is needed.

Disclosure of Invention

The invention aims to provide a street-level landmark obtaining method based on service identification and domain name association, which can firstly classify a server IP and obtain a corresponding domain name, then obtain an organization domain name keyword according to organization information, and finally match the server domain name and the organization domain name to realize mapping between the server IP and the organization geographic position, thereby obtaining a street-level landmark.

The technical scheme adopted by the invention is as follows: a street-level landmark obtaining method based on service identification and domain name association comprises the following steps:

step 1: acquiring a plurality of IPs, wherein the IPs comprise an IP of a known service type and an IP of a plurality of unknown service types;

step 2: using a port scanning tool to perform open port scanning on all IP ports to obtain the open condition of each IP port;

step 3: extracting training features for classification from the scan results of the IPs of known service types: reducing the open ports of the IPs of known service types with a feature reduction algorithm to obtain a minimum feature set, which is used as the training features;

step 4: training an IP classifier with the training features obtained in step 3, and classifying the IPs of unknown service type with the trained IP classifier to obtain the server IPs;

step 5: acquiring the domain names corresponding to the server IPs: performing domain name resolution on each server IP obtained in step 4 under each DNS to obtain the domain name information corresponding to the server IP; if one server IP resolves to multiple domain names, establishing a mapping between the IP and each domain name;

step 6: obtaining the city to which each IP of unknown service type belongs based on a voting strategy, and constructing an organization information base of the city based on domain names and organization names; according to the characteristics of the domain names of the server IPs obtained in step 5, obtaining the organization information corresponding to the domain name of each server IP by using one or more of the online map, organization record base and organization information base matching methods, thereby obtaining the association between server IPs and organization geographic locations and obtaining street-level candidate landmarks;

step 7: evaluating the street-level candidate landmarks obtained in step 6 with a street-level landmark evaluation method to obtain reliable street-level landmarks.
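The IP-to-domain mapping of step 5 can be illustrated with a minimal Python sketch. It assumes each DNS is represented by a hypothetical resolver callable that takes an IP and returns the domain names it resolves to; the resolver functions, IP addresses and domain names below are illustrative placeholders, not part of the described method.

```python
from typing import Callable, Iterable, List, Tuple

# Assumption: a resolver is any callable mapping an IP to the names it resolves to.
Resolver = Callable[[str], Iterable[str]]


def map_ips_to_domains(server_ips: Iterable[str],
                       resolvers: Iterable[Resolver]) -> List[Tuple[str, str]]:
    """Query every resolver for every server IP; keep one (ip, domain) pair per distinct domain."""
    resolver_list = list(resolvers)
    pairs: List[Tuple[str, str]] = []
    for ip in server_ips:
        seen = set()
        for resolve in resolver_list:
            try:
                names = resolve(ip)
            except OSError:
                continue  # this resolver has no answer for the IP
            for name in names:
                name = name.rstrip(".").lower()  # normalize before de-duplicating
                if name and name not in seen:
                    seen.add(name)
                    pairs.append((ip, name))
    return pairs


# Hypothetical resolvers standing in for two different DNS servers.
def resolver_a(ip: str) -> List[str]:
    return {"203.0.113.5": ["www.example.edu."]}.get(ip, [])


def resolver_b(ip: str) -> List[str]:
    return {"203.0.113.5": ["WWW.example.edu", "mail.example.edu"]}.get(ip, [])


pairs = map_ips_to_domains(["203.0.113.5", "203.0.113.9"], [resolver_a, resolver_b])
print(pairs)
```

An IP that resolves to several domain names produces several pairs, and an IP with no answer produces none. In practice a resolver could wrap a reverse lookup against a specific DNS server; that choice is left open here.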

Preferably, the step 3 comprises the following steps:

3.1: let the IPs of known service types provide m types of service, and let SE(IP) denote the set of IPs of known service type that provide the same type of service; the m such sets are denoted in turn SE1(IP), SE2(IP) … SEq(IP) … SEm(IP), where 1 ≤ q ≤ m;

let Feature(IP) be the feature set of a single IP in SE(IP); sort all Feature(IP) in ascending order of their number of elements, and order the elements of SE(IP) accordingly; the sorted feature sets are denoted Feature(IP1), Feature(IP2), … Feature(IPi) … Feature(IPj) … Feature(IPn), and the corresponding elements of SE(IP) are denoted IP1, IP2, … IPi … IPj … IPn, where 1 ≤ i ≤ j ≤ n and n is the number of IPs of known service type in SE(IP);

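the reduction algorithm for SE(IP) is as follows:

if $\mathrm{Feature}(IP_i) \subseteq \mathrm{Feature}(IP_j)$ is satisfied for some $i < j$, then $IP_j$ is deleted from SE(IP);

this is repeated until all remaining pairs $i < j$ satisfy $\mathrm{Feature}(IP_i) \not\subseteq \mathrm{Feature}(IP_j)$;

the reduced feature set FeatureSet of SE(IP) is the union of the feature sets Feature(IP) of all the IPs remaining in SE(IP);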

3.2: reducing SE1(IP), SE2(IP) … SEq(IP) … SEm(IP) respectively to obtain the reduced feature sets FeatureSet1, FeatureSet2 … FeatureSetq … FeatureSetm;

3.3: the minimum feature set is the union of the feature sets FeatureSet1, FeatureSet2 … FeatureSetq … FeatureSetm.

Specifically, in step 6, constructing the organization information base of the city includes the following steps:

obtaining a POI library and an organization directory of the target city from public data sets, obtaining the names and categories of the organizations, and extracting domain name keywords from the organization names; for non-English organization names, converting the Chinese keywords in the organization names into letter combinations and taking the letter combinations as domain name keywords; and associating the domain name keywords with the organization names to construct the organization information base.

Specifically, in step 6, the organization information base matching method includes the following steps:

extracting the subdomain field that implies organization information from the domain name of the server IP as the information field, matching the information field with the domain name keywords in the organization information base, establishing the association between the domain name of the server IP and the organization name, and finally establishing the mapping between the server IP and the organization's geographic location to obtain street-level candidate landmarks.

First, features are extracted and reduced from the scan results of IPs of known service types to obtain training features; a classifier is trained on these features to obtain an IP classifier, which identifies the services carried by IPs of unknown service type and thereby yields the server IPs. Next, based on the statistically observed relationship between organization information and domain names, the domain name keywords of each organization are estimated from its name, and an organization information base is built for the target area, realizing the mapping between organization geographic locations and domain names. Finally, each identified server IP is converted into a domain name, and the geographic location of the domain name is obtained through strategies such as database query, online map search and organization information base matching, yielding street-level landmarks; the reliability of these landmarks is then evaluated to obtain reliable street-level landmarks. The invention obtains landmarks not only for Web servers but also for other types of servers, which increases the number of street-level landmarks obtained and helps improve IP positioning accuracy.

Drawings

FIG. 1 is a flow chart of a method of the present invention;

FIG. 2 shows the F₁ values of the SVM-IP classifier of the present invention when the kernel function Kernel is linear and the precision Tol is 0.001;

FIG. 3 shows the F₁ values of the SVM-IP classifier of the present invention when the kernel function Kernel is rbf and the precision Tol is 0.001;

FIG. 4 shows the F₁ values of the SVM-IP classifier of the present invention when the kernel function Kernel is sigmoid and the precision Tol is 0.001;

FIG. 5 shows the F₁ values of the SVM-IP classifier of the present invention when the kernel function Kernel is linear and the precision Tol is 0.0001;

FIG. 6 shows the F₁ values of the SVM-IP classifier of the present invention when the kernel function Kernel is rbf and the precision Tol is 0.0001;

FIG. 7 shows the F₁ values of the SVM-IP classifier of the present invention when the kernel function Kernel is sigmoid and the precision Tol is 0.0001;

FIG. 8 is a comparison graph of the average classification F₁ values of the SVM-IP classifiers of the present invention with different parameters;

FIG. 9 shows the F₁ values of the KNN-IP classifier of the present invention as the number of neighbor nodes Neighbors is adjusted, with uniform weighting;

FIG. 10 shows the F₁ values of the KNN-IP classifier of the present invention as the number of neighbor nodes Neighbors is adjusted, with distance weighting;

FIG. 11 is a comparison graph of the average classification F₁ values of the KNN-IP classifiers of the present invention with different parameters;

FIG. 12 shows the F₁ values of the MLP-IP classifier of the present invention with hidden layer number Level 1, activation function Activation identity, and precision Tol 0.001;

FIG. 13 shows the F₁ values of the MLP-IP classifier of the present invention with hidden layer number Level 1, activation function Activation, and precision Tol 0.001;

FIG. 14 shows the F₁ values of the MLP-IP classifier of the present invention with hidden layer number Level 1, activation function Activation relu, and precision Tol 0.001;

FIG. 15 shows the F₁ values of the MLP-IP classifier of the present invention with hidden layer number Level 1, activation function Activation tanh, and precision Tol 0.001;

FIG. 16 shows the F₁ values of the MLP-IP classifier of the present invention with hidden layer number Level 1, activation function Activation identity, and precision Tol 0.0001;

FIG. 17 shows the F₁ values of the MLP-IP classifier of the present invention with hidden layer number Level 1, activation function Activation, and precision Tol 0.0001;

FIG. 18 shows the F₁ values of the MLP-IP classifier of the present invention with hidden layer number Level 1, activation function Activation relu, and precision Tol;

FIG. 19 shows the F₁ values of the MLP-IP classifier of the present invention with hidden layer number Level 1, activation function Activation tanh, and precision Tol;

FIG. 20 is a comparison graph of the average classification F₁ values of the MLP-IP classifiers of the present invention with hidden layer number Level 2 and different other parameters;

FIG. 21 shows the F₁ values of the MLP-IP classifier of the present invention with hidden layer number Level 2, activation function Activation identity, and precision Tol 0.001;

FIG. 22 shows the F₁ values of the MLP-IP classifier of the present invention with hidden layer number Level 2, activation function Activation, and precision Tol 0.001;

FIG. 23 shows the F₁ values of the MLP-IP classifier of the present invention with hidden layer number Level 2, activation function Activation relu, and precision Tol;

FIG. 24 shows the F₁ values of the MLP-IP classifier of the present invention with hidden layer number Level 2, activation function Activation tanh, and precision Tol 0.001;

FIG. 25 shows the F₁ values of the MLP-IP classifier of the present invention with hidden layer number Level 2, activation function Activation identity, and precision Tol 0.0001;

FIG. 26 shows the F₁ values of the MLP-IP classifier of the present invention with hidden layer number Level 2, activation function Activation, and precision Tol 0.0001;

FIG. 27 shows the F₁ values of the MLP-IP classifier of the present invention with hidden layer number Level 2, activation function Activation relu, and precision Tol;

FIG. 28 shows the F₁ values of the MLP-IP classifier of the present invention with hidden layer number Level 2, activation function Activation tanh, and precision Tol 0.0001;

FIG. 29 is a comparison graph of the average classification F₁ values of the MLP-IP classifiers of the present invention with hidden layer number Level 2 and different other parameters;

FIG. 30 is a graph comparing the number of reliable street level landmarks obtained by the method of the present invention, the Structon method and the Online Maps method;

FIG. 31 is a graph of the relationship between positioning error and cumulative probability when 100 accurate street-level landmarks are located at street level using all reliable Beijing landmarks obtained by the method of the present invention and by the Structon method, respectively;

FIG. 32 is a graph of the relationship between positioning error and cumulative probability when 100 accurate street-level landmarks are located at street level using all reliable Shanghai landmarks obtained by the method of the present invention and by the Structon method, respectively;

FIG. 33 is a graph of the relationship between positioning error and cumulative probability when 100 accurate street-level landmarks are located at street level using all reliable Guangzhou landmarks obtained by the method of the present invention and by the Structon method, respectively;

FIG. 34 is a graph of the relationship between positioning error and cumulative probability when 100 accurate street-level landmarks are located at street level using all reliable Shenzhen landmarks obtained by the method of the present invention and by the Structon method, respectively;

FIG. 35 is a graph of the relationship between positioning error and cumulative probability when 100 accurate street-level landmarks are located at street level using all reliable Hong Kong landmarks obtained by the method of the present invention and by the Structon method, respectively;

FIG. 36 is a graph of the relationship between positioning error and cumulative probability when 100 accurate street-level landmarks are located at street level using all reliable Wuhan landmarks obtained by the method of the present invention and by the Structon method, respectively;

FIG. 37 is a graph of the relationship between positioning error and cumulative probability when 100 accurate street-level landmarks are located at street level using all reliable Zhengzhou landmarks obtained by the method of the present invention and by the Structon method, respectively;

FIG. 38 is a graph of the relationship between positioning error and cumulative probability when 100 accurate street-level landmarks are located at street level using all reliable landmarks obtained by the method of the present invention and by the Structon method, respectively;

FIG. 39 is a graph of the relationship between positioning error and cumulative probability when 100 accurate street-level landmarks are located at street level using all reliable Canberra landmarks obtained by the method of the present invention, the Structon method, and the Online Maps method, respectively;

FIG. 40 is a graph of the relationship between positioning error and cumulative probability when 100 accurate street-level landmarks are located at street level using all reliable Johannesburg landmarks obtained by the method of the present invention, the Structon method, and the Online Maps method, respectively;

FIG. 41 is a graph of the relationship between positioning error and cumulative probability when 100 accurate street-level landmarks are located at street level using all reliable Lagos landmarks obtained by the method of the present invention, the Structon method, and the Online Maps method, respectively;

FIG. 42 is a graph of the relationship between positioning error and cumulative probability when 100 accurate street-level landmarks are located at street level using all reliable London landmarks obtained by the method of the present invention, the Structon method, and the Online Maps method, respectively;

FIG. 43 is a graph of the relationship between positioning error and cumulative probability when 100 accurate street-level landmarks are located at street level using all reliable Los Angeles landmarks obtained by the method of the present invention, the Structon method, and the Online Maps method, respectively;

FIG. 44 is a graph of the relationship between positioning error and cumulative probability when 100 accurate street-level landmarks are located at street level using all reliable Mexico City landmarks obtained by the method of the present invention, the Structon method, and the Online Maps method, respectively;

FIG. 45 is a graph of the relationship between positioning error and cumulative probability when 100 accurate street-level landmarks are located at street level using all reliable New York landmarks obtained by the method of the present invention, the Structon method, and the Online Maps method, respectively;

FIG. 46 is a graph of the relationship between positioning error and cumulative probability when 100 accurate street-level landmarks are located at street level using all reliable Ottawa landmarks obtained by the method of the present invention, the Structon method, and the Online Maps method, respectively;

FIG. 47 is a graph of the relationship between positioning error and cumulative probability when 100 accurate street-level landmarks are located at street level using all reliable Seoul landmarks obtained by the method of the present invention, the Structon method, and the Online Maps method, respectively;

FIG. 48 is a graph of the relationship between positioning error and cumulative probability when 100 accurate street-level landmarks are located at street level using all reliable Tokyo landmarks obtained by the method of the present invention, the Structon method, and the Online Maps method, respectively.

Detailed Description

The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in fig. 1, the method for obtaining street-level landmarks based on service identification and domain name association according to the present invention includes the following steps:

The IPs of unknown service type include host IPs, server IPs and other types. Server IPs are stable and rarely change, so after the IP classifier in step 4 has classified the IPs, only the server IPs are selected for domain name resolution, which reduces the computation required for the domain name resolution performed under the DNS in step 5.

Step 6: meanwhile, obtaining the city to which each IP of unknown service type belongs based on a voting strategy, and constructing an organization information base of the city based on domain names and organization names; according to the characteristics of the domain names of the server IPs obtained in step 5, obtaining the organization information corresponding to the domain name of each server IP by using one or more of the online map, organization record base and organization information base matching methods, thereby obtaining the mapping between server IPs and organization geographic locations and obtaining street-level candidate landmarks;
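The voting strategy of step 6 is not detailed in this text. One plausible reading, sketched below in Python, is a simple majority vote over the city-level answers of several IP location databases; the database names, the tie handling and the function name `vote_city` are assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict, Optional


def vote_city(answers: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the city named by the most databases, or None on a tie or when no database answers."""
    votes = Counter(city for city in answers.values() if city)
    if not votes:
        return None
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # assumption: a tie is treated as "city unknown"
    return ranked[0][0]


# Hypothetical answers from three city-level IP location databases for one IP.
city = vote_city({"db_a": "Zhengzhou", "db_b": "Zhengzhou", "db_c": "Beijing"})
print(city)
```

Treating a tie as unknown is a conservative choice: an IP whose city cannot be agreed on is simply not assigned to any city's organization information base.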

The street-level landmark evaluation method is prior art; for example, Chinese patent application No. 201811338745.3 describes a method and apparatus for evaluating the reliability of Web landmarks based on multi-layer decision-making.

Specifically, because the features of IPs carrying the same service differ (for example, an IP carrying a Web service may open port 80 or port 443), feature reduction cannot be performed with existing feature reduction algorithms. In this embodiment, step 3 includes the following steps:

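the reduction algorithm for SE(IP) is as follows:

if $\mathrm{Feature}(IP_i) \subseteq \mathrm{Feature}(IP_j)$ is satisfied for some $i < j$, then $IP_j$ is deleted from SE(IP);

this is repeated until all remaining pairs $i < j$ satisfy $\mathrm{Feature}(IP_i) \not\subseteq \mathrm{Feature}(IP_j)$.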

For example: SE1(IP) comprises three IPs, IP1, IP2 and IP3; Feature(IP1) includes open port 1 and open port 2; Feature(IP2) includes open port 1, open port 2 and open port 4; Feature(IP3) includes open port 1 and open port 3. Then in SE1(IP), $\mathrm{Feature}(IP_1) \subseteq \mathrm{Feature}(IP_2)$, so Feature(IP2) is deleted; the union of Feature(IP1) and Feature(IP3) is open port 1, open port 2 and open port 3; the reduced feature set FeatureSet1 of SE1(IP) is therefore open port 1, open port 2 and open port 3.
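The reduction and the example above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: features are sets of open port numbers, the subset test is the deletion condition, and IPs with no open ports are skipped (an assumption, since an empty set is a subset of every set). The function names are illustrative.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def reduce_service_set(features: Dict[str, Set[int]]) -> Set[int]:
    """Reduce one SE(IP): drop every IP whose feature set contains a smaller kept one."""
    # Sort feature sets by number of elements, ascending; skip empty sets (assumption).
    ordered = sorted((frozenset(p) for p in features.values() if p), key=len)
    kept: List[FrozenSet[int]] = []
    for ports in ordered:
        # Delete IP_j when some remaining IP_i (i < j) has Feature(IP_i) as a subset.
        if any(smaller <= ports for smaller in kept):
            continue
        kept.append(ports)
    reduced: Set[int] = set()
    for ports in kept:
        reduced |= ports  # union of the feature sets of the remaining IPs
    return reduced


def minimum_feature_set(service_sets: Iterable[Dict[str, Set[int]]]) -> Set[int]:
    """Union of the reduced feature sets of all service types (steps 3.2 and 3.3)."""
    result: Set[int] = set()
    for features in service_sets:
        result |= reduce_service_set(features)
    return result


# The example from the text: IP2 is deleted because Feature(IP1) is a subset of Feature(IP2).
se1 = {"IP1": {1, 2}, "IP2": {1, 2, 4}, "IP3": {1, 3}}
feature_set_1 = reduce_service_set(se1)
print(sorted(feature_set_1))  # [1, 2, 3]
```

Checking each candidate only against the sets already kept is sufficient, because the subset relation is transitive and the sets are visited in ascending size.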

The effectiveness of the reduction algorithm is demonstrated below.

280 DNS IPs, 1000 Email server IPs, 900 Web server IPs and 1200 user host IPs are scanned, and the open ports are extracted from the scan results as features. Without feature reduction, the resulting training feature set contains 317 features.

After feature reduction with the method provided by the invention, the training feature set contains 62 features. An SVM-IP classifier is trained with the training features before and after reduction, respectively. With the penalty factor C set to 2.0, 1.0, 0.5 and 0.2 and the kernel function Kernel set to linear, rbf and sigmoid, the average F₁ values obtained by the SVM-IP classifier when classifying 300 IPs (100 each of DNS, Email and Web server IPs) are shown in Table 1; the values in parentheses are the average F₁ values obtained from training and classification with the pre-reduction features.

TABLE 1 Average F₁ values of SVM classification using the features after reduction (values before reduction in parentheses)

Kernel	linear	rbf	sigmoid
C = 2.0	0.893831 (0.887214)	0.900894 (0.902594)	0.888605 (0.55851)
C = 1.0	0.900737 (0.89625)	0.886828 (0.55851)	0.890583 (0.542526)
C = 0.5	0.904237 (0.90355)	0.890583 (0.540644)	0.596991 (0.242038)
C = 0.2	0.906152 (0.905421)	0.556114 (0.219239)	0.492444 (0.203704)

As can be seen from Table 1, the average F₁ values obtained by training and classifying with the reduced training features are mostly higher than those obtained with the pre-reduction features (11 of the 12 parameter combinations), which illustrates the effectiveness of the feature reduction algorithm proposed by the present invention.
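For reference, F₁ is the harmonic mean of precision and recall. The sketch below computes a per-class F₁ and its unweighted mean over classes in plain Python; reading "average F₁" as the unweighted mean over the DNS, Email and Web classes is an assumption, and the labels and predictions are toy data, not the experimental data.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """F1 = 2 * P * R / (P + R) for one class, treated as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 values (assumed meaning of 'average F1')."""
    return sum(f1_for_class(y_true, y_pred, label) for label in labels) / len(labels)


# Toy data: two IPs per class, two of the six misclassified.
labels = ["dns", "email", "web"]
y_true = ["dns", "dns", "email", "email", "web", "web"]
y_pred = ["dns", "email", "email", "email", "web", "dns"]
score = average_f1(y_true, y_pred, labels)
print(round(score, 4))
```

Here the per-class F₁ values are 0.5 (DNS), 0.8 (Email) and 2/3 (Web), so the average is about 0.6556.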

In this embodiment, constructing the organization information base of the city includes the following steps:

obtaining a POI library and an organization directory of the target city from public data sets, obtaining the names and categories of the organizations, and extracting domain name keywords from the organization names; for non-English organization names, converting the Chinese keywords in the organization names into letter combinations and taking the letter combinations as domain name keywords, so that the domain name keywords of each organization are estimated from its name; and associating the domain name keywords with the organization names to construct the organization information base.

The organization information base matching method comprises the following steps:

extracting the subdomain field that implies organization information from the domain name of the server IP as the information field, matching the information field with the domain name keywords in the organization information base, and establishing the mapping between the domain name of the server IP and the domain name keywords. If the information field matches multiple organization names, multiple street-level candidate landmarks are obtained.

Obtaining the organization information corresponding to the domain name of a server IP by the online map matching method and the organization record base matching method is prior art and is not described here again.
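The matching step can be pictured as a keyword lookup. The Python sketch below assumes the organization information base is a dictionary from domain name keyword to the organizations sharing that keyword; the type names, organization names, IP and coordinates are hypothetical placeholders.

```python
from typing import Dict, List, NamedTuple


class Organization(NamedTuple):
    name: str
    latitude: float
    longitude: float


class CandidateLandmark(NamedTuple):
    ip: str
    organization: str
    latitude: float
    longitude: float


def match_information_field(ip: str, info_field: str,
                            org_base: Dict[str, List[Organization]]) -> List[CandidateLandmark]:
    """Return one street-level candidate landmark per organization whose keyword equals the field."""
    matches = org_base.get(info_field.lower(), [])
    return [CandidateLandmark(ip, org.name, org.latitude, org.longitude) for org in matches]


# Hypothetical organization information base: one keyword shared by two organizations.
org_base = {
    "exampleu": [
        Organization("Example University, North Campus", 10.0, 20.0),
        Organization("Example University, South Campus", 10.5, 20.5),
    ],
}
landmarks = match_information_field("203.0.113.5", "ExampleU", org_base)
print(len(landmarks))  # 2
```

When a keyword maps to several organizations, every match is returned as a candidate; choosing among them is left to the later reliability evaluation.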

ICANN defines top-level domain names representing countries (country top-level domain names usually consist of two English letters), and also defines category top-level domain names such as .top, .com, .edu, .gov and .org. The second-level domain names under a country top-level domain name are also generally divided by category, such as .edu, .ac and .com. To quickly obtain the organization information field in a domain name, domain names need to be classified.

The invention mainly divides domain names into three categories: category 1 is top-level domain names such as .top, .com, .edu, .gov and .org; category 2 is second-level domain names such as .com, .edu, .ac, .gov and .org; category 3 is other domain names.

According to the definition of ICANN, .top represents a business (individuals may also register), .com represents a business, .edu denotes an educational institution, .gov denotes a government institution, and .org denotes a non-profit organization. The category second-level domain names under a country domain name generally have the same meaning as in ICANN: under a country domain name, .com denotes a business entity, .edu denotes an educational institution (some countries also use .ac for scientific research and education institutions), .org denotes a non-profit organization, and .gov denotes a government department.

During domain name registration, the subdomain fields under these domain names are defined manually. They usually correlate with characteristics such as the organization's name, function and working characteristics, and therefore contain organization information; for example, the domain name of Harvard University is harvard.edu. Therefore, the subdomain field that implies organization information is taken as the information field.
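The three-way classification and the extraction of the information field can be sketched in Python as follows. The sketch assumes the information field is the label immediately to the left of the category label, and it uses only the category labels that can be stated with confidence; both sets and the function name are illustrative and can be extended.

```python
from typing import Optional, Tuple

# Category labels named in the text; extend these sets as needed (assumption).
CATEGORY_TLDS = {"top", "com", "edu", "gov", "org"}
CATEGORY_SLDS = {"com", "edu", "gov", "org"}


def classify_domain(domain: str) -> Tuple[int, Optional[str]]:
    """Return (category, information field); the field is None for category 3."""
    labels = domain.lower().rstrip(".").split(".")
    # Category 1: the domain ends in a category top-level domain name.
    if len(labels) >= 2 and labels[-1] in CATEGORY_TLDS:
        return 1, labels[-2]
    # Category 2: a two-letter country code preceded by a category second-level name.
    is_country_tld = len(labels[-1]) == 2 and labels[-1].isalpha()
    if len(labels) >= 3 and is_country_tld and labels[-2] in CATEGORY_SLDS:
        return 2, labels[-3]
    # Category 3: everything else; field extraction is not covered by this sketch.
    return 3, None


print(classify_domain("www.harvard.edu"))     # (1, 'harvard')
print(classify_domain("www.example.edu.cn"))  # (2, 'example')
print(classify_domain("example.cn"))          # (3, None)
```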

Meanwhile, statistics on 1000 domain names show that the information field of more than 96% of them does not exceed 10 letters. For English-speaking countries, the keywords in an organization name are already letter combinations. For non-English-speaking countries, the keywords obtained from an organization name are often not English and need to be converted into English-like letter combinations; when a translation tool is used directly for this conversion, the result often exceeds 10 characters, so the letter combination obtained by conversion needs to be deformed (shortened).
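The text does not specify the deformation processing. Purely as an illustration, the Python sketch below generates a few candidate keywords from an already romanized organization name and keeps those of at most 10 letters; the candidate rules (concatenation, initials, first word), the stop-word list and the function name are assumptions, not the method of the invention.

```python
import re
from typing import List

MAX_KEYWORD_LENGTH = 10                      # limit taken from the statistic in the text
STOP_WORDS = {"of", "the", "and", "for"}     # hypothetical list of words to ignore


def keyword_candidates(romanized_name: str) -> List[str]:
    """Return candidate domain name keywords of at most MAX_KEYWORD_LENGTH letters."""
    words = [w for w in re.findall(r"[a-z]+", romanized_name.lower())
             if w not in STOP_WORDS]
    if not words:
        return []
    raw = [
        "".join(words),                 # all words concatenated
        "".join(w[0] for w in words),   # initials
        words[0],                       # first word only
    ]
    candidates: List[str] = []
    for candidate in raw:
        if len(candidate) <= MAX_KEYWORD_LENGTH and candidate not in candidates:
            candidates.append(candidate)
    return candidates


print(keyword_candidates("Example Institute of Technology"))  # ['eit', 'example']
print(keyword_candidates("Harvard University"))               # ['hu', 'harvard']
```

Each surviving candidate would be stored in the organization information base as a keyword for the organization, so that any of them can match an information field.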

The relationship between domain name information fields and organization characteristics is counted below. For the city organization information base proposed by the invention, domain name keywords are extracted from organization names; in the organization information base matching method, the information field is matched against the domain name keywords in the base. The statistics verify that associating a server IP with an organization in this way is reasonable.

The relationship between the information fields of the 1000 domain names and the organization characteristics was counted; the results are shown in Table 2.

TABLE 2 Statistical results of the relationship between domain name information fields and organization characteristics

Organization characteristic	Organization name	Organization function	Working characteristic	Others
Number of domain names	955	24	15	6
Proportion	0.955	0.024	0.015	0.006

As can be seen from Table 2, for more than 95% of the 1000 domain names in the experiment, the information field is related to the organization name.

In addition, an organization information base corresponding to 3000 domain names was constructed from organization names using the proposed method. Matching the domain name information fields against this base succeeded for 2791 domain names, a success rate of just over 93%. It is therefore reasonable to build the organization information base from domain name keywords extracted from organization names.
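The matching step and the reported success rate can be sketched as follows. The passage does not spell out the exact matching rule, so a case-insensitive exact match between information field and domain name keyword is assumed here; the base entries, the function name `match_rate` and the record layout are hypothetical.

```python
def match_rate(information_fields, keyword_base):
    """Fraction of information fields that equal some keyword in the base.

    Assumption: matching is a case-insensitive exact comparison.
    """
    keys = {k.lower() for k in keyword_base}
    matched = sum(1 for field in information_fields if field.lower() in keys)
    return matched / len(information_fields)


# Hypothetical base: domain name keyword -> organization record.
base = {
    "harvard": {"name": "Harvard University", "location": None},
    "examplebank": {"name": "Example Bank", "location": None},
}

rate = match_rate(["harvard", "ExampleBank", "unknownfield"], base)

# Arithmetic check of the figure reported in the text: 2791 of 3000.
reported_rate = 2791 / 3000
```

With the counts from the text, `reported_rate` is about 0.9303, consistent with the stated success rate of more than 93%.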

To verify that the invention obtains reliable street-level landmarks effectively, a classifier selection experiment and a street-level landmark acquisition experiment are carried out below, and the experimental results are analyzed.

A. Classifier selection experiment

To find a good classifier for IPs of unknown service type, SVM, KNN (k-nearest neighbors) and MLP (multi-layer perceptron) classifiers are trained under multiple parameter settings to obtain trained IP classifiers, and the harmonic factor F₁ is used to judge the quality of each IP classifier on the experimental data set. The parameter settings for training each classifier are as follows:

SVM-IP classifier: the penalty factor C, the kernel function Kernel and the precision (stopping tolerance) Tol are adjusted. C ranges from 0.1 to 10 with a step of 0.1; Kernel is set to linear, rbf and sigmoid in turn; Tol is set to 0.001 and 0.0001 in turn.

KNN-IP classifier: the number of neighbor nodes Neighbors and the neighbor weighting mode Weights are adjusted. Neighbors ranges from 4 to 22 with a step of 1; Weights is set to uniform and distance in turn.

MLP-IP classifier: the number of hidden layers Level, the number of nodes per hidden layer Node, the activation function Activation and the precision Tol are adjusted. Level is set to 1 and 2 in turn; Node ranges from 10 to 100 with a step of 5; Activation is set to identity, logistic, tanh and relu in turn; Tol is set to 0.001 and 0.0001 in turn.

IP classifier training data set: 1000 DNS server IPs (500 in China and 500 in other countries), and 6000 each of Email server IPs, Web server IPs and user host IPs (2000 in China and 4000 in other countries for each type).

IP classifier validation data set: 100 each of DNS server IPs, Email server IPs, Web server IPs and user host IPs (30 in China and 70 in other countries for each type).
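The parameter ranges listed above for the three classifiers can be enumerated as search grids. The sketch below only builds the grids and does not train anything. The value spellings follow scikit-learn conventions (for example `logistic` for the logistic activation), which is an assumption, since the text does not name a library. It is also assumed that, with two hidden layers, the node count of each layer varies independently.

```python
from itertools import product

# SVM-IP: C from 0.1 to 10 in steps of 0.1, three kernels, two tolerances.
svm_C = [round(0.1 * i, 1) for i in range(1, 101)]
svm_grid = list(product(svm_C, ["linear", "rbf", "sigmoid"], [1e-3, 1e-4]))

# KNN-IP: 4 to 22 neighbors in steps of 1, two weighting modes.
knn_grid = list(product(range(4, 23), ["uniform", "distance"]))

# MLP-IP: 10 to 100 nodes in steps of 5, with one or two hidden layers.
nodes = list(range(10, 101, 5))
layer_shapes = [(n,) for n in nodes] + list(product(nodes, nodes))  # assumption: layers vary independently
mlp_grid = list(
    product(layer_shapes, ["identity", "logistic", "tanh", "relu"], [1e-3, 1e-4])
)

print(len(svm_grid), len(knn_grid), len(mlp_grid))
```

Under these assumptions each tuple could be passed to a library classifier, for example as `C`, `kernel` and `tol` for a support vector classifier.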

First, the nmap probing tool is used to probe the ports of the IPs in the training data set; the open ports of each IP are extracted from the probe results and used as classification features. The classification features of each IP type are then reduced with a feature reduction algorithm, and the union of the reduced feature sets of all types is taken as the feature set for training the classification model.
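How the union of reduced per-type feature sets turns into classifier input can be sketched as below. The reduction algorithm itself is not reproduced; the reduced port sets are taken as given, the port numbers are illustrative well-known ports rather than values from the experiment, and the binary open/closed encoding is an assumption.

```python
def build_feature_set(reduced_ports_by_type):
    """Union of the reduced per-type port sets, in a fixed (sorted) order."""
    union = set()
    for ports in reduced_ports_by_type.values():
        union |= set(ports)
    return sorted(union)


def to_vector(open_ports, feature_set):
    """Binary vector: 1 where a feature port is open on the IP, else 0."""
    open_ports = set(open_ports)
    return [1 if port in open_ports else 0 for port in feature_set]


# Hypothetical reduced feature sets for each IP type.
reduced = {
    "DNS": {53},
    "Email": {25, 110, 143},
    "Web": {80, 443},
    "Host": set(),
}

features = build_feature_set(reduced)
vector = to_vector([22, 80, 443], features)  # ports outside the feature set are ignored
```

Keeping the feature order fixed matters, because every IP must be encoded against the same port positions before training or prediction.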

The F₁ values of the SVM-IP classifier under different parameters are shown in FIGS. 2-7.

A comparison of the average classification F₁ values of the SVM-IP classifier under different parameters is shown in FIG. 8.

As can be seen from FIGS. 2-7, on the experimental data set of this embodiment the classification F₁ value of the SVM-IP classifier is highest for DNS-type IPs and lowest for Email-type IPs.

In addition, adjusting the precision Tol has no influence on the classifier, whereas adjusting the penalty factor C has a large influence on classifiers with nonlinear kernel functions, which indicates that nonlinear kernel functions have low error tolerance on the experimental data set. The average classification F₁ value obtained with the linear kernel is only slightly affected by the penalty factor C, which indicates that the data set has good linear separability.

As can be seen from FIG. 8, on the given data set the linear-kernel classifier outperforms the classifiers with other kernel functions when C is small, and when C is greater than 3 the influence of both the kernel function and C on classification becomes small.
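The per-type and average F₁ values used in these comparisons can be computed as in the following sketch. It assumes that the average is the unweighted mean over the IP types, which the text does not state explicitly; the function names are hypothetical.

```python
def f1_per_class(y_true, y_pred, labels):
    """F1 (harmonic mean of precision and recall) for each label."""
    scores = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores[label] = 2 * precision * recall / (precision + recall)
        else:
            scores[label] = 0.0
    return scores


def average_f1(scores):
    """Unweighted mean of the per-class F1 values (assumed averaging rule)."""
    return sum(scores.values()) / len(scores)


y_true = ["DNS", "DNS", "Email", "Web"]
y_pred = ["DNS", "DNS", "Web", "Web"]
scores = f1_per_class(y_true, y_pred, ["DNS", "Email", "Web"])
```

In this toy example the DNS type is classified perfectly while the single Email IP is mistaken for a Web server, which lowers both the Email and the Web scores.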

The F₁ values of the KNN-IP classifier under different parameters are shown in FIGS. 9-10.

A comparison of the average classification F₁ values of the KNN-IP classifier under different parameters is shown in FIG. 11.

As can be seen from FIGS. 9 and 10, on the experimental data set of this embodiment the classification F₁ value of the KNN-IP classifier is highest for DNS-type IPs and lowest for Email-type IPs. As the number of neighbor nodes increases, the classification F₁ values for DNS-type and Web-type IPs trend slightly downward overall, while the classification F₁ value for Email-type IPs trends slightly upward.

As can be seen from FIG. 11, the average F₁ value is little affected by the number of neighbor nodes, and the average classification F₁ value of the KNN-IP classifier is higher with distance-based weights than with uniform weights. This is because the features of IPs of the same type are more similar and therefore closer together in the multidimensional feature space, which shows that distance-weighted KNN-IP classification works better on the experimental data set of this embodiment.
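The difference between the two weighting modes can be seen in a small self-contained sketch. This is a simplified k-nearest-neighbor vote written for illustration, not the implementation used in the experiment; the small constant added to the distance to avoid division by zero is an assumption of this sketch.

```python
import math
from collections import defaultdict


def knn_predict(train, query, k, weights="uniform"):
    """Predict a label from the k nearest (vector, label) training pairs."""
    nearest = sorted((math.dist(vec, query), label) for vec, label in train)[:k]
    votes = defaultdict(float)
    for distance, label in nearest:
        if weights == "uniform":
            votes[label] += 1.0                       # every neighbor counts equally
        else:
            votes[label] += 1.0 / (distance + 1e-9)   # closer neighbors count more
    return max(votes, key=votes.get)


# Toy data: one very close DNS sample, two more distant Web samples.
train = [((0.1, 0.0), "DNS"), ((1.0, 0.0), "Web"), ((0.0, 1.0), "Web")]
query = (0.0, 0.0)

uniform_label = knn_predict(train, query, k=3, weights="uniform")
distance_label = knn_predict(train, query, k=3, weights="distance")
```

With uniform weights the two distant Web samples outvote the nearby DNS sample, whereas distance weighting lets the closest neighbor dominate, which matches the intuition given for same-type IPs lying close together.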

When the number of hidden layers is 1, the F₁ values of the MLP-IP classifier under each parameter setting are shown in FIGS. 12-19, and the average F₁ values of the MLP-IP classifier are shown in FIG. 20.

When the number of hidden layers is 2, the F₁ values of the MLP-IP classifier under each parameter setting are shown in FIGS. 21-28, and the average F₁ values of the MLP-IP classifier are shown in FIG. 29.

As can be seen from FIGS. 12-29, on the experimental data set of this embodiment the classification F₁ value of the MLP-IP classifier is highest for DNS-type IPs and lowest for Email-type IPs. In addition, adjusting the activation function and precision of the MLP-IP classifier, or changing the number of hidden layers and the number of nodes in each hidden layer, has little influence on the classification F₁ value.

The maximum average F₁ value of each classifier on the data set of this embodiment and the corresponding parameter settings are shown in Table 3.

TABLE 3 Parameter settings at the maximum average F₁ value of each classifier

According to Table 3, the MLP-IP classifier obtains the maximum average F₁ value when the activation function is relu, the precision is 0.0001, and the hidden layers have 25 and 20 nodes, respectively.

Therefore, the MLP-IP classifier with these settings is used for classification in the subsequent experiments of this embodiment.

B. Street level landmark acquisition experiment

Using the database query method, the IP segments of the following cities are obtained: Beijing, Shanghai, Guangzhou, Shenzhen, Hong Kong, Wuhan, Zhengzhou, Chengdu, Canberra, Johannesburg (South Africa), Lagos (Nigeria), London (England), Los Angeles, Mexico City, New York, Ottawa (Canada), Seoul and Tokyo (Japan).

Open-port probing is performed on the obtained city IPs with the Nmap tool. Based on the port probing results, the trained MLP-IP classifier identifies the DNS, Email and Web server IPs, and 100 widely distributed DNS servers are then used to obtain one or more domain names corresponding to each server IP.

Meanwhile, the organization information base of each city is constructed from public data such as POI databases and enterprise directories. Online map search, database query and organization information base matching are used to associate the domain names of the server IPs with organization information, yielding street-level candidate landmarks. Finally, the candidates are evaluated with the landmark evaluation method described in Ruixiang Li, Yuche Sun, Jianwei Hu, Ma Te, Xiangyang Luo, "Street-Level Landmark Evaluation Based on Nearest Routers," Security & Communication Networks, vol. 2018, pp. 1-12, 2018, to obtain reliable street-level landmarks.

Street-level candidate landmarks of these cities are also obtained with the Structon method and the Online Maps method, and the landmarks obtained are evaluated with the street-level landmark evaluation method of the article cited above. Owing to Chinese government policies, map services of non-Chinese companies such as Google are restricted in China, while map services of Chinese companies such as Baidu provide almost no domain name information. The Online Maps method is therefore of limited use in Chinese cities, and the proposed method is not compared with it there.

Table 4 lists the number of Web pages crawled in each city using the Structon method.

TABLE 4 Number of Web pages crawled by the Structon method

City	Number of Web pages	City	Number of Web pages	City	Number of Web pages
Beijing	2648767	Shanghai	2851134	Guangzhou	2932164
Shenzhen	2564635	Hong Kong	1716904	Wuhan	2510708
Zhengzhou	1963874	Chengdu	2512227	Canberra	2164781
Johannesburg	1849305	Lagos	1766507	London	2468066
Los Angeles	3010358	Mexico City	2785581	New York	3096256
Ottawa	2902743	Seoul	2710160	Tokyo	2923927

The numbers of reliable street-level landmarks obtained with the present invention (labeled "deployed" in the figure), the Structon method and the Online Maps method are shown in FIG. 30.

As can be seen from FIG. 30, in regions with well-developed networks the Online Maps method obtains more street-level landmarks than the Structon method, because map services are more mature in such regions and because not all Web pages of these regions could be crawled in this experiment.

The method of the invention obtains more street-level landmarks than both the Structon method and the Online Maps method, because it obtains landmarks not only from Web servers but also from other types of servers.

In each city, 100 street-level landmarks with accurate locations were geolocated at street level using all the reliable landmarks obtained by each method in turn; the relationship between positioning error and cumulative probability is shown in FIGS. 31-48.

According to FIGS. 31-48, the method of the invention improves positioning accuracy in the above cities compared with the Structon method and the Online Maps method. This is because, in IP positioning, accuracy is positively correlated with the number of reliable landmarks: for the same area, the more reliable landmarks there are, the higher the positioning accuracy. Since the proposed method obtains more reliable landmarks, it improves street-level positioning accuracy in these cities.
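The relationship between positioning error and cumulative probability plotted in those figures is an empirical cumulative distribution, which can be computed as in this sketch. The error values and their unit (kilometers) are hypothetical; the text does not give the raw errors.

```python
def cumulative_probability(errors, threshold):
    """Fraction of targets whose positioning error is at most the threshold."""
    return sum(1 for e in errors if e <= threshold) / len(errors)


def error_cdf(errors):
    """Points (error, cumulative probability) of the empirical CDF."""
    ordered = sorted(errors)
    n = len(ordered)
    return [(e, (i + 1) / n) for i, e in enumerate(ordered)]


# Hypothetical positioning errors (in km) for four targets.
errors_km = [0.5, 1.2, 3.0, 8.0]
within_3km = cumulative_probability(errors_km, 3.0)
curve = error_cdf(errors_km)
```

A method with more reliable landmarks would be expected to produce a curve that rises faster, meaning a larger share of targets falls within any given error bound.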

The experiments show that, compared with existing Web-based methods for acquiring street-level landmarks, the method of the invention obtains more street-level landmarks and is more applicable across regions with different levels of network development.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A street-level landmark obtaining method based on service identification and domain name association is characterized in that: the method comprises the following steps:

2. The method of claim 1, wherein step 3 comprises the following steps:

the reduction algorithm for SE (IP) is as follows:

if it satisfies

then IP_j is deleted from SE(IP);

until when

All satisfy

3. The method of claim 2, wherein in step 6 the establishment of the city organization information base comprises the following steps:

4. The method of claim 3, wherein in step 6 the organization information base matching method comprises the following steps: