CN113495892A - Method and device for updating IP address information base - Google Patents

Info

Publication number
CN113495892A
Authority
CN
China
Prior art keywords
address
target
information
attribution information
candidate
Legal status
Pending
Application number
CN202010201990.0A
Other languages
Chinese (zh)
Inventor
李强
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202010201990.0A
Publication of CN113495892A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method and a device for updating an IP address information base, and relates to the field of computer technology. The method comprises: crawling candidate attribution information of a target IP address from a plurality of websites according to a crawl template; scoring the candidate attribution information, and taking the candidate attribution information with the highest score as the trusted attribution information of the target IP address; and updating a local IP address information base according to the trusted attribution information of the target IP address. These steps improve both the updating efficiency and the data accuracy of the IP address information base.

Description

Method and device for updating IP address information base
Technical Field
The invention relates to the field of computer technology, and in particular to a method and a device for updating an IP address information base.
Background
With the growing popularity of Internet technology, a large amount of online data, such as browsing data and click data, is generated every day. Regional analysis of online data is a fundamental analytical task.
When a website receives an access request from a user, it can obtain the user's Internet Protocol (IP) address. On the Internet, an IP address indicates where a user is located, so an enterprise usually maintains a table mapping IP addresses to attribution (location) information, which may be called an IP address information base or "IP base". During regional analysis, the attribution information corresponding to an IP address can be obtained directly from the IP base.
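Because an IP base stores sorted, non-overlapping address segments, looking up the attribution of a single address reduces to a binary search. The following is a minimal Python sketch, not the patent's implementation; it assumes the base is held in memory as (start, end, attribution) tuples with IPv4 addresses converted to integers, and the `lookup` helper and the "Region A"/"Region B" labels are hypothetical.

```python
import bisect
import ipaddress
from typing import List, Optional, Tuple

# Hypothetical in-memory IP base: sorted, non-overlapping segments
# (start, end, attribution) with IPv4 addresses stored as integers.
Segment = Tuple[int, int, str]


def ip_to_int(ip: str) -> int:
    """Convert dotted-quad IPv4 text to its integer value."""
    return int(ipaddress.IPv4Address(ip))


def lookup(ip_base: List[Segment], ip: str) -> Optional[str]:
    """Return the attribution of the segment containing `ip`, or None."""
    value = ip_to_int(ip)
    # Segment start addresses (a real system would keep this list cached).
    starts = [start for start, _, _ in ip_base]
    # Index of the last segment whose start is <= value.
    i = bisect.bisect_right(starts, value) - 1
    if i >= 0 and value <= ip_base[i][1]:
        return ip_base[i][2]
    return None


# Placeholder data; the labels are illustrative, not real geolocation results.
IP_BASE: List[Segment] = [
    (ip_to_int("125.0.0.0"), ip_to_int("126.255.255.255"), "Region A"),
    (ip_to_int("127.0.0.0"), ip_to_int("128.255.255.255"), "Region B"),
]

print(lookup(IP_BASE, "126.1.2.3"))  # Region A
```

Storing integers rather than dotted strings is what makes the ordering and the boundary comparisons correct.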
In the prior art, an IP base is generally built and updated as follows: an IP base published on a particular website is downloaded for local use, and the latest version is periodically downloaded from that website by hand to update the local copy.
In the course of making the invention, the inventor found at least the following problems in the prior art. Updates to the local IP base depend entirely on updates to the IP base published by a single website, so IP address information may not be updated in time, which affects analysis results. Updating is done manually, which raises the cost and lowers the efficiency of updating the IP base. Moreover, because the local IP base relies solely on the IP address query service of a single website, any inaccuracy in that website's data makes the address information in the local IP base inaccurate as well.
Disclosure of Invention
In view of this, the present invention provides a method and a device for updating an IP address information base, which can improve both the updating efficiency and the data accuracy of the IP address information base.
To achieve the above object, according to one aspect of the present invention, a method for updating an IP address information base is provided.
The method for updating an IP address information base comprises: crawling candidate attribution information of a target IP address from a plurality of websites according to a crawl template; scoring the candidate attribution information, and taking the candidate attribution information with the highest score as the trusted attribution information of the target IP address; and updating a local IP address information base according to the trusted attribution information of the target IP address.
Optionally, the crawl template comprises the web addresses of the websites and a regular expression corresponding to each website; crawling the candidate attribution information of the target IP address from the plurality of websites according to the crawl template comprises: sending a query request to the web address of each website with the target IP address as a request parameter, and receiving the query result returned by the website; and extracting the candidate attribution information of the target IP address from the query result according to the regular expression corresponding to that website.
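The template-driven extraction can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's code: the site URLs, the two regular expressions, and the canned pages are hypothetical, and the HTTP layer is passed in as a `fetch(url, ip)` callable so the extraction logic can run offline (`fake_fetch` is a stand-in, not a real client).

```python
import re
from typing import Callable, Dict

# Hypothetical crawl template: site URL -> regular expression whose first
# capture group holds the attribution text in that site's query result.
CRAWL_TEMPLATE: Dict[str, str] = {
    "https://site1.example/ip": r'<span class="loc">([^<]+)</span>',
    "https://site2.example/query": r'"location"\s*:\s*"([^"]+)"',
}


def crawl_candidates(target_ip: str,
                     template: Dict[str, str],
                     fetch: Callable[[str, str], str]) -> Dict[str, str]:
    """Query every site in the template and extract candidate attribution info.

    `fetch(url, ip)` is injected so the HTTP layer can be swapped or faked;
    it must return the raw query result as text.
    """
    candidates: Dict[str, str] = {}
    for url, pattern in template.items():
        try:
            body = fetch(url, target_ip)
        except OSError:
            continue  # an unreachable site simply contributes no candidate
        match = re.search(pattern, body)
        if match:  # skip sites whose page layout no longer matches the regex
            candidates[url] = match.group(1).strip()
    return candidates


def fake_fetch(url: str, ip: str) -> str:
    """Offline stand-in for a real HTTP client, returning canned pages."""
    if "site1" in url:
        return '<div><span class="loc">Region A</span></div>'
    return '{"ip": "%s", "location": "Region B"}' % ip


print(crawl_candidates("125.1.2.3", CRAWL_TEMPLATE, fake_fetch))
```

Keeping the per-site regular expression in the template means adding or repairing a source website is a configuration change rather than a code change.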
Optionally, scoring the candidate attribution information comprises: calculating the score of each candidate attribution information from the weight coefficients of the websites from which that same candidate attribution information was crawled.
Optionally, updating the local IP address information base according to the trusted attribution information of the target IP address comprises: constructing a new IP address segment whose left and right boundary IP addresses are both the target IP address; setting the attribution information of the new IP address segment to the trusted attribution information of the target IP address; and dividing the original IP address segment containing the target IP address into a first IP address segment and a second IP address segment, using the target IP address as the dividing value, wherein the IP addresses in the first IP address segment are smaller than the target IP address and the IP addresses in the second IP address segment are larger than the target IP address.
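A minimal Python sketch of this segment split, assuming segments are (start, end, attribution) tuples over integer IPv4 values; `split_segment` is a hypothetical helper. How to handle a target that sits on a segment boundary is not stated in the text, so here the empty first or second segment is simply dropped.

```python
from typing import List, Tuple

# (start, end, attribution) with IPv4 addresses stored as integers.
Segment = Tuple[int, int, str]


def split_segment(segment: Segment, target: int, trusted: str) -> List[Segment]:
    """Replace `segment` with up to three segments around `target`.

    The target becomes a single-address segment carrying the trusted
    attribution; the rest keeps the old attribution. An empty first or
    second segment (target on a boundary) is simply omitted.
    """
    start, end, old = segment
    if not start <= target <= end:
        raise ValueError("target IP address is outside the segment")
    parts: List[Segment] = []
    if start < target:
        parts.append((start, target - 1, old))  # first segment: values < target
    parts.append((target, target, trusted))     # new segment: left = right = target
    if target < end:
        parts.append((target + 1, end, old))    # second segment: values > target
    return parts


print(split_segment((100, 200, "Region A"), 150, "Region B"))
# [(100, 149, 'Region A'), (150, 150, 'Region B'), (151, 200, 'Region A')]
```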
Optionally, the method further comprises: before updating the local IP address information base according to the trusted attribution information of the target IP address, confirming that the trusted attribution information differs from the attribution information of the target IP address stored in the local IP address information base.
Optionally, the method further comprises: taking the IP address in the middle of the first IP address segment and the IP address in the middle of the second IP address segment as new target IP addresses; and crawling candidate attribution information of the new target IP addresses from a plurality of websites according to the crawl template.
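The choice of new targets can be sketched as follows, assuming integer IPv4 values and a hypothetical `next_targets` helper. When a segment has an even number of addresses there are two middles; this sketch takes the lower one, which the text permits ("either one"). The rule for when to stop recursing into a sub-segment is not covered here and is left out.

```python
from typing import List


def middle(start: int, end: int) -> int:
    """Middle address of [start, end]; with two middles, the lower one."""
    return (start + end) // 2


def next_targets(start: int, end: int, target: int) -> List[int]:
    """New target IPs after splitting [start, end] at `target`.

    Returns the middle of the first segment [start, target - 1] and of the
    second segment [target + 1, end], skipping a side that is empty.
    """
    targets: List[int] = []
    if start <= target - 1:
        targets.append(middle(start, target - 1))
    if target + 1 <= end:
        targets.append(middle(target + 1, end))
    return targets


print(next_targets(0, 100, 50))  # [24, 75]
```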
Optionally, the method further comprises: before scoring the candidate attribution information, confirming that the candidate attribution information of the target IP address crawled from the plurality of websites is not entirely consistent.
Optionally, the method further comprises: after updating the local IP address information base according to the trusted attribution information of the target IP address, merging adjacent IP address segments in the IP address information base that correspond to the same attribution information.
Optionally, merging adjacent IP address segments corresponding to the same attribution information in the IP address information base comprises: polling the IP address segments in the IP address information base in ascending or descending order of IP address value; when the attribution information of the current IP address segment differs from that of the previous IP address segment, merging the adjacent IP address segments before the current segment that share the same attribution information, and then continuing to poll the next IP address segment; and when the attribution information of the current IP address segment is the same as that of the previous IP address segment, continuing to poll the next IP address segment.
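The merge can be sketched in a single ascending pass. This Python version is an assumption-laden illustration rather than the patent's procedure: instead of flushing a run when the attribution changes, it extends the previous segment in place, which gives the same merged result. It also assumes "adjacent" means contiguous (previous end + 1 equals current start), so two equal-attribution segments separated by a gap are not merged.

```python
from typing import List, Tuple

Segment = Tuple[int, int, str]  # (start, end, attribution), integer IPv4 values


def merge_segments(segments: List[Segment]) -> List[Segment]:
    """Merge neighbouring segments that carry the same attribution.

    Segments are polled in ascending order of start address. A segment is
    folded into the previous one when both have the same attribution and
    are contiguous (previous end + 1 == current start).
    """
    merged: List[Segment] = []
    for start, end, attr in sorted(segments):
        if merged and merged[-1][2] == attr and merged[-1][1] + 1 == start:
            merged[-1] = (merged[-1][0], end, attr)  # extend the current run
        else:
            merged.append((start, end, attr))        # attribution changed or gap: new run
    return merged


print(merge_segments([(0, 9, "A"), (10, 19, "A"), (20, 29, "B"), (30, 39, "A")]))
# [(0, 19, 'A'), (20, 29, 'B'), (30, 39, 'A')]
```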
To achieve the above object, according to another aspect of the present invention, a device for updating an IP address information base is provided.
The device for updating an IP address information base comprises: a crawling module for crawling candidate attribution information of a target IP address from a plurality of websites according to a crawl template; an analysis module for scoring the candidate attribution information and taking the candidate attribution information with the highest score as the trusted attribution information of the target IP address; and an updating module for updating a local IP address information base according to the trusted attribution information of the target IP address.
To achieve the above object, according to still another aspect of the present invention, an electronic device is provided.
The electronic device of the present invention comprises: one or more processors; and a storage apparatus for storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for updating an IP address information base of the present invention.
To achieve the above object, according to still another aspect of the present invention, there is provided a computer-readable medium.
The computer-readable medium of the present invention stores a computer program which, when executed by a processor, implements the method for updating an IP address information base of the present invention.
An embodiment of the above invention has the following advantages or benefits: candidate attribution information of a target IP address is crawled from a plurality of websites according to a crawl template, the candidates are scored, the highest-scoring candidate is taken as the trusted attribution information of the target IP address, and the local IP address information base is updated accordingly, so that both the updating efficiency and the data accuracy of the IP address information base can be improved.
Further effects of the above optional, non-conventional implementations are described below in connection with the embodiments.
Drawings
The drawings are provided for a better understanding of the invention and are not to be construed as unduly limiting it. In the drawings:
Fig. 1 is a schematic main flowchart of a method for updating an IP address information base according to a first embodiment of the present invention;
Fig. 2 is a schematic main flowchart of a method for updating an IP address information base according to a second embodiment of the present invention;
Fig. 3 is a schematic main flowchart of crawling candidate attribution information of a target IP address according to a third embodiment of the present invention;
Fig. 4 is a schematic diagram of the main process of updating an IP address information base and selecting a new target IP address according to a fourth embodiment of the present invention;
Fig. 5 is a schematic diagram of selecting a new target IP address by bisection (dichotomy);
Fig. 6 is a schematic main flowchart of merging IP address segments according to a fifth embodiment of the present invention;
Fig. 7 is a schematic diagram of the main modules of a device for updating an IP address information base according to a sixth embodiment of the present invention;
Fig. 8 is a diagram of an exemplary system architecture to which embodiments of the present invention may be applied;
Fig. 9 is a schematic structural diagram of a computer system suitable for implementing an electronic device of an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings; various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the invention. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.
It should be noted that, where no conflict arises, the embodiments and the features of the embodiments may be combined with each other.
Before the embodiments of the present invention are described in detail, some technical terms used in them are explained.
HttpClient: a subproject of Apache Jakarta Commons that provides an efficient, up-to-date, feature-rich client-side programming toolkit for the HTTP protocol, supporting the latest versions and recommendations of HTTP.
Fig. 1 is a schematic main flowchart of a method for updating an IP address information base according to a first embodiment of the present invention. As shown in Fig. 1, the method includes:
Step S101: crawl candidate attribution information of the target IP address from a plurality of websites according to the crawl template.
Illustratively, the crawl template may include the web addresses of a plurality of preconfigured websites. These are websites that provide an IP address information query service, and may take the form of an APP (application) or a Web page. For example, a crawl template may include the URL (uniform resource locator) of website 1, the URL of website 2, and the URL of website 3.
In this step, the target IP address (also called the "seed IP address" or "seed IP") may be used as the request parameter, and access requests may be sent to the URLs of the websites configured in the crawl template to crawl candidate attribution information of the target IP. In a specific implementation, the access requests can be simulated with HttpClient, and the candidate attribution information of the target IP address returned by each website is crawled. For example, for target IP address 1, in one crawl the candidate attribution information obtained from website 1 is "Sichuan Chengdu", while that obtained from website 2 and from website 3 is "Sichuan Yang"; for target IP address 2, the candidate attribution information obtained in one crawl from website 1, website 2 and website 3 is "Beijing, China" in every case.
Step S102: score the candidate attribution information, and take the candidate attribution information with the highest score as the trusted attribution information of the target IP address.
In this step, all candidate attribution information of the target IP crawled in step S101 may be scored, and the candidate with the highest score is taken as the trusted attribution information of the target IP address. For example, if the candidate attribution information of target IP address 1 crawled from website 1 is "Sichuan Chengdu" and that crawled from websites 2 and 3 is "Sichuan Yang", then "Sichuan Chengdu" and "Sichuan Yang" are scored separately, and whichever scores higher is taken as the trusted attribution information of target IP address 1.
In an optional embodiment, the score of a candidate attribution information can be calculated from the weight coefficients of the websites from which that same candidate was crawled. For example, the weight coefficients of those websites may be summed, and the sum used as the candidate's score. If website 1 has weight coefficient 1, website 2 has weight coefficient 2 and website 3 has weight coefficient 3, and "Sichuan Chengdu" is crawled from website 1 while "Sichuan Yang" is crawled from websites 2 and 3, then "Sichuan Chengdu" scores 1, "Sichuan Yang" scores 2 + 3 = 5, and "Sichuan Yang", having the highest score, is taken as the trusted attribution information of the target IP address.
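The weighted-sum scoring can be sketched in a few lines of Python. This is a hypothetical helper, not the patent's code; "Region A" and "Region B" stand in for the two Sichuan candidates of the worked example, with the same weights 1, 2 and 3. The text does not say how ties are broken; here the first-seen candidate wins because `max` returns the first maximal key in dict insertion order.

```python
from collections import defaultdict
from typing import Dict, Tuple


def score_candidates(candidates: Dict[str, str],
                     weights: Dict[str, int]) -> Tuple[str, Dict[str, int]]:
    """Sum site weights per distinct attribution and return the top one.

    candidates maps site -> attribution crawled from that site;
    weights maps site -> weight coefficient (unknown sites count as 0).
    """
    if not candidates:
        raise ValueError("no candidate attribution information")
    scores: Dict[str, int] = defaultdict(int)
    for site, attribution in candidates.items():
        scores[attribution] += weights.get(site, 0)
    # Ties go to the first-seen attribution (tie-breaking is an assumption).
    best = max(scores, key=scores.get)
    return best, dict(scores)


best, scores = score_candidates(
    {"site1": "Region A", "site2": "Region B", "site3": "Region B"},
    {"site1": 1, "site2": 2, "site3": 3},
)
print(best, scores)  # Region B {'Region A': 1, 'Region B': 5}
```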
Step S103: update the local IP address information base according to the trusted attribution information of the target IP address.
The IP address information base stores IP addresses and their corresponding attribution information. In one optional example, after the trusted attribution information of the target IP address is determined in step S102, the local IP address information base may be updated directly according to it.
In another optional example, after the trusted attribution information of the target IP address is determined in step S102, it is first checked whether it matches the attribution information of the target IP address stored in the IP address information base; if they differ, step S103 is executed; if they match, step S103 is skipped.
In this embodiment, candidate attribution information of the target IP address is crawled from a plurality of websites according to the crawl template, the candidates are scored, the highest-scoring candidate is taken as the trusted attribution information of the target IP address, and the local IP address information base is updated accordingly, which improves both the updating efficiency and the data accuracy of the IP address information base.
Fig. 2 is a schematic main flowchart of a method for updating an IP address information base according to a second embodiment of the present invention. As shown in Fig. 2, the method includes:
Step S201: crawl candidate attribution information of the target IP address from a plurality of websites according to the crawl template.
Illustratively, the crawl template may include the web addresses of a plurality of preconfigured websites. These are websites that provide an IP address information query service, and may take the form of an APP (application) or a Web page. For example, a crawl template may include the URL (uniform resource locator) of website 1, the URL of website 2, and the URL of website 3.
In this step, the target IP address (also called the "seed IP address" or "seed IP") may be used as the request parameter, and access requests may be sent to the URLs of the websites configured in the crawl template to crawl candidate attribution information of the target IP. In a specific implementation, the access requests can be simulated with HttpClient, and the candidate attribution information of the target IP address returned by each website is crawled. For example, for target IP address 1, in one crawl the candidate attribution information obtained from website 1 is "Sichuan Chengdu", while that obtained from website 2 and from website 3 is "Sichuan Yang"; for target IP address 2, the candidate attribution information obtained in one crawl from website 1, website 2 and website 3 is "Beijing, China" in every case.
Further, before step S201, the method may also include determining the initial target IPs. Specifically, all IP address segments in the IP address information base may be obtained, and the IP address in the middle of each segment taken as a target IP address. For example, if the IP address information base contains four consecutive IP address segments, 125.0.0.0-126.255.255.255, 127.0.0.0-128.255.255.255, 129.0.0.0-129.125.125.125 and 129.125.125.126-129.255.255.255, the IP address in the middle of each segment can be used as a target IP address. In a specific implementation, if a segment has a single middle IP address, that address is used as the target IP address; if it has two middle IP addresses, either of them may be used.
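Choosing the initial targets amounts to taking the integer midpoint of each segment. The Python sketch below applies a hypothetical `initial_targets` helper to the four example segments; all four contain an even number of addresses, so each has two middles, and the sketch always takes the lower one.

```python
import ipaddress
from typing import List, Tuple


def initial_targets(segments: List[Tuple[str, str]]) -> List[str]:
    """Middle IPv4 address of each (first, last) segment, as text.

    Odd address count: the single middle. Even count: the lower of the
    two middles (the text allows either).
    """
    targets = []
    for first, last in segments:
        lo = int(ipaddress.IPv4Address(first))
        hi = int(ipaddress.IPv4Address(last))
        targets.append(str(ipaddress.IPv4Address((lo + hi) // 2)))
    return targets


SEGMENTS = [
    ("125.0.0.0", "126.255.255.255"),
    ("127.0.0.0", "128.255.255.255"),
    ("129.0.0.0", "129.125.125.125"),
    ("129.125.125.126", "129.255.255.255"),
]
print(initial_targets(SEGMENTS))
# ['125.255.255.255', '127.255.255.255', '129.62.190.190', '129.190.190.190']
```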
Further, before step S201, the method may also include configuring the crawl template and the weight coefficient of each website in it. The weight coefficients may be determined by sample statistics. For example, 100 IP addresses and their actual attribution information are collected manually; the attribution information of these 100 IP addresses is then crawled from the websites in the crawl template and compared with the actual attribution information to compute each website's accuracy, and a weight coefficient is set for each website according to its accuracy. Generally, the more accurate a website, the higher the weight coefficient it is given. In one specific example, weight coefficients range from 1 to 5, and a higher weight coefficient indicates that the attribution information queried from that website is more accurate.
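The sample-statistics step can be sketched as follows. The accuracy computation follows the text; the linear mapping of accuracy onto the 1 to 5 range is an assumption, since the text only says that more accurate websites get higher weights. The helper names, the four-address sample, and the region labels are hypothetical.

```python
from typing import Dict


def site_accuracy(truth: Dict[str, str],
                  answers: Dict[str, Dict[str, str]]) -> Dict[str, float]:
    """Fraction of sample IPs for which each site returned the actual attribution.

    truth maps sample IP -> manually verified attribution;
    answers maps site -> {sample IP -> attribution returned by that site}.
    """
    return {
        site: sum(returned.get(ip) == actual for ip, actual in truth.items()) / len(truth)
        for site, returned in answers.items()
    }


def accuracy_to_weight(accuracy: float, low: int = 1, high: int = 5) -> int:
    """Hypothetical linear mapping of accuracy in [0, 1] onto [low, high].

    Note that round() uses banker's rounding on exact .5 ties.
    """
    return low + round(accuracy * (high - low))


SAMPLE_TRUTH = {"1.0.0.1": "Region A", "1.0.0.2": "Region B",
                "1.0.0.3": "Region A", "1.0.0.4": "Region C"}
SITE_ANSWERS = {
    "site1": dict(SAMPLE_TRUTH),                                       # all correct
    "site2": {"1.0.0.1": "Region A", "1.0.0.2": "Region B", "1.0.0.3": "Region C"},
}
acc = site_accuracy(SAMPLE_TRUTH, SITE_ANSWERS)
weights = {site: accuracy_to_weight(a) for site, a in acc.items()}
print(acc, weights)  # {'site1': 1.0, 'site2': 0.5} {'site1': 5, 'site2': 3}
```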
Step S202: determine whether the candidate attribution information of the target IP address crawled from the plurality of websites is entirely consistent.
In this step, the candidate attribution information of the target IP address crawled from the websites is compared. If it is entirely consistent, step S203 is executed; if it is not entirely consistent, step S204 is executed.
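The branch between steps S203 and S204 can be sketched as one Python function. The name `resolve_trusted` and the "Region" labels are hypothetical, and the scoring branch reuses the weighted-sum rule described for the first embodiment.

```python
from collections import defaultdict
from typing import Dict


def resolve_trusted(candidates: Dict[str, str], weights: Dict[str, int]) -> str:
    """Pick the trusted attribution for one target IP.

    candidates maps site -> attribution crawled from that site.
    """
    if not candidates:
        raise ValueError("no candidate attribution information")
    distinct = set(candidates.values())
    if len(distinct) == 1:
        # Step S203: every site agrees, so no scoring is needed.
        return distinct.pop()
    # Step S204: sites disagree; sum the weights of the sites behind each answer.
    scores: Dict[str, int] = defaultdict(int)
    for site, attribution in candidates.items():
        scores[attribution] += weights.get(site, 0)
    return max(scores, key=scores.get)


print(resolve_trusted({"site1": "Region C", "site2": "Region C"}, {}))  # Region C
```

The shortcut never reads the weights, which is why an empty weight table is enough in the usage line.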
Step S203: use the candidate attribution information as the trusted attribution information of the target IP address.
For example, if the candidate attribution information of target IP address 2 crawled from website 1, website 2 and website 3 is "Beijing, China" in every case, "Beijing, China" can be used directly as the trusted attribution information of target IP address 2.
In this embodiment, step S202 checks whether the candidate attribution information crawled from the websites is entirely consistent, and if so, uses it directly as the trusted attribution information of the target IP address. Skipping the scoring step in this case improves the efficiency of analyzing candidate attribution information and, in turn, the updating efficiency of the local IP address information base.
Step S204: score the candidate attribution information, and take the candidate attribution information with the highest score as the trusted attribution information of the target IP address.
In this step, all candidate attribution information of the target IP crawled in step S201 may be scored, and the candidate with the highest score is taken as the trusted attribution information of the target IP address. For example, if the candidate attribution information of target IP address 1 crawled from website 1 is "Sichuan Chengdu" and that crawled from websites 2 and 3 is "Sichuan Yang", then "Sichuan Chengdu" and "Sichuan Yang" are scored separately, and whichever scores higher is taken as the trusted attribution information of target IP address 1.
In an optional embodiment, the score of a candidate attribution information can be calculated from the weight coefficients of the websites from which that same candidate was crawled. For example, the weight coefficients of those websites may be summed, and the sum used as the candidate's score. If website 1 has weight coefficient 1, website 2 has weight coefficient 2 and website 3 has weight coefficient 3, and "Sichuan Chengdu" is crawled from website 1 while "Sichuan Yang" is crawled from websites 2 and 3, then "Sichuan Chengdu" scores 1, "Sichuan Yang" scores 2 + 3 = 5, and "Sichuan Yang", having the highest score, is taken as the trusted attribution information of the target IP address.
In this embodiment, candidate attribution information of the target IP address is crawled from a plurality of websites and scored, and the highest-scoring candidate is taken as the trusted attribution information of the target IP address, which improves the data accuracy of the IP address information base.
Step S205: determine whether the trusted attribution information differs from the attribution information of the target IP in the IP address information base.
In this step, the local IP address information base is queried with the target IP address to obtain the attribution information stored for it, and the trusted attribution information crawled for the target IP address is compared with that stored attribution information. If the two differ, step S207 is executed; if they match, step S206 is executed, that is, the process ends.
Step S207: update the local IP address information base according to the trusted attribution information of the target IP address.
In this embodiment, the local IP address information base is updated only when the trusted attribution information differs from the attribution information of the target IP stored in it, and is left unchanged when they match. This avoids unnecessary update operations on the IP address information base and improves system performance.
Step S208: select a new target IP address from the IP address segments of the IP address information base by bisection (dichotomy).
Step S208 may be implemented as described below for the embodiment shown in Fig. 4. After step S208, step S201 may be executed again, that is, candidate attribution information of the new target IP address is crawled from a plurality of websites according to the crawl template.
In this embodiment, the local IP address information base is updated automatically by the above steps. Compared with updating the IP address information base manually as in the prior art, this improves updating efficiency and reduces labor cost. Selecting the trusted attribution information by crawling candidate attribution information of the target IP address from a plurality of websites and scoring it improves data accuracy compared with relying solely on the IP address query service of a single website. In addition, target IP addresses are selected by bisection and only those addresses go through crawling, scoring and updating, rather than every IP address in the IP address information base, which further improves updating efficiency.
Fig. 3 is a schematic main flowchart of crawling candidate attribution information of a target IP address according to a third embodiment of the present invention. In this embodiment, the crawl template comprises the web addresses of a plurality of websites and a regular expression corresponding to each website. As shown in Fig. 3, crawling the candidate attribution information of the target IP address according to the crawl template mainly includes the following steps:
Step S301: initiate a query request to the web address of the website, using the target IP address as the request parameter.
The website is one that provides an IP address information query service; it may be an APP (application program) or a Web (web page) site.
The target IP may be determined as follows: obtain all IP address segments in the IP address information base, and take the IP address located in the middle of each segment as a target IP address. For example, if the IP address information base contains four consecutive IP address segments, 125.0.0.0-126.255.255.255, 127.0.0.0-128.255.255.255, 129.0.0.0-129.125.125.125, and 129.125.125.126-129.255.255.255, the IP address in the middle of each segment can be used as a target IP address. In a specific implementation, if a segment has a single middle IP address (an odd number of addresses), that address is used as the target IP address; if it has two middle IP addresses (an even number of addresses), either one may be used.
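The midpoint rule above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: it assumes IPv4 segments given as inclusive (start, end) address strings and, for segments with an even number of addresses, picks the lower of the two middle addresses (the text allows either). The function names are hypothetical.

```python
import ipaddress
from typing import List, Tuple


def segment_midpoint(start: str, end: str) -> str:
    """Return the middle address of the inclusive IPv4 range start..end.

    With an even number of addresses there are two middle addresses;
    this sketch returns the lower one.
    """
    lo = int(ipaddress.IPv4Address(start))
    hi = int(ipaddress.IPv4Address(end))
    if lo > hi:
        raise ValueError("segment start must not exceed segment end")
    return str(ipaddress.IPv4Address((lo + hi) // 2))


def initial_targets(segments: List[Tuple[str, str]]) -> List[str]:
    """Pick one target (seed) IP per segment, in the order given."""
    return [segment_midpoint(start, end) for start, end in segments]
```

For the segment 125.0.0.0-126.255.255.255, which holds an even number of addresses, the sketch returns 125.255.255.255; 126.0.0.0 would be the equally valid upper choice.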
In this step, the target IP address (also referred to as the "seed IP address" or "seed IP") is used as the request parameter, and an access request is initiated to the web address (such as a URL) of each website configured in the crawl template. Specifically, the access request can be simulated based on the http policy technology.
Step S302: receive the query result returned by the website.
In this step, the query result returned by the website may be an HTML (HyperText Markup Language) fragment containing the candidate attribution information of the target IP address.
Step S303: extract the candidate attribution information of the target IP address from the query result according to the regular expression corresponding to the website.
In a specific implementation, because page structures usually differ from website to website, a separate regular expression may be configured for each website and used to extract the candidate attribution information of the target IP address.
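Steps S301 to S303 might look roughly like the following sketch, which uses only the Python standard library. The template URLs, the `ip` query-parameter name, and the regular expressions are hypothetical placeholders, since real query sites each have their own request format and page layout. The sketch also assumes that each regular expression puts the attribution text in its first capture group and that unreachable sites are simply skipped.

```python
import re
from typing import Dict, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical crawl template: query URL -> regex whose first group is the attribution.
CRAWL_TEMPLATE: Dict[str, str] = {
    "https://ip-lookup.example.com/query": r'<span class="location">([^<]+)</span>',
    "https://whois.example.org/ip": r"Location:\s*<b>([^<]+)</b>",
}


def build_query_url(base_url: str, target_ip: str, param: str = "ip") -> str:
    """Step S301: pass the target IP as a query parameter (parameter name assumed)."""
    return f"{base_url}?{urlencode({param: target_ip})}"


def extract_attribution(html: str, pattern: str) -> Optional[str]:
    """Step S303: apply the site's regex to the returned HTML fragment."""
    match = re.search(pattern, html)
    return match.group(1).strip() if match else None


def crawl_candidates(target_ip: str, template: Dict[str, str] = CRAWL_TEMPLATE,
                     timeout: float = 5.0) -> Dict[str, str]:
    """Run steps S301-S303 for every site; return {site URL: candidate attribution}."""
    candidates: Dict[str, str] = {}
    for base_url, pattern in template.items():
        try:
            # Step S302: receive the query result; network errors skip this site.
            with urlopen(build_query_url(base_url, target_ip), timeout=timeout) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue
        info = extract_attribution(html, pattern)
        if info is not None:
            candidates[base_url] = info
    return candidates
```

Keeping the URL-to-regex mapping in data rather than in code means that a site whose page layout changes only needs its pattern updated in the template.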
In the embodiment of the invention, the candidate attribution information of the target IP address can be automatically captured from a plurality of websites through the steps, so that the updating efficiency is improved.
Fig. 4 is a schematic diagram of a main process for updating an IP address information base and selecting a new target IP address according to a fourth embodiment of the present invention. As shown in Fig. 4, updating the IP address information base and selecting a new target IP address mainly includes the following steps:
Step S401: construct a new IP address segment whose left boundary IP address and right boundary IP address are both the target IP address.
For example, if the target IP address is 0.0.0.40, using 0.0.0.40 as both the left and the right boundary IP address yields the new IP address segment 0.0.0.40-0.0.0.40.
Step S402: update the attribution information of the new IP address segment to the trusted attribution information of the target IP address.
For example, if the new IP address segment constructed in step S401 is 0.0.0.40-0.0.0.40, its attribution information is updated to the trusted attribution information of the target IP address 0.0.0.40.
Step S403: using the target IP address as the critical reference value, divide the original IP address segment containing the target IP address into a first IP address segment and a second IP address segment.
The IP addresses in the first IP address segment are numerically smaller than the target IP address, and those in the second IP address segment are numerically larger. For example, if the target IP address is 0.0.0.40 and its original IP address segment in the IP address information base is 0.0.0.1-0.0.0.100, splitting the original segment yields the first IP address segment 0.0.0.1-0.0.0.39 and the second IP address segment 0.0.0.41-0.0.0.100.
Step S404: take the IP address located in the middle of the first IP address segment and the IP address located in the middle of the second IP address segment as new target IP addresses.
For example, if the first IP address segment is 0.0.0.1-0.0.0.39 and the second IP address segment is 0.0.0.41-0.0.0.100, then 0.0.0.20, in the middle of the first segment, can be used as a new target IP address, and 0.0.0.70 or 0.0.0.71, in the middle of the second segment, can also be used as a new target IP address.
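Steps S401 to S404 can be sketched as the split of one segment in a sorted list. The sketch assumes the information base is held in memory as sorted, non-overlapping `(start, end, attribution)` tuples with IPv4 addresses stored as integers, and that the first and second segments keep the original segment's attribution information, which the text implies but does not state. `split_at_target` and `ip` are hypothetical names.

```python
import ipaddress
from typing import List, Tuple

# (first address, last address, attribution); bounds are inclusive integers.
Segment = Tuple[int, int, str]


def ip(text: str) -> int:
    """Convert dotted IPv4 text to its integer value."""
    return int(ipaddress.IPv4Address(text))


def split_at_target(segments: List[Segment], target: int,
                    trusted_info: str) -> Tuple[List[Segment], List[int]]:
    """Apply steps S401-S404 to the segment containing target.

    Returns the updated segment list and the new target addresses.
    """
    for i, (start, end, info) in enumerate(segments):
        if start <= target <= end:
            break
    else:
        raise ValueError("target is not covered by any segment")

    pieces: List[Segment] = []
    new_targets: List[int] = []
    if start <= target - 1:
        # S403: first segment, addresses below the target (old attribution kept).
        pieces.append((start, target - 1, info))
        # S404: its middle address (lower middle when the count is even).
        new_targets.append((start + target - 1) // 2)
    # S401/S402: single-address segment carrying the trusted attribution.
    pieces.append((target, target, trusted_info))
    if target + 1 <= end:
        # S403: second segment, addresses above the target.
        pieces.append((target + 1, end, info))
        new_targets.append((target + 1 + end) // 2)
    return segments[:i] + pieces + segments[i + 1:], new_targets
```

With the base holding 0.0.0.1-0.0.0.100 and the target 0.0.0.40, the call yields the three segments 0.0.0.1-0.0.0.39, 0.0.0.40-0.0.0.40, and 0.0.0.41-0.0.0.100, together with the new targets 0.0.0.20 and 0.0.0.70, matching the example above.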
In this embodiment, the original IP address segment is divided into two segments using the target IP address as the critical reference value, and the two resulting segments replace it as new IP address segments. This ensures that each IP address segment in the IP address information base corresponds to a single piece of attribution information.
Fig. 5 is a schematic diagram of selecting new target IP addresses using the dichotomy method. As shown in Fig. 5, with the seed IP (i.e., the target IP address) as the critical reference value, the part of the original IP address segment to the left of the seed IP becomes the first IP address segment, and the part to its right becomes the second IP address segment. The IP address in the middle of the first segment (the left middle IP in the figure) and the IP address in the middle of the second segment (the right middle IP in the figure) are taken as the new seed IPs.
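The order in which Fig. 5 generates seed IPs can be illustrated with a breadth-first queue of address ranges. This is a sketch under stated assumptions: addresses are integers, the lower middle address is chosen for even-sized ranges, and `max_seeds` is a hypothetical probe budget added here because, without some stopping rule, repeated halving would eventually visit every address in the range.

```python
from collections import deque
from typing import Deque, Iterator, Tuple


def bisection_seeds(start: int, end: int, max_seeds: int) -> Iterator[int]:
    """Yield seed addresses breadth-first, in the manner of Fig. 5.

    Each seed splits its range into a left part and a right part whose
    middle addresses become later seeds.
    """
    pending: Deque[Tuple[int, int]] = deque([(start, end)])
    produced = 0
    while pending and produced < max_seeds:
        lo, hi = pending.popleft()
        seed = (lo + hi) // 2  # lower middle address for even-sized ranges
        yield seed
        produced += 1
        if lo <= seed - 1:
            pending.append((lo, seed - 1))  # left part (first IP address segment)
        if seed + 1 <= hi:
            pending.append((seed + 1, hi))  # right part (second IP address segment)
```

For the range 1 to 100, the first three seeds are 50, 25, and 75: the first seed splits the range, and the midpoints of its two halves come next.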
Fig. 6 is a schematic main flowchart of merging IP address segments according to a fifth embodiment of the present invention. In this embodiment, the method for updating the IP address information base includes, in addition to the steps shown in Fig. 2, the following: after the local IP address information base is updated according to the trusted attribution information of the target IP address, adjacent IP address segments with the same attribution information in the IP address information base are merged. As shown in Fig. 6, merging IP address segments mainly includes the following steps:
Step S601: poll the IP address segments in the IP address information base in ascending order of IP address value.
For example, if the local IP address information base stores the four IP address segments 125.0.0.0-126.255.255.255, 127.0.0.0-128.255.255.255, 129.0.0.0-129.125.125.125, and 129.125.125.126-129.255.255.255, the segment 125.0.0.0-126.255.255.255 is polled first, followed in turn by 127.0.0.0-128.255.255.255, 129.0.0.0-129.125.125.125, and 129.125.125.126-129.255.255.255.
In an alternative embodiment, the IP address segments in the IP address information base may instead be polled in descending order of IP address value.
Step S602: determine whether the attribution information of the current IP address segment differs from that of the previous IP address segment.
In this step, the attribution information of the current IP address segment is compared with that of the previous IP address segment. If they differ, step S603 is executed; if they are the same, step S604 is executed. For example, if the attribution information of the current segment 127.0.0.0-128.255.255.255 is "Sichuan Mianyang" and that of the previous segment 125.0.0.0-126.255.255.255 is "Sichuan Chengdu", step S603 is executed.
Step S603: merge the adjacent IP address segments that precede the current IP address segment and share the same attribution information.
In this step, if several adjacent (consecutive) IP address segments before the current segment share the same attribution information, they are merged, and the process then proceeds to step S604. If the previous IP address segment does not share its attribution information with the segment before it, no merging is needed and the process proceeds directly to step S604.
Step S604: determine whether all IP address segments have been polled. If so, the process ends; if not, step S602 is executed again.
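The merge in steps S601 to S604 can be written as a single ascending pass that extends the last merged segment whenever the next one has the same attribution information. This is an equivalent reformulation rather than a literal transcription of the flowchart: instead of merging a run when a differing segment is met (step S603), it grows the run incrementally, which also merges a run that reaches the last segment. It assumes integer-valued, non-overlapping segments and treats two segments as adjacent only when one ends immediately before the next begins.

```python
from typing import List, Tuple

# (first address, last address, attribution); bounds are inclusive integers.
Segment = Tuple[int, int, str]


def merge_adjacent(segments: List[Segment]) -> List[Segment]:
    """Merge runs of contiguous segments that share the same attribution."""
    merged: List[Segment] = []
    # S601: poll segments in ascending order of address value.
    for start, end, info in sorted(segments):
        if merged:
            prev_start, prev_end, prev_info = merged[-1]
            # S602: same attribution as the previous, directly adjacent segment?
            if prev_info == info and prev_end + 1 == start:
                # Extend the current run instead of storing a new segment.
                merged[-1] = (prev_start, end, info)
                continue
        merged.append((start, end, info))
    return merged
```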
In this embodiment, because updating the local IP address information base may leave consecutive IP addresses with the same attribution information in different IP address segments, the steps above merge such consecutive addresses into a single IP address segment. This reduces the amount of data stored in the IP address information base and improves its query efficiency.
Fig. 7 is a schematic diagram of main blocks of an apparatus for updating an IP address information base according to a sixth embodiment of the present invention. As shown in fig. 7, an apparatus 700 for updating an IP address information base according to an embodiment of the present invention includes: a grabbing module 701, an analyzing module 702, and an updating module 703.
The grabbing module 701 is configured to crawl candidate attribution information of the target IP address from a plurality of websites according to a crawl template.
Illustratively, the crawl template may include the web addresses of a plurality of preconfigured websites. These websites provide an IP address information query service and may take the form of an APP (application) or a Web (web page) site. For example, a crawl template may include the URL (uniform resource locator) of website 1, the URL of website 2, and the URL of website 3.
Specifically, the grabbing module 701 may use the target IP address (also referred to as the "seed IP address" or "seed IP") as the request parameter and initiate access requests to the URLs of the websites configured in the crawl template to crawl candidate attribution information of the target IP. In a specific implementation, the access requests can be simulated based on the http policy technology, and the candidate attribution information returned by each website for the target IP address can be captured. For example, for target IP address 1, the candidate attribution information crawled at one point from website 1 is "Sichuan Chengdu", from website 2 is "Sichuan Mianyang", and from website 3 is "Sichuan Mianyang"; for target IP address 2, the candidate attribution information crawled from websites 1, 2, and 3 is "Beijing, China" in every case.
The analysis module 702 is configured to score the candidate attribution information and take the candidate attribution information with the largest score as the trusted attribution information of the target IP address.
Here, all candidate attribution information captured by the grabbing module 701 for the target IP may be scored, and the candidate with the largest score is taken as the trusted attribution information of the target IP address. For example, if the candidate attribution information for target IP address 1 is "Sichuan Chengdu" from website 1, "Sichuan Mianyang" from website 2, and "Sichuan Mianyang" from website 3, the candidates "Sichuan Chengdu" and "Sichuan Mianyang" are each scored, and the one with the larger score is taken as the trusted attribution information of target IP address 1.
In an alternative embodiment, the analysis module 702 may calculate the score of a candidate from the weight coefficients of the websites from which that same candidate was crawled, for example by adding those weight coefficients and using the sum as the candidate's score. Suppose the weight coefficients of websites 1, 2, and 3 are 1, 2, and 3, "Sichuan Chengdu" is crawled from website 1, and "Sichuan Mianyang" is crawled from websites 2 and 3. Then "Sichuan Chengdu" scores 1, "Sichuan Mianyang" scores 2 + 3 = 5, and "Sichuan Mianyang", having the largest score, is taken as the trusted attribution information of the target IP address.
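The weighted scoring can be sketched as a weighted vote. The sketch assumes candidates arrive as a mapping from website to crawled attribution string and that weights are supplied per website. The default weight of 1 for unlisted sites and the tie-break (the first candidate to reach the top score wins) are assumptions, not part of the described method. The sketch also short-circuits when every site agrees, in line with claim 7, which performs scoring only when the crawled candidates are not all consistent.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


def pick_trusted(candidates: Dict[str, str], weights: Dict[str, float],
                 default_weight: float = 1.0) -> Tuple[Optional[str], float]:
    """Return (trusted attribution, score) from per-site candidates."""
    if not candidates:
        return None, 0.0
    distinct = set(candidates.values())
    if len(distinct) == 1:
        # All sites agree, so no scoring is needed to choose a candidate.
        info = distinct.pop()
        return info, sum(weights.get(site, default_weight) for site in candidates)
    scores: Dict[str, float] = defaultdict(float)
    for site, info in candidates.items():
        # A candidate's score is the sum of the weights of the sites reporting it.
        scores[info] += weights.get(site, default_weight)
    best = max(scores, key=scores.get)  # ties: first candidate inserted wins
    return best, scores[best]
```

With the weights and candidates of the example above, the sketch returns "Sichuan Mianyang" with a score of 5.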
An updating module 703 is configured to update a local IP address information base according to the trusted home information of the target IP address.
The IP address information base is used for storing IP addresses and attribution information corresponding to the IP addresses. In an alternative example, after determining the trusted home information of the target IP address by the analysis module 702, the update module 703 may update the local IP address information base directly according to the trusted home information of the target IP address.
In another optional example, after the analysis module 702 determines the trusted attribution information of the target IP address, it is further determined whether that information is consistent with the attribution information of the target IP address stored in the IP address information base. If they are inconsistent, the updating module 703 updates the local IP address information base; if they are consistent, the updating module 703 does not need to update it.
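The consistency check before updating can be sketched with a binary search over segment start addresses using Python's standard `bisect` module. This lookup is an illustrative assumption about how the stored attribution might be found; it is separate from the dichotomy used to choose seed IPs. The sketch assumes sorted, non-overlapping integer segments and rebuilds the list of start addresses on each call for simplicity, whereas a real information base would keep that index. `stored_attribution` and `needs_update` are hypothetical names.

```python
import bisect
from typing import List, Optional, Tuple

# (first address, last address, attribution); sorted, non-overlapping, inclusive.
Segment = Tuple[int, int, str]


def stored_attribution(segments: List[Segment], target: int) -> Optional[str]:
    """Find the attribution stored for target, or None if no segment covers it."""
    starts = [start for start, _, _ in segments]
    # Index of the last segment whose start is <= target.
    i = bisect.bisect_right(starts, target) - 1
    if i >= 0 and segments[i][0] <= target <= segments[i][1]:
        return segments[i][2]
    return None


def needs_update(segments: List[Segment], target: int, trusted_info: str) -> bool:
    """Update only when the trusted attribution differs from the stored one."""
    return stored_attribution(segments, target) != trusted_info
```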
In the device provided by the embodiment of the invention, candidate attribution information of the target IP address is grabbed from a plurality of websites through the grabbing module according to the grabbing template, the candidate attribution information is graded through the analysis module, the candidate attribution information with the largest grading value is used as the credible attribution information of the target IP address, and the local IP address information base is updated through the updating module according to the credible attribution information of the target IP address, so that the updating efficiency of the IP address information base can be improved, and the data accuracy of the IP address information base can be improved.
Fig. 8 shows an exemplary system architecture 800 of an update method of an IP address information base or an update apparatus of an IP address information base to which an embodiment of the present invention can be applied.
As shown in Fig. 8, the system architecture 800 may include terminal devices 801, 802, 803, a network 804, and a server 805. The network 804 provides the medium for communication links between the terminal devices 801, 802, 803 and the server 805. Network 804 may include various types of connections, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 801, 802, 803 to interact with a server 805 over a network 804 to receive or send messages or the like. Various communication client applications, such as an IP address information base management application, a shopping application, a web browser application, a search application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 801, 802, 803.
The terminal devices 801, 802, 803 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 805 may be a server that provides various services, such as a background management server that supports an IP address information base management type application or a website browsed by a user using the terminal apparatus 801, 802, 803. The background management server may analyze and otherwise process the received data such as the IP address information base update request, and feed back a processing result (for example, response information that the IP address information base is successfully updated, and the like) to the terminal device.
It should be noted that the method for updating the IP address information base provided by the embodiment of the present invention is generally executed by the server 805, and accordingly, the apparatus for updating the IP address information base is generally disposed in the server 805.
It should be understood that the number of terminal devices, networks, and servers in fig. 8 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 9, shown is a block diagram of a computer system 900 suitable for use in implementing an electronic device of an embodiment of the present invention. The computer system illustrated in FIG. 9 is only an example and should not impose any limitations on the scope of use or functionality of embodiments of the invention.
As shown in Fig. 9, the computer system 900 includes a Central Processing Unit (CPU) 901 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the system 900 are also stored. The CPU 901, ROM 902, and RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to bus 904.
The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card or a modem. The communication section 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 910 as necessary, so that a computer program read from it can be installed into the storage section 908 as needed.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The above-described functions defined in the system of the present invention are executed when the computer program is executed by a Central Processing Unit (CPU) 901.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may be described as: a processor comprising a grabbing module, an analysis module, and an updating module. The names of these modules do not, in some cases, limit the modules themselves; for example, the grabbing module may also be described as "a module that crawls attribution information of the target IP address".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to: crawl candidate attribution information of the target IP address from a plurality of websites according to a crawl template; score the candidate attribution information and take the candidate attribution information with the largest score as the trusted attribution information of the target IP address; and update a local IP address information base according to the trusted attribution information of the target IP address.
According to the technical scheme provided by the embodiment of the invention, the updating efficiency of the IP address information base and the data accuracy of the IP address information base can be improved.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A method for updating an IP address information base, the method comprising:
crawling candidate attribution information of a target IP address from a plurality of websites according to a crawl template;
scoring the candidate attribution information, and taking the candidate attribution information with the largest score as the trusted attribution information of the target IP address;
and updating a local IP address information base according to the trusted attribution information of the target IP address.
2. The method of claim 1, wherein the crawl template comprises web addresses of websites and regular expressions corresponding to the websites, and crawling the candidate attribution information of the target IP address from the plurality of websites according to the crawl template comprises:
initiating a query request to the web address of a website, using the target IP address as the request parameter, and receiving the query result returned by the website; and extracting candidate attribution information of the target IP address from the query result according to the regular expression corresponding to the website.
3. The method of claim 1, wherein scoring the candidate home information comprises:
calculating the score of the candidate attribution information according to the weight coefficients of the websites from which the same candidate attribution information is crawled.
4. The method of claim 1, wherein updating a local IP address information base according to trusted home information of the target IP address comprises:
constructing a new IP address section by taking the target IP address as a left boundary IP address and a right boundary IP address; updating the attribution information of the new IP address field to the credible attribution information of the target IP address; dividing the original IP address segment where the target IP address is located into a first IP address segment and a second IP address segment by taking the target IP address as a critical reference value; and the value of the IP address in the first IP address field is smaller than the target IP address, and the value of the IP address in the second IP address field is larger than the target IP address.
5. The method of claim 4, further comprising:
before updating a local IP address information base according to the credible attribution information of the target IP address, confirming that the credible attribution information is inconsistent with the attribution information of the target IP address stored in the local IP address information base.
6. The method of claim 4, further comprising:
taking the IP address positioned in the middle position of the first IP address field and the IP address positioned in the middle position of the second IP address field as new target IP addresses; and capturing candidate attribution information of the new target IP address from a plurality of websites according to a capturing template.
7. The method of claim 1, further comprising:
before the scoring of the candidate attribution information, confirming that the candidate attribution information of the target IP address crawled from the plurality of websites is not entirely consistent.
8. The method of claim 4, further comprising:
after the local IP address information base is updated according to the trusted attribution information of the target IP address, merging adjacent IP address segments that correspond to the same attribution information in the IP address information base.
9. The method of claim 8, wherein the merging the adjacent IP address segments corresponding to the same home information in the IP address information base comprises:
polling the IP address segments in the IP address information base in ascending or descending order of IP address value; when the attribution information of the current IP address segment differs from that of the previous IP address segment, merging the adjacent IP address segments that precede the current IP address segment and share the same attribution information, and then continuing to poll the next IP address segment; and when the attribution information of the current IP address segment is the same as that of the previous IP address segment, continuing to poll the next IP address segment.
10. An apparatus for updating an IP address information base, the apparatus comprising:
the grabbing module, configured to crawl candidate attribution information of a target IP address from a plurality of websites according to a crawl template;
the analysis module, configured to score the candidate attribution information and take the candidate attribution information with the largest score as the trusted attribution information of the target IP address;
and the updating module, configured to update a local IP address information base according to the trusted attribution information of the target IP address.
11. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-9.
12. A computer-readable medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 9.
CN202010201990.0A 2020-03-20 2020-03-20 Method and device for updating IP address information base Pending CN113495892A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010201990.0A CN113495892A (en) 2020-03-20 2020-03-20 Method and device for updating IP address information base

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010201990.0A CN113495892A (en) 2020-03-20 2020-03-20 Method and device for updating IP address information base

Publications (1)

Publication Number Publication Date
CN113495892A (en) 2021-10-12

Family

ID=77993932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010201990.0A Pending CN113495892A (en) 2020-03-20 2020-03-20 Method and device for updating IP address information base

Country Status (1)

Country Link
CN (1) CN113495892A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102665014A (en) * 2012-04-13 2012-09-12 北京搜狗科技发展有限公司 Number information prompting method and system
CN103678676A (en) * 2013-12-25 2014-03-26 乐视网信息技术(北京)股份有限公司 IP library processing method and system
CN104780235A (en) * 2014-01-14 2015-07-15 腾讯科技(深圳)有限公司 IP attribution inquiry method and device and server
CN106096040A (en) * 2016-06-29 2016-11-09 中国人民解放军国防科学技术大学 Organization web ownership place method of discrimination based on search engine and device thereof
CN106357835A (en) * 2016-09-05 2017-01-25 百度在线网络技术(北京)有限公司 Method and device for determining subordinate region of target IP address
CN107172129A (en) * 2017-04-25 2017-09-15 北京潘达互娱科技有限公司 A kind of server collocation method and device
CN107277188A (en) * 2017-06-19 2017-10-20 网宿科技股份有限公司 A kind of method, client, server and operation system for determining IP address attaching information
CN107807976A (en) * 2017-10-25 2018-03-16 世纪龙信息网络有限责任公司 IP attribution inquiry methods and device
CN108875006A (en) * 2018-06-15 2018-11-23 泰康保险集团股份有限公司 Determine method and device regional belonging to IP address
CN109783521A (en) * 2018-12-29 2019-05-21 湖南安数网络有限公司 A kind of IP ownership place determines method, apparatus and computer storage medium

Similar Documents

Publication Publication Date Title
CN110019211A (en) The methods, devices and systems of association index
CN109829121B (en) Method and device for reporting click behavior data
CN111190888A (en) Method and device for managing graph database cluster
CN110187880B (en) Method and device for identifying similar elements and computing equipment
CN111038906B (en) Order sorting method and device
CN107908662B (en) Method and device for realizing search system
CN107247798B (en) Method and device for constructing search word bank
CN110851468A (en) Method and device for making simulation response to test request of client
CN111435406A (en) Method and device for correcting database statement spelling errors
CN113760948A (en) Data query method and device
CN111814024A (en) Distributed data acquisition method, system and storage medium
CN107291835B (en) Search term recommendation method and device
CN109002389B (en) Method and device for automatically testing page
CN113760722A (en) Test system and test method
CN110321252B (en) Skill service resource scheduling method and device
CN112947919A (en) Method and device for constructing service model and processing service request
CN113312355A (en) Data management method and device
CN110554951A (en) Method and device for managing embedded points
CN111414523A (en) Data acquisition method and device
CN112667368A (en) Task data processing method and device
CN113590985B (en) Page jump configuration method and device, electronic equipment and computer readable medium
CN113495892A (en) Method and device for updating IP address information base
CN115496544A (en) Data processing method and device
CN113138943B (en) Method and device for processing request
CN113722113A (en) Traffic statistic method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination