CN106603742A - IP address and domain name corresponding relationship update method and device - Google Patents


Publication number
CN106603742A
Authority
CN
China
Prior art keywords
domain name
address
number of days
consecutive
date
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611155172.1A
Other languages
Chinese (zh)
Other versions
CN106603742B (en)
Inventor
刘芳
常思源
刘军
齐勇刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201611155172.1A
Publication of CN106603742A
Application granted
Publication of CN106603742B
Legal status: Active


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]

Abstract

The invention provides a method and device for updating the correspondence between IP addresses and domain names. The method comprises: acquiring at least one CDR file and preprocessing the at least one CDR file to obtain a to-be-stored data file, wherein the to-be-stored data file comprises at least one to-be-stored data record, and each to-be-stored data record comprises a first IP address, a first domain name and a first date; determining a first consecutive-occurrence day count for each identical first domain name corresponding to an identical first IP address in the to-be-stored data file; and updating the data records of second IP addresses in a database table according to the first IP address, the first domain name and the first date in the at least one to-be-stored data record and the first consecutive-occurrence day count. The method improves the accuracy of the correspondence between IP addresses and domain names and improves the utilization of storage space.

Description

Method and device for updating the correspondence between IP addresses and domain names
Technical field
The present invention relates to the technical field of domain name resolution, and in particular to a method and device for updating the correspondence between IP addresses and domain names.
Background
Internet Protocol (IP) addresses and domain names are both important Internet resources. The correspondence between IP addresses and domain names is many-to-many: one IP address can correspond to multiple domain names at the same time, and one domain name can correspond to multiple IP addresses at the same time. In practice, some IP address and domain name correspondences remain stable for a long time, while others change constantly.
In the prior art, the correspondence between IP addresses and domain names is stored in an IP resource library, which holds both accurate correspondences and invalid or expired ones. When a user analyzes data, all IP address and domain name correspondences in the IP resource library must be read.
As can be seen, because the IP resource library of the prior art stores invalid and expired correspondences alongside accurate ones, a user analyzing data cannot obtain accurate IP address and domain name correspondences.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a method and device for updating the correspondence between IP addresses and domain names, so as to improve the accuracy of that correspondence. The specific technical solution is as follows:
In one aspect, an embodiment of the present invention discloses a method for updating the correspondence between IP addresses and domain names, the method comprising:
obtaining at least one CDR file and preprocessing the at least one CDR file to obtain a to-be-stored data file, wherein the to-be-stored data file includes at least one to-be-stored data record, and each to-be-stored data record includes a first IP address, a first domain name and a first date;
determining a first consecutive-occurrence day count for each identical first domain name corresponding to an identical first IP address in the to-be-stored data file, wherein the first consecutive-occurrence day count is the number of days on which the identical first domain name corresponding to the identical first IP address appears on consecutive dates in the to-be-stored data file;
updating, according to the first IP address, the first domain name and the first date in the at least one to-be-stored data record and the first consecutive-occurrence day count, the data record in which a second IP address is located in a database table, wherein the data record in which the second IP address is located includes the second IP address, a second domain name, a second date and a second consecutive-occurrence day count.
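As an illustration of the second step, the sketch below counts, for each (IP address, domain name) pair, how many consecutive calendar days the pair appears on, ending at its most recent date. The text does not say which run of dates is counted when a pair has gaps, so taking the run that ends at the latest date is an assumption, and the function name `consecutive_day_counts` and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def consecutive_day_counts(records):
    """Map each (ip, domain) pair to the length of the run of consecutive
    dates that ends at the pair's most recent date (assumed interpretation)."""
    dates_by_pair = defaultdict(set)
    for ip, domain, day in records:
        dates_by_pair[(ip, domain)].add(datetime.strptime(day, "%Y%m%d").date())

    counts = {}
    for pair, days in dates_by_pair.items():
        current = max(days)
        run = 0
        # Walk backwards one day at a time until a date is missing.
        while current in days:
            run += 1
            current -= timedelta(days=1)
        counts[pair] = run
    return counts


# Hypothetical records in (ip, domain, yyyymmdd) form.
records = [
    ("1.1.1.1", "baidu", "20160101"),
    ("1.1.1.1", "baidu", "20160102"),
    ("1.1.1.1", "baidu", "20160103"),
    ("1.1.1.1", "sina", "20160101"),
    ("1.1.1.1", "sina", "20160103"),
]
counts = consecutive_day_counts(records)
```

Here "baidu" appears on three consecutive days, while the run for "sina" is broken by the missing 20160102.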
Optionally, obtaining at least one CDR file and preprocessing the at least one CDR file to obtain a to-be-stored data file includes:
obtaining at least one CDR file, wherein each CDR file includes at least two columns of data;
extracting an initial IP address data column and a user-requested host name information data column from the at least two columns of data, wherein the initial IP addresses and the user-requested host name information are in one-to-one correspondence;
filtering the initial IP address data column and the user-requested host name information data column to obtain two columns of data: first IP addresses and first user-requested host name information;
extracting a first domain name from the first user-requested host name information according to a stored whitelist file, wherein the first IP address and the first domain name are in one-to-one correspondence;
obtaining the name of the at least one CDR file, wherein the name includes a first date on which the at least one CDR file was generated;
adding the first date to the corresponding first IP address and first domain name to generate at least one to-be-stored data record;
for identical dates among the first dates, deduplicating all first domain names corresponding to the same first IP address to obtain the to-be-stored data file.
Optionally, filtering the initial IP address data column and the user-requested host name information data column to obtain the two columns of first IP addresses and first user-requested host name information includes:
searching the initial IP address data column and the user-requested host name information data column for initial IP addresses and user-requested host name information that fall within a preset condition range, and deleting the initial IP addresses and user-requested host name information found within that range, to obtain the two columns of first IP addresses and first user-requested host name information.
Optionally, deduplicating, for identical dates among the first dates, all first domain names corresponding to the same first IP address to obtain the to-be-stored data file includes:
for identical dates among the first dates, calculating, by a preset algorithm, a first Hamming distance for any two first domain names among all first domain names corresponding to the same first IP address;
obtaining, from the first Hamming distance and by a preset formula, a first similarity of the character strings of the two first domain names;
judging whether the first similarity is greater than a first preset threshold;
when the judgment result is yes, retaining both first domain names in the to-be-stored data file;
when the judgment result is no, deleting the to-be-stored data record in which either one of the two first domain names is located, to obtain an updated to-be-stored data file.
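The deduplication pass can be sketched as follows. The "preset formula" and threshold are not given in the text, so the predicate here is an assumption: it divides a padded Hamming distance by the longer length and treats a ratio at or below the threshold as "the same domain". The names `dedupe_domains` and `looks_same` are hypothetical.

```python
from collections import defaultdict


def dedupe_domains(records, is_same_domain):
    """Within each (date, ip) group, keep a record only if its domain is not
    judged the same as a domain already kept for that group."""
    kept_by_group = defaultdict(list)
    result = []
    for ip, domain, day in records:
        kept = kept_by_group[(day, ip)]
        if any(is_same_domain(domain, other) for other in kept):
            continue  # near-duplicate of a kept domain: drop this record
        kept.append(domain)
        result.append((ip, domain, day))
    return result


def looks_same(a, b, threshold=0.5):
    # Assumed stand-in for the preset formula: padded Hamming distance
    # divided by the longer length; small values mean "same domain".
    n = max(len(a), len(b))
    distance = sum(x != y for x, y in zip(a.ljust(n), b.ljust(n)))
    return distance / n <= threshold


records = [
    ("1.1.1.1", "sina", "20160202"),
    ("1.1.1.1", "sina01", "20160202"),
    ("1.1.1.1", "baidu", "20160202"),
    ("1.1.1.1", "sina01", "20160203"),
]
deduped = dedupe_domains(records, looks_same)
```

Only the "sina01" record that shares both date and IP address with "sina" is dropped; the one on 20160203 belongs to a different group and is kept.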
Optionally, updating the data record in which the second IP address is located in the database table, according to the first IP address, the first domain name and the first date in the at least one to-be-stored data record and the first consecutive-occurrence day count, includes:
for the first IP address, judging whether a second IP address identical to the first IP address exists in the database table;
when the judgment result is no, storing the to-be-stored data record in which the first IP address is located, together with the first consecutive-occurrence day count, in the database table;
when the judgment result is yes, updating the second domain name, the second date and the second consecutive-occurrence day count corresponding to the second IP address in the database table.
Optionally, updating, when the judgment result is yes, the second domain name, the second date and the second consecutive-occurrence day count corresponding to the second IP address in the database table includes:
when the judgment result is yes, judging whether the second domain name corresponding to the second IP address in the database table is the same as the first domain name;
when the second domain name corresponding to the second IP address in the database table differs from the first domain name, storing the first IP address, the first domain name, the first date and the first consecutive-occurrence day count in the database table;
when the second domain name corresponding to the second IP address in the database table is the same as the first domain name, updating the second domain name, the second date and the second consecutive-occurrence day count corresponding to the second IP address in the database table.
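The branching described above can be summarised in a small dispatcher. This is only a sketch: the table is modelled as a list of dictionaries, the branch labels are hypothetical, and the domain comparison is passed in as a function because the text defines it separately.

```python
def choose_update_branch(table_rows, new_ip, new_domain, is_same_domain):
    """Return which branch of the update logic applies to a new record."""
    rows_for_ip = [row for row in table_rows if row["ip"] == new_ip]
    if not rows_for_ip:
        # No second IP address equals the first IP address: plain insert.
        return "insert_new_ip"
    if any(is_same_domain(row["domain"], new_domain) for row in rows_for_ip):
        # Same IP address and same domain name: update date and day count.
        return "update_same_domain"
    # Same IP address but a different domain name.
    return "insert_differing_domain"


# Hypothetical table contents; exact string equality stands in for the
# similarity-based comparison.
table = [{"ip": "1.1.1.1", "domain": "baidu", "date": "20160105", "days": 5}]
exact = lambda a, b: a == b
branch_new = choose_update_branch(table, "2.2.2.2", "sina", exact)
branch_same = choose_update_branch(table, "1.1.1.1", "baidu", exact)
branch_diff = choose_update_branch(table, "1.1.1.1", "sina", exact)
```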
Optionally, judging whether the second domain name corresponding to the second IP address in the database table is the same as the first domain name includes:
calculating a second Hamming distance between the first domain name and the second domain name;
determining, from the second Hamming distance and by a preset formula, a second similarity between the first domain name and the second domain name;
when the second similarity is greater than a second preset threshold, judging that the second domain name corresponding to the first IP address in the database table differs from the first domain name;
when the second similarity is less than or equal to the second preset threshold, judging that the second domain name corresponding to the first IP address in the database table is the same as the first domain name.
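Note the direction of the comparison: a value above the threshold means the two domain names differ, so the "similarity" behaves like a normalised distance. The preset formula is not given in the text; the sketch below assumes the simplest candidate, the Hamming distance divided by the length of the longer name, and the threshold value 0.5 is likewise hypothetical.

```python
def similarity_from_distance(distance, length):
    """Assumed preset formula: Hamming distance over the longer name's length."""
    return distance / length


def domains_are_same(distance, length, threshold=0.5):
    """Follow the stated rule: greater than the threshold means different,
    less than or equal to the threshold means the same."""
    return similarity_from_distance(distance, length) <= threshold


# "sina" vs "sina01": distance 2 over length 6 gives about 0.33, judged the same.
same_pair = domains_are_same(distance=2, length=6)
# "baidu" vs "sina": distance 5 over length 5 gives 1.0, judged different.
different_pair = domains_are_same(distance=5, length=5)
```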
Optionally, updating the second domain name, the second date and the second consecutive-occurrence day count corresponding to the second IP address in the database table, when the second domain name corresponding to the second IP address in the database table differs from the first domain name, includes:
when the second domain name corresponding to the second IP address in the database table differs from the first domain name, and the first consecutive-occurrence day count is less than the second consecutive-occurrence day count, storing the first IP address, the first domain name, the first date and the first consecutive-occurrence day count in the database table;
when the second domain name corresponding to the second IP address in the database table differs from the first domain name, and the first consecutive-occurrence day count is greater than the second consecutive-occurrence day count, deleting the data record in which the second IP address is located, and storing the first IP address, the first domain name, the first date and the first consecutive-occurrence day count in the database table;
when the second domain name corresponding to the second IP address in the database table differs from the first domain name, a third domain name corresponding to the second IP address in the database table also differs from the first domain name, and the first consecutive-occurrence day count is less than the second consecutive-occurrence day count but greater than a third consecutive-occurrence day count, deleting the data record in which the third domain name is located, and storing the first IP address, the first domain name, the first date and the first consecutive-occurrence day count in the database table, wherein the third domain name differs from both the first domain name and the second domain name, and the third consecutive-occurrence day count is the consecutive-occurrence day count of the data record in which the third domain name corresponding to the second IP address is located.
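The three cases above share one pattern: existing records for the same IP address whose day count is smaller than that of the new record are deleted, and the new record is then stored. The sketch below generalises the cases in that way, which is an interpretation rather than the patent's wording; the `Row` type, the function name, and the choice to keep records with an equal day count are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Row:
    ip: str
    domain: str
    date: str
    days: int  # consecutive-occurrence day count


def insert_differing_domain(table, new):
    """Store `new`, whose domain differs from every existing domain for its IP.
    Rows for the same IP with a smaller day count are removed; rows with an
    equal day count are kept (the text does not cover that case)."""
    survivors = [
        row for row in table
        if row.ip != new.ip or row.days >= new.days
    ]
    survivors.append(new)
    return survivors


table = [
    Row("1.1.1.1", "baidu", "20160105", 5),
    Row("1.1.1.1", "sina", "20160105", 1),
    Row("2.2.2.2", "qq", "20160105", 1),
]
updated = insert_differing_domain(table, Row("1.1.1.1", "google", "20160106", 3))
remaining_domains = [row.domain for row in updated]
```

In this example "baidu" (5 days) outlasts the new 3-day record, "sina" (1 day) is removed, and the record for the unrelated IP address is untouched.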
Optionally, updating the second domain name, the second date and the second consecutive-occurrence day count corresponding to the second IP address in the database table, when the second domain name corresponding to the second IP address in the database table is the same as the first domain name, includes:
when the second domain name corresponding to the second IP address in the database table is the same as the first domain name, and the first consecutive-occurrence day count equals the second consecutive-occurrence day count, updating the second date corresponding to the second IP address in the database table to the first date, and updating the second consecutive-occurrence day count to a third consecutive-occurrence day count, wherein the third consecutive-occurrence day count equals the difference obtained by subtracting the second date from the first date, plus the first consecutive-occurrence day count;
when the second domain name corresponding to the second IP address in the database table is the same as the first domain name, the first consecutive-occurrence day count is less than the second consecutive-occurrence day count, and the first consecutive-occurrence day count is greater than a third consecutive-occurrence day count corresponding to the second IP address, updating the second date corresponding to the second IP address in the database table to the first date, updating the second consecutive-occurrence day count to a fourth consecutive-occurrence day count, and deleting the data record in which the third consecutive-occurrence day count is located, wherein the fourth consecutive-occurrence day count equals the difference obtained by subtracting the second date from the first date, plus the first consecutive-occurrence day count, and the third domain name in the data record in which the third consecutive-occurrence day count is located differs from the first domain name.
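Both cases use the same arithmetic for the new day count: the number of days between the second date and the first date, plus the first consecutive-occurrence day count. A minimal sketch of that formula and of the equal-count case follows, assuming dates are `yyyymmdd` strings as in the examples elsewhere in the description; the function names and the dictionary layout are hypothetical.

```python
from datetime import datetime


def parse_day(text):
    return datetime.strptime(text, "%Y%m%d").date()


def merged_day_count(first_date, second_date, first_days):
    """(first date - second date) in days, plus the first day count."""
    return (parse_day(first_date) - parse_day(second_date)).days + first_days


def update_same_domain(row, first_date, first_days):
    """Equal-count case only: return an updated copy of the stored row.
    Rows whose day count differs are returned unchanged in this sketch."""
    if first_days != row["days"]:
        return dict(row)
    return {
        **row,
        "date": first_date,
        "days": merged_day_count(first_date, row["date"], first_days),
    }


stored = {"ip": "1.1.1.1", "domain": "baidu", "date": "20160131", "days": 2}
updated_row = update_same_domain(stored, "20160201", 2)
```

Parsing the dates before subtracting matters: 20160201 minus 20160131 is one day, not 70.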
In another aspect, an embodiment of the present invention also discloses a device for updating the correspondence between IP addresses and domain names, the device comprising:
an acquiring unit, configured to obtain at least one CDR file and preprocess the at least one CDR file to obtain a to-be-stored data file, wherein the to-be-stored data file includes at least one to-be-stored data record, and each to-be-stored data record includes a first IP address, a first domain name and a first date;
a determining unit, configured to determine a first consecutive-occurrence day count for each identical first domain name corresponding to an identical first IP address in the to-be-stored data file, wherein the first consecutive-occurrence day count is the number of days on which the identical first domain name corresponding to the identical first IP address appears on consecutive dates in the to-be-stored data file;
an updating unit, configured to update, according to the first IP address, the first domain name and the first date in the at least one to-be-stored data record and the first consecutive-occurrence day count, the second domain name, the second date and the second consecutive-occurrence day count corresponding to the first IP address in the database table, wherein the second consecutive-occurrence day count is determined by the first consecutive-occurrence day count, the first date and the second date.
In the embodiments of the present invention, at least one CDR file is first obtained and preprocessed to obtain a to-be-stored data file; next, the first consecutive-occurrence day count of each identical first domain name corresponding to an identical first IP address in the to-be-stored data file is determined; finally, the data record in which the second IP address is located in the database table is updated according to the first IP address, the first domain name and the first date in the at least one to-be-stored data record and the first consecutive-occurrence day count. As can be seen, in this solution the IP address and domain name correspondences in the database table are continually updated according to the first IP address, first domain name, first date and first consecutive-occurrence day count of the to-be-stored data records, which improves the accuracy of the correspondence and, in turn, the accuracy of the user's data analysis. In the embodiments of the present invention, invalid and expired IP address and domain name correspondences are filtered out, which improves the utilization of storage space. Of course, a product or method implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the method for updating the correspondence between IP addresses and domain names provided by an embodiment of the present invention;
Fig. 2 is another flow chart of the method for updating the correspondence between IP addresses and domain names provided by an embodiment of the present invention;
Fig. 3 is yet another flow chart of the method for updating the correspondence between IP addresses and domain names provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of storing a to-be-stored data record in the database table provided by an embodiment of the present invention;
Fig. 5 is another schematic diagram of storing a to-be-stored data record in the database table provided by an embodiment of the present invention;
Fig. 6 is another schematic diagram of storing a to-be-stored data record in the database table provided by an embodiment of the present invention;
Fig. 7 is yet another schematic diagram of storing a to-be-stored data record in the database table provided by an embodiment of the present invention;
Fig. 8 is still another schematic diagram of storing a to-be-stored data record in the database table provided by an embodiment of the present invention;
Fig. 9 is still another schematic diagram of storing a to-be-stored data record in the database table provided by an embodiment of the present invention;
Fig. 10 is a structural schematic diagram of the device for updating the correspondence between IP addresses and domain names provided by an embodiment of the present invention;
Fig. 11 is a structural schematic diagram of the acquiring unit in the device for updating the correspondence between IP addresses and domain names provided by an embodiment of the present invention;
Fig. 12 is a structural schematic diagram of the updating unit in the device for updating the correspondence between IP addresses and domain names provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
To solve the problem of the prior art, the embodiments of the present invention provide a method and device for updating the correspondence between IP addresses and domain names, so as to improve the accuracy of that correspondence.
The method for updating the correspondence between IP addresses and domain names provided by an embodiment of the present invention is introduced first.
As shown in Fig. 1, an embodiment of the present invention provides a method for updating the correspondence between IP addresses and domain names, including the following steps:
S101: obtain at least one CDR file and preprocess the at least one CDR file to obtain a to-be-stored data file.
It can be understood that there may be multiple CDR files, and each CDR file corresponds to one date, so CDR files of different dates are separate files. A CDR file includes multiple columns of data. Preprocessing the CDR file yields the to-be-stored data file that the user needs. The preprocessing may include processing the at least one CDR file with a preset algorithm to obtain the to-be-stored data file, where the preset algorithm may include a mapping function (map function) and a reduction function (reduce function).
The to-be-stored data file includes at least one to-be-stored data record, and each to-be-stored data record includes a first IP address, a first domain name and a first date.
Specifically, obtaining at least one CDR file and preprocessing the at least one CDR file to obtain a to-be-stored data file includes:
obtaining at least one CDR file, wherein each CDR file includes at least two columns of data; extracting an initial IP address data column and a user-requested host name information data column from the at least two columns of data, wherein the initial IP addresses and the user-requested host name information are in one-to-one correspondence; filtering the two columns of initial IP addresses and user-requested host name information to obtain two columns of first IP addresses and first user-requested host name information; extracting a first domain name from the first user-requested host name information according to a stored whitelist file, wherein the first IP address and the first domain name are in one-to-one correspondence; obtaining the name of the at least one CDR file, wherein the name includes a first date on which the at least one CDR file was generated; adding the first date to the corresponding first IP address and first domain name to generate at least one to-be-stored data record; and, for identical dates among the first dates, deduplicating all first domain names corresponding to the same first IP address to obtain the to-be-stored data file.
In practical applications, the initial IP address data column and the user-requested host name information data column are extracted from the at least one obtained CDR file. The IP address here is the server IP address, that is, the address of the server that the user accesses. In the extracted columns, the initial IP addresses and the user-requested host name information are in one-to-one correspondence: each initial IP address corresponds to one piece of user-requested host name information, but identical initial IP addresses do not necessarily correspond to the same user-requested host name information. For example, Table 1 shows an initial IP address data column and a host data column, where the user-requested host name information is denoted by "host".
Table 1
Generally, the initial IP address data column and the user-requested host name information data column extracted from the CDR file contain legal IP addresses and hosts as well as illegal IP addresses, internal-network IP addresses and illegal hosts. The two columns therefore need to be filtered to obtain the filtered initial IP address data column and user-requested host name information data column.
Specifically, filtering the two columns of initial IP addresses and user-requested host name information to obtain the two columns of first IP addresses and first user-requested host name information includes:
searching the initial IP address data column and the user-requested host name information data column for initial IP addresses and user-requested host name information that fall within a preset condition range, and deleting the initial IP addresses and user-requested host name information found within that range, to obtain the two columns of first IP addresses and first user-requested host name information.
It should be noted that illegal IP addresses and illegal hosts are erroneous values collected during data acquisition when the CDR file was generated. Filtering them out before the storage logic (the operation of loading CDR file data into the database table) is executed therefore improves the utilization of storage space.
Specifically, in one possible implementation of the embodiments of the present invention, the preset condition range includes illegal IP addresses, internal-network IP addresses and illegal hosts.
Illegal IP addresses include incomplete IP addresses (for example, 101.226) and initial IP address values that are not actually IP addresses (for example, tel:12679).
This solution studies external-network IP addresses, so all internal-network IP addresses need to be filtered out. Internal-network IP addresses include: the Class A private range 10.0.0.0 to 10.255.255.255, the Class B private range 172.16.0.0 to 172.31.255.255, and the Class C private range 192.168.0.0 to 192.168.255.255.
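A minimal sketch of the IP address filter, using Python's standard `ipaddress` module. The three networks are the CIDR equivalents of the ranges listed above; the function name `is_usable_public_ip` is hypothetical, and treating every value that fails IPv4 parsing as illegal is an assumption based on the two examples given.

```python
import ipaddress

# CIDR forms of the three internal-network ranges listed in the text.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),       # 10.0.0.0 - 10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),    # 172.16.0.0 - 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),   # 192.168.0.0 - 192.168.255.255
]


def is_usable_public_ip(text):
    """Return True only for complete IPv4 addresses outside the listed ranges."""
    try:
        address = ipaddress.IPv4Address(text)
    except ValueError:
        return False  # incomplete ("101.226") or not an address ("tel:12679")
    return not any(address in network for network in PRIVATE_NETWORKS)


candidates = ["101.226", "tel:12679", "10.1.2.3", "172.31.255.255",
              "192.168.0.1", "166.88.8.172", "172.32.0.1"]
kept = [value for value in candidates if is_usable_public_ip(value)]
```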
A CDR file contains many illegal host fields, most of which are incomplete hosts produced when the CDR file was collected, for example: "tjajtg, m", "wap.3xiaren.comhttp:", "www, baido.com", "www...moc", "cat.sh.cn." and "::". However, a host such as "test.mzread.com:8080" is legal and complete even though it carries the extra port number "8080", and "test.mzread.com" needs to be extracted from it. In addition, some host fields are in IP address form; such hosts can also reflect host information to some degree, so hosts in IP address form are not deleted or filtered out.
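The text gives examples of illegal hosts but no formal rule, so the sketch below assumes one that is consistent with those examples: a host is accepted when it consists of dot-separated labels of letters, digits and hyphens, optionally followed by a numeric port, which is stripped. The pattern and the function name `clean_host` are hypothetical.

```python
import re

# Assumed rule: labels of letters/digits/hyphens joined by single dots,
# with an optional ":port" suffix that is not part of the host.
HOST_PATTERN = re.compile(
    r"(?P<host>[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)(?::\d+)?"
)


def clean_host(raw):
    """Return the host without its port, or None if the field is malformed."""
    match = HOST_PATTERN.fullmatch(raw)
    return match.group("host") if match else None


illegal_examples = ["tjajtg, m", "wap.3xiaren.comhttp:", "www, baido.com",
                    "www...moc", "cat.sh.cn.", "::"]
cleaned_illegal = [clean_host(value) for value in illegal_examples]
cleaned_port = clean_host("test.mzread.com:8080")
cleaned_ip_form = clean_host("23.44.156.40")
```

Under this rule every illegal example from the text is rejected, the port is removed from "test.mzread.com:8080", and a host in IP address form passes through unchanged.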
After the illegal IP addresses, internal-network IP addresses and illegal hosts within the preset condition range are filtered out, the filtered initial IP address data column and user-requested host name information data column serve as the two columns of first IP addresses and first user-requested host name information.
In practical applications, the to-be-stored data file does not contain the host but the domain name corresponding to the host, so the domain name needs to be extracted from the host. The host represents the user-requested host name information, and the domain name refers to the domain name requested by the user. For example, the domain name of the user-requested host name information "www.sohu.com" is "sohu", and the domain name of "www.sports.sina.com.cn" is "sports.sina". As can be seen, extracting the domain name from the host requires accurately identifying the trailing characters of the host that do not belong to the domain name. For this purpose a whitelist is introduced, which includes common host-ending strings such as "com", "com.cn", "com.co", "com.hk", "edu.cn", "net" and "net.cn", but is not limited to these. Extracting domain names in this way accurately identifies the ending characters of the host, so that the correct domain name is extracted from the host. The domain name is generally denoted by "domain".
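One way to realise the whitelist-based extraction is sketched below. Two points are assumptions inferred from the two worked examples rather than stated rules: the longest matching whitelist ending is removed first (so "com.cn" wins over shorter endings), and a leading "www." is dropped. The function name `extract_domain` and returning `None` for hosts with no whitelisted ending are also hypothetical choices.

```python
# Host-ending strings quoted in the text; the real whitelist file may hold more.
HOST_SUFFIX_WHITELIST = ["com", "com.cn", "com.co", "com.hk",
                         "edu.cn", "net", "net.cn"]


def extract_domain(host, suffixes=HOST_SUFFIX_WHITELIST):
    """Strip the longest whitelisted ending and a leading 'www.' from host."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        ending = "." + suffix
        if host.endswith(ending):
            name = host[: -len(ending)]
            if name.startswith("www."):
                name = name[len("www."):]
            return name
    return None  # no whitelisted ending: left for the caller to handle


domain_sohu = extract_domain("www.sohu.com")
domain_sina = extract_domain("www.sports.sina.com.cn")
domain_unknown = extract_domain("example.org")
```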
After the domain name is extracted from the host, a date is added to each to-be-stored data record. The name of each CDR file includes the date on which the CDR file was generated; for example, the CDR file is named after its generation date, or after the type of its content together with the date. The generation date is extracted from the CDR file name and added, as the first date, to each to-be-stored data record in the to-be-stored data file. For example, if the to-be-stored data record is "166.88.8.172 baidu" and the date extracted from the name of the corresponding CDR file is "20160102", then "20160102" is added to the record, which is updated to "166.88.8.172 baidu 20160102".
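The date step can be sketched as below, assuming the date appears in the file name as a standalone run of eight digits in `yyyymmdd` form, as in the example record. The file name `http_20160102.txt` and both function names are hypothetical; the text does not specify a naming scheme beyond saying the name contains the date.

```python
import re


def date_from_filename(name):
    """Return the first standalone 8-digit run in the file name, or None."""
    match = re.search(r"(?<!\d)(\d{8})(?!\d)", name)
    return match.group(1) if match else None


def attach_date(records, filename):
    """Append the file's generation date to every 'ip domain' record."""
    day = date_from_filename(filename)
    if day is None:
        raise ValueError(f"no date found in file name: {filename!r}")
    return [f"{record} {day}" for record in records]


dated = attach_date(["166.88.8.172 baidu"], "http_20160102.txt")
```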
It should be noted that the to-be-stored data file may come from CDR files of different dates, so the first dates in the to-be-stored data records may include multiple dates. An example is the to-be-stored data file shown in Table 2.
Table 2
166.88.8.172 baidu 20160102
23.44.156.40 sina 20160202
120.25.240.235 sina01 20160303
114.113.101.47 google 20160303
…… …… ……
23.44.156.56 hao123 20130509
…… …… ……
Specifically, as shown in Fig. 2 the phase same date in first date, by same first IP address pair The all first domain name duplicate removals answered, obtaining the concrete steps of the data file to be put in storage includes:
Step 1, for the phase same date in first date, by preset algorithm, calculates same first IP address pair Corresponding first Hamming distance of the domain name of any two first in all first domain names answered;
It will be appreciated that will be in data file be put in storage, in the data record to be put in storage of phase same date, same IP ground The corresponding all first domain name duplicate removals in location, for example, the date has 5 for the data record to be put in storage of " 20160202 ", wherein, IP ground Location is that " 1.1.1.1 " corresponds to altogether 3 domain names, for example:“1.1.1.1 baidu 20160202”、“1.1.1.1 sina 20160202”、“1.1.1.1 google 20160202”。
Here, preset algorithm is:Calculate a character string and be transformed into the character replaced required for another character string Several algorithm, it is preferred that preset algorithm includes:Hamming distance algorithm.Wherein, a domain name is a character string, and two isometric Hamming distance between character string is the number of the kinds of characters of two character string correspondence positions.So, calculated by Hamming distance Method, calculates the Hamming distance between the domain name of any two first in the corresponding all domain names of same first IP address.
For example, the first IP address corresponds to two first domain names, "sina" and "sina01". According to the formula:
the Hamming distance between the domain names "sina" and "sina01" is calculated, where H is the Hamming distance between "sina" and "sina01", and n is the number of characters in "sina01"; note that n is generally taken as the character count of the longer of the two domain names. V_i is the i-th character of one domain name and V_j is the j-th character of the other: when V_i is the i-th character of "sina", V_j is the j-th character of "sina01", and vice versa. The above formula is used to calculate the Hamming distance between any two first domain names among all domain names corresponding to the same first IP address, and this Hamming distance is taken as the first Hamming distance.
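The formula itself is not reproduced in this text. Read from the definitions of H, n, V_i and V_j, it counts the positions (with i = j running from 1 to n) at which the two domain names carry different characters. The sketch below follows that reading; padding the shorter name up to length n, so that every surplus character counts as a difference, is an assumption, as is the function name.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count differing positions, with n taken as the length of the longer string."""
    n = max(len(a), len(b))
    # Assumption: pad the shorter string with NUL, which cannot occur in a domain name,
    # so that each position beyond its end counts as one difference.
    a_padded = a.ljust(n, "\0")
    b_padded = b.ljust(n, "\0")
    return sum(1 for x, y in zip(a_padded, b_padded) if x != y)


print(hamming_distance("sina", "sina01"))   # 2
print(hamming_distance("google", "baidu"))  # 6
```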
Step 2: according to the first Hamming distance, obtain, by a preset formula, the first similarity between the character strings of the any two first domain names;
Specifically, the first similarity between the character strings of the any two first domain names is obtained by a preset formula. For example, the character string of domain name 1 is P, the character string of domain name 2 is T, and the Hamming distance between domain name 1 and domain name 2 is H; according to the preset formula:
the first similarity between the character string P of domain name 1 and the character string T of domain name 2 is obtained.
Here, Adj(P, T) is the first similarity between the character string P of domain name 1 and the character string T of domain name 2, H is the Hamming distance between domain name 1 and domain name 2, and max H is the maximum first Hamming distance of domain name 1 and domain name 2.
Step 3: judge whether the first similarity is greater than a first preset threshold;
Step 4: when the judgment result is yes, retain the any two first domain names in the to-be-stored data file;
Step 5: when the judgment result is no, delete the to-be-stored data record containing either one of the any two first domain names, obtaining the updated to-be-stored data file.
For example, the first preset threshold is A, and Adj(P, T) is the first similarity between the character string P of domain name 1 and the character string T of domain name 2. When Adj(P, T) > A, domain name 1 and domain name 2 are determined to be different domain names and no operation is performed on them, that is, both are retained in the to-be-stored data file. When Adj(P, T) ≤ A, domain name 1 and domain name 2 are determined to be the same domain name, and the to-be-stored data record containing either domain name 1 or domain name 2 is deleted, obtaining the updated to-be-stored data file.
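Steps 1 to 5 can be sketched as below. The preset formula is not reproduced in this text, so the sketch assumes Adj(P, T) = H / max H with max H equal to the length of the longer name; this is consistent with the rule that a value above the threshold means "different", but it is an assumption. The threshold 0.5, the tuple layout and the function names are likewise hypothetical.

```python
from typing import List, Tuple

Record = Tuple[str, str, str]  # (ip, domain, date)


def hamming_distance(a: str, b: str) -> int:
    n = max(len(a), len(b))
    return sum(1 for x, y in zip(a.ljust(n, "\0"), b.ljust(n, "\0")) if x != y)


def adj(p: str, t: str) -> float:
    """Assumed form of the preset formula: H divided by the largest possible H."""
    max_h = max(len(p), len(t))
    return hamming_distance(p, t) / max_h if max_h else 0.0


def dedupe(records: List[Record], threshold: float) -> List[Record]:
    """For the same IP and date, keep only one of any two names with adj <= threshold."""
    kept: List[Record] = []
    for ip, domain, date in records:
        duplicate = any(
            k_ip == ip and k_date == date and adj(domain, k_domain) <= threshold
            for k_ip, k_domain, k_date in kept
        )
        if not duplicate:
            kept.append((ip, domain, date))
    return kept


records = [
    ("1.1.1.1", "sina", "20160202"),
    ("1.1.1.1", "sina01", "20160202"),  # adj with "sina" is 2/6, treated as the same name
    ("1.1.1.1", "baidu", "20160202"),   # adj with "sina" is 5/5, treated as different
]
print(dedupe(records, threshold=0.5))
```

The sketch keeps the record encountered first; the text only requires that one of the two records be deleted.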
S102: determine the first consecutive-occurrence day count of the same first domain name corresponding to the same first IP address in the to-be-stored data file;
The first consecutive-occurrence day count is the number of consecutive dates on which the same first IP address corresponds to the same first domain name in the to-be-stored data file;
For example, to determine the number of consecutive days on which the first IP address "1.1.1.1" corresponds to the first domain name "baidu" in the to-be-stored data file: if the first dates in the to-be-stored data records in which "1.1.1.1" corresponds to "baidu" are "20160102", "20160103" and "20160104", then the consecutive-occurrence day count of "1.1.1.1" corresponding to "baidu" is 3.
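The day count in S102 can be sketched as follows. The text does not say how dates with gaps are handled, so taking the longest run of consecutive calendar days is an assumption; the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable


def consecutive_day_count(dates: Iterable[str]) -> int:
    """Length of the longest run of consecutive calendar days among YYYYMMDD strings."""
    days = sorted({datetime.strptime(d, "%Y%m%d").date() for d in dates})
    longest = current = 0
    previous = None
    for day in days:
        if previous is not None and day - previous == timedelta(days=1):
            current += 1   # the run continues
        else:
            current = 1    # a gap (or the first date) starts a new run
        longest = max(longest, current)
        previous = day
    return longest


# Dates on which "1.1.1.1" corresponds to "baidu" in the example from the text.
print(consecutive_day_count(["20160102", "20160103", "20160104"]))  # 3
```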
S103: according to the first IP address, the first domain name and the first date in the at least one to-be-stored data record, and the first consecutive-occurrence day count, update the data record of the second IP address in the database table;
The data record of the second IP address in the database table includes: the second IP address, the second domain name, the second date and the second consecutive-occurrence day count.
Specifically, as shown in Fig. 3, updating the data record of the second IP address in the database table according to the first IP address, the first domain name and the first date in the at least one to-be-stored data record and the first consecutive-occurrence day count comprises the following steps:
Step 1: for the first IP address, judge whether the database table contains a second IP address identical to the first IP address;
It should be emphasized that, when the to-be-stored data records in the to-be-stored data file are stored in the database table, the records must be read one by one, and for each record it is judged whether the database table contains a second IP address identical to the first IP address in that record.
Step 2: when the judgment result is no, store the to-be-stored data record of the first IP address and the first consecutive-occurrence day count in the database table;
Specifically, the first IP address is compared with the second IP addresses; when the database table contains no second IP address identical to the first IP address, the to-be-stored data record of the first IP address is stored directly in the database table, that is, the first IP address, the first domain name and the first date are stored in the database table.
Fig. 4 is a schematic diagram, provided by an embodiment of the present invention, of storing a to-be-stored data record in the database table, including database table 410, to-be-stored data file 420 and database table 430. Database table 410 is the database table before a to-be-stored data record is stored in it, the to-be-stored data file includes multiple to-be-stored data records, and database table 430 is the database table after a to-be-stored data record has been stored in database table 410. Here, the to-be-stored data record of the first IP address is "1.1.1.1 sina 20160101", the first IP address "1.1.1.1" does not exist in the database table, and the first consecutive-occurrence day count of this record is 1. After the record "1.1.1.1 sina 20160101" and its first consecutive-occurrence day count are stored in database table 410, database table 430 is obtained, which contains the data record "1.1.1.1 sina 20160101 1".
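Steps 1 and 2 can be sketched with an in-memory list standing in for the database table. The row layout (ip, domain, date, day count) and the function name are assumptions made for illustration; the embodiment does not prescribe a storage interface.

```python
from typing import List, Tuple

Row = Tuple[str, str, str, int]  # (ip, domain, date, consecutive-occurrence day count)


def store_if_ip_absent(table: List[Row], row: Row) -> bool:
    """Append the row when no stored row has the same IP; report whether it was stored."""
    ip = row[0]
    if any(stored[0] == ip for stored in table):
        return False  # the IP already exists, so the update path of step 3 applies instead
    table.append(row)
    return True


# Fig. 4 example: "1.1.1.1" is not yet in the table, so the record is stored as is.
table_410: List[Row] = []
store_if_ip_absent(table_410, ("1.1.1.1", "sina", "20160101", 1))
print(table_410)
```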
Step 3: when the judgment result is yes, update the second domain name, the second date and the second consecutive-occurrence day count corresponding to the second IP address in the database table.
When the database table contains a second IP address identical to the first IP address, the data record of that second IP address in the database table must be updated, that is, the second domain name, the second date and the second consecutive-occurrence day count in that data record are updated.
Specifically, when the judgment result is yes, updating the second domain name, the second date and the second consecutive-occurrence day count corresponding to the second IP address in the database table includes:
when the judgment result is yes, judging whether the second domain name corresponding to the second IP address in the database table is the same as the first domain name;
when the second domain name corresponding to the second IP address in the database table differs from the first domain name, storing the first IP address, the first domain name, the first date and the first consecutive-occurrence day count in the database table;
when the second domain name corresponding to the second IP address in the database table is the same as the first domain name, updating the second domain name, the second date and the second consecutive-occurrence day count corresponding to the second IP address in the database table.
In practice, when the database table contains a second IP address identical to the first IP address in the to-be-stored data record, it must be judged whether the first domain name corresponding to the first IP address is the same as the second domain name corresponding to the second IP address, and the second domain name, the second date and the second consecutive-occurrence day count corresponding to the second IP address in the database table are updated according to the result of that judgment.
Judging whether the second domain name corresponding to the second IP address in the database table is the same as the first domain name includes:
calculating the second Hamming distance between the first domain name and the second domain name;
determining, according to the second Hamming distance and by a preset formula, the second similarity between the first domain name and the second domain name;
when the second similarity is greater than a second preset threshold, judging that the second domain name corresponding to the first IP address in the database table differs from the first domain name;
when the second similarity is less than or equal to the second preset threshold, judging that the second domain name corresponding to the first IP address in the database table is the same as the first domain name.
For example, once it has been judged that the database table contains a second IP address identical to the first IP address in the to-be-stored data record, it is judged whether the second domain name corresponding to the second IP address is the same as the first domain name corresponding to the first IP address. First, the second Hamming distance between the first domain name and the second domain name is calculated.
Specifically, the first domain name is "google" and the second domain name is "baidu". According to the formula:
the second Hamming distance between the first domain name "google" and the second domain name "baidu" is calculated, where H is the second Hamming distance between "google" and "baidu", and n is the number of characters in the first or second domain name; note that n is generally taken as the character count of the longer of the two domain names. V_i is the i-th character of one domain name and V_j is the j-th character of the other: when V_i is the i-th character of the first domain name, V_j is the j-th character of the second domain name, and vice versa.
Second, according to the preset formula:
the second similarity between the first domain name "google" and the second domain name "baidu" is determined. Here, Adj(P, T) is the second similarity between the character string P of the first domain name (that is, google) and the character string T of the second domain name (that is, baidu), H is the second Hamming distance between the first domain name and the second domain name, and max H is the maximum second Hamming distance of the first domain name and the second domain name.
For example, the second preset threshold is B. When Adj(P, T) > B, the first domain name and the second domain name differ; when Adj(P, T) ≤ B, the first domain name and the second domain name are the same.
Specifically, in one implementation of the embodiment of the present invention, when the second domain name corresponding to the second IP address in the database table differs from the first domain name, updating the second domain name, the second date and the second consecutive-occurrence day count corresponding to the second IP address in the database table includes:
when the second domain name corresponding to the second IP address in the database table differs from the first domain name, and the first consecutive-occurrence day count is less than the second consecutive-occurrence day count, storing the first IP address, the first domain name, the first date and the first consecutive-occurrence day count in the database table;
Fig. 5 is another schematic diagram, provided by an embodiment of the present invention, of storing a to-be-stored data record in the database table, including database table 510, to-be-stored data file 520 and database table 530. Database table 510 is the database table before a to-be-stored data record is stored in it, the to-be-stored data file includes multiple to-be-stored data records, and database table 530 is the database table after a to-be-stored data record has been stored in database table 510. Here, the to-be-stored data record of the first IP address is "1.1.1.1 sina01 20160106", and its first consecutive-occurrence day count is 2. The second domain name "sina" of the data record "1.1.1.1 sina 20160101 5" in database table 510 differs from the first domain name "sina01", and the second consecutive-occurrence day count is "5". The first consecutive-occurrence day count "2" is thus less than the second consecutive-occurrence day count "5", so the record "1.1.1.1 sina01 20160106" and its first consecutive-occurrence day count "2" are stored in database table 510, obtaining database table 530, whose data records include "1.1.1.1 sina 20160101 5" and "1.1.1.1 sina01 20160106 2".
when the second domain name corresponding to the second IP address in the database table differs from the first domain name, and the first consecutive-occurrence day count is greater than the second consecutive-occurrence day count, deleting the data record of the second IP address, and storing the first IP address, the first domain name, the first date and the first consecutive-occurrence day count in the database table;
For example, Fig. 6 is another schematic diagram, provided by an embodiment of the present invention, of storing a to-be-stored data record in the database table, including database table 610, to-be-stored data file 620 and database table 630. Database table 610 is the database table before a to-be-stored data record is stored in it, the to-be-stored data file includes multiple to-be-stored data records, and database table 630 is the database table after a to-be-stored data record has been stored in database table 610. Here, the to-be-stored data record of the first IP address is "1.1.1.1 sina01 20160106", and its first consecutive-occurrence day count is 3. Database table 610 includes the data record "1.1.1.1 sina 20160101 2", whose second domain name "sina" differs from the first domain name "sina01" and whose second consecutive-occurrence day count is "2". The first consecutive-occurrence day count "3" is thus greater than the second consecutive-occurrence day count "2", so the record "1.1.1.1 sina01 20160106" and its first consecutive-occurrence day count "3" are stored in database table 610 and the data record "1.1.1.1 sina 20160101 2" is deleted from the database table, obtaining database table 630, whose data records include "1.1.1.1 sina01 20160106 3".
when the second domain name corresponding to the second IP address in the database table differs from the first domain name, a third domain name corresponding to the second IP address in the database table also differs from the first domain name, and the first consecutive-occurrence day count is less than the second consecutive-occurrence day count but greater than a third consecutive-occurrence day count, deleting the data record containing the third domain name, and storing the first IP address, the first domain name, the first date and the first consecutive-occurrence day count in the database table; here the third domain name differs from both the first domain name and the second domain name, and the third consecutive-occurrence day count is the consecutive-occurrence day count of the data record in which the second IP address corresponds to the third domain name.
Fig. 7 is yet another schematic diagram, provided by an embodiment of the present invention, of storing a to-be-stored data record in the database table, including database table 710, to-be-stored data file 720 and database table 730. Database table 710 is the database table before a to-be-stored data record is stored in it, the to-be-stored data file includes multiple to-be-stored data records, and database table 730 is the database table after a to-be-stored data record has been stored in database table 710. Here, the to-be-stored data record of the first IP address is "1.1.1.1 google 20160106", and its first consecutive-occurrence day count is 3. Database table 710 includes the data records "1.1.1.1 sina 20160101 6" and "1.1.1.1 baidu 20160101 2", where the second domain name "sina" and the third domain name "baidu" each differ from the first domain name "google"; the second consecutive-occurrence day count in "1.1.1.1 sina 20160101 6" is "6", and the third consecutive-occurrence day count in "1.1.1.1 baidu 20160101 2" is "2". The first consecutive-occurrence day count "3" is thus less than the second consecutive-occurrence day count "6" and greater than the third consecutive-occurrence day count "2", so the record "1.1.1.1 google 20160106" and its first consecutive-occurrence day count "3" are stored in database table 710 and the data record "1.1.1.1 baidu 20160101 2" is deleted from the database table, obtaining database table 730, whose data records include "1.1.1.1 sina 20160101 6" and "1.1.1.1 google 20160106 3".
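The three cases above for a differing domain name (Figs. 5 to 7) reduce to one rule: stored rows of the same IP address whose day count is smaller than the new count are deleted, and the new row is stored. The sketch below assumes the caller has already established that every stored domain name for this IP address differs from the new one. The row layout and function name are hypothetical, and keeping a stored row on a tie is an assumption, since the text does not cover equal counts in this branch.

```python
from typing import List, Tuple

Row = Tuple[str, str, str, int]  # (ip, domain, date, consecutive-occurrence day count)


def store_different_domain(table: List[Row], new_row: Row) -> List[Row]:
    """Drop same-IP rows with a smaller day count than the new row, then add the new row."""
    ip, _domain, _date, new_count = new_row
    # Assumption: a stored row with an equal count is kept.
    result = [row for row in table if not (row[0] == ip and row[3] < new_count)]
    result.append(new_row)
    return result


# Fig. 7 example: "baidu" (count 2) is dropped, "sina" (count 6) is kept.
table_710 = [("1.1.1.1", "sina", "20160101", 6), ("1.1.1.1", "baidu", "20160101", 2)]
print(store_different_domain(table_710, ("1.1.1.1", "google", "20160106", 3)))
```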
Specifically, in one implementation of the embodiment of the present invention, when the second domain name corresponding to the second IP address in the database table is the same as the first domain name, updating the second domain name, the second date and the second consecutive-occurrence day count corresponding to the second IP address in the database table includes:
when the second domain name corresponding to the second IP address in the database table is the same as the first domain name, and the first consecutive-occurrence day count equals the second consecutive-occurrence day count, updating the second date corresponding to the second IP address in the database table to the first date, and updating the second consecutive-occurrence day count to a third consecutive-occurrence day count, where the third consecutive-occurrence day count equals the difference obtained by subtracting the second date from the first date, plus the first consecutive-occurrence day count;
Fig. 8 is a further schematic diagram, provided by an embodiment of the present invention, of storing a to-be-stored data record in the database table, including database table 810, to-be-stored data file 820 and database table 830. Database table 810 is the database table before a to-be-stored data record is stored in it, the to-be-stored data file includes multiple to-be-stored data records, and database table 830 is the database table after a to-be-stored data record has been stored in database table 810. Here, the to-be-stored data record of the first IP address is "1.1.1.1 sina 20160106", and its first consecutive-occurrence day count is 3. Database table 810 includes the data record "1.1.1.1 sina 20160101 3", whose second domain name "sina" is the same as the first domain name "sina" and whose second consecutive-occurrence day count is "3". The first consecutive-occurrence day count "3" thus equals the second consecutive-occurrence day count "3", so the second date "20160101" of the data record "1.1.1.1 sina 20160101 3" in database table 810 is updated to the first date "20160106", and the second consecutive-occurrence day count "3" is updated to the third consecutive-occurrence day count "8". The third consecutive-occurrence day count is the difference (5) obtained by subtracting the second date (20160101) from the first date (20160106), plus the first consecutive-occurrence day count "3", which gives "8". The data records in the resulting database table 830 therefore include "1.1.1.1 sina 20160106 8".
when the second domain name corresponding to the second IP address in the database table is the same as the first domain name, and the first consecutive-occurrence day count is less than the second consecutive-occurrence day count and greater than a third consecutive-occurrence day count corresponding to the second IP address, updating the second date corresponding to the second IP address in the database table to the first date, updating the second consecutive-occurrence day count to a fourth consecutive-occurrence day count, and deleting the data record containing the third consecutive-occurrence day count; here the fourth consecutive-occurrence day count equals the difference obtained by subtracting the second date from the first date, plus the first consecutive-occurrence day count, and the third domain name in the data record containing the third consecutive-occurrence day count differs from the first domain name.
Fig. 9 is a still further schematic diagram, provided by an embodiment of the present invention, of storing a to-be-stored data record in the database table, including database table 910, to-be-stored data file 920 and database table 930. Database table 910 is the database table before a to-be-stored data record is stored in it, the to-be-stored data file includes multiple to-be-stored data records, and database table 930 is the database table after a to-be-stored data record has been stored in database table 910. Here, the to-be-stored data record of the first IP address is "1.1.1.1 sina 20160106", and its first consecutive-occurrence day count is 3. Database table 910 includes the data records "1.1.1.1 sina 20160101 6" and "1.1.1.1 baidu 20160101 2". The second domain name "sina" in "1.1.1.1 sina 20160101 6" is the same as the first domain name "sina", and its second consecutive-occurrence day count is "6"; the third domain name "baidu" in "1.1.1.1 baidu 20160101 2" differs from the first domain name, and its third consecutive-occurrence day count is "2". The first consecutive-occurrence day count "3" is thus less than the second consecutive-occurrence day count "6" and greater than the third consecutive-occurrence day count "2", so the second date "20160101" corresponding to the second IP address in the database table is updated to the first date "20160106", the second consecutive-occurrence day count "6" is updated to the fourth consecutive-occurrence day count "11", and the data record "1.1.1.1 baidu 20160101 2" is deleted. The fourth consecutive-occurrence day count "11" is the difference "5" obtained by subtracting the second date "20160101" from the first date "20160106", plus the consecutive-occurrence day count "6". The data records in database table 930 then include "1.1.1.1 sina 20160106 11".
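The same-domain update can be sketched as below. The sketch follows the rule as worded in the text: the merged count is the date difference in days plus the first (incoming) day count, which reproduces the Fig. 8 result of 8. The Fig. 9 worked example arrives at 11 by adding the stored count 6 instead, so the choice of addend here is an assumption. The function names and row layout are hypothetical, exact string equality stands in for the similarity test, and keeping other-domain rows on a tie is also an assumption.

```python
from datetime import datetime
from typing import List, Tuple

Row = Tuple[str, str, str, int]  # (ip, domain, date, consecutive-occurrence day count)


def days_between(later: str, earlier: str) -> int:
    """Whole days from one YYYYMMDD date to a later one."""
    fmt = "%Y%m%d"
    return (datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)).days


def merge_same_domain(table: List[Row], new_row: Row) -> List[Row]:
    """Refresh the stored row of the same IP and domain; drop weaker other-domain rows."""
    ip, domain, new_date, new_count = new_row
    result: List[Row] = []
    for row in table:
        row_ip, row_domain, row_date, row_count = row
        if row_ip != ip:
            result.append(row)  # rows of other IP addresses are untouched
        elif row_domain == domain:
            # Rule as worded: (first date - second date) + first day count.
            merged = days_between(new_date, row_date) + new_count
            result.append((ip, domain, new_date, merged))
        elif row_count >= new_count:
            result.append(row)  # another domain with a count that is not smaller is kept
        # otherwise: another domain with a smaller count is deleted
    return result


# Fig. 8 example: (20160106 - 20160101) + 3 = 8.
print(merge_same_domain([("1.1.1.1", "sina", "20160101", 3)],
                        ("1.1.1.1", "sina", "20160106", 3)))
```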
In the embodiment of the present invention, at least one CDR file is obtained and preprocessed to obtain the to-be-stored data file; the first consecutive-occurrence day count of the same first domain name corresponding to the same first IP address in the to-be-stored data file is determined; and the data record of the second IP address in the database table is updated according to the first IP address, the first domain name and the first date in the at least one to-be-stored data record and the first consecutive-occurrence day count. In this scheme, the correspondence between IP addresses and domain names in the database table is continuously updated according to the first IP address, the first domain name, the first date and the first consecutive-occurrence day count of the to-be-stored data records, which improves the accuracy of the correspondence between IP addresses and domain names.
As shown in Fig. 10, an embodiment of the present invention provides a device for updating the correspondence between IP addresses and domain names. The device 1000 includes:
an acquiring unit 1010, configured to obtain at least one CDR file and preprocess the at least one CDR file to obtain a to-be-stored data file, where the to-be-stored data file includes at least one to-be-stored data record, and each to-be-stored data record includes a first IP address, a first domain name and a first date;
a determining unit 1020, configured to determine the first consecutive-occurrence day count of the same first domain name corresponding to the same first IP address in the to-be-stored data file, where the first consecutive-occurrence day count is the number of consecutive dates on which the same first IP address corresponds to the same first domain name in the to-be-stored data file;
an updating unit 1030, configured to update, according to the first IP address, the first domain name and the first date in the at least one to-be-stored data record and the first consecutive-occurrence day count, the second domain name, the second date and the second consecutive-occurrence day count corresponding to the first IP address in the database table, where the second consecutive-occurrence day count is determined by the first consecutive-occurrence day count, the first date and the second date.
Optionally, as shown in Fig. 11, the acquiring unit 1010 includes:
a first obtaining subunit 1011, configured to obtain at least one CDR file, where each CDR file includes at least two columns of data;
a first extracting subunit 1012, configured to extract an initial IP address data column and a user-requested host name information data column from the at least two columns of data, where the initial IP addresses and the user-requested host name information correspond one to one;
a filtering subunit 1013, configured to filter the initial IP address data column and the user-requested host name information data column to obtain two columns of data: first IP addresses and first user-requested host name information;
a second extracting subunit 1014, configured to extract the first domain name from the first user-requested host name information according to a stored white list file, where the first IP addresses and the first domain names correspond one to one;
a second obtaining subunit 1015, configured to obtain the name of the at least one CDR file, where the name includes the first date on which the at least one CDR file was generated;
an adding subunit 1016, configured to add the first date to the corresponding first IP address and first domain name, generating at least one to-be-stored data record;
a deduplicating subunit 1017, configured to deduplicate, for the same date among the first dates, all first domain names corresponding to the same first IP address, obtaining the to-be-stored data file.
Optionally, the filtering subunit 1013 is specifically configured to:
search the initial IP address data column and the user-requested host name information data column for initial IP addresses and user-requested host name information within a preset condition range, and delete the initial IP addresses and user-requested host name information found within that range, obtaining the two columns of data: first IP addresses and first user-requested host name information.
Optionally, the deduplicating subunit 1017 is specifically configured to:
for the same date among the first dates, calculate, by a preset algorithm, the first Hamming distance between any two first domain names among all first domain names corresponding to the same first IP address;
obtain, according to the first Hamming distance and by a preset formula, the first similarity between the character strings of the any two first domain names;
judge whether the first similarity is greater than a first preset threshold;
when the judgment result is yes, retain the any two first domain names in the to-be-stored data file;
when the judgment result is no, delete the to-be-stored data record containing either one of the any two first domain names, obtaining the updated to-be-stored data file.
Optionally, as shown in Fig. 12, the updating unit 1030 includes:
a judging subunit 1031, configured to judge, for the first IP address, whether the database table contains a second IP address identical to the first IP address;
a storing subunit 1032, configured to store, when the judgment result is no, the to-be-stored data record of the first IP address and the first consecutive-occurrence day count in the database table;
an updating subunit 1033, configured to update, when the judgment result is yes, the second domain name, the second date and the second consecutive-occurrence day count corresponding to the second IP address in the database table.
Optionally, the updating subunit 1033 is specifically configured to, when the judgment result is yes, judge whether the second domain name corresponding to the second IP address in the database table is identical to the first domain name;
When the second domain name corresponding to the second IP address in the database table differs from the first domain name, store the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the database table;
When the second domain name corresponding to the second IP address in the database table is identical to the first domain name, update the second domain name, the second date and the second number of consecutive occurrence days corresponding to the second IP address in the database table.
Optionally, the updating subunit 1033 is specifically configured to calculate the second Hamming distance between the first domain name and the second domain name;
Determine, from the second Hamming distance and by a preset formula, the second similarity between the first domain name and the second domain name;
When the second similarity is greater than a second preset threshold, judge that the second domain name corresponding to the first IP address in the database table differs from the first domain name;
When the second similarity is less than or equal to the second preset threshold, judge that the second domain name corresponding to the first IP address in the database table is identical to the first domain name.
Optionally, the updating subunit 1033 is specifically configured to, when the second domain name corresponding to the second IP address in the database table differs from the first domain name and the first number of consecutive occurrence days is less than the second number of consecutive occurrence days, store the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the database table;
When the second domain name corresponding to the second IP address in the database table differs from the first domain name and the first number of consecutive occurrence days is greater than the second number of consecutive occurrence days, delete the data record containing the second IP address, and store the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the database table;
When the second domain name corresponding to the second IP address in the database table differs from the first domain name, a third domain name corresponding to the second IP address in the database table also differs from the first domain name, and the first number of consecutive occurrence days is less than the second number of consecutive occurrence days but greater than a third number of consecutive occurrence days, delete the data record containing the third domain name, and store the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the database table, wherein the third domain name differs from both the first domain name and the second domain name, and the third number of consecutive occurrence days is the number of consecutive occurrence days of the data record in which the second IP address corresponds to the third domain name.
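The three differing-domain cases above can be folded into one rule: drop every stored row that the new record outlasts, then store the new record. The sketch below illustrates that reading only. The `Record` type and `apply_differing_domain` are hypothetical names, the table is modelled as a plain list of rows for one IP address, and keeping rows whose count equals the new one is an assumption, since the text does not cover equal counts.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    ip: str
    domain: str
    last_date: date
    days: int  # number of consecutive occurrence days


def apply_differing_domain(rows, new):
    """rows: stored records for new.ip, none of which carries new.domain.

    Stored rows with a smaller day count than the new record are deleted;
    longer-lived rows stay. Rows with an equal count are kept (assumption).
    """
    survivors = [r for r in rows if r.days >= new.days]
    return survivors + [new]
```

With a stored row of 5 days and another of 1 day, a new record of 3 days replaces only the 1-day row, while a new record of 9 days replaces both.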
Optionally, the updating subunit 1033 is specifically configured to, when the second domain name corresponding to the second IP address in the database table is identical to the first domain name and the first number of consecutive occurrence days equals the second number of consecutive occurrence days, update the second date corresponding to the second IP address in the database table to the first date, and update the second number of consecutive occurrence days to a third number of consecutive occurrence days, wherein the third number of consecutive occurrence days equals the difference obtained by subtracting the second date from the first date, plus the first number of consecutive occurrence days;
When the second domain name corresponding to the second IP address in the database table is identical to the first domain name, the first number of consecutive occurrence days is less than the second number of consecutive occurrence days, and the first number of consecutive occurrence days is greater than a third number of consecutive occurrence days corresponding to the second IP address, update the second date corresponding to the second IP address in the database table to the first date, update the second number of consecutive occurrence days to a fourth number of consecutive occurrence days, and delete the data record containing the third number of consecutive occurrence days, wherein the fourth number of consecutive occurrence days equals the difference obtained by subtracting the second date from the first date, plus the first number of consecutive occurrence days; the third domain name in the data record containing the third number of consecutive occurrence days differs from the first domain name.
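The same-domain update can be illustrated with a small sketch. It applies the stated formula literally: new count = (first date minus second date, in days) + first count. Rows are plain dictionaries with `date` and `days` keys, and the names `merged_days` and `apply_same_domain` are hypothetical. Combinations the text does not describe, for example a first count larger than the second, leave the table unchanged here, which is an assumption.

```python
from datetime import date


def merged_days(first_date, second_date, first_days):
    """(first date - second date) in days, plus the first consecutive-day count."""
    return (first_date - second_date).days + first_days


def apply_same_domain(stored, others, new):
    """stored: row with the same IP and domain as new.
    others: rows for the same IP with different domains.
    Returns the (possibly updated) stored row and the remaining other rows.
    """
    outlasted = [o for o in others if o["days"] < new["days"]]
    equal = new["days"] == stored["days"]
    shorter_but_outlasts = new["days"] < stored["days"] and bool(outlasted)
    if not (equal or shorter_but_outlasts):
        return stored, others  # case not described in the text: no change
    updated = dict(
        stored,
        date=new["date"],
        days=merged_days(new["date"], stored["date"], new["days"]),
    )
    if equal:
        return updated, others
    # Second case: also delete the other-domain rows that the new record outlasts.
    return updated, [o for o in others if o["days"] >= new["days"]]
```

For example, a stored row dated 2016-12-10 with 3 days and a new record dated 2016-12-12 with 3 days yield a stored row dated 2016-12-12 with 2 + 3 = 5 days.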
In the embodiments of the present invention, at least one CDR file is obtained and preprocessed to obtain a to-be-stored data file; the first number of consecutive occurrence days, for which the same first IP address corresponds to the same first domain name in the to-be-stored data file, is determined; and, according to the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the at least one to-be-stored data record, the data record containing the second IP address in the database table is updated. In this scheme, the correspondence between IP addresses and domain names in the database table is continually updated according to the first IP address, first domain name, first date and first number of consecutive occurrence days of the to-be-stored data records, which improves the accuracy of that correspondence.
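As an illustration of the determining step, the sketch below counts, for each pair of IP address and domain name, the days on which the pair appears on consecutive dates. The text does not say which run is meant when a pair has several separate runs, so taking the longest run is an assumption, and the function name `consecutive_days` is hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def consecutive_days(records):
    """Map (ip, domain) -> length of its longest run of consecutive dates.

    records: iterable of (ip, domain, day) tuples, one per to-be-stored record.
    """
    dates_by_pair = defaultdict(set)
    for ip, domain, day in records:
        dates_by_pair[(ip, domain)].add(day)

    result = {}
    one_day = timedelta(days=1)
    for pair, days in dates_by_pair.items():
        best = run = 0
        previous = None
        for day in sorted(days):
            # Extend the run if this date directly follows the previous one.
            if previous is not None and day - previous == one_day:
                run += 1
            else:
                run = 1
            best = max(best, run)
            previous = day
        result[pair] = best
    return result
```

A pair seen on 10, 11, 12 and 14 December gets a count of 3, because the gap on 13 December breaks the run.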
As the device embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the description of the method embodiment.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner; for parts that are the same or similar among the embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, as the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the description of the method embodiment.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention falls within its scope of protection.

Claims (10)

1. A method for updating the correspondence between IP addresses and domain names, characterized by comprising:
Obtaining at least one CDR file and preprocessing the at least one CDR file to obtain a to-be-stored data file, wherein the to-be-stored data file includes at least one to-be-stored data record, and each to-be-stored data record includes: a first IP address, a first domain name and a first date;
Determining a first number of consecutive occurrence days for which the same first IP address corresponds to the same first domain name in the to-be-stored data file, wherein the first number of consecutive occurrence days is the number of days on which the same first IP address corresponds to the same first domain name on consecutive dates in the to-be-stored data file;
According to the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the at least one to-be-stored data record, updating the data record containing a second IP address in a database table, wherein the data record containing the second IP address in the database table includes: the second IP address, a second domain name, a second date and a second number of consecutive occurrence days.
2. The method according to claim 1, characterized in that the obtaining at least one CDR file and preprocessing the at least one CDR file to obtain a to-be-stored data file comprises:
Obtaining at least one CDR file, wherein each CDR file includes at least two data columns;
Extracting an initial IP address data column and a user-requested host name information data column from the at least two data columns, wherein the initial IP addresses and the user-requested host name information correspond one to one;
Filtering the initial IP address data column and the user-requested host name information data column to obtain two data columns: the first IP address and the first user-requested host name information;
Extracting the first domain name from the first user-requested host name information according to a stored whitelist file, wherein the first IP address and the first domain name correspond one to one;
Obtaining the name of the at least one CDR file, wherein the name includes the first date on which the at least one CDR file was generated;
Adding the first date to the corresponding first IP address and first domain name, generating at least one to-be-stored data record;
For each identical date among the first dates, deduplicating all first domain names corresponding to the same first IP address, obtaining the to-be-stored data file.
3. The method according to claim 2, characterized in that the filtering the initial IP address data column and the user-requested host name information data column to obtain two data columns, the first IP address and the first user-requested host name information, comprises:
Searching the initial IP address data column and the user-requested host name information data column for initial IP addresses and user-requested host name information that fall within a preset condition range, and deleting the entries found, obtaining two data columns: the first IP address and the first user-requested host name information.
4. The method according to claim 2, characterized in that the deduplicating, for each identical date among the first dates, all first domain names corresponding to the same first IP address, obtaining the to-be-stored data file, comprises:
For each identical date among the first dates, calculating, by a preset algorithm, the first Hamming distance between any two first domain names among all first domain names corresponding to the same first IP address;
Obtaining, from the first Hamming distance and by a preset formula, the first similarity between the character strings of the two first domain names;
Judging whether the first similarity is greater than a first preset threshold;
When the judgment result is yes, retaining both first domain names in the to-be-stored data file;
When the judgment result is no, deleting the to-be-stored data record containing either one of the two first domain names, obtaining the updated to-be-stored data file.
5. The method according to claim 1, characterized in that the updating, according to the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the at least one to-be-stored data record, the data record containing the second IP address in the database table comprises:
For the first IP address, judging whether the database table contains a second IP address identical to the first IP address;
When the judgment result is no, storing in the database table the to-be-stored data record containing the first IP address together with the first number of consecutive occurrence days;
When the judgment result is yes, updating the second domain name, the second date and the second number of consecutive occurrence days corresponding to the second IP address in the database table.
6. The method according to claim 5, characterized in that the updating, when the judgment result is yes, the second domain name, the second date and the second number of consecutive occurrence days corresponding to the second IP address in the database table comprises:
When the judgment result is yes, judging whether the second domain name corresponding to the second IP address in the database table is identical to the first domain name;
When the second domain name corresponding to the second IP address in the database table differs from the first domain name, storing the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the database table;
When the second domain name corresponding to the second IP address in the database table is identical to the first domain name, updating the second domain name, the second date and the second number of consecutive occurrence days corresponding to the second IP address in the database table.
7. The method according to claim 6, characterized in that the judging whether the second domain name corresponding to the second IP address in the database table is identical to the first domain name comprises:
Calculating the second Hamming distance between the first domain name and the second domain name;
Determining, from the second Hamming distance and by a preset formula, the second similarity between the first domain name and the second domain name;
When the second similarity is greater than a second preset threshold, judging that the second domain name corresponding to the first IP address in the database table differs from the first domain name;
When the second similarity is less than or equal to the second preset threshold, judging that the second domain name corresponding to the first IP address in the database table is identical to the first domain name.
8. The method according to claim 6, characterized in that, when the second domain name corresponding to the second IP address in the database table differs from the first domain name, the updating of the second domain name, the second date and the second number of consecutive occurrence days corresponding to the second IP address in the database table comprises:
When the second domain name corresponding to the second IP address in the database table differs from the first domain name and the first number of consecutive occurrence days is less than the second number of consecutive occurrence days, storing the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the database table;
When the second domain name corresponding to the second IP address in the database table differs from the first domain name and the first number of consecutive occurrence days is greater than the second number of consecutive occurrence days, deleting the data record containing the second IP address, and storing the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the database table;
When the second domain name corresponding to the second IP address in the database table differs from the first domain name, a third domain name corresponding to the second IP address in the database table also differs from the first domain name, and the first number of consecutive occurrence days is less than the second number of consecutive occurrence days but greater than a third number of consecutive occurrence days, deleting the data record containing the third domain name, and storing the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the database table, wherein the third domain name differs from both the first domain name and the second domain name, and the third number of consecutive occurrence days is the number of consecutive occurrence days of the data record in which the second IP address corresponds to the third domain name.
9. The method according to claim 6, characterized in that, when the second domain name corresponding to the second IP address in the database table is identical to the first domain name, the updating of the second domain name, the second date and the second number of consecutive occurrence days corresponding to the second IP address in the database table comprises:
When the second domain name corresponding to the second IP address in the database table is identical to the first domain name and the first number of consecutive occurrence days equals the second number of consecutive occurrence days, updating the second date corresponding to the second IP address in the database table to the first date, and updating the second number of consecutive occurrence days to a third number of consecutive occurrence days, wherein the third number of consecutive occurrence days equals the difference obtained by subtracting the second date from the first date, plus the first number of consecutive occurrence days;
When the second domain name corresponding to the second IP address in the database table is identical to the first domain name, the first number of consecutive occurrence days is less than the second number of consecutive occurrence days, and the first number of consecutive occurrence days is greater than a third number of consecutive occurrence days corresponding to the second IP address, updating the second date corresponding to the second IP address in the database table to the first date, updating the second number of consecutive occurrence days to a fourth number of consecutive occurrence days, and deleting the data record containing the third number of consecutive occurrence days, wherein the fourth number of consecutive occurrence days equals the difference obtained by subtracting the second date from the first date, plus the first number of consecutive occurrence days; the third domain name in the data record containing the third number of consecutive occurrence days differs from the first domain name.
10. A device for updating the correspondence between IP addresses and domain names, characterized by comprising:
An acquiring unit, configured to obtain at least one CDR file and preprocess the at least one CDR file to obtain a to-be-stored data file, wherein the to-be-stored data file includes at least one to-be-stored data record, and each to-be-stored data record includes: a first IP address, a first domain name and a first date;
A determining unit, configured to determine a first number of consecutive occurrence days for which the same first IP address corresponds to the same first domain name in the to-be-stored data file, wherein the first number of consecutive occurrence days is the number of days on which the same first IP address corresponds to the same first domain name on consecutive dates in the to-be-stored data file;
An updating unit, configured to update, according to the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the at least one to-be-stored data record, the second domain name, the second date and the second number of consecutive occurrence days corresponding to the first IP address in a database table, wherein the second number of consecutive occurrence days is determined by the first number of consecutive occurrence days, the first date and the second date.
CN201611155172.1A 2016-12-14 2016-12-14 A kind of update method and device of IP address and domain name corresponding relationship Active CN106603742B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611155172.1A CN106603742B (en) 2016-12-14 2016-12-14 A kind of update method and device of IP address and domain name corresponding relationship

Publications (2)

Publication Number Publication Date
CN106603742A true CN106603742A (en) 2017-04-26
CN106603742B CN106603742B (en) 2019-04-26

Family

ID=58801551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611155172.1A Active CN106603742B (en) 2016-12-14 2016-12-14 A kind of update method and device of IP address and domain name corresponding relationship

Country Status (1)

Country Link
CN (1) CN106603742B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101087253A (en) * 2007-04-04 2007-12-12 华为技术有限公司 Method, device, domain parsing method and device for saving domain system record
US8549118B2 (en) * 2009-12-10 2013-10-01 At&T Intellectual Property I, L.P. Updating a domain name server with information corresponding to dynamically assigned internet protocol addresses
CN103220379A (en) * 2013-05-10 2013-07-24 广东睿江科技有限公司 Domain name reverse-resolution method and device
CN103532852A (en) * 2013-10-11 2014-01-22 小米科技有限责任公司 Routing scheduling method, routing scheduling device and network equipment
CN105763668A (en) * 2016-02-26 2016-07-13 杭州华三通信技术有限公司 Domain name resolution method and apparatus

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107197058A (en) * 2017-07-21 2017-09-22 北京亚鸿世纪科技发展有限公司 A kind of high coverage and accurate domain name IP corresponding relations acquisition methods and device
CN107197058B (en) * 2017-07-21 2019-09-17 北京亚鸿世纪科技发展有限公司 A kind of high coverage and accurate domain name IP corresponding relationship acquisition methods and device
CN107832406A (en) * 2017-11-03 2018-03-23 北京锐安科技有限公司 Duplicate removal storage method, device, equipment and the storage medium of massive logs data
CN107832406B (en) * 2017-11-03 2020-09-11 北京锐安科技有限公司 Method, device, equipment and storage medium for removing duplicate entries of mass log data
CN114143332A (en) * 2021-11-03 2022-03-04 阿里巴巴(中国)有限公司 Content delivery network CDN-based processing method, electronic device and medium

Also Published As

Publication number Publication date
CN106603742B (en) 2019-04-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant