CN106603742A - IP address and domain name corresponding relationship update method and device - Google Patents
- Publication number
- CN106603742A CN201611155172.1A
- Authority
- CN
- China
- Prior art keywords
- domain name
- address
- number of days
- continuously
- date
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L61/00—Network arrangements, protocols or services for addressing or naming
- H04L61/45—Network directories; Name-to-address mapping
- H04L61/4505—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
- H04L61/4511—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
Abstract
The invention provides a method and device for updating the correspondence between IP addresses and domain names. The method comprises: acquiring at least one call detail record (CDR) file and pre-processing the at least one CDR file to obtain a to-be-stored data file, wherein the to-be-stored data file comprises at least one to-be-stored data record, and each to-be-stored data record comprises a first IP address, a first domain name and a first date; determining a first consecutive-day count, that is, the number of consecutive days on which the same first domain name appears for the same first IP address in the to-be-stored data file; and updating the data record of a second IP address in a database table according to the first IP address, the first domain name and the first date in the at least one to-be-stored data record and the first consecutive-day count. The method improves the accuracy of the correspondence between IP addresses and domain names and improves storage space utilization.
Description
Technical field
The present invention relates to the technical field of domain name resolution, and in particular to a method and device for updating the correspondence between IP addresses and domain names.
Background
Internet Protocol (IP) addresses and domain names are both important network resources. The correspondence between IP addresses and domain names is many-to-many: one IP address can correspond to several domain names at the same time, and one domain name can correspond to several IP addresses at the same time. In practice, some correspondences between IP addresses and domain names remain stable for a long time, while others change constantly.
In the prior art, the correspondences between IP addresses and domain names are stored in an IP resource library. The library stores accurate correspondences as well as invalid and expired ones. When a user analyzes data, all correspondences in the IP resource library have to be read.
It can be seen that, because the IP resource library of the prior art stores accurate correspondences together with invalid and expired ones, a user analyzing data cannot obtain accurate correspondences between IP addresses and domain names.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a method and device for updating the correspondence between IP addresses and domain names, so as to improve the accuracy of that correspondence. The specific technical solution is as follows:
In one aspect, an embodiment of the present invention discloses a method for updating the correspondence between IP addresses and domain names, the method comprising:
Acquiring at least one CDR file and pre-processing the at least one CDR file to obtain a to-be-stored data file, wherein the to-be-stored data file comprises at least one to-be-stored data record, and each to-be-stored data record comprises: a first IP address, a first domain name and a first date;
Determining a first consecutive-day count for the same first domain name corresponding to the same first IP address in the to-be-stored data file, wherein the first consecutive-day count is the number of days on which that first domain name appears for that first IP address on consecutive dates in the to-be-stored data file;
Updating the data record of a second IP address in a database table according to the first IP address, the first domain name and the first date in the at least one to-be-stored data record and the first consecutive-day count, wherein the data record of the second IP address in the database table comprises: the second IP address, a second domain name, a second date and a second consecutive-day count.
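The determining step above can be illustrated with a short Python sketch. It is only one possible reading: it assumes that the consecutive-day count of an (IP address, domain name) pair is the length of the run of consecutive calendar dates ending on the pair's most recent date, and that dates use the eight-digit form seen elsewhere in the description (for example "20160102"). The function name and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def consecutive_days(records):
    """Map each (ip, domain) pair to the length of its latest run of consecutive dates."""
    dates_by_pair = defaultdict(set)
    for ip, domain, day in records:
        dates_by_pair[(ip, domain)].add(datetime.strptime(day, "%Y%m%d").date())

    result = {}
    for pair, days in dates_by_pair.items():
        current = max(days)          # start from the most recent date (assumption)
        run = 1
        # Walk backwards one day at a time while the pair was also seen on that day.
        while current - timedelta(days=1) in days:
            current -= timedelta(days=1)
            run += 1
        result[pair] = run
    return result

# Hypothetical to-be-stored data records: (first IP address, first domain name, first date).
records = [
    ("1.1.1.1", "baidu", "20160201"),
    ("1.1.1.1", "baidu", "20160202"),
    ("1.1.1.1", "baidu", "20160203"),
    ("1.1.1.1", "sina", "20160201"),
    ("1.1.1.1", "sina", "20160203"),
]
streaks = consecutive_days(records)
```

Under this reading, "baidu" has appeared on three consecutive dates, while the gap on 20160202 resets the count for "sina".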
Optionally, acquiring at least one CDR file and pre-processing the at least one CDR file to obtain the to-be-stored data file comprises:
Acquiring at least one CDR file, wherein each CDR file comprises at least two columns of data;
Extracting an initial IP address data column and a user-requested host name information data column from the at least two columns of data, wherein the initial IP addresses and the user-requested host name information are in one-to-one correspondence;
Filtering the initial IP address data column and the user-requested host name information data column to obtain two columns of data: first IP addresses and first user-requested host name information;
Extracting a first domain name from the first user-requested host name information according to a stored whitelist file, the first IP addresses and the first domain names being in one-to-one correspondence;
Obtaining the name of the at least one CDR file, wherein the name includes the first date on which the at least one CDR file was generated;
Adding the first date to the corresponding first IP address and first domain name to generate at least one to-be-stored data record;
For records with the same first date, de-duplicating all first domain names corresponding to the same first IP address to obtain the to-be-stored data file.
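As a rough illustration of the whitelist step in the list above, the following Python sketch strips a whitelisted suffix from a host to obtain the domain name. The suffix list is the one the description gives as examples. The function name is hypothetical, and dropping a leading "www" label is an assumption inferred from the description's own examples, in which "www.sohu.com" yields "sohu" and "www.sports.sina.com.cn" yields "sports.sina".

```python
# Suffixes quoted in the description; a real whitelist file would be longer.
SUFFIX_WHITELIST = ("com", "com.cn", "com.co", "com.hk", "edu.cn", "net", "net.cn")

def extract_domain(host, suffixes=SUFFIX_WHITELIST):
    """Return the host without its whitelisted suffix, or None if no suffix matches."""
    host = host.lower()
    # Try longer suffixes first so that "com.cn" is preferred over a shorter match.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if host.endswith("." + suffix):
            core = host[: -(len(suffix) + 1)]
            break
    else:
        return None
    # Assumption: a leading "www" label is not part of the domain name.
    if core.startswith("www."):
        core = core[4:]
    return core or None

sohu = extract_domain("www.sohu.com")
sina = extract_domain("www.sports.sina.com.cn")
unknown = extract_domain("example.org")
```

Hosts whose ending is not on the whitelist return None here; how the patent handles such hosts is not stated, so that choice is also an assumption.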
Optionally, filtering the initial IP address data column and the user-requested host name information data column to obtain the two columns of first IP addresses and first user-requested host name information comprises:
Searching the initial IP address data column and the user-requested host name information data column for initial IP addresses and user-requested host name information that fall within a preset condition range, and deleting the initial IP addresses and user-requested host name information so found, to obtain the two columns of first IP addresses and first user-requested host name information.
Optionally, de-duplicating, for records with the same first date, all first domain names corresponding to the same first IP address to obtain the to-be-stored data file comprises:
For records with the same first date, calculating, by a preset algorithm, a first Hamming distance for any two first domain names among all first domain names corresponding to the same first IP address;
Obtaining, from the first Hamming distance and by a preset formula, a first similarity between the character strings of the two first domain names;
Judging whether the first similarity is greater than a first preset threshold;
When the judgment result is yes, retaining both first domain names in the to-be-stored data file;
When the judgment result is no, deleting the to-be-stored data record containing either one of the two first domain names, to obtain an updated to-be-stored data file.
Optionally, updating the data record of the second IP address in the database table according to the first IP address, the first domain name and the first date in the at least one to-be-stored data record and the first consecutive-day count comprises:
For the first IP address, judging whether a second IP address identical to the first IP address exists in the database table;
When the judgment result is no, storing the to-be-stored data record containing the first IP address, together with the first consecutive-day count, in the database table;
When the judgment result is yes, updating the second domain name, the second date and the second consecutive-day count corresponding to the second IP address in the database table.
Optionally, updating, when the judgment result is yes, the second domain name, the second date and the second consecutive-day count corresponding to the second IP address in the database table comprises:
When the judgment result is yes, judging whether the second domain name corresponding to the second IP address in the database table is the same as the first domain name;
When the second domain name corresponding to the second IP address in the database table differs from the first domain name, storing the first IP address, the first domain name, the first date and the first consecutive-day count in the database table;
When the second domain name corresponding to the second IP address in the database table is the same as the first domain name, updating the second domain name, the second date and the second consecutive-day count corresponding to the second IP address in the database table.
Optionally, judging whether the second domain name corresponding to the second IP address in the database table is the same as the first domain name comprises:
Calculating a second Hamming distance between the first domain name and the second domain name;
Determining, from the second Hamming distance and by a preset formula, a second similarity between the first domain name and the second domain name;
When the second similarity is greater than a second preset threshold, judging that the second domain name corresponding to the second IP address in the database table differs from the first domain name;
When the second similarity is less than or equal to the second preset threshold, judging that the second domain name corresponding to the second IP address in the database table is the same as the first domain name.
Optionally, updating, when the second domain name corresponding to the second IP address in the database table differs from the first domain name, the second domain name, the second date and the second consecutive-day count corresponding to the second IP address in the database table comprises:
When the second domain name corresponding to the second IP address in the database table differs from the first domain name, and the first consecutive-day count is less than the second consecutive-day count, storing the first IP address, the first domain name, the first date and the first consecutive-day count in the database table;
When the second domain name corresponding to the second IP address in the database table differs from the first domain name, and the first consecutive-day count is greater than the second consecutive-day count, deleting the data record of the second IP address and storing the first IP address, the first domain name, the first date and the first consecutive-day count in the database table;
When the second domain name corresponding to the second IP address in the database table differs from the first domain name, a third domain name corresponding to the second IP address in the database table also differs from the first domain name, and the first consecutive-day count is less than the second consecutive-day count and greater than a third consecutive-day count, deleting the data record containing the third domain name and storing the first IP address, the first domain name, the first date and the first consecutive-day count in the database table, wherein the third domain name differs from both the first domain name and the second domain name, and the third consecutive-day count is the consecutive-day count of the data record in which the second IP address corresponds to the third domain name.
Optionally, updating, when the second domain name corresponding to the second IP address in the database table is the same as the first domain name, the second domain name, the second date and the second consecutive-day count corresponding to the second IP address in the database table comprises:
When the second domain name corresponding to the second IP address in the database table is the same as the first domain name, and the first consecutive-day count equals the second consecutive-day count, updating the second date corresponding to the second IP address in the database table to the first date, and updating the second consecutive-day count to a third consecutive-day count, wherein the third consecutive-day count equals the difference obtained by subtracting the second date from the first date, plus the first consecutive-day count;
When the second domain name corresponding to the second IP address in the database table is the same as the first domain name, the first consecutive-day count is less than the second consecutive-day count, and the first consecutive-day count is greater than a third consecutive-day count corresponding to the second IP address, updating the second date corresponding to the second IP address in the database table to the first date, updating the second consecutive-day count to a fourth consecutive-day count, and deleting the data record containing the third consecutive-day count, wherein the fourth consecutive-day count equals the difference obtained by subtracting the second date from the first date, plus the first consecutive-day count; and the third domain name in the data record containing the third consecutive-day count differs from the first domain name.
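The arithmetic used for the updated day count in both cases above (first date minus second date, plus the first consecutive-day count) can be shown with a small Python sketch. It assumes dates are written in the eight-digit form used in the description's examples and that the date difference is measured in whole days; the function name and sample values are hypothetical.

```python
from datetime import datetime

def merged_day_count(first_date, second_date, first_days):
    """(first date - second date) in days, plus the first consecutive-day count."""
    fmt = "%Y%m%d"
    gap = (datetime.strptime(first_date, fmt) - datetime.strptime(second_date, fmt)).days
    return gap + first_days

# Stored record dated 20160102; incoming record dated 20160105 with a count of 3.
new_days = merged_day_count("20160105", "20160102", 3)
```

In this example the dates are three days apart, so the updated count is 3 + 3 = 6.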
In another aspect, an embodiment of the present invention further discloses a device for updating the correspondence between IP addresses and domain names, the device comprising:
An acquiring unit, configured to acquire at least one CDR file and pre-process the at least one CDR file to obtain a to-be-stored data file, wherein the to-be-stored data file comprises at least one to-be-stored data record, and each to-be-stored data record comprises: a first IP address, a first domain name and a first date;
A determining unit, configured to determine a first consecutive-day count for the same first domain name corresponding to the same first IP address in the to-be-stored data file, wherein the first consecutive-day count is the number of days on which that first domain name appears for that first IP address on consecutive dates in the to-be-stored data file;
An updating unit, configured to update the second domain name, the second date and the second consecutive-day count corresponding to the first IP address in the database table according to the first IP address, the first domain name and the first date in the at least one to-be-stored data record and the first consecutive-day count, wherein the second consecutive-day count is determined from the first consecutive-day count, the first date and the second date.
In the embodiments of the present invention, at least one CDR file is first acquired and pre-processed to obtain a to-be-stored data file; next, the first consecutive-day count of the same first domain name corresponding to the same first IP address in the to-be-stored data file is determined; finally, the data record of the second IP address in the database table is updated according to the first IP address, the first domain name and the first date in the at least one to-be-stored data record and the first consecutive-day count. In this solution, the correspondences between IP addresses and domain names in the database table are therefore continually updated from the first IP address, first domain name, first date and first consecutive-day count of the to-be-stored data records, which improves the accuracy of those correspondences and, in turn, the accuracy of the data that users analyze. In addition, invalid and expired correspondences between IP addresses and domain names are filtered out, which improves storage space utilization. Of course, a product or method implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.
Description of the drawings
In order to explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the method for updating the correspondence between IP addresses and domain names provided by an embodiment of the present invention;
Fig. 2 is a second flowchart of the method for updating the correspondence between IP addresses and domain names provided by an embodiment of the present invention;
Fig. 3 is a third flowchart of the method for updating the correspondence between IP addresses and domain names provided by an embodiment of the present invention;
Fig. 4 is a first schematic diagram of storing a to-be-stored data record in the database table, provided by an embodiment of the present invention;
Fig. 5 is a second schematic diagram of storing a to-be-stored data record in the database table, provided by an embodiment of the present invention;
Fig. 6 is a third schematic diagram of storing a to-be-stored data record in the database table, provided by an embodiment of the present invention;
Fig. 7 is a fourth schematic diagram of storing a to-be-stored data record in the database table, provided by an embodiment of the present invention;
Fig. 8 is a fifth schematic diagram of storing a to-be-stored data record in the database table, provided by an embodiment of the present invention;
Fig. 9 is a sixth schematic diagram of storing a to-be-stored data record in the database table, provided by an embodiment of the present invention;
Fig. 10 is a structural schematic diagram of the device for updating the correspondence between IP addresses and domain names provided by an embodiment of the present invention;
Fig. 11 is a structural schematic diagram of the acquiring unit in the device for updating the correspondence between IP addresses and domain names provided by an embodiment of the present invention;
Fig. 12 is a structural schematic diagram of the updating unit in the device for updating the correspondence between IP addresses and domain names provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
In order to solve the problem of the prior art, the embodiments of the present invention provide a method and device for updating the correspondence between IP addresses and domain names, so as to improve the accuracy of that correspondence.
The method for updating the correspondence between IP addresses and domain names provided by an embodiment of the present invention is introduced first.
As shown in Fig. 1, an embodiment of the present invention provides a method for updating the correspondence between IP addresses and domain names, comprising the following steps:
S101: acquire at least one CDR file and pre-process the at least one CDR file to obtain a to-be-stored data file.
It can be understood that there may be multiple CDR files, that each CDR file corresponds to one date, and that the CDR files in the at least one CDR file have different dates. Each CDR file contains multiple columns of data. Pre-processing the CDR files yields the to-be-stored data file that the user needs. The pre-processing may consist of processing the at least one CDR file with a preset algorithm to obtain the to-be-stored data file; the preset algorithm may include a map function and a reduce function.
The to-be-stored data file comprises at least one to-be-stored data record, and each to-be-stored data record comprises: a first IP address, a first domain name and a first date.
Specifically, acquiring at least one CDR file and pre-processing the at least one CDR file to obtain the to-be-stored data file comprises:
Acquiring at least one CDR file, wherein each CDR file comprises at least two columns of data; extracting an initial IP address data column and a user-requested host name information data column from the at least two columns of data, wherein the initial IP addresses and the user-requested host name information are in one-to-one correspondence; filtering the initial IP address and user-requested host name information columns to obtain two columns of data, first IP addresses and first user-requested host name information; extracting a first domain name from the first user-requested host name information according to a stored whitelist file, the first IP addresses and the first domain names being in one-to-one correspondence; obtaining the name of the at least one CDR file, wherein the name includes the first date on which the at least one CDR file was generated; adding the first date to the corresponding first IP address and first domain name to generate at least one to-be-stored data record; and, for records with the same first date, de-duplicating all first domain names corresponding to the same first IP address to obtain the to-be-stored data file.
In practice, the initial IP address data column and the user-requested host name information data column are extracted from the acquired CDR file or files. The IP address here is a server IP address, that is, the address of the server the user accessed. In the extracted columns, initial IP addresses and user-requested host name information correspond one to one, row by row: each initial IP address entry corresponds to one item of user-requested host name information, but the same initial IP address does not always correspond to the same user-requested host name information. For example, Table 1 shows an initial IP address data column and a host data column, where user-requested host name information is denoted "host".
Table 1
Generally, the initial IP address data column and the user-requested host name information data column extracted from a CDR file contain legal IP addresses and hosts as well as illegal IP addresses, intranet IP addresses and illegal hosts. These two columns therefore need to be filtered to obtain the filtered initial IP address data column and user-requested host name information data column.
Specifically, filtering the initial IP address and user-requested host name information columns to obtain the two columns of first IP addresses and first user-requested host name information comprises:
Searching the initial IP address data column and the user-requested host name information data column for initial IP addresses and user-requested host name information that fall within a preset condition range, and deleting the initial IP addresses and user-requested host name information so found, to obtain the two columns of first IP addresses and first user-requested host name information.
It should be noted that illegal IP addresses and illegal hosts are erroneous values captured during data acquisition when the CDR file was generated. They are therefore filtered out before the storage logic (the operation of writing CDR file data into the database table) is executed, which improves storage space utilization.
Specifically, in one possible implementation of the embodiment of the present invention, the preset condition range includes: illegal IP addresses, intranet IP addresses and illegal hosts.
Illegal IP addresses include incomplete IP addresses (for example, 101.226) and initial IP address values that are not actually IP addresses (for example, tel:12679).
This solution studies external network IP addresses, so all intranet IP addresses need to be filtered out. The intranet IP address ranges are: the class A intranet range 10.0.0.0 to 10.255.255.255, the class B intranet range 172.16.0.0 to 172.31.255.255, and the class C intranet range 192.168.0.0 to 192.168.255.255.
CDR files contain many illegal host fields, most of which are incomplete hosts produced when the CDR file was collected, for example: "tjajtg, m", "wap.3xiaren.comhttp:", "www, baido.com", "www...moc", "cat.sh.cn." and "::". However, a host such as "test.mzread.com:8080", although it carries the extra port number "8080", is legal and complete, and "test.mzread.com" needs to be extracted from it. In addition, some host fields are in IP address form; such hosts also reflect host information to some extent, so hosts in IP address form are not deleted or filtered out.
After the illegal IP addresses, intranet IP addresses and illegal hosts within the preset condition range have been filtered out, the filtered initial IP address data column and user-requested host name information data column are taken as the two columns of first IP addresses and first user-requested host name information.
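The filtering rules described above can be sketched in Python with the standard `ipaddress` and `re` modules. The three intranet ranges and the sample values come from the description; the function name, the host pattern (dot-separated labels of letters, digits and hyphens, with an optional numeric port) and the choice to return None for rejected rows are assumptions made for illustration.

```python
import ipaddress
import re

# The three intranet ranges listed in the description, written as CIDR blocks.
PRIVATE_NETS = [ipaddress.ip_network(n)
                for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

# Assumed host shape: labels separated by single dots, optionally followed by ":port".
HOST_RE = re.compile(r"(?P<name>[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)(?::\d+)?")

def clean_pair(ip_text, host_text):
    """Return (ip, host without port) for a legal row, or None if the row is filtered out."""
    try:
        ip = ipaddress.IPv4Address(ip_text)   # rejects "101.226", "tel:12679", ...
    except ValueError:
        return None
    if any(ip in net for net in PRIVATE_NETS):
        return None                            # intranet address
    match = HOST_RE.fullmatch(host_text)
    if match is None:
        return None                            # illegal or incomplete host
    return str(ip), match.group("name")

kept = clean_pair("166.88.8.172", "test.mzread.com:8080")
ip_form_host = clean_pair("23.44.156.40", "23.44.156.56")
```

With this pattern the port in "test.mzread.com:8080" is stripped and a host written in IP address form is kept, as the description requires, while the malformed sample hosts are rejected.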
In practice, the to-be-stored data file contains not the host but the domain name corresponding to the host, so the domain name has to be extracted from the host. The host is the user-requested host name information, and the domain name is the domain name the user requested. For example, the domain name of the user-requested host name information "www.sohu.com" is "sohu", and the domain name of "www.sports.sina.com.cn" is "sports.sina". Extracting the domain name from the host therefore requires accurately identifying the characters at the end of the host that are not part of the domain name. For this purpose a whitelist is introduced, which includes common host suffix strings such as "com", "com.cn", "com.co", "com.hk", "edu.cn", "net" and "net.cn", but is not limited to these. Extracting the domain name in this way accurately identifies the suffix of the host, so that the correct domain name is extracted from the host. The domain name is usually denoted "domain".
After the domain name has been extracted from the host, a date is added to each to-be-stored data record. The name of each CDR file includes the date on which that CDR file was generated; for example, the CDR file is named after its generation date, or after the type of its content together with the date. The generation date is extracted from the CDR file name and, as the first date, added to every to-be-stored data record in the to-be-stored data file. For example, if a to-be-stored data record is "166.88.8.172 baidu" and the date extracted from the name of the corresponding CDR file is "20160102", then "20160102" is added to the record, which is updated to "166.88.8.172 baidu 20160102".
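The date-tagging step can be sketched as follows. The description does not fix a file naming scheme, so this sketch assumes the name contains the generation date as eight consecutive digits in year-month-day order; the file name "http_20160102.txt" and the function name are hypothetical.

```python
import re
from datetime import datetime

def date_from_name(file_name):
    """Return the first eight-digit run in the name that parses as a YYYYMMDD date."""
    for candidate in re.findall(r"\d{8}", file_name):
        try:
            datetime.strptime(candidate, "%Y%m%d")
        except ValueError:
            continue          # eight digits, but not a valid calendar date
        return candidate
    return None

record = "166.88.8.172 baidu"
first_date = date_from_name("http_20160102.txt")   # hypothetical CDR file name
tagged = f"{record} {first_date}"
```

The tagged record has the same shape as the example in the text: IP address, domain name, then date.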
It should be noted that the to-be-stored data file may come from CDR files of different dates, so the first dates in the to-be-stored data records may include several different dates. See, for example, the to-be-stored data file shown in Table 2.
Table 2
166.88.8.172 | baidu | 20160102 |
23.44.156.40 | sina | 20160202 |
120.25.240.235 | sina01 | 20160303 |
114.113.101.47 | 20160303 | |
…… | …… | …… |
23.44.156.56 | hao123 | 20130509 |
…… | …… | …… |
Specifically, as shown in Fig. 2, de-duplicating, for records with the same first date, all first domain names corresponding to the same first IP address to obtain the to-be-stored data file includes the following steps:
Step 1: for records with the same first date, calculate, by a preset algorithm, the first Hamming distance for any two first domain names among all first domain names corresponding to the same first IP address.
It can be understood that, among the to-be-stored data records with the same date in the to-be-stored data file, all first domain names corresponding to the same IP address are de-duplicated. For example, there are 5 to-be-stored data records with the date "20160202", in which the IP address "1.1.1.1" corresponds to 3 domain names in total: "1.1.1.1 baidu 20160202", "1.1.1.1 sina 20160202" and "1.1.1.1 google 20160202".
Here the preset algorithm is an algorithm that calculates the number of characters that must be replaced to transform one character string into another; preferably, the preset algorithm is the Hamming distance algorithm. A domain name is a character string, and the Hamming distance between two strings of equal length is the number of positions at which the corresponding characters differ. The Hamming distance algorithm is thus used to calculate the Hamming distance between any two first domain names among all domain names corresponding to the same first IP address.
For example, a first IP address corresponds to two first domain names, namely the domain name "sina" and the domain name "sina01". According to the formula:
the Hamming distance between the domain name "sina" and the domain name "sina01" is calculated, where H is the Hamming distance between "sina" and "sina01", and n is the number of characters in "sina01"; it should be noted that n generally takes the character count of whichever of the two domain names has more characters. V_i is the i-th character in "sina" or "sina01", and V_j is the j-th character in "sina" or "sina01"; it should be noted that when V_i is the i-th character in "sina", V_j is the j-th character in "sina01", and when V_i is the i-th character in "sina01", V_j is the j-th character in "sina". The above formula is used to calculate the Hamming distance between any two first domain names among all domain names corresponding to the same first IP address, and this Hamming distance is taken as the first Hamming distance.
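The symbol definitions above (H, n, V_i, V_j) are consistent with a position-by-position mismatch count. A minimal sketch of such a formula follows, offered as an assumption rather than as the exact expression used by the method. It compares the two names with i = j and counts a position that exists in only one name as a mismatch, so that n can be the length of the longer name.

```latex
% Assumed form, consistent with the definitions of H, n, V_i and V_j above.
% [.] is 1 when the condition holds and 0 otherwise; a position beyond the
% end of the shorter name is treated as a mismatch (assumption).
H = \sum_{i=j=1}^{n} \left[ V_i \neq V_j \right]

% Worked example: "sina" vs "sina01", n = 6.
% Positions 1-4 match (s, i, n, a); positions 5 and 6 ("0", "1") have no
% counterpart in "sina".
H = 0 + 0 + 0 + 0 + 1 + 1 = 2
```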
Step 2: according to the first Hamming distance, obtain, by a preset formula, the first similarity between the character strings of the any two first domain names;
Specifically, the first similarity between the character strings of any two first domain names is obtained by a preset formula. For example, the character string of domain name 1 is P, the character string of domain name 2 is T, and the Hamming distance between domain name 1 and domain name 2 is H; according to the preset formula:
the first similarity between the character string P of domain name 1 and the character string T of domain name 2 is obtained.
Here, Adj(P, T) is the first similarity between the character string P corresponding to domain name 1 and the character string T corresponding to domain name 2, H is the Hamming distance between domain name 1 and domain name 2, and max H is the maximum first Hamming distance between domain name 1 and domain name 2.
Step 3: judge whether the first similarity is greater than a first preset threshold;
Step 4: when the judgment result is yes, retain the any two first domain names in the to-be-stored data file;
Step 5: when the judgment result is no, delete the to-be-stored data record containing either one of the any two first domain names, to obtain the updated to-be-stored data file.
For example, the first preset threshold is A, and Adj(P, T) is the first similarity between the character string P corresponding to domain name 1 and the character string T corresponding to domain name 2. When Adj(P, T) > A, domain name 1 and domain name 2 are determined to be different domain names and no operation is performed on them, that is, both are retained in the to-be-stored data file. When Adj(P, T) ≤ A, domain name 1 and domain name 2 are determined to be the same domain name, and the to-be-stored data record containing domain name 1 or domain name 2 is deleted, to obtain the updated to-be-stored data file.
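Steps 1 to 5 can be sketched as follows. This is a minimal sketch under stated assumptions, not the method's definitive implementation: the form Adj(P, T) = H / max H, with max H taken as the length of the longer name, is assumed because it matches the rule that a larger value means "different"; the function names `hamming`, `adj` and `dedup` and the threshold value 0.5 are hypothetical; and the sketch always drops the later-listed name of a matching pair, whereas the text allows either one to be dropped.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Position-wise mismatch count; n is the longer length, so the
    unmatched tail of the longer name counts as mismatches (assumption)."""
    n = max(len(a), len(b))
    return sum(1 for i in range(n)
               if i >= len(a) or i >= len(b) or a[i] != b[i])

def adj(p: str, t: str) -> float:
    """Assumed form of Adj(P, T): H / max H, with max H = longer length."""
    max_h = max(len(p), len(t))
    return hamming(p, t) / max_h if max_h else 0.0

def dedup(records, threshold_a):
    """records: list of (ip, domain, date) tuples.
    For each (ip, date) group, names with Adj <= A are treated as the
    same name and only the first-listed one is kept."""
    groups = {}
    for ip, domain, date in records:
        names = groups.setdefault((ip, date), [])
        if domain not in names:          # exact duplicates collapse here
            names.append(domain)
    result = []
    for (ip, date), names in groups.items():
        removed = set()
        for d1, d2 in combinations(names, 2):
            if d1 in removed or d2 in removed:
                continue
            if adj(d1, d2) <= threshold_a:   # Step 5: judged identical
                removed.add(d2)
        result.extend((ip, d, date) for d in names if d not in removed)
    return result

records = [
    ("1.1.1.1", "sina", "20160202"),
    ("1.1.1.1", "sina01", "20160202"),
    ("1.1.1.1", "google", "20160202"),
]
deduplicated = dedup(records, 0.5)   # 0.5 is a hypothetical threshold A
```

With the hypothetical threshold 0.5, "sina" and "sina01" (Adj = 2/6) are treated as the same name and "google" is kept as a distinct one.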
S102: determine the first number of consecutive occurrence days of the same first domain name corresponding to the same first IP address in the to-be-stored data file;
wherein the first number of consecutive occurrence days is the number of days, on consecutive dates, on which the same first IP address corresponds to the same first domain name in the to-be-stored data file;
For example, to determine, in the to-be-stored data file, the number of consecutive days on which the first IP address "1.1.1.1" corresponds to the first domain name "baidu": when the first dates in the to-be-stored data records containing the first IP address "1.1.1.1" and the first domain name "baidu" include "20160102", "20160103" and "20160104", the number of consecutive occurrence days of the IP address "1.1.1.1" corresponding to the first domain name "baidu" is 3.
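The count in S102 can be sketched as below. The sketch assumes that "consecutive" means adjacent calendar dates and, where the dates contain gaps, takes the longest run; the text does not say how gaps are handled, so that choice and the function name `consecutive_days` are assumptions.

```python
from datetime import datetime, timedelta

def consecutive_days(dates):
    """Longest run of adjacent calendar days in a list of YYYYMMDD strings."""
    days = sorted({datetime.strptime(d, "%Y%m%d").date() for d in dates})
    best = run = 0
    prev = None
    for day in days:
        # Extend the current run only when this day directly follows the previous one.
        run = run + 1 if prev is not None and day - prev == timedelta(days=1) else 1
        best = max(best, run)
        prev = day
    return best

# Dates on which "1.1.1.1" corresponds to "baidu" in the example above.
baidu_days = consecutive_days(["20160102", "20160103", "20160104"])
```

Parsing the strings as dates rather than comparing them as integers keeps month boundaries correct, for example "20160131" followed by "20160201".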
S103: according to the first IP address, the first domain name and the first date in the at least one to-be-stored data record and the first number of consecutive occurrence days, update the data record containing a second IP address in a database table;
wherein the data record containing the second IP address in the database table includes: the second IP address, a second domain name, a second date and a second number of consecutive occurrence days.
Specifically, as shown in Fig. 3, updating the data record containing the second IP address in the database table according to the first IP address, the first domain name and the first date in the at least one to-be-stored data record and the first number of consecutive occurrence days comprises the following steps:
Step 1: for the first IP address, judge whether a second IP address identical to the first IP address exists in the database table;
It should be emphasized that, when the to-be-stored data records in the to-be-stored data file are stored in the database table, the records need to be read from the to-be-stored data file one by one, and for each record it is judged whether the database table contains a second IP address identical to the first IP address in that record.
Step 2: when the judgment result is no, store the to-be-stored data record containing the first IP address and the first number of consecutive occurrence days in the database table;
Specifically, the first IP address is compared with the second IP addresses; when no second IP address in the database table is identical to the first IP address, the to-be-stored data record containing the first IP address is stored directly in the database table, that is, the first IP address, the first domain name and the first date are stored in the database table.
Fig. 4 is a schematic diagram, provided by an embodiment of the present invention, of storing a to-be-stored data record in the database table. It includes a database table 410, a to-be-stored data file 420 and a database table 430, where database table 410 is the database table before a to-be-stored data record is stored in it, the to-be-stored data file includes multiple to-be-stored data records, and database table 430 is the database table after the to-be-stored data record has been stored in database table 410. Here, the to-be-stored data record containing the first IP address is "1.1.1.1 sina 20160101", the first IP address "1.1.1.1" does not exist in the database table, and the first number of consecutive occurrence days of this record is 1. After the record "1.1.1.1 sina 20160101" and its first number of consecutive occurrence days are stored in database table 410, database table 430 is obtained, which includes the data record "1.1.1.1 sina 20160101 1".
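Steps 1 and 2 and the Fig. 4 example can be sketched as follows. The in-memory list standing in for the database table, the row layout `(ip, domain, date, days)` and the function name `store_if_new_ip` are assumptions made for illustration only.

```python
def store_if_new_ip(table, record, first_days):
    """table: list of (ip, domain, date, days) rows.
    Inserts the record and returns True when no row has the same IP
    (Steps 1 and 2); returns False when the update path applies instead."""
    ip, domain, date = record
    if any(row[0] == ip for row in table):
        return False                      # an identical second IP address exists
    table.append((ip, domain, date, first_days))
    return True

table_410 = []                            # "1.1.1.1" is not yet in the table
inserted = store_if_new_ip(table_410, ("1.1.1.1", "sina", "20160101"), 1)
```

After the call the list holds the single row corresponding to "1.1.1.1 sina 20160101 1" in database table 430.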
Step 3: when the judgment result is yes, update the second domain name, the second date and the second number of consecutive occurrence days corresponding to the second IP address in the database table.
When a second IP address identical to the first IP address exists in the database table, the data record of the second IP address in the database table needs to be updated, that is, the second domain name, the second date and the second number of consecutive occurrence days in the data record containing the second IP address are updated.
Specifically, updating, when the judgment result is yes, the second domain name, the second date and the second number of consecutive occurrence days corresponding to the second IP address in the database table includes:
when the judgment result is yes, judging whether the second domain name corresponding to the second IP address in the database table is identical to the first domain name;
when the second domain name corresponding to the second IP address in the database table differs from the first domain name, storing the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the database table;
when the second domain name corresponding to the second IP address in the database table is identical to the first domain name, updating the second domain name, the second date and the second number of consecutive occurrence days corresponding to the second IP address in the database table.
In practical application, when a second IP address identical to the first IP address in the to-be-stored data record exists in the database table, it is necessary to judge whether the first domain name corresponding to the first IP address is identical to the second domain name corresponding to the second IP address, and, according to the result of that judgment, to update the second domain name, the second date and the second number of consecutive occurrence days corresponding to the second IP address in the database table.
Wherein judging whether the second domain name corresponding to the second IP address in the database table is identical to the first domain name includes:
calculating a second Hamming distance between the first domain name and the second domain name;
determining, according to the second Hamming distance and by a preset formula, a second similarity between the first domain name and the second domain name;
when the second similarity is greater than a second preset threshold, judging that the second domain name corresponding to the first IP address in the database table differs from the first domain name;
when the second similarity is less than or equal to the second preset threshold, judging that the second domain name corresponding to the first IP address in the database table is identical to the first domain name.
For example, once it has been determined that a second IP address identical to the first IP address in the to-be-stored data record exists in the database table, it is judged whether the second domain name corresponding to the second IP address is identical to the first domain name corresponding to the first IP address. First, the second Hamming distance between the first domain name and the second domain name needs to be calculated.
Specifically, the first domain name is "google" and the second domain name is "baidu"; according to the formula:
the second Hamming distance between the first domain name "google" and the second domain name "baidu" is calculated, where H is the second Hamming distance between the first domain name "google" and the second domain name "baidu", and n is the number of characters of the first domain name or the second domain name; it should be noted that n generally takes the character count of whichever of the first and second domain names has more characters. V_i is the i-th character in the first domain name or the second domain name, and V_j is the j-th character in the second domain name or the first domain name; it should be noted that when V_i is the i-th character in the first domain name, V_j is the j-th character in the second domain name, and when V_i is the i-th character in the second domain name, V_j is the j-th character in the first domain name.
Second, according to the preset formula:
the second similarity between the first domain name "google" and the second domain name "baidu" is determined, where Adj(P, T) is the second similarity between the character string P corresponding to the first domain name (that is, google) and the character string T corresponding to the second domain name (that is, baidu), H is the second Hamming distance between the first domain name and the second domain name, and max H is the maximum second Hamming distance between the first domain name and the second domain name.
For example, the second preset threshold is B: when Adj(P, T) > B, the first domain name and the second domain name are different; when Adj(P, T) ≤ B, the first domain name and the second domain name are identical.
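As a worked illustration of this judgment for "google" and "baidu", assume (neither form is stated explicitly in the text) that H counts position-by-position mismatches, with a character that has no counterpart counted as a mismatch, and that Adj(P, T) = H / max H with max H equal to the length of the longer name.

```latex
% P = "google" (6 characters), T = "baidu" (5 characters), n = 6.
% g/b, o/a, o/i, g/d and l/u all differ; "e" has no counterpart in "baidu".
H = 1 + 1 + 1 + 1 + 1 + 1 = 6, \qquad \max H = n = 6

\mathrm{Adj}(P, T) = \frac{H}{\max H} = \frac{6}{6} = 1

% For any second preset threshold B < 1, Adj(P, T) > B,
% so the two names are judged to be different.
```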
Specifically, in one implementation of the embodiment of the present invention, when the second domain name corresponding to the second IP address in the database table differs from the first domain name, updating the second domain name, the second date and the second number of consecutive occurrence days corresponding to the second IP address in the database table includes:
when the second domain name corresponding to the second IP address in the database table differs from the first domain name, and the first number of consecutive occurrence days is less than the second number of consecutive occurrence days, storing the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the database table;
Fig. 5 is another schematic diagram, provided by an embodiment of the present invention, of storing a to-be-stored data record in the database table. It includes a database table 510, a to-be-stored data file 520 and a database table 530, where database table 510 is the database table before a to-be-stored data record is stored in it, the to-be-stored data file includes multiple to-be-stored data records, and database table 530 is the database table after the to-be-stored data record has been stored in database table 510. Here, the to-be-stored data record containing the first IP address is "1.1.1.1 sina01 20160106", and the first number of consecutive occurrence days of this record is 2. The second domain name "sina" of the data record "1.1.1.1 sina 20160101 5" in database table 510 differs from the first domain name "sina01", and its second number of consecutive occurrence days is "5". The first number of consecutive occurrence days "2" is thus less than the second number "5", so the record "1.1.1.1 sina01 20160106" and its first number of consecutive occurrence days "2" are stored in database table 510, giving database table 530, whose data records include "1.1.1.1 sina 20160101 5" and "1.1.1.1 sina01 20160106 2".
when the second domain name corresponding to the second IP address in the database table differs from the first domain name, and the first number of consecutive occurrence days is greater than the second number of consecutive occurrence days, deleting the data record containing the second IP address, and storing the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the database table;
For example, Fig. 6 is another schematic diagram, provided by an embodiment of the present invention, of storing a to-be-stored data record in the database table. It includes a database table 610, a to-be-stored data file 620 and a database table 630, where database table 610 is the database table before a to-be-stored data record is stored in it, the to-be-stored data file includes multiple to-be-stored data records, and database table 630 is the database table after the to-be-stored data record has been stored in database table 610. Here, the to-be-stored data record containing the first IP address is "1.1.1.1 sina01 20160106", and the first number of consecutive occurrence days of this record is 3. Database table 610 includes the data record "1.1.1.1 sina 20160101 2"; the second domain name "sina" in that record differs from the first domain name "sina01", and its second number of consecutive occurrence days is "2". The first number of consecutive occurrence days "3" is thus greater than the second number "2", so the record "1.1.1.1 sina01 20160106" and its first number of consecutive occurrence days "3" are stored in database table 610 and the data record "1.1.1.1 sina 20160101 2" is deleted from the database table, giving database table 630, whose data record is "1.1.1.1 sina01 20160106 3".
when the second domain name corresponding to the second IP address in the database table differs from the first domain name, a third domain name corresponding to the second IP address in the database table also differs from the first domain name, and the first number of consecutive occurrence days is less than the second number of consecutive occurrence days but greater than a third number of consecutive occurrence days, deleting the data record containing the third domain name, and storing the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the database table, wherein the third domain name differs from both the first domain name and the second domain name, and the third number of consecutive occurrence days is the number of consecutive occurrence days in the data record in which the second IP address corresponds to the third domain name.
Fig. 7 is yet another schematic diagram, provided by an embodiment of the present invention, of storing a to-be-stored data record in the database table. It includes a database table 710, a to-be-stored data file 720 and a database table 730, where database table 710 is the database table before a to-be-stored data record is stored in it, the to-be-stored data file includes multiple to-be-stored data records, and database table 730 is the database table after the to-be-stored data record has been stored in database table 710. Here, the to-be-stored data record containing the first IP address is "1.1.1.1 google 20160106", and the first number of consecutive occurrence days of this record is 3. Database table 710 includes the data record "1.1.1.1 sina 20160101 6" and the data record "1.1.1.1 baidu 20160101 2", where the second domain name "sina" and the third domain name "baidu" each differ from the first domain name "google"; the second number of consecutive occurrence days in "1.1.1.1 sina 20160101 6" is "6", and the third number of consecutive occurrence days in "1.1.1.1 baidu 20160101 2" is "2". The first number "3" is thus less than the second number "6" and greater than the third number "2", so the record "1.1.1.1 google 20160106" and its first number of consecutive occurrence days "3" are stored in database table 710 and the data record "1.1.1.1 baidu 20160101 2" is deleted from the database table, giving database table 730, whose data records include "1.1.1.1 sina 20160101 6" and "1.1.1.1 google 20160106 3".
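The three different-domain cases (Figs. 5, 6 and 7) reduce to one rule in the sketch below: existing rows for the same IP address with fewer consecutive occurrence days than the incoming record are dropped, and the incoming record is then stored. This is a minimal sketch under assumptions: the row layout `(ip, domain, date, days)` and the name `store_different_domain` are hypothetical, the caller is assumed to have already judged that no existing row carries the same domain name, and rows with an equal count are kept because the text does not cover that case.

```python
def store_different_domain(table, record, first_days):
    """table: list of (ip, domain, date, days) rows for a database table.
    Returns a new table after storing a record whose domain name differs
    from every domain name already stored for the same IP address."""
    ip, domain, date = record
    # Keep rows of other IPs, and rows of this IP that occurred on at
    # least as many consecutive days as the incoming record.
    kept = [row for row in table if row[0] != ip or row[3] >= first_days]
    kept.append((ip, domain, date, first_days))
    return kept

# Fig. 5: first count 2 < second count 5, both rows remain.
table_530 = store_different_domain(
    [("1.1.1.1", "sina", "20160101", 5)], ("1.1.1.1", "sina01", "20160106"), 2)
# Fig. 6: first count 3 > second count 2, the old row is replaced.
table_630 = store_different_domain(
    [("1.1.1.1", "sina", "20160101", 2)], ("1.1.1.1", "sina01", "20160106"), 3)
# Fig. 7: 2 < 3 < 6, only the "baidu" row is dropped.
table_730 = store_different_domain(
    [("1.1.1.1", "sina", "20160101", 6), ("1.1.1.1", "baidu", "20160101", 2)],
    ("1.1.1.1", "google", "20160106"), 3)
```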
Specifically, in one implementation of the embodiment of the present invention, when the second domain name corresponding to the second IP address in the database table is identical to the first domain name, updating the second domain name, the second date and the second number of consecutive occurrence days corresponding to the second IP address in the database table includes:
when the second domain name corresponding to the second IP address in the database table is identical to the first domain name, and the first number of consecutive occurrence days is equal to the second number of consecutive occurrence days, updating the second date corresponding to the second IP address in the database table to the first date, and updating the second number of consecutive occurrence days to a third number of consecutive occurrence days, wherein the third number of consecutive occurrence days equals the difference obtained by subtracting the second date from the first date, plus the first number of consecutive occurrence days;
Fig. 8 is still another schematic diagram, provided by an embodiment of the present invention, of storing a to-be-stored data record in the database table. It includes a database table 810, a to-be-stored data file 820 and a database table 830, where database table 810 is the database table before a to-be-stored data record is stored in it, the to-be-stored data file includes multiple to-be-stored data records, and database table 830 is the database table after the to-be-stored data record has been stored in database table 810. Here, the to-be-stored data record containing the first IP address is "1.1.1.1 sina 20160106", and the first number of consecutive occurrence days of this record is 3. Database table 810 includes the data record "1.1.1.1 sina 20160101 3", whose second domain name "sina" is identical to the first domain name "sina" and whose second number of consecutive occurrence days is "3". The first number "3" is thus equal to the second number "3", so the second date "20160101" of the data record "1.1.1.1 sina 20160101 3" in database table 810 is updated to the first date "20160106", and the second number of consecutive occurrence days "3" is updated to the third number of consecutive occurrence days "8". Here the third number equals the difference (5) obtained by subtracting the second date (20160101) from the first date (20160106), plus the first number of consecutive occurrence days "3", which gives "8". The data record obtained in database table 830 is therefore "1.1.1.1 sina 20160106 8".
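The arithmetic of this update can be sketched as follows. The function name `merged_day_count` is hypothetical, and the sketch assumes that "first date minus second date" means the number of calendar days between the two dates.

```python
from datetime import datetime

def merged_day_count(first_date, second_date, first_days):
    """(first date - second date) in calendar days, plus the first
    number of consecutive occurrence days. Dates are YYYYMMDD strings."""
    fmt = "%Y%m%d"
    gap = (datetime.strptime(first_date, fmt)
           - datetime.strptime(second_date, fmt)).days
    return gap + first_days

# Fig. 8: 20160106 - 20160101 = 5 days, 5 + 3 = 8.
third_days = merged_day_count("20160106", "20160101", 3)
```

Subtracting parsed dates rather than the raw integers matters across month boundaries: "20160201" and "20160131" are one day apart, although the integers differ by 70.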
when the second domain name corresponding to the second IP address in the database table is identical to the first domain name, the first number of consecutive occurrence days is less than the second number of consecutive occurrence days, and the first number of consecutive occurrence days is greater than a third number of consecutive occurrence days corresponding to the second IP address, updating the second date corresponding to the second IP address in the database table to the first date, updating the second number of consecutive occurrence days to a fourth number of consecutive occurrence days, and deleting the data record containing the third number of consecutive occurrence days, wherein the fourth number of consecutive occurrence days equals the difference obtained by subtracting the second date from the first date, plus the first number of consecutive occurrence days; and the third domain name in the data record containing the third number of consecutive occurrence days differs from the first domain name.
Fig. 9 is still another schematic diagram, provided by an embodiment of the present invention, of storing a to-be-stored data record in the database table. It includes a database table 910, a to-be-stored data file 920 and a database table 930, where database table 910 is the database table before a to-be-stored data record is stored in it, the to-be-stored data file includes multiple to-be-stored data records, and database table 930 is the database table after the to-be-stored data record has been stored in database table 910. Here, the to-be-stored data record containing the first IP address is "1.1.1.1 sina 20160106", and the first number of consecutive occurrence days of this record is 3. Database table 910 includes the data record "1.1.1.1 sina 20160101 6" and the data record "1.1.1.1 baidu 20160101 1". The second domain name "sina" in the data record "1.1.1.1 sina 20160101 6" is identical to the first domain name "sina", and the second number of consecutive occurrence days in that record is "6"; the third domain name "baidu" of the data record "1.1.1.1 baidu 20160101 1" differs from the first domain name, and the third number of consecutive occurrence days in that record is "2". The first number "3" is thus less than the second number "6" and greater than the third number "2", so the second date "20160101" corresponding to the second IP address in the database table is updated to the first date "20160106", the second number of consecutive occurrence days "6" is updated to the fourth number of consecutive occurrence days "11", and the data record "1.1.1.1 baidu 20160101 2" is deleted. Here the fourth number "11" equals the difference "5" obtained by subtracting the second date "20160101" from the first date "20160106", plus the first number of consecutive occurrence days "6", giving the sum "11". The data record in database table 930 includes: "1.1.1.1 sina 20160101 11".
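The same-domain update with removal of less persistent rows can be sketched as below. This is a minimal sketch under several assumptions: the row layout `(ip, domain, date, days)` and the name `update_same_domain` are hypothetical; the similarity judgment is replaced by exact string equality to keep the sketch short; the merge is applied whenever the domain names match, although the text only describes it for particular orderings of the counts; and the new count follows the rule as worded, that is, the date difference plus the first number of consecutive occurrence days.

```python
from datetime import datetime

def update_same_domain(table, record, first_days):
    """table: list of (ip, domain, date, days) rows. Returns a new table in
    which the row with the same IP and domain is merged with the incoming
    record, and rows of the same IP with other domains and fewer days than
    first_days are deleted."""
    ip, domain, date = record
    fmt = "%Y%m%d"
    result = []
    for r_ip, r_domain, r_date, r_days in table:
        if r_ip != ip:
            result.append((r_ip, r_domain, r_date, r_days))
        elif r_domain == domain:
            gap = (datetime.strptime(date, fmt)
                   - datetime.strptime(r_date, fmt)).days
            # New date is the first date; new count is gap + first_days.
            result.append((ip, domain, date, gap + first_days))
        elif r_days >= first_days:
            result.append((r_ip, r_domain, r_date, r_days))
        # Otherwise: a different domain seen on fewer days, so it is dropped.
    return result

# Inputs of Fig. 8.
table_830 = update_same_domain(
    [("1.1.1.1", "sina", "20160101", 3)], ("1.1.1.1", "sina", "20160106"), 3)
# Inputs of Fig. 9 (stored counts 6 and 1, incoming count 3).
table_930 = update_same_domain(
    [("1.1.1.1", "sina", "20160101", 6), ("1.1.1.1", "baidu", "20160101", 1)],
    ("1.1.1.1", "sina", "20160106"), 3)
```

Applied to the Fig. 9 inputs, the rule as worded gives 5 + 3 = 8 with the date 20160106, whereas the arithmetic in the passage adds the stored count 6 to reach 11; the sketch follows the worded rule, and which operand is intended should be checked against the original filing.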
In the embodiment of the present invention, at least one CDR file is obtained and pre-processed to obtain a to-be-stored data file; the first number of consecutive occurrence days of the same first domain name corresponding to the same first IP address in the to-be-stored data file is determined; and the data record containing the second IP address in the database table is updated according to the first IP address, the first domain name and the first date in the at least one to-be-stored data record and the first number of consecutive occurrence days. In this solution, the correspondence between IP addresses and domain names in the database table is continuously updated according to the first IP address, the first domain name, the first date and the first number of consecutive occurrence days of the to-be-stored data records, which improves the accuracy of the correspondence between IP addresses and domain names.
As shown in Fig. 10, an embodiment of the present invention provides a device for updating the correspondence between IP addresses and domain names; the device 1000 includes:
an acquiring unit 1010, configured to obtain at least one CDR file and pre-process the at least one CDR file to obtain a to-be-stored data file, wherein the to-be-stored data file includes at least one to-be-stored data record, and each to-be-stored data record includes a first IP address, a first domain name and a first date;
a determining unit 1020, configured to determine the first number of consecutive occurrence days of the same first domain name corresponding to the same first IP address in the to-be-stored data file, wherein the first number of consecutive occurrence days is the number of days, on consecutive dates, on which the same first IP address corresponds to the same first domain name in the to-be-stored data file;
an updating unit 1030, configured to update, according to the first IP address, the first domain name and the first date in the at least one to-be-stored data record and the first number of consecutive occurrence days, the second domain name, the second date and the second number of consecutive occurrence days corresponding to the first IP address in a database table; wherein the second number of consecutive occurrence days is determined by the first number of consecutive occurrence days, the first date and the second date.
Optionally, as shown in Fig. 11, the acquiring unit 1010 includes:
a first obtaining subunit 1011, configured to obtain at least one CDR file, wherein each CDR file includes at least two columns of data;
a first extracting subunit 1012, configured to extract an initial IP address data column and a user-requested host name information data column from the at least two columns of data, wherein the initial IP addresses and the user-requested host name information are in one-to-one correspondence;
a filtering subunit 1013, configured to filter the initial IP address data column and the user-requested host name information data column to obtain two columns of data: first IP addresses and first user-requested host name information;
a second extracting subunit 1014, configured to extract first domain names from the first user-requested host name information according to a stored whitelist file, the first IP addresses being in one-to-one correspondence with the first domain names;
a second obtaining subunit 1015, configured to obtain the name of the at least one CDR file, wherein the name includes the first date on which the at least one CDR file was generated;
an adding subunit 1016, configured to add the first date to the corresponding first IP address and first domain name to generate at least one to-be-stored data record;
a deduplication subunit 1017, configured to deduplicate, for the same date among the first dates, all first domain names corresponding to the same first IP address, to obtain the to-be-stored data file.
Optionally, the filtering subunit 1013 is specifically configured to:
search the initial IP address data column and the user-requested host name information data column for initial IP addresses and user-requested host name information falling within a preset condition range, and delete the initial IP addresses and user-requested host name information found within the preset condition range, to obtain the two columns of data: first IP addresses and first user-requested host name information.
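The pre-processing performed by subunits 1011 to 1016 can be sketched as below. Almost every concrete detail here is an assumption made for illustration, because the text does not specify them: the rows are taken as already split into columns, the column positions `ip_col` and `host_col` are hypothetical, the "preset condition range" is taken to be private IP addresses, the whitelist is taken to be a set of domain labels matched against the dot-separated parts of the host name, and the file name is assumed to embed the date as eight digits. The function name `preprocess` and the file name used in the example are likewise hypothetical.

```python
import ipaddress
import re

def preprocess(file_name, rows, whitelist, ip_col=0, host_col=1):
    """rows: list of column lists read from one CDR file.
    Returns (ip, domain, date) records ready for deduplication."""
    match = re.search(r"\d{8}", file_name)   # assumed: name embeds YYYYMMDD
    if match is None:
        return []
    date = match.group(0)
    records = []
    for row in rows:
        ip, host = row[ip_col], row[host_col]
        try:
            # Assumed "preset condition range": drop private addresses.
            if ipaddress.ip_address(ip).is_private:
                continue
        except ValueError:
            continue                          # not a valid IP address
        # Assumed whitelist matching: first host-name label found in the set.
        domain = next((label for label in host.split(".")
                       if label in whitelist), None)
        if domain is not None:
            records.append((ip, domain, date))
    return records

records = preprocess(
    "cdr_20160102.txt",
    [["166.88.8.172", "www.baidu.com"],
     ["10.0.0.1", "www.sina.com"],
     ["23.44.156.40", "unknown.example"]],
    {"baidu", "sina"},
)
```

Only the first row survives in this example: the second is filtered out as a private address and the third has no whitelisted label.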
Optionally, the deduplication subunit 1017 is specifically configured to:
for the same date among the first dates, calculate, by a preset algorithm, the first Hamming distance between any two first domain names among all first domain names corresponding to the same first IP address;
according to the first Hamming distance, obtain, by a preset formula, the first similarity between the character strings of the any two first domain names;
judge whether the first similarity is greater than a first preset threshold;
when the judgment result is yes, retain the any two first domain names in the to-be-stored data file;
when the judgment result is no, delete the to-be-stored data record containing either one of the any two first domain names, to obtain the updated to-be-stored data file.
Optionally, as shown in Fig. 12, the updating unit 1030 includes:
a judging subunit 1031, configured to judge, for the first IP address, whether a second IP address identical to the first IP address exists in the database table;
a storing subunit 1032, configured to store, when the judgment result is no, the to-be-stored data record containing the first IP address and the first number of consecutive occurrence days in the database table;
an updating subunit 1033, configured to update, when the judgment result is yes, the second domain name, the second date and the second number of consecutive occurrence days corresponding to the second IP address in the database table.
Optionally, the updating subunit 1033 is specifically configured to: when the judgment result is yes, judge whether the second domain name corresponding to the second IP address in the database table is identical to the first domain name;
when the second domain name corresponding to the second IP address in the database table differs from the first domain name, store the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the database table;
when the second domain name corresponding to the second IP address in the database table is identical to the first domain name, update the second domain name, the second date and the second number of consecutive occurrence days corresponding to the second IP address in the database table.
Optionally, the updating subunit 1033 is specifically configured to calculate the second Hamming distance between the first domain name and the second domain name;
According to the second Hamming distance, determine, by a preset formula, the second similarity of the first domain name and the second domain name;
When the second similarity is greater than a second preset threshold, judge that the second domain name corresponding to the first IP address in the database table differs from the first domain name;
When the second similarity is less than or equal to the second preset threshold, judge that the second domain name corresponding to the first IP address in the database table is identical to the first domain name.
Optionally, the updating subunit 1033 is specifically configured to, when the second domain name corresponding to the second IP address in the database table differs from the first domain name and the first number of consecutive occurrence days is less than the second number of consecutive occurrence days, store the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the database table;
When the second domain name corresponding to the second IP address in the database table differs from the first domain name and the first number of consecutive occurrence days is greater than the second number of consecutive occurrence days, delete the data record of the second IP address, and store the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the database table;
When the second domain name corresponding to the second IP address in the database table differs from the first domain name, a third domain name corresponding to the second IP address in the database table also differs from the first domain name, and the first number of consecutive occurrence days is less than the second number of consecutive occurrence days and greater than a third number of consecutive occurrence days, delete the data record containing the third domain name, and store the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the database table, wherein the third domain name differs from both the first domain name and the second domain name, and the third number of consecutive occurrence days is the number of consecutive occurrence days of the data record in which the second IP address corresponds to the third domain name.
Optionally, the updating subunit 1033 is specifically configured to, when the second domain name corresponding to the second IP address in the database table is identical to the first domain name and the first number of consecutive occurrence days is equal to the second number of consecutive occurrence days, update the second date corresponding to the second IP address in the database table to the first date, and update the second number of consecutive occurrence days to a third number of consecutive occurrence days, wherein the third number of consecutive occurrence days equals the difference obtained by subtracting the second date from the first date, plus the first number of consecutive occurrence days;
When the second domain name corresponding to the second IP address in the database table is identical to the first domain name, the first number of consecutive occurrence days is less than the second number of consecutive occurrence days, and the first number of consecutive occurrence days is greater than a third number of consecutive occurrence days corresponding to the second IP address, update the second date corresponding to the second IP address in the database table to the first date, update the second number of consecutive occurrence days to a fourth number of consecutive occurrence days, and delete the data record containing the third number of consecutive occurrence days, wherein the fourth number of consecutive occurrence days equals the difference obtained by subtracting the second date from the first date, plus the first number of consecutive occurrence days; the third domain name in the data record containing the third number of consecutive occurrence days differs from the first domain name.
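The branching performed by the updating unit can be sketched in a simplified form. The sketch below is an assumption-laden illustration, not the patented implementation: it uses an in-memory list instead of a database table, compares domain names by exact equality rather than by the similarity threshold, treats the stored record with the most consecutive days as "the second domain name" when several exist, and omits the cases involving a third domain name. The names `Record` and `update_table` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    ip: str
    domain: str
    day: date    # date on which the pair was last recorded
    streak: int  # number of consecutive occurrence days


def update_table(table: list[Record], new: Record) -> None:
    """Apply a simplified subset of the update rules to an in-memory table."""
    existing = [r for r in table if r.ip == new.ip]
    if not existing:
        table.append(new)  # no identical IP address stored yet: store the record
        return
    same = next((r for r in existing if r.domain == new.domain), None)
    if same is not None:
        if new.streak == same.streak:
            # New count = (first date - second date) + first count, as in the text.
            same.streak = (new.day - same.day).days + new.streak
            same.day = new.day
        return
    # Assumption: compare against the stored record with the longest streak.
    old = max(existing, key=lambda r: r.streak)
    if new.streak > old.streak:
        table.remove(old)  # the newer mapping replaces the stored one
        table.append(new)
    elif new.streak < old.streak:
        table.append(new)  # keep both mappings
    # Equal streaks with differing domains are not covered by the text: no change.
```

For example, a stored record dated 2016-12-07 with a count of 2, updated by a record for the same domain dated 2016-12-10 with a count of 2, ends up with a count of 3 + 2 = 5 under the stated rule.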
In the embodiment of the present invention, at least one CDR file is obtained and pre-processed to obtain a to-be-stored data file; the first number of consecutive occurrence days of identical first domain names corresponding to identical first IP addresses in the to-be-stored data file is determined; and, according to the first IP address, first domain name and first date in at least one to-be-stored data record and the first number of consecutive occurrence days, the data record of the second IP address in the database table is updated. In this solution, the correspondence between IP addresses and domain names in the database table is continuously updated according to the first IP address, first domain name, first date and first number of consecutive occurrence days of the to-be-stored data records, which improves the accuracy of the correspondence between IP addresses and domain names.
Because the device embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.
Claims (10)
1. A method for updating a correspondence between IP addresses and domain names, characterized by comprising:
Obtaining at least one CDR file and pre-processing the at least one CDR file to obtain a to-be-stored data file, wherein the to-be-stored data file includes at least one to-be-stored data record, and each to-be-stored data record includes a first IP address, a first domain name and a first date;
Determining a first number of consecutive occurrence days of identical first domain names corresponding to identical first IP addresses in the to-be-stored data file, wherein the first number of consecutive occurrence days is the number of days on which the same first domain name corresponding to the same first IP address appears on consecutive dates in the to-be-stored data file;
According to the first IP address, first domain name and first date in the at least one to-be-stored data record and the first number of consecutive occurrence days, updating the data record of a second IP address in a database table, wherein the data record of the second IP address in the database table includes the second IP address, a second domain name, a second date and a second number of consecutive occurrence days.
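A minimal sketch of how the first number of consecutive occurrence days in claim 1 might be computed is given below. The claim does not say which run of dates is counted when an IP address and domain name pair appears in several separate runs; this sketch assumes the run ending on the most recent date. The function name and the tuple layout `(ip, domain, day)` are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta


def consecutive_days(records):
    """Map each (ip, domain) pair to (latest date, length of the run ending there)."""
    seen = defaultdict(set)
    for ip, domain, day in records:
        seen[(ip, domain)].add(day)

    result = {}
    one_day = timedelta(days=1)
    for key, days in seen.items():
        latest = max(days)
        run, cursor = 1, latest - one_day
        # Walk backwards while the previous calendar day is also present.
        while cursor in days:
            run += 1
            cursor -= one_day
        result[key] = (latest, run)
    return result
```

The returned date and count correspond to the first date and first number of consecutive occurrence days that the later update step consumes.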
2. The method according to claim 1, characterized in that obtaining at least one CDR file and pre-processing the at least one CDR file to obtain a to-be-stored data file comprises:
Obtaining at least one CDR file, wherein each CDR file includes at least two columns of data;
Extracting an initial IP address data column and a user-request host name information data column from the at least two columns of data, wherein the initial IP addresses and the user-request host name information are in one-to-one correspondence;
Filtering the initial IP address data column and the user-request host name information data column to obtain two columns of data, namely first IP addresses and first user-request host name information;
Extracting the first domain name from the first user-request host name information according to a stored whitelist file, wherein the first IP addresses and the first domain names are in one-to-one correspondence;
Obtaining the name of the at least one CDR file, wherein the name includes the first date on which the at least one CDR file was generated;
Adding the first date to the corresponding first IP address and first domain name to generate at least one to-be-stored data record;
For identical dates among the first dates, de-duplicating all first domain names corresponding to the same first IP address to obtain the to-be-stored data file.
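The pre-processing of claim 2 can be illustrated with the following sketch. Many details are assumptions, because the claim does not fix them: the `|` column separator, the column positions, a file name containing the date as `YYYYMMDD`, a whitelist modelled as a set of domain names matched as host name suffixes, and a filter that merely drops empty fields. De-duplication here is exact matching only; the similarity-based variant of claim 4 is not reproduced.

```python
import re
from datetime import date, datetime
from typing import Optional


def date_from_name(file_name: str) -> date:
    """Pull a YYYYMMDD date out of the file name (assumed naming scheme)."""
    match = re.search(r"(\d{8})", file_name)
    if match is None:
        raise ValueError(f"no date found in {file_name!r}")
    return datetime.strptime(match.group(1), "%Y%m%d").date()


def extract_domain(host: str, whitelist: set) -> Optional[str]:
    """Return the longest whitelisted suffix of the host name, or None."""
    labels = host.lower().rstrip(".").split(".")
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in whitelist:
            return candidate
    return None


def preprocess(file_name, lines, whitelist, ip_col=0, host_col=1, sep="|"):
    """Build de-duplicated (ip, domain, date) records from one CDR file."""
    day = date_from_name(file_name)
    records = set()  # a set removes exact duplicates per IP, domain and date
    for line in lines:
        cols = line.rstrip("\n").split(sep)
        if len(cols) <= max(ip_col, host_col):
            continue
        ip, host = cols[ip_col].strip(), cols[host_col].strip()
        if not ip or not host:
            continue  # stand-in for the unspecified filtering condition
        domain = extract_domain(host, whitelist)
        if domain is not None:
            records.add((ip, domain, day))
    return sorted(records)
```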
3. The method according to claim 2, characterized in that filtering the initial IP address data column and the user-request host name information data column to obtain two columns of data, namely first IP addresses and first user-request host name information, comprises:
Searching the initial IP address data column and the user-request host name information data column for initial IP addresses and user-request host name information that fall within a preset condition range, and deleting the initial IP addresses and user-request host name information found within the preset condition range, to obtain two columns of data, namely first IP addresses and first user-request host name information.
4. The method according to claim 2, characterized in that, for identical dates among the first dates, de-duplicating all first domain names corresponding to the same first IP address to obtain the to-be-stored data file comprises:
For identical dates among the first dates, calculating, by a preset algorithm, the first Hamming distance between any two first domain names among all first domain names corresponding to the same first IP address;
According to the first Hamming distance, obtaining, by a preset formula, the first similarity of the character strings of the two first domain names;
Judging whether the first similarity is greater than a first preset threshold;
When the judgment result is yes, retaining both first domain names in the to-be-stored data file;
When the judgment result is no, deleting the to-be-stored data record containing either one of the two first domain names, obtaining the updated to-be-stored data file.
5. The method according to claim 1, characterized in that updating the data record of the second IP address in the database table according to the first IP address, first domain name and first date in the at least one to-be-stored data record and the first number of consecutive occurrence days comprises:
For the first IP address, judging whether the database table contains a second IP address identical to the first IP address;
When the judgment result is no, storing the to-be-stored data record containing the first IP address, together with the first number of consecutive occurrence days, in the database table;
When the judgment result is yes, updating the second domain name, second date and second number of consecutive occurrence days corresponding to the second IP address in the database table.
6. The method according to claim 5, characterized in that, when the judgment result is yes, updating the second domain name, second date and second number of consecutive occurrence days corresponding to the second IP address in the database table comprises:
When the judgment result is yes, judging whether the second domain name corresponding to the second IP address in the database table is identical to the first domain name;
When the second domain name corresponding to the second IP address in the database table differs from the first domain name, storing the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the database table;
When the second domain name corresponding to the second IP address in the database table is identical to the first domain name, updating the second domain name, second date and second number of consecutive occurrence days corresponding to the second IP address in the database table.
7. The method according to claim 6, characterized in that judging whether the second domain name corresponding to the second IP address in the database table is identical to the first domain name comprises:
Calculating the second Hamming distance between the first domain name and the second domain name;
According to the second Hamming distance, determining, by a preset formula, the second similarity of the first domain name and the second domain name;
When the second similarity is greater than a second preset threshold, judging that the second domain name corresponding to the first IP address in the database table differs from the first domain name;
When the second similarity is less than or equal to the second preset threshold, judging that the second domain name corresponding to the first IP address in the database table is identical to the first domain name.
8. The method according to claim 6, characterized in that, when the second domain name corresponding to the second IP address in the database table differs from the first domain name, updating the second domain name, second date and second number of consecutive occurrence days corresponding to the second IP address in the database table comprises:
When the second domain name corresponding to the second IP address in the database table differs from the first domain name and the first number of consecutive occurrence days is less than the second number of consecutive occurrence days, storing the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the database table;
When the second domain name corresponding to the second IP address in the database table differs from the first domain name and the first number of consecutive occurrence days is greater than the second number of consecutive occurrence days, deleting the data record of the second IP address, and storing the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the database table;
When the second domain name corresponding to the second IP address in the database table differs from the first domain name, a third domain name corresponding to the second IP address in the database table also differs from the first domain name, and the first number of consecutive occurrence days is less than the second number of consecutive occurrence days and greater than a third number of consecutive occurrence days, deleting the data record containing the third domain name, and storing the first IP address, the first domain name, the first date and the first number of consecutive occurrence days in the database table, wherein the third domain name differs from both the first domain name and the second domain name, and the third number of consecutive occurrence days is the number of consecutive occurrence days of the data record in which the second IP address corresponds to the third domain name.
9. The method according to claim 6, characterized in that, when the second domain name corresponding to the second IP address in the database table is identical to the first domain name, updating the second domain name, second date and second number of consecutive occurrence days corresponding to the second IP address in the database table comprises:
When the second domain name corresponding to the second IP address in the database table is identical to the first domain name and the first number of consecutive occurrence days is equal to the second number of consecutive occurrence days, updating the second date corresponding to the second IP address in the database table to the first date, and updating the second number of consecutive occurrence days to a third number of consecutive occurrence days, wherein the third number of consecutive occurrence days equals the difference obtained by subtracting the second date from the first date, plus the first number of consecutive occurrence days;
When the second domain name corresponding to the second IP address in the database table is identical to the first domain name, the first number of consecutive occurrence days is less than the second number of consecutive occurrence days, and the first number of consecutive occurrence days is greater than a third number of consecutive occurrence days corresponding to the second IP address, updating the second date corresponding to the second IP address in the database table to the first date, updating the second number of consecutive occurrence days to a fourth number of consecutive occurrence days, and deleting the data record containing the third number of consecutive occurrence days, wherein the fourth number of consecutive occurrence days equals the difference obtained by subtracting the second date from the first date, plus the first number of consecutive occurrence days; the third domain name in the data record containing the third number of consecutive occurrence days differs from the first domain name.
10. A device for updating a correspondence between IP addresses and domain names, characterized by comprising:
An acquiring unit, configured to obtain at least one CDR file and pre-process the at least one CDR file to obtain a to-be-stored data file, wherein the to-be-stored data file includes at least one to-be-stored data record, and each to-be-stored data record includes a first IP address, a first domain name and a first date;
A determining unit, configured to determine a first number of consecutive occurrence days of identical first domain names corresponding to identical first IP addresses in the to-be-stored data file, wherein the first number of consecutive occurrence days is the number of days on which the same first domain name corresponding to the same first IP address appears on consecutive dates in the to-be-stored data file;
An updating unit, configured to update, according to the first IP address, first domain name and first date in the at least one to-be-stored data record and the first number of consecutive occurrence days, the second domain name, second date and second number of consecutive occurrence days corresponding to the first IP address in a database table, wherein the second number of consecutive occurrence days is determined by the first number of consecutive occurrence days, the first date and the second date.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611155172.1A CN106603742B (en) | 2016-12-14 | 2016-12-14 | A kind of update method and device of IP address and domain name corresponding relationship |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106603742A (en) | 2017-04-26 |
CN106603742B (en) | 2019-04-26 |
Family
ID=58801551
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611155172.1A Active CN106603742B (en) | 2016-12-14 | 2016-12-14 | A kind of update method and device of IP address and domain name corresponding relationship |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106603742B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN106603742B (en) | 2019-04-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||