CN115378905B

CN115378905B - Domain name collection method, device, equipment and computer readable storage medium

Info

Publication number: CN115378905B
Application number: CN202210873197.4A
Authority: CN
Inventors: 史振宇; 赵武
Original assignee: Beijing Huashun Xin'an Information Technology Co ltd; Beijing Huashunxinan Technology Co ltd
Current assignee: Beijing Huashun Xin'an Information Technology Co ltd; Beijing Huashunxinan Technology Co ltd
Priority date: 2022-07-22
Filing date: 2022-07-22
Publication date: 2023-11-14
Anticipated expiration: 2042-07-22
Also published as: CN115378905A

Abstract

The application relates to a domain name collection method, device, equipment and computer readable storage medium, belonging to the technical field of communication. The method comprises: receiving initial domain name information in real time; expanding the initial domain name information to obtain one or more related domain names; judging, one by one, whether the web page corresponding to each related domain name is an open web page; if so, acquiring page response information of the open web page and storing the page response information into a data storage area; if the web page is not open, resolving the related domain name to obtain an IP address, binding the related domain name and the IP address, and storing the binding relation between the related domain name and the IP address into the data storage area. Page response information is stored for open web pages, and the correspondence between the related domain name and the IP address is stored for unopened web pages, so that domain names are less likely to be discarded or missed because their corresponding web pages are not open.

Description

Domain name collection method, device, equipment and computer readable storage medium

Technical Field

The present application relates to the field of communications technologies, and in particular, to a domain name collection method, device, apparatus, and computer readable storage medium.

Background

The Domain Name System (DNS) names computers and network services in a hierarchy of domains. A DNS server provides a domain name resolution service for clients: the domain name entered by a client is resolved into the IP address corresponding to that domain name, so that the client can use the IP address to access the website corresponding to the domain name.

In the related art, domain names are generally collected by a domain name collection crawler. A web crawler is an important component of the crawling system of a search engine; its main purpose is to download web pages from the internet to local storage so as to form a mirror backup of internet content. When collecting domain names, the domain name collection crawler grabs related domain names or subdomain names from the web page content and stores them locally.

With respect to the above related art, the inventors found that omissions easily occur when a web crawler is used to capture domain names: when a web page is not open, the domain name collection crawler cannot acquire any data from it, so the corresponding domain name cannot be collected, which limits domain name collection.

Disclosure of Invention

In order to facilitate more comprehensive collection of domain names, the application provides a domain name collection method, a device, equipment and a computer readable storage medium.

In a first aspect, the present application provides a domain name collection method, which adopts the following technical scheme:

a domain name collection method comprises the steps of receiving initial domain name information in real time;

expanding the initial domain name information to obtain one or more related domain names;

judging whether the web pages corresponding to the related domain names are open web pages one by one, if so, acquiring page response information of the open web pages, and storing the page response information into a data storage area; if the web page is not opened, the related domain name is resolved to obtain an IP address, the related domain name and the IP address are bound, and the binding relation between the related domain name and the IP address is stored in the data storage area.

By adopting the technical scheme, after the initial domain name information is expanded, whether the web pages corresponding to the related domain names are open web pages is judged one by one. If a web page is open, the page response information can be obtained directly from it and stored; if a web page is not open, the IP address corresponding to the related domain name is resolved, and the binding relation between the related domain name and the IP address is stored. Therefore, a related domain name that really exists but whose web page is not open is not directly discarded; instead, the related domain name and its bound IP address are stored, so that the related domain names of the initial domain name information are collected more comprehensively.

Optionally, after storing the page response information in the data storage area, the method further includes:

judging whether the page response information contains page domain name information, if so, setting the page domain name information as initial domain name information.

By adopting the technical scheme, if the page response information contains the page domain name information, the page domain name information is set as the initial domain name information, and the page domain name information is expanded and stored again, so that all domain names related to the initial domain name information are collected again.

Optionally, the storing the binding relationship between the related domain name and the IP address in the data storage area specifically includes:

judging whether an open port exists in the IP address; if yes, binding the related domain name with the port of the IP address, and storing the binding relation between the related domain name and the port of the IP address into a data storage area; if not, binding the related domain name with the IP address, and storing the binding relation between the related domain name and the IP address into a data storage area.

By adopting the technical scheme, if an open port exists in the IP address corresponding to the related domain name, the related domain name is bound with the port of the IP address, and the binding relation between the related domain name and the port of the IP address is stored in the data storage area; if no open port exists in the IP address corresponding to the related domain name, the binding relation between the related domain name and the IP address is directly stored in the data storage area, so that the related domain names of unopened web pages are stored.

Optionally, after the binding the related domain name and the port of the IP address and storing the binding relationship between the related domain name and the port of the IP address in the data storage area, the method further includes:

acquiring protocol response information of an IP address port;

judging whether the protocol domain name information exists in the protocol response information, and if so, setting the protocol domain name information as initial domain name information.

By adopting the technical scheme, for the IP address with the open port, the protocol response information is acquired, whether the protocol domain name information exists in the protocol response information is judged, if so, the protocol domain name information is set as the initial domain name information, and the protocol domain name information is expanded and stored, so that the related domain name of the initial domain name information is further collected, and the collected domain name is more comprehensive.

Optionally, the expanding the initial domain name information to obtain one or more related domain names specifically includes:

and inputting the initial domain name information into a domain name server, a domain name blasting tool or a domain name search engine, and inquiring to obtain the related domain name.

By adopting the technical scheme, the related domain name is inquired through a domain name server, a domain name blasting tool or a domain name search engine.

Optionally, after receiving the initial domain name information in real time, the method further includes:

and judging whether the initial domain name information is valid, and if not, discarding the initial domain name data.

By adopting the technical scheme, the initial domain name information is screened, invalid initial domain name information is timely discarded, and the running time of a program is saved.

Optionally, the expanding the initial domain name information to obtain one or more related domain names further includes:

judging whether the related domain names have the universal domain names, if so, judging whether the number of the universal domain names is larger than a preset number, if so, randomly selecting the preset number of the universal domain names for reservation, and if not, reserving all the universal domain names;

judging whether an invalid domain name exists in the related domain names, and discarding the invalid domain name if the invalid domain name exists.

By adopting the technical scheme, the universal domain name in the related domain name is selectively reserved, and the invalid domain name in the related domain name is discarded, so that the data volume stored in the data storage area is reduced on one hand, and valuable information can be screened out for a user on the other hand.

In a second aspect, the present application provides a domain name collection device, which adopts the following technical scheme:

a domain name collection device comprises a data receiving unit, a domain name expanding unit, a domain name processing unit and a storage unit;

the data receiving unit is used for receiving the initial domain name information in real time;

the domain name expansion unit is used for expanding the initial domain name information to obtain one or more related domain names;

the domain name processing unit is used for judging whether the web pages corresponding to the related domain names are open web pages one by one, and if the web pages are open web pages, acquiring page response information of the open web pages; if the web page is not opened, resolving the related domain name to obtain an IP address, and binding the related domain name and the IP address;

the storage unit is used for storing page response information, and the bound related domain name and IP address.

By adopting the technical scheme, after the initial domain name information is expanded, whether the web page corresponding to each obtained related domain name is an open web page is judged; if the web page is open, the page response information is stored; if the web page is not open, the correspondence between the related domain name and the IP address is stored, and if the IP address has an open port, the protocol response information is acquired and stored. The related domain names of the initial domain name information are thereby collected relatively comprehensively, and domain names are less likely to be discarded or missed because the corresponding web page is not open.

In a third aspect, the present application provides a computer device, which adopts the following technical scheme:

a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor performing a domain name collection method according to any of the first aspects.

In a fourth aspect, the present application provides a computer readable storage medium, which adopts the following technical scheme:

a computer readable storage medium storing a computer program that can be loaded by a processor to perform any one of the methods of the first aspect.

In summary, the application has the following beneficial technical effects:

after the initial domain name information is expanded, judging whether the obtained web page corresponding to the related domain name is an open web page, storing page response information for the open web page, and storing the corresponding relation between the related domain name and the IP address for the unopened web page, so that the domain name is not easy to discard and miss because the web page corresponding to the domain name is unopened, the related domain name of the open web page can be collected, the related domain name of the unopened web page can be also collected, and the effect of more comprehensively collecting the domain name related to the initial domain name information is realized.

Drawings

Fig. 1 is a flow chart of a domain name collection method according to an embodiment of the application.

FIG. 2 is a flow chart of a method for collecting domain names of open web pages according to one embodiment of the present application.

FIG. 3 is a flow chart illustrating a method for collecting domain names of unopened web pages according to an embodiment of the present application.

Fig. 4 is a flowchart of a method for screening related domain names according to an embodiment of the present application.

Fig. 5 is a block diagram of a collecting device according to an embodiment of the present application.

Reference numerals illustrate: 1. a data receiving unit; 2. a domain name expansion unit; 3. a domain name processing unit; 4. a storage unit.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings 1 to 5 and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

The embodiment of the application discloses a domain name collection method. Referring to fig. 1, a domain name collection method includes:

step S101: receiving initial domain name information in real time;

wherein the initial domain name information may be imported by a user or automatically imported by the system.

Step S102: expanding the initial domain name information to obtain one or more related domain names.

Wherein the related domain name includes a sub-domain name of the initial domain name information and other related domain names.

Step S103: judging whether the web pages corresponding to the related domain names are open web pages one by one, if so, executing step S104; if the web page is not opened, step S105 is performed.

An open web page refers to a web page that is displayed normally when its web address is entered.

Step S104: acquiring page response information of the open web page, and storing the page response information into the data storage area.

The page response information at least comprises an IP address, an IP address open port, web page response header information, web page response body information and web page certificates.

Specifically, the web page response header information includes Date (the time at which the response was generated), Last-Modified (the last modification time of the requested resource), Content-Encoding (the encoding of the response content), Server (information such as the name and version number of the server), Content-Type (the type of the returned data), Expires (the expiration time of the response), and the like; the web page response body information includes the body data of the response: for example, when a web page is requested, the response body is the HTML code of the web page, and when a picture is requested, the response body is the binary data of the picture.

Step S105: resolving the related domain name to obtain an IP address, binding the related domain name and the IP address, and storing the binding relation between the related domain name and the IP address into the data storage area.

Wherein the data stored in the data storage area is used for display so that the user can review the required information.

In the above embodiment, after the initial domain name information is expanded, whether the web pages corresponding to the related domain names are open web pages is determined one by one. If a web page is open, the page response information can be obtained directly from it and stored; if a web page is not open, the IP address corresponding to the related domain name is resolved, and the binding relation between the related domain name and the IP address is stored. A related domain name that actually exists but whose web page is not open is therefore not directly discarded; instead, the related domain name and its bound IP address are stored, so that the related domain names of the initial domain name information are collected more comprehensively.
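As a non-limiting illustration, steps S102 to S105 may be sketched as follows. The names `DataStore`, `collect`, `expand`, `fetch_page` and `resolve` are hypothetical and do not appear in the application; the expansion, page-fetching and resolution functions are passed in as parameters, and skipping a related domain name that fails to resolve is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional


@dataclass
class DataStore:
    """Hypothetical in-memory stand-in for the data storage area."""
    page_responses: Dict[str, dict] = field(default_factory=dict)
    bindings: Dict[str, str] = field(default_factory=dict)


def collect(
    initial: str,
    expand: Callable[[str], Iterable[str]],
    fetch_page: Callable[[str], Optional[dict]],
    resolve: Callable[[str], Optional[str]],
    store: DataStore,
) -> None:
    """Run steps S102-S105 for one piece of initial domain name information."""
    for domain in expand(initial):                   # S102: related domain names
        response = fetch_page(domain)                # S103: None means "not open"
        if response is not None:
            store.page_responses[domain] = response  # S104: store page response
            continue
        ip = resolve(domain)                         # S105: resolve and bind
        if ip is not None:                           # assumption: skip if unresolved
            store.bindings[domain] = ip


# Usage with stub functions instead of real network access.
store = DataStore()
collect(
    "example.com",
    expand=lambda d: [f"www.{d}", f"dev.{d}"],
    fetch_page=lambda d: {"status": 200} if d.startswith("www.") else None,
    resolve=lambda d: "192.0.2.10",
    store=store,
)
```

In this usage, the name whose page is open ends up in `store.page_responses`, while the name whose page is not open is kept in `store.bindings` together with its IP address rather than being discarded.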

As an embodiment of step S102, step S102 specifically includes: inputting the initial domain name information into a domain name server, a domain name blasting tool (that is, a tool that enumerates subdomain names by brute force) or a domain name search engine, and querying to obtain the related domain names.
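By way of example only, the brute-force ("blasting") variant of the expansion may be sketched as follows: candidate prefixes from a word list are prepended to the root domain name, and the candidates that resolve are kept. The function names and the word list are hypothetical; the application does not specify a particular tool, and a domain name server query or a domain name search engine could be used instead.

```python
import socket
from typing import Callable, Iterable, List, Optional


def system_resolve(name: str) -> Optional[str]:
    """Resolve a name with the system resolver; return None on failure."""
    try:
        return socket.gethostbyname(name)
    except (OSError, UnicodeError):
        return None


def brute_force_subdomains(
    root: str,
    wordlist: Iterable[str],
    resolve: Callable[[str], Optional[str]] = system_resolve,
) -> List[str]:
    """Return the candidate subdomain names of `root` that resolve."""
    found = []
    for label in wordlist:
        candidate = f"{label}.{root}"
        if resolve(candidate) is not None:
            found.append(candidate)
    return found


# Usage with a fake resolver so that no network access is needed.
fake_dns = {"www.example.com": "192.0.2.1", "mail.example.com": "192.0.2.2"}
related = brute_force_subdomains("example.com", ["www", "ftp", "mail"], fake_dns.get)
```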

Referring to fig. 2, as a further embodiment of the domain name collection method, after step S104, further includes:

Step S1041: judging whether the page response information contains page domain name information; if so, executing step S1042; if not, performing no further operation.

Step S1042: the page domain name information is set as the initial domain name information.

The page domain name information is contained in the web page response header information and the web page response body information of the page response information, so both the web page response header information and the web page response body information are searched.

In the above embodiment, if the page response information includes page domain name information, the page domain name information is set as the initial domain name information; that is, it is treated as initial domain name information automatically imported by the system, and steps S101-S105 are executed again to collect the domain names related to the page domain name information, so that the domain names related to the original initial domain name information are collected more completely.
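One possible way of searching the response header and response body for page domain name information is sketched below. It is only an illustration under stated assumptions: the function name `extract_page_domains` is hypothetical, only absolute `http`/`https` URLs are considered, and hosts that are literal IP addresses are ignored.

```python
import ipaddress
import re
from typing import Dict, Set
from urllib.parse import urlsplit

# Absolute http/https URLs, ended by whitespace, quotes or angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def _is_ip(host: str) -> bool:
    """Return True if `host` is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_page_domains(headers: Dict[str, str], body: str) -> Set[str]:
    """Collect host names from absolute URLs in header values and the body."""
    text = "\n".join(headers.values()) + "\n" + body
    hosts = set()
    for match in URL_RE.finditer(text):
        try:
            host = urlsplit(match.group(0)).hostname
        except ValueError:
            continue  # malformed URL, e.g. unbalanced IPv6 brackets
        if host and "." in host and not _is_ip(host):
            hosts.add(host.lower())
    return hosts


# Usage: each extracted name would be fed back as new initial domain name information.
headers = {"Location": "https://login.example.com/start", "Server": "nginx"}
body = '<a href="http://cdn.example.org/app.js">x</a> <img src="http://192.0.2.7/a.png">'
page_domains = extract_page_domains(headers, body)
```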

Referring to fig. 3, binding the related domain name and the IP address and storing the binding relationship between the related domain name and the IP address in the data storage area specifically includes:

step S1051: judging whether an open port exists in the IP address; if yes, go to step S1052, if no, go to step S1053.

Step S1052: binding the related domain name with the port of the IP address, and storing the binding relation between the related domain name and the port of the IP address into a data storage area;

if there are a plurality of open ports in the IP address corresponding to the related domain name, the plurality of ports are all bound to the related domain name.

Step S1053: binding the related domain name with the IP address, and storing the binding relation between the related domain name and the IP address into a data storage area.
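Steps S1051 to S1053 may, for instance, be sketched as follows. The record layout `(domain, ip, port)`, the function names and the list of candidate ports are assumptions of this sketch; the application does not prescribe how open ports are detected, and a plain TCP connection attempt is used here.

```python
import socket
from typing import Callable, Iterable, List, Optional, Tuple

Binding = Tuple[str, str, Optional[int]]


def port_is_open(ip: str, port: int, timeout: float = 1.0) -> bool:
    """TCP connect check: True if the connection is accepted."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def bind_domain(
    domain: str,
    ip: str,
    ports: Iterable[int],
    is_open: Callable[[str, int], bool] = port_is_open,
) -> List[Binding]:
    """Return the binding records to be stored for one related domain name."""
    open_ports = [p for p in ports if is_open(ip, p)]      # S1051
    if open_ports:
        # S1052: every open port is bound to the related domain name.
        return [(domain, ip, p) for p in open_ports]
    # S1053: no open port, so only the domain-to-IP binding is stored.
    return [(domain, ip, None)]


# Usage with a fake port check so that no network access is needed.
fake_open = lambda ip, port: port in {80, 443}
with_ports = bind_domain("dev.example.com", "192.0.2.10", [22, 80, 443], fake_open)
without_ports = bind_domain("dev.example.com", "192.0.2.10", [22], fake_open)
```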

As a further embodiment of the domain name collection method, after step S1052, further comprising:

step S1054: acquiring protocol response information of an IP address port;

The protocol response information comprises certificate and/or port banner information, and the banner information comprises information such as the software developer, software name, service type and version number.

In addition, the protocol response information is stored to the data storage area.

Step S1055: whether the protocol domain name information exists in the protocol response information is determined, and if so, step S1056 is performed.

Step S1056: the protocol domain name information is set as the initial domain name information.

In the above embodiment, if an open port exists in the IP address corresponding to the related domain name, the related domain name is bound with the port of the IP address, the protocol response information of the IP address port is obtained, and both the binding relationship between the related domain name and the port of the IP address and the protocol response information are stored in the data storage area, which makes it convenient to collect the related domain names of the initial domain name information whose web pages are not open. Whether protocol domain name information exists in the protocol response information is then judged; if so, the protocol domain name information is set as the initial domain name information and steps S101-S105 are executed again, so that further related domain names are collected and the collection for the initial domain name information is more comprehensive.
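As one hedged example of steps S1055 and S1056, domain names can be read from a certificate returned on a TLS port. The sketch assumes the certificate is available as a dictionary in the layout returned by Python's `ssl.SSLSocket.getpeercert()`; the function name is hypothetical, and stripping a leading `*.` label before feeding a name back as initial domain name information is an assumption of this sketch.

```python
from typing import Set


def domains_from_certificate(cert: dict) -> Set[str]:
    """Extract protocol domain name information from a decoded certificate."""
    names = set()
    # Subject alternative names are (type, value) pairs; keep DNS entries only.
    for kind, value in cert.get("subjectAltName", ()):
        if kind == "DNS":
            names.add(value)
    # The subject is a tuple of RDNs, each a tuple of (key, value) pairs.
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.add(value)
    result = set()
    for name in names:
        name = name.lower()
        if name.startswith("*."):
            name = name[2:]  # "*.api.example.com" -> "api.example.com"
        if "." in name:
            result.add(name)
    return result


# Usage with a hand-written certificate dictionary.
cert = {
    "subject": ((("commonName", "www.example.com"),),),
    "subjectAltName": (
        ("DNS", "www.example.com"),
        ("DNS", "*.api.example.com"),
        ("IP Address", "192.0.2.1"),
    ),
}
protocol_domains = domains_from_certificate(cert)
```

Each name in `protocol_domains` would then be treated as new initial domain name information and passed through steps S101-S105 again.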

As a further embodiment of the domain name collection method, step S101 further includes: and judging whether the initial domain name information is valid, and if not, discarding the initial domain name data.

Detecting whether the initial domain name information survives or not through a domain name survival test, if so, considering the initial domain name information as valid, and continuing to execute the next step; otherwise, the initial domain name data is discarded.

In the embodiment, the initial domain name information is filtered, and invalid initial domain name information is timely discarded, so that the running time of a program is saved.
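The validity check may be illustrated as follows. The application only states that a domain name survival test is used; this sketch assumes, as one possible reading, that a name is valid when it is syntactically well formed and resolves to an address. All function names are hypothetical, and the resolver is passed in as a parameter.

```python
import re
from typing import Callable, Optional

# One DNS label: 1-63 letters, digits or hyphens, not starting or ending with "-".
LABEL_RE = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$", re.IGNORECASE)


def is_well_formed(domain: str) -> bool:
    """Syntactic check: at least two valid labels, at most 253 characters."""
    name = domain[:-1] if domain.endswith(".") else domain
    if not name or len(name) > 253:
        return False
    labels = name.split(".")
    return len(labels) >= 2 and all(LABEL_RE.match(label) for label in labels)


def is_valid_initial_domain(
    domain: str, resolve: Callable[[str], Optional[str]]
) -> bool:
    """Survival test (assumed): well formed and resolvable."""
    return is_well_formed(domain) and resolve(domain) is not None


# Usage with a fake resolver; invalid names would be discarded before step S102.
fake_resolve = {"example.com": "192.0.2.1"}.get
candidates = ["example.com", "gone.example.com", "-bad-.example.com"]
valid = [d for d in candidates if is_valid_initial_domain(d, fake_resolve)]
```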

Referring to fig. 4, as a further embodiment of the domain name collection method, step S102 further includes:

Step S1021: judging whether a universal domain name exists among the related domain names, and if so, executing step S1022.

A universal domain name (that is, a wildcard domain name) means that any prefix added under the same root domain name can be resolved, and all such names resolve to the same IP address. Therefore, whether a universal domain name exists can be tested by resolving an arbitrarily chosen secondary domain name, or by resolving the secondary domain name written with the wildcard symbol (asterisk).

Step S1022: judging whether the number of the universal domain names is larger than a preset number, if so, executing step S1023; if not, step S1024 is performed.

After universal domain names are resolved, the IP addresses corresponding to the universal domain names under the same root domain name are the same, so only a preset number of universal domain names needs to be reserved; for example, the preset number can be two, three, four or more than four.

It should be noted that the number of universal domain names is judged separately for each root domain name; that is, at most the preset number of universal domain names is reserved for each root domain name.

Step S1023: randomly selecting a preset number of domain names to reserve.

Step S1024: reserving all the universal domain names.

Step S1025: whether an invalid domain name exists in the related domain names is judged, and if so, step S1026 is performed.

Step S1026: the invalid domain name is discarded.

Where invalid domain names refer to non-surviving domain names.

In the above embodiment, the universal domain names among the related domain names are selectively reserved, and the invalid domain names among the related domain names are discarded, so that, on the one hand, the amount of data stored in the data storage area and the amount of computation in the subsequent steps are reduced, and, on the other hand, valuable information can be screened out for the user.
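The screening of steps S1021 to S1026 can be sketched as below, under several assumptions that are not taken from the application: related domain names are already grouped by root domain name, a universal domain name is detected by resolving a random label under the root and comparing IP addresses, and invalid names are dropped in the same pass rather than in a separate later step. The function names are hypothetical.

```python
import random
import uuid
from typing import Callable, Dict, List, Optional

Resolver = Callable[[str], Optional[str]]


def wildcard_ip(root: str, resolve: Resolver) -> Optional[str]:
    """Resolve a random label under `root`; an address suggests a wildcard record."""
    return resolve(f"{uuid.uuid4().hex}.{root}")


def screen_related(
    domains_by_root: Dict[str, List[str]],
    resolve: Resolver,
    preset: int = 3,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Keep at most `preset` universal domain names per root; drop dead names."""
    rng = rng or random.Random()
    kept: List[str] = []
    for root, domains in domains_by_root.items():
        wild = wildcard_ip(root, resolve)                # S1021
        universal, ordinary = [], []
        for domain in domains:
            ip = resolve(domain)
            if ip is None:
                continue                                 # S1025-S1026: discard
            if wild is not None and ip == wild:
                universal.append(domain)
            else:
                ordinary.append(domain)
        if len(universal) > preset:                      # S1022
            universal = rng.sample(universal, preset)    # S1023
        kept.extend(ordinary + universal)                # S1024: otherwise keep all
    return kept


def fake_resolve(name: str) -> Optional[str]:
    """Fake DNS: wild.example has a wildcard record, plain.example does not."""
    table = {"www.wild.example": "192.0.2.1", "a.plain.example": "192.0.2.2"}
    if name in table:
        return table[name]
    return "192.0.2.50" if name.endswith(".wild.example") else None


grouped = {
    "wild.example": ["www.wild.example", "x1.wild.example",
                     "x2.wild.example", "x3.wild.example"],
    "plain.example": ["a.plain.example", "dead.plain.example"],
}
screened = screen_related(grouped, fake_resolve, preset=2, rng=random.Random(0))
```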

The implementation principle of the domain name collection method in the embodiment of the application is as follows: after the initial domain name information is expanded, judging whether the obtained web page corresponding to the related domain name is an open web page, if the web page is an open web page, storing page response information, if the web page is an unopened web page, storing the corresponding relation between the related domain name and the IP address, and if the IP address has an open port, acquiring protocol response information and storing the protocol response information, thereby realizing that domain names related to the initial domain name information are comprehensively collected, and discarding and missing the domain names are not easy to occur due to the fact that the web page corresponding to the domain name is unopened.

The embodiment of the application also discloses a domain name collection device.

Referring to fig. 5, a domain name collecting apparatus includes a data receiving unit, a domain name expansion unit, a domain name processing unit, and a storage unit;

a data receiving unit 1, configured to receive initial domain name information in real time;

a domain name expansion unit 2, configured to expand the initial domain name information to obtain one or more related domain names;

the domain name processing unit 3 is used for judging whether the web pages corresponding to the related domain names are open web pages one by one, and if the web pages are open web pages, acquiring page response information of the open web pages; if the web page is not opened, resolving the related domain name to obtain an IP address, and binding the related domain name and the IP address;

and the storage unit 4 is used for storing the page response information, the bound related domain name and the bound IP address.

In the above embodiment, the data receiving unit receives the initial domain name information, the domain name expansion unit expands the domain name, and the domain name processing unit judges whether the web page corresponding to each related domain name obtained by expansion is an open web page. If the web page is open, the page response information is stored; if the web page is not open, the binding relation between the related domain name and the IP address is stored. The related domain names corresponding to both open and unopened web pages can therefore be stored, omissions are less likely to occur, and the query of the related domain names of the initial domain name information is more comprehensive.

The domain name collecting device provided by the embodiment of the application can realize any one of the collecting methods, and the specific working process of the domain name collecting device can refer to the corresponding process in the collecting method embodiment.

The embodiment of the application also discloses computer equipment.

A computer device includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor performing a domain name collection method as described above.

The embodiment of the application also discloses a computer readable storage medium.

A computer readable storage medium includes a computer program stored with instructions capable of being loaded by a processor and performing any of the domain name collection methods described above.

In several embodiments provided by the present application, it should be understood that the provided methods and apparatus may be implemented in other ways. For example, the device embodiments described above are merely illustrative; for example, the division of a module is merely a logical function division, and there may be another division manner when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted or not performed.

The foregoing description of the preferred embodiments of the application is not intended to limit the scope of the application in any way. Any feature disclosed in this specification (including the abstract and drawings) may be replaced by alternative features serving the same or an equivalent purpose, unless expressly stated otherwise. That is, unless expressly stated otherwise, each feature is only one example of a generic series of equivalent or similar features.

Claims

1. A method for collecting domain names, comprising:

receiving initial domain name information in real time;

2. The method for collecting domain names according to claim 1, wherein after storing the page response information in the data storage area, the method further comprises:

judging whether the page response information contains page domain name information, if so, setting the page domain name information as the initial domain name information.

3. The method for collecting domain names according to claim 1, wherein the step of binding the related domain name with the IP address, and storing the binding relationship between the related domain name and the IP address in the data storage area specifically comprises:

judging whether an open port exists in the IP address; if yes, binding the related domain name with the port of the IP address, and storing the binding relation between the related domain name and the port of the IP address into the data storage area; if not, binding the related domain name with the IP address, and storing the binding relation between the related domain name and the IP address into the data storage area.

4. A method for collecting domain names according to claim 3, wherein after the port binding of the related domain name and the IP address and storing the binding relationship between the related domain name and the IP address port in the data storage area, the method further comprises:

acquiring protocol response information of the IP address port;

judging whether the protocol domain name information exists in the protocol response information, and if so, setting the protocol domain name information as the initial domain name information.

5. The method for collecting domain names according to claim 1, wherein the expanding the initial domain name information to obtain one or more related domain names specifically includes:

6. The method for collecting domain names according to claim 1, wherein after receiving the initial domain name information in real time, the method further comprises:

judging whether the initial domain name information is valid or not, and discarding the initial domain name data if not.

7. The method for collecting domain names according to claim 1, wherein the expanding the initial domain name information to obtain one or more related domain names further comprises:

8. A domain name collection device, characterized in that: the system comprises a data receiving unit, a domain name expanding unit, a domain name processing unit and a storage unit;

the data receiving unit (1) is used for receiving initial domain name information in real time;

the domain name expansion unit (2) is used for expanding the initial domain name information to obtain one or more related domain names;

the domain name processing unit (3) is used for judging whether the web pages corresponding to the related domain names are open web pages one by one, and if the web pages are open web pages, acquiring page response information of the open web pages; if the web page is not opened, resolving the related domain name to obtain an IP address, and binding the related domain name and the IP address;

the storage unit (4) is used for storing page response information, and the bound related domain name and IP address.

9. A computer device, characterized by: comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor performing a domain name collection method according to any of claims 1-7.

10. A computer-readable storage medium, characterized by: comprising a computer program stored with instructions capable of being loaded by a processor and executing the method according to any of claims 1 to 7.