CN118138280A - Website cloning method, device, electronic equipment and storage medium - Google Patents

Publication number
CN118138280A
Authority
CN
China
Prior art keywords
cloning
resource
website
analysis information
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410103154.7A
Other languages
Chinese (zh)
Inventor
孙兆兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qax Technology Group Inc
Secworld Information Technology Beijing Co Ltd
Original Assignee
Qax Technology Group Inc
Secworld Information Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qax Technology Group Inc and Secworld Information Technology Beijing Co Ltd
Priority to CN202410103154.7A
Publication of CN118138280A
Legal status: Pending

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a website cloning method, a device, electronic equipment and a storage medium, and relates to the technical field of the Internet. The method comprises the following steps: acquiring network traffic between terminal equipment and a server, wherein the network traffic comprises at least one piece of request information for a target website, sent by the terminal equipment to the server through a browser, and the response information corresponding to each piece of request information, sent by the server to the browser; for each piece of request information, analyzing the request information to obtain first analysis information corresponding to the request information; analyzing the response information corresponding to the request information to obtain second analysis information; and cloning the target website based on each piece of first analysis information and each piece of second analysis information to obtain a cloned website. Because the invention acquires the network traffic of the target website and clones the target website by analyzing that traffic, it can capture all of the network traffic of the target website, so that the cloned website is more complete.

Description

Website cloning method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of internet technologies, and in particular, to a method and apparatus for cloning a website, an electronic device, and a storage medium.
Background
Website cloning refers to creating a website similar to an original website by copying and reconstructing the structure and content of the existing website; the cloned website generally has an appearance, layout and functions similar to those of the original website.
In the related art, websites are usually cloned by active scanning, in which the resources and paths of the target website are traversed and scanned one by one.
However, for a website that requires user authorization, active scanning cannot obtain all of the resources that require authorization, so the cloned website is incomplete.
Disclosure of Invention
Aiming at the problems existing in the prior art, the embodiment of the invention provides a website cloning method, a website cloning device, electronic equipment and a storage medium.
The invention provides a website cloning method, which comprises the following steps:
Acquiring network traffic between terminal equipment and a server; the network traffic comprises at least one piece of request information for a target website, sent by the terminal equipment to the server through a browser, and the response information corresponding to each piece of request information, sent by the server to the browser;
For each piece of request information, analyzing the request information to obtain first analysis information corresponding to the request information;
Analyzing the response information corresponding to the request information to obtain second analysis information;
Cloning the target website based on each piece of first analysis information and each piece of second analysis information to obtain a cloned website.
According to the website cloning method provided by the invention, the target website is cloned based on the first analysis information and the second analysis information to obtain a cloned website, and the method comprises the following steps:
determining at least one resource type of the resources included in each piece of second analysis information;
Cloning the resources corresponding to the resource types included in the resources based on the first analysis information corresponding to the second analysis information to obtain cloning information corresponding to the resource types;
and determining the cloning website based on the cloning information corresponding to each piece of second analysis information.
According to the website cloning method provided by the invention, the cloning of the resources corresponding to the resource types included in the resources based on the first analysis information corresponding to the second analysis information to obtain the cloning information corresponding to the resource types includes:
Determining the link type of a first Uniform Resource Locator (URL) of a resource corresponding to the hypertext resource type under the condition that the resource type is the hypertext resource type;
Under the condition that the link type is an in-station link type, replacing a domain name in a first URL with a domain name of a clone server, and replacing a path in the first URL with a storage path of a resource corresponding to the hypertext resource type in the clone server to obtain a target URL;
associating the target URL with a resource corresponding to the hypertext resource type;
Constructing a directory based on the second URL included in the first analysis information corresponding to the second analysis information, and storing, based on the directory, the resource corresponding to the hypertext resource type associated with the target URL, to obtain clone information corresponding to the hypertext resource type.
According to the website cloning method provided by the invention, the method further comprises the following steps:
Under the condition that the link type is an off-site link type, associating a first URL of a resource corresponding to the hypertext resource type with the resource corresponding to the hypertext resource type;
Constructing a directory based on the second URL included in the first analysis information corresponding to the second analysis information, and storing, based on the directory, the resource corresponding to the hypertext resource type, to obtain clone information corresponding to the hypertext resource type.
According to the website cloning method provided by the invention, the cloning of the resources corresponding to the resource types included in the resources based on the first analysis information corresponding to the second analysis information to obtain the cloning information corresponding to the resource types includes:
Under the condition that the resource type is a static file resource type, constructing a directory based on the second URL (uniform resource locator) included in the first analysis information corresponding to the second analysis information, and storing, based on the directory, the resource corresponding to the static file resource type, to obtain clone information corresponding to the static file resource type.
According to the website cloning method provided by the invention, the cloning of the resources corresponding to the resource types included in the resources based on the first analysis information corresponding to the second analysis information to obtain the cloning information corresponding to the resource types includes:
Storing the second analysis information and the first analysis information corresponding to the second analysis information into a database under the condition that the resource type is a dynamic file resource type, so as to obtain clone information corresponding to the dynamic file resource type; and the second analysis information comprises resources corresponding to the dynamic file resource types.
According to the method for cloning the website provided by the invention, the method for obtaining the network traffic between the terminal equipment and the server comprises the following steps:
Receiving the network traffic sent by a proxy server; the proxy server is arranged between the terminal equipment and the server and is used for intercepting network traffic matched with the preset website domain name of the target website between the terminal equipment and the server.
The invention also provides a website cloning device, which comprises:
An acquisition unit for acquiring network traffic between the terminal equipment and the server; the network traffic comprises at least one piece of request information for a target website, sent by the terminal equipment to the server through a browser, and the response information corresponding to each piece of request information, sent by the server to the browser;
the first analysis unit is used for analyzing the request information aiming at each piece of request information to obtain first analysis information corresponding to the request information;
the second analysis unit is used for analyzing the response information corresponding to the request information to obtain second analysis information;
a cloning unit for cloning the target website based on each piece of first analysis information and each piece of second analysis information to obtain a cloned website.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the website cloning method as described in any one of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a website cloning method as described in any of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements a method of cloning a website as described in any of the above.
According to the website cloning method, the device, the electronic equipment and the storage medium provided by the invention, network traffic between the terminal equipment and the server is acquired, the network traffic comprising each piece of request information and the response information corresponding to it; each piece of request information and each piece of response information is analyzed; and the target website is cloned based on the first analysis information and the second analysis information obtained through the analysis, so that a cloned website is obtained. Because the invention acquires the network traffic of the target website and clones the target website by analyzing that traffic, it can capture all of the network traffic of the target website, so that the cloned website is more complete.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a website cloning method according to an embodiment of the present invention;
FIG. 2 is a second flowchart of a website cloning method according to an embodiment of the present invention;
FIG. 3 is a third flowchart of a website cloning method according to an embodiment of the present invention;
FIG. 4 is a fourth flowchart of a website cloning method according to an embodiment of the present invention;
FIG. 5 is a fifth flowchart of a website cloning method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a website cloning apparatus according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of an entity structure of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The web site cloning method of the present invention is described below with reference to fig. 1 to 5. The main execution body of the website cloning method may be an electronic device such as a cloning server, or may be a website cloning device provided in the electronic device, and the website cloning device may be implemented by software, hardware, or a combination of both.
Fig. 1 is a schematic flow chart of a website cloning method according to an embodiment of the present invention, as shown in fig. 1, the website cloning method includes the following steps:
Step 101, obtaining network traffic between terminal equipment and a server; the network traffic comprises at least one piece of request information for a target website, sent by the terminal equipment to the server through a browser, and the response information corresponding to each piece of request information, sent by the server to the browser.
For example, a clone server may be set up as a proxy server between the terminal device and the server, and the clone server intercepts the network traffic between them, thereby obtaining that traffic. The network traffic comprises at least one piece of request information for the target website, sent by the terminal device to the server through the browser, and the response information corresponding to each piece of request information, sent by the server to the browser. For example, the target website is a personal site on the CSDN website, blog.csdn.net, which hosts public blogs; private articles in a personal blog (i.e., articles visible only to their owner) can be accessed only after logging in to an account, so the resources of the personal site cannot be obtained by active scanning. The website domain name of the target website to be cloned is set on the clone server, which then waits for the browser to access the target website and intercepts the network traffic that matches that domain name. For example, when a user accesses the target website "blog.csdn.net" and logs in to a personal account, the clone server intercepts all communication between the browser and the server, thereby obtaining all network traffic matching the website domain name of the target website. Intercepting network traffic through the clone server may be called passive scanning. For websites that do not require user authorization, or websites that can be accessed only under other specific conditions, for example websites whose content is loaded by dynamic scripts, all network traffic can likewise be intercepted through the proxy server, so that the website resources can be cloned.
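The domain-matching part of this interception step can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Exchange` structure and the function names are hypothetical, the proxy capture itself is assumed to have already happened, and treating subdomains of the preset domain as matches is an assumption the text does not state.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Exchange:
    """One captured request/response pair (hypothetical structure)."""
    request_url: str
    response_status: int
    response_body: bytes


def matches_target(url: str, target_domain: str) -> bool:
    """Return True if the URL's host is the target domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    target = target_domain.lower()
    # Assumption: subdomains count as part of the target website.
    return host == target or host.endswith("." + target)


def intercept(exchanges, target_domain):
    """Keep only the exchanges whose request URL matches the preset domain name."""
    return [e for e in exchanges if matches_target(e.request_url, target_domain)]
```

Comparing the parsed host rather than searching the raw URL string avoids matching a lookalike host that merely contains the target domain as a substring.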
Step 102, for each piece of request information, analyzing the request information to obtain first analysis information corresponding to the request information.
The request information may be hypertext transfer protocol (Hypertext Transfer Protocol, HTTP) request information, or the like.
For example, when the network traffic between the terminal device and the server is obtained, each piece of request information in the network traffic is analyzed to obtain the first analysis information contained in it, where the first analysis information may include information such as a uniform resource locator (Uniform Resource Locator, URL), request parameters, a Cookie, and request headers.
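As a rough sketch of what extracting the first analysis information could look like, the function below parses a raw HTTP/1.x request head into the fields listed above. The function name and the returned dictionary layout are hypothetical, and reconstructing the URL with an `http://` scheme from the `Host` header is an assumption made only for illustration.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit


def parse_request(raw: str) -> dict:
    """Parse a raw HTTP/1.x request head into URL, parameters, cookies and headers."""
    head, _, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, target, _version = lines[0].split(" ", 2)

    # Header names are case-insensitive, so store them in lower case.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    cookie = SimpleCookie()
    cookie.load(headers.get("cookie", ""))

    parts = urlsplit(target)
    return {
        "method": method,
        # Assumption: plain HTTP and an origin-form request target such as /a?b=1.
        "url": f"http://{headers.get('host', '')}{target}",
        "path": parts.path,
        "params": parse_qs(parts.query),
        "cookies": {key: morsel.value for key, morsel in cookie.items()},
        "headers": headers,
    }
```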
Step 103, analyzing the response information corresponding to the request information to obtain second analysis information.
For example, when the network traffic between the terminal device and the server is obtained, each piece of response information in the network traffic is analyzed to obtain the second analysis information contained in it. The second analysis information may include a status code and a response body, and the response body contains a resource. For example, the resource may be a hypertext markup language (HyperText Markup Language, HTML) resource, an image resource, a cascading style sheets (Cascading Style Sheets, CSS) file resource, or a JavaScript script resource. A JavaScript script resource is a dynamic script resource; dynamic script loading is a website technique in which JavaScript is used to load and display content dynamically while the web page is loading.
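A comparable sketch for the response side is shown below. It splits a raw HTTP/1.x response into status code, headers and body; the function name and return layout are hypothetical, and the sketch assumes the body is neither chunked nor compressed, which real traffic often is.

```python
def parse_response(raw: bytes) -> dict:
    """Split a raw HTTP/1.x response into status code, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: HTTP-version SP status-code SP optional reason phrase.
    _version, status, *_reason = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # Assumption: the body is stored as received, without decoding
    # Transfer-Encoding or Content-Encoding.
    return {"status": int(status), "headers": headers, "body": body}
```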
Step 104, cloning the target website based on each piece of first analysis information and each piece of second analysis information to obtain a cloned website.
For example, when each piece of first analysis information and each piece of second analysis information corresponding to the target website is obtained, each piece of second analysis information corresponds to one webpage. For each webpage of the target website, the first analysis information and the second analysis information corresponding to that webpage are processed to clone the webpage; the webpage corresponding to the next piece of second analysis information is then cloned, and the process repeats until the target website has been cloned and a cloned website is obtained. The cloned website has the same appearance, layout and functions as the target website.
According to the website cloning method provided by the invention, network traffic between the terminal equipment and the server is acquired, the network traffic comprising each piece of request information and the response information corresponding to it; each piece of request information and each piece of response information is analyzed; and the target website is cloned based on each piece of first analysis information and each piece of second analysis information obtained through the analysis, so that a cloned website is obtained. Because the invention acquires the network traffic of the target website and clones the target website by analyzing that traffic, it can capture all of the network traffic of the target website, so that the cloned website is more complete.
Fig. 2 is a second flow chart of a website cloning method according to an embodiment of the present invention, as shown in fig. 2, in step 104, the target website is cloned based on each piece of the first analysis information and each piece of the second analysis information, so as to obtain a cloned website, which may be implemented specifically by the following steps:
Step 1041, determining at least one resource type of the resources included in each piece of the second parsing information.
Each piece of second analysis information includes resources and the resource types corresponding to those resources. The resources included in the second analysis information include at least one of the following: hypertext resources, static file resources, and dynamic file resources. Static file resources are file resources written in advance on a server, typically without complex logic processing or dynamically generated data. The URL of a static file resource is typically fixed and is the same for all users. Dynamic file resources are file resources generated dynamically based on user requests or server data. The URL of a dynamic file resource is typically generated dynamically, so the URL obtained by each user may differ.
The resource type corresponding to a hypertext resource is the hypertext resource type, for example HTML text. The resource type corresponding to a static file resource is the static file resource type, for example CSS files that define web page styles, JavaScript files that implement the interactive effects of web pages, and picture files such as logo.png or background.jpg. The resource type corresponding to a dynamic file resource is the dynamic file resource type, for example: a page containing personalized user information that the server generates from the user's information after the user logs in; a search results page that the server generates dynamically from the query conditions entered by the user; or a news page or a merchandise list page generated dynamically from the latest data in the database. When the second analysis information is obtained, the at least one resource type of the resources it includes can be obtained.
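The text names the three resource types but does not state how a captured resource is assigned to one of them. The sketch below shows one possible heuristic, offered purely as an assumption: file suffix for static files, a `text/html` content type without a query string for hypertext, and everything else treated as dynamic. The enum, the suffix list and the function name are all hypothetical.

```python
from enum import Enum
from pathlib import PurePosixPath
from urllib.parse import urlsplit


class ResourceType(Enum):
    HYPERTEXT = "hypertext"
    STATIC = "static"
    DYNAMIC = "dynamic"


# Assumed list of suffixes that indicate pre-written static files.
STATIC_SUFFIXES = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico"}


def classify(url: str, content_type: str) -> ResourceType:
    """Guess the resource type from the request URL and the response Content-Type."""
    parts = urlsplit(url)
    suffix = PurePosixPath(parts.path).suffix.lower()
    media_type = content_type.split(";")[0].strip().lower()

    # Heuristic (assumption): a query string suggests per-request generation.
    if suffix in STATIC_SUFFIXES and not parts.query:
        return ResourceType.STATIC
    if media_type == "text/html" and not parts.query:
        return ResourceType.HYPERTEXT
    return ResourceType.DYNAMIC
```

A heuristic of this kind cannot reliably tell a personalized HTML page from a fixed one, so a real classifier would likely need additional signals.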
Step 1042, cloning the resources corresponding to the resource types included in the resources based on the first analysis information corresponding to the second analysis information, so as to obtain cloning information corresponding to the resource types.
For example, the resource corresponding to each resource type may be cloned based on the URL in the first analysis information corresponding to the second analysis information, so as to obtain the clone information corresponding to each resource type.
Step 1043, determining the cloning website based on the cloning information corresponding to each piece of the second analysis information.
When the clone information corresponding to each piece of second analysis information is obtained, the clone information corresponding to each piece of second analysis information is stored in the clone server to obtain the clone website, so that when a user accesses the specific content of the clone website, the user can directly access the clone server, the clone server sends the corresponding information as a response to the terminal equipment, and the terminal equipment renders and displays the received response.
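How the clone server might answer a request from its stored clone information can be sketched as a simple lookup under a storage root. The function name, the `index.html` default and the choice to store clone information as files under one root directory are assumptions for illustration only.

```python
from pathlib import Path
from typing import Optional


def lookup(root: Path, request_path: str) -> Optional[bytes]:
    """Return the stored bytes for a request path, or None if nothing is stored."""
    # Assumption: the site root maps to a default document named index.html.
    relative = request_path.lstrip("/") or "index.html"
    candidate = (root / relative).resolve()

    # Refuse paths that resolve outside the storage root (e.g. "/../secret").
    if root.resolve() not in candidate.parents or not candidate.is_file():
        return None
    return candidate.read_bytes()
```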
In this embodiment, the resources corresponding to each resource type are cloned based on the first analysis information corresponding to the second analysis information to obtain the clone information corresponding to each resource type, from which the cloned website is obtained, so that the target website is cloned rapidly.
In an embodiment, fig. 3 is a third flowchart of the website cloning method according to the embodiment of the present invention, as shown in fig. 3, where step 1042 clones the resources corresponding to the resource types included in the resources based on the first analysis information corresponding to the second analysis information to obtain clone information corresponding to the resource types, and specifically may be implemented by the following steps:
Step 10421, determining a link type of a first uniform resource locator URL of a resource corresponding to the hypertext resource type if the resource type is the hypertext resource type.
For example, in the case that the resource type is the hypertext resource type, the link types of all first URLs of the resource corresponding to the hypertext resource type are determined. The link type may be determined based on the domain name in the first URL: if the domain name is the website domain name of the target website, the link type of the first URL is determined to be the in-station link type; otherwise, the link type of the first URL is determined to be the off-site link type. An in-station (in-site) link is a link between different pages within the same website and typically uses a relative path to represent the link target; an off-site link is a link that jumps from one website to another website or an external resource and typically uses an absolute path to represent the link target.
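The domain-based decision can be sketched as below. Because a relative link carries no domain name of its own, the sketch first resolves it against the URL of the page that contains it; that resolution step, the function name and the exact-match comparison are assumptions rather than details given in the text.

```python
from urllib.parse import urljoin, urlsplit


def link_type(first_url: str, page_url: str, target_domain: str) -> str:
    """Classify a link found in a page as 'in-station' or 'off-site'."""
    # Resolve relative links against the containing page so they gain a host.
    absolute = urljoin(page_url, first_url)
    host = (urlsplit(absolute).hostname or "").lower()
    return "in-station" if host == target_domain.lower() else "off-site"
```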
Step 10422, under the condition that the link type is an in-station link type, replacing the domain name in the first URL with the domain name of the clone server, and replacing the path in the first URL with the storage path of the resource corresponding to the hypertext resource type in the clone server, thereby obtaining the target URL.
For example, in the case that the link type is the in-station link type, it is necessary to replace the domain name in the first URL with the domain name of the clone server, and replace the path in the first URL with the storage path of the resource corresponding to the hypertext resource type in the clone server, to obtain the modified target URL. For example, when the user accesses the resource of the server 1 through the browser and needs to clone the resource of the server 1 in the clone server, the domain name in the first URL is the domain name of the server 1, the domain name of the server 1 in the first URL needs to be replaced by the domain name of the clone server, and the storage path of the resource corresponding to the hypertext resource type in the first URL in the server 1 is replaced by the storage path of the resource corresponding to the hypertext resource type in the clone server.
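The replacement described above amounts to swapping two URL components while keeping the rest. A minimal sketch follows; the function name, the example clone domain `clone.example` and the example storage path are hypothetical, and keeping the original scheme, query and fragment is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def to_target_url(first_url: str, clone_domain: str, storage_path: str) -> str:
    """Replace the domain and path of an in-station URL with the clone server's."""
    parts = urlsplit(first_url)
    # Assumption: scheme, query string and fragment are carried over unchanged.
    return urlunsplit(
        (parts.scheme, clone_domain, storage_path, parts.query, parts.fragment)
    )
```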
Step 10423, associating the target URL with the resource corresponding to the hypertext resource type.
Illustratively, when the modified target URL is obtained, the target URL is associated with the resource corresponding to the hypertext resource type, so that the resource corresponding to the hypertext resource type can be conveniently obtained based on the target URL.
Step 10424, constructing a directory based on the second URL included in the first analysis information corresponding to the second analysis information, and storing, based on the directory, the resource corresponding to the hypertext resource type associated with the target URL, to obtain clone information corresponding to the hypertext resource type.
For example, the first analysis information corresponding to the second analysis information includes a second URL; the clone server constructs a directory based on the path in the second URL and stores the resource corresponding to the hypertext resource type associated with the target URL in that directory, obtaining the clone information corresponding to the hypertext resource type. Thus, when the clone server receives a resource request for the path in the second URL, sent by a user through the browser, the clone server can retrieve the resource corresponding to the hypertext resource type from the directory and send it as a response to the terminal device, which renders and displays the received response.
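Constructing a directory from the path of the second URL and storing the resource under it can be sketched as follows. The function name, the storage root argument and the `index.html` default for directory-style paths are assumptions; the sketch also rejects `..` segments so a crafted path cannot escape the root.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlsplit


def store_resource(root: Path, second_url: str, body: bytes) -> Path:
    """Mirror the URL path under root, create the directories, and write the body."""
    path = unquote(urlsplit(second_url).path)
    if path == "" or path.endswith("/"):
        path += "index.html"  # assumed default document name

    relative = PurePosixPath(path.lstrip("/"))
    if ".." in relative.parts:
        raise ValueError("path escapes the clone root")

    destination = root.joinpath(*relative.parts)
    destination.parent.mkdir(parents=True, exist_ok=True)
    destination.write_bytes(body)
    return destination
```

The query string is deliberately ignored here, which matches the later remark that static resources do not change with the request parameters but would not be sufficient for dynamic resources.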
In this embodiment, when the resource type is a hypertext resource type and the link type of the first URL of the resource corresponding to the hypertext resource type is an in-station link type, the domain name and the path in the first URL are modified to obtain a target URL, the target URL is associated with the resource corresponding to the hypertext resource type, and the resource corresponding to the hypertext resource type associated with the target URL is stored in a directory constructed based on the path in the second URL, so that cloning of the resource corresponding to the hypertext resource type is achieved.
In one embodiment, the step 1042 further comprises the steps of:
Under the condition that the link type is the off-site link type, associating the first URL of the resource corresponding to the hypertext resource type with the resource corresponding to the hypertext resource type; constructing a directory based on the second URL included in the first analysis information corresponding to the second analysis information, and storing, based on the directory, the resource corresponding to the hypertext resource type, to obtain clone information corresponding to the hypertext resource type.
For example, in the case that the link type is the off-site link type, the resource behind the link is usually a shared resource rather than content of the website itself. When the user clicks an off-site link, the browser jumps to a new website, a process that has no direct relation to the target website, so off-site links do not affect the use of the target website. To reduce the complexity of cloning the website, a first URL of the off-site link type therefore does not need to be modified. In this case, the first URL of the resource corresponding to the hypertext resource type is associated directly with that resource; the clone server constructs a directory based on the path in the second URL included in the first analysis information corresponding to the second analysis information, and stores the resource corresponding to the hypertext resource type associated with the first URL under the directory, obtaining the clone information corresponding to the hypertext resource type.
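The two branches, rewriting in-station links and leaving off-site links untouched, can be illustrated on a whole HTML document. This is only a sketch: it uses a regular expression that handles double-quoted `href` and `src` attributes and nothing else, it assumes the storage path on the clone server mirrors the original path, and all names are hypothetical. A robust implementation would use a real HTML parser.

```python
import re
from urllib.parse import urljoin, urlsplit, urlunsplit

# Matches double-quoted href/src attribute values only (deliberately simplistic).
ATTR = re.compile(r'(?P<attr>\b(?:href|src))="(?P<url>[^"]*)"', re.IGNORECASE)


def rewrite_links(html: str, page_url: str, target_domain: str, clone_domain: str) -> str:
    """Point in-station links at the clone server; leave off-site links unchanged."""

    def replace(match: re.Match) -> str:
        absolute = urljoin(page_url, match.group("url"))
        parts = urlsplit(absolute)
        if (parts.hostname or "").lower() != target_domain.lower():
            return match.group(0)  # off-site link: keep the first URL as it is
        # Assumption: the storage path mirrors the original path.
        rewritten = urlunsplit(
            (parts.scheme, clone_domain, parts.path, parts.query, parts.fragment)
        )
        return f'{match.group("attr")}="{rewritten}"'

    return ATTR.sub(replace, html)
```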
In this embodiment, when the resource type is a hypertext resource type and the link type of the first URL of the resource corresponding to the hypertext resource type is an off-site link type, the first URL is directly associated with the resource corresponding to the hypertext resource type, and the resource corresponding to the hypertext resource type associated with the first URL is stored in a directory configured based on the path in the second URL, so that cloning of the resource corresponding to the hypertext resource type is achieved.
In an embodiment, fig. 4 is a flowchart of a website cloning method according to an embodiment of the present invention, as shown in fig. 4, where step 1042 clones resources corresponding to each resource type included in the resources based on the first analysis information corresponding to the second analysis information to obtain clone information corresponding to each resource type, and specifically may be implemented by the following steps:
In step 10425, when the resource type is a static file resource type, a directory is constructed based on a second URL included in the first resolution information corresponding to the second resolution information, and a resource corresponding to the static file resource type is stored based on the directory, so as to obtain clone information corresponding to the static file resource type.
In an example, when the resource type is a static file resource type, the cloning server constructs a directory based on a path in the second URL included in the first parsing information corresponding to the second parsing information, and because the resource corresponding to the static file resource type is not changed according to the change of the request parameter, only the resource corresponding to the static file resource type needs to be stored under the directory, and the content in the first parsing information corresponding to the second parsing information does not need to be stored under the directory, thereby realizing cloning of the resource corresponding to the static file resource type and obtaining cloning information corresponding to the static file resource type.
It should be noted that, when the resources included in the second analysis information cover both the hypertext resource type and the static file resource type, the resources corresponding to the hypertext resource type are cloned through steps 10421 to 10424 and the resources corresponding to the static file resource type are cloned through step 10425, so that the clone information for the second analysis information includes both the clone information corresponding to the hypertext resource type and the clone information corresponding to the static file resource type. In this way, when the clone server receives a resource request for the path in the second URL sent by the user through the browser, the clone server can obtain both the resource corresponding to the hypertext resource type and the resource corresponding to the static file resource type from the directory and send all the obtained resources to the terminal device as a response; the terminal device then renders and displays the received response.
In this embodiment, in the case that the resource type is the static file resource type, the resource corresponding to the static file resource type is stored under the directory constructed based on the path in the second URL, so that cloning of the resource corresponding to the static file resource type is achieved, and when the cloning server receives the resource request sent by the user through the browser and aiming at the path in the second URL, the cloning server can acquire the resource corresponding to the static file resource type based on the directory, and send the acquired resource as a response to the terminal device, and the terminal device renders the received response and displays the response.
In an embodiment, fig. 5 is a fifth flowchart of a website cloning method according to the embodiment of the present invention, as shown in fig. 5, where step 1042 clones resources corresponding to each resource type included in the resources based on the first analysis information corresponding to the second analysis information to obtain clone information corresponding to each resource type, and specifically may be implemented by the following steps:
Step 10426, storing the second analysis information and the first analysis information corresponding to the second analysis information into a database to obtain clone information corresponding to the dynamic file resource type when the resource type is the dynamic file resource type; and the second analysis information comprises resources corresponding to the dynamic file resource types.
For example, in the case that the resource type is a dynamic file resource type, the resource corresponding to the dynamic file resource type changes according to the request parameters, so the second analysis information and the first analysis information corresponding to the second analysis information need to be stored directly in the database of the clone server to obtain the clone information corresponding to the dynamic file resource type. Because the first analysis information corresponding to the second analysis information includes the second URL, when the clone server receives a resource request that is generated based on the information other than the domain name in the second URL and sent by the user through the browser, the clone server can acquire the resource corresponding to the dynamic file resource type from the database.
When the resource type is a dynamic file resource type and the first analysis information is stored in the database, the request parameters in the first analysis information, the information other than the domain name in the second URL, the cookie, and the like can be stored in the database. When the clone server then receives a resource request that is generated based on the information other than the domain name in the second URL and sent by the user through the browser, the clone server can match the information in the resource request against the request parameters, the information other than the domain name in the second URL, the cookie, and the like stored in the database, and, when the matching succeeds, return the resource corresponding to the dynamic file resource type to the browser.
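The database storage and matching described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the text does not specify a database engine or schema, so the in-memory SQLite database, the table name `dynamic_clone`, its columns, and the exact-match rule on path, parameters, and cookie are all hypothetical.

```python
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory database holding cloned dynamic resources (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dynamic_clone ("
        " path TEXT, params TEXT, cookie TEXT, body BLOB,"
        " PRIMARY KEY (path, params, cookie))"
    )
    return conn


def _key(params: dict) -> str:
    # Serialize parameters with sorted keys so parameter order does not affect matching.
    return json.dumps(params, sort_keys=True)


def save_dynamic(conn: sqlite3.Connection, path: str, params: dict,
                 cookie: str, body: bytes) -> None:
    """Store the response body together with the request information it depends on."""
    conn.execute(
        "INSERT OR REPLACE INTO dynamic_clone VALUES (?, ?, ?, ?)",
        (path, _key(params), cookie, body),
    )


def lookup_dynamic(conn: sqlite3.Connection, path: str, params: dict, cookie: str):
    """Return the stored body when path, parameters and cookie all match, else None."""
    row = conn.execute(
        "SELECT body FROM dynamic_clone WHERE path = ? AND params = ? AND cookie = ?",
        (path, _key(params), cookie),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = open_store()
    save_dynamic(conn, "/api/user", {"id": "7", "lang": "en"}, "sid=abc", b'{"name": "A"}')
    print(lookup_dynamic(conn, "/api/user", {"lang": "en", "id": "7"}, "sid=abc"))
    print(lookup_dynamic(conn, "/api/user", {"id": "8", "lang": "en"}, "sid=abc"))
```

A real deployment might relax the match (for example, ignoring volatile cookies); the source only states that the request information is matched against what was stored.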
It should be noted that, in the case that the resources included in the second analysis information are of a type other than the hypertext resource type, the static file resource type, and the dynamic file resource type, the present invention does not process them.
It should be noted that, when the resources included in the second analysis information cover the hypertext resource type, the static file resource type, and the dynamic file resource type, the resources corresponding to the hypertext resource type are cloned through steps 10421 to 10424, the resources corresponding to the static file resource type are cloned through step 10425, and the resources corresponding to the dynamic file resource type are cloned through step 10426, so that the clone information for the second analysis information includes the clone information corresponding to each of the three resource types. When the clone server receives a resource request for the path in the second URL sent by the user through the browser, the clone server can obtain the resources corresponding to the hypertext resource type and the static file resource type from the directory and the resource corresponding to the dynamic file resource type from the database, and send all the obtained resources to the terminal device as a response; the terminal device then renders and displays the received response.
It should be noted that resources corresponding to the hypertext resource type and the static file resource type are stored based on the directory, whereas resources corresponding to the dynamic file resource type are stored based on the database. This is because resources of the hypertext and static file resource types are relatively stable, have fixed attributes, and correspond one-to-one to URLs, while resources of the dynamic file resource type change with the request parameters and cannot correspond one-to-one to URLs, so the more flexible database storage mode needs to be adopted for them.
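The choice of storage mode per resource type can be summarized in a short sketch. The enum names and the function `choose_storage` are hypothetical and introduced only for illustration; how a resource type is actually determined is outside the scope of this sketch.

```python
from enum import Enum


class ResourceType(Enum):
    HYPERTEXT = "hypertext"
    STATIC_FILE = "static_file"
    DYNAMIC_FILE = "dynamic_file"
    OTHER = "other"


class Storage(Enum):
    DIRECTORY = "directory"   # one file per URL path
    DATABASE = "database"     # one record per (URL, parameters, cookie, ...)
    NONE = "none"             # other resource types are not processed


def choose_storage(resource_type: ResourceType) -> Storage:
    """Pick the storage mode for a resource type as described in the text."""
    # Hypertext and static files are stable and map one-to-one to URLs.
    if resource_type in (ResourceType.HYPERTEXT, ResourceType.STATIC_FILE):
        return Storage.DIRECTORY
    # Dynamic resources vary with request parameters, so they need keyed records.
    if resource_type is ResourceType.DYNAMIC_FILE:
        return Storage.DATABASE
    return Storage.NONE


if __name__ == "__main__":
    for rt in ResourceType:
        print(rt.value, "->", choose_storage(rt).value)
```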
In this embodiment, when the resource type is a dynamic file resource type, the second analysis information and the first analysis information corresponding to the second analysis information are stored directly in the database of the clone server, so that cloning of the resource corresponding to the dynamic file resource type is realized. When the clone server receives a resource request for the path in the second URL sent by the user through the browser, the clone server can acquire the resource corresponding to the dynamic file resource type from the database and send the acquired resource to the terminal device as a response; the terminal device then renders and displays the received response.
In an embodiment, the step 101 of obtaining the network traffic between the terminal device and the server may be specifically implemented by the following manner:
Receiving the network traffic sent by a proxy server; the proxy server is arranged between the terminal equipment and the server and is used for intercepting network traffic matched with the preset website domain name of the target website between the terminal equipment and the server.
For example, a proxy server may be set between the terminal device and the server; that is, the proxy of the browser of the terminal device is set to the proxy server of the present invention, and the proxy server is in communication connection with the clone server. The proxy server intercepts the network traffic between the terminal device and the server that matches the preconfigured website domain name of the target website and sends the intercepted network traffic to the clone server, so that the clone server obtains the network traffic between the terminal device and the server.
In this embodiment, network traffic between the terminal device and the server is intercepted by the proxy server, so that a passive scanning mode of cloning a website is implemented, and the passive scanning mode captures resources of a target website by intercepting and analyzing the network traffic. The method does not need to directly access the target website, but relies on intercepting the network traffic between the terminal equipment and the server to acquire the content of the target website, and the proxy server can acquire all the network traffic of the target website, so that the cloned website is more complete.
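The domain-name filtering performed by the proxy server can be sketched as below. This is not a working proxy; it only illustrates selecting captured request/response pairs whose host matches the preconfigured domain name. The `Exchange` type, the function names, and the decision to treat subdomains of the target as matches are assumptions not stated in the text.

```python
from dataclasses import dataclass
from typing import Iterable, List
from urllib.parse import urlsplit


@dataclass
class Exchange:
    """One captured request/response pair (hypothetical minimal shape)."""
    request_url: str
    response_body: bytes


def matches_target(url: str, target_domain: str) -> bool:
    """True when the URL's host is the target domain or, by assumption, one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    target = target_domain.lower()
    return host == target or host.endswith("." + target)


def select_target_traffic(exchanges: Iterable[Exchange], target_domain: str) -> List[Exchange]:
    """Keep only the exchanges that belong to the target website."""
    return [e for e in exchanges if matches_target(e.request_url, target_domain)]


if __name__ == "__main__":
    captured = [
        Exchange("https://www.target.example/index.html", b"<html></html>"),
        Exchange("https://cdn.other.example/lib.js", b"/* js */"),
    ]
    for e in select_target_traffic(captured, "target.example"):
        print(e.request_url)
```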
The website cloning device provided by the invention is described below, and the website cloning device described below and the website cloning method described above can be referred to correspondingly.
Fig. 6 is a schematic structural diagram of a website cloning apparatus according to an embodiment of the present invention, as shown in fig. 6, the website cloning apparatus 600 includes an obtaining unit 601, a first analyzing unit 602, a second analyzing unit 603, and a cloning unit 604; wherein:
An obtaining unit 601, configured to obtain network traffic between a terminal device and a server; the network traffic comprises at least one request message for a target website, which is sent to the server by the terminal equipment through a browser, and response messages corresponding to the request messages, which are sent to the browser by the server;
a first parsing unit 602, configured to parse, for each piece of request information, the request information to obtain first analysis information corresponding to the request information;
a second parsing unit 603, configured to parse the response information corresponding to the request information to obtain second analysis information;
and a cloning unit 604, configured to clone the target website based on each piece of the first analysis information and each piece of the second analysis information, so as to obtain a cloned website.
The website cloning device provided by the invention obtains the network traffic between the terminal device and the server, wherein the network traffic comprises the request information and the response information corresponding to the request information, analyzes the request information and the response information, and clones the target website based on the first analysis information and the second analysis information obtained by the analysis to obtain the cloned website. It can be seen that the invention obtains the network traffic of the target website and clones the target website by analyzing that traffic; because all the network traffic of the target website can be obtained, the cloned website is more complete.
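The cooperation of the four units can be pictured as a simple pipeline. The sketch below is a structural illustration only: the class name, the callable fields, and the toy stand-in functions in the usage example are hypothetical and do not reflect how the units are actually realized.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple


@dataclass
class WebsiteCloningDevice:
    """Wires together stand-ins for units 601 to 604 (illustrative only)."""
    obtain: Callable[[], Iterable[Tuple[Any, Any]]]   # obtaining unit 601: yields (request, response)
    parse_request: Callable[[Any], Any]               # first parsing unit 602
    parse_response: Callable[[Any], Any]              # second parsing unit 603
    clone: Callable[[List[Tuple[Any, Any]]], Any]     # cloning unit 604

    def run(self) -> Any:
        # Parse every request and its corresponding response, then clone from the parsed pairs.
        parsed = [
            (self.parse_request(request), self.parse_response(response))
            for request, response in self.obtain()
        ]
        return self.clone(parsed)


if __name__ == "__main__":
    device = WebsiteCloningDevice(
        obtain=lambda: [("GET /a", "<html>a</html>"), ("GET /b", "<html>b</html>")],
        parse_request=lambda raw: raw.split(" ", 1)[1],   # toy parser: keep only the path
        parse_response=lambda raw: raw,                   # toy parser: keep the body as-is
        clone=lambda pairs: dict(pairs),                  # toy clone: path -> body mapping
    )
    print(device.run())
```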
Based on any of the above embodiments, the cloning unit 604 is specifically configured to:
determining at least one resource type of the resources included in each piece of second analysis information;
Cloning the resources corresponding to the resource types included in the resources based on the first analysis information corresponding to the second analysis information to obtain cloning information corresponding to the resource types;
and determining the cloning website based on the cloning information corresponding to each piece of second analysis information.
Based on any of the above embodiments, the cloning unit 604 is further specifically configured to:
Determining the link type of a first Uniform Resource Locator (URL) of a resource corresponding to the hypertext resource type under the condition that the resource type is the hypertext resource type;
In the case that the link type is an in-site link type, replacing the domain name in the first URL with the domain name of a clone server, and replacing the path in the first URL with the storage path, in the clone server, of the resource corresponding to the hypertext resource type, to obtain a target URL;
associating the target URL with a resource corresponding to the hypertext resource type;
constructing a directory based on the second URL included in the first analysis information corresponding to the second analysis information, and storing, based on the directory, the resource corresponding to the hypertext resource type associated with the target URL, to obtain clone information corresponding to the hypertext resource type.
Based on any of the above embodiments, the cloning unit 604 is further specifically configured to:
Under the condition that the link type is an off-site link type, associating a first URL of a resource corresponding to the hypertext resource type with the resource corresponding to the hypertext resource type;
constructing a directory based on a second URL included in the first analysis information corresponding to the second analysis information, and storing the resource corresponding to the hypertext resource type based on the directory, to obtain clone information corresponding to the hypertext resource type.
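The handling of in-site and off-site links for a hypertext resource can be sketched as follows. The function names, the example domains, and the rule that a link counts as in-site only when its host equals the target domain exactly are assumptions for illustration; relative links carry no host and are left unchanged by this sketch.

```python
from urllib.parse import urlsplit, urlunsplit


def is_in_site(first_url: str, target_domain: str) -> bool:
    """Assumption: a link is in-site when its host equals the target website's domain."""
    host = (urlsplit(first_url).hostname or "").lower()
    return host == target_domain.lower()


def rewrite_link(first_url: str, target_domain: str,
                 clone_netloc: str, storage_path: str) -> str:
    """Return the target URL for an in-site link, or the first URL unchanged otherwise."""
    if not is_in_site(first_url, target_domain):
        # Off-site link: the first URL stays associated with the resource as-is.
        return first_url
    parts = urlsplit(first_url)
    # In-site link: swap in the clone server's domain and the resource's storage path.
    return urlunsplit((parts.scheme, clone_netloc, storage_path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(rewrite_link("https://target.example/news/1.html?x=1",
                       "target.example", "clone.local:8000", "/news/1.html"))
    print(rewrite_link("https://partner.example/ad.html",
                       "target.example", "clone.local:8000", "/ad.html"))
```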
Based on any of the above embodiments, the cloning unit 604 is further specifically configured to:
In the case that the resource type is a static file resource type, constructing a directory based on a second URL included in the first analysis information corresponding to the second analysis information, and storing the resource corresponding to the static file resource type based on the directory, to obtain clone information corresponding to the static file resource type.
Based on any of the above embodiments, the cloning unit 604 is further specifically configured to:
Storing the second analysis information and the first analysis information corresponding to the second analysis information into a database under the condition that the resource type is a dynamic file resource type, so as to obtain clone information corresponding to the dynamic file resource type; and the second analysis information comprises resources corresponding to the dynamic file resource types.
Based on any of the above embodiments, the obtaining unit 601 is specifically configured to:
Receiving the network traffic sent by a proxy server; the proxy server is arranged between the terminal equipment and the server and is used for intercepting network traffic matched with the preset website domain name of the target website between the terminal equipment and the server.
Fig. 7 is a schematic physical structure of an electronic device according to an embodiment of the present invention, as shown in fig. 7, where the electronic device may include: processor 710, communication interface (Communications Interface) 720, memory 730, and communication bus 740, wherein processor 710, communication interface 720, memory 730 communicate with each other via communication bus 740. Processor 710 may invoke logic instructions in memory 730 to perform a web site cloning method comprising: acquiring network traffic between terminal equipment and a server; the network traffic comprises at least one request message for a target website, which is sent to the server by the terminal equipment through a browser, and response messages corresponding to the request messages, which are sent to the browser by the server;
Analyzing the request information aiming at each piece of request information to obtain first analysis information corresponding to the request information;
Analyzing response information corresponding to the request information to obtain second analysis information;
And cloning the target website based on the first analysis information and the second analysis information to obtain a cloned website.
Further, the logic instructions in the memory 730 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program, the computer program being storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of performing the method of cloning a website provided by the methods described above, the method comprising: acquiring network traffic between terminal equipment and a server; the network traffic comprises at least one request message for a target website, which is sent to the server by the terminal equipment through a browser, and response messages corresponding to the request messages, which are sent to the browser by the server;
Analyzing the request information aiming at each piece of request information to obtain first analysis information corresponding to the request information;
Analyzing response information corresponding to the request information to obtain second analysis information;
And cloning the target website based on the first analysis information and the second analysis information to obtain a cloned website.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the website cloning method provided by the above methods, the method comprising: acquiring network traffic between terminal equipment and a server; the network traffic comprises at least one request message for a target website, which is sent to the server by the terminal equipment through a browser, and response messages corresponding to the request messages, which are sent to the browser by the server;
Analyzing the request information aiming at each piece of request information to obtain first analysis information corresponding to the request information;
Analyzing response information corresponding to the request information to obtain second analysis information;
And cloning the target website based on the first analysis information and the second analysis information to obtain a cloned website.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (11)

1. A method for cloning a web site, comprising:
Acquiring network traffic between terminal equipment and a server; the network traffic comprises at least one request message for a target website, which is sent to the server by the terminal equipment through a browser, and response messages corresponding to the request messages, which are sent to the browser by the server;
Analyzing the request information aiming at each piece of request information to obtain first analysis information corresponding to the request information;
Analyzing response information corresponding to the request information to obtain second analysis information;
And cloning the target website based on the first analysis information and the second analysis information to obtain a cloned website.
2. The method of cloning a website according to claim 1, wherein cloning the target website based on each of the first analysis information and each of the second analysis information to obtain a cloned website includes:
determining at least one resource type of the resources included in each piece of second analysis information;
Cloning the resources corresponding to the resource types included in the resources based on the first analysis information corresponding to the second analysis information to obtain cloning information corresponding to the resource types;
and determining the cloning website based on the cloning information corresponding to each piece of second analysis information.
3. The website cloning method according to claim 2, wherein cloning the resources corresponding to the resource types included in the resources based on the first analysis information corresponding to the second analysis information to obtain the cloning information corresponding to the resource types includes:
Determining the link type of a first Uniform Resource Locator (URL) of a resource corresponding to the hypertext resource type under the condition that the resource type is the hypertext resource type;
In the case that the link type is an in-site link type, replacing the domain name in the first URL with the domain name of a clone server, and replacing the path in the first URL with the storage path, in the clone server, of the resource corresponding to the hypertext resource type, to obtain a target URL;
associating the target URL with a resource corresponding to the hypertext resource type;
constructing a directory based on the second URL included in the first analysis information corresponding to the second analysis information, and storing, based on the directory, the resource corresponding to the hypertext resource type associated with the target URL, to obtain clone information corresponding to the hypertext resource type.
4. A method of cloning a web site according to claim 3, wherein the method further comprises:
Under the condition that the link type is an off-site link type, associating a first URL of a resource corresponding to the hypertext resource type with the resource corresponding to the hypertext resource type;
constructing a directory based on a second URL included in the first analysis information corresponding to the second analysis information, and storing the resource corresponding to the hypertext resource type based on the directory, to obtain clone information corresponding to the hypertext resource type.
5. The website cloning method according to claim 2, wherein cloning the resources corresponding to the resource types included in the resources based on the first analysis information corresponding to the second analysis information to obtain the cloning information corresponding to the resource types includes:
In the case that the resource type is a static file resource type, constructing a directory based on a second URL included in the first analysis information corresponding to the second analysis information, and storing the resource corresponding to the static file resource type based on the directory, to obtain clone information corresponding to the static file resource type.
6. The website cloning method according to claim 2, wherein cloning the resources corresponding to the resource types included in the resources based on the first analysis information corresponding to the second analysis information to obtain the cloning information corresponding to the resource types includes:
Storing the second analysis information and the first analysis information corresponding to the second analysis information into a database under the condition that the resource type is a dynamic file resource type, so as to obtain clone information corresponding to the dynamic file resource type; and the second analysis information comprises resources corresponding to the dynamic file resource types.
7. The method for cloning a website according to any one of claims 1 to 6, wherein the obtaining network traffic between the terminal device and the server comprises:
Receiving the network traffic sent by a proxy server; the proxy server is arranged between the terminal equipment and the server and is used for intercepting network traffic matched with the preset website domain name of the target website between the terminal equipment and the server.
8. A web site cloning apparatus, comprising:
An acquisition unit for acquiring network traffic between the terminal device and the server; the network traffic comprises at least one request message for a target website, which is sent to the server by the terminal equipment through a browser, and response messages corresponding to the request messages, which are sent to the browser by the server;
the first analysis unit is used for analyzing the request information aiming at each piece of request information to obtain first analysis information corresponding to the request information;
the second analysis unit is used for analyzing the response information corresponding to the request information to obtain second analysis information;
And the cloning unit is used for cloning the target website based on the first analysis information and the second analysis information to obtain a cloned website.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the website cloning method of any one of claims 1 to 7 when the program is executed by the processor.
10. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the website cloning method of any one of claims 1 to 7.
11. A computer program product comprising a computer program which, when executed by a processor, implements the website cloning method of any one of claims 1 to 7.
CN202410103154.7A 2024-01-24 2024-01-24 Website cloning method, device, electronic equipment and storage medium Pending CN118138280A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410103154.7A CN118138280A (en) 2024-01-24 2024-01-24 Website cloning method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410103154.7A CN118138280A (en) 2024-01-24 2024-01-24 Website cloning method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN118138280A true CN118138280A (en) 2024-06-04

Family

ID=91244794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410103154.7A Pending CN118138280A (en) 2024-01-24 2024-01-24 Website cloning method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN118138280A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination