CN106874300B - Webpage identification method and device and setting rate determination method and device - Google Patents


Info

Publication number
CN106874300B
Authority
CN
China
Prior art keywords
target
webpage
home page
target website
access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510924044.8A
Other languages
Chinese (zh)
Other versions
CN106874300A (en
Inventor
李新国
冯鸳鹤
吴茜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd filed Critical Beijing Gridsum Technology Co Ltd
Priority to CN201510924044.8A priority Critical patent/CN106874300B/en
Publication of CN106874300A publication Critical patent/CN106874300A/en
Application granted granted Critical
Publication of CN106874300B publication Critical patent/CN106874300B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/955: Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]

Abstract

The application discloses a webpage identification method and device and a setting rate determination method and device. The webpage identification method comprises the following steps: querying a target access log from the access logs of a target website, wherein the target access log is a log recording an access to the home page of the target website; parsing the target access log and extracting the access source from which the home page of the target website was accessed; judging whether the webpage of the access source is a webpage of the target website other than the home page; and, when the webpage of the access source is a webpage of the target website other than the home page, determining that a return home page link is set on the webpage of the access source. The application solves the technical problem in the prior art that manually determining whether a return home page link is set on a webpage is inefficient and labor-intensive.

Description

Webpage identification method and device and setting rate determination method and device
Technical Field
The application relates to the field of internet, in particular to a webpage identification method and device and a setting rate determination method and device.
Background
In the internet field, in order to improve the navigation of a website, a return home page link is usually set on the webpages of the website, so that a user browsing the website can return directly to its home page through the link. The return home page link setting rate of a website refers to the ratio of the number of webpages on which a return home page link is set to the number of all webpages in the website. This index largely reflects the user experience of the website and is also an important index for evaluating website performance.
Currently, whether a return home page link is set on a webpage is usually determined by manual inspection. Specifically, a person visits the website, checks each webpage for a return home page link, and then counts the webpages on which the link is set in order to calculate the setting rate. However, determining this manually is inefficient and labor-intensive.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the present application provide a webpage identification method and device and a setting rate determination method and device, so as to at least solve the technical problem in the prior art that manually determining whether a return home page link is set on a webpage is inefficient and labor-intensive.
According to an aspect of an embodiment of the present application, there is provided a webpage identification method, including: querying a target access log from the access logs of a target website, wherein the target access log is a log recording an access to the home page of the target website; parsing the target access log and extracting the access source from which the home page of the target website was accessed; judging whether the webpage of the access source is a webpage of the target website other than the home page; and, when the webpage of the access source is a webpage of the target website other than the home page, determining that a return home page link is set on the webpage of the access source, wherein the return home page link is a link that is set on a webpage of the target website and is used for jumping back to the home page of the target website.
Further, determining whether the webpage of the access source is another webpage of the target website except for the home page includes: judging whether the domain name contained in the uniform resource locator of the webpage of the access source is the same as the domain name of the target website or not; and when the domain name contained in the uniform resource locator of the webpage of the access source is the same as the domain name of the target website, determining that the webpage of the access source is other webpages except the home page in the target website.
Further, parsing the target access log and extracting the access source from which the home page of the target website was accessed includes: parsing a target field from the target access log, wherein the target field is a field that records the uniform resource locator of the previous-hop webpage.
Further, the querying the target access log from the access log of the target website includes: matching the uniform resource locator corresponding to the home page of the target website with the access log of the target website; and taking the access log which is matched from the access log of the target website and contains the uniform resource locator corresponding to the home page of the target website as the target access log.
According to another aspect of the embodiments of the present application, there is also provided a setting rate determining method, including: analyzing an access log of a target website, and counting the total number of accessed webpages in the target website; identifying a target webpage, and counting the number of the target webpage, wherein the target webpage is a webpage provided with a return home page link; and calculating the setting rate of the returned home page links on the target website according to the total number of the webpages and the number of the target webpages.
According to another aspect of the embodiments of the present application, there is also provided a webpage identification apparatus, including: a query unit, configured to query a target access log from the access logs of a target website, wherein the target access log is a log recording an access to the home page of the target website; an extraction unit, configured to parse the target access log and extract the access source from which the home page of the target website was accessed; a judging unit, configured to judge whether the webpage of the access source is a webpage of the target website other than the home page; and a determining unit, configured to determine that a return home page link is set on the webpage of the access source when the webpage of the access source is a webpage of the target website other than the home page, wherein the return home page link is a link that is set on a webpage of the target website and is used for jumping back to the home page of the target website.
Further, the judging unit includes: the judging module is used for judging whether the domain name contained in the uniform resource locator of the webpage of the access source is the same as the domain name of the target website or not; a first determining module, configured to determine that the webpage of the access source is a webpage other than the home page in the target website when a domain name included in the uniform resource locator of the webpage of the access source is the same as the domain name of the target website.
Further, the extracting unit is specifically configured to parse a target field from the target access log, where the target field is a field recorded with a uniform resource locator of a previous-hop web page.
Further, the query unit includes: the matching module is used for matching the uniform resource locator corresponding to the home page of the target website with the access log of the target website; and the second determining module is used for taking the access log which is matched from the access log of the target website and contains the uniform resource locator corresponding to the home page of the target website as the target access log.
According to another aspect of the embodiments of the present application, there is also provided a setting rate determining apparatus, including: the first counting unit is used for analyzing the access log of the target website and counting the total number of accessed webpages in the target website; the second statistical unit is used for identifying target webpages and counting the number of the target webpages, wherein the target webpages are webpages provided with links returning to home pages; and the calculating unit is used for calculating the setting rate of the returned home page link on the target website according to the total number of the webpages and the number of the target webpages.
According to the embodiments of the present application, a target access log is queried from the access logs of the target website, wherein the target access log is a log recording an access to the home page of the target website; the target access log is parsed and the access source of the home page access is extracted; whether the webpage of the access source is a webpage of the target website other than the home page is judged; and, when it is, it is determined that a return home page link is set on the webpage of the access source. Because webpages provided with a return home page link are identified by analyzing access logs, efficiency is improved and workload is reduced compared with the manual approach of the prior art, which solves the technical problem that manually determining whether a return home page link is set on a webpage is inefficient and labor-intensive.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a method of web page identification according to an embodiment of the application;
FIG. 2 is a flow chart of a setting rate determination method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a web page recognition apparatus according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a setting rate determination apparatus according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In accordance with an embodiment of the present application, a method embodiment of a webpage identification method is provided. It should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases the steps may be performed in an order different from the one presented here.
FIG. 1 is a flowchart of a webpage identification method according to an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:
Step S102: query a target access log from the access logs of the target website, wherein the target access log is a log recording an access to the home page of the target website.
The access logs of the target website can be obtained by adding a Tracker to the target website. The Tracker is essentially a piece of JS script embedded in the source code of the target website; it sends the logs of users' visits to the target website to a specified server. These access logs record the various user access behavior data on the target website, including data on searches performed within the website.
Step S104: parse the target access log and extract the access source from which the home page of the target website was accessed.
In this embodiment, the logs recording accesses to the home page of the target website, that is, the target access logs, are queried from all access logs of the target website, so that the access source of each home page access can be parsed from them. The access source here refers to the webpage from which the user jumped to the home page through a link. For example, if a user visits webpage A and then jumps from webpage A to the home page of the target website, the URL of webpage A is recorded in the access log of that home page visit to indicate its access source.
Step S106: judge whether the webpage of the access source is a webpage of the target website other than the home page.
Step S108: when the webpage of the access source is a webpage of the target website other than the home page, determine that a return home page link is set on the webpage of the access source, wherein the return home page link is a link that is set on a webpage of the target website and is used for jumping back to the home page of the target website.
In this embodiment, after the access source of a home page access is determined, it is judged whether the webpage of the access source is a webpage of the target website other than the home page, that is, whether the page from which the user jumped to the home page belongs to the target website. If so, it is determined that a return home page link is set on the webpage of the access source; in other words, after visiting a non-home page, the user jumped to the home page through the return home page link on that page. Otherwise, the access source is a webpage of another website, so it is not regarded as a webpage on which a return home page link is set.
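Steps S102 to S108 can be sketched as follows. This is only a minimal illustration, not the patented implementation: it assumes each access log has already been parsed into a dictionary, and the key names `url` and `refer_url`, the function name, and the example URLs are all hypothetical.

```python
from urllib.parse import urlsplit


def identify_pages_with_home_link(logs, home_url, site_domain):
    """Return the in-site page URLs observed as the source of a home page visit."""
    pages = set()
    for entry in logs:
        # S102: keep only logs that record an access to the home page.
        if entry.get("url") != home_url:
            continue
        # S104: extract the access source (previous-hop URL); it may be empty.
        source = entry.get("refer_url") or ""
        if not source:
            continue
        # S106: the source must belong to the target website and not be the home page.
        if urlsplit(source).hostname == site_domain and source != home_url:
            # S108: a return home page link is considered set on this page.
            pages.add(source)
    return pages


example_logs = [
    {"url": "http://example.com/", "refer_url": "http://example.com/news"},
    {"url": "http://example.com/", "refer_url": "http://other.org/x"},
    {"url": "http://example.com/", "refer_url": ""},
    {"url": "http://example.com/news", "refer_url": "http://example.com/"},
]
pages_with_link = identify_pages_with_home_link(
    example_logs, "http://example.com/", "example.com"
)
```

In this hypothetical data only `http://example.com/news` is kept: the visit arriving from another website and the visit with an empty source are both discarded.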
According to the embodiments of the present application, a target access log is queried from the access logs of the target website, wherein the target access log is a log recording an access to the home page of the target website; the target access log is parsed and the access source of the home page access is extracted; whether the webpage of the access source is a webpage of the target website other than the home page is judged; and, when it is, it is determined that a return home page link is set on the webpage of the access source. Because webpages provided with a return home page link are identified by analyzing access logs, efficiency is improved and workload is reduced compared with the manual approach of the prior art, which solves the technical problem that manually determining whether a return home page link is set on a webpage is inefficient and labor-intensive.
Preferably, judging whether the webpage of the access source is a webpage of the target website other than the home page includes: judging whether the domain name contained in the uniform resource locator of the webpage of the access source is the same as the domain name of the target website; and, when the two domain names are the same, determining that the webpage of the access source is a webpage of the target website other than the home page.
Because the Uniform Resource Locator (URL) of each webpage includes the domain name of the website to which the webpage belongs, whether the webpage of the access source belongs to the target website can be determined by judging whether the domain name contained in its URL is the same as the domain name of the target website. If the domain names are the same, the webpage of the access source is considered a webpage of the target website, and in this case it can be regarded as a non-home page of the target website; otherwise, the webpage of the access source is considered a webpage of another website.
According to this embodiment, whether the webpage of the access source is a non-home page of the target website is determined from its URL, so that jumps from webpages of other websites to the home page of the target website can be excluded from the access sources.
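The domain name comparison can be sketched with Python's standard `urllib.parse` module. The function name `same_domain` is hypothetical, and comparing host names for exact equality is an assumption: the text does not say whether subdomains such as `www.example.com` should count as the same website as `example.com`.

```python
from urllib.parse import urlsplit


def same_domain(source_url: str, site_domain: str) -> bool:
    """Check whether the host in source_url equals the target website's domain name."""
    # urlsplit(...).hostname is lower-cased, excludes any port,
    # and is None for empty or relative URLs.
    host = urlsplit(source_url).hostname
    return host is not None and host == site_domain.lower()
```

Under this exact-match assumption, a source URL on a subdomain would be treated as another website; a deployment that wants to include subdomains would need a different comparison rule.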
Preferably, parsing the target access log and extracting the access source from which the home page of the target website was accessed includes: parsing a target field from the target access log, wherein the target field is a field that records the uniform resource locator of the previous-hop webpage.
In this embodiment, a target field may be set in the access logs of the target website. The target field is the ReferURL field, which records the URL of the previous-hop webpage of the visited webpage. If the user returns to the home page through the back button of the browser, the ReferURL recorded in the access log is empty; when the user returns to the home page by clicking the return home page link on a webpage, the URL of that webpage is recorded in the ReferURL field of the access log of the home page visit.
After the target field is extracted, the URL in the target field can be used to judge whether the webpage of the access source is a non-home page of the target website, as described in the foregoing embodiment; the details are not repeated here.
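Parsing the target field might look like the sketch below. The text does not specify the on-disk log format, so this example assumes, purely for illustration, that each access log is one JSON object per line with a `ReferURL` key; the function name is hypothetical as well.

```python
import json


def extract_refer_url(log_line: str):
    """Return the previous-hop URL from one log line, or None if absent or empty."""
    record = json.loads(log_line)
    value = record.get("ReferURL")
    # An empty ReferURL (for example after a browser back-button return)
    # is treated the same as a missing one.
    return value if value else None
```

Returning `None` for both the missing and the empty case lets later steps drop such records with a single check.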
Preferably, the querying the target access log from the access log of the target website includes: matching a uniform resource locator corresponding to a home page of a target website with an access log of the target website; and taking the access log which is matched from the access log of the target website and contains the uniform resource locator corresponding to the home page of the target website as the target access log.
In this embodiment, after the access logs of the target website are obtained, they may be parsed one by one, and the URL corresponding to the home page of the target website is matched against the URL of the visited webpage recorded in each access log. The access logs whose visited-page URL is the same as the home page URL are the matches; that is, the logs recording accesses to the home page are queried from the access logs of the target website using the home page URL as the query condition.
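The query by home page URL can be sketched as a simple filter. The dictionary key `url` and the function names are hypothetical, and ignoring a trailing slash when comparing URLs is an added assumption that the text does not state; it only requires the URLs to be the same.

```python
def normalize(url: str) -> str:
    """Strip trailing slashes so both spellings of the home page URL compare equal (assumption)."""
    return url.rstrip("/")


def query_target_logs(logs, home_url):
    """Return the access logs whose visited-page URL matches the home page URL."""
    target = normalize(home_url)
    return [entry for entry in logs if normalize(entry.get("url", "")) == target]
```

If the logs record URLs with query strings or fragments, the matching rule would need to be refined accordingly.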
An optional implementation of the embodiment of the present application is described below, which specifically includes:
Step 1: deploy the Tracker on the target website. After deployment, all access logs of users on the target website are sent to the server; the access logs can also record data on searches performed within the website.
Step 2: configure the home page URL and the domain name D of the target website.
Step 3: parse the access logs collected by the server one by one.
Step 4: from the result of step 3, find the access logs recording accesses to the home page, that is, the target access logs. Specifically, a target access log may be determined by judging whether the URL of the visited page is the same as the home page URL configured in step 2.
Step 5: from the result of step 4, find the logs in which the domain name of the ReferURL is the same as the domain name D of the target website, wherein the ReferURL is the target field in the log and records the URL of the previous-hop webpage. Finding these logs amounts to deleting the records in which the ReferURL belongs to another website or is empty; the ReferURL of each remaining record is a page of the website on which a return home page link is set.
Therefore, the total number M of visited pages in the website and the total number N of ReferURLs obtained in step 5 can be counted. Because records in which the ReferURL belongs to another website or is empty have been removed, N is the number of webpages on which a return home page link is set. The setting rate of the return home page link of the target website can then be calculated by the formula N/M. In addition, if the setting rate over a certain period of time is needed, only the website access logs of that period need to be analyzed according to the above steps.
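Steps 3 to 5 and the N/M calculation can be combined into one pass over the logs, as in the sketch below. Several details are assumptions rather than statements of the text: logs are dictionaries with the hypothetical keys `URL` and `ReferURL`, pages are counted as distinct URLs, M counts only visited pages whose host equals the configured domain D, and an empty log set yields a rate of 0.0.

```python
from urllib.parse import urlsplit


def home_link_setting_rate(logs, home_url, domain):
    """Compute N/M: pages with a return home page link over all visited pages."""
    visited = set()    # distinct visited pages of the website (M)
    with_link = set()  # distinct in-site ReferURLs of home page visits (N)
    for entry in logs:  # step 3: go through the logs one by one
        url = entry.get("URL", "")
        if urlsplit(url).hostname == domain:
            visited.add(url)
        if url != home_url:  # step 4: only home page accesses are target logs
            continue
        refer = entry.get("ReferURL") or ""
        # Step 5: drop empty ReferURLs and ReferURLs of other websites.
        if refer and refer != home_url and urlsplit(refer).hostname == domain:
            with_link.add(refer)
    if not visited:
        return 0.0
    return len(with_link) / len(visited)


HOME = "http://example.com/"
sample = [
    {"URL": HOME, "ReferURL": "http://example.com/a"},
    {"URL": "http://example.com/a", "ReferURL": HOME},
    {"URL": "http://example.com/b", "ReferURL": "http://example.com/a"},
    {"URL": HOME, "ReferURL": "http://other.org/"},
    {"URL": HOME, "ReferURL": ""},
    {"URL": "http://example.com/c", "ReferURL": ""},
]
rate = home_link_setting_rate(sample, HOME, "example.com")
```

In this hypothetical sample four distinct pages are visited and one of them (`/a`) is seen as an in-site source of a home page visit. To restrict the calculation to a time period, the caller would pass only the logs of that period.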
In accordance with an embodiment of the present application, a method embodiment of a setting rate determination method is also provided. It should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases the steps may be performed in an order different from the one presented here.
As shown in FIG. 2, the method includes:
Step S202: parse the access logs of the target website and count the total number of visited webpages in the target website.
Step S204: identify target webpages and count the number of target webpages, wherein a target webpage is a webpage on which a return home page link is set.
Specifically, a target webpage is a visited webpage on which a return home page link is set, wherein the return home page link is a link on the webpage for jumping to the home page of the target website.
In the embodiment of the present application, the web page identification method described in the above embodiment of the present application may be used to identify a web page with a link returning to a home page in an accessed web page, that is, a target web page, and then count the total number of the target web pages.
Step S206: calculate the setting rate of the return home page link on the target website according to the total number of webpages and the number of target webpages.
In this embodiment, the value obtained by dividing the number of target webpages by the total number of webpages is used as the setting rate of the return home page link.
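As a worked illustration of this division, let $M$ denote the total number of visited webpages and $N$ the number of target webpages. The figures 200 and 150 below are hypothetical and chosen only to show the arithmetic.

```latex
\[
  R = \frac{N}{M}, \qquad
  \text{e.g. } N = 150,\; M = 200 \;\Rightarrow\; R = \frac{150}{200} = 0.75 = 75\%.
\]
```

Since every target webpage is itself a visited webpage, $N \le M$ and therefore $0 \le R \le 1$ whenever $M > 0$.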
According to this embodiment, webpages on which a return home page link is set are identified by the webpage identification method of the embodiments of the present application and their number is counted; the setting rate of the return home page link is then calculated from the counted total number of visited webpages and the number of webpages on which a return home page link is set. Compared with the prior art, in which webpages are identified manually, efficiency is greatly improved.
An embodiment of the present application further provides a webpage identification apparatus, which may be used to execute the webpage identification method according to the embodiments of the present application. As shown in FIG. 3, the apparatus includes: a query unit 301, an extraction unit 303, a judging unit 305 and a determining unit 307.
The query unit 301 is configured to query a target access log from the access logs of the target website, wherein the target access log is a log recording an access to the home page of the target website.
The access logs of the target website can be obtained by adding a Tracker to the target website. The Tracker is essentially a piece of JS script embedded in the source code of the target website; it sends the logs of users' visits to the target website to a specified server. These access logs record the various user access behavior data on the target website, including data on searches performed within the website.
The extraction unit 303 is configured to parse the target access log and extract the access source from which the home page of the target website was accessed.
In this embodiment, the logs recording accesses to the home page of the target website, that is, the target access logs, are queried from all access logs of the target website, so that the access source of each home page access can be parsed from them. The access source here refers to the webpage from which the user jumped to the home page through a link. For example, if a user visits webpage A and then jumps from webpage A to the home page of the target website, the URL of webpage A is recorded in the access log of that home page visit to indicate its access source.
The judging unit 305 is configured to judge whether the webpage of the access source is a webpage of the target website other than the home page.
The determining unit 307 is configured to determine that a return home page link is set on the webpage of the access source when the webpage of the access source is a webpage of the target website other than the home page.
In this embodiment, after the access source of a home page access is determined, it is judged whether the webpage of the access source is a webpage of the target website other than the home page, that is, whether the page from which the user jumped to the home page belongs to the target website. If so, it is determined that a return home page link is set on the webpage of the access source; in other words, after visiting a non-home page, the user jumped to the home page through the return home page link on that page. Otherwise, the access source is a webpage of another website, so it is not regarded as a webpage on which a return home page link is set.
According to the embodiments of the present application, a target access log is queried from the access logs of the target website, wherein the target access log is a log recording an access to the home page of the target website; the target access log is parsed and the access source of the home page access is extracted; whether the webpage of the access source is a webpage of the target website other than the home page is judged; and, when it is, it is determined that a return home page link is set on the webpage of the access source. Because webpages provided with a return home page link are identified by analyzing access logs, efficiency is improved and workload is reduced compared with the manual approach of the prior art, which solves the technical problem that manually determining whether a return home page link is set on a webpage is inefficient and labor-intensive.
Preferably, the judging unit includes: a judging module, configured to judge whether the domain name contained in the uniform resource locator of the webpage of the access source is the same as the domain name of the target website; and a first determining module, configured to determine that the webpage of the access source is a webpage of the target website other than the home page when the two domain names are the same.
Because the Uniform Resource Locator (URL) of each webpage includes the domain name of the website to which the webpage belongs, whether the webpage of the access source belongs to the target website can be determined by judging whether the domain name contained in its URL is the same as the domain name of the target website. If the domain names are the same, the webpage of the access source is considered a webpage of the target website, and in this case it can be regarded as a non-home page of the target website; otherwise, the webpage of the access source is considered a webpage of another website.
According to this embodiment, whether the webpage of the access source is a non-home page of the target website is determined from its URL, so that jumps from webpages of other websites to the home page of the target website can be excluded from the access sources.
Preferably, the extracting unit is specifically configured to parse a target field from the target access log, where the target field is a field recorded with a uniform resource locator of a previous-hop web page.
In this embodiment, a target field may be set in the access log of the target website. The target field is a referrer URL field that records the URL of the previous-hop webpage, that is, the webpage from which the current webpage was reached. If the user returns to the home page through the back button of the browser, the referrer URL recorded in the access log is null; when the user returns to the home page by clicking the return home page link on a webpage, the URL of that webpage is recorded in the referrer URL field of the access log entry for the home page.
After the target field is extracted, the URL in the target field may be used to determine whether the webpage of the access source is a non-home-page webpage of the target website. For details, refer to the foregoing embodiments; they are not repeated here.
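Extracting the previous-hop URL from a log entry might look like the sketch below. The log layout is an assumption made only for illustration: each entry is taken to be one tab-separated line holding a timestamp, the accessed URL, and the referrer URL, with an empty referrer meaning null. The names `AccessRecord` and `parse_record` are hypothetical.

```python
from typing import NamedTuple, Optional


class AccessRecord(NamedTuple):
    timestamp: str
    url: str                 # URL of the accessed webpage
    referrer: Optional[str]  # URL of the previous-hop webpage, or None


def parse_record(line: str) -> AccessRecord:
    """Parse one tab-separated log line (assumed layout, see lead-in)."""
    timestamp, url, referrer = line.rstrip("\n").split("\t")
    # An empty referrer field is treated as null, e.g. a back-button return.
    return AccessRecord(timestamp, url, referrer or None)


# Hypothetical entry: the home page reached from an internal news page.
rec = parse_record(
    "2015-12-14T10:00:00\thttp://www.example.com/\thttp://www.example.com/news/1.html\n"
)
print(rec.referrer)
```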
Preferably, the query unit comprises: the matching module is used for matching the uniform resource locator corresponding to the home page of the target website with the access log of the target website; and the second determining module is used for taking the access log which is matched from the access log of the target website and contains the uniform resource locator corresponding to the home page of the target website as the target access log.
In this embodiment, after the access log of the target website is obtained, its entries may be analyzed one by one: the URL corresponding to the home page of the target website is matched against the URL of the accessed webpage recorded in each entry, and the entries whose recorded URL is the same as the home page URL are selected. In other words, the access logs of the home page are queried from the access log of the target website using the URL corresponding to the home page as the query condition.
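The query step can be sketched as a simple filter over parsed log entries. The sketch assumes that each entry is a mapping with a `"url"` key holding the accessed URL and that matching is an exact string comparison; the function name `query_home_page_logs` and the sample data are hypothetical.

```python
from typing import Iterable, Iterator, Mapping


def query_home_page_logs(
    records: Iterable[Mapping[str, str]], home_url: str
) -> Iterator[Mapping[str, str]]:
    """Yield the entries whose accessed URL equals the home page URL."""
    for record in records:
        # Assumption: exact match; no normalization of trailing slashes
        # or query strings is attempted.
        if record["url"] == home_url:
            yield record


# Hypothetical log with one home page access and one other access.
log = [
    {"url": "http://www.example.com/", "referrer": "http://www.example.com/a"},
    {"url": "http://www.example.com/a", "referrer": ""},
]
target_logs = list(query_home_page_logs(log, "http://www.example.com/"))
print(len(target_logs))
```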
The webpage identification apparatus includes a processor and a memory. The aforementioned query unit 301, extracting unit 303, judging unit 305, determining unit 307, and the like are stored in the memory as program units, and the processor executes these program units stored in the memory.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and whether a return home page link is set on a webpage is identified by adjusting the kernel parameters.
The memory may include computer-readable media in the form of volatile memory, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The present application further provides an embodiment of a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: querying a target access log from the access log of the target website, wherein the target access log is a log of accesses to the home page of the target website; parsing the target access log and extracting the access source of the access to the home page of the target website; judging whether the webpage of the access source is a webpage of the target website other than the home page; and, when the webpage of the access source is a webpage of the target website other than the home page, determining that a return home page link is set on the webpage of the access source.
An embodiment of the present application further provides a setting rate determining apparatus, which may be used to execute the setting rate determining method in the embodiment of the present application, and as shown in fig. 4, the apparatus includes: a first statistical unit 401, a second statistical unit 403 and a calculation unit 405.
The first statistical unit 401 is configured to analyze an access log of a target website and count the total number of accessed webpages in the target website.
The second statistical unit 403 is configured to identify target webpages by using the webpage identification method of any one of claims 1 to 4 and to count the number of target webpages, where a target webpage is a webpage provided with a return home page link.
Specifically, a target webpage is an accessed webpage on which a return home page link is set, where the return home page link is a link on a webpage for jumping to the home page of the target website.
In the embodiment of the present application, the web page identification method described in the above embodiment of the present application may be used to identify a web page with a link returning to a home page in an accessed web page, that is, a target web page, and then count the total number of the target web pages.
The calculating unit 405 is configured to calculate a setting rate of the link returning to the home page on the target website according to the total number of the web pages and the number of the target web pages.
In this embodiment, a value obtained by dividing the number of target webpages by the total number of the webpages is used as the setting rate of the link to the home page.
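The calculation can be sketched end to end as below. Several points are assumptions not fixed by the text: log entries are given as `(accessed URL, referrer URL or None)` pairs, webpages are counted as distinct URLs, the home page itself is included in the total, and a referrer equal to the home page is not counted as a target webpage. The function name `setting_rate` and the sample data are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple
from urllib.parse import urlsplit


def setting_rate(
    records: Iterable[Tuple[str, Optional[str]]], home_url: str
) -> float:
    """Return (number of target webpages) / (total number of accessed webpages)."""
    site_domain = urlsplit(home_url).hostname
    accessed: Set[str] = set()  # distinct accessed URLs (assumed counting unit)
    targets: Set[str] = set()   # pages identified as having a return home page link
    for url, referrer in records:
        accessed.add(url)
        # Only home page accesses with a non-null, non-home referrer are candidates.
        if url != home_url or not referrer or referrer == home_url:
            continue
        # Same domain name as the target website: treat as a target webpage.
        if site_domain is not None and urlsplit(referrer).hostname == site_domain:
            targets.add(referrer)
    if not accessed:
        return 0.0  # assumption: an empty log yields a rate of zero
    return len(targets) / len(accessed)


# Hypothetical log: three distinct pages, one of which links back to the home page.
home = "http://www.example.com/"
log = [
    ("http://www.example.com/a", None),
    (home, "http://www.example.com/a"),
    ("http://www.example.com/b", "http://www.example.com/a"),
    (home, None),
    (home, "http://other.example.org/x"),
]
print(setting_rate(log, home))
```

Counting distinct URLs rather than page views keeps the rate a property of the site's pages; a variant based on page views would need a different denominator.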
According to this embodiment of the application, webpages provided with a return home page link are identified by the webpage identification method of the embodiment of the application and their number is counted; the setting rate of the return home page link is then calculated from the counted total number of accessed webpages and the number of webpages provided with a return home page link. Compared with the manual webpage identification of the prior art, efficiency is greatly improved.
The setting rate determining apparatus includes a processor and a memory, the first statistical unit 401, the second statistical unit 403, the calculating unit 405, and the like are stored in the memory as program units, and the processor executes the program units stored in the memory.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and the setting rate of the return home page link is calculated by adjusting the kernel parameters.
The memory may include computer-readable media in the form of volatile memory, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The present application further provides an embodiment of a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: analyzing the access log of the target website and counting the total number of accessed webpages in the target website; identifying target webpages and counting the number of target webpages, wherein a target webpage is a webpage provided with a return home page link; and calculating the setting rate of the return home page link on the target website according to the total number of webpages and the number of target webpages.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the part of the technical solution of the present application that in essence contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program code.
The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims (8)

1. A method for identifying a web page, comprising:
inquiring a target access log from an access log of a target website, wherein the target access log is a log of a home page of the target website;
analyzing the target access log, and extracting an access source of a home page for accessing the target website;
judging whether the webpage of the access source is other webpages in the target website except the home page or not; and
when the webpage of the access source is other webpages in the target website except the home page, determining that a home page returning link is arranged on the webpage of the access source, wherein the home page returning link is a link which is arranged on the webpage of the target website and is used for jumping back to the home page of the target website;
wherein, the determining whether the webpage of the access source is another webpage except the home page in the target website includes:
judging whether the domain name contained in the uniform resource locator of the webpage of the access source is the same as the domain name of the target website or not;
when the domain name contained in the uniform resource locator of the webpage of the access source is the same as the domain name of the target website, determining that the webpage of the access source is other webpages except the home page in the target website;
the access source refers to a webpage source for jumping to a home page through a link.
2. The method of claim 1, wherein parsing the target access log to extract an access source for accessing a top page of the target website comprises:
and analyzing a target field from the target access log, wherein the target field is a field recorded with a uniform resource locator of a previous-hop webpage.
3. The method of claim 1, wherein querying the target access log from the access log of the target website comprises:
matching the uniform resource locator corresponding to the home page of the target website with the access log of the target website;
and taking the access log which is matched from the access log of the target website and contains the uniform resource locator corresponding to the home page of the target website as the target access log.
4. A method for setting rate determination, comprising:
analyzing an access log of a target website, and counting the total number of accessed webpages in the target website;
identifying a target webpage by using the webpage identification method of any one of claims 1 to 3, and counting the number of the target webpage, wherein the target webpage is a webpage provided with a link for returning to a home page; and
calculating the setting rate of the return home page link on the target website according to the total number of the webpages and the number of the target webpages.
5. A web page recognition apparatus, comprising:
the query unit is used for querying a target access log from the access logs of a target website, wherein the target access log is a log of a home page for accessing the target website;
the extraction unit is used for analyzing the target access log and extracting an access source of a home page accessing the target website;
the judging unit is used for judging whether the webpage of the access source is other than the home page in the target website; and
a determining unit, configured to determine that a home page return link is set on the webpage of the access source when the webpage of the access source is another webpage of the target website except for the home page, where the home page return link is a link that is set on the webpage of the target website and is used to jump back to the home page of the target website;
wherein the judging unit includes:
the judging module is used for judging whether the domain name contained in the uniform resource locator of the webpage of the access source is the same as the domain name of the target website or not;
a first determining module, configured to determine that the webpage of the access source is a webpage other than the home page in the target website when a domain name included in a uniform resource locator of the webpage of the access source is the same as a domain name of the target website;
the access source refers to a webpage source for jumping to a home page through a link.
6. The apparatus of claim 5, wherein the extracting unit is specifically configured to parse a target field from the target access log, where the target field is a field in which a uniform resource locator of a previous-hop webpage is recorded.
7. The apparatus of claim 5, wherein the query unit comprises:
the matching module is used for matching the uniform resource locator corresponding to the home page of the target website with the access log of the target website;
and the second determining module is used for taking the access log which is matched from the access log of the target website and contains the uniform resource locator corresponding to the home page of the target website as the target access log.
8. A setting rate determining apparatus, characterized by comprising:
the first counting unit is used for analyzing the access log of the target website and counting the total number of accessed webpages in the target website;
a second statistical unit, configured to identify a target webpage by using the webpage identification method according to any one of claims 1 to 3, and count the number of the target webpage, where the target webpage is a webpage provided with a link for returning to a home page; and
a calculating unit, configured to calculate the setting rate of the return home page link on the target website according to the total number of the webpages and the number of the target webpages.
CN201510924044.8A 2015-12-14 2015-12-14 Webpage identification method and device and setting rate determination method and device Active CN106874300B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510924044.8A CN106874300B (en) 2015-12-14 2015-12-14 Webpage identification method and device and setting rate determination method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510924044.8A CN106874300B (en) 2015-12-14 2015-12-14 Webpage identification method and device and setting rate determination method and device

Publications (2)

Publication Number Publication Date
CN106874300A CN106874300A (en) 2017-06-20
CN106874300B true CN106874300B (en) 2020-05-22

Family

ID=59178354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510924044.8A Active CN106874300B (en) 2015-12-14 2015-12-14 Webpage identification method and device and setting rate determination method and device

Country Status (1)

Country Link
CN (1) CN106874300B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101193008B (en) * 2007-03-29 2011-01-12 腾讯科技(深圳)有限公司 A method and system for replaying user webpage access track
CN102377583A (en) * 2010-08-09 2012-03-14 百度在线网络技术(北京)有限公司 Method and system for counting website traffic
CN102957712A (en) * 2011-08-17 2013-03-06 阿里巴巴集团控股有限公司 Method and system for loading website resources
CN103714057A (en) * 2012-09-28 2014-04-09 北京亿赞普网络技术有限公司 Real-time monitoring method and device for online web information
CN104391953A (en) * 2014-11-27 2015-03-04 北京国双科技有限公司 Method and device for detecting web page updating

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7644134B2 (en) * 2001-07-06 2010-01-05 Clickfox, Llc System and method for analyzing system visitor activities
KR101010285B1 (en) * 2008-11-21 2011-01-24 삼성전자주식회사 History Operation Method For Web Page And Apparatus using the same

Also Published As

Publication number Publication date
CN106874300A (en) 2017-06-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

GR01 Patent grant