CN110971713A - Method and device for tracing webpage access source - Google Patents


Info

Publication number
CN110971713A
CN110971713A CN201811140542.3A CN201811140542A CN110971713A CN 110971713 A CN110971713 A CN 110971713A CN 201811140542 A CN201811140542 A CN 201811140542A CN 110971713 A CN110971713 A CN 110971713A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811140542.3A
Other languages
Chinese (zh)
Inventor
王安迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201811140542.3A
Publication of CN110971713A

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00: Network arrangements, protocols or services for addressing or naming
    • H04L61/09: Mapping addresses
    • H04L61/10: Mapping addresses of different types

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method and a device for tracing a webpage access source, relates to the technical field of data networks, and mainly aims to solve the prior-art problem that tracing the access source can interfere with access to the target page. The method comprises the following steps: acquiring source information from an address parameter corresponding to a target webpage, wherein the address parameter is obtained by adding the source information to the webpage address of the target webpage, and the source information is shorter than the source website to which it corresponds; acquiring the source website corresponding to the source information from a preset cache region, in which the source information and its corresponding source website are stored; and determining the access source of the target webpage according to the source website corresponding to the source information. The method and the device are used for tracing the access source of a webpage.

Description

Method and device for tracing webpage access source
Technical Field
The invention relates to the technical field of networks, in particular to a method and a device for tracing a webpage access source.
Background
On the Internet, an advertiser may place advertisements on multiple websites to increase the exposure of its products. To measure the return on advertising across those websites, the advertiser usually counts and traces the sources of visitors to its product pages. Visitor sources are generally detected from the Referer header of the HTTP (HyperText Transfer Protocol) request. The Referer (HTTP Referer) is part of the request header: when a browser sends a request to a server, it generally includes the Referer so that the server knows which page the request was linked from. However, as network access becomes more convenient and smart devices more widespread, there are specific ways of jumping to an advertiser's product page in which the Referer cannot identify the source page, for example when a link is dragged with the mouse to open a new window, or when a link inside Flash content is clicked. To handle the case where the source of the web page may be lost, when the user jumps to the advertiser's product page, monitoring code appends the page address of the previous page to the link of the product page, so that the source can later be traced from that appended page address.
Currently, when the access source of a target page is traced, the access source is generally confirmed from the page address appended to the link of the target page. In practice, however, when the page address of the access source contains many characters, appending it to the link of the target page can make the link too long, and the target page may fail to load because the number of characters in the link exceeds a threshold. How to trace the access source of the target webpage without affecting the loading of the target webpage has therefore become a problem to be solved in the field.
Disclosure of Invention
In view of the above problems, the present invention provides a method and an apparatus for tracing a webpage access source, and a main object of the present invention is to avoid an influence on loading a target page while tracing an access source of the target page.
In order to solve the above technical problem, in a first aspect, the present invention provides a method for tracing a webpage access source, where the method includes:
acquiring source information from address parameters corresponding to a target webpage, wherein the address parameters are obtained by adding the source information into a webpage address of the target webpage, and the length of the source information is smaller than that of a source website corresponding to the source information;
acquiring a source website corresponding to the source information from a preset cache region, wherein the source information and the source website corresponding to the source information are stored in the preset cache region;
and determining the access source of the target webpage according to the source webpage corresponding to the source website.
Optionally, before obtaining the source information from the address parameter corresponding to the target webpage, the method further includes:
when an access request of a target webpage is received, a source website corresponding to the access request is obtained;
determining source information corresponding to the source website;
adding the source information into the address information of the target webpage to obtain the address parameter;
and/or,
and storing the source information and the source website corresponding to the source information in the preset cache region.
Optionally, the source information includes a source identifier, and the storing the source information and the source address corresponding to the source information in the preset cache area includes:
storing the source identification and the source website corresponding to the source identification in a preset cache region;
the obtaining of the source website corresponding to the source information in the preset cache area includes:
and extracting the source website corresponding to the source identifier in the preset cache region.
Optionally, before the source website corresponding to the source information is acquired in the preset cache region, the method further includes:
and determining whether the source website exists in the target webpage or not according to whether the source information comprises a preset source identifier and/or a preset source parameter or not.
Optionally, determining whether the source website exists in the target webpage according to whether the source information includes a preset source identifier includes:
determining whether the source identifier of the source information comprises the preset source identifier;
if the source identification comprises the preset source identification, determining that the target webpage has a source webpage;
and if the source identification does not comprise the preset source identification, determining that the target webpage does not have the source webpage.
Optionally, the preset source parameters include a first parameter and a second parameter, where the first parameter is used to represent that the target webpage has a source webpage, and the second parameter is used to represent that the target webpage does not have a source webpage;
determining whether the target webpage has a source website according to whether the source information includes preset source parameters, including:
determining whether preset source parameters of the source information comprise the first parameter or the second parameter;
if the preset source parameters comprise the first parameter, determining that the target webpage has a source webpage;
if the preset source parameters comprise the second parameter, determining that the target webpage does not have a source webpage; and/or,
after determining that the target web page does not have a source web page, the method further comprises:
and clearing the source webpage corresponding to the target webpage in the preset cache area.
In a second aspect, the present invention further provides an apparatus for tracing a source of a webpage access, including:
the first obtaining unit is used for obtaining source information from an address parameter corresponding to a target webpage when the target webpage is loaded, wherein the address parameter is obtained by adding source information into a webpage address of the target webpage, and the length of the source information is smaller than that of a source website corresponding to the source information;
a second obtaining unit, configured to obtain a source website corresponding to the source information from a preset cache region, where the source information and the source website corresponding to the source information are stored in the preset cache region;
and the first determining unit is used for determining the access source of the target webpage according to the source webpage corresponding to the source website.
Optionally, the apparatus further comprises:
the third acquisition unit is used for acquiring a source website corresponding to an access request when the access request of a target webpage is received;
a second determining unit, configured to determine source information corresponding to the source website;
the adding unit is used for adding the source information in the address information of the target webpage to obtain the address parameter;
and/or,
and the storage unit is used for storing the source information and the source website corresponding to the source information in the preset cache region.
Optionally, the source information includes a source identifier,
the storage unit is further configured to store the source identifier and the source website corresponding to the source identifier in a preset cache region;
the second obtaining unit is further configured to extract a source website corresponding to the source identifier from the preset cache region.
Optionally, the apparatus further comprises:
and a third determining unit, configured to determine whether a source website exists in the target webpage according to whether the source information includes a preset source identifier and/or a preset source parameter.
Optionally, the third determining unit includes:
the first determining module is used for determining whether the source identifier of the source information comprises the preset source identifier;
a second determining module, configured to determine that a source webpage exists in the target webpage if the source identifier includes the preset source identifier;
and the third determining module is used for determining that the target webpage does not have the source webpage if the source identifier does not comprise the preset source identifier.
Optionally, the preset source parameters include a first parameter and a second parameter, where the first parameter is used to represent that the target webpage has a source webpage, and the second parameter is used to represent that the target webpage does not have a source webpage;
the third determination unit includes:
a fourth determining module, configured to determine whether preset source parameters of the source information include the first parameter or the second parameter;
a fifth determining module, configured to determine that a source webpage exists in the target webpage if the preset source parameter includes the first parameter;
a sixth determining module, configured to determine that the source web page does not exist in the target web page if the preset source parameter includes the second parameter;
the device further comprises:
and the clearing unit is used for clearing the source webpage corresponding to the target webpage in the preset cache area.
In order to achieve the above object, according to a third aspect of the present invention, a storage medium is provided, where the storage medium includes a stored program, where the program, when running, controls a device in which the storage medium is located to perform the above method for tracing back a source of access to a web page.
In order to achieve the above object, according to a fourth aspect of the present invention, there is provided a processor for executing a program, wherein the program executes the method for tracing back a webpage access source.
By means of the above technical scheme, the method and the device for tracing the webpage access source provided by the invention first acquire source information from the address parameter corresponding to the target webpage. Because the source information is shorter than the source website it stands for, this avoids the prior-art problem that tracing the access source affects access to the target webpage. The source website corresponding to the source information is then acquired from a preset cache region. Finally, the access source of the target webpage is determined according to the source website, thereby realizing tracing of the access source.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flowchart illustrating a method for tracing a source of a web page visit according to an embodiment of the present invention;
FIG. 2 is a flow chart of another method for tracing the source of a web page visit according to an embodiment of the present invention;
FIG. 3 is a block diagram illustrating an apparatus for tracing a source of a web page visit according to an embodiment of the present invention;
FIG. 4 is a block diagram illustrating another apparatus for tracing a source of a web page visit according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In order to solve the problem that access of a target page is affected in the process of tracing an access source in the prior art, an embodiment of the present invention provides a method for tracing an access source of a web page, as shown in fig. 1, the method includes:
101. Acquiring source information from the address parameter corresponding to the target webpage.
The address parameter is obtained by adding source information to the webpage address of the target webpage, and the length of the source information is smaller than that of a source website corresponding to the source information.
In an embodiment of the present invention, the source information may include a source identifier and a corresponding source parameter. The source identifier may be any identifier or combination of identifiers chosen as needed when the target web page is loaded; it is not specifically limited here and may be selected according to the actual situation. The source parameter may be understood as parameter information that characterizes whether a source actually exists; for example, a certain number may be chosen as the source parameter to indicate that the current target web page has a source web page. It should be noted that the embodiment of the present invention does not limit the number or type of characters in the source information, but the source information must be shorter than the source website to which it corresponds, so that the address parameter does not become long enough to affect loading of the web page.
In addition, in the embodiment of the present invention, before the method described in this step is performed, a monitoring code or a monitoring program may be deployed in the target webpage in advance, so that when the target webpage is triggered, the source information in the address parameter of the target webpage is acquired. Specifically, the deployment method is not limited herein, and any existing method may be selected for the deployment.
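As a rough illustration of step 101, the following Python sketch pulls the source information out of an address parameter. It is not the monitoring code of the embodiment, which would typically run in the browser; the identifier name `gsref`, the sample addresses, and the function name `extract_source_info` are assumptions made for illustration only.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit

# Hypothetical preset source identifier; the embodiment allows any short string.
SOURCE_IDENTIFIER = "gsref"


def extract_source_info(address_parameter: str) -> Optional[Tuple[str, str]]:
    """Return (source identifier, source parameter), or None if absent."""
    query = urlsplit(address_parameter).query
    values = parse_qs(query, keep_blank_values=True).get(SOURCE_IDENTIFIER)
    if not values:
        return None  # the address carries no source information
    return SOURCE_IDENTIFIER, values[0]


print(extract_source_info("https://test.a.com/product?gsref=1"))
print(extract_source_info("https://test.a.com/product"))
```

Only the short identifier and parameter travel in the address; the full source website is never parsed from it.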
102. Acquiring a source website corresponding to the source information from a preset cache region.
The source information and the source website corresponding to the source information are stored in the preset cache area. Once the source information has been obtained in step 101, the source website corresponding to it may be obtained from the preset cache region in which the source website is stored. The manner of obtaining it is not limited and may be selected according to actual needs; for example, identification information may be added to the source information, and the preset cache region may then be queried for the source website corresponding to that identification information.
Furthermore, in this step, a source parameter may be extracted from the source information. The source parameter may be used to characterize whether the current target page has a source web page, so before the preset cache area is queried, it may be determined from the source parameter whether the currently loaded target page was reached from a source web page. Specifically, the source parameter may be compared with a preset parameter table, and whether the current target page has a source web page is determined from the meaning assigned to that parameter in the table.
When it is determined that the target web page had a source web page before being loaded, the web page address corresponding to the source information, that is, the source website, may be obtained from the preset cache region. The preset cache region may be set in any storage medium of the browser and may be selected according to the actual needs of the user; for example, it may be located in a Cookie of the browser. In addition, when a plurality of source addresses exist in the preset cache region, the wrong one could be picked up; to avoid this, the source address corresponding to the source identifier in the source information may be queried in the preset cache region, which ensures the accuracy of the acquired source address.
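The lookup described in step 102 can be sketched as follows. This is a minimal illustration under stated assumptions: a plain dictionary stands in for the browser-side cache (for example a Cookie), and the parameter table values `"1"` and `"0"`, the identifier names, and the cached URLs are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical parameter table: source parameter -> "a source page exists".
PARAMETER_TABLE: Dict[str, bool] = {"1": True, "0": False}


def lookup_source_website(
    cache: Dict[str, str], source_identifier: str, source_parameter: str
) -> Optional[str]:
    """Return the cached source website, or None when no source page exists."""
    if not PARAMETER_TABLE.get(source_parameter, False):
        return None  # no source page, so the cache is not queried at all
    # Query by identifier so the right entry is picked among several.
    return cache.get(source_identifier)


# A plain dict stands in for the preset cache region (assumption).
preset_cache = {
    "gsref": "https://www.test.com/list?page=3&sort=price",
    "other": "https://www.example.org/",
}
print(lookup_source_website(preset_cache, "gsref", "1"))
```

Keying the cache by source identifier is what keeps several stored source addresses from being mixed up.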
103. Determining the access source of the target webpage according to the source webpage corresponding to the source website.
After the source website of the target web page has been obtained from the source information in step 102, and because each source website corresponds to one website entity, the access source of the target web page at the time it was loaded can be determined from the source website.
For example, when the source address of the source web page is "https://www.jd.com/", the address corresponds to Jingdong Mall (JD.com), so it can be determined that the current target web page was reached from Jingdong Mall when it was loaded.
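One possible way to map a source website to a website entity is a host-name table, sketched below. The table `SITE_ENTITIES` and the function `access_source` are hypothetical; the embodiment does not prescribe how the mapping is stored.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical host -> website entity table.
SITE_ENTITIES = {"www.jd.com": "Jingdong Mall"}


def access_source(source_website: str) -> Optional[str]:
    """Return the website entity for a source URL, falling back to its host."""
    host = urlsplit(source_website).hostname
    return SITE_ENTITIES.get(host, host)


print(access_source("https://www.jd.com/"))
```

Unknown hosts fall back to the bare host name, so every traced visit still gets a usable source label.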
With the method for tracing the webpage access source provided by the embodiment of the invention, to address the prior-art problem that tracing the access source affects access to the target page, source information is first acquired from the address parameter corresponding to the target webpage. The source website corresponding to the source information is then acquired from a preset cache region. Finally, the access source of the target webpage is determined according to the source website, thereby realizing tracing of the access source.
Further, as a refinement and an extension of the embodiment shown in fig. 1, an embodiment of the present invention further provides another method for tracing a webpage access source, as shown in fig. 2, the method includes the following specific steps:
201. When an access request for a target webpage is received, a source website corresponding to the access request is obtained, source information corresponding to the source website is determined, and the source information is added to the address information of the target webpage to obtain the address parameter.
The source information may include a source identifier and a source parameter, and the source parameter may be a first parameter or a second parameter, where the first parameter is used to represent that the target webpage has a source webpage, and the second parameter is used to represent that the target webpage does not have a source webpage.
Before this step, a monitoring program or monitoring code may be deployed in the target web page to ensure that, when the monitored target web page is loaded or an access request is received, the page the browser was on when it initiated the access is identified and its address is obtained in time, that is, the source website is obtained. The deployed monitoring code may be JavaScript code. In addition, a target webpage may be accessed in two ways: from a different page within the same website, or from an off-site search website. In the former case, the webpage address from which the current access request was sent can be acquired from on-site data of the same website; in the latter case, the address of the website where the search engine is located can be acquired.
In addition, in the embodiment of the present invention, the source address may specifically be a Uniform Resource Locator (URL) of a web page, and the URL may be understood as a concise representation of a location and an access method of a resource obtained from the internet and is an address of a standard resource on the internet. It contains information indicating the location of the file and how the browser should handle it, each file on the internet having a unique URL. Thus, when the monitoring code deployed in the target webpage determines that the access request of the target webpage is received, the URL of the webpage at the time of sending the access request can be obtained.
After the source website is acquired, in order to facilitate subsequent query of the website, the source information used for replacing the source website can be determined by selecting preset characters or character strings. And adding the source information into the address information in the target webpage to obtain the address parameter. In order to avoid the loading of the web page with the added address parameters affected by the overlong characters, it is required to ensure that the length of the added source information is smaller than the length of the source website corresponding to the source information, so as to avoid the problem of overlong address parameter characters.
Therefore, acquiring the source website corresponding to the access request when the access request for the target webpage is received captures, in time, the page address that the initiator of the access request was on before the jump. This ensures the accuracy of the data used when the access source of the target webpage is subsequently traced and provides a data basis for that tracing. Determining the source information corresponding to the source website and adding that source information, rather than the source website itself, to the address information of the target webpage to obtain the address parameter avoids the prior-art problem that directly adding the source website makes the address too long and affects loading of the subsequent page.
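The construction of the address parameter in step 201 might look like the following sketch. The function name `build_address_parameter`, the identifier `gsref`, and the use of `"1"` and `"0"` as the first and second parameters are assumptions for illustration; the length check mirrors the requirement that the source information be shorter than the source website.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_address_parameter(
    target_url: str, source_website: str, identifier: str = "gsref"
) -> str:
    """Append short source information (identifier=parameter) to target_url."""
    parameter = "1" if source_website else "0"  # assumed first/second parameter
    source_info = f"{identifier}={parameter}"
    if source_website and len(source_info) >= len(source_website):
        raise ValueError("source information must be shorter than the source website")
    parts = urlsplit(target_url)
    # Drop any stale copy of the identifier, then add the fresh one.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != identifier
    ]
    query.append((identifier, parameter))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(build_address_parameter(
    "https://test.a.com/", "https://www.test.com/a/long/path?x=1"))
```

However long the source website is, the target address grows only by the few characters of the source information.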
202. Storing the source information and the source website corresponding to the source information in the preset cache region.
To avoid the problem that appending the URL of the source web page to the link of the target web page makes the whole link URL too long, in the embodiment of the present invention, after the source website at the time the target web page is accessed has been obtained in step 201, the source address may be stored in a preset location according to the method in this step, that is, in a preset cache region. The preset cache area may be in any storage medium of a browser, for example: Cookie, LocalStorage, or SessionStorage.
In addition, when the source-page URLs for multiple target webpages are stored in the preset cache area, a source identifier may be set in the source information to avoid confusion in the subsequent tracing process. This step may then specifically be: first, adding a source identifier for the source website; then, storing the source identifier and the source website corresponding to it in the preset cache area. The source identifier may be any character or combination of character strings and may be chosen as needed, but it must be chosen so that it cannot be confused with characters in the URL of the target web page.
For example, when the URL of the target web page is www.a.com, the URL of the source page from which the access request was sent, www.test.com, can be acquired after the access request is received. The URL www.test.com is then saved to a Cookie serving as the preset cache area, and a source identifier gsref is added to it.
Therefore, the source information and the source websites corresponding to the source information are stored in the preset cache region, so that when a plurality of source websites exist in the subsequent preset cache region, the corresponding source websites can be acquired according to the source information, and the accuracy of the subsequent webpage source tracing is ensured. In addition, by setting the source identifier in the source information, the source address corresponding to the source information can be acquired according to the source identifier, the character length in the address parameter is further reduced, and the stability of the webpage during loading is ensured.
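The storing step can be sketched as below, using the www.a.com and www.test.com values from the example in this step. A dictionary again stands in for the Cookie, and the function name `store_source` and the conflict check against existing query keys are illustrative assumptions about how "not confused with characters in the URL of the target web page" could be enforced.

```python
from typing import Dict
from urllib.parse import parse_qsl, urlsplit


def store_source(
    cache: Dict[str, str], target_url: str, identifier: str, source_website: str
) -> None:
    """Save identifier -> source website, refusing ambiguous identifiers."""
    existing_keys = {
        key
        for key, _ in parse_qsl(urlsplit(target_url).query, keep_blank_values=True)
    }
    if identifier in existing_keys:
        # The identifier would be confused with a parameter the page already uses.
        raise ValueError(f"{identifier!r} already appears in the target URL")
    cache[identifier] = source_website


cache: Dict[str, str] = {}  # stand-in for the preset cache region (assumption)
store_source(cache, "www.a.com", "gsref", "www.test.com")
print(cache)
```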
203. Acquiring source information from the address parameter corresponding to the target webpage.
After the foregoing steps, when it is determined that the address parameter exists in the target webpage, it indicates that the source information exists, so that according to the method described in this step, the source information is first obtained from the address parameter when the target webpage is loaded.
Specifically, in the embodiment of the present invention, the source information, the address parameters, and the manner of obtaining the source information from the address parameters are all the same as those described in step 101 in the foregoing embodiment, and are not described herein again.
For example, when the address displayed for the target web page is test.a.com?gsref=1, test.a.com is the actual URL and ?gsref=1 is not part of the standard URL, so it can be determined that the address content of the target web page is actually an address parameter containing source information. gsref=1 is then taken from ?gsref=1 as the source information.
204. Determining whether the target webpage has a source website according to whether the source information comprises a preset source identifier and/or a preset source parameter.
The address parameter may include a URL of the target web page and source information, the source information includes a source identifier and source parameters, the source parameters include a first parameter and a second parameter, the first parameter is used to represent that the target web page has the source web page before being loaded, and the second parameter is used to represent that the target web page does not have the source web page before being recorded. Based on the embodiment of the present invention, in order to implement tracing of a web page source, in the embodiment of the present invention, before obtaining source information from an address parameter, it is further required to first determine whether a source web page exists before a target web page is accessed, specifically, it may be determined according to the method in this step whether the source information includes a preset source identifier or includes a preset source parameter, and then determine whether the preset source parameter exists after determining that the preset source identifier exists, specifically, it may be selected according to actual needs of a user, and a specific execution manner is not limited herein. The judgment basis can be determined according to preset characters, for example, whether content identical to a preset source identifier exists can be determined from content in an address bar when a target webpage is loaded, and if the content exists, it is indicated that an address parameter containing source information exists in the target webpage.
For example, based on the example in the aforementioned step 202, the source information includes a source identifier and a source parameter, where the source identifier is gsref, consistent with step 202. In addition, a source parameter is assigned to the source identifier: 1 indicates that a source web page exists, and 0 indicates that no source web page exists. Therefore, when the address parameter is determined to be "test.a.com?gsref=1", the method in this step may determine that the URL of the target web page is test.a.com and that the source information includes the preset source identifier gsref. It can then be determined whether the source parameter is 1; since the source parameter is 1, a source web page exists for this access.
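The check described in this step can be sketched as follows, assuming the source information has already been parsed into a dictionary keyed by identifier. The constant and function names are hypothetical; only the identifier `gsref` and the values 1 and 0 come from the example.

```python
PRESET_SOURCE_ID = "gsref"  # preset source identifier from the example
FIRST_PARAMETER = "1"       # a source web page exists
SECOND_PARAMETER = "0"      # no source web page exists


def source_page_exists(source_info):
    """Return True only if the identifier is present and its parameter is 1."""
    # First check for the preset source identifier.
    if PRESET_SOURCE_ID not in source_info:
        return False
    # Then check the source parameter assigned to that identifier.
    return source_info[PRESET_SOURCE_ID] == FIRST_PARAMETER
```

This sketch checks the identifier first and the parameter second, which is one of the orders the step allows; an implementation could equally check only one of the two.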
In addition, when it is determined that the target webpage does not have the source webpage, the source webpage corresponding to the target webpage in the preset cache area can be removed, so that resource consumption in the preset cache area can be reduced, and cache resources are saved.
Therefore, whether a source webpage exists for the target webpage is judged by whether the source information includes the preset source identifier and/or the preset source parameter, so that when the target webpage has no source webpage, there is no need to go on to acquire the source website and analyze the webpage source, which reduces unnecessary time consumption.
205. Acquiring a source website corresponding to the source information from a preset cache region.
Specifically, this step may be: extracting the source website corresponding to the source identifier from the preset cache region. This step may also proceed as follows: first, when the source parameter is the first parameter, it is determined that a source webpage existed before the target webpage was loaded. Then, the webpage address corresponding to the source information is acquired from the preset cache region. Acquiring the webpage address corresponding to the source information from the preset cache region may specifically include: querying the preset cache region for the identification parameter of the source identifier corresponding to the source information, and then extracting the webpage address from that identification parameter.
For example, based on the foregoing example in steps 201-204 of the embodiment of the present invention, when the source parameter is determined to be 1, a source web page exists for this access to the target web page. Therefore, according to the source identifier "gsref" in the source information, the website corresponding to gsref is queried in the Cookie; since the corresponding website is www.test.com, the source website of the source web page of the current target web page can be determined to be www.test.com.
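The Cookie lookup in this example might look like the sketch below, which uses Python's standard `http.cookies` module. It assumes that the preset cache region is a Cookie whose entry name equals the source identifier and whose value is the source website; the function name `lookup_source_url` and the sample Cookie string are illustrative only.

```python
from http.cookies import SimpleCookie


def lookup_source_url(cookie_header, source_id="gsref"):
    """Return the source website stored under source_id, or None if absent."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)  # parse a "name=value; name=value" string
    morsel = cookie.get(source_id)
    return morsel.value if morsel is not None else None


source_url = lookup_source_url("gsref=www.test.com; theme=dark")
```

Here `source_url` is `www.test.com`, the source website from the example; the `theme` entry only stands in for unrelated Cookie content.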
206. Determining the access source of the target webpage according to the source webpage corresponding to the source website.
Once the source website of the source web page of the target web page at loading time has been determined, the access source of the target web page when it was loaded can be determined from it: in the embodiment of the present invention the source website may specifically be a URL, and each file on the network has a unique URL.
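Taken together, the tracing steps from obtaining the source information through determining the access source can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the claimed implementation: the preset cache region is modeled as a plain dictionary keyed by source identifier, and the name `trace_access_source` is hypothetical.

```python
from urllib.parse import parse_qs

PRESET_SOURCE_ID = "gsref"  # preset source identifier from the example
FIRST_PARAMETER = "1"       # a source web page exists


def trace_access_source(address, cache):
    """Return the source website for a loaded address, or None if there is none."""
    # Obtain the source information appended to the page URL.
    _, _, query = address.partition("?")
    values = parse_qs(query).get(PRESET_SOURCE_ID)
    # Stop early when the identifier is missing or the parameter is not 1.
    if not values or values[0] != FIRST_PARAMETER:
        return None
    # Look up the source website stored under the source identifier.
    return cache.get(PRESET_SOURCE_ID)


cache = {"gsref": "www.test.com"}
source = trace_access_source("test.a.com?gsref=1", cache)
```

With the values from the running example, `source` is `www.test.com`. Returning early when no source web page exists mirrors the point that the cache lookup can be skipped in that case.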
Further, as an implementation of the method shown in fig. 1, an embodiment of the present invention further provides a device for tracing a source of a webpage access, which is used to implement the method shown in fig. 1. The embodiment of the apparatus corresponds to the embodiment of the method, and for convenience of reading, details in the embodiment of the apparatus are not repeated one by one, but it should be clear that the apparatus in the embodiment can correspondingly implement all the contents in the embodiment of the method. As shown in fig. 3, the apparatus includes: a first obtaining unit 31, a second obtaining unit 32 and a first determining unit 33, wherein:
The first obtaining unit 31 may be configured to, when a target webpage is loaded, obtain source information from an address parameter corresponding to the target webpage, where the address parameter is obtained by adding source information to a webpage address of the target webpage, and a length of the source information is smaller than a length of a source website corresponding to the source information.
A second obtaining unit 32, configured to obtain a source address corresponding to the source information obtained by the first obtaining unit 31 from a preset cache region, where the source information and the source address corresponding to the source information are stored in the preset cache region;
the first determining unit 33 may be configured to determine an access source of the target web page according to the source web page corresponding to the source website acquired by the second acquiring unit 32.
Further, as an implementation of the method shown in fig. 2, an embodiment of the present invention further provides a device for tracing a source of a webpage access, which is used to implement the method shown in fig. 2. The embodiment of the apparatus corresponds to the embodiment of the method, and for convenience of reading, details in the embodiment of the apparatus are not repeated one by one, but it should be clear that the apparatus in the embodiment can correspondingly implement all the contents in the embodiment of the method. As shown in fig. 4, the apparatus includes: a first obtaining unit 41, a second obtaining unit 42 and a first determining unit 43, wherein:
The first obtaining unit 41 may be configured to, when a target webpage is loaded, obtain source information from an address parameter corresponding to the target webpage, where the address parameter is obtained by adding source information to a webpage address of the target webpage, and a length of the source information is smaller than a length of a source website corresponding to the source information.
A second obtaining unit 42, configured to obtain a source address corresponding to the source information obtained by the first obtaining unit 41 from a preset cache region, where the source information and the source address corresponding to the source information are stored in the preset cache region;
the first determining unit 43 may be configured to determine an access source of the target web page according to the source web page corresponding to the source website acquired by the second acquiring unit 42.
Further, the apparatus further comprises:
a third obtaining unit 44, configured to obtain, when an access request of a target web page is received, a source website corresponding to the access request;
a second determining unit 45, configured to determine source information corresponding to the source website acquired by the third acquiring unit 44;
an adding unit 46, configured to add the source information determined by the second determining unit 45 to the address information of the target webpage to obtain the address parameter;
the saving unit 47 may be configured to save the source information determined by the second determining unit 45 and the source address acquired by the third acquiring unit 44 corresponding to the source information in the preset cache region.
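The cooperation of the third obtaining unit 44, the second determining unit 45, the adding unit 46 and the saving unit 47 can be illustrated with a short sketch. It assumes the `gsref=1` / `gsref=0` format from the method example, models the preset cache region as a dictionary, and treats the handling of a target address that already contains a query string as an assumption; the function name `build_address_parameter` is hypothetical.

```python
SOURCE_ID = "gsref"  # source identifier used in the method example


def build_address_parameter(target_url, source_url, cache):
    """Append short source information to target_url and cache the source website.

    source_url is the address of the page the request came from, or None.
    """
    if source_url:
        # Save the source identifier together with the (long) source website.
        cache[SOURCE_ID] = source_url
        source_info = SOURCE_ID + "=1"
    else:
        source_info = SOURCE_ID + "=0"
    # Only the short source information is added to the address.
    separator = "&" if "?" in target_url else "?"
    return target_url + separator + source_info


cache = {}
address = build_address_parameter("test.a.com", "www.test.com", cache)
```

After the call, `address` is `test.a.com?gsref=1` and the cache maps `gsref` to `www.test.com`. The appended text `gsref=1` is shorter than `www.test.com`, consistent with the requirement that the source information be shorter than the source website.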
Further, the source information comprises a source identifier,
the saving unit 47 may be further configured to save the source identifier and the source address corresponding to the source identifier in a preset cache region;
the second obtaining unit 42 may be further configured to extract a source address corresponding to the source identifier from the preset cache region.
Further, the apparatus further comprises:
the third determining unit 48 may be configured to determine whether the source address exists in the target webpage according to whether the source information includes a preset source identifier and/or a preset source parameter.
Further, the third determining unit 48 includes:
a first determining module 481, configured to determine whether the source identifier of the source information includes the preset source identifier;
a second determining module 482, configured to determine that a source webpage exists in the target webpage if the first determining module 481 determines that the source identifier includes the preset source identifier;
the third determining module 483 may be configured to determine that the target web page does not have a source web page if the first determining module 481 determines that the source identifier does not include the preset source identifier.
Furthermore, the preset source parameters include a first parameter and a second parameter, the first parameter is used for representing that the target webpage has a source webpage, and the second parameter is used for representing that the target webpage does not have a source webpage;
the third determining unit 48 includes:
a fourth determining module 484, which may be configured to determine whether the first parameter or the second parameter is included in preset source parameters of the source information;
a fifth determining module 485, configured to determine that the target webpage has a source webpage if the fourth determining module 484 determines that the preset source parameter includes the first parameter;
a sixth determining module 486, configured to determine that the target webpage does not have a source webpage if the fourth determining module 484 determines that the preset source parameter includes the second parameter;
the device further comprises:
the clearing unit 49 may be configured to clear the source web page corresponding to the target web page in the preset cache area if the third determining unit 48 determines that the target web page does not have the source web page.
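The behavior of the clearing unit 49 can be sketched as follows, again modeling the preset cache region as a dictionary. Keying the cached entry by the source identifier is an assumption carried over from the method example, and the function name `clear_source_if_absent` is hypothetical.

```python
def clear_source_if_absent(has_source_page, cache, source_id="gsref"):
    """Remove the cached source entry when the target page has no source page."""
    if not has_source_page:
        # pop with a default avoids an error if nothing was cached.
        cache.pop(source_id, None)


cache = {"gsref": "www.test.com"}
clear_source_if_absent(False, cache)
```

After this call the cache no longer holds an entry for `gsref`, so a stale source website cannot be returned by a later lookup.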
By means of the above technical scheme, the embodiment of the invention provides a method and a device for tracing a webpage access source. Aiming at the problem in the prior art that tracing the access source affects access to the target page, the invention acquires the source information from the address parameter corresponding to the target webpage, then acquires the source website corresponding to the source information from a preset cache region, and finally determines the access source of the target webpage according to the source webpage corresponding to the source website, thereby realizing the function of tracing the access source.
Meanwhile, when the access request of the target webpage is received, the source website corresponding to the access request is obtained, so that the current page address before the jump of the initiating end of the access request can be timely obtained when the target webpage is accessed, the accuracy of data in the follow-up tracing of the access source of the target webpage is ensured, and a data basis is provided for the follow-up tracing. And determining source information corresponding to the source website, and adding the source information to the address information of the target webpage to obtain the address parameter, so that the problem that the loading of a subsequent page is influenced due to overlong characters caused by directly adding the source website to the address information of the target webpage in the prior art can be solved. In addition, the source information and the source websites corresponding to the source information are stored in the preset cache region, so that when a plurality of source websites exist in the subsequent preset cache region, the corresponding source websites can be acquired according to the source information, and the accuracy of the subsequent webpage source tracing is ensured. In addition, by setting the source identifier in the source information, the source address corresponding to the source information can be acquired according to the source identifier, the character length in the address parameter is further reduced, and the stability of the webpage during loading is ensured. 
Furthermore, whether a source webpage exists for the target webpage is judged by whether the source information includes the preset source identifier and/or the preset source parameter, so that when the target webpage has no source webpage, the source website need not be acquired and the webpage source need not be analyzed, which reduces unnecessary time consumption. In addition, when it is determined that the target webpage does not have the source webpage, the source webpage corresponding to the target webpage in the preset cache area can be removed, so that resource consumption in the preset cache area can be reduced, and cache resources are saved.
The device for tracing the webpage access source comprises a processor and a memory, wherein the first acquiring unit, the second acquiring unit, the first determining unit and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. The kernel can be set to be one or more than one, and the access source of the target page is traced by adjusting the kernel parameters, and meanwhile, the influence on the loading of the target page is avoided.
The memory may include volatile memory in a computer readable medium, Random Access Memory (RAM) and/or nonvolatile memory such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
The embodiment of the invention provides a storage medium, wherein a program is stored on the storage medium, and the program realizes the method for tracing the access source of the webpage when being executed by a processor.
The embodiment of the invention provides a processor, which is used for running a program, wherein the method for tracing the webpage access source is executed when the program runs.
The embodiment of the invention provides equipment, which comprises a processor, a memory and a program which is stored on the memory and can run on the processor, wherein the processor executes the program and realizes the following steps: acquiring source information from address parameters corresponding to a target webpage, wherein the address parameters are obtained by adding the source information into a webpage address of the target webpage, and the length of the source information is smaller than that of a source website corresponding to the source information; acquiring a source website corresponding to the source information from a preset cache region, wherein the source information and the source website corresponding to the source information are stored in the preset cache region; and determining the access source of the target webpage according to the source webpage corresponding to the source website.
Further, before the obtaining the source information from the address parameter corresponding to the target webpage, the method further includes:
when an access request of a target webpage is received, a source website corresponding to the access request is obtained;
determining source information corresponding to the source website;
adding the source information into the address information of the target webpage to obtain the address parameter;
and/or,
and storing the source information and the source website corresponding to the source information in the preset cache region.
Further, the source information includes a source identifier, and the storing the source information and the source address corresponding to the source information in the preset cache area includes:
storing the source identification and the source website corresponding to the source identification in a preset cache region;
the obtaining of the source website corresponding to the source information in the preset cache area includes:
and extracting the source website corresponding to the source identifier in the preset cache region.
Further, before the source website corresponding to the source information is acquired in the preset cache region, the method further includes:
and determining whether the source website exists in the target webpage or not according to whether the source information comprises a preset source identifier and/or a preset source parameter or not.
Further, determining whether the source website exists in the target webpage according to whether the source information includes a preset source identifier includes:
determining whether the source identifier of the source information comprises the preset source identifier;
if the source identification comprises the preset source identification, determining that the target webpage has a source webpage;
and if the source identification does not comprise the preset source identification, determining that the target webpage does not have the source webpage.
Furthermore, the preset source parameters include a first parameter and a second parameter, the first parameter is used for representing that the target webpage has a source webpage, and the second parameter is used for representing that the target webpage does not have a source webpage;
determining whether the target webpage has a source website according to whether the source information includes preset source parameters, including:
determining whether preset source parameters of the source information comprise the first parameter or the second parameter;
if the preset source parameters comprise the first parameters, determining that the target webpage has a source webpage;
if the preset source parameters comprise the second parameters, determining that the source webpage does not exist in the target webpage; and/or,
after determining that the target web page does not have a source web page, the method further comprises:
and clearing the source webpage corresponding to the target webpage in the preset cache area.
The device in the embodiment of the invention can be a server, a PC, a PAD, a mobile phone and the like.
An embodiment of the present invention further provides a computer program product, which, when executed on a data processing apparatus, is adapted to execute a program that initializes the following method steps: acquiring source information from address parameters corresponding to a target webpage, wherein the address parameters are obtained by adding the source information into a webpage address of the target webpage, and the length of the source information is smaller than that of a source website corresponding to the source information; acquiring a source website corresponding to the source information from a preset cache region, wherein the source information and the source website corresponding to the source information are stored in the preset cache region; and determining the access source of the target webpage according to the source webpage corresponding to the source website.
Further, before the obtaining the source information from the address parameter corresponding to the target webpage, the method further includes:
when an access request of a target webpage is received, a source website corresponding to the access request is obtained;
determining source information corresponding to the source website;
adding the source information into the address information of the target webpage to obtain the address parameter;
and/or,
and storing the source information and the source website corresponding to the source information in the preset cache region.
Further, the source information includes a source identifier, and the storing the source information and the source address corresponding to the source information in the preset cache area includes:
storing the source identification and the source website corresponding to the source identification in a preset cache region;
the obtaining of the source website corresponding to the source information in the preset cache area includes:
and extracting the source website corresponding to the source identifier in the preset cache region.
Further, before the source website corresponding to the source information is acquired in the preset cache region, the method further includes:
and determining whether the source website exists in the target webpage or not according to whether the source information comprises a preset source identifier and/or a preset source parameter or not.
Further, determining whether the source website exists in the target webpage according to whether the source information includes a preset source identifier includes:
determining whether the source identifier of the source information comprises the preset source identifier;
if the source identification comprises the preset source identification, determining that the target webpage has a source webpage;
and if the source identification does not comprise the preset source identification, determining that the target webpage does not have the source webpage.
Furthermore, the preset source parameters include a first parameter and a second parameter, the first parameter is used for representing that the target webpage has a source webpage, and the second parameter is used for representing that the target webpage does not have a source webpage;
determining whether the target webpage has a source website according to whether the source information includes preset source parameters, including:
determining whether preset source parameters of the source information comprise the first parameter or the second parameter;
if the preset source parameters comprise the first parameters, determining that the target webpage has a source webpage;
if the preset source parameters comprise the second parameters, determining that the source webpage does not exist in the target webpage; and/or,
after determining that the target web page does not have a source web page, the method further comprises:
and clearing the source webpage corresponding to the target webpage in the preset cache area.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A method for tracing a source of access to a web page, comprising:
acquiring source information from address parameters corresponding to a target webpage, wherein the address parameters are obtained by adding the source information into a webpage address of the target webpage, and the length of the source information is smaller than that of a source website corresponding to the source information;
acquiring a source website corresponding to the source information from a preset cache region, wherein the source information and the source website corresponding to the source information are stored in the preset cache region;
and determining the access source of the target webpage according to the source webpage corresponding to the source information.
2. The method of claim 1, wherein before obtaining the source information from the address parameter corresponding to the target webpage, the method further comprises:
when an access request of a target webpage is received, a source website corresponding to the access request is obtained;
determining source information corresponding to the source website;
adding the source information into the address information of the target webpage to obtain the address parameter;
and/or,
and storing the source information and the source website corresponding to the source information in the preset cache region.
3. The method of claim 2, wherein the source information includes a source identifier, and the storing the source information and the source address corresponding to the source information in the predetermined cache area comprises:
storing the source identification and the source website corresponding to the source identification in a preset cache region;
the obtaining of the source website corresponding to the source information in the preset cache area includes:
and extracting the source website corresponding to the source identifier in the preset cache region.
4. The method of claim 3, wherein before the obtaining the source address corresponding to the source information in the preset cache region, the method further comprises:
and determining whether the source website exists in the target webpage or not according to whether the source information comprises a preset source identifier and/or a preset source parameter or not.
5. The method of claim 4, wherein determining whether the source address exists in the target webpage according to whether the source information includes a preset source identifier comprises:
determining whether the source identifier of the source information comprises the preset source identifier;
if the source identification comprises the preset source identification, determining that the target webpage has a source webpage;
and if the source identification does not comprise the preset source identification, determining that the target webpage does not have the source webpage.
6. The method according to claim 4, wherein the preset source parameters include a first parameter and a second parameter, the first parameter is used for characterizing that the target web page has a source web page, and the second parameter is used for characterizing that the target web page has no source web page;
determining whether the target webpage has a source website according to whether the source information includes preset source parameters, including:
determining whether preset source parameters of the source information comprise the first parameter or the second parameter;
if the preset source parameters comprise the first parameters, determining that the target webpage has a source webpage;
if the preset source parameters comprise the second parameters, determining that the source webpage does not exist in the target webpage; and/or,
after determining that the target web page does not have a source web page, the method further comprises:
and clearing the source webpage corresponding to the target webpage in the preset cache area.
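A minimal sketch of the parameter-based decision and the cache clearing in claim 6. The concrete values of the first and second parameters, and the choice to key the cache by the target webpage address for this step, are assumptions made for illustration; the claims do not specify them.

```python
from typing import Dict, Optional

# Hypothetical encodings of the two preset source parameters.
FIRST_PARAMETER = "1"   # the target webpage has a source webpage
SECOND_PARAMETER = "0"  # the target webpage has no source webpage


def resolve_by_parameter(
    source_parameter: str, target_url: str, cache: Dict[str, str]
) -> Optional[bool]:
    """Decide whether a source webpage exists; clear the cached entry when it does not."""
    if source_parameter == FIRST_PARAMETER:
        return True
    if source_parameter == SECOND_PARAMETER:
        # Clear any stale source recorded for this target webpage.
        cache.pop(target_url, None)
        return False
    # Neither preset parameter is present: this check cannot decide.
    return None
```

Returning `None` for an unrecognized value leaves room for the identifier-based check of claim 5 to decide instead, which is one possible reading of the "and/or" in claim 4.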
7. An apparatus for tracing a source of access to a web page, comprising:
a first obtaining unit, configured to obtain source information from an address parameter corresponding to a target webpage when the target webpage is loaded, wherein the address parameter is obtained by adding the source information to a webpage address of the target webpage, and the length of the source information is smaller than that of the source website corresponding to the source information;
a second obtaining unit, configured to obtain the source website corresponding to the source information from a preset cache region, wherein the source information and the source website corresponding to the source information are stored in the preset cache region;
and a first determining unit, configured to determine the access source of the target webpage according to the source webpage corresponding to the source website.
8. The apparatus of claim 7, further comprising:
a third obtaining unit, configured to obtain a source website corresponding to an access request when the access request for the target webpage is received;
a second determining unit, configured to determine source information corresponding to the source website;
an adding unit, configured to add the source information to the address information of the target webpage to obtain the address parameter;
and a storage unit, configured to store the source information and the source website corresponding to the source information in the preset cache region.
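The units of claims 7 and 8 can be read as two steps: tagging the target address when an access request arrives, and resolving the tag when the page loads. The sketch below is one possible realization, not the applicant's implementation. The class `SourceTracer`, the parameter name `src`, the in-memory dictionary used as the preset cache region, and the truncated SHA-1 digest used as source information are all assumptions.

```python
import hashlib
from typing import Dict, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SOURCE_PARAM = "src"  # hypothetical query-parameter name


class SourceTracer:
    def __init__(self) -> None:
        # Stand-in for the preset cache region: source information -> source website.
        self.cache: Dict[str, str] = {}

    def on_access_request(self, target_url: str, source_url: str) -> str:
        """Claim 8: derive short source information, store it, add it to the address."""
        # An 8-character digest is shorter than any URL that includes a scheme and host.
        # Collisions are possible with so short a key; a real system would handle them.
        source_info = hashlib.sha1(source_url.encode("utf-8")).hexdigest()[:8]
        self.cache[source_info] = source_url
        parts = urlsplit(target_url)
        query = parse_qsl(parts.query, keep_blank_values=True)
        query.append((SOURCE_PARAM, source_info))
        return urlunsplit(parts._replace(query=urlencode(query)))

    def on_page_load(self, address_parameter: str) -> Optional[str]:
        """Claim 7: read the source information and look up the source website."""
        query = dict(parse_qsl(urlsplit(address_parameter).query))
        source_info = query.get(SOURCE_PARAM)
        if source_info is None:
            return None
        return self.cache.get(source_info)
```

Because only the short key travels in the address, a long source website does not lengthen the target address, which matches the length condition stated in claim 7.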
9. A storage medium, comprising a stored program, wherein the program, when executed, controls an apparatus in which the storage medium is located to perform the method for tracing a source of webpage access according to any one of claims 1 to 6.
10. A processor configured to run a program, wherein the program, when run, performs the method for tracing a source of webpage access according to any one of claims 1 to 6.
CN201811140542.3A 2018-09-28 2018-09-28 Method and device for tracing webpage access source Pending CN110971713A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811140542.3A CN110971713A (en) 2018-09-28 2018-09-28 Method and device for tracing webpage access source

Publications (1)

Publication Number Publication Date
CN110971713A (en) 2020-04-07

Family

ID=70026884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811140542.3A Pending CN110971713A (en) 2018-09-28 2018-09-28 Method and device for tracing webpage access source

Country Status (1)

Country Link
CN (1) CN110971713A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104317884A (en) * 2014-10-21 2015-01-28 Beijing Gridsum Technology Co Ltd Method and device for acquiring types of source pages of website
CN104462182A (en) * 2014-10-10 2015-03-25 Beijing Gridsum Technology Co Ltd Webpage skipping processing method and device
CN106547799A (en) * 2015-09-23 2017-03-29 Beijing Gridsum Technology Co Ltd The introduction method and device of data
US20170169443A1 (en) * 2015-12-10 2017-06-15 Taglynx, LLC Tag link creation and campaign tracking

Similar Documents

Publication Publication Date Title
CN108304410B (en) Method and device for detecting abnormal access page and data analysis method
CN110020339B (en) Webpage data acquisition method and device based on non-buried point
CN108256888B (en) Landing page acquisition method, website server and network advertisement monitoring system
CN109428776B (en) Website traffic monitoring method and device
CN109598526B (en) Method and device for analyzing media contribution
CN109600272B (en) Crawler detection method and device
CN107015986B (en) Method and device for crawling webpage by crawler
CN106657422B (en) Method, device and system for crawling website page and storage medium
CN106937173B (en) Video playing method and device
CN111368163A (en) Crawler data identification method, system and equipment
CN111061977A (en) Website updating method, device and system
CN106682044B (en) Data processing method and device
CN106911636B (en) Method and device for detecting whether backdoor program exists in website
CN110889065B (en) Page stay time determination method, device and equipment
CN106649374B (en) Navigation tag sequencing method and device
CN108255878B (en) User information processing method and related device
CN110969469B (en) Data acquisition method and device
CN115297042B (en) Method for detecting consistency of webpages under different networks and related equipment
CN108984572B (en) Website information pushing method and device
CN112749352A (en) Webpage skipping method and device, electronic equipment and readable storage medium
WO2018114055A1 (en) Method and system for providing additional information relating to primary information
CN110929188A (en) Method and device for rendering server page
CN110971713A (en) Method and device for tracing webpage access source
CN102694802B (en) Network access information recording method and device
CN110971578B (en) User identity confirmation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200407