WO2017124692A1 - Method and apparatus for finding a conversion relationship between a form page and a target page - Google Patents

Method and apparatus for finding a conversion relationship between a form page and a target page

Info

Publication number
WO2017124692A1
Authority
WO
WIPO (PCT)
Prior art keywords
url
page
jump
form page
target
Prior art date
Application number
PCT/CN2016/086408
Other languages
English (en)
French (fr)
Inventor
王晓元
马宇峰
邓鸣捷
叶峻
Original Assignee
百度在线网络技术(北京)有限公司 (Baidu Online Network Technology (Beijing) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百度在线网络技术(北京)有限公司 (Baidu Online Network Technology (Beijing) Co., Ltd.)
Publication of WO2017124692A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links

Definitions

  • The present application relates to the field of computer technology, specifically to the field of Internet technology, and in particular to a method and apparatus for finding a conversion relationship between a form page and a target page.
  • "Conversion" is often used to describe jump behavior from one page to another.
  • For example, when a user goes from a merchant's promotion page to a page on the merchant's website that the merchant wants visitors to open (also called the target page, such as a registration, order placement, or payment page), this constitutes a "conversion".
  • A form conversion is a conversion made through a form on a web page, the form being mainly responsible for data collection. Form conversions typically occur after a user enters data or clicks.
  • Existing methods judge form conversions from visits to a single page. Such a method reflects only the page view count in isolation and cannot express real conversion behavior (for example, page views are easily faked). Moreover, the target pages of successful conversions take various forms (such as a prompt that registration succeeded, the next step of registration, or even an error page displayed when the server cannot return normal information), so judging by access to a single page makes it difficult to accurately determine form conversion behavior.
  • The shortcoming of the prior art is that form page conversion is judged from a single page without considering the connections between pages, so form conversion behavior is determined with low accuracy.
  • The purpose of the present application is to propose an improved method and apparatus for finding a conversion relationship between a form page and a target page, so as to solve the technical problems mentioned in the background section above.
  • The present application provides a method for finding a conversion relationship between a form page and a target page, the method comprising: parsing a page access log and decomposing out the uniform resource locators (URLs) of all accessed pages and the jump URL groups, where each jump URL group includes two URLs and the page corresponding to the second URL is jumped to directly from the page corresponding to the first URL; filtering the URLs of all accessed pages and adding the URLs of pages containing a form to a form page URL set; generating a form page jump pair set from the jump URL groups whose first URL belongs to the form page URL set; filtering out, from the form page jump pair set, the form page jump pairs that meet a preset condition; generalizing the URLs in the remaining form page jump pairs of the set and determining target page URLs, where a target page URL is a generalized second URL; and, for each target page URL, determining the generalized first URL in at least one form page jump pair containing it as a form page URL that converts to that target page URL.
  • The jump URL groups may be obtained by: acquiring the access request information for each accessed page URL; obtaining, from the access request information, the link page URL that links to the current page URL; and generating a jump URL group from the link page URL and the current page URL, where the link page URL is the first URL and the current page URL is the second URL.
  • A page URL that satisfies one of the following conditions is filtered out and added to the form page URL set: the page URL matches a preset URL pattern; or the page content corresponding to the page URL contains a preset keyword.
  • Filtering out, from the form page jump pair set, the form page jump pairs that meet the preset condition comprises: acquiring the document object model (DOM) of the first URL of each form page jump pair in the set; and parsing the DOM, where, if an attribute field of the DOM includes a hyperlink attribute field, the form page jump pair is determined to meet the preset condition and is filtered out.
  • Generalizing the URLs in the remaining form page jump pairs and determining the target page URLs comprises: de-parameterizing the URLs in each remaining form page jump pair of the set; and, across the de-parameterized form page jump pairs, merging identical second URLs into a target page URL.
  • The present application also provides an apparatus for finding a conversion relationship between a form page and a target page. The apparatus comprises: a decomposition module configured to parse a page access log and decompose out the URLs of all accessed pages and the jump URL groups, where each jump URL group includes two URLs and the page corresponding to the second URL is jumped to directly from the page corresponding to the first URL; a first screening module configured to filter the URLs of all accessed pages and add the URLs of pages containing a form to a form page URL set; a generation module configured to generate a form page jump pair set from the jump URL groups whose first URL belongs to the form page URL set; a second screening module configured to filter out, from the form page jump pair set, the form page jump pairs that meet a preset condition; a first determining module configured to generalize the URLs in the remaining form page jump pairs of the set and determine target page URLs, where a target page URL is a generalized second URL; and a second determining module configured to determine, for each target page URL, the generalized first URL in at least one form page jump pair containing it as a form page URL that converts to that target page URL.
  • The decomposition module includes the following units for obtaining the jump URL groups: an access request information acquisition unit configured to acquire the access request information for each accessed page URL; a link page URL acquisition unit configured to obtain, from the access request information, the link page URL that links to the current page URL; and a generating unit configured to generate a jump URL group from the link page URL and the current page URL, where the link page URL is the first URL and the current page URL is the second URL.
  • The second screening module includes: a document object model acquisition unit configured to acquire the document object model (DOM) of the first URL of each form page jump pair in the form page jump pair set; and a screening unit configured to parse the DOM and, if an attribute field of the DOM includes a hyperlink attribute field, determine that the form page jump pair meets the preset condition and filter it out.
  • The first determining module includes: a processing unit configured to de-parameterize the URLs in each remaining form page jump pair of the set; and a merging unit configured to merge, across the de-parameterized form page jump pairs, identical second URLs into a target page URL.
  • With the method and apparatus for finding a conversion relationship between a form page and a target page provided by the present application, a page access log is parsed to decompose out the URLs of all accessed pages and the jump URL groups, where each jump URL group includes two URLs and the page corresponding to the second URL is jumped to directly from the page corresponding to the first URL. The URLs of all accessed pages are then filtered, and the URLs of pages containing a form are added to the form page URL set; a form page jump pair set is generated, and the pairs meeting the preset condition are filtered out. The URLs of the remaining form page jump pairs are generalized to determine the target page URLs, a target page URL being a generalized second URL. Finally, for each target page URL, the generalized first URL in at least one form page jump pair containing it is determined as a form page URL that converts to that target page URL, thereby identifying at least one form page URL that jumps to each target page URL.
  • FIG. 1 illustrates an exemplary system architecture to which embodiments of the present application may be applied;
  • FIG. 2 is a flowchart of one embodiment of a method for finding a conversion relationship between a form page and a target page according to the present application;
  • FIG. 3 is a schematic diagram of the DOM tree structure of a page document;
  • FIG. 4a and FIG. 4b are schematic diagrams of an application scenario of the method for finding a conversion relationship between a form page and a target page according to the present application;
  • FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for finding a conversion relationship between a form page and a target page according to the present application;
  • FIG. 6 is a schematic structural diagram of a computer system suitable for implementing the electronic device of an embodiment of the present application.
  • FIG. 1 illustrates an exemplary system architecture 100 in which embodiments of the present application may be applied.
  • system architecture 100 can include terminal devices 101, 102, network 103, and server 104.
  • the network 103 is used to provide a medium for communication links between the terminal devices 101, 102 and the server 104.
  • Network 103 may include various types of connections, such as wired, wireless communication links, fiber optic cables, and the like.
  • the terminal devices 101, 102 can interact with the server 104 over the network 103 to receive or transmit messages and the like.
  • Various communication client applications, such as browser applications, search applications, wealth management applications, shopping applications, map applications, social platform applications, mailbox clients, and instant messaging tools, may be installed on the terminal devices 101 and 102.
  • The terminal devices 101 and 102 may be various electronic devices that support browser applications and the like, including but not limited to smartphones, smart watches, tablets, personal digital assistants, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers.
  • Server 104 may be a server that provides various services.
  • the server 104 may be a background server or the like that provides support for a browser application of the terminal devices 101, 102, and the like.
  • The server may store and otherwise process the received data, and feed the processing results back to the terminal devices.
  • the method for searching for a form page and a target page conversion relationship provided by the embodiment of the present application is generally performed by the server 104, but it is not excluded that it can be executed by the terminal devices 101 and 102.
  • the device for searching the form page and the target page conversion relationship provided by the embodiment of the present application is generally disposed in the server 104, but is not excluded from being provided in the terminal devices 101 and 102.
  • It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative; there may be any number of terminal devices, networks, and servers as required by the implementation.
  • a flow 200 of one embodiment of a method of finding a form page and a target page conversion relationship is shown.
  • This embodiment is mainly illustrated by the method being applied to an electronic device having a certain computing capability, and the electronic device may be, for example, the server 104 shown in FIG.
  • the method for finding a relationship between a form page and a target page includes the following steps:
  • Step 201 Parse the page access log, and decompose out the uniform resource locators (URLs) of all accessed pages and the jump URL groups.
  • The electronic device can parse the page access log, decompose out the URLs of all accessed pages, and decompose out multiple jump URL groups according to the jump relationships between pages. A URL (Uniform Resource Locator) is a compact representation of the location of a resource available on the Internet and of the method for accessing it; it is the address of a standard resource on the Internet. Each file on the Internet has a unique URL; for example, each page resource corresponds to one URL.
  • An application running on a terminal device, or the background server supporting that application, can generate a page access log from the page access records it produces.
  • A page access log generated by the application running on the terminal device may include information such as the pages accessed through the application and the access times. A page access log generated by the background server supporting the application may include information such as the pages accessed through the application on each terminal device and the access times, and may also include the page request information that the background server receives from the application on each terminal device, the response information that the background server provides to each terminal device for displaying the related pages, and the like.
  • the above application may be, for example, a browser application or other application (for example, "Alipay") that can perform information push.
  • the electronic device can obtain the above page access log locally or remotely.
  • If the electronic device itself holds the page access log (for example, when it is the background server), the log can be obtained directly from local storage; otherwise, it can be obtained from the background server through a wired or wireless connection.
  • the above wireless connection methods include, but are not limited to, 3G/4G connection, WiFi connection, Bluetooth connection, WiMAX connection, Zigbee connection, UWB (ultra wideband) connection, and other wireless connection methods now known or developed in the future.
  • Each jump URL group may include two URLs. For example, jump URL group i is recorded as <URL_i1, URL_i2>, where URL_i1 and URL_i2 are the first URL and the second URL of jump URL group i, respectively.
  • The page corresponding to URL_i2 is jumped to directly from the page corresponding to URL_i1.
  • The electronic device may decompose the URL of each page in turn according to the page access order in the page access log, and determine the URLs of each two adjacently accessed pages as a jump URL group.
  • For example, if the page access log generated by an application records the access sequence "page A → page B → page C", the electronic device may decompose out the URL "URL_A" of page A, the URL "URL_B" of page B, and the URL "URL_C" of page C, as well as the jump URL groups "<URL_A, URL_B>" and "<URL_B, URL_C>".
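The adjacent-pair decomposition described above can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: it assumes the log has already been split into one ordered access sequence per user session (interleaved sessions from different users would otherwise produce false pairs), and the function name `jump_groups_from_sequence` is hypothetical.

```python
from typing import List, Tuple

def jump_groups_from_sequence(urls: List[str]) -> List[Tuple[str, str]]:
    """Pair each accessed URL with the next one in access order.

    Each tuple is a jump URL group <first URL, second URL>, on the assumption
    that the second page was reached directly from the first.
    """
    return list(zip(urls, urls[1:]))

# Access order "page A -> page B -> page C" from the example above.
print(jump_groups_from_sequence(["URL_A", "URL_B", "URL_C"]))
# [('URL_A', 'URL_B'), ('URL_B', 'URL_C')]
```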
  • Alternatively, the electronic device may obtain jump URL groups as follows: first, acquire the access request information for each accessed page URL; then obtain, from the access request information, the link page URL that links to the current page URL; finally, use the link page URL and the current page URL as a jump URL group, where the link page URL is the first URL and the current page URL is the second URL.
  • When a user visits a page, the terminal device first sends page request information to the background server, and the background server parses the page request information and provides the related page resources to the terminal device.
  • The URL of each visited page can correspond to one piece of page request information.
  • The page request information may include information such as the page to be accessed, the method for processing the page, and the access path, and this information may be carried in the header of the page access request.
  • The header of a page access request usually includes Referer information, indicating the page from which the requested page was linked.
  • The page requested by the page access request is the current page, and the page it was linked from is the link page.
  • The electronic device can therefore obtain the current page URL and the link page URL from this source information.
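The Referer-based variant can be sketched the same way. The record layout (a mapping with hypothetical keys `url` and `Referer`) is an assumption for illustration; real access logs would first need to be parsed into such records. Note that the HTTP header name is spelled `Referer`.

```python
from typing import Iterable, List, Mapping, Tuple

def jump_groups_from_referer(requests: Iterable[Mapping[str, str]]) -> List[Tuple[str, str]]:
    """Build <link page URL, current page URL> jump URL groups from request records."""
    groups: List[Tuple[str, str]] = []
    for req in requests:
        referer = req.get("Referer")
        # Requests without a Referer (typed URLs, bookmarks, some privacy
        # settings) have no known link page, so they yield no jump URL group.
        if referer:
            groups.append((referer, req["url"]))
    return groups

log = [
    {"url": "http://a.example/reg"},
    {"url": "http://a.example/ok", "Referer": "http://a.example/reg"},
]
print(jump_groups_from_referer(log))
# [('http://a.example/reg', 'http://a.example/ok')]
```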
  • Step 202 Filter the URLs of all accessed pages, and add the URL of the page containing the form to the form page URL set.
  • the electronic device may analyze the URL pattern or page content of the single accessed page decomposed in step 201, thereby filtering out the URL of the page containing the form, and adding the form page URL set.
  • A form may include a form tag, form fields, and form buttons. The form tag declares the form and may include the URL of the Common Gateway Interface (CGI) program used to process the form data and the method by which the data is submitted to the server. Form fields may include one or more of a text box, password box, hidden field, multi-line text box, check box, radio button, drop-down selection box, and file upload box. Form buttons may include submit buttons, reset buttons, custom buttons, and the like; they are used to send data to the server's CGI script or to cancel the input, and may also be used to control other processing tasks defined by processing scripts.
  • The form in a page can be defined by a form tag, such as the "<form>" tag used to create HTML (HyperText Markup Language) forms.
  • the page containing the form can also be reflected in the URL pattern of the page. For example, a page containing a form will generate a form submission URL when the form is submitted.
  • the form submission URL may include the form submission method (such as "POST").
  • The electronic device can determine whether a page contains a form by analysis methods such as matching the page content against a preset keyword (such as the "<form>" tag keyword) or matching the URL against a preset pattern (such as a form submission URL that includes the form submission method "POST"); this is not described in detail here.
  • the electronic device can add the filtered URL of the page containing the form to the form page URL set.
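The two screening conditions of step 202 (a preset URL pattern, or a preset keyword in the page content) can be sketched with regular expressions. Both presets below are hypothetical stand-ins: the keyword is the opening of a `<form` tag, and the URL pattern looks for a `method=post` query item, one possible way a form submission method might appear in a URL. Actual deployments would configure their own patterns.

```python
import re
from typing import Mapping, Optional, Set

# Hypothetical presets; real values would be configured per site.
FORM_URL_PATTERN = re.compile(r"(?:^|[?&])method=post(?:&|$)", re.IGNORECASE)
FORM_KEYWORD = re.compile(r"<form\b", re.IGNORECASE)  # \b rejects tags like <formula>

def is_form_page(url: str, content: Optional[str]) -> bool:
    """True if the URL matches the preset pattern or the content has the keyword."""
    if FORM_URL_PATTERN.search(url):
        return True
    return content is not None and FORM_KEYWORD.search(content) is not None

def build_form_page_url_set(pages: Mapping[str, Optional[str]]) -> Set[str]:
    """pages maps each accessed URL to its HTML, or None if content is unavailable."""
    return {url for url, html in pages.items() if is_form_page(url, html)}

pages = {
    "http://a.example/reg": "<html><FORM action='/ok'>...</FORM></html>",
    "http://a.example/about": "<p>About us</p>",
    "http://a.example/submit?method=POST": None,
}
print(sorted(build_form_page_url_set(pages)))
```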
  • Step 203 Generate a form page jump pair set according to the jump URL group whose first URL belongs to the form page URL set.
  • The electronic device may match the first URL of each jump URL group against the URLs in the form page URL set. If a match is found, the page corresponding to the first URL of that jump URL group is determined to be a page containing a form, so the pages corresponding to the two URLs of that jump URL group may be a form page and a target page between which a conversion occurs. The electronic device may then generate the form page jump pair set from all the matched jump URL groups.
  • the form page jump pair represents a pair of URLs, and the page corresponding to the second URL is jumped from the form page corresponding to the first URL.
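Step 203 amounts to a membership test on the first URL of each jump URL group. In the sketch below, the data shapes (tuples of URL strings, a Python set for the form page URL set) are assumptions. Returning a list rather than a set is a design choice so that repeated jumps can still be counted when conversion rates are computed later.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]

def form_page_jump_pairs(groups: Iterable[Pair], form_urls: Set[str]) -> List[Pair]:
    """Keep the jump URL groups whose first URL is in the form page URL set."""
    # Kept as a list (not a set) so repeated jumps remain countable later.
    return [(first, second) for first, second in groups if first in form_urls]

groups = [
    ("http://a.example/reg", "http://a.example/ok"),
    ("http://a.example/about", "http://a.example/reg"),
    ("http://a.example/reg", "http://a.example/ok"),
]
print(form_page_jump_pairs(groups, {"http://a.example/reg"}))
```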
  • Step 204 Filter out, from the form page jump pair set, the form page jump pairs that meet a preset condition.
  • The electronic device may further filter the form page jump pairs in the set according to a preset condition, so as to screen out jumps to the target page that may not have been completed through a form submission on the form page.
  • pages containing forms may also contain other page content, such as hyperlinks.
  • In a form page jump pair, the page of the second URL is reached directly from the page containing the form, but the jump is not necessarily caused by a form submission on that page; it may also be caused by other operations, such as clicking a hyperlink, which are unrelated to the conversion from the form page to the target page. Therefore, in this embodiment, jumps that are not caused by a form submission on the page containing the form are excluded, so that the form page jump pairs whose jumps are caused by form submission can be determined accurately.
  • The electronic device may filter the form page jump pair set as follows:
  • First, the electronic device can obtain the Document Object Model (DOM) of the page corresponding to the first URL of each form page jump pair in the set.
  • The DOM defines standard methods for accessing and manipulating the documents corresponding to the pages (such as HTML documents and Extensible Markup Language (XML) documents). It presents a document as a tree structure (such as a node tree) of elements, attributes, and text, representing the logical structure of the document, and provides methods for applications to access and process the document. FIG. 3 shows an example DOM tree structure, in which element 1 at node 301 has the attributes indicated by node 302 and is a hyperlink.
  • The electronic device can then parse the DOM. If an attribute field of the DOM contains a hyperlink attribute field (such as the HTML hyperlink "HTML href" or the scripting-language hyperlink "Javascript href"), the form page jump pair is determined to meet the preset condition and is filtered out of the set.
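A sketch of the hyperlink screening in step 204 follows. Python's `html.parser` is a tokenizer rather than a full DOM implementation, so it stands in for DOM parsing here. The text says a pair is filtered out when the DOM contains a hyperlink attribute field; this sketch assumes the narrower reading that a pair is dropped only when some `href` on the first page points at the pair's second URL, i.e. the jump could have come from a click rather than a form submission. Handling `javascript:` links by substring search is a further simplifying assumption.

```python
from html.parser import HTMLParser
from typing import Iterable, List, Mapping, Tuple
from urllib.parse import urljoin

Pair = Tuple[str, str]

class HrefCollector(HTMLParser):
    """Collects the value of every href attribute in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)

def reached_by_hyperlink(first_url: str, first_html: str, second_url: str) -> bool:
    """Assumed test: does the first page contain a hyperlink to the second URL?"""
    collector = HrefCollector()
    collector.feed(first_html)
    collector.close()
    for href in collector.hrefs:
        if href.lower().startswith("javascript:"):
            # Script hyperlink: crude assumption, look for the URL in the script text.
            if second_url in href:
                return True
        elif urljoin(first_url, href) == second_url:  # resolve relative links
            return True
    return False

def drop_hyperlink_jumps(pairs: Iterable[Pair], pages: Mapping[str, str]) -> List[Pair]:
    """Keep only the pairs whose jump could not have come from a hyperlink click."""
    return [(a, b) for a, b in pairs if not reached_by_hyperlink(a, pages.get(a, ""), b)]

pages = {"http://a.example/reg": '<form action="/ok"><input name="u"></form><a href="/help">Help</a>'}
pairs = [("http://a.example/reg", "http://a.example/ok"), ("http://a.example/reg", "http://a.example/help")]
print(drop_hyperlink_jumps(pairs, pages))
# [('http://a.example/reg', 'http://a.example/ok')]
```

The form's `action` attribute is not an `href`, so the jump to the form's submission target survives while the jump through the "Help" link is removed.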
  • Step 205 Generalize the URLs in the remaining form page jump pairs of the form page jump pair set, and determine the target page URLs.
  • The electronic device may then generalize the URLs in the remaining form page jump pairs and determine the target page URLs from the generalized URLs.
  • A target page URL is a generalized second URL of a form page jump pair.
  • A URL generally takes the form protocol://hostname[:port]/path. The protocol indicates the specified transport protocol.
  • The hostname indicates the Domain Name System (DNS) host name or the IP (Internet Protocol) address of the server that stores the page resources.
  • The port, which is optional, indicates the port number of the service on the host that provides the resources.
  • The path indicates the address of a file on the host. For example, in one URL the specified transport protocol is "http", the DNS host name of the server storing the page resources is "www.yydd.com", and the file address on the host is "landingpage/3gsem/message.html".
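Python's standard `urllib.parse.urlsplit` separates these components. The URL below is assembled from the three components quoted above and is an assumption about what the example URL looks like. Note that `urlsplit` keeps the leading slash in the path and reports `port` as `None` when no port is written.

```python
from urllib.parse import urlsplit

# Assumed example URL, assembled from the protocol, host name, and file address above.
parts = urlsplit("http://www.yydd.com/landingpage/3gsem/message.html")
print(parts.scheme)    # http
print(parts.hostname)  # www.yydd.com
print(parts.port)      # None (no explicit port; http defaults to 80)
print(parts.path)      # /landingpage/3gsem/message.html
```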
  • Generalizing a URL may be the process of removing the options from the URL and converting it into a URL in a standard format.
  • The same page may carry different options in its URL (such as parameters passed by the web page) because of factors such as the page from which it was reached.
  • By generalizing URLs, the electronic device can remove interfering items from them, which helps accurately determine whether a page containing a form converts to the target page.
  • The target page may be preset, or the electronic device may determine it by comparing the generalized second URLs of the form page jump pairs.
  • For example, the electronic device may preset a target page URL set and match the generalized second URL of each form page jump pair against that set.
  • If a generalized second URL matches, it may be determined as a target page URL; the pages corresponding to the two URLs of that form page jump pair (generated in step 203) are then a form page and a target page, and the pair represents a conversion from the form page to the target page.
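Matching against a preset target page URL set might look like the following. The generalization rule (drop the query string and fragment) and the preset set are hypothetical; the text fixes neither. The sketch also assumes the preset set already holds generalized URLs.

```python
from typing import Iterable, List, Set, Tuple
from urllib.parse import urlsplit, urlunsplit

Pair = Tuple[str, str]

def generalize(url: str) -> str:
    """Hypothetical generalization rule: drop the query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

def match_preset_targets(pairs: Iterable[Pair], preset_targets: Set[str]) -> List[Pair]:
    """Return generalized pairs whose generalized second URL is a preset target page URL."""
    matched: List[Pair] = []
    for first, second in pairs:
        target = generalize(second)
        if target in preset_targets:
            matched.append((generalize(first), target))
    return matched

pairs = [
    ("http://a.example/reg?src=ad1", "http://a.example/ok?uid=7"),
    ("http://a.example/reg?src=ad1", "http://a.example/faq"),
]
print(match_preset_targets(pairs, {"http://a.example/ok"}))
# [('http://a.example/reg', 'http://a.example/ok')]
```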
  • Alternatively, the electronic device may de-parameterize the URLs in the remaining form page jump pairs of the set (e.g., remove all parameter items), and then merge identical second URLs across the de-parameterized form page jump pairs into a target page URL; the corresponding page is a target page.
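The de-parameterize-and-merge variant can be sketched with `urllib.parse`. "Removing all parameter items" is interpreted here, as an assumption, as dropping the query string and fragment; the function names are hypothetical.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit, urlunsplit

Pair = Tuple[str, str]

def deparameterize(url: str) -> str:
    """Remove the query string (parameter items) and fragment from a URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

def determine_target_urls(pairs: Iterable[Pair]) -> Tuple[List[Pair], List[str]]:
    """De-parameterize every pair, then merge identical second URLs into target page URLs."""
    generalized = [(deparameterize(a), deparameterize(b)) for a, b in pairs]
    # Identical second URLs collapse into a single target page URL.
    targets = sorted({second for _, second in generalized})
    return generalized, targets

pairs = [
    ("http://a.example/reg?src=ad1", "http://a.example/ok?uid=1"),
    ("http://a.example/reg?src=ad2", "http://a.example/ok?uid=2"),
]
generalized, targets = determine_target_urls(pairs)
print(targets)  # ['http://a.example/ok']
```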
  • Step 206 For each target page URL, determine the generalized first URL in at least one form page jump pair containing it as a form page URL that converts to the target page URL.
  • For each target page URL determined in step 205, the electronic device may find at least one form page jump pair containing it in the form page jump pair set filtered in step 204, and determine the generalized first URL of each such pair as a form page URL that converts to that target page URL.
  • the electronic device can establish a mapping relationship between the form page URL and the target page URL.
  • Each target page URL may correspond to one or more form page URLs.
  • The form page URLs may further be merged, counted, and so on, to calculate the conversion rate from a form page to the target page.
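Step 206, together with the optional counting, can be sketched as grouping the generalized pairs by target page URL. This sketch assumes the generalized pairs from step 205 are available as (form page URL, target page URL) tuples; keeping per-form-page counts is one possible way to support the conversion-rate calculation mentioned above.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

def form_urls_by_target(generalized_pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Map each target page URL to the form page URLs converting to it, with jump counts."""
    mapping: Dict[str, Counter] = defaultdict(Counter)
    for form_url, target_url in generalized_pairs:
        mapping[target_url][form_url] += 1
    return {target: dict(counts) for target, counts in mapping.items()}

pairs = [
    ("http://a.example/reg", "http://a.example/ok"),
    ("http://a.example/reg", "http://a.example/ok"),
    ("http://a.example/reg2", "http://a.example/ok"),
]
print(form_urls_by_target(pairs))
# {'http://a.example/ok': {'http://a.example/reg': 2, 'http://a.example/reg2': 1}}
```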
  • FIG. 4a and FIG. 4b show an application scenario of the method of this embodiment for finding the conversion relationship between a form page and a target page. The method can be applied to the background server of a web advertisement promotion website, the analysis system of an advertisement distributor, and the like, to determine page conversion relationships in web advertisements; the results can further be used to calculate the conversion rates of different form pages to a target page.
  • FIG. 4a illustrates a conversion from form page 401 to target page 402.
  • FIG. 4b illustrates a conversion from form page 403 to target page 404.
  • FIG. 4a and FIG. 4b thus show two different kinds of target page.
  • Target page 402 is the page jumped to from form page 401 after the form submission succeeds.
  • Target page 404 is another form page, jumped to from form page 403 after the form submission.
  • the user can open various pages, such as a form page 401, a form page 403, etc., through a browser application run by the terminal device.
  • the electronic device to which the method of the embodiment is applied may obtain a page access log of the browser from a background server that provides support for the browser application, parse the page access log, and decompose the URL and the jump URL group of all accessed pages.
  • Each of the jump URL groups may include two URLs, and the page corresponding to the second URL is directly jumped from the page corresponding to the first URL.
  • The electronic device may then filter the URLs of all accessed pages and add the URLs of pages containing forms (such as the URLs of form page 401 and form page 403) to the form page URL set. Next, the electronic device can match the first URL of each jump URL group against the form page URL set; if a URL in the form page URL set matches the first URL of at least one jump URL group, those jump URL groups are used to generate the form page jump pair set. The electronic device can then filter out of the form page jump pair set the form page jump pairs that meet the preset condition (such as a hyperlink in the page corresponding to the first URL).
  • The electronic device can then generalize the URLs in the remaining form page jump pairs and determine the target page URLs (e.g., those of target page 402 and target page 404) from the generalized second URLs. For each target page URL, the electronic device may determine the generalized first URL in at least one form page jump pair containing it as a form page URL that converts to that target page URL. Optionally, the electronic device may establish a mapping between the determined form page URLs and the target page URL, and further calculate the conversion rate of each form page to the target page.
  • The conversion rate may be the probability that a form page jumps to the target page, and may be calculated by known methods, such as dividing the number of conversions from form page a to target page b by the number of times form page a was opened; this is not described further here. Further, in this application scenario, for the advertisements of the same advertisement information provider, the electronic device may calculate the conversion rates from form page A and from form page B to target page C, providing a reference for the advertisement distributor's ad serving.
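The example calculation above (conversions from form page a to target page b divided by the number of times form page a was opened) reduces to a simple ratio. The counts below are made-up figures for illustration only, not data from the application.

```python
def conversion_rate(conversions: int, form_page_opens: int) -> float:
    """Conversions from a form page to a target page / times the form page was opened."""
    if form_page_opens <= 0:
        raise ValueError("the form page must have been opened at least once")
    return conversions / form_page_opens

# Hypothetical counts for form pages A and B converting to target page C.
rates = {"A": conversion_rate(30, 200), "B": conversion_rate(12, 150)}
print(rates)  # {'A': 0.15, 'B': 0.08}
```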
  • The present application provides an embodiment of an apparatus for finding a conversion relationship between a form page and a target page; this apparatus embodiment corresponds to the method embodiment shown in FIG.
  • the device can be specifically applied to an electronic device.
  • the device 500 for searching a form page and a target page conversion relationship in this embodiment includes: a decomposition module 501, a first screening module 502, a generation module 503, a second screening module 504, and a first determination module 505. And a second determining module 506.
  • The decomposition module 501 may be configured to parse the page access log and decompose out the URLs of all accessed pages and the jump URL groups, where each jump URL group includes two URLs and the page corresponding to the second URL is jumped to directly from the page corresponding to the first URL; the first screening module 502 may be configured to filter the URLs of all accessed pages and add the URLs of pages containing a form to the form page URL set.
  • The generation module 503 may be configured to generate the form page jump pair set from the jump URL groups whose first URL belongs to the form page URL set; the second screening module 504 may be configured to filter out, from the form page jump pair set, the form page jump pairs that meet the preset condition; the first determining module 505 may be configured to generalize the URLs in the remaining form page jump pairs of the set and determine the target page URLs, where a target page URL is a generalized second URL; and the second determining module 506 may be configured to determine, for each target page URL, the generalized first URL in at least one form page jump pair containing it as a form page URL that converts to that target page URL.
  • the modules or units recorded in the apparatus 500 for finding the conversion relationship between a form page and a target page correspond to the steps in the method described with reference to FIG. 2.
  • therefore, the operations and features described above for the method are equally applicable to the apparatus 500 and the modules or units contained therein, and are not repeated here.
  • the apparatus 500 described above also includes other well-known structures, such as processors and memories; to avoid unnecessarily obscuring the embodiments of the present disclosure, these well-known structures are not shown in FIG. 5.
  • Referring to FIG. 6, a schematic structural diagram of a computer system 600 suitable for implementing the electronic device of the embodiments of the present application is shown.
  • the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603.
  • the RAM 603 also stores various programs and data required for the operation of the system 600.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also coupled to bus 604.
  • the following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, etc.; an output portion 607 including, for example, a cathode ray tube (CRT) or a liquid crystal display (LCD) and a speaker; a storage portion 608 including a hard disk or the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet.
  • A drive 610 is also connected to the I/O interface 605 as needed.
  • a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like, is mounted on the drive 610 as needed so that a computer program read therefrom is installed into the storage portion 608 as needed.
  • an embodiment of the present application includes a computer program product comprising a computer program tangibly embodied on a machine readable medium, the computer program comprising program code for executing the method illustrated in the flowchart.
  • the computer program can be downloaded and installed from the network via communication portion 609, and/or installed from removable media 611.
  • the units involved in the embodiments of the present application may be implemented by software or by hardware.
  • the described module may also be disposed in the processor.
  • for example, a processor may be described as including a decomposition module, a first screening module, a generating module, a second screening module, a first determining module, and a second determining module.
  • the names of these modules do not, in some cases, constitute a limitation on the modules themselves.
  • for example, the decomposition module may also be described as "a module configured to parse the page access log and decompose the uniform resource locators (URLs) of all accessed pages and the jump URL groups".
  • the present application further provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus described in the foregoing embodiments, or may exist separately without being assembled into a terminal.
  • the computer-readable storage medium stores one or more programs which, when executed by one or more processors, cause the device to: parse the page access log to decompose the uniform resource locators (URLs) of all accessed pages and the jump URL groups, where each jump URL group includes two URLs and the page corresponding to the second URL is jumped to directly from the page corresponding to the first URL; filter the URLs of all accessed pages and add the URLs of pages containing forms to the form page URL set; generate a form page jump pair set from the jump URL groups whose first URL belongs to the form page URL set; remove, from the form page jump pair set, the form page jump pairs that meet a preset condition; generalize the URLs in each remaining form page jump pair in the set and determine target page URLs, where a target page URL is a generalized second URL; and, for each target page URL, determine the generalized first URL in the at least one form page jump pair in which it appears as a form page URL that converts to that target page URL.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

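The present application discloses a method and apparatus for finding the conversion relationship between form pages and target pages. A specific embodiment of the method includes the following steps. A page access log is parsed to decompose the URLs of all accessed pages and the jump URL groups. All URLs are filtered, and the URLs of pages containing forms are added to a form page URL set. A form page jump pair set is generated from the jump URL groups whose first URL belongs to the form page URL set. The form page jump pairs that meet a preset condition are removed from that set. The URLs in each remaining form page jump pair are generalized to determine target page URLs. Finally, for each target page URL, the corresponding generalized first URL is determined as a form page URL that converts to that target page URL. This embodiment can improve the accuracy of finding the conversion relationship between form pages and target pages.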

Description

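Method and Apparatus for Finding the Conversion Relationship Between Form Pages and Target Pages
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 201610037371.6, filed on January 20, 2016, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, specifically to the field of Internet technology, and more particularly to a method and apparatus for finding the conversion relationship between form pages and target pages.
Background
In the Internet field, a "conversion" is often used to describe a jump from one page to another. For example, in e-commerce, a user who goes from a merchant's promotion page to a page the merchant wants visitors to open on the website performs one "conversion". Such a page is also called a target page, for example a page needed for registering, placing an order, or paying. A form conversion is a conversion achieved through a form, the part of a web page mainly responsible for data collection, and it usually happens after a user's input or click operation.

Existing methods for capturing form conversion pages often decide whether a form conversion has occurred by collecting single-page statistics on the form pages users open or on the target pages reached after a successful conversion. Such a judgment often reflects only isolated page views and cannot reveal real conversion behavior; for example, it is easily gamed by cheating. Moreover, the target pages of successful conversions take many forms, such as a page announcing successful registration, the next step of a registration flow, or even an error page shown when the server cannot provide normal information. A single-page access model therefore has difficulty judging form conversion accurately.

The drawback of the prior art is that it judges form page conversion from a single page without considering the relationships between pages. This leads to low accuracy in determining form conversion behavior.
Summary
The purpose of the present application is to propose an improved method and apparatus for finding the conversion relationship between form pages and target pages, to solve the technical problems mentioned in the Background section above.
In a first aspect, the present application provides a method for finding the conversion relationship between form pages and target pages, the method including: parsing a page access log to decompose the uniform resource locators (URLs) of all accessed pages and the jump URL groups, where each jump URL group includes two URLs and the page corresponding to the second URL is jumped to directly from the page corresponding to the first URL; filtering the URLs of all accessed pages and adding the URLs of pages containing forms to a form page URL set; generating a form page jump pair set from the jump URL groups whose first URL belongs to the form page URL set; removing, from the form page jump pair set, the form page jump pairs that meet a preset condition; generalizing the URLs in each remaining form page jump pair in the form page jump pair set to determine target page URLs, where a target page URL is a generalized second URL; and, for each target page URL, determining the generalized first URL in the at least one form page jump pair in which the target page URL appears as a form page URL that converts to that target page URL.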
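In some embodiments, the jump URL groups are obtained as follows: obtaining access request information for each accessed page URL; obtaining, from the access request information, the linking page URL that links to the current page URL; and generating a jump URL group from the linking page URL and the current page URL, where the linking page URL is the first URL and the current page URL is the second URL.
In some embodiments, page URLs that satisfy one of the following conditions are selected and added to the form page URL set: the page URL matches a preset URL pattern; the page content corresponding to the page URL contains a preset keyword.
In some embodiments, removing the form page jump pairs that meet a preset condition from the form page jump pair set includes: obtaining the standard object model of the first URL of each form page jump pair in the form page jump pair set; and parsing the standard object model and, if its attribute fields include a hyperlink attribute field, determining the form page jump pair as one that meets the preset condition and removing it.
In some embodiments, generalizing the URLs in each remaining form page jump pair in the form page jump pair set to determine target page URLs includes: removing parameters from the URLs in each remaining form page jump pair in the form page jump pair set; and, across the parameter-stripped form page jump pairs, merging identical second URLs to serve as the target page URLs.
In a second aspect, the present application provides an apparatus for finding the conversion relationship between form pages and target pages, the apparatus including: a decomposition module, configured to parse a page access log and decompose the uniform resource locators (URLs) of all accessed pages and the jump URL groups, where each jump URL group includes two URLs and the page corresponding to the second URL is jumped to directly from the page corresponding to the first URL; a first screening module, configured to filter the URLs of all accessed pages and add the URLs of pages containing forms to a form page URL set; a generating module, configured to generate a form page jump pair set from the jump URL groups whose first URL belongs to the form page URL set; a second screening module, configured to remove, from the form page jump pair set, the form page jump pairs that meet a preset condition; a first determining module, configured to generalize the URLs in each remaining form page jump pair in the form page jump pair set to determine target page URLs, where a target page URL is a generalized second URL; and a second determining module, configured to, for each target page URL, determine the generalized first URL in the at least one form page jump pair in which the target page URL appears as a form page URL that converts to that target page URL.
In some embodiments, the decomposition module includes the following units for obtaining the jump URL groups: an access request information obtaining unit, configured to obtain access request information for each accessed page URL; a linking page URL obtaining unit, configured to obtain, from the access request information, the linking page URL that links to the current page URL; and a generating unit, configured to generate a jump URL group from the linking page URL and the current page URL, where the linking page URL is the first URL and the current page URL is the second URL.
In some embodiments, page URLs that satisfy one of the following conditions are selected and added to the form page URL set: the page URL matches a preset URL pattern; the page content corresponding to the page URL contains a preset keyword.
In some embodiments, the second screening module includes: a standard object model obtaining unit, configured to obtain the standard object model of the first URL of each form page jump pair in the form page jump pair set; and a determining and removing unit, configured to parse the standard object model and, if its attribute fields include a hyperlink attribute field, determine the form page jump pair as one that meets the preset condition and remove it.
In some embodiments, the first determining module includes: a processing unit, configured to remove parameters from the URLs in each remaining form page jump pair in the form page jump pair set; and a merging unit, configured to merge, across the parameter-stripped form page jump pairs, identical second URLs as the target page URLs.
The method and apparatus provided by the present application first parse a page access log to decompose the URLs of all accessed pages and the jump URL groups. Each jump URL group includes two URLs, and the page corresponding to the second URL is jumped to directly from the page corresponding to the first URL. Next, the URLs of all accessed pages are filtered, and the URLs of pages containing forms are added to a form page URL set. A form page jump pair set is then generated from the jump URL groups whose first URL belongs to the form page URL set, and the form page jump pairs that meet a preset condition are removed from it. The URLs in each remaining form page jump pair are generalized to determine target page URLs, where a target page URL is a generalized second URL. Finally, for each target page URL, the generalized first URL in the at least one form page jump pair in which it appears is determined as a form page URL that jumps to that target page URL. In this way, at least one form page URL that jumps to each target page URL is determined. Because the jump relationships between pages are fully taken into account, this method and apparatus improve the accuracy of determining form conversion behavior.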
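Brief Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent from the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 shows an exemplary system architecture to which embodiments of the present application can be applied;
FIG. 2 is a flowchart of one embodiment of the method for finding the conversion relationship between form pages and target pages according to the present application;
FIG. 3 is a schematic diagram of the DOM tree structure of a page document;
FIG. 4a and FIG. 4b are schematic diagrams of an application scenario of the method for finding the conversion relationship between form pages and target pages according to the present application;
FIG. 5 is a schematic structural diagram of one embodiment of the apparatus for finding the conversion relationship between form pages and target pages according to the present application;
FIG. 6 is a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the related invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.
It should be noted that, where no conflict arises, the embodiments in the present application and the features in those embodiments may be combined with each other. The present application is described in detail below with reference to the drawings and in combination with the embodiments.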
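FIG. 1 shows an exemplary system architecture 100 to which embodiments of the present application can be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101 and 102, a network 103, and a server 104. The network 103 is the medium that provides communication links between the terminal devices 101, 102 and the server 104. The network 103 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
The terminal devices 101, 102 may interact with the server 104 through the network 103 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, such as browser applications, search applications, financial management applications, shopping applications, map applications, social platform applications, email clients, and instant messaging tools.
The terminal devices 101, 102 may be various electronic devices on which browser applications and the like can be installed, including but not limited to smartphones, smartwatches, tablet computers, personal digital assistants, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (MPEG-4 Part 14) players, laptop computers, and desktop computers.
The server 104 may be a server providing various services, for example a back-end server that supports browser applications and the like on the terminal devices 101, 102. The server may store, generate, or otherwise process received data and feed the processing results back to the terminal devices.
It should be noted that the method for finding the conversion relationship between form pages and target pages provided by the embodiments of the present application is generally executed by the server 104, although execution by the terminal devices 101, 102 is not excluded. Correspondingly, the apparatus for finding the conversion relationship between form pages and target pages is generally provided in the server 104, although it may also be provided in the terminal devices 101, 102.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers as the implementation requires.
Referring to FIG. 2, a flow 200 of one embodiment of the method for finding the conversion relationship between form pages and target pages is shown. This embodiment is mainly illustrated with the method applied to an electronic device having a certain computing capability, which may be, for example, the server 104 shown in FIG. 1. The method for finding the conversion relationship between form pages and target pages includes the following steps: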
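Step 201: parse the page access log and decompose the uniform resource locators (URLs) of all accessed pages and the jump URL groups.
In this embodiment, the electronic device may parse the page access log and decompose the URLs (Uniform Resource Locators) of all accessed pages. It may also decompose multiple jump URL groups according to the jump relationships between pages. A URL is a concise representation of the location of, and access method for, a resource available on the Internet; it is the address of a standard resource on the Internet. Every file on the Internet has a unique URL; for example, each page resource corresponds to one URL.
Applications running on terminal devices often include multiple pages. The terminal device on which an application runs, or the back-end server supporting it, can generate a page access log from the page access records the application produces. A page access log generated by the terminal device may include information such as the pages the application has accessed through that terminal device and the access times. A page access log generated by the back-end server may include information such as the pages the application has accessed through each terminal device and the access times. It may also include the page request information the back-end server receives from the application via each terminal device, the response information with which the back-end server provides the relevant pages for display on each terminal device, and the page access sequence produced by one visit of the application (for example, page A to page B to page C). The application may be, for example, a browser application or another application capable of pushing information (such as "Alipay").

The electronic device may obtain the page access log locally or remotely. Specifically, when the electronic device is the back-end server supporting the application, it may obtain the page access log directly from local storage. Otherwise, it may obtain the page access log from the back-end server through a wired or wireless connection. The wireless connection includes but is not limited to 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, UWB (ultra wideband), and other wireless connections now known or developed in the future.
Here, each jump URL group may include two URLs. For example, jump URL group i is denoted <URLi1, URLi2>, where URLi1 and URLi2 are respectively the first URL and the second URL of jump URL group i. The page corresponding to URLi2 may be jumped to directly from the page corresponding to URLi1. The electronic device may decompose the URL of each page in turn according to the page access order in the page access log, and take the URLs of two consecutively accessed pages as one URL group. For example, consider the page access sequence "page A to page B to page C" produced by one visit of an application in the page access log. From it, the electronic device may decompose the URL "URLA" of page A, the URL "URLB" of page B, and the URL "URLC" of page C. It may also decompose the jump URL groups "<URLA, URLB>" and "<URLB, URLC>".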
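The pairing rule in this paragraph can be sketched in a few lines of Python. This assumes one visit's page access sequence is already available as an ordered list of URLs; the function name `decompose_visit` is hypothetical:

```python
from typing import List, Set, Tuple

JumpGroup = Tuple[str, str]  # <first URL, second URL>


def decompose_visit(sequence: List[str]) -> Tuple[Set[str], List[JumpGroup]]:
    """Split one visit's page access sequence into URLs and jump URL groups.

    Every two consecutively accessed pages form one jump URL group, in which
    the second page was jumped to directly from the first.
    """
    urls = set(sequence)
    groups = list(zip(sequence, sequence[1:]))
    return urls, groups


urls, groups = decompose_visit(["URLA", "URLB", "URLC"])
print(sorted(urls))  # ['URLA', 'URLB', 'URLC']
print(groups)        # [('URLA', 'URLB'), ('URLB', 'URLC')]
```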
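In some optional implementations of this embodiment, the electronic device may obtain the jump URL groups in three steps. First, it obtains the access request information of each accessed page URL. Next, it obtains from the access request information the linking page URL that links to the current page URL. Then, it takes the linking page URL and the current page URL as a jump URL group, where the linking page URL is the first URL and the current page URL is the second URL.

It can be understood that when a user accesses a page over the Internet, the user's terminal device may first send page request information to the back-end server. The back-end server parses that information and provides the relevant page resources to the terminal device. Accordingly, each accessed page URL may correspond to one piece of page request information. The page request information may include the page to be accessed, how the page is to be processed, the access path, and other information, which may be carried in the header of the page access request. For example, when a browser sends a page access request to a web server, the request header often includes Referer (access source) information, that is, the page from which the requested page was linked. Here, the page requested by the page access request is the current page, and the page it was linked from is the linking page. The electronic device may obtain this source information and thereby obtain the current page URL and the linking page URL.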
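The snippet below is a sketch of the Referer-based variant. It assumes the page access log uses the Apache/Nginx "combined" format, where the Referer header is the second-to-last quoted field. The log format, the `site` origin parameter, and the function name are assumptions for illustration only:

```python
import re
from typing import Optional, Tuple

# Assumed "combined" log format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"'
)


def jump_group_from_log_line(line: str, site: str) -> Optional[Tuple[str, str]]:
    """Return <linking page URL, current page URL>, or None if there is no Referer."""
    match = COMBINED.match(line)
    if match is None:
        return None  # the line does not follow the assumed format
    referer = match.group("referer")
    if referer in ("", "-"):
        return None  # direct visit: no linking page is recorded
    current = site.rstrip("/") + match.group("path")  # request path is relative
    return referer, current


line = ('1.2.3.4 - - [20/Jan/2016:10:00:00 +0800] "GET /done.html HTTP/1.1" 200 512 '
        '"http://www.yydd.com/form.html" "Mozilla/5.0"')
print(jump_group_from_log_line(line, "http://www.yydd.com"))
# ('http://www.yydd.com/form.html', 'http://www.yydd.com/done.html')
```

Lines without a Referer are skipped because they record a direct visit and therefore no linking page.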
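Step 202: filter the URLs of all accessed pages and add the URLs of pages containing forms to the form page URL set.
In this embodiment, the electronic device may analyze the URL pattern or the page content of each accessed page URL decomposed in step 201. In this way it selects the URLs of pages containing forms and adds them to the form page URL set.
Forms are responsible for data collection in a page. Generally, a form may include a form tag, form fields, and form buttons:
- The form tag declares the form. It may contain the URL of the Common Gateway Interface (CGI) program used to process the form data, as well as the method by which the data is submitted to the server.
- Form fields may include one or more of text boxes, password boxes, hidden fields, multi-line text boxes, check boxes, radio buttons, drop-down select boxes, file upload boxes, and so on.
- Form buttons may include submit buttons, reset buttons, custom buttons, and the like. They send data to the server's CGI script or cancel input, and they can also control other processing for which processing scripts are defined.

A form in a page can be defined by a form tag, such as the "<form>" tag used to create HTML (HyperText Markup Language) forms. Whether a page contains a form may also be reflected in the page's URL pattern. For example, a page containing a form generates a form submission URL when the form is submitted, and that URL may contain the form submission method (such as "POST").
Thus, the electronic device may determine whether a page contains a form using analysis methods such as matching the page content against preset keywords (for example, the "<form>" tag keyword). Another such method is matching the URL against a preset pattern (for example, a form submission URL containing the submission method "POST"). These methods are not described further here. The electronic device may add the URLs of the selected form-containing pages to the form page URL set.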
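Here is a minimal sketch of this screening step. It assumes page contents have already been fetched into a dictionary keyed by URL. Both presets are hypothetical examples: a `<form` tag keyword for page content, and a `method=POST` query item standing in for "a form submission URL that carries the submission method":

```python
import re
from typing import Dict, Iterable, Set

# Hypothetical presets; a real deployment would configure its own.
FORM_KEYWORD = re.compile(r"<form[\s>]", re.IGNORECASE)
FORM_URL_PATTERN = re.compile(r"[?&]method=POST\b")


def build_form_page_url_set(urls: Iterable[str], contents: Dict[str, str]) -> Set[str]:
    """Keep URLs that match the URL pattern or whose page content has the keyword."""
    form_pages = set()
    for url in urls:
        if FORM_URL_PATTERN.search(url) or FORM_KEYWORD.search(contents.get(url, "")):
            form_pages.add(url)
    return form_pages


contents = {
    "http://a.com/reg.html": "<html><form action='/submit'>...</form></html>",
    "http://a.com/news.html": "<p>news</p>",
}
urls = ["http://a.com/reg.html", "http://a.com/news.html", "http://a.com/submit?method=POST"]
print(sorted(build_form_page_url_set(urls, contents)))
# ['http://a.com/reg.html', 'http://a.com/submit?method=POST']
```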
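Step 203: generate a form page jump pair set from the jump URL groups whose first URL belongs to the form page URL set.
In this embodiment, the electronic device may match the first URL of each jump URL group against the URLs in the form page URL set. If a match is found, the page corresponding to the first URL of that jump URL group is determined to be a page containing a form. The pages corresponding to the two URLs of that jump URL group may therefore be a form page and a target page between which a conversion occurred. The electronic device may then generate the form page jump pair set from all the matched jump URL groups. Here, a form page jump pair denotes a pair of URLs in which the page corresponding to the second URL is jumped to from the form page corresponding to the first URL.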
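The matching described here reduces to a set-membership filter. Here is a short sketch; the function name is hypothetical:

```python
from typing import Iterable, List, Set, Tuple


def build_form_page_jump_pairs(
    groups: Iterable[Tuple[str, str]], form_page_urls: Set[str]
) -> List[Tuple[str, str]]:
    """Keep the jump URL groups whose first URL is in the form page URL set."""
    return [(first, second) for first, second in groups if first in form_page_urls]


groups = [("http://a.com/reg.html", "http://a.com/ok.html"),
          ("http://a.com/news.html", "http://a.com/about.html")]
print(build_form_page_jump_pairs(groups, {"http://a.com/reg.html"}))
# [('http://a.com/reg.html', 'http://a.com/ok.html')]
```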
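Step 204: remove from the form page jump pair set the form page jump pairs that meet a preset condition.
In this embodiment, the electronic device may further filter the form page jump pairs in the form page jump pair set according to a preset condition. The goal is to remove page jumps to target pages that were probably not completed by submitting the form on the form page.
Those skilled in the art will understand that a page containing a form may also contain other content, such as hyperlinks. In that case, the two URLs of a form page jump pair correspond to a page containing a form and a page jumped to directly from it. However, the jump was not necessarily caused by a form submission on that page. It may have been caused by another operation, such as clicking a hyperlink, and such a jump is unrelated to the conversion between the form page and the target page. This embodiment therefore excludes jumps not caused by a form submission on the form-containing page. What remains are the form page jump pairs whose jumps were caused by a form submission on the form-containing page.
In some optional implementations of this embodiment, the electronic device may filter the form page jump pairs in the form page jump pair set as follows:
First, the electronic device may obtain the Document Object Model (DOM) of the page corresponding to the first URL of each form page jump pair in the form page jump pair set. The DOM defines a standard way to access and manipulate the document corresponding to the page, such as a HyperText Markup Language (HTML) document or an Extensible Markup Language (XML) document. It presents the document as a tree structure (such as a node tree) with elements, attributes, and text, which represents the logical structure of the document and the ways an application accesses and processes it. FIG. 3 shows an example of a DOM tree structure: in it, the attribute of element 1 at node 301, shown at node 302, is a hyperlink;
Then, the electronic device may parse the DOM. If the attribute fields of the DOM contain a hyperlink attribute field (such as the HTML hyperlink "HTML href" or a script-language hyperlink "Javascript href"), the electronic device determines the form page jump pair as one that meets the preset condition and removes it. The hyperlink attribute fields of a page may include, but are not limited to, at least one of the following:
- the "HTML href" attribute field "<a href='xxx'></a>", which can specify the URL of the hyperlink target;
- the "Javascript href" attribute field "window.location.href", which indicates the URL of the page the hyperlink points to;
- the "Javascript href" attribute field "window.history.back", which indicates that the page is the previous page returned to by going back from another page;
- the "Javascript href" attribute field "window.navigate", which indicates the page specified by its parameter, jumped to from another page;
- the "Javascript href" attribute field "self.location", which indicates that the page the hyperlink points to is opened from the current page;
- the "Javascript href" attribute field "top.location", which indicates that the page the hyperlink points to is opened from the top-level page.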
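This filter can be sketched with Python's standard `html.parser` module, assuming each form page's HTML is available in a dictionary keyed by URL. Matching the listed JavaScript calls as plain substrings is a simplifying assumption, not a full script analysis. The class and function names are hypothetical:

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple

# Script navigation calls listed above, matched as plain substrings (a simplification).
JS_HREF_PATTERNS = ("window.location.href", "window.history.back",
                    "window.navigate", "self.location", "top.location")


class HyperlinkDetector(HTMLParser):
    """Sets has_hyperlink if the document has an <a href> or a JS navigation call."""

    def __init__(self) -> None:
        super().__init__()
        self.has_hyperlink = False

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.has_hyperlink = True
        # Inline handlers such as onclick="window.location.href='...'"
        if any(value and any(p in value for p in JS_HREF_PATTERNS) for _, value in attrs):
            self.has_hyperlink = True

    def handle_data(self, data):
        # The text of <script> elements is also delivered here.
        if any(p in data for p in JS_HREF_PATTERNS):
            self.has_hyperlink = True


def has_hyperlink(html: str) -> bool:
    detector = HyperlinkDetector()
    detector.feed(html)
    detector.close()
    return detector.has_hyperlink


def drop_hyperlink_pairs(pairs: List[Tuple[str, str]],
                         contents: Dict[str, str]) -> List[Tuple[str, str]]:
    """Remove pairs whose first (form) page contains a hyperlink attribute field."""
    return [(f, s) for f, s in pairs if not has_hyperlink(contents.get(f, ""))]
```

Applied literally as here, the condition removes any pair whose form page contains any link at all. A production filter might narrow it, for example to links whose target matches the pair's second URL.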
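Step 205: generalize the URLs in each remaining form page jump pair in the form page jump pair set to determine the target page URLs.
In this embodiment, the electronic device may next generalize the URLs in each remaining form page jump pair in the form page jump pair set, and determine the target page URLs from the generalized URLs. A target page URL is the generalized second URL of a form page jump pair.
The general syntax of a URL is as follows (items in square brackets [] are optional):
protocol://hostname[:port]/path/[;parameters][?query][#fragment]
In this syntax:
- protocol specifies the transfer protocol;
- hostname is the Domain Name System (DNS) host name or the IP (Internet Protocol) address of the server storing the page resource;
- port is the port number of the service on the host;
- path is a directory or file address on the host;
- parameters specifies special parameters;
- query holds the parameters passed to the web page; multiple parameters are separated by "&", and each parameter's name and value are separated by "=";
- fragment is a string that specifies a fragment within the page resource.

For example, in the URL "http://www.yydd.com/landingpage/3gsem/message.html?u=137****5423", the transfer protocol is "http" and the host name of the server storing the page resource is "www.yydd.com". The file address on the host is "landingpage/3gsem/message.html", and the parameter passed to the web page is "u=137****5423".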
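Python's standard `urllib.parse` module splits a URL into the components named above. Applied to the example URL, it yields the following. Note that `urlparse` reports the path with a leading slash and separates any `;parameters` into `params`:

```python
from urllib.parse import parse_qs, urlparse

url = "http://www.yydd.com/landingpage/3gsem/message.html?u=137****5423"
parts = urlparse(url)

print(parts.scheme)           # http
print(parts.hostname)         # www.yydd.com
print(parts.port)             # None (no explicit port in this URL)
print(parts.path)             # /landingpage/3gsem/message.html
print(parts.query)            # u=137****5423
print(parse_qs(parts.query))  # {'u': ['137****5423']}
```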
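Generalizing a URL may be the process of removing its optional items so as to transform it into a URL in standard format. For example, removing the optional query parameter item from the URL
"http://www.yydd.com/landingpage/3gsem/message.html?u=137****5423"
transforms it into
"http://www.yydd.com/landingpage/3gsem/message.html".
In practice, the optional items in the URL of the same page (such as query parameters) may differ depending on factors such as which page jumped to it. Generalization by the electronic device in this embodiment removes these interfering items from the URL. This helps to determine accurately whether a page containing a form converted to a target page.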
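Here is a minimal generalization function in this spirit. It drops `;parameters`, `?query` and `#fragment` and keeps the scheme, host and any explicit port. Keeping the port is an assumption, since the source example only shows the query being removed. The function name `generalize` is hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit


def generalize(url: str) -> str:
    """Remove the optional ;parameters, ?query and #fragment items of a URL.

    The scheme, host and (if present) port are kept, on the assumption that
    they identify the page.
    """
    parts = urlsplit(url)  # urlsplit leaves ;parameters inside the path
    path = parts.path.split(";", 1)[0]  # drop ;parameters attached to the path
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(generalize("http://www.yydd.com/landingpage/3gsem/message.html?u=137****5423"))
# http://www.yydd.com/landingpage/3gsem/message.html
```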
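Here, the target pages may be preset by the electronic device, or determined by the electronic device by comparing the generalized second URLs of the generalized form page jump pairs. In some implementations, for example, the electronic device may keep a preset target page URL set and match the generalized second URL of each form page jump pair against it. If the generalized second URL is found in the preset target page URL set, that second URL can be determined to be a target page URL. Together with step 203, this means the two URLs in that form page jump pair correspond to a form page and a target page respectively, and the pair represents one conversion from the form page to the target page. In other implementations, the electronic device may instead remove parameters from the URLs in each remaining form page jump pair in the form page jump pair set (for example, by removing all parameter items). It may then merge identical second URLs across the parameter-stripped form page jump pairs to obtain the target page URLs; the corresponding pages are the target pages.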
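Both alternatives in this paragraph can be sketched in one function. It assumes the form page jump pairs have already been generalized, and the function name is hypothetical:

```python
from typing import Iterable, Optional, Set, Tuple


def determine_target_urls(
    pairs: Iterable[Tuple[str, str]], preset_targets: Optional[Set[str]] = None
) -> Set[str]:
    """Determine target page URLs from generalized form page jump pairs.

    With a preset target page URL set, a generalized second URL counts as a
    target only if it is in that set. Without one, identical second URLs are
    merged (a set keeps one copy of each) and every result is a target.
    """
    seconds = {second for _, second in pairs}
    if preset_targets is None:
        return seconds
    return seconds & preset_targets


pairs = [("http://a.com/form1", "http://a.com/ok"),
         ("http://a.com/form2", "http://a.com/ok"),
         ("http://a.com/form1", "http://a.com/help")]
print(sorted(determine_target_urls(pairs)))               # ['http://a.com/help', 'http://a.com/ok']
print(determine_target_urls(pairs, {"http://a.com/ok"}))  # {'http://a.com/ok'}
```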
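Step 206: for each target page URL, determine the generalized first URL in the at least one form page jump pair in which it appears as a form page URL that converts to that target page URL.
In this embodiment, for each target page URL determined in step 205, the electronic device may look up the at least one form page jump pair in which it appears in the form page jump pair set as filtered in step 204. It then determines the generalized first URL in each of those pairs as a form page URL that converts to that target page URL.
Through this step, the electronic device can establish a mapping between form page URLs and target page URLs. Each target page URL may correspond to one or more form page URLs. Because the generalized first URLs may repeat, the form page URLs can also be merged, counted, and so on, for use in calculating conversion rates from form pages to target pages.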
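Here is a sketch of the mapping and merging described in this paragraph. It assumes generalized pairs and a set of target page URLs as inputs; the function name is hypothetical. A `Counter` per target merges repeated form page URLs and keeps their counts for later conversion rate calculation:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def map_targets_to_form_pages(
    pairs: Iterable[Tuple[str, str]], target_urls: Set[str]
) -> Dict[str, Counter]:
    """Map each target page URL to a Counter of the generalized first URLs
    (form page URLs) of the pairs in which it appears."""
    mapping: Dict[str, Counter] = defaultdict(Counter)
    for form_url, second in pairs:
        if second in target_urls:
            mapping[second][form_url] += 1
    return dict(mapping)


pairs = [("http://a.com/form1", "http://a.com/ok"),
         ("http://a.com/form1", "http://a.com/ok"),
         ("http://a.com/form2", "http://a.com/ok")]
print(map_targets_to_form_pages(pairs, {"http://a.com/ok"}))
# {'http://a.com/ok': Counter({'http://a.com/form1': 2, 'http://a.com/form2': 1})}
```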
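Referring to FIGS. 4a and 4b, as an application scenario, the method of this embodiment is applied to the back-end server of a website that promotes web advertising information, an advertiser's analysis system, or the like. There it can determine page conversion relationships in web advertisements, and it can further be used to calculate conversion rates from different form pages to target pages. FIG. 4a illustrates form page 401 converting to target page 402; FIG. 4b illustrates form page 403 converting to target page 404. FIGS. 4a and 4b show two different kinds of target page. Target page 402 is the success page jumped to after a form submission on form page 401. Target page 404 is another form page jumped to after a form submission on form page 403.
In the application scenario shown in FIGS. 4a and 4b, a user may open various pages, such as form page 401 and form page 403, through a browser application running on a terminal device. The electronic device to which the method of this embodiment applies may obtain the browser's page access log from the back-end server supporting the browser application. It parses the log and decomposes the URLs of all accessed pages and the jump URL groups, where each jump URL group may include two URLs and the page corresponding to the second URL is jumped to directly from the page corresponding to the first URL.

Next, the electronic device may filter the URLs of all accessed pages and add the URLs of pages containing forms (such as those of form page 401 and form page 403) to the form page URL set. It may then match the first URL of each jump URL group against the form page URL set. The jump URL groups whose first URL is found in the set form the form page jump pair set. Next, the electronic device may remove from the form page jump pair set the form page jump pairs that meet the preset condition (for example, the page corresponding to the first URL contains a hyperlink). It may then generalize the URLs in each remaining form page jump pair and determine the target page URLs (such as those of target page 402 and target page 404) from the generalized second URLs. Next, for each target page URL, the electronic device may determine the generalized first URL in the at least one form page jump pair in which it appears as a form page URL that converts to that target page URL.

Optionally, the electronic device may establish a mapping between the determined form page URLs and the target page URLs, and may further calculate the conversion rate from each form page to the target page. Here, the conversion rate may be the probability that a form page jumps to a target page. It can be calculated by a known method, such as dividing the number of conversions from form page a to target page b by the number of times form page a was opened; this is not described further here. Further, in this application scenario, the same advertiser may place advertisements through form page A and form page B. The electronic device may calculate the respective conversion rates from these form pages to target page C, providing the advertiser with a reference for ad placement.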
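The conversion rate calculation mentioned in this scenario amounts to a division per form page. The counts in the usage line are hypothetical and serve only to show the arithmetic:

```python
from typing import Dict


def conversion_rates(conversions: Dict[str, int], opens: Dict[str, int]) -> Dict[str, float]:
    """Conversion rate per form page: conversions to the target page divided by
    the number of times the form page was opened (pages never opened are skipped)."""
    return {page: conversions.get(page, 0) / n for page, n in opens.items() if n > 0}


# Hypothetical counts for form pages A and B converting to target page C.
rates = conversion_rates({"A": 30, "B": 10}, {"A": 200, "B": 50})
print(rates)  # {'A': 0.15, 'B': 0.2}
```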
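The above embodiments of the present application fully consider the relationships between pages, which improves the accuracy of finding the conversion relationship between form pages and target pages.
With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for finding the conversion relationship between form pages and target pages. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus may be applied to an electronic device.
As shown in FIG. 5, the apparatus 500 for finding the conversion relationship between form pages and target pages in this embodiment includes: a decomposition module 501, a first screening module 502, a generating module 503, a second screening module 504, a first determining module 505, and a second determining module 506. The modules may be configured as follows:
- The decomposition module 501 parses a page access log and decomposes the URLs of all accessed pages and the jump URL groups, where each jump URL group includes two URLs and the page corresponding to the second URL is jumped to directly from the page corresponding to the first URL.
- The first screening module 502 filters the URLs of all accessed pages and adds the URLs of pages containing forms to the form page URL set.
- The generating module 503 generates a form page jump pair set from the jump URL groups whose first URL belongs to the form page URL set.
- The second screening module 504 removes from the form page jump pair set the form page jump pairs that meet a preset condition.
- The first determining module 505 generalizes the URLs in each remaining form page jump pair in the form page jump pair set to determine target page URLs, where a target page URL is a generalized second URL.
- The second determining module 506, for each target page URL, determines the generalized first URL in the at least one form page jump pair in which it appears as a form page URL that converts to that target page URL.
It is worth noting that the modules or units recorded in the apparatus 500 correspond to the steps of the method described with reference to FIG. 2. Therefore, the operations and features described above for the method apply equally to the apparatus 500 and the modules or units contained therein, and are not repeated here.
Those skilled in the art will understand that the apparatus 500 also includes other well-known structures, such as processors and memories. To avoid unnecessarily obscuring the embodiments of the present disclosure, these well-known structures are not shown in FIG. 5.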
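Referring now to FIG. 6, a schematic structural diagram of a computer system 600 suitable for implementing the electronic device of the embodiments of the present application is shown.
As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601. The CPU can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605:
- an input portion 606 including a keyboard, a mouse, etc.;
- an output portion 607 including, for example, a cathode ray tube (CRT) or a liquid crystal display (LCD) and a speaker;
- a storage portion 608 including a hard disk, etc.;
- a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet.

A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed. A computer program read from it can then be installed into the storage portion 608 as needed.
In particular, according to embodiments of the present application, the processes described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present application includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 609, and/or installed from the removable medium 611.
The units involved in the embodiments of the present application may be implemented in software or in hardware. The described modules may also be provided in a processor. For example, a processor may be described as including a decomposition module, a first screening module, a generating module, a second screening module, a first determining module, and a second determining module. The names of these modules do not, in some cases, limit the modules themselves. For example, the decomposition module may also be described as "a module configured to parse the page access log and decompose the uniform resource locators (URLs) of all accessed pages and the jump URL groups".
As another aspect, the present application further provides a computer-readable storage medium. It may be the computer-readable storage medium included in the apparatus described in the above embodiments, or it may exist separately without being assembled into a terminal. The computer-readable storage medium stores one or more programs which, when executed by one or more processors, cause the device to perform the following operations:
- parse a page access log to decompose the uniform resource locators (URLs) of all accessed pages and the jump URL groups, where each jump URL group includes two URLs and the page corresponding to the second URL is jumped to directly from the page corresponding to the first URL;
- filter the URLs of all accessed pages and add the URLs of pages containing forms to a form page URL set;
- generate a form page jump pair set from the jump URL groups whose first URL belongs to the form page URL set;
- remove, from the form page jump pair set, the form page jump pairs that meet a preset condition;
- generalize the URLs in each remaining form page jump pair in the form page jump pair set to determine target page URLs, where a target page URL is a generalized second URL;
- for each target page URL, determine the generalized first URL in the at least one form page jump pair in which it appears as a form page URL that converts to that target page URL.
The above description covers only the preferred embodiments of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features. It should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept. Examples include technical solutions formed by replacing the above features with technical features having similar functions disclosed in the present application (but not limited thereto).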

Claims (12)

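  1. A method for finding the conversion relationship between form pages and target pages, the method comprising:
    parsing a page access log to decompose the uniform resource locators (URLs) of all accessed pages and jump URL groups, wherein each jump URL group comprises two URLs, and the page corresponding to the second URL is jumped to directly from the page corresponding to the first URL;
    filtering the URLs of all accessed pages, and adding the URLs of pages containing forms to a form page URL set;
    generating a form page jump pair set from the jump URL groups whose first URL belongs to the form page URL set;
    removing, from the form page jump pair set, form page jump pairs that meet a preset condition;
    generalizing the URLs in each remaining form page jump pair in the form page jump pair set to determine target page URLs, wherein a target page URL is a generalized second URL;
    for each target page URL, determining the generalized first URL in the at least one form page jump pair in which the target page URL appears as a form page URL that converts to that target page URL.
  2. The method according to claim 1, wherein the jump URL groups are obtained by:
    obtaining access request information for each accessed page URL;
    obtaining, from the access request information, the linking page URL that links to the current page URL;
    generating a jump URL group from the linking page URL and the current page URL, wherein the linking page URL is the first URL and the current page URL is the second URL.
  3. The method according to claim 1, wherein page URLs satisfying one of the following conditions are selected and added to the form page URL set:
    the page URL matches a preset URL pattern;
    the page content corresponding to the page URL contains a preset keyword.
  4. The method according to claim 1, wherein removing, from the form page jump pair set, form page jump pairs that meet a preset condition comprises:
    obtaining the standard object model of the first URL of each form page jump pair in the form page jump pair set;
    parsing the standard object model and, if the attribute fields of the standard object model include a hyperlink attribute field, determining the form page jump pair as a form page jump pair that meets the preset condition and removing it.
  5. The method according to claim 1, wherein generalizing the URLs in each remaining form page jump pair in the form page jump pair set to determine target page URLs comprises:
    removing parameters from the URLs in each remaining form page jump pair in the form page jump pair set;
    for the parameter-stripped form page jump pairs, merging identical second URLs as the target page URLs.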
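  6. An apparatus for finding the conversion relationship between form pages and target pages, the apparatus comprising:
    a decomposition module, configured to parse a page access log to decompose the uniform resource locators (URLs) of all accessed pages and jump URL groups, wherein each jump URL group comprises two URLs, and the page corresponding to the second URL is jumped to directly from the page corresponding to the first URL;
    a first screening module, configured to filter the URLs of all accessed pages and add the URLs of pages containing forms to a form page URL set;
    a generating module, configured to generate a form page jump pair set from the jump URL groups whose first URL belongs to the form page URL set;
    a second screening module, configured to remove, from the form page jump pair set, form page jump pairs that meet a preset condition;
    a first determining module, configured to generalize the URLs in each remaining form page jump pair in the form page jump pair set to determine target page URLs, wherein a target page URL is a generalized second URL;
    a second determining module, configured to, for each target page URL, determine the generalized first URL in the at least one form page jump pair in which the target page URL appears as a form page URL that converts to that target page URL.
  7. The apparatus according to claim 6, wherein the decomposition module comprises the following units for obtaining the jump URL groups:
    an access request information obtaining unit, configured to obtain access request information for each accessed page URL;
    a linking page URL obtaining unit, configured to obtain, from the access request information, the linking page URL that links to the current page URL;
    a generating unit, configured to generate a jump URL group from the linking page URL and the current page URL, wherein the linking page URL is the first URL and the current page URL is the second URL.
  8. The apparatus according to claim 6, wherein page URLs satisfying one of the following conditions are selected and added to the form page URL set:
    the page URL matches a preset URL pattern;
    the page content corresponding to the page URL contains a preset keyword.
  9. The apparatus according to claim 6, wherein the second screening module comprises:
    a standard object model obtaining unit, configured to obtain the standard object model of the first URL of each form page jump pair in the form page jump pair set;
    a determining and removing unit, configured to parse the standard object model and, if the attribute fields of the standard object model include a hyperlink attribute field, determine the form page jump pair as a form page jump pair that meets the preset condition and remove it.
  10. The apparatus according to claim 6, wherein the first determining module comprises:
    a processing unit, configured to remove parameters from the URLs in each remaining form page jump pair in the form page jump pair set;
    a merging unit, configured to merge, for the parameter-stripped form page jump pairs, identical second URLs as the target page URLs.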
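  11. A device, comprising:
    a processor; and
    a memory,
    the memory storing computer-readable instructions executable by the processor, wherein, when the computer-readable instructions are executed, the processor performs a method for finding the conversion relationship between form pages and target pages, the method comprising:
    parsing a page access log to decompose the uniform resource locators (URLs) of all accessed pages and jump URL groups, wherein each jump URL group comprises two URLs, and the page corresponding to the second URL is jumped to directly from the page corresponding to the first URL;
    filtering the URLs of all accessed pages, and adding the URLs of pages containing forms to a form page URL set;
    generating a form page jump pair set from the jump URL groups whose first URL belongs to the form page URL set;
    removing, from the form page jump pair set, form page jump pairs that meet a preset condition;
    generalizing the URLs in each remaining form page jump pair in the form page jump pair set to determine target page URLs, wherein a target page URL is a generalized second URL;
    for each target page URL, determining the generalized first URL in the at least one form page jump pair in which the target page URL appears as a form page URL that converts to that target page URL.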
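  12. A non-volatile computer storage medium storing computer-readable instructions executable by a processor, wherein, when the computer-readable instructions are executed by the processor, the processor performs a method for finding the conversion relationship between form pages and target pages, the method comprising:
    parsing a page access log to decompose the uniform resource locators (URLs) of all accessed pages and jump URL groups, wherein each jump URL group comprises two URLs, and the page corresponding to the second URL is jumped to directly from the page corresponding to the first URL;
    filtering the URLs of all accessed pages, and adding the URLs of pages containing forms to a form page URL set;
    generating a form page jump pair set from the jump URL groups whose first URL belongs to the form page URL set;
    removing, from the form page jump pair set, form page jump pairs that meet a preset condition;
    generalizing the URLs in each remaining form page jump pair in the form page jump pair set to determine target page URLs, wherein a target page URL is a generalized second URL;
    for each target page URL, determining the generalized first URL in the at least one form page jump pair in which the target page URL appears as a form page URL that converts to that target page URL.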
PCT/CN2016/086408 2016-01-20 2016-06-20 Method and apparatus for finding the conversion relationship between form pages and target pages WO2017124692A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610037371.6A CN105718559B (zh) 2016-01-20 2016-01-20 Method and apparatus for finding the conversion relationship between form pages and target pages
CN201610037371.6 2016-01-20

Publications (1)

Publication Number Publication Date
WO2017124692A1 true WO2017124692A1 (zh) 2017-07-27

Family

ID=56147960

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/086408 WO2017124692A1 (zh) 2016-01-20 2016-06-20 Method and apparatus for finding the conversion relationship between form pages and target pages

Country Status (2)

Country Link
CN (1) CN105718559B (zh)
WO (1) WO2017124692A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
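CN109933736A (zh) * 2019-03-08 2019-06-25 浪潮通用软件有限公司 Method, apparatus and storage medium for securely accessing third-party JSP pages
CN110968824A (zh) * 2018-09-30 2020-04-07 北京国双科技有限公司 Page data processing method and apparatus
CN111708965A (zh) * 2020-05-28 2020-09-25 北京嗨学网教育科技股份有限公司 Method and apparatus for seamless jumps across single-page applications in the same domain
CN113792232A (zh) * 2021-09-13 2021-12-14 北京百度网讯科技有限公司 Page feature calculation method, apparatus, electronic device, medium and program product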

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
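CN106326396B (zh) * 2016-08-19 2019-08-23 武汉斗鱼网络科技有限公司 Method and system for implementing page jumps in a mobile client using custom URLs
CN107506478A (zh) * 2017-09-08 2017-12-22 北京京东尚科信息技术有限公司 Method and apparatus for distinguishing website pages
CN109949117B (zh) * 2017-12-21 2021-06-29 北京京东尚科信息技术有限公司 Method and apparatus for pushing information
CN113590985B (zh) * 2021-09-29 2022-01-04 北京每日优鲜电子商务有限公司 Page jump configuration method, apparatus, electronic device and computer-readable medium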

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002268667A (ja) * 2001-03-06 2002-09-20 Canon Inc Presentation system and control method therefor
CN101984429A (zh) * 2010-11-04 2011-03-09 百度在线网络技术(北京)有限公司 Method and apparatus for obtaining a target page, search engine and browser

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
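CN102054004B (zh) * 2009-11-04 2015-05-06 清华大学 Web page recommendation method and apparatus
CN102663291B (zh) * 2012-03-23 2015-02-25 北京奇虎科技有限公司 Email information prompting method and apparatus
CN103810184B (zh) * 2012-11-07 2017-09-26 阿里巴巴集团控股有限公司 Method for determining the transition rate of website page addresses, optimization method, and apparatus therefor
CN103077250B (zh) * 2013-01-28 2016-06-29 人民搜索网络股份公司 Web page content crawling method and apparatus
CN104158828B (zh) * 2014-09-05 2018-05-18 北京奇虎科技有限公司 Method and system for identifying suspicious phishing web pages based on a cloud content rule base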

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
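CN110968824A (zh) * 2018-09-30 2020-04-07 北京国双科技有限公司 Page data processing method and apparatus
CN110968824B (zh) * 2018-09-30 2023-08-25 北京国双科技有限公司 Page data processing method and apparatus
CN109933736A (zh) * 2019-03-08 2019-06-25 浪潮通用软件有限公司 Method, apparatus and storage medium for securely accessing third-party JSP pages
CN111708965A (zh) * 2020-05-28 2020-09-25 北京嗨学网教育科技股份有限公司 Method and apparatus for seamless jumps across single-page applications in the same domain
CN111708965B (zh) * 2020-05-28 2024-05-03 北京嗨学网教育科技股份有限公司 Method and apparatus for seamless jumps across single-page applications in the same domain
CN113792232A (zh) * 2021-09-13 2021-12-14 北京百度网讯科技有限公司 Page feature calculation method, apparatus, electronic device, medium and program product
CN113792232B (zh) * 2021-09-13 2024-02-27 北京百度网讯科技有限公司 Page feature calculation method, apparatus, electronic device, medium and program product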

Also Published As

Publication number Publication date
CN105718559B (zh) 2018-02-13
CN105718559A (zh) 2016-06-29

Similar Documents

Publication Publication Date Title
WO2017124692A1 (zh) Method and apparatus for finding the conversion relationship between form pages and target pages
US20220101343A1 (en) Systems and Methods for Managing Web Content
US20210314354A1 (en) Techniques for determining threat intelligence for network infrastructure analysis
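KR101168705B1 (ko) Customized, intelligent symbol and icon Internet information search system using a mobile communication terminal and an IP-based information terminal
CN101131747B (zh) Method, apparatus and system for capturing and/or analyzing client-side web page events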
US20190190977A1 (en) System and method of automatic generation and insertion of analytic tracking codes
US8078986B1 (en) Method and system for a browser module
US8935798B1 (en) Automatically enabling private browsing of a web page, and applications thereof
US9836438B2 (en) Methods and systems of outputting content of interest
US9311303B2 (en) Interpreted language translation system and method
US10943063B1 (en) Apparatus and method to automate website user interface navigation
AU2014400621B2 (en) System and method for providing contextual analytics data
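CN106897336A (zh) Web page file sending method, web page rendering method and apparatus, and web page rendering system
CN106126693A (zh) Method and apparatus for sending data related to a web page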
US9684918B2 (en) System and method for candidate domain name generation
CN101146040B (zh) Method and apparatus for analyzing website traffic
CN110808868B (zh) Test data acquisition method, apparatus, computer device and storage medium
US20160350817A1 (en) System for tracking donor influence in charitable transactions
CN110929183A (zh) Data processing method, apparatus and machine-readable medium
US20110197133A1 (en) Methods and apparatuses for identifying and monitoring information in electronic documents over a network
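KR101282975B1 (ko) Web screen cropping server apparatus that separates, structures and standardizes document elements and then reconstructs web pages
KR20180047467A (ko) System and method for providing user profiles
CN114328947A (zh) Knowledge-graph-based question answering method and apparatus
CN109344344A (zh) Web client identification method, server and computer-readable storage medium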
US11669588B2 (en) Advanced data collection block identification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16885954

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16885954

Country of ref document: EP

Kind code of ref document: A1