CN112632446A - Page access path construction method and system - Google Patents

Page access path construction method and system

Info

Publication number
CN112632446A
Authority
CN
China
Legal status
Granted
Application number
CN202011610978.1A
Other languages
Chinese (zh)
Other versions
CN112632446B (en)
Inventor
刘洋
Current Assignee
Jiangsu Suning Cloud Computing Co ltd
Original Assignee
Jiangsu Suning Cloud Computing Co ltd
Application filed by Jiangsu Suning Cloud Computing Co ltd
Priority to CN202011610978.1A
Publication of CN112632446A
Priority to CA3144126A (CA3144126A1)
Application granted
Publication of CN112632446B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Information Transfer Between Computers (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method and a system for constructing a page access path, which improve construction efficiency and reduce the consumption of computing resources by optimizing the construction of the page access path tree. The method comprises: acquiring an access session of a user, the access session comprising a plurality of different access pages; cleaning the access pages in the access session and numbering them in order of acquisition time; sequentially identifying, among the access pages, the entry access page of each path, and taking each entry access page as the head node of the corresponding path; assigning the access pages whose numbers fall in the interval between adjacent entry access pages to the corresponding path partition; matching, in ascending order of number, each access page in each path partition with its transferred-in page, obtaining the matching relation of each access page, constructing path nodes, and recording the path information of each path node; and constructing a page access path tree based on the path information of the head nodes and the path nodes.

Description

Page access path construction method and system
Technical Field
The invention relates to the technical field of the Internet, and in particular to a method and a system for constructing a page access path.
Background
In website operation and website analytics, it is necessary to understand a user's behavior from entering a website to leaving it: for example, whether the user browses along the navigation path designed by the website, where users drop off at each browsing step, and which page the user actually visits after leaving a given page. This requires aggregating link analysis of users' visits along the website's key paths and counting the source, jump, and exit metrics of each page; these metrics are then used to identify and optimize the website structure, raise visit and order conversion rates, and improve the user experience.
In the prior art, analyzing a user's path behavior trajectory requires traversing all of the user's behavior trajectory data and performing multiple join (association) traversals over all behaviors before the user's path information can be obtained. When website traffic is heavy, this traversal is very time-consuming and consumes a large amount of computing resources.
Disclosure of Invention
The invention aims to provide a method and a system for constructing a page access path, which can improve the construction efficiency and reduce the consumption of computing resources by optimizing the construction of a page access path tree.
In order to achieve the above object, a first aspect of the present invention provides a method for constructing a page access path, including:
acquiring an access session of a user, the access session comprising a plurality of different access pages;
cleaning the access pages in the access session and numbering them in order of acquisition time;
sequentially identifying, among the plurality of access pages, the entry access page of each path, taking each entry access page as the head node of the corresponding path, and recording the path information of each head node;
if there are multiple paths, assigning the access pages whose numbers fall in the interval between adjacent entry access pages to the path partition of the earlier entry access page; or, if there is only one path, assigning all the access pages to a single path partition;
matching, in ascending order of number, each access page belonging to each path partition with its transferred-in page (the page from which the user arrived), obtaining the matching relation of each access page, constructing path nodes, and recording the path information of each path node;
and constructing a page access path tree based on the path information of the head nodes and the path nodes.
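The partitioning step can be sketched as follows, assuming the numbers of the retained access pages and of the entry access pages are already known. This is a minimal illustration, not the patented implementation: the function name `partition_by_entries` is hypothetical, and attaching pages numbered before the first entry to the first partition is an assumption the text does not address.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Optional


def partition_by_entries(numbers: Iterable[int],
                         entries: Iterable[int]) -> Dict[Optional[int], List[int]]:
    """Split numbered access pages into path partitions keyed by entry number.

    Hypothetical helper: each entry access page opens a partition that holds
    every page numbered from that entry up to (not including) the next entry.
    """
    pages = sorted(numbers)
    heads = sorted(entries)
    if len(heads) <= 1:
        # One path (or no entry found): all pages form a single partition.
        return {heads[0] if heads else None: pages}
    partitions: Dict[Optional[int], List[int]] = {h: [] for h in heads}
    for n in pages:
        # Index of the last entry whose number is <= n.
        i = bisect_right(heads, n) - 1
        # Assumption: pages numbered before the first entry join the first partition.
        partitions[heads[max(i, 0)]].append(n)
    return partitions


print(partition_by_entries(range(1, 9), [1, 5]))
# {1: [1, 2, 3, 4], 5: [5, 6, 7, 8]}
```

Binary search over the sorted entry numbers keeps the assignment to one pass over the pages, which is in the spirit of the efficiency goal stated above.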
Preferably, acquiring the access session of the user comprises:
acquiring the access pages browsed by the user while accessing the website through a terminal within a preset time, and aggregating them in chronological order to form the access session.
Preferably, cleaning the access pages in the access session and numbering them in order of acquisition time comprises:
identifying, among the access pages, noise access pages generated by crawlers and/or cheating, and removing them in a primary cleaning pass;
numbering the retained access pages of the same access session in order of acquisition time.
Further, after the primary cleaning pass removes the noise access pages, the method further comprises:
within the same access session, if two consecutive access pages are found to be the same page, removing the latter one in a secondary cleaning pass.
Preferably, before sequentially identifying the entry access page of each path, taking each entry access page as the head node of the corresponding path, and recording the path information of each head node, the method further comprises:
constructing a page breakpoint dimension table containing at least one breakpoint page.
Preferably, sequentially identifying the entry access page of each path, taking each entry access page as the head node of the corresponding path, and recording the path information of each head node comprises:
comparing, in order of number, each access page in the access session against the page breakpoint dimension table, and defining each access page that matches a breakpoint page as the entry access page of a new path;
taking each entry access page as the head node of the corresponding path, and recording the path information of each head node as null;
wherein the number of paths equals the number of entry access pages.
Further, matching, in ascending order of number, each access page belonging to each path partition with its transferred-in page, obtaining the matching relation of each access page, constructing path nodes, and recording the path information of each path node comprises:
searching, in ascending order of number, for the transferred-in page of each access page in the path partition;
based on the matching relation between each access page and its transferred-in page, establishing the path matching relations among the access pages of the path partition, representing the access pages in these relations as path nodes, and recording the path matching relation of each path node in its path information, the path information further comprising the slot (pit position) click information on the transferred-in page;
and connecting the head node and the path nodes in series into a path according to the path matching relations.
Further, constructing the page access path tree comprises:
aggregating all paths in the access session to construct the page access path tree.
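One plausible way to aggregate the paths of a session into a tree is a prefix tree (trie), in which paths sharing leading pages share branches and each node counts the paths passing through it. The patent does not specify the tree's data structure, so the `TreeNode` class, the synthetic root, and the counts below are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TreeNode:
    page: str
    count: int = 0  # number of paths passing through this node
    children: Dict[str, "TreeNode"] = field(default_factory=dict)


def build_path_tree(paths: List[List[str]]) -> TreeNode:
    """Merge each path (head node first) into a prefix tree under a synthetic root."""
    root = TreeNode("<session>")
    for path in paths:
        node = root
        for page in path:
            node = node.children.setdefault(page, TreeNode(page))
            node.count += 1
    return root


def render(node: TreeNode, depth: int = 0) -> List[str]:
    """Return an indented, depth-first listing of the tree below `node`."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.page} ({child.count})")
        lines.extend(render(child, depth + 1))
    return lines


tree = build_path_tree([["Home", "Search", "Product A"],
                        ["Home", "Search", "Product B"],
                        ["Home", "Cart"]])
print("\n".join(render(tree)))
```

Sharing prefixes means the per-node counts can be read directly as traffic along each branch.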
Compared with the prior art, the method for constructing a page access path provided by the invention has the following beneficial effects:
The method acquires the access pages browsed by a user while accessing a website through a terminal within a preset time and aggregates them in chronological order into an access session. It then cleans the access pages in the session, numbers them in order of acquisition time, sequentially identifies the entry access page of each path, takes each entry access page as the head node of its path, and records the path information of each head node. If multiple paths are identified, the access pages in the number interval between adjacent entry access pages are assigned to the corresponding path partition; if only one path is identified, all access pages form a single path partition. Each access page in each partition is then matched, in ascending order of number, with its transferred-in page to obtain its matching relation, path nodes are constructed and their path information recorded, and finally a page access path tree is constructed from the path information of the head nodes and the path nodes.
Therefore, by cleaning the access session, the invention denoises consecutive repeated access pages and eliminates their interference with path analysis. In addition, the prior art builds paths from page-level access details by repeatedly joining each record with the information of its preceding and following access pages; by optimizing path construction over the access pages instead, the invention improves the efficiency of building the path tree while reducing the consumption of system computing resources.
A second aspect of the invention provides a system for constructing a page access path, to which the method for constructing a page access path of the above technical solution is applied. The system comprises:
an acquisition unit, configured to acquire an access session of a user, the access session comprising a plurality of different access pages;
a cleaning unit, configured to clean the access pages in the access session and number them in order of acquisition time;
an identification unit, configured to sequentially identify the entry access page of each path among the access pages, take each entry access page as the head node of the corresponding path, and record the path information of each head node;
a judging unit, configured to, if there are multiple paths, assign the access pages in the number interval between adjacent entry access pages to the corresponding path partition, or, if there is only one path, assign all the access pages to a single path partition;
a path matching unit, configured to match, in ascending order of number, each access page belonging to each path partition with its transferred-in page, obtain the matching relation of each access page, construct path nodes, and record the path information of each path node;
and a path tree construction unit, configured to construct a page access path tree based on the path information of the head nodes and the path nodes.
Compared with the prior art, the beneficial effects of the system for constructing a page access path provided by the invention are the same as those of the method for constructing a page access path provided by the above technical solution, and are not repeated here.
A third aspect of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the above method for constructing a page access path.
Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by the invention are the same as those of the construction method of the page access path provided by the technical scheme, and are not repeated herein.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic flow chart of a method for constructing a page access path according to an embodiment of the present invention;
FIG. 2 is an exemplary diagram of a page access path tree in an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Referring to fig. 1, the present embodiment provides a method for constructing a page access path, including:
acquiring an access session of a user, the access session comprising a plurality of different access pages; cleaning the access pages in the access session and numbering them in order of acquisition time; sequentially identifying, among the access pages, the entry access page of each path, taking each entry access page as the head node of the corresponding path, and recording the path information of each head node; if there are multiple paths, assigning the access pages in the number interval between adjacent entry access pages to the corresponding path partition, or, if there is only one path, assigning all the access pages to a single path partition; matching, in ascending order of number, each access page belonging to each path partition with its transferred-in page, obtaining the matching relation of each access page, constructing path nodes, and recording the path information of each path node; and constructing a page access path tree based on the path information of the head nodes and the path nodes.
In the method provided by this embodiment, the access pages browsed by a user while accessing a website through a terminal within a preset time are acquired and aggregated in chronological order into an access session. The access pages in the session are then cleaned and numbered in order of acquisition time, the entry access page of each path is identified in turn and taken as the head node of its path, and the path information of each head node is recorded. If multiple paths are identified, the access pages in the number interval between adjacent entry access pages are assigned to the corresponding path partition; if only one path is identified, all access pages form a single path partition. Each access page in each partition is then matched, in ascending order of number, with its transferred-in page to obtain its matching relation, path nodes are constructed and their path information recorded, and finally the page access path tree is constructed from the path information of the head nodes and the path nodes.
Thus, by cleaning the access session, this embodiment denoises consecutive repeated access pages and eliminates their interference with path analysis. In addition, whereas the prior art builds paths from page-level access details by repeatedly joining each record with the information of its preceding and following access pages, this embodiment optimizes path construction over the access pages, which improves the efficiency of building the path tree and reduces the consumption of system computing resources.
In the above embodiment, acquiring the access session of the user comprises:
acquiring the access pages browsed by the user while accessing the website through a terminal within a preset time, and aggregating them in chronological order to form the access session.
In a specific implementation, the scheme of this embodiment applies to multiple terminals, such as an app, a PC, and a mini program. The access pages browsed by a user accessing the website through any terminal within a certain time form one access session, and when the session is aggregated its pages must be ordered by the user's access time.
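A minimal sketch of session assembly follows. The text only says "within a preset time"; this sketch assumes that means an inactivity timeout (a new session starts when two consecutive page views of the same user are more than `max_gap` seconds apart). The `(user_id, timestamp, page)` event layout and the 1800-second default are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, float, str]  # hypothetical layout: (user_id, timestamp_s, page)


def build_sessions(events: Iterable[Event],
                   max_gap: float = 1800.0) -> List[Tuple[str, List[str]]]:
    """Group page views into per-user sessions ordered by access time."""
    by_user: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for user, ts, page in events:  # events may arrive out of order (several terminals)
        by_user[user].append((ts, page))
    sessions: List[Tuple[str, List[str]]] = []
    for user, views in by_user.items():
        views.sort()  # chronological order within this user's views
        current: List[str] = []
        last_ts = None
        for ts, page in views:
            if last_ts is not None and ts - last_ts > max_gap:
                sessions.append((user, current))  # gap too long: close the session
                current = []
            current.append(page)
            last_ts = ts
        if current:
            sessions.append((user, current))
    return sessions


events = [("u1", 100, "Home"), ("u1", 40, "Landing"), ("u1", 5000, "Cart"), ("u2", 10, "Home")]
print(build_sessions(events))
# [('u1', ['Landing', 'Home']), ('u1', ['Cart']), ('u2', ['Home'])]
```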
In the above embodiment, cleaning the access pages in the access session and numbering them in order of acquisition time comprises:
identifying, among the access pages, noise access pages generated by crawlers and/or cheating, and removing them in a primary cleaning pass; and numbering the retained access pages of the same access session in order of acquisition time.
In a specific implementation, an existing algorithm is used to identify a list of crawler and/or cheating visitors; the access pages they generated are flagged as noise access pages and removed, and only the access pages retained in the access session are numbered in order of acquisition time.
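The primary cleaning and numbering step might look like the sketch below. The noise test is passed in as a predicate because the text delegates crawler and cheating detection to an existing algorithm; the record layout and the flagged-visitor set in the usage example are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Record = Tuple[str, float, str]  # hypothetical layout: (visitor_id, timestamp, page)


def clean_and_number(records: Iterable[Record],
                     is_noise: Callable[[Record], bool]) -> List[Tuple[int, str]]:
    """Primary cleaning: drop noise records, then number survivors by acquisition time."""
    kept = sorted((r for r in records if not is_noise(r)), key=lambda r: r[1])
    return [(number, page) for number, (_, _, page) in enumerate(kept, start=1)]


flagged = {"crawler-42"}  # hypothetical output of an existing crawler/cheat detector
records = [("u1", 3.0, "Search"), ("crawler-42", 2.0, "Home"), ("u1", 1.0, "Home")]
print(clean_and_number(records, lambda r: r[0] in flagged))
# [(1, 'Home'), (2, 'Search')]
```

Numbering only after filtering keeps the sequence gap-free, which the later interval-based partitioning relies on.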
In the above embodiment, after the primary cleaning pass removes the noise access pages, the method further comprises:
within the same access session, if two consecutive access pages are found to be the same page, removing the latter one in a secondary cleaning pass.
In a specific implementation, if two consecutive access pages in the same access session are determined to be the same page, the user may have refreshed the page repeatedly, or the latter record may have been generated by a page-turning (pagination) scenario. Such repeated access pages are meaningless for path analysis and must be de-duplicated, that is, the latter of the two is filtered out. It can be understood that, in practice, the access pages in the session may also be numbered first, then subjected to the primary and secondary cleaning passes, and finally renumbered.
Whether two consecutive access pages are repeated is judged by one or more of the following comparisons: page information, URL, and page name. If any one of these comparisons finds the two consecutive access pages exactly equal, they are repeated, and the latter is removed.
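The secondary cleaning rule (drop the latter of two consecutive pages when any configured comparison of page information, URL, or page name is exactly equal) can be sketched as follows. The `PageView` field names are hypothetical; this sketch compares each page with the last retained page and renumbers the survivors, following the number, clean, renumber order described above.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence


@dataclass(frozen=True)
class PageView:
    seq: int        # number assigned in order of acquisition time
    page_info: str  # hypothetical field names for the three comparisons
    url: str
    page_name: str


def drop_consecutive_repeats(views: Sequence[PageView],
                             fields: Sequence[str] = ("page_info", "url", "page_name")
                             ) -> List[PageView]:
    """Remove the latter of two consecutive repeated pages, then renumber from 1."""
    kept: List[PageView] = []
    for view in views:
        prev = kept[-1] if kept else None
        # Repeated if ANY configured field exactly equals the previous page's value.
        if prev is not None and any(getattr(prev, f) == getattr(view, f) for f in fields):
            continue
        kept.append(view)
    return [replace(v, seq=i) for i, v in enumerate(kept, start=1)]


views = [PageView(1, "p1", "/home", "Home"),
         PageView(2, "p1", "/home", "Home"),  # refresh
         PageView(3, "p2", "/search?q=phone", "Search"),
         PageView(4, "p3", "/search?q=phone&page=2", "Search")]  # page turn
print([(v.seq, v.page_name) for v in drop_consecutive_repeats(views)])
```

Restricting `fields` (for example to `("url",)`) makes the rule stricter, so a pagination record with a different URL would be kept.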
In the above embodiment, before sequentially identifying the entry access page of each path, taking each entry access page as the head node of the corresponding path, and recording the path information of each head node, the method further comprises:
constructing a page breakpoint dimension table containing at least one breakpoint page.
In the above embodiment, sequentially identifying the entry access page of each path, taking each entry access page as the head node of the corresponding path, and recording the path information of each head node comprises:
comparing, in order of number, each access page in the access session against the page breakpoint dimension table, and defining each access page that matches a breakpoint page as the entry access page of a new path; taking each entry access page as the head node of the corresponding path, and recording the path information of each head node as null; the number of paths equals the number of entry access pages.
In a specific implementation, the page breakpoint dimension table records breakpoint pages, such as the website home page and tab-switch pages. A breakpoint page is generally the entry access page (first access page) of a new path, that is, it serves as the head node of the new path. Each access page retained in the access session is compared against the breakpoint pages in the table; when a comparison succeeds, the access page is an entry access page and becomes the head node of a new path. Counting the entry access pages then gives the number of paths. Since no path node precedes a head node, the path relation recorded in the head node's path information is null.
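Entry identification against the breakpoint dimension table reduces to a membership test over the numbered pages. In this sketch the table is modeled as a set of page names, which is an assumption (a real table might key on page type or URL pattern), and `find_entries` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple


def find_entries(numbered_pages: List[Tuple[int, str]],
                 breakpoint_pages: Set[str]) -> Tuple[List[int], Dict[int, list]]:
    """Return entry-page numbers and the (empty) path information of each head node."""
    entries = [n for n, name in numbered_pages if name in breakpoint_pages]
    # No node precedes a head node, so its path information is null (empty list here).
    head_path_info: Dict[int, list] = {n: [] for n in entries}
    return entries, head_path_info


pages = [(1, "Home"), (2, "Search"), (3, "Product"), (4, "Home"), (5, "Cart")]
entries, heads = find_entries(pages, {"Home", "Tab switch"})
print(entries, len(entries))  # prints [1, 4] 2, i.e. two paths
```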
In the above embodiment, matching, in ascending order of number, each access page belonging to each path partition with its transferred-in page, obtaining the matching relation of each access page, constructing path nodes, and recording the path information of each path node comprises:
searching, in ascending order of number, for the transferred-in page of each access page in the path partition; based on the matching relation between each access page and its transferred-in page, establishing the path matching relations among the access pages of the path partition, representing the access pages in these relations as path nodes, and recording the path matching relation of each path node in its path information, the path information further comprising the slot (pit position) click information on the transferred-in page; and connecting the head node and the path nodes in series into a path according to the path matching relations.
In a specific implementation, the access pages in the path partition are matched in order of number against their transferred-in pages, which yields the path matching relations between the head node and the path nodes of the partition, and the path information of each path node is recorded. The path information records not only the match between the path node of the access page and the path node of its transferred-in page, but also the slot clicked on the transferred-in page to enter the access page. The user's access path field is designed as a dynamic array: each path starts from its first access page, and the slot clicked on each transferred-in page is recorded along the way for slot value and conversion analysis, so that users' browsing habits on the website can be analyzed more accurately and in finer detail, improving the value analysis. Notably, the path information of a later path node includes the path information of the earlier path node it matched; hence the path information of a later node alone can restore the whole path of nodes directly or indirectly preceding it.
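The matching step and the dynamic-array path information can be sketched as follows. The sketch assumes each collected page view carries the page it was reached from (`referrer`) and the slot clicked there (`slot`); these field names are hypothetical, and so is searching earlier nodes of the partition nearest-first, which mirrors the step-by-step example in this embodiment rather than a documented rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Step = Tuple[str, Optional[str]]  # (transferred-in page name, slot clicked on it)


@dataclass
class PathNode:
    number: int
    page: str
    referrer: Optional[str] = None  # hypothetical: page the user came from
    slot: Optional[str] = None      # hypothetical: slot clicked on that page
    path_info: List[Step] = field(default_factory=list)  # dynamic array


def match_partition(nodes: List[PathNode]) -> List[PathNode]:
    """Fill path_info for one partition; nodes are sorted by number, nodes[0] is the head."""
    for i, node in enumerate(nodes):
        node.path_info = []  # head node, or no transferred-in page found: empty
        if i == 0:
            continue
        # Search earlier nodes, nearest first, for this page's transferred-in page.
        for prev in reversed(nodes[:i]):
            if prev.page == node.referrer:
                # Inherit the predecessor's path and append the hop taken from it.
                node.path_info = prev.path_info + [(prev.page, node.slot)]
                break
    return nodes


partition = match_partition([
    PathNode(1, "Home"),
    PathNode(2, "Search", referrer="Home", slot="search-box"),
    PathNode(3, "Apple 5S", referrer="Search", slot="result-1"),
    PathNode(4, "Apple 11", referrer="Search", slot="result-2"),
])
print(partition[3].path_info)
# [('Home', 'search-box'), ('Search', 'result-2')]
```

Because each node copies its predecessor's array, a single node's `path_info` restores the full chain back to the head node, at the cost of some duplicated storage.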
For ease of understanding, the above embodiments are now described by way of example:
Step 1: acquire an access session, identify noise access pages using the associated cheating-detection and crawler-detection algorithms, remove them in a primary cleaning pass, and remove repeated access pages in a secondary cleaning pass;
Step 2: identify all paths in the access session. Taking one path as an example, the home page access page A, numbered 1, is the head node, its path information is marked null, and the other access pages are path nodes;
take the search access page B, numbered 2, and compare it with the page breakpoint dimension table. If the comparison succeeds, B is a breakpoint page and its path information is recorded as null. If it fails, B is a path node, and it is determined whether the transferred-in page matched with B is the home page access page A; if so, the path relation between B and A is recorded in B's path information together with the slot click information on A, that is, which slot on A was clicked to reach B;
take the Apple 5S level-four access page C, numbered 3, and compare it with the page breakpoint dimension table. If the comparison succeeds, C is a breakpoint page and its path information is recorded as null. If it fails, C is a path node, and it is determined whether the transferred-in page to be matched with C is the search access page B. If so, C's path information is B's path information plus B's name and slot click information. If not, it is further determined whether C's transferred-in page is the home page access page A; if so, C's path information is A's path information plus A's name and slot click information; if not, C's path information is empty;
take the Apple 11 level-four access page D, numbered 4, and compare it with the page breakpoint dimension table. If the comparison succeeds, D is a breakpoint page and its path information is recorded as null. If it fails, D is a path node, and its path matching relation is checked against the level-four access page C, the search access page B, and the home page A in turn, following the same logic as above.
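The example above folds the breakpoint check and the transferred-in-page matching into a single walk over the numbered pages. A compact sketch of that single-pass reading follows; the `(page, referrer, slot)` tuple layout is hypothetical, and treating the very first page as a head node even when it is not in the breakpoint table is an assumption.

```python
from typing import List, Optional, Set, Tuple

Page = Tuple[str, Optional[str], Optional[str]]  # hypothetical: (page, referrer, slot)
Step = Tuple[str, Optional[str]]                 # (transferred-in page, slot clicked)


def build_paths(pages: List[Page],
                breakpoints: Set[str]) -> List[Tuple[int, str, List[Step]]]:
    """Number pages from 1 and compute each page's path information in one pass."""
    out: List[Tuple[int, str, List[Step]]] = []
    current: List[Tuple[str, List[Step]]] = []  # nodes of the path being built
    for number, (page, referrer, slot) in enumerate(pages, start=1):
        info: List[Step] = []  # null for breakpoint pages and unmatched pages
        if page in breakpoints or not current:
            current = []  # breakpoint page: head node of a new path
        else:
            # Check earlier nodes nearest-first (C, then B, then A in the example).
            for prev_page, prev_info in reversed(current):
                if prev_page == referrer:
                    info = prev_info + [(prev_page, slot)]
                    break
        current.append((page, info))
        out.append((number, page, info))
    return out


rows = build_paths([("Home", None, None),
                    ("Search", "Home", "search-box"),
                    ("Apple 5S", "Search", "result-1"),
                    ("Apple 11", "Search", "result-2"),
                    ("Home", "Apple 11", None),
                    ("Cart", "Home", "cart-icon")], {"Home"})
for row in rows:
    print(row)
```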
For example, a user's access behavior record contains the following operations: opening the website home page, search page (mobile phones), Apple 5S level-four page, Apple 11 level-four page, search page (mobile phones), Huawei Mate30 level-four page, order submission, payment page. The collected access session data are as follows:
[Table: collected access session data (image not reproduced)]
As shown in FIG. 2, the user's website browsing behaviors are, in order: opening the website home page, the search page (mobile phones), the Apple 5S level-four page, the Apple 11 level-four page, the search page (mobile phones), the Huawei P20 level-four page, the website home page, the shopping cart page, the order submission page, and the payment success page.
After the above access session is processed by the method of this embodiment, the following path information data are obtained, from which the page access path tree of access session va can be drawn.
[Table: resulting path information data (image not reproduced)]
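Because each node's path information already holds the full chain of pages before it, the tree can be drawn from the path information rows alone: the last step in a node's path information identifies its parent. A sketch, assuming rows of `(page, path_info)` in number order with `path_info` as `(page, slot)` pairs; keying nodes by their page-name chain is an illustrative choice that merges identical chains, not something the text prescribes.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Step = Tuple[str, Optional[str]]  # (page name, slot clicked on it)
Key = Tuple[str, ...]             # chain of page names starting at the head node


def tree_from_path_info(rows: Sequence[Tuple[str, List[Step]]]) -> List[str]:
    """Rebuild an indented path tree from (page, path_info) rows."""
    children: Dict[Key, Dict[Key, None]] = {(): {}}  # dict used as an ordered set
    for page, info in rows:
        parent: Key = tuple(p for p, _ in info)  # empty info -> child of the root
        key = parent + (page,)
        children.setdefault(parent, {})[key] = None
        children.setdefault(key, {})
    lines: List[str] = []

    def walk(key: Key, depth: int) -> None:
        for child in children.get(key, {}):
            lines.append("  " * depth + child[-1])
            walk(child, depth + 1)

    walk((), 0)
    return lines


rows = [("Home", []),
        ("Search", [("Home", "search-box")]),
        ("Apple 5S", [("Home", "search-box"), ("Search", "result-1")]),
        ("Apple 11", [("Home", "search-box"), ("Search", "result-2")]),
        ("Home", []),
        ("Cart", [("Home", "cart-icon")])]
print("\n".join(tree_from_path_info(rows)))
```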
In summary, the present embodiment has the following beneficial effects:
1. The scheme of this embodiment computes quickly, saves computing power, and works across multiple terminals;
2. With this scheme, custom path conversion analysis only requires the user to configure the page types to be analyzed, and traffic loss and conversion at key nodes can be monitored;
3. The scheme supports path analyses such as website page access paths, page-slot fishbone diagrams, and traffic funnel diagrams. Path analysis reveals whether users browse the website along the navigation path designed by the product, giving the product a direction for adjustment and concrete points for improvement;
4. The scheme computes the value of the products or campaigns on website pages, and of the slots on those pages, more accurately, and can guide the placement and sale of on-site advertising.
Example two
The embodiment provides a system for constructing a page access path, which includes:
an acquisition unit, configured to acquire an access session of a user, wherein the access session comprises a plurality of different access pages;
a cleaning unit, configured to perform page cleaning on the access pages in the access session and number them in order of collection time;
an identification unit, configured to sequentially identify, from the plurality of access pages, the entry access page of each path, take each entry access page as the head node of the corresponding path, and record the path information of each head node;
a judging unit, configured to, if there are multiple paths, divide the access pages lying within each interval between the numbers of adjacent entry access pages into the corresponding path partition, or, if there is only one path, divide all the access pages into one path partition;
a path matching unit, configured to perform page matching on the access pages of each path partition one by one in ascending order of number, obtain the matching relation of each access page, construct path nodes, and record the path information of each path node;
and a path tree construction unit, configured to construct a page access path tree based on the path information of the head nodes and the path nodes.
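How the path tree construction unit might turn path information into a tree can be sketched as follows. The representation is an assumption: each node's path information is reduced to its page id and the number of the node it was matched to (None for a head node). The example path information is hypothetical, not the data in the embodiment's table, and `build_path_tree` and `render` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# node number -> (page id, number of the matched parent node, or None for a head node)
PathInfo = Dict[int, Tuple[str, Optional[int]]]


def build_path_tree(path_info: PathInfo) -> Dict[int, List[int]]:
    """Map every node number to the sorted numbers of its child nodes."""
    children: Dict[int, List[int]] = defaultdict(list)
    for number, (_, parent) in path_info.items():
        if parent is not None:
            children[parent].append(number)
    return {number: sorted(children[number]) for number in path_info}


def render(path_info: PathInfo, tree: Dict[int, List[int]]) -> List[str]:
    """Indented text view of the tree, one line per node, head nodes first."""
    lines: List[str] = []

    def walk(number: int, depth: int) -> None:
        lines.append("  " * depth + f"{number}:{path_info[number][0]}")
        for child in tree[number]:
            walk(child, depth + 1)

    for number in sorted(path_info):
        if path_info[number][1] is None:  # head node of a path
            walk(number, 0)
    return lines


example: PathInfo = {
    1: ("home", None),
    2: ("search", 1),
    3: ("Apple 5S", 2),
    4: ("Apple 11", 2),
}
print("\n".join(render(example, build_path_tree(example))))
```

Because each node points to the node it was matched to rather than simply to its predecessor, two pages opened from the same search page become siblings, which is what makes the result a tree rather than a flat sequence.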
Compared with the prior art, the beneficial effects of the system for constructing the page access path provided by the embodiment of the invention are the same as those of the method for constructing the page access path provided by the first embodiment, and are not described herein again.
Example three
The present embodiment provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the method for constructing a page access path are executed.
Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by the embodiment are the same as the beneficial effects of the method for constructing the page access path provided by the above technical scheme, and are not described herein again.
Those skilled in the art will understand that all or part of the steps of the method of the invention may be implemented by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method of the embodiment; the storage medium may be a ROM/RAM, a magnetic disk, an optical disk, a memory card, or the like.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A method for constructing a page access path, characterized by comprising the following steps:
acquiring an access session of a user, wherein the access session comprises a plurality of different access pages;
performing page cleaning on the access pages in the access session and numbering them in order of collection time;
sequentially identifying, from the plurality of access pages, the entry access page of each path, taking each entry access page as the head node of the corresponding path, and recording the path information of each head node;
if there are multiple paths, dividing the access pages lying within each interval between the numbers of adjacent entry access pages into the corresponding path partition; or, if there is only one path, dividing all the access pages into one path partition;
performing page matching on the access pages of each path partition one by one in ascending order of number, obtaining the matching relation of each access page, constructing path nodes, and recording the path information of each path node;
and constructing a page access path tree based on the path information of the head nodes and the path nodes.
2. The method of claim 1, wherein acquiring an access session of a user, the access session comprising a plurality of different access pages, comprises:
acquiring the plurality of access pages browsed by the user when accessing a website through a terminal within a preset time, and aggregating them in chronological order to form the access session.
3. The method of claim 2, wherein performing page cleaning on the access pages in the access session and numbering them in order of collection time comprises:
identifying, among the access pages, noise access pages generated by crawlers and/or cheating, and removing the noise access pages by preliminary cleaning;
and numbering the access pages retained in the same access session in order of collection time.
4. The method of claim 3, wherein removing the noise access pages by preliminary cleaning further comprises:
within the same access session, if two consecutive access pages are found to be the same page, removing the later of the two by secondary cleaning.
5. The method according to any one of claims 1 to 4, wherein, before the step of sequentially identifying the entry access page of each path from the plurality of access pages, taking each entry access page as the head node of the corresponding path, and recording the path information of each head node, the method further comprises:
constructing a page breakpoint dimension table, wherein the page breakpoint dimension table contains at least one breakpoint page.
6. The method according to claim 5, wherein sequentially identifying the entry access page of each path from the plurality of access pages, taking each entry access page as the head node of the corresponding path, and recording the path information of each head node comprises:
comparing the access pages in the access session, in order of number, one by one against the page breakpoint dimension table, and defining each access page that matches the table as the entry access page of a new path;
taking each entry access page as the head node of the corresponding path, and recording the path information of each head node as null;
wherein the number of paths equals the number of entry access pages.
7. The method according to claim 6, wherein performing page matching on the access pages of each path partition one by one in ascending order of number, obtaining the matching relation of each access page, constructing path nodes, and recording the path information of each path node comprises:
searching, in ascending order of number, for the shifted-in page of each access page in the path partition, that is, the page from which that access page was entered;
based on the matching relation between each access page and its shifted-in page, establishing the path matching relations among the access pages in the path partition, representing the access pages in these relations as path nodes, and recording the path matching relation of each path node in its path information, the path information further including slot (pit position) click information on the shifted-in page;
and connecting the head node and the path nodes in series according to the path matching relations to form a path.
8. The method of claim 7, wherein constructing the page access path tree comprises:
aggregating all paths in the access session to construct the page access path tree.
9. A system for constructing a page access path, comprising:
an acquisition unit, configured to acquire an access session of a user, wherein the access session comprises a plurality of different access pages;
a cleaning unit, configured to perform page cleaning on the access pages in the access session and number them in order of collection time;
an identification unit, configured to sequentially identify, from the plurality of access pages, the entry access page of each path, take each entry access page as the head node of the corresponding path, and record the path information of each head node;
a judging unit, configured to, if there are multiple paths, divide the access pages lying within each interval between the numbers of adjacent entry access pages into the corresponding path partition, or, if there is only one path, divide all the access pages into one path partition;
a path matching unit, configured to perform page matching on the access pages of each path partition one by one in ascending order of number, obtain the matching relation of each access page, construct path nodes, and record the path information of each path node;
and a path tree construction unit, configured to construct a page access path tree based on the path information of the head nodes and the path nodes.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
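The cleaning of claims 3 and 4 can be sketched as below. Assumptions: each visit carries a collection timestamp; crawler/cheating detection is delegated to a caller-supplied `is_noise` predicate, because the claims do not specify how noise pages are identified (the user-agent rule in the example is purely illustrative); and secondary cleaning is applied after noise removal. `RawVisit` and `clean_and_number` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RawVisit:
    page_id: str
    collected_at: float  # collection timestamp
    user_agent: str      # hypothetical field, used only by the example noise rule


def clean_and_number(
    visits: List[RawVisit], is_noise: Callable[[RawVisit], bool]
) -> List[Tuple[int, str]]:
    """Drop noise pages and consecutive repeats, then number by collection time."""
    ordered = sorted(visits, key=lambda v: v.collected_at)
    # Preliminary cleaning: remove pages flagged as crawler or cheating traffic.
    kept = [v for v in ordered if not is_noise(v)]
    # Secondary cleaning: when two consecutive pages are the same, drop the later one.
    deduped: List[RawVisit] = []
    for v in kept:
        if deduped and deduped[-1].page_id == v.page_id:
            continue
        deduped.append(v)
    # Number the retained pages from 1 in collection order.
    return [(number, v.page_id) for number, v in enumerate(deduped, start=1)]


visits = [
    RawVisit("search", 4.0, "Mozilla/5.0"),
    RawVisit("home", 1.0, "Mozilla/5.0"),
    RawVisit("home", 2.0, "Mozilla/5.0"),       # consecutive repeat, removed
    RawVisit("search", 3.0, "ExampleBot/1.0"),  # flagged as noise, removed
]
print(clean_and_number(visits, lambda v: "bot" in v.user_agent.lower()))
# [(1, 'home'), (2, 'search')]
```

Running secondary cleaning after noise removal means two identical pages separated only by a noise page also collapse into one; if the other order is intended, the two passes can simply be swapped.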

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011610978.1A CN112632446B (en) 2020-12-30 2020-12-30 Page access path construction method and system
CA3144126A CA3144126A1 (en) 2020-12-30 2021-12-29 Method of and system for constructing page access path

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011610978.1A CN112632446B (en) 2020-12-30 2020-12-30 Page access path construction method and system

Publications (2)

Publication Number Publication Date
CN112632446A 2021-04-09
CN112632446B 2024-08-27

Family

ID=75286696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011610978.1A Active CN112632446B (en) 2020-12-30 2020-12-30 Page access path construction method and system

Country Status (2)

Country Link
CN (1) CN112632446B (en)
CA (1) CA3144126A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017101652A1 (en) * 2015-12-17 2017-06-22 北京国双科技有限公司 Method and apparatus for determining an access path between website pages
CN107644100A (en) * 2017-10-09 2018-01-30 北京京东尚科信息技术有限公司 Information processing method, device and system and computer-readable recording medium
CN107943679A (en) * 2017-11-24 2018-04-20 广州优视网络科技有限公司 Generation method, device and the server of path funnel
CN108874909A (en) * 2018-05-28 2018-11-23 深圳壹账通智能科技有限公司 User access path acquisition methods, server and computer storage medium
CN109284450A (en) * 2018-08-22 2019-01-29 中国平安人寿保险股份有限公司 Order is at the determination method and device of single path, storage medium, electronic equipment
CN111552905A (en) * 2020-04-22 2020-08-18 苏宁云计算有限公司 Method and system for acquiring user access critical path
CN111737630A (en) * 2020-08-25 2020-10-02 智者四海(北京)技术有限公司 Method for recording user access path

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113127742A (en) * 2021-04-30 2021-07-16 康键信息技术(深圳)有限公司 User behavior path extraction method, device, equipment and storage medium
CN113127742B (en) * 2021-04-30 2023-10-20 康键信息技术(深圳)有限公司 User behavior path extraction method, device, equipment and storage medium
CN113242159A (en) * 2021-05-24 2021-08-10 中国工商银行股份有限公司 Application access relation determining method and device
CN113242159B (en) * 2021-05-24 2022-12-09 中国工商银行股份有限公司 Application access relation determining method and device
CN113791837A (en) * 2021-08-12 2021-12-14 百度在线网络技术(北京)有限公司 Page processing method, device, equipment and storage medium
CN113791837B (en) * 2021-08-12 2023-08-11 百度在线网络技术(北京)有限公司 Page processing method, device, equipment and storage medium
CN113934421A (en) * 2021-09-28 2022-01-14 青岛海尔科技有限公司 Page path writing method, device and equipment of application program and storage medium
CN113934616A (en) * 2021-12-16 2022-01-14 深圳市活力天汇科技股份有限公司 Method for judging abnormal user based on user operation time sequence
CN113934616B (en) * 2021-12-16 2022-03-18 深圳市活力天汇科技股份有限公司 Method for judging abnormal user based on user operation time sequence
CN114374595A (en) * 2022-01-13 2022-04-19 平安普惠企业管理有限公司 Event node attribution analysis method and device, electronic equipment and storage medium
CN114374595B (en) * 2022-01-13 2024-03-15 平安普惠企业管理有限公司 Event node attribution analysis method, device, electronic equipment and storage medium
CN115766495A (en) * 2022-09-26 2023-03-07 车智互联(北京)科技有限公司 Entrance statistical method, system, mobile terminal and storage medium

Also Published As

Publication number Publication date
CA3144126A1 (en) 2022-06-30
CN112632446B (en) 2024-08-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant