CN108304410B - Method and device for detecting abnormal access page and data analysis method - Google Patents

Info

Publication number
CN108304410B
Authority
CN
China
Prior art keywords
page
access
record
user
historical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710024279.0A
Other languages
Chinese (zh)
Other versions
CN108304410A (en
Inventor
贾亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202111582729.0A priority Critical patent/CN114417197A/en
Priority to CN201710024279.0A priority patent/CN108304410B/en
Publication of CN108304410A publication Critical patent/CN108304410A/en
Application granted granted Critical
Publication of CN108304410B publication Critical patent/CN108304410B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Information Transfer Between Computers (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiment of the application discloses a method and a device for detecting an abnormal access page and a data analysis method, wherein the method for detecting the abnormal access page comprises the following steps: acquiring a historical access record of a user, wherein the historical access record comprises a page identifier of an access page and reference information of the access page; according to the page identification and the reference information, counting the number of times that each page is accessed in the historical access record and recording reference data corresponding to each page; and detecting abnormal access pages from the historical access records according to the number of times of the access of each page and the reference data corresponding to each page. The detection method, the detection device and the data analysis method for the abnormal access page can improve the detection efficiency of the abnormal access page, so that the data analysis efficiency is improved.

Description

Method and device for detecting abnormal access page and data analysis method
Technical Field
The present application relates to the field of network data communication technologies, and in particular, to a method and an apparatus for detecting an abnormal access page, and a data analysis method.
Background
With the continuous development of network data communication technology, when a user accesses a website, a server of the website can generally collect access records of the user. By analyzing the access records of the user, the behavior of the user during website access can be obtained. For example, the shopping website analyzes the access records of the commodities browsed by the user, so that the commodity which the user is interested in can be known, and the information of the commodity which the user is interested in can be pushed to the user.
Currently, when a user accesses a website through a client, data interaction between the server of the website and the client of the user may be performed automatically. For example, the server of the website may send an information acquisition request to the client of the user, and after receiving the request, the client may simulate an access behavior of the user and feed back user information to the server. The access record generated when the client feeds back the user information in this process is not a real access record of the user, and the pages contained in such an access record cannot be displayed normally. In order to analyze the behavior of the user accurately, it is generally necessary to detect abnormal access pages and remove them from the access records of the user.
In the prior art, abnormal access pages are detected by manual screening: the website address contained in each access record is entered into a browser, and if the browser cannot display the page normally, the entered address is regarded as an abnormal address and the page corresponding to it is regarded as an abnormal access page.
Because the prior art detects abnormal access pages manually, a long time is required when a large number of access records must be processed, which results in low efficiency of detecting abnormal access pages and in turn reduces the efficiency of analyzing user behavior.
Disclosure of Invention
The embodiment of the application aims to provide a method and a device for detecting an abnormal access page and a data analysis method, which can improve the detection efficiency of the abnormal access page, thereby improving the efficiency of data analysis.
In order to achieve the above object, an aspect of the embodiments of the present application provides a method for detecting an abnormal access page, where the method includes: acquiring a historical access record of a user, wherein the historical access record comprises a page identifier of an access page and reference information of the access page; according to the page identification and the reference information, counting the number of times that each page is accessed in the historical access record and recording reference data corresponding to each page; and detecting abnormal access pages from the historical access records according to the number of times of the access of each page and the reference data corresponding to each page.
In order to achieve the above object, another aspect of the embodiments of the present application further provides an apparatus for detecting an abnormal access page, where the apparatus includes a network communication port and a processor, where: the network communication port is used for carrying out network data communication; the processor is used for acquiring a historical access record of a user through the network communication port, wherein the historical access record comprises a page identifier of a current access page and reference information of the current access page; according to the page identification and the reference information, counting the number of times that each page is accessed in the historical access record and recording reference data corresponding to each page; and detecting abnormal access pages from the historical access records according to the number of times of the access of each page and the reference data corresponding to each page.
In order to achieve the above object, another aspect of the embodiments of the present application further provides a data analysis method, including: acquiring a historical access record of a user, wherein the historical access record comprises a page identifier of an access page and reference information of the access page; according to the page identification and the reference information, counting the number of times that each page is accessed in the historical access record and recording reference data corresponding to each page; detecting abnormal access pages from the historical access records according to the number of times that each page is accessed and the reference data corresponding to each page; removing data related to the abnormal access page from the historical access record to obtain target service data; and performing data analysis based on the target service data.
As can be seen from the above, according to the detection method, the detection device and the data analysis method for the abnormal access page provided by the embodiment of the application, the accessed page and the referred page in the historical access record of the user are analyzed, so that the abnormal access page can be detected from the historical access record according to the number of times that each page is accessed and the number of times that each page is referred. The detection method, the detection device and the data analysis method for the abnormal access page can automatically analyze the historical access record, and avoid a manual detection mode, so that the detection efficiency of the abnormal access page is improved, and the data analysis efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart of a method for detecting an abnormal access page according to an embodiment of the present disclosure;
FIG. 2 is a diagram illustrating a page access path tree according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a device for detecting an abnormal access page according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of a data analysis method according to an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art without any inventive work based on the embodiments in the present application shall fall within the scope of protection of the present application.
An embodiment of the present application provides a method for detecting an abnormal access page, please refer to fig. 1, where the method includes the following steps.
S1: acquiring a historical access record of a user, wherein the historical access record comprises a page identifier of an access page and reference information of the access page.
In this embodiment, when a user accesses a preset website through a client, an access request may be sent to a server of the preset website. The access request may include an identifier pointing to the preset website. For example, the identifier may be a domain name of the preset website, or an IP address of a server of the preset website. When the identifier is the domain name of the preset website, the client of the user may resolve the domain name through a Domain Name System (DNS) server to obtain the IP address corresponding to the domain name. In this way, the client of the user can send the access request to the server pointed to by the resolved IP address.
In this embodiment, the access request may further include an IP address of the client of the user, so that after the server of the preset website receives the access request sent by the client of the user, page information for the access request may be fed back to the IP address of the client of the user. Therefore, the client of the user can receive the page information sent by the server of the preset website and display the page information in the current page.
In this embodiment, in the process of accessing the preset website by the user, the server of the preset website may collect the access record of the user. Specifically, the manner of collecting the access record of the user may include storing each access request of the user or adding a script for acquiring the user information to the fed-back page information when feeding back the page information to the client of the user. In this way, when the client of the user receives the page information fed back by the server, the script added in the page information can be executed, so that the user information in the client of the user can be sent to the server of the preset website. In this embodiment, the user information may include information of a browser in the client, access time, cookie information, and the like.
In this embodiment, after the server of the preset website collects the access record of the user, the access record of the user may be stored in the preset memory. The preset memory may be located in a server of the preset website, or may be an independent storage server, and the storage server may be accessed by the server of the preset website.
In this embodiment, the access record of the user stored in the preset memory may be used as the historical access record of the user. In the preset memory, the historical access records belonging to the same user may have the same user identifier. The user identifier may be an account registered by the user in the preset website, or may be an IP address of the client of the user. Therefore, different historical access records can be inquired from the preset memory according to different user identifications.
In this embodiment, the manner of obtaining the historical access record of the user may include reading the historical access record corresponding to the user identifier from the preset memory. Specifically, the detection apparatus for an abnormal access page may provide a data acquisition request including a user identifier to a server of the preset website. In this way, the server of the preset website may extract the user identifier from the data acquisition request, so as to feed back the historical access data corresponding to the user identifier to the detection device. In this embodiment, the detection device may be an electronic device having a data processing function, or may be a program running on the electronic device or a server of the predetermined website.
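The lookup of historical access records by user identifier can be illustrated with a small in-memory stand-in for the preset memory. This is only a sketch under stated assumptions: the class `PresetMemory`, the tuple `StoredRecord`, and the sample identifiers are hypothetical names introduced for illustration, and a real deployment would presumably read from a database or a storage server instead.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class StoredRecord(NamedTuple):
    user_id: str                    # account name or client IP address
    access_page: str                # page identifier of the access page
    reference_page: Optional[str]   # None when there is no reference page


class PresetMemory:
    """Hypothetical in-memory stand-in for the preset memory."""

    def __init__(self) -> None:
        self._by_user: Dict[str, List[StoredRecord]] = defaultdict(list)

    def store(self, record: StoredRecord) -> None:
        # Records of the same user share the same user identifier.
        self._by_user[record.user_id].append(record)

    def history_for(self, user_id: str) -> List[StoredRecord]:
        # Return a copy so that callers cannot modify the stored records.
        return list(self._by_user.get(user_id, []))


memory = PresetMemory()
memory.store(StoredRecord("user-1", "www.jd.com", "www.google.com"))
memory.store(StoredRecord("user-1", "www.jd.com/a", "www.jd.com"))
memory.store(StoredRecord("10.0.0.7", "www.jd.com", None))
user_history = memory.history_for("user-1")
```

Here `user_history` holds only the two records stored under the identifier "user-1"; an unknown identifier yields an empty list.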
In this embodiment, the historical access record of the user may generally include a page identifier of an access page and reference information of the access page. The page identifier of the access page may be a character string that points to the access page. For example, the page identifier may be a Uniform Resource Locator (URL). The reference information may indicate the page from which the access page was linked. The reference information may include the page identifier of the page one level above the access page. For example, the page identifier of the access page may be www.jd.com, and the reference information of the access page may include the page identifier www.google.com. This indicates that the access page www.jd.com was reached through a link on www.google.com.
In this embodiment, each access record in the historical access records may be written in a fixed format. The fixed format may define the components of the access record and the order in which they are arranged. For example, the access record may include an access page field and a reference page field, and each field may begin with a preset header identifier. For example, the header identifier of the access page field may be "Request:", and the header identifier of the reference page field may be "refer:". The page identifier of the access page or of the reference page, respectively, may be filled in after the header identifier.
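A record written in such a fixed format can be split back into its two fields with a few lines of code. The following is a minimal sketch, assuming that each record is one line of text and that the fields are separated by a semicolon; the separator, the function name `parse_access_record`, and the type `AccessRecord` are illustrative assumptions and are not specified by the embodiment.

```python
from typing import NamedTuple, Optional

REQUEST_HEADER = "Request:"   # header identifier of the access page field
REFER_HEADER = "refer:"       # header identifier of the reference page field


class AccessRecord(NamedTuple):
    access_page: str
    reference_page: Optional[str]  # None when the reference page field is empty


def parse_access_record(line: str, separator: str = ";") -> AccessRecord:
    """Parse one record such as 'Request: www.jd.com/a; refer: www.jd.com'."""
    access_page = None
    reference_page = None
    for part in line.split(separator):
        part = part.strip()
        if part.startswith(REQUEST_HEADER):
            access_page = part[len(REQUEST_HEADER):].strip()
        elif part.startswith(REFER_HEADER):
            # An empty field means the page was not reached through a link.
            reference_page = part[len(REFER_HEADER):].strip() or None
    if not access_page:
        raise ValueError(f"record has no access page field: {line!r}")
    return AccessRecord(access_page, reference_page)


record = parse_access_record("Request: www.jd.com/a; refer: www.jd.com")
direct = parse_access_record("Request: www.jd.com; refer:")
```

In this sketch `record` carries both page identifiers, while `direct` has no reference page because its reference page field is empty.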
S3: counting, according to the page identifier and the reference information, the number of times each page is accessed in the historical access record, and recording the reference data corresponding to each page.
In this embodiment, the page identifier included in the history access record may point to the accessed page, and the reference information included in the history access record may point to the referred page.
In this embodiment, the reference data corresponding to each page may include the number of times each page is referenced. Specifically, each record in the historical access record may be traversed, and the number of times each page is accessed and the number of times each page is referenced may be counted. When the page identifier of a preset page appears in the access page field of a record, the number of accesses corresponding to the preset page may be increased by 1; when the page identifier of the preset page appears in the reference page field of a record, the number of times the preset page is referenced may be increased by 1. Therefore, after the access records are counted one by one, the number of accesses and the number of references of each page in the historical access record of the user can be obtained.
It should be noted that in some access records the access page may have no corresponding reference page, and the reference page field in these access records may be empty. For example, in an access record whose access page is www.jd.com, the user may have entered the address www.jd.com directly in the browser, so the reference page field of that access record is empty.
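The counting described above, including the case of an empty reference page field, can be sketched as follows. The function name `count_accesses_and_references` and the representation of a record as an (access page, reference page) pair are assumptions made for illustration; the sample records are invented.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

Record = Tuple[str, Optional[str]]  # (access page, reference page or None)


def count_accesses_and_references(records: Iterable[Record]) -> Tuple[Counter, Counter]:
    accessed = Counter()    # page identifier -> number of accesses
    referenced = Counter()  # page identifier -> number of times referenced
    for access_page, reference_page in records:
        accessed[access_page] += 1
        if reference_page:  # skip records whose reference page field is empty
            referenced[reference_page] += 1
    return accessed, referenced


history = [
    ("www.jd.com", "www.google.com"),
    ("www.jd.com/a", "www.jd.com"),
    ("www.jd.com/a", "www.jd.com"),
    ("www.jd.com/b", None),
    ("www.jd.com/b", None),
]
accessed, referenced = count_accesses_and_references(history)
```

A `Counter` returns 0 for a page that never appears in a reference page field, so `referenced["www.jd.com/b"]` can be read directly without a separate existence check.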
In this embodiment, the reference data corresponding to each page may further include the reference page corresponding to each page. Specifically, each record in the historical access record may be traversed, the number of times each page is accessed may be counted, and the reference page corresponding to each page may be recorded. When the page identifier of a preset page appears in the access page field of a record, the number of accesses corresponding to the preset page may be increased by 1. When the page identifier of the preset page appears in the reference page field of a record, the preset page serves as the reference page of the access page in that record. Therefore, after the access records are counted one by one, the number of accesses of each page in the historical access record of the user and the reference page corresponding to each page can be obtained.
It should also be noted that in some access records the access page may have no corresponding reference page, and the reference page field in these access records may be empty. For example, in an access record whose access page is www.jd.com, the user may have entered the address www.jd.com directly in the browser, so the reference page field of that access record is empty. That is, not all pages have a reference page.
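Recording the reference page of each page, rather than a reference count, could look like the sketch below. It assumes that a page may be reached from several different reference pages and therefore keeps a set per page; the function name `collect_reference_pages` and the sample records are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

Record = Tuple[str, Optional[str]]  # (access page, reference page or None)


def collect_reference_pages(records: Iterable[Record]) -> Tuple[Counter, Dict[str, Set[str]]]:
    accessed = Counter()
    reference_pages = defaultdict(set)  # access page -> its reference pages
    for access_page, reference_page in records:
        accessed[access_page] += 1
        if reference_page:
            reference_pages[access_page].add(reference_page)
    # Pages that never had a reference page are simply absent from the mapping.
    return accessed, dict(reference_pages)


history = [
    ("www.jd.com", "www.google.com"),
    ("www.jd.com", None),
    ("www.jd.com/a", "www.jd.com"),
    ("www.jd.com/b", None),
]
accessed, reference_pages = collect_reference_pages(history)
```

In this example www.jd.com keeps www.google.com as a reference page even though one of its two accesses was made directly, while www.jd.com/b has no entry at all.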
S5: detecting abnormal access pages from the historical access records according to the number of times each page is accessed and the reference data corresponding to each page.
In this embodiment, since an abnormal access page cannot be displayed normally, it cannot serve as a reference page of other pages. That is, when a preset page is a page that cannot be displayed normally, the number of times the preset page is referenced should be zero. On this basis, which pages are abnormal access pages can be judged according to the number of times the pages are referenced.
In this embodiment, if abnormal access pages are detected only according to the number of times a page is referenced, a normal page may be erroneously determined to be an abnormal access page. The reason is that when a user accesses some normal pages, the user may not go on to access other pages through them, so these normal pages are not referenced by any other page.
In this embodiment, in order to reduce such misjudgment, the number of times a page is accessed may also be taken into account when abnormal access pages are detected according to the number of times the page is referenced. Specifically, when the number of times a preset page is accessed is greater than or equal to a preset threshold and the preset page is not referenced by any other page, the preset page may be determined to be an abnormal access page.
In this embodiment, the preset threshold may be determined according to the number of historical access records of the user. Specifically, the determined scaling factor may be preset, and the preset threshold may be obtained by multiplying the number of the historical access records of the user by the scaling factor. Therefore, when the number of times of the preset page access is greater than or equal to the preset threshold value, the preset page has a large number of access records. At this time, if the preset page is not referred by other pages, it may be determined that the preset page is an abnormal access page.
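The threshold rule above can be written down directly. This is a sketch only: the scaling factor of 0.2 and the sample counts are assumed values chosen for illustration, since the embodiment does not state a concrete factor, and the function names are hypothetical.

```python
from typing import Dict, List


def preset_threshold(record_count: int, scaling_factor: float) -> float:
    # Threshold = number of historical access records x scaling factor.
    return record_count * scaling_factor


def detect_unreferenced_pages(
    accessed: Dict[str, int],
    referenced: Dict[str, int],
    record_count: int,
    scaling_factor: float,
) -> List[str]:
    threshold = preset_threshold(record_count, scaling_factor)
    return sorted(
        page
        for page, access_count in accessed.items()
        if access_count >= threshold and referenced.get(page, 0) == 0
    )


# Ten records in total; the scaling factor 0.2 is an assumed value.
accessed = {"www.jd.com": 5, "www.jd.com/a": 1, "www.jd.com/b": 4}
referenced = {"www.google.com": 5, "www.jd.com": 1}
abnormal_pages = detect_unreferenced_pages(accessed, referenced, record_count=10, scaling_factor=0.2)
```

With these numbers the threshold is 2. The page www.jd.com/a is never referenced but was accessed only once, so it is not flagged, whereas www.jd.com/b is both unreferenced and frequently accessed.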
In this embodiment, the abnormal access page may also be generated by the automatic data interaction between the server of the website and the client of the user, and such an abnormal access page generally does not have a corresponding reference page. That is, when a preset page is an abnormal access page, it does not have a corresponding reference page. On this basis, which pages are abnormal access pages can be judged according to the reference pages corresponding to the pages.
In this embodiment, if abnormal access pages are detected only according to the reference page corresponding to each page, a normal page may be erroneously determined to be an abnormal access page. The reason is that when a user accesses certain normal pages, the user may enter the addresses of these pages directly in the browser, so these normal pages have no reference page either.
In this embodiment, in order to reduce such misjudgment, the number of times a page is accessed may also be taken into account when abnormal access pages are detected according to the reference page corresponding to each page. Specifically, when the number of times a preset page is accessed is greater than or equal to a preset threshold and the preset page does not have a reference page, the preset page may be determined to be an abnormal access page.
In this embodiment, the preset threshold may be determined according to the number of historical access records of the user. Specifically, the determined scaling factor may be preset, and the preset threshold may be obtained by multiplying the number of the historical access records of the user by the scaling factor. Therefore, when the number of times of the preset page access is greater than or equal to the preset threshold value, the preset page has a large number of access records. At this time, if the preset page does not have a reference page, it may be determined that the preset page is an abnormal access page.
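The variant that checks for a missing reference page can be sketched in the same spirit. The sketch assumes that every record contains exactly one access page, so that the number of historical access records equals the sum of the access counts; the scaling factor of 0.25, the function name, and the sample data are illustrative assumptions.

```python
from typing import Dict, List, Set


def detect_pages_without_reference_page(
    access_counts: Dict[str, int],
    reference_pages: Dict[str, Set[str]],
    scaling_factor: float,
) -> List[str]:
    # One access page per record, so the counts add up to the record total.
    record_count = sum(access_counts.values())
    threshold = record_count * scaling_factor
    abnormal = []
    for page, count in access_counts.items():
        has_reference_page = bool(reference_pages.get(page))
        if count >= threshold and not has_reference_page:
            abnormal.append(page)
    return sorted(abnormal)


access_counts = {
    "www.jd.com": 6,
    "www.jd.com/a": 5,
    "www.jd.com/b": 8,
    "www.jd.com/help": 1,
}
reference_pages = {
    "www.jd.com": {"www.google.com"},
    "www.jd.com/a": {"www.jd.com"},
}
abnormal_pages = detect_pages_without_reference_page(
    access_counts, reference_pages, scaling_factor=0.25  # assumed factor
)
```

The 20 records give a threshold of 5. The page www.jd.com/help has no reference page but only one access, which is consistent with an address typed directly by the user, so only www.jd.com/b is reported.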
In this embodiment, after detecting an abnormal access page, the access record including the page identifier of the abnormal access page may be deleted from the historical access record of the user, so that the behavior of the user may be analyzed correctly.
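Removing the affected records is a simple filter. The sketch below assumes that a record is an (access page, reference page) pair and that a record is dropped when its access page is one of the detected abnormal access pages; the function name `remove_abnormal_records` is hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Record = Tuple[str, Optional[str]]  # (access page, reference page or None)


def remove_abnormal_records(
    records: Iterable[Record], abnormal_pages: Iterable[str]
) -> List[Record]:
    abnormal = set(abnormal_pages)  # set membership keeps the filter linear
    return [record for record in records if record[0] not in abnormal]


history = [
    ("www.jd.com", "www.google.com"),
    ("www.jd.com/b", None),
    ("www.jd.com/a", "www.jd.com"),
    ("www.jd.com/b", None),
]
cleaned = remove_abnormal_records(history, ["www.jd.com/b"])
```

The original list is left untouched, and `cleaned` keeps the remaining records in their original order.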
In a specific application scenario, a historical access record generated when a user accesses the JD Mall website may be obtained. Each access record may include the URL of an access page and the URL of the reference page corresponding to the access page. In particular, the URL of the reference page may be located in the header of the access record. A referrer field may be present in the header, in which the URL of the reference page corresponding to the access page may be filled in. In this application scenario, the historical access records of the user may be examined one by one. When the page identifier of a preset page appears in the access page field of a record, the number of accesses corresponding to the preset page may be increased by 1; when the page identifier of the preset page appears in the referrer field of a record, the number of times the preset page is referenced may be increased by 1. Therefore, after the access records are counted one by one, the number of accesses and the number of references of each page in the historical access record of the user can be obtained. Specific statistics may be as shown in Table 1.
Table 1: Number of accesses and number of references of each page in the historical access records
URL             Number of accesses    Number of times referenced
www.jd.com      100000                120000
www.jd.com/a    80000                 60000
www.jd.com/b    70000                 0
As can be seen from Table 1, www.jd.com/b is accessed a large number of times but is not referenced by any other page, so this page may be an abnormal access page.
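The reading of Table 1 can be reproduced as a short worked example. Two assumptions are made here that the text does not state: the scaling factor is set to 0.1, and the three rows are treated as the complete historical access record, so that the number of records equals the sum of the access counts.

```python
# (number of accesses, number of times referenced) per URL, copied from Table 1
table_1 = {
    "www.jd.com": (100000, 120000),
    "www.jd.com/a": (80000, 60000),
    "www.jd.com/b": (70000, 0),
}

SCALING_FACTOR = 0.1  # assumed value, not given in the text

total_accesses = sum(accesses for accesses, _ in table_1.values())
threshold = total_accesses * SCALING_FACTOR

# A page is suspected when it is accessed often but never referenced.
suspected = [
    url
    for url, (accesses, references) in table_1.items()
    if accesses >= threshold and references == 0
]
```

Under these assumptions all three pages exceed the threshold, and www.jd.com/b is the only one with zero references.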
In one embodiment of the application, the reference information of the access page in the history access record can be obtained from the client of the user through a script. Specifically, when a user accesses a current page through a client, a page access request may be sent to a server of the current page. Meanwhile, the client may also locally store the page identifier of the reference page of the current page, and the page identifier of the reference page may be used as reference information corresponding to the current page.
In this embodiment, in response to a page access request sent by the client of the user, the server of the website may feed back page information to the client of the user, where the page information may include a script for acquiring reference information. The script may be a code segment written in a preset programming language and executable by the client of the user. For example, the script may be a JS script or a PHP script.
In this embodiment, when the client of the user receives the page information added with the script sent by the server, the script can be automatically executed. The script, when executed, may obtain information of the client. The acquired information may include, for example, version information of the browser, cookie information, reference information of the current page, access time information, and the like. In this way, the client can send the reference information of the current page pointed by the page access request to the server of the website. After receiving the reference information sent by the client of the user after executing the script, the server may write the received reference information and the current page information in the page access request together into the historical access record of the user.
Referring to fig. 2, in an embodiment of the present application, a history access record of a user may be processed by constructing a page access path tree. The page access path tree may include page nodes of the respective pages and connection relationships between the respective pages. Specifically, in this embodiment, a page access path tree of the user may be generated according to the historical access record, where the page access path tree includes at least one page node, where a path connection line exists between page nodes having a reference relationship, and the path connection line points from a referenced page node to an accessed page node.
In this embodiment, each page in the history access record may correspond to a page node, where the page represented by each page node may be an accessed page and, similarly, may also be a referred page. For example, as shown in fig. 2, www.jd.com can be the accessed page in an access record, and its corresponding referenced page can be www.google.com, which indicates that www.jd.com is linked from www.google.com in the access record. In another access record, www.jd.com/a can be the accessed page, and its corresponding referenced page can be www.jd.com, which indicates that www.jd.com/a is linked from www.jd.com in the access record. In another access record, www.jd.com/b can be the accessed page, and its corresponding referenced page can be www.jd.com, which indicates that www.jd.com/b is linked from www.jd.com in the access record.
As can be seen from the above, the same page address can serve as the accessed page in one access record and as the referenced page in another. In this embodiment, two page nodes have a reference relationship when one of them corresponds to a currently accessed page and the other corresponds to the reference page of that currently accessed page. A path connection line between page nodes may represent this reference relationship, and the path connection line may be a directed line that points from the referenced page node to the accessed page node.
In an embodiment of the present application, when constructing the page access path tree, each record in the history access records may be sorted according to access time. The access time may be a time point when a client of the user sends a page access request to a server of the website. And sequencing each record according to the access time so as to follow the browsing sequence of the user when accessing the website.
In this embodiment, after the records in the historical access record are sorted, each record may be traversed, and a corresponding page node may be created for the access page/reference page in each record. To avoid creating page nodes repeatedly, it may be determined whether a page node already exists for the access page/reference page in the current record. If a page node already exists for the access page/reference page, it does not need to be created again.
In this embodiment, a corresponding page node may be created for any access page/reference page in the current record that does not yet have a page node, and a path connection line may be established between the page node of the reference page and the page node of the access page. For example, suppose that in the current access record the access page is www.jd.com and its reference page is www.baidu.com. If www.baidu.com already has a page node, that node is not created again. If www.jd.com has no page node, a corresponding page node for www.jd.com can be created. Because www.jd.com is linked from www.baidu.com in this access record, a path connection line may be established between the page node of www.jd.com and the page node of www.baidu.com, directed from the page node of www.baidu.com to the page node of www.jd.com. Performing this creation of page nodes and path connection lines for every access record in the historical access record generates a page access path tree corresponding to the historical access record of the user.
In an embodiment of the present application, each page node in the page access path tree may be associated with a number of times accessed and a number of times referenced. If a page node already exists for the access page/reference page in the current record, the number of times accessed/number of times referenced of that page node may be incremented by one. Thus, after the page access path tree is created, each page node carries its total number of times accessed and total number of times referenced. Therefore, the number of times each page in the historical access record is accessed can be obtained by reading the number of times accessed of the corresponding page node in the page access path tree. Similarly, the number of times each page is referenced can be obtained by reading the number of times referenced of the corresponding page node.
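The construction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the record layout `(access_time, access_page, reference_page)`, the `PageNode` class, and the `build_path_tree` function are hypothetical names introduced here, and the sketch counts every occurrence of a page, including the one that creates its node.

```python
from dataclasses import dataclass, field


@dataclass
class PageNode:
    """Hypothetical page node: one per distinct page address."""
    url: str
    accessed: int = 0    # times the page appears as the access page
    referenced: int = 0  # times the page appears as the reference page
    # Path connection lines: accessed pages reached from this page.
    children: set = field(default_factory=set)


def build_path_tree(records):
    """Build page nodes and directed connections from access records.

    records: iterable of (access_time, access_page, reference_page),
    where reference_page may be None (assumed layout).
    """
    nodes = {}
    # Sort by access time so the traversal follows the browsing order.
    for _, page, referrer in sorted(records, key=lambda r: r[0]):
        # setdefault creates a node only if none exists for this page.
        node = nodes.setdefault(page, PageNode(page))
        node.accessed += 1
        if referrer is not None:
            ref_node = nodes.setdefault(referrer, PageNode(referrer))
            ref_node.referenced += 1
            # The connection points from the referenced page to the accessed page.
            ref_node.children.add(page)
    return nodes
```

Keying the nodes by page address in a dictionary is one simple way to avoid creating the same page node twice; other lookup structures would serve equally well.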
In an embodiment of the present application, when recording a reference page corresponding to each page, each page node in the page access path tree may be traversed, and each page node referred by a current page node is recorded, so that a page corresponding to the recorded page node may be used as a reference page corresponding to the current page. After each page node in the page access path tree is counted, a reference page corresponding to each page can be obtained.
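As a rough sketch of this traversal, the Python snippet below walks the directed connections of a path tree and collects, for each page, the pages it was linked from. The `edges` mapping (referenced page to the set of pages accessed from it) and the function name `reference_pages` are assumptions made for illustration, and the sketch takes "reference page" to mean the page from which the current page was reached, consistent with the www.jd.com and www.google.com example earlier in this description.

```python
def reference_pages(edges):
    """Collect the reference pages of every page in a path tree.

    edges: dict mapping a referenced page to the set of pages
    accessed from it (assumed representation of the connections).
    Returns a dict mapping each page to the set of its reference pages.
    """
    refs = {}
    for referrer, accessed_pages in edges.items():
        # Make sure pages that only act as referrers are listed too.
        refs.setdefault(referrer, set())
        for page in accessed_pages:
            refs.setdefault(page, set()).add(referrer)
    return refs
```

A page whose resulting set is empty was never reached through a link from another recorded page.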
Referring to fig. 3, the present embodiment further provides an apparatus for detecting an abnormal access page, where the apparatus includes a network communication port 110 and a processor 210.
The network communication port 110 is used for performing network data communication.
The processor 210 is configured to obtain a historical access record of a user through the network communication port, where the historical access record includes a page identifier of a current access page and reference information of the current access page; according to the page identification and the reference information, counting the number of times that each page is accessed in the historical access record and recording reference data corresponding to each page; and detecting abnormal access pages from the historical access records according to the number of times of the access of each page and the reference data corresponding to each page.
In this embodiment, the network communication port 110 may refer to a hardware port or a software port. The hardware port may be a USB port, a serial port, etc. The software ports may be communication protocol ports in the network that are directed to connection services and connectionless services. The communication protocol may comprise, for example, a TCP/IP protocol or a UDP protocol.
The processor 210 may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth. The present application is not limited in this respect.
The specific functions implemented by the network communication port 110 and the processor 210 of the detection apparatus disclosed in the foregoing embodiment may be understood by reference to the embodiment of the method for detecting an abnormal access page in the present application; the apparatus can implement that method embodiment and achieve its technical effects.
Referring to fig. 4, the present application also provides a data analysis method, which may include the following steps.
S61: acquiring a historical access record of a user, wherein the historical access record comprises a page identifier of an access page and reference information of the access page.
S63: counting the number of times each page in the historical access record is accessed and recording the reference data corresponding to each page according to the page identifier and the reference information.
S65: detecting abnormal access pages from the historical access record according to the number of times each page is accessed and the reference data corresponding to each page.
S67: removing data related to the abnormal access page from the historical access record to obtain target service data.
S69: performing data analysis based on the target service data.
In this embodiment, the reference data corresponding to each page may include: the number of times each page is referred to or the reference page corresponding to each page. The specific implementation manner of steps S61 to S65 may refer to the description of steps S1 to S5, and will not be described herein again.
In this embodiment, after an abnormal access page is detected, data related to the abnormal access page may be removed from the historical access record. In particular, the data related to the abnormal access page may include a page identifier of the abnormal access page and/or reference information of the abnormal access page. The page identifier of the abnormal access page may be a character string that points to the abnormal access page. For example, the page identifier may be a Uniform Resource Locator (URL). The reference information may indicate the page from which the abnormal access page is linked. The reference information may include the page identifier of the page preceding the abnormal access page. For example, the page identifier of the abnormal access page may be www.jd-404.com, and the reference information of the abnormal access page may include the page identifier www.google.com. This indicates that the access page www.jd-404.com is linked from www.google.com.
In this embodiment, after data related to the abnormal access page is removed from the historical access record, target service data can be obtained. The target service data may include information about access pages that can be displayed normally. This information may include an identifier of the access page, the time at which the page was accessed, the time spent on the page, the reference page of the access page, and other information. By analyzing the target service data, the behavior characteristics of the user when browsing web pages can be obtained.
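Steps S63 to S67 can be illustrated with the short Python sketch below. It is only one possible reading: the `(access_page, reference_page)` record layout, the function names, and the threshold value are hypothetical, and the detection rule follows the criterion stated in the claims (a page accessed at least a preset number of times that is not referenced), interpreted here as a page that never appears as the reference page of any record.

```python
from collections import Counter


def detect_abnormal_pages(records, threshold):
    """Return pages accessed at least `threshold` times and never referenced.

    records: list of (access_page, reference_page) pairs, where
    reference_page may be None (assumed layout).
    """
    accessed = Counter(page for page, _ in records)
    referenced = Counter(ref for _, ref in records if ref is not None)
    # Counter returns 0 for pages that never appear as a reference page.
    return {
        page
        for page, count in accessed.items()
        if count >= threshold and referenced[page] == 0
    }


def remove_abnormal(records, abnormal_pages):
    """Drop every record whose access page was flagged as abnormal."""
    return [(page, ref) for page, ref in records if page not in abnormal_pages]
```

Dropping the whole record removes both the page identifier and the reference information of the abnormal access page, leaving the remaining records as the target service data.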
In this embodiment, analyzing the target service data may include extracting the search keywords of the user from the target service data. Currently, a search engine usually stores the search keyword used by a user in the URL, so the search keyword can be extracted from the URLs included in the target service data. Specifically, the URL may store the search keyword through a preset search keyword variable. In an actual application scenario, the preset search keyword variables of different search engines may differ. For example, the preset search keyword variables may include /word/, /keyword/, /word/, and the like. The search keyword used by the user is filled in after the preset search keyword variable, and the combination of the variable and the keyword may be stored in the query field of the URL. Accordingly, by extracting the query field of each URL included in the target service data and identifying the preset search keyword variable in the extracted query field, the search keywords included in the target service data can be obtained. After the search keywords of the user are obtained, the server of the website can count the search keywords with a high search frequency and automatically push those keywords to the user in the search bar of the website's home page.
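The keyword extraction described above could look roughly like the following Python sketch, which uses the standard library's `urllib.parse` to read the query field of a URL. The variable names in `KEYWORD_VARIABLES`, the function names, and the example URLs in the usage note are all hypothetical; real search engines use their own parameter names.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Hypothetical preset search keyword variables; real names vary by search engine.
KEYWORD_VARIABLES = ("wd", "keyword", "word")


def extract_search_keywords(url, variables=KEYWORD_VARIABLES):
    """Return the search keywords stored in the query field of a URL."""
    query = parse_qs(urlsplit(url).query)  # also decodes percent-encoding
    return [value for name in variables for value in query.get(name, [])]


def keyword_frequencies(urls):
    """Count how often each search keyword occurs across a list of URLs."""
    counts = Counter()
    for url in urls:
        counts.update(extract_search_keywords(url))
    return counts
```

For instance, calling `keyword_frequencies` on the URLs in the target service data and then `most_common()` on the result would give the keywords ordered by search frequency.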
In this embodiment, analyzing the target service data may further include identifying the access pages in which the user is relatively interested. Specifically, the number of times each access page is accessed and the length of time spent on each access page may be counted in the target service data. After counting, the access pages in the target service data may be sorted by number of accesses or by access duration, so that pages with more accesses and longer access durations can be recommended to the user. Of course, in an actual application scenario, the access pages may also be sorted according to other criteria. For example, the access pages may be sorted according to the number of clicks the user makes on each page; the sorting manner is not limited in the present application.
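A minimal Python sketch of such a ranking is given below. The `(page, stay_seconds)` input layout and the function name `rank_pages` are assumptions for illustration, and ordering by access count first with total duration as the tie-breaker is just one of the sorting choices the text allows.

```python
from collections import defaultdict


def rank_pages(visits):
    """Rank pages by access count, then by total time spent on the page.

    visits: iterable of (page, stay_seconds) pairs taken from the
    target service data (assumed layout).
    """
    counts = defaultdict(int)
    durations = defaultdict(float)
    for page, stay_seconds in visits:
        counts[page] += 1
        durations[page] += stay_seconds
    # Most-accessed pages first; longer total duration breaks ties.
    return sorted(counts, key=lambda p: (counts[p], durations[p]), reverse=True)
```

Swapping the two elements of the sort key would rank by duration first, and a click count could be added to the key in the same way.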
As can be seen from the above, according to the detection method, the detection device and the data analysis method for the abnormal access page provided by the embodiment of the application, the accessed page and the referred page in the historical access record of the user are analyzed, so that the abnormal access page can be detected from the historical access record according to the number of times that each page is accessed and the number of times that each page is referred. The detection method, the detection device and the data analysis method for the abnormal access page can automatically analyze the historical access record, and avoid a manual detection mode, so that the detection efficiency of the abnormal access page is improved, and the data analysis efficiency is improved.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). However, as technology has advanced, many of today's method flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Thus, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer "integrates" a digital system onto a single PLD by programming it, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Furthermore, nowadays, instead of manually making integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must be written in a specific programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog.
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained simply by describing the method flow in one of the hardware description languages above and programming it into an integrated circuit.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner; the same or similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the other embodiments. In particular, for the embodiment of the detection device, reference may be made to the description of the method embodiments above.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Although the present application has been described in terms of embodiments, those of ordinary skill in the art will recognize that there are numerous variations and permutations of the present application that do not depart from its spirit, and it is intended that the appended claims encompass such variations and permutations.

Claims (13)

1. A method for detecting an abnormal access page, the method comprising:
acquiring a historical access record of a user, wherein the historical access record comprises a page identifier of an access page and reference information of the access page;
according to the page identification and the reference information, counting the number of times that each page is accessed in the historical access record and recording reference data corresponding to each page; wherein, the reference data corresponding to each page comprises: a reference page corresponding to each page;
and detecting abnormal access pages from the historical access records according to the number of times of the access of each page and the reference data corresponding to each page.
2. The method of claim 1, wherein the reference data corresponding to each page further comprises: the number of times each page is referenced.
3. The method according to claim 1 or 2, wherein when the reference data corresponding to each page includes the number of times that each page is referred to, detecting an abnormally accessed page from the history access record according to the number of times that each page is accessed and the reference data corresponding to each page includes:
when the number of times of accessing a preset page is greater than or equal to a preset threshold value and the preset page is not referred by other pages, judging that the preset page is an abnormal access page.
4. The method according to claim 1 or 2, wherein when the reference data corresponding to each page includes a reference page corresponding to each page, detecting an abnormal access page from the history access record according to the number of times each page is accessed and the reference data corresponding to each page comprises:
when the number of times of accessing a preset page is greater than or equal to a preset threshold value and the preset page does not have a reference page, determining that the preset page is an abnormal access page.
5. The method according to claim 1 or 2, wherein the reference information of the access page is obtained as follows:
responding to a page access request sent by a client of a user, and feeding back page information to the client of the user, wherein the page information comprises a script used for obtaining reference information;
and receiving reference information sent by the client of the user after the script is executed, and writing the received reference information into a historical access record of the user.
6. The method of claim 1 or 2, wherein after obtaining the user's historical access record, the method further comprises:
and generating a page access path tree of the user according to the historical access record, wherein the page access path tree comprises at least one page node, a path connecting line exists between the page nodes with the reference relationship, and the path connecting line points to the accessed page node from the referenced page node.
7. The method of claim 6, wherein generating the page access path tree for the user based on the historical access records comprises:
sequencing each record in the historical access records according to the access time;
traversing each record in the historical access record, and judging whether a page node exists in an access page/reference page in the current record or not;
and establishing a corresponding page node for the access page/reference page without the page node in the current record, and establishing a path connection line between the page node of the reference page and the page node of the access page.
8. The method of claim 7, further comprising:
and if the page node already exists in the access page/reference page in the current record, increasing the accessed times/referenced times corresponding to the page node once.
9. The method of claim 6, wherein counting the number of times each page in the historical access record is accessed comprises:
and counting the accessed times corresponding to each page node in the page access path tree.
10. The method of claim 6, wherein recording the reference data corresponding to each page comprises:
counting the number of times referenced corresponding to each page node in the page access path tree;
or
traversing each page node in the page access path tree, recording each page node referred by the current page node, and taking the page corresponding to the recorded page node as a reference page corresponding to the current page.
11. An apparatus for detecting an anomalous access page, the apparatus comprising a network communication port and a processor, wherein:
the network communication port is used for carrying out network data communication;
the processor is used for acquiring a historical access record of a user through the network communication port, wherein the historical access record comprises a page identifier of a current access page and reference information of the current access page; according to the page identifier and the reference information, counting the number of times that each page is accessed in the historical access record and recording reference data corresponding to each page; and detecting abnormal access pages from the historical access record according to the number of times that each page is accessed and the reference data corresponding to each page; wherein the reference data corresponding to each page comprises: a reference page corresponding to each page.
12. A method of data analysis, the method comprising:
acquiring a historical access record of a user, wherein the historical access record comprises a page identifier of an access page and reference information of the access page;
according to the page identification and the reference information, counting the number of times that each page is accessed in the historical access record and recording reference data corresponding to each page; wherein, the reference data corresponding to each page comprises: a reference page corresponding to each page;
detecting abnormal access pages from the historical access records according to the number of times that each page is accessed and the reference data corresponding to each page;
removing data related to the abnormal access page from the historical access record to obtain target service data;
and performing data analysis based on the target service data.
13. The method of claim 12, wherein the reference data corresponding to each page further comprises: the number of times each page is referenced.
CN201710024279.0A 2017-01-13 2017-01-13 Method and device for detecting abnormal access page and data analysis method Active CN108304410B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111582729.0A CN114417197A (en) 2017-01-13 2017-01-13 Access record processing method and device and storage medium
CN201710024279.0A CN108304410B (en) 2017-01-13 2017-01-13 Method and device for detecting abnormal access page and data analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710024279.0A CN108304410B (en) 2017-01-13 2017-01-13 Method and device for detecting abnormal access page and data analysis method

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202111582729.0A Division CN114417197A (en) 2017-01-13 2017-01-13 Access record processing method and device and storage medium

Publications (2)

Publication Number Publication Date
CN108304410A CN108304410A (en) 2018-07-20
CN108304410B true CN108304410B (en) 2022-02-18

Family

ID=62872348

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202111582729.0A Pending CN114417197A (en) 2017-01-13 2017-01-13 Access record processing method and device and storage medium
CN201710024279.0A Active CN108304410B (en) 2017-01-13 2017-01-13 Method and device for detecting abnormal access page and data analysis method

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202111582729.0A Pending CN114417197A (en) 2017-01-13 2017-01-13 Access record processing method and device and storage medium

Country Status (1)

Country Link
CN (2) CN114417197A (en)

Also Published As

Publication number Publication date
CN114417197A (en) 2022-04-29
CN108304410A (en) 2018-07-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant