WO2013181972A1 - Network access behavior identification method and apparatus


Info

Publication number
WO2013181972A1
Authority
WO
WIPO (PCT)
Prior art keywords
network access
user
identifier
access behavior
behavior
Prior art date
Application number
PCT/CN2013/074873
Other languages
English (en)
French (fr)
Inventor
唐东
张洪丁
周(韦华)
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2013181972A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/50: Network services
    • H04L67/535: Tracking the activity of the user

Definitions

  • the present invention relates to the field of network technologies, and more particularly to a network access behavior identification method and apparatus.
  • User behavior analysis refers to collecting the behavior information generated when a user accesses the network (for example, website URLs, search keywords, and webpage content), performing statistics and analysis on it to discover the patterns in the user's network access, and combining those patterns with online marketing strategies and the like in order to identify possible problems in current online marketing activities and to provide a basis for further revising or reformulating those strategies.
  • Existing web pages may be associated with other web pages, such as embedded images, advertisements, and the like.
  • When the client sends a network access request and, after receiving the request response from the server, establishes a network connection with the server, the client displays the webpage content and also parses the sub-page addresses included in that content. The client therefore continues to send network access requests automatically, requesting the sub-page content corresponding to each sub-page address.
  • If behavior information were obtained for every network access behavior triggered by the client and then analyzed and processed, the accuracy of user behavior analysis would suffer. Identifying network access behaviors and determining which of them are effective, that is, triggered by the user, therefore becomes the key to accurate user behavior analysis.
  • The existing network access behavior identification method generally treats the network access behavior of the first network access request triggered by the client in a given network connection as an effective network access behavior, so user behavior analysis only processes the behavior information related to the first network access behavior in each network connection.
  • The inventor has found that, because the web page requested by the user and its associated sub-pages may not share the same network connection, the first network access behavior in a network connection may also be one triggered automatically by the client.
  • the present invention provides a network access behavior identification method and apparatus to more accurately identify network access behaviors and improve the accuracy of user behavior analysis.
  • the present invention provides the following technical solutions:
  • An aspect of the present invention provides a network access behavior identification method, including:
  • obtaining network access information of a current network access behavior of a user, where the network access information includes a user identifier of the user and a network access time;
  • querying the network access information record of the user according to the user identifier of the user; and, if the network access information record of the user is found, determining the validity of the current network access behavior of the user according to the network access time of the current network access behavior and the network access time of the last network access behavior of the user recorded in the network access information record.
  • Another aspect of the present invention provides a network access behavior identification method, including: acquiring network access information of a current network access behavior of a user, where the network access information includes a user identifier of the user and a webpage identifier accessed;
  • querying the network access information record of the user according to the user identifier of the user; querying, from a pre-stored webpage identifier association set, the parent webpage identifier associated with the webpage identifier accessed by the current network access behavior of the user; and, if the network access information record of the user and the parent webpage identifier are both found, determining the validity of the current network access behavior of the user according to the parent webpage identifier and the webpage identifier accessed by the last network access behavior of the user recorded in the network access information record.
  • A further aspect of the present invention provides a network access behavior identifying apparatus, including: a network information acquiring module, configured to acquire network access information of a current network access behavior of a user, where the network access information includes a user identifier of the user and a network access time;
  • a record querying module, configured to obtain the network access information from the network information acquiring module, and query the network access information record of the user according to the user identifier of the user included in the network access information;
  • a first determining module, configured to: when the record querying module finds the network access information record of the user, determine the validity of the current network access behavior of the user according to the network access time of the current network access behavior and the network access time of the last network access behavior of the user recorded in the network access information record.
  • a network access behavior identifying apparatus including: a network information acquiring module, configured to acquire network access information of a current network access behavior of a user, where the network access information includes a user identifier of the user And network access time;
  • a record querying module configured to obtain the network access information from the network information obtaining module, and query the network access information record of the user according to the user identifier of the user included in the network access information;
  • an identifier querying module configured to query, from the webpage identifier association set, a parent webpage identifier associated with the webpage identifier accessed by the current network access behavior of the user;
  • a second determining module, configured to: when the record querying module finds the network access information record of the user and the identifier querying module finds the parent webpage identifier, determine the validity of the current network access behavior of the user according to the parent webpage identifier and the webpage identifier accessed by the last network access behavior of the user recorded in the network access information record.
  • The present invention provides a network access behavior identification method and apparatus, which obtain the network access information of a user's current network access behavior, look up the network access information record of the user, and determine whether the current network access behavior is an effective network access behavior according to the current network access time and the user's last network access time in the network access information record. The embodiments of the present invention thus judge every network access behavior, using the user's historical network access behavior to judge the current one, which improves the accuracy of network access behavior identification and thereby makes the user behavior analysis result more accurate.
  • FIG. 1 is a flowchart of Embodiment 1 of a network access behavior identification method according to the present invention
  • FIG. 2 is a flowchart of Embodiment 2 of a network access behavior identification method according to the present invention
  • FIG. 3 is a flowchart of Embodiment 3 of a network access behavior identification method according to the present invention.
  • FIG. 4 is a flowchart of Embodiment 4 of a network access behavior identification method according to the present invention.
  • FIG. 5 is a schematic structural diagram of Embodiment 1 of a network access behavior identification apparatus according to the present invention
  • FIG. 6 is a schematic structural diagram of Embodiment 2 of a network access behavior identification apparatus according to the present invention
  • FIG. 8 is a schematic structural diagram of Embodiment 4 of a network access behavior identification apparatus according to the present invention.
  • The embodiments of the invention disclose a network access behavior identification method and apparatus. The network access information of a user's current network access behavior is acquired and the network access information record of the user is looked up. Once that record is found, whether the current network access behavior is an effective network access behavior is determined according to the current network access time and the user's last network access time in the record. Because every network access behavior is judged in this way, the accuracy of network access behavior identification improves and the user behavior analysis result becomes more accurate.
  • FIG. 1 is a flowchart of Embodiment 1 of a network access behavior identification method according to the present invention, where the method may include:
  • Step 101 Obtain network access information of a current network access behavior of the user, where the network access information includes a user identifier of the user and a network access time.
  • The user sends a network access request through the client, such as an HTTP (Hypertext Transfer Protocol) request. Once the client has established a network connection with the server, for example a TCP (Transmission Control Protocol) connection, the user is accessing the network.
  • The network access information may be the basic network information of the user's network access behavior. It may be obtained from the information carried in the network access request when that request is sent, and may include the user identifier, the webpage identifier accessed, the network access time, and so on.
  • The webpage identifier is an identifier of the webpage accessed by the current network access behavior, and may be the webpage address, that is, its URL (Uniform/Universal Resource Locator).
  • the network access information may also be information obtained from the request response after the client establishes a network connection with the server, and the network access information may further include information such as request response time, response content, and the like. As long as the user requests access to the network, the user's network access information can be obtained.
  • The user identifier is a unique identifier that distinguishes different access users. For example, in a mobile communication system it may be the IMEI (International Mobile Equipment Identity) or the communication number of the mobile device, such as a mobile phone number; in a fixed telephone network system it may be an ADSL (Asymmetric Digital Subscriber Line) account.
  • the access time may be determined from timestamp information carried by the network connection request.
  • Step 102 Query the network access information record of the user according to the user identifier of the user.
  • the network access information record may refer to the saved network access information of different users, and may be historical effective network access information. According to the user identifier in the network access information, it can be queried whether there is historical network access information corresponding to the user.
  • Step 103 If the network access information record of the user is found, determine the validity of the current network access behavior of the user according to the current network access time of the user and the network access time of the user's last network access behavior recorded in the network access information record. If a network access information record of the user has been saved, that record includes a network access time. The network access time of the user's last network access behavior can therefore be determined from the record, and the validity of the current network access behavior can be determined from the current network access time and that last network access time.
  • the last network access time of the user refers to the network access time of the last network access behavior recorded in the network access information record from the current network access behavior of the user.
  • The predetermined value is derived from factors such as client performance, user habits, and network delay; it is an estimate of the interval between the time of the user's current network access request and the time of the user's previous one. For example, when a user clicks one web link in the browser and then clicks another, the time difference between the two access requests that the browser initiates in response to those clicks is usually more than 2 s. If the link the user clicked has sub-links, the time difference between the browser's first network access request and the request it then initiates automatically for an associated sub-link is usually 2 s or less. The predetermined value can therefore be set to 2 s.
  • If the time difference between the user's current network access time and the recorded last network access time is greater than the predetermined value, the current network access behavior was initiated by the user, and it may be determined to be valid. If the time difference is less than or equal to the predetermined value, the current network access behavior was not actively requested by the user and may be an automatic behavior of the client, so it can be determined to be invalid.
  • In step 102, if no network access information record is found, the user has not performed any network access behavior before, so the current network access behavior of the user can be determined to be an effective network access behavior.
  • If the user's current network access behavior is determined to be an effective network access behavior, the current network access information of the user may be recorded. If it is determined to be an invalid network access behavior, the current network access information need not be recorded. The network access information record that is queried therefore consists specifically of the network access information recorded for effective network access behaviors.
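  • The time-based check of Embodiment 1 (steps 101 to 103) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `AccessInfo`, `is_effective_by_time`, and `PREDETERMINED_VALUE_S` are hypothetical, the record is modelled as an in-memory dictionary keyed by user identifier, and times are assumed to be in seconds.

```python
from dataclasses import dataclass
from typing import Dict

# Example threshold from the text (2 s); an assumption to be tuned per deployment.
PREDETERMINED_VALUE_S = 2.0


@dataclass
class AccessInfo:
    """Hypothetical container for the network access information of one request."""
    user_id: str
    access_time: float  # seconds, e.g. a Unix timestamp
    url: str = ""


def is_effective_by_time(current: AccessInfo,
                         records: Dict[str, AccessInfo],
                         threshold: float = PREDETERMINED_VALUE_S) -> bool:
    """Return True if the current access is judged to be user-triggered."""
    last = records.get(current.user_id)  # step 102: query the user's record
    if last is None:
        # No record: the user has no earlier access, so treat it as effective.
        effective = True
    else:
        # Step 103: effective only if the gap exceeds the predetermined value.
        effective = (current.access_time - last.access_time) > threshold
    if effective:
        # Only effective behaviors are recorded, as described above.
        records[current.user_id] = current
    return effective
```

  • Because invalid behaviors are not recorded in this sketch, the gap is always measured against the user's last effective access, which matches the description of the record as holding effective network access information only.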
  • The network access information record may be saved in the form of a data table, in which each piece of network access information uniquely identifies one network access behavior of the user. To prevent table overflow from making queries inaccurate, the network access information record needs to be aged: when the data table reaches a set capacity threshold, the earliest stored entries are deleted. That is, one or more of the network access information records that have been stored longest are deleted so that the record stays within the set capacity.
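  • The aging of the record table can be illustrated with a small bounded table that discards its oldest entry once a set capacity is exceeded. This is a sketch under assumptions: the class name `AgingRecordTable` is hypothetical, and rewriting an existing key is assumed to refresh its storage time, which the text does not specify.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class AgingRecordTable:
    """Bounded table that drops the longest-stored entry when full."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._rows: "OrderedDict[Hashable, Any]" = OrderedDict()

    def put(self, key: Hashable, value: Any) -> None:
        # Assumption: rewriting a key counts as a fresh store, so move it to the end.
        if key in self._rows:
            del self._rows[key]
        self._rows[key] = value
        # Age out the earliest stored entries until the table fits again.
        while len(self._rows) > self.capacity:
            self._rows.popitem(last=False)

    def get(self, key: Hashable) -> Optional[Any]:
        return self._rows.get(key)

    def __len__(self) -> int:
        return len(self._rows)
```

  • An `OrderedDict` keeps insertion order, so removing the first item is a direct way to delete the entry with the longest storage time.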
  • Once effective network access behaviors have been identified, they can be analyzed and processed: the recorded network access information of the effective network access behaviors can be provided to the user behavior analysis system, which can, for example, compile statistics on the visited web addresses and access times, or crawl the webpage content at each visited URL and compile statistics on the content of the accessed webpages.
  • The network access behavior identification described in the embodiments of the present invention may be performed by the user behavior analysis system itself, or by a client, a server, or a request-forwarding gateway connected to the user behavior analysis system, which then provides the network access information to the user behavior analysis system.
  • FIG. 2 is a flowchart of Embodiment 2 of a method for identifying a network access behavior according to the present invention.
  • the method may include:
  • Step 201 Obtain network access information of a current network access behavior of the user, where the network access information includes a user identifier of the user and a webpage identifier accessed.
  • the network access information may further include a network access time. Therefore, the network access information of the current network access behavior may include the webpage identifier currently accessed by the user and the current network access time.
  • Step 202 Query the network access information record of the user according to the user identifier of the user.
  • Step 203 Query, from a pre-stored webpage identifier association set, a parent webpage identifier associated with the webpage identifier accessed by the current network access behavior of the user.
  • The webpage identifier association set includes the correspondence between different parent webpage identifiers and sub-page identifiers.
  • The webpage identifier association set may be pre-stored by the system. The sub-link addresses embedded in different webpage content may be parsed in advance; the identifier of the webpage is then the parent webpage identifier, and each embedded sub-link address is a sub-page identifier associated with that parent webpage identifier. In this way the correspondence between different parent webpage identifiers and different sub-page identifiers can be established to form the association set.
  • a parent page identifier typically corresponds to multiple associated child page identifiers.
  • the different webpage content may be webpage content accessed by an effective web access behavior of the user history.
  • The current webpage identifier is treated as a sub-page identifier, and the association set is queried to see whether a parent webpage identifier associated with it exists.
  • A hash retrieval algorithm may be used to query whether there is a parent webpage identifier associated with the current webpage identifier. That is, the webpage identifiers in the webpage identifier association set all correspond to hash codes, or are all stored in hash-coded form.
  • A sub-page identifier can be represented as a hash code, and its associated parent webpage identifier as a Parent Hash code. When the query is performed, the current webpage identifier is first converted to its hash code, and the retrieval algorithm then checks whether a Parent Hash code corresponding to that hash code exists.
  • Because different webpage identifiers may map to the same hash code, each webpage identifier may be represented by at least two hash codes. For example, the sub-page identifier corresponds to hash code1 and hash code2, and the corresponding parent webpage identifier to Parent Hash code1 and Parent Hash code2. At least two hash codes together can uniquely determine a webpage identifier.
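  • The two-hash lookup described above might look like the sketch below. The text does not name any hash function, so the use of SHA-256 and BLAKE2b is purely an assumption, as are the names `hash_pair` and `ParentIndex`. For simplicity each sub-page identifier maps to a single parent here.

```python
import hashlib
from typing import Dict, Optional, Tuple

HashPair = Tuple[str, str]


def hash_pair(url: str) -> HashPair:
    """Represent a webpage identifier by two independent hash codes."""
    data = url.encode("utf-8")
    # Assumed hash functions; the source only requires at least two hash codes.
    return (hashlib.sha256(data).hexdigest(), hashlib.blake2b(data).hexdigest())


class ParentIndex:
    """Maps (hash code1, hash code2) of a sub-page to the pair of its parent."""

    def __init__(self) -> None:
        self._parent_of: Dict[HashPair, HashPair] = {}

    def add(self, parent_url: str, child_url: str) -> None:
        self._parent_of[hash_pair(child_url)] = hash_pair(parent_url)

    def parent_hash(self, url: str) -> Optional[HashPair]:
        """Return the Parent Hash code pair for url, or None if there is none."""
        return self._parent_of.get(hash_pair(url))
```

  • Comparing the returned pair with `hash_pair(last_url)` then answers whether the parent is the page the user last accessed, without storing the URLs themselves.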
  • Step 202 and step 203 are not limited to the order described in this embodiment. They may be performed at the same time, or the identifier query may be performed first and the network access information record queried afterwards.
  • Step 204 If the network access information record of the user and the parent webpage identifier are both found, determine the validity of the current network access behavior of the user according to the parent webpage identifier and the webpage identifier accessed by the user's last network access behavior recorded in the network access information record. When both the user's network access information record and the parent webpage identifier are found, the webpage identifier accessed by the user's last network access behavior can be obtained from the record. Based on that last accessed webpage identifier and the parent webpage identifier, it can be determined whether the current network access behavior is valid.
  • Specifically, the parent webpage identifier is compared with the webpage identifier accessed by the user's last network access behavior to see whether they differ. If a parent webpage identifier corresponding to the current webpage identifier exists, the webpage corresponding to the current webpage identifier is a sub-page associated with the parent webpage, but that sub-page may also have been requested by the user. Therefore, when the webpage identifier association set contains a parent webpage identifier associated with the current webpage identifier, it is further necessary to check whether the parent webpage identifier differs from the webpage identifier accessed by the user's last network access behavior, in order to determine whether the currently accessed webpage identifier is a sub-page identifier associated with the webpage identifier accessed by that last behavior. If it is, the webpage corresponding to the current webpage identifier was not requested by the user but was triggered automatically by the client.
  • If the parent webpage identifier is the same as the last accessed webpage identifier, it can be determined that the webpage identifier accessed by the current network access behavior is merely a sub-page identifier of the last accessed webpage identifier. The current network access behavior can then be regarded as an automatic behavior of the client, which means that the user's current network access behavior is an invalid network access behavior.
  • the current network access behavior of the user may also be determined as an invalid network access behavior.
  • When no parent webpage identifier corresponding to the currently accessed webpage identifier is found, the currently accessed webpage identifier is not a sub-page identifier, or the current network access behavior is being triggered by the user for the first time. If neither the currently accessed webpage identifier nor an associated parent webpage identifier exists in the webpage identifier association set, which is built by parsing the webpages accessed by historical network access behaviors, the current network access behavior of the user may be determined to be an effective network access behavior.
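  • The parent-identifier check of Embodiment 2 can be sketched as a small decision function. This is an illustration under stated assumptions: the name `is_effective_by_parent` is hypothetical, the association set is modelled as a plain dictionary from sub-page identifier to parent webpage identifier, and a missing record or missing parent is treated as effective, following steps 302 and 305 of Embodiment 3.

```python
from typing import Dict, Optional


def is_effective_by_parent(current_url: str,
                           last_url: Optional[str],
                           parent_of: Dict[str, str]) -> bool:
    """Judge the current access from the parent/child relation alone.

    last_url is the webpage identifier of the user's last recorded access,
    or None when the user has no network access information record.
    """
    if last_url is None:
        # Assumption: with no record there is nothing to compare against.
        return True
    parent = parent_of.get(current_url)  # step 203: look up the parent identifier
    if parent is None:
        # Assumption: not known as a sub-page, so treat as user-triggered.
        return True
    # Step 204: a sub-page of the page just visited is a client-initiated fetch.
    return parent != last_url
```

  • For example, with `parent_of = {"http://example.com/ad.js": "http://example.com/"}`, a request for `ad.js` directly after `http://example.com/` is judged invalid, while the same request after an unrelated page is judged effective.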
  • the current network access information of the user may also be recorded.
  • For the user's current network access behavior, the webpage content corresponding to the currently accessed webpage identifier may be obtained and parsed to obtain the sub-page identifiers associated with the currently accessed webpage identifier, and the currently accessed webpage identifier and its associated sub-page identifiers are saved in the webpage identifier association set.
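  • Parsing webpage content to populate the association set could be done along the following lines. This sketch uses Python's standard `html.parser`; the choice of which tags count as automatically loaded sub-links (`img`, `script`, `iframe`, `link`) is an assumption, as are the names `SubResourceParser` and `record_children`. Ordinary `<a>` links are skipped because following them requires a user click.

```python
from html.parser import HTMLParser
from typing import Dict, Set
from urllib.parse import urljoin


class SubResourceParser(HTMLParser):
    """Collects the addresses of resources that a browser fetches on its own."""

    # Assumed mapping of auto-loaded tags to the attribute holding the address.
    AUTO_LOADED = {"img": "src", "script": "src", "iframe": "src", "link": "href"}

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.children: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        attr_name = self.AUTO_LOADED.get(tag)
        if attr_name is None:
            return
        for name, value in attrs:
            if name == attr_name and value:
                # Resolve relative addresses against the parent page.
                self.children.add(urljoin(self.base_url, value))


def record_children(parent_url: str, html: str,
                    parent_of: Dict[str, str]) -> Set[str]:
    """Parse html and store child -> parent pairs in the association set."""
    parser = SubResourceParser(parent_url)
    parser.feed(html)
    parser.close()
    for child in parser.children:
        parent_of[child] = parent_url
    return parser.children
```

  • Fetching the page itself is left out of the sketch; `html` is assumed to be the already retrieved webpage content.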
  • FIG. 3 is a flowchart of Embodiment 3 of a method for identifying a network access behavior according to the present invention.
  • the method may include:
  • Step 301 Obtain network access information of the current network access behavior of the user.
  • the network access information may include a user identifier of the user, a network access time, and a webpage identifier accessed.
  • Step 302 Query whether the network access information record of the user is saved according to the user identifier of the user. If yes, go to step 303. If no, go to step 307.
  • Step 303 Obtain a network access time of the last network access behavior of the user from the network access information record of the user.
  • Step 304 Determine whether the time difference between the current network access time of the user and the network access time of the user's last network access behavior is greater than a predetermined value. If yes, go to step 307. If no, go to step 305.
  • Step 305 Query, from the pre-stored webpage identifier association set, whether there is a parent webpage identifier associated with the webpage identifier accessed by the current network access behavior of the user. If yes, go to step 306. If no, go to step 307.
  • The webpage identifier association set includes the correspondence between different parent webpage identifiers and sub-page identifiers.
  • The webpage identifier association set may be pre-stored by the system. The sub-link addresses embedded in different webpage content may be parsed in advance; the identifier of the webpage is then the parent webpage identifier, and each embedded sub-link address is a sub-page identifier associated with that parent webpage identifier. In this way the correspondence between different parent webpage identifiers and different sub-page identifiers can be established to form the association set.
  • a parent page identifier typically corresponds to multiple associated child page identifiers.
  • the different webpage content may be webpage content accessed by an effective web access behavior of the user history.
  • the hash retrieval algorithm may be used to query whether there is a parent webpage identifier associated with the current webpage identifier.
  • the specific retrieval process can be referred to in the above-described Embodiment 2.
  • Step 306 Determine whether the parent webpage identifier is different from the webpage identifier accessed by the user for the last network access behavior. If yes, go to step 307. If no, go to step 309.
  • The webpage identifier accessed by the user's last network access behavior is obtained from the network access information record of the user. Because a user may select different webpage identifiers in quick succession when accessing the network, the time difference between the current network access time and the network access time of the user's last network access behavior may be less than or equal to the predetermined value even for a user-triggered request. Therefore, when the judgment result of step 304 is no, identification of the network access behavior needs to continue.
  • Step 307 Determine the current network access behavior of the user to be an effective network access behavior, and record the network access information corresponding to the current network access behavior.
  • Step 308 Obtain the webpage content corresponding to the webpage identifier currently accessed by the user, parse the webpage content, obtain the sub-webpage identifier associated with the currently accessed webpage identifier, and record the user in the webpage identifier association relationship set. The correspondence between the currently accessed web page identifier and its associated sub-page identifier.
  • The current network access behavior of the user may be determined to be an effective network access behavior, and the network access information of that behavior recorded, in any of the following cases: the network access information record of the user is not found; the parent webpage identifier associated with the currently accessed webpage identifier is not found; the time difference between the current network access time and the network access time of the user's last network access behavior is greater than the predetermined value; or that time difference is less than or equal to the predetermined value but the parent webpage identifier differs from the webpage identifier accessed by the last network access behavior.
  • The webpage identifier association set records the correspondence between the currently accessed webpage identifier and the parsed sub-page identifiers, for use in judging the user's next network access behavior.
  • the network access information record can be saved in the form of a data table.
  • The webpage identifier association set may also be saved in the form of a data table. To prevent table overflow from making queries inaccurate, the webpage identifier association table needs to be aged: when the data table reaches the set capacity, the earliest stored entries are deleted, in the order in which the webpage identifier associations were stored, so that the table stays within its capacity.
  • Step 309 Determine that the current network access behavior of the user is an invalid network access behavior.
  • The network access information of an invalid network access behavior need not be recorded, and the content of the webpage accessed by the invalid network access behavior need not be parsed to obtain sub-page identifiers.
  • If the parent webpage identifier is the same as the last accessed webpage identifier, it can be determined that the webpage identifier accessed by the current network access behavior is merely a sub-page identifier of the last accessed webpage identifier. The current network access behavior can then be regarded as an automatic behavior of the client, which means that the user's current network access behavior is an invalid network access behavior.
  • the behavior information of the effective network access behavior may be processed by the user behavior analysis system, and the behavior information includes a network access information record.
  • The network access information record holds the network access information of the user's historical effective network access behaviors, so the record can be provided to the user behavior analysis system for processing. For example, statistics can be compiled on the visited webpage identifiers and access times, or the corresponding webpage content can be crawled according to the webpage identifier and the content of the accessed webpages analyzed statistically. The behavior information analyzed by the user behavior analysis system is then the behavior information of effective network access behaviors, which improves the accuracy of user behavior analysis.
  • in this embodiment, after the network access information is obtained, the user's network access information record is queried. If the record is found, the time difference between the current network access time and the user's last network access time is compared with the predetermined value. If the time difference is greater than the predetermined value, the current network access behavior of the user is determined to be an effective network access behavior. If it is less than or equal to the predetermined value, the pre-stored webpage identifier association set is queried for the parent webpage identifier associated with the currently accessed webpage identifier, and that parent webpage identifier is compared with the webpage identifier accessed last time to determine validity.
  • FIG. 4 is a flowchart of Embodiment 4 of a network access behavior identification method according to the present invention. The method may include:
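  • the order of checks in this embodiment (time difference first, parent webpage identifier second) could be sketched as follows. This is a minimal Python illustration rather than the claimed implementation: the function name, the dictionary layout, the single-parent mapping `parent_of`, and the 2-second default threshold are assumptions made for the example.

```python
def is_effective_time_first(current, last, parent_of, threshold=2.0):
    """Decide whether the current access is effective (user-triggered).

    current, last: dicts with keys "page" (webpage identifier) and "time" (seconds);
                   last is None when the user has no network access information record.
    parent_of:     dict mapping a sub-page identifier to its parent webpage identifier
                   (assumes one parent per sub-page, a simplification of this sketch).
    """
    if last is None:
        return True   # no record for this user: treat as effective
    if current["time"] - last["time"] > threshold:
        return True   # slow enough to be a deliberate user action
    parent = parent_of.get(current["page"])
    if parent is None:
        return True   # not known as a sub-page of any webpage
    # Effective only if it is NOT a sub-page of the webpage accessed last time.
    return parent != last["page"]
```

  • checking the time difference first means the association set only has to be consulted for requests that arrive shortly after the previous one.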
  • Step 401 Obtain network access information of the current network access behavior of the user.
  • the network access information may include a user identifier of the user, a webpage identifier accessed, and a network access time.
  • Step 402 Query whether there is a network access information record of the user according to the user identifier of the user. If yes, go to step 403. If no, go to step 407.
  • Step 403 Query whether there is a parent webpage identifier associated with the webpage identifier accessed by the current network access behavior of the user from the pre-stored webpage association relationship set. If yes, go to step 404. If no, go to step 407.
  • Step 404 Obtain, from the network access information record of the user, the webpage identifier accessed in the user's last network access behavior.
  • Step 405 Compare whether the parent webpage identifier is different from the webpage identifier accessed in the user's last network access behavior. If yes, go to step 407. If no, go to step 406.
  • Step 406 Determine whether the time difference between the network access time of the user's current network access behavior and the network access time of the user's last network access behavior is greater than a predetermined value. If yes, go to step 407. If no, go to step 409.
  • the network access time of the user's last network access behavior is obtained from the network access information record.
  • Step 407 Determine the current network access behavior of the user to be an effective network access behavior, and record the network access information of the current network access behavior.
  • when the network access information record of the user is not found, or no parent webpage identifier associated with the currently accessed webpage identifier is found, or the parent webpage identifier differs from the webpage identifier accessed in the user's last network access behavior (that is, the currently accessed webpage identifier is not a sub-page identifier of the webpage accessed in the user's last network access behavior), it can be determined that the current network access behavior was actively triggered by the user.
  • likewise, when the parent webpage identifier is the same as the webpage identifier accessed in the user's last network access behavior, but the time difference between the current network access time and the network access time of the user's last network access behavior is greater than the predetermined value (that is, the sub-page of the last accessed webpage was actively requested by the user), the current network access behavior of the user can be determined to be an effective network access behavior, and the network access information of the current network access behavior is recorded. At the same time, the accessed webpage content is parsed to obtain the sub-page identifiers associated with the currently accessed webpage, and the correspondence between the currently accessed webpage identifier and the parsed sub-page identifiers is recorded in the webpage identifier association set for use in the next determination of the user's network access behavior.
  • Step 408 Obtain the webpage content corresponding to the webpage identifier currently accessed by the user, parse the webpage content to obtain the sub-page identifiers associated with the currently accessed webpage identifier, and record, in the webpage identifier association set, the correspondence between the currently accessed webpage identifier and its associated sub-page identifiers.
  • Step 409 Determine that the current network access behavior of the user is an invalid network access behavior.
  • the network access information of an invalid network access behavior need not be recorded, and the content of the webpage accessed by the invalid network access behavior may be parsed to obtain its sub-page identifiers.
  • the behavior information of effective network access behaviors may be processed by the user behavior analysis system, where the behavior information includes the network access information record.
  • the network access information record holds the network access information of the user's historical effective network access behaviors, so it can be provided to the user behavior analysis system for processing. For example, the system can compile statistics on the visited webpage identifiers and access times, and can also crawl the corresponding webpage content according to a webpage identifier and perform statistics and analysis on the content of the visited webpages.
  • the user behavior information analyzed by the user behavior analysis system is then the behavior information of effective network access behaviors, which improves the accuracy of user behavior analysis.
  • in this embodiment, after the network access information of the current network access behavior is obtained, the network access information record of the user is queried. If the record is found, the pre-stored webpage identifier association set is queried for the parent webpage identifier associated with the currently accessed webpage identifier, and the parent webpage identifier is compared with the webpage identifier accessed in the user's last network access behavior. If they differ, the current network access behavior of the user is determined to be an effective network access behavior. If they are the same, the current network access behavior may still be determined to be an effective network access behavior when the time difference between the current network access time and the user's last network access time is greater than the predetermined value. The embodiment does not simply identify the network access behavior corresponding to the first access request in a network connection as an effective network access behavior, which improves the accuracy of identification and makes user behavior analysis more accurate. Because the behavior information of all network access behaviors no longer needs to be analyzed, the cost and difficulty of analysis are also reduced.
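  • Steps 402 to 407 and 409 of this embodiment can be sketched in Python as below. The sketch is illustrative only: the names `Access`, `identify`, `records`, and `parent_of`, the one-parent-per-sub-page mapping, and the 2-second value are assumptions, and Step 408 (parsing the page for sub-page identifiers) is left out. For brevity `records` keeps only the last effective access per user rather than a full history.

```python
from dataclasses import dataclass

PREDETERMINED_VALUE = 2.0  # seconds; an example value assumed for this sketch


@dataclass
class Access:
    user_id: str
    page_id: str   # webpage identifier, e.g. a URL
    time: float    # network access time in seconds


def identify(access, records, parent_of):
    """Return True for an effective access and record it; return False otherwise."""
    last = records.get(access.user_id)              # Step 402: is there a record?
    if last is None:
        effective = True
    else:
        parent = parent_of.get(access.page_id)      # Step 403: associated parent?
        if parent is None:
            effective = True
        elif parent != last.page_id:                # Steps 404-405: different parent
            effective = True
        else:                                       # Step 406: same parent, check time
            effective = access.time - last.time > PREDETERMINED_VALUE
    if effective:
        records[access.user_id] = access            # Step 407: record the access
    return effective                                # False corresponds to Step 409


records = {}
parent_of = {"http://site.example/logo.png": "http://site.example/"}
results = [
    identify(Access("u1", "http://site.example/", 0.0), records, parent_of),
    identify(Access("u1", "http://site.example/logo.png", 0.3), records, parent_of),
    identify(Access("u1", "http://site.example/logo.png", 5.0), records, parent_of),
]
```

  • in this run the first request is effective (no record yet), the image fetched 0.3 s later is invalid, and the same address requested 5 s after the page is treated as effective because the time difference exceeds the assumed threshold.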
  • FIG. 5 is a schematic structural diagram of Embodiment 1 of a network access behavior identification apparatus according to the present invention, where the apparatus may include:
  • the network information obtaining module 501 is configured to obtain network access information of a current network access behavior of the user, where the network access information includes a user identifier of the user and a network access time.
  • the record query module 502 is configured to obtain the network access information from the network information obtaining module, and query the network access information record of the user according to the user identifier of the user included in the network access information.
  • the network access information record refers to the network access information of the recorded history.
  • a first determining module 503, configured to: when the record query module 502 finds the network access information record of the user, determine the validity of the current network access behavior of the user according to the network access time of the current network access behavior of the user and the network access time of the user's last network access behavior recorded in the network access information record.
  • the first determining module 503 may specifically include:
  • a time determining module 5031, configured to obtain the network access time of the user's last network access behavior from the network access information record of the user;
  • the time judging module 5032 is configured to determine whether the time difference between the network access time of the current network access behavior of the user and the network access time of the user's last network access behavior is greater than a predetermined value;
  • the first determining sub-module 5033 is configured to: when the result of the time judging module 5032 is yes, determine that the current network access behavior of the user is an effective network access behavior; and when the result of the time judging module 5032 is no, determine that the current network access behavior of the user is an invalid network access behavior.
  • the predetermined value is set according to factors such as client performance, user habits, and network delay, and is derived from statistics on the time interval between a user's current request to access the network and the user's previous request. For example, after clicking one web link in the browser, a user typically takes more than 2s to click the next link, so the time difference between the two access requests initiated by the browser is usually more than 2s. If the link clicked by the user has sub-links, the time difference between the first network access request initiated by the browser and the network access requests that the browser then initiates automatically for the associated sub-links is usually 2s or less. The predetermined value can therefore be set to 2s.
  • the first determining module is further configured to: when the record querying module does not query the network access information record of the user, determine that the current network access behavior of the user is an effective network access behavior.
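  • the behavior of the first determining module reduces to a single comparison, sketched below in Python. The function name and the use of `None` to signal that no record was found are assumptions of the sketch; the 2-second value is the example given in the description.

```python
PREDETERMINED_VALUE = 2.0  # seconds, the example value from the description


def effective_by_time(current_time, last_time, predetermined=PREDETERMINED_VALUE):
    """Time-only validity check.

    last_time is None when the record query found no network access
    information record for the user; such an access is treated as effective.
    """
    if last_time is None:
        return True
    # Strictly greater than the predetermined value means user-triggered.
    return (current_time - last_time) > predetermined
```

  • a difference of exactly 2 s is classified as invalid here because the text requires the difference to be greater than the predetermined value.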
  • the device may further include a recording module, configured to record the current network access information of the user after the current network access behavior of the user is determined to be an effective network access behavior.
  • the network access information record can be saved in the form of a data table, in which each piece of network access information uniquely identifies one network access behavior of the user.
  • to prevent table entries from overflowing and making queries inaccurate, the network access information record needs to be aged: when the data table reaches the set capacity threshold, the earliest-recorded entries are deleted. That is, one or more of the longest-stored network access information records are deleted so that the record stays within the set capacity.
  • after the current network access behavior is determined to be effective, it can be analyzed and processed: the recorded network access information of effective network access behaviors can be provided to the user behavior analysis system, which can, for example, compile statistics on the visited URLs and access times, crawl webpage content according to the visited URL, and compile statistics on the content of the visited webpages.
  • in this embodiment, the record query module checks whether a network access information record of the user exists and, when it does, triggers the first determining module to determine the validity of the current network access behavior.
  • it can thus be determined whether the current network access behavior is an effective network access behavior, which improves the accuracy of identification and, in turn, the accuracy of user behavior analysis.
  • the device in the embodiment of the present invention may be integrated into a client, a server, a gateway that relays messages, or a user behavior analysis system.
  • when the device is integrated into a client, a server, or a gateway, it may have a corresponding interface for connecting to the user behavior analysis system.
  • the device can also be connected as a separate entity to a client, server or gateway and to a user behavior analysis system.
  • FIG. 6 is a schematic structural diagram of Embodiment 2 of a network access behavior identification apparatus according to the present invention, where the apparatus may include:
  • the network information obtaining module 601 is configured to obtain network access information of a current network access behavior of the user, where the network access information includes a user identifier of the user and a network access time.
  • the network access information may also include the identifier of the accessed webpage. Therefore, the network access information of the current network access behavior may include the user identifier, the identifier of the webpage currently accessed, and the current network access time.
  • the record query module 602 is configured to obtain the network access information from the network information obtaining module, and query the network access information record of the user according to the user identifier of the user included in the network access information.
  • the identifier query module 603 is configured to query, from the webpage identifier association set, a parent webpage identifier associated with the webpage identifier accessed by the current network access behavior of the user.
  • the webpage identifier association set includes the correspondences between different parent webpage identifiers and sub-page identifiers.
  • the webpage identifier association set may be pre-stored by the system: the sub-link addresses embedded in the content of different webpages may be parsed in advance, with the identifier of each webpage taken as a parent webpage identifier and its embedded sub-link addresses taken as the sub-page identifiers associated with that parent.
  • in this way the correspondences between different parent webpage identifiers and different sub-page identifiers can be established to form the association set.
  • a parent webpage identifier usually corresponds to multiple associated sub-page identifiers.
  • the webpage content concerned may be content accessed by the user's historical effective network access behaviors.
  • a hash search algorithm may be used to query whether a parent webpage identifier associated with the current webpage identifier exists. That is, the identifier query module may specifically be configured to use a hash lookup to determine whether a parent webpage identifier corresponding to the webpage identifier currently accessed by the user exists.
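  • one way to realize such a hash lookup, offered only as a sketch, is to invert the parent-to-sub-page associations into a dictionary keyed by sub-page identifier; Python dictionaries are hash tables, so each query is a constant-time lookup on average. The function names and example URLs are hypothetical.

```python
def build_parent_index(associations):
    """Invert {parent_id: iterable of sub_ids} into {sub_id: set of parent_ids}."""
    index = {}
    for parent_id, sub_ids in associations.items():
        for sub_id in sub_ids:
            index.setdefault(sub_id, set()).add(parent_id)
    return index


def find_parents(index, page_id):
    """Return the parent webpage identifiers of page_id (empty set if none are known)."""
    return index.get(page_id, set())


associations = {
    "http://site.example/": ["http://site.example/logo.png", "http://ads.example/ad.js"],
    "http://other.example/": ["http://ads.example/ad.js"],
}
index = build_parent_index(associations)
```

  • keeping a set of parents per sub-page allows for a resource, such as an advertisement script, that is embedded in more than one webpage.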
  • a second determining module 604, configured to: when the record query module 602 finds the network access information record of the user and the identifier query module 603 finds the parent webpage identifier, determine the validity of the current network access behavior of the user according to the parent webpage identifier and the webpage identifier, recorded in the network access information record, that was accessed in the user's last network access behavior.
  • the second determining module 604 may specifically include:
  • the identifier determining module 6041 is configured to obtain, from the network access information record of the user, the webpage identifier accessed in the user's last network access behavior.
  • the identifier determining module 6042 is configured to determine whether the parent webpage identifier is different from the webpage identifier accessed in the user's last network access behavior.
  • the second determining sub-module 6043 is configured to determine, when the result of the identifier determining module 6042 is yes, that the current network access behavior of the user is an effective network access behavior.
  • the second determining module is further configured to: when the record query module does not find the network access information record of the user, or the identifier query module does not find a parent webpage identifier associated with the webpage identifier accessed by the current network access behavior of the user, determine the current network access behavior of the user to be an effective network access behavior.
  • the device may further include:
  • a recording module, configured to record the current network access information of the user after the current network access behavior of the user is determined to be an effective network access behavior;
  • an identifier parsing module, configured to: after the current network access behavior of the user is determined to be an effective network access behavior, obtain the webpage content corresponding to the webpage identifier currently accessed by the user, and parse the webpage content to obtain the sub-page identifiers associated with the currently accessed webpage identifier;
  • an identifier saving module, configured to record, in the webpage identifier association set, the correspondence between the webpage identifier currently accessed by the user and its associated sub-page identifiers.
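  • the identifier parsing and saving steps could look like the following Python sketch, which uses the standard library `html.parser` to collect the addresses of embedded resources. Which tags count as embedded sub-pages (`img`, `script`, `iframe` here), the class and function names, and the dictionary used as the association set are all assumptions of the example; the description itself only speaks of embedded sub-link addresses such as pictures and advertisements.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SubPageParser(HTMLParser):
    """Collects addresses of resources a browser would fetch automatically."""

    EMBEDDED = {"img": "src", "script": "src", "iframe": "src"}  # assumed tag set

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sub_ids = []

    def handle_starttag(self, tag, attrs):
        wanted = self.EMBEDDED.get(tag)
        if wanted is None:
            return  # ordinary links (<a href>) need a user click, so they are skipped
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative addresses against the parent webpage identifier.
                self.sub_ids.append(urljoin(self.base_url, value))


def parse_and_save(page_url, html, association_set):
    """Parse html for sub-page identifiers and record them under page_url."""
    parser = SubPageParser(page_url)
    parser.feed(html)
    parser.close()
    association_set[page_url] = set(parser.sub_ids)
    return parser.sub_ids


association_set = {}
html = ('<html><body><img src="/logo.png"><a href="/next">next</a>'
        '<iframe src="http://ads.example/ad.html"></iframe></body></html>')
sub_ids = parse_and_save("http://site.example/index.html", html, association_set)
```

  • fetching the webpage content itself is outside the sketch; `html` stands in for content already obtained for the accessed webpage identifier.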
  • in this embodiment, the network information obtaining module obtains the network access information of the current network access behavior of the user, the record query module queries the network access information record of the user, and the identifier query module queries the pre-stored webpage identifier association set for the parent webpage identifier associated with the currently accessed webpage identifier.
  • the second determining module can then determine the validity of the current network access behavior of the user according to the parent webpage identifier and the webpage identifier of the user's last network access behavior recorded in the user's network access information record. Because validity is determined from the user's historical network access behavior, the accuracy of the determination improves, which makes user behavior analysis more accurate.
  • FIG. 7 is a schematic structural diagram of Embodiment 3 of a network access behavior identification apparatus according to the present invention, where the apparatus may include:
  • the network information obtaining module 701 is configured to obtain network access information of a current network access behavior of the user.
  • the network access information may include a user identification, a current network access time, and a current web page identification.
  • the record query module 702 is configured to query whether there is a network access information record including the user identifier.
  • a time determining module 703, configured to obtain a network access time of the last network access behavior of the user from the network access information record of the user;
  • the time judging module 704 is configured to determine whether a time difference between the current network access time of the user and the network access time of the user's last network access behavior is greater than a predetermined value.
  • the identifier query module 705 is configured to: when the result of the time judging module is no, query the pre-stored webpage identifier association set for a parent webpage identifier associated with the webpage identifier accessed by the current network access behavior of the user.
  • the webpage identifier association set includes the correspondences between different parent webpage identifiers and sub-page identifiers.
  • the webpage identifier association set may be pre-stored by the system: the sub-link addresses embedded in the content of different webpages may be parsed in advance, with the identifier of each webpage taken as a parent webpage identifier and its embedded sub-link addresses taken as the sub-page identifiers associated with that parent.
  • in this way the correspondences between different parent webpage identifiers and different sub-page identifiers can be established to form the association set.
  • a parent webpage identifier usually corresponds to multiple associated sub-page identifiers.
  • the webpage content concerned may be content accessed by the user's historical effective network access behaviors.
  • the identifier judging module 706 is configured to determine, when the query result of the identifier query module is yes, whether the parent webpage identifier is different from the webpage identifier accessed in the user's last network access behavior.
  • the second determining module 707 is configured to: when the result of the time judging module is yes, or the result of the record query module is no, or the result of the identifier query module is no, or the result of the identifier judging module is yes, determine that the current network access behavior of the user is an effective network access behavior; and when the result of the identifier judging module is no, determine that the current network access behavior of the user is an invalid network access behavior.
  • the recording module 708 is configured to record the current network access information of the user after determining that the current network access behavior of the user is an effective network access behavior.
  • the identifier parsing module 709 is configured to: after the current network access behavior of the user is determined to be an effective network access behavior, acquire the webpage content corresponding to the webpage identifier currently accessed by the user, and parse the webpage content to obtain the sub-page identifiers associated with the currently accessed webpage identifier.
  • the identifier saving module 710 is configured to record, in the webpage identifier association set, a correspondence between the webpage identifier currently accessed by the user and the associated sub-page identifier.
  • the network access information record and the webpage identifier association set are both stored in the form of data tables. When the entries of a data table exceed its maximum capacity, the earliest-stored entries are deleted in chronological order of storage so that the table stays within capacity.
  • the behavior information of the effective network access behavior may be processed by the user behavior analysis system, and the device may further include:
  • an information providing module, configured to provide the recorded network access information record to the user behavior analysis system for processing.
  • the user behavior analysis system can compile statistics on the accessed webpage identifiers and access times, and can also retrieve the corresponding webpage content according to a webpage identifier and perform statistics and analysis on the accessed webpage content. The user behavior information analyzed by the system is then the behavior information of effective network access behaviors, which improves the accuracy of user behavior analysis.
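  • as a small illustration of the kind of statistics mentioned above, the following Python sketch counts visits per webpage identifier over recorded effective accesses. The record layout (user identifier, webpage identifier, access time) and the function name are assumptions; the description does not specify how the user behavior analysis system computes its statistics.

```python
from collections import Counter


def page_visit_counts(records):
    """Count effective visits per webpage identifier.

    records: iterable of (user_id, page_id, access_time) tuples that were
             recorded for effective network access behaviors only.
    """
    return Counter(page_id for _user_id, page_id, _access_time in records)


effective_records = [
    ("u1", "http://news.example/", 0.0),
    ("u1", "http://shop.example/", 12.5),
    ("u2", "http://news.example/", 30.0),
]
counts = page_visit_counts(effective_records)
```

  • because client-triggered requests for embedded pictures or advertisements were filtered out before recording, they do not inflate these counts.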
  • in this embodiment, the record query module checks whether a network access information record of the user has been saved and, when one exists, triggers the time judging module to compare the current network access time with the user's last network access time.
  • when the identifier query module finds the parent webpage identifier associated with the current webpage identifier, it triggers the identifier judging module to compare the parent webpage identifier with the webpage identifier the user accessed last time, so that the second determining module can determine whether the current network access behavior is an effective network access behavior. This embodiment improves the accuracy of identification and thus the accuracy of user behavior analysis.
  • FIG. 8 is a schematic structural diagram of Embodiment 4 of a network access behavior identification apparatus according to the present invention, where the apparatus may include:
  • the network information obtaining module 801 is configured to obtain network access information of a current network access behavior, where the network access information includes a user identifier and a current webpage identifier;
  • the record query module 802 is configured to query whether there is a network access information record including the user identifier.
  • the identifier query module 803 is configured to query a webpage identifier association set, and determine whether there is a parent webpage identifier associated with the current webpage identifier.
  • the identifier determining module 804 is configured to: when the result of the record query module 802 is yes and the result of the identifier query module 803 is yes, obtain, from the network access information record of the user, the webpage identifier accessed in the user's last network access behavior.
  • the identifier determining module 805 is configured to determine whether the parent webpage identifier is different from the webpage identifier accessed by the user for the last network access behavior.
  • the time determining module 806 is configured to: when the result of the identifier determining module 805 is no, obtain the network access time of the user's last network access behavior from the network access information record.
  • the time judging module 807 is configured to determine whether a time difference between the current network access time of the user and the network access time of the user's last network access behavior is greater than a predetermined value.
  • the second determining module 808 is configured to: when the result of the record query module 802 is no, or the result of the identifier query module 803 is no, or the result of the identifier determining module 805 is yes, or the result of the time judging module 807 is yes, determine that the current network access behavior of the user is an effective network access behavior; and when the result of the time judging module 807 is no, determine that the current network access behavior of the user is an invalid network access behavior.
  • the recording module 809 is configured to: after the second determining module determines that the current network access behavior of the user is an effective network access behavior, record the current network access information of the user.
  • the identifier parsing module 810 is configured to: after the second determining module 808 determines that the current network access behavior of the user is an effective network access behavior, acquire the webpage content corresponding to the webpage identifier currently accessed by the user, and parse the The webpage content obtains the sub-page identifier associated with the currently accessed webpage identifier.
  • the identifier saving module 811 is configured to record, in the webpage identifier association set, a correspondence between the webpage identifier currently accessed by the user and the associated sub-page identifier.
  • the behavior information of effective network access behaviors may be processed by the user behavior analysis system, where the behavior information includes the network access information record, so the network access information record may be provided to the user behavior analysis system for processing.
  • for example, the system can compile statistics on the accessed webpage identifiers and access times, and can also retrieve the corresponding webpage content according to a webpage identifier and perform statistics and analysis on the accessed webpage content. The user behavior information analyzed by the system is then the behavior information of effective network access behaviors, which improves the accuracy of user behavior analysis.
  • in this embodiment, the record query module checks whether a network access information record of the user exists. When the record exists, the identifier query module looks up, in the webpage identifier association set, the parent webpage identifier associated with the currently accessed webpage identifier, and triggers the identifier judging module to compare whether the parent webpage identifier is different from the webpage identifier the user accessed last time.
  • if they are the same, the time judging module is triggered to determine whether the time difference between the current network access time and the user's last network access time is greater than the predetermined value, thereby determining whether the current network access behavior is an effective network access behavior.
  • this embodiment does not simply treat the network access behavior corresponding to the first access request in a network connection as an effective network access behavior, which improves the accuracy of identification and makes user behavior analysis more accurate. Because the behavior information of all network access behaviors no longer needs to be analyzed, the cost and difficulty of analysis are also reduced.
  • the device described in the embodiment of the present invention may be integrated into a client, a server, a gateway that relays messages, or a user behavior analysis system.
  • when the device is integrated into a client, a server, or a gateway, it may have a corresponding interface for connecting to the user behavior analysis system.
  • the device can also be connected as a separate entity to a client, server or gateway and to a user behavior analysis system.

Abstract

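The present invention provides a network access behavior identification method and apparatus. The method includes: obtaining network access information of a current network access behavior of a user, where the network access information includes a user identifier of the user and a network access time; querying a network access information record of the user according to the user identifier of the user; and, if the network access information record of the user is found, determining the validity of the current network access behavior of the user according to the current network access time of the user and the network access time of the user's last network access behavior recorded in the network access information record. The embodiments of the present invention improve the accuracy of network access behavior identification and thereby make user behavior analysis more accurate.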

Description

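Network access behavior identification method and apparatus
This application claims priority to Chinese Patent Application No. 201210189934.5, filed with the Chinese Patent Office on June 6, 2012 and entitled "Network access behavior identification method and apparatus", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of network technologies, and more particularly to a network access behavior identification method and apparatus.
Background
With the application and development of Internet technologies, and of mobile Internet technologies in particular, the number of Internet application users keeps growing. To enhance user experience and achieve fine-grained operations, how to analyze users' network access behavior has become a research focus for operators and service providers.
User behavior analysis means performing statistics and analysis on the behavior information generated when users access the network, which may include, for example, website URLs, search keywords, and the content of the webpages browsed, so as to derive the patterns of users' network access. These patterns are combined with, for example, network marketing strategies to discover problems that may exist in current network marketing activities and to provide a basis for further revising or reformulating those strategies.
An existing website page may be associated with other webpages, for example sub-pages such as pictures and advertisements embedded in the page. When a user accesses the network, for example by clicking a link to request the webpage corresponding to that link, the client sends a network access request. After the client receives the server's response and establishes a network connection with the server, it displays the webpage content the user wants to access and also parses out the sub-page addresses contained in that content. The client therefore goes on to send network access requests automatically to obtain the sub-page content corresponding to those sub-page addresses. Because the sub-pages are not requested by the user actively clicking on them, obtaining and analyzing the behavior information of every network access behavior triggered by the client would reduce the accuracy of user behavior analysis. Identifying network access behaviors to determine the effective ones, that is, the ones actively triggered by the user, is therefore key to the accuracy of user behavior analysis.
Existing network access behavior identification methods usually identify the network access behavior of the first network access request triggered by the client in a given network connection as the effective network access behavior, so that user behavior analysis processes only the behavior information of the first network access behavior in each network connection. In the course of implementing the present invention, however, the inventors found that a webpage requested by the user and its associated sub-pages may not share the same network connection, so the first network access behavior in a network connection may also have been triggered automatically by the client. Moreover, when the same user accesses different webpages of the same website, the client may let those webpages share one network connection, so other network access behaviors in the same network connection may also have been actively triggered by the user. Network access behavior identification in the prior art is therefore inaccurate, which affects the accuracy of user behavior analysis.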
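Summary
In view of this, the present invention provides a network access behavior identification method and apparatus to identify network access behavior more accurately and improve the accuracy of user behavior analysis.
To achieve the above objective, the present invention provides the following technical solutions:
One aspect of the present invention provides a network access behavior identification method, including:
obtaining network access information of a current network access behavior of a user, where the network access information includes a user identifier of the user and a network access time;
querying a network access information record of the user according to the user identifier of the user; and
if the network access information record of the user is found, determining the validity of the current network access behavior of the user according to the network access time of the current network access behavior of the user and the network access time of the user's last network access behavior recorded in the network access information record.
Another aspect of the present invention provides a network access behavior identification method, including:
obtaining network access information of a current network access behavior of a user, where the network access information includes a user identifier of the user and an identifier of the webpage accessed;
querying a network access information record of the user according to the user identifier of the user;
querying, from a pre-stored webpage identifier association set, a parent webpage identifier associated with the webpage identifier accessed by the current network access behavior of the user; and
if the network access information record of the user and the parent webpage identifier are found, determining the validity of the current network access behavior of the user according to the parent webpage identifier and the webpage identifier, recorded in the network access information record, that was accessed in the user's last network access behavior.
Yet another aspect of the present invention provides a network access behavior identification apparatus, including:
a network information obtaining module, configured to obtain network access information of a current network access behavior of a user, where the network access information includes a user identifier of the user and a network access time;
a record query module, configured to obtain the network access information from the network information obtaining module, and query a network access information record of the user according to the user identifier of the user included in the network access information; and
a first determining module, configured to: when the record query module finds the network access information record of the user, determine the validity of the current network access behavior of the user according to the network access time of the current network access behavior of the user and the network access time of the user's last network access behavior recorded in the network access information record.
Yet another aspect of the present invention provides a network access behavior identification apparatus, including:
a network information obtaining module, configured to obtain network access information of a current network access behavior of a user, where the network access information includes a user identifier of the user and a network access time;
a record query module, configured to obtain the network access information from the network information obtaining module, and query a network access information record of the user according to the user identifier of the user included in the network access information;
an identifier query module, configured to query, from a webpage identifier association set, a parent webpage identifier associated with the webpage identifier accessed by the current network access behavior of the user; and
a second determining module, configured to: when the record query module finds the network access information record of the user and the identifier query module finds the parent webpage identifier, determine the validity of the current network access behavior of the user according to the parent webpage identifier and the webpage identifier, recorded in the network access information record, that was accessed in the user's last network access behavior.
As can be seen from the above technical solutions, compared with the prior art, the present invention provides a network access behavior identification method and apparatus. The network access information of the user's current network access behavior is obtained, the user's network access information record is looked up, and whether the current network access behavior is an effective network access behavior is determined according to the current network access time and the user's last network access time in the record. The embodiments of the present invention assess the validity of every network access behavior and judge the current network access behavior against the user's historical network access behaviors, which improves the accuracy of network access behavior identification and makes the results of user behavior analysis more accurate.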
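Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of Embodiment 1 of a network access behavior identification method according to the present invention;
FIG. 2 is a flowchart of Embodiment 2 of a network access behavior identification method according to the present invention;
FIG. 3 is a flowchart of Embodiment 3 of a network access behavior identification method according to the present invention;
FIG. 4 is a flowchart of Embodiment 4 of a network access behavior identification method according to the present invention;
FIG. 5 is a schematic structural diagram of Embodiment 1 of a network access behavior identification apparatus according to the present invention;
FIG. 6 is a schematic structural diagram of Embodiment 2 of a network access behavior identification apparatus according to the present invention;
FIG. 7 is a schematic structural diagram of Embodiment 3 of a network access behavior identification apparatus according to the present invention;
FIG. 8 is a schematic structural diagram of Embodiment 4 of a network access behavior identification apparatus according to the present invention.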
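Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The embodiments of the present invention disclose a network access behavior identification method and apparatus. The network access information of a user's current network access behavior is obtained and the user's network access information record is looked up. After the record is found, whether the current network access behavior is valid is determined according to the current network access time and the user's previous network access time in the record. The embodiments judge the validity of every network access behavior, which improves the accuracy of network access behavior identification and makes the user behavior analysis results more accurate.
FIG. 1 is a flowchart of Embodiment 1 of a network access behavior identification method according to the present invention. The method may include:
Step 101: Obtain network access information of a user's current network access behavior, where the network access information includes the user's user identifier and a network access time.
A user sends a network access request through a client, for example an HTTP (Hypertext Transfer Protocol) request. Once the client and the server establish a network connection, for example a TCP (Transmission Control Protocol) connection, the user can access the network.
The network access information may refer to the basic Internet access information of the user's network access behavior. It may be obtained from the information carried in the network access request when the request is sent, in which case it may include the user identifier, the identifier of the visited web page, the network access time, and so on. The web page identifier is the identifier of the web page visited by the current network access behavior and may be the page's URL (Uniform/Universal Resource Locator). The network access information may also be obtained from the request response after the client and the server establish a network connection, in which case it may further include the response time, the response content, and other information. The user's network access information can be obtained whenever the user requests access to the network.
The user identifier is a unique identifier that distinguishes different accessing users. In a mobile communication system, for example, it may be an IMEI (International Mobile Equipment Identity) or the communication number of the mobile device, such as a mobile phone number; in a fixed telephone network system, it may be an ADSL (Asymmetric Digital Subscriber Line) account, and so on.
The access time may be determined from the timestamp information carried in the network connection request.
Step 102: Query the user's network access information record according to the user identifier.
The network access information record may refer to the stored historical network access information of different users, specifically the historical valid network access information. Using the user identifier in the network access information, it can be checked whether historical network access information exists for the user.
Step 103: If the user's network access information record is found, determine the validity of the user's current network access behavior according to the user's current network access time and the network access time of the user's previous network access behavior recorded in the network access information record.
If a network access information record of the user is stored, then, because network access information includes the network access time, the network access time of the user's previous network access behavior can be determined from the record. The validity of the current network access behavior can then be determined from the current network access time and the previous network access time.
The user's previous network access time is the network access time, recorded in the network access information record, of the network access behavior closest in time to the user's current network access behavior.
Specifically, it is determined whether the time difference between the user's current network access time and the network access time of the user's previous network access behavior is greater than a predetermined value. If so, the user's current network access behavior is determined to be a valid network access behavior; if not, it is determined to be an invalid network access behavior.
The predetermined value is obtained statistically according to conditions such as client performance, user habits, and network delay; it is a value derived from the interval between the user's current network access request and the previous one. For example, after clicking one web page link in a browser, a user clicks the next link; the time difference between the two access requests that the browser issues in response to these clicks is usually more than 2 s. If the link clicked by the user has associated sub-links, however, the time difference between the first network access request issued by the browser and the request it then issues on its own for the next associated sub-link is usually 2 s or less. The predetermined value may therefore be set to 2 s.
If the time difference between the user's current network access time and the recorded previous network access time is greater than the predetermined value, the current network access behavior was actively requested by the user, and it can be determined to be valid. If the time difference is less than or equal to the predetermined value, the current network access behavior was not actively requested by the user and may be an action initiated by the client, so it can be determined to be invalid.
In step 102, if no network access information record is found, it can be inferred that the user has not performed any network access behavior before, so the user's current network access behavior can be determined to be a valid network access behavior.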
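The time-based judgment of steps 102 and 103 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `is_valid_by_time`, the use of seconds as floats, and the default of 2 s (taken from the example above) are assumptions.

```python
from typing import Optional

# Assumed default, taken from the 2 s example in the text. In practice the
# value would be derived from client performance, user habits and network delay.
PREDETERMINED_VALUE_S = 2.0


def is_valid_by_time(current_time: float,
                     last_time: Optional[float],
                     threshold: float = PREDETERMINED_VALUE_S) -> bool:
    """Return True if the current access is treated as user-initiated.

    last_time is the access time of the user's previous recorded access,
    or None when no record exists for the user.
    """
    if last_time is None:
        # No record found in step 102: the access is treated as valid.
        return True
    # Step 103: valid only if the gap is strictly greater than the threshold.
    return (current_time - last_time) > threshold
```

A gap of exactly the predetermined value counts as invalid, matching the "less than or equal to" wording above.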
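In this embodiment, after the user's current network access behavior is determined to be valid, the user's current network access information may be recorded. If it is determined to be invalid, the current network access information need not be recorded. The queried network access information record is therefore, specifically, the network access information recorded for valid network access behaviors.
The network access information record may be stored in the form of a data table, in which each piece of network access information uniquely identifies one network access behavior of a user. To prevent table overflow from making queries inaccurate, the network access information record needs to be aged: when the data table reaches a set capacity threshold, the earliest recorded entries are deleted. That is, one or more of the longest-stored network access information records are deleted so that the record stays within the set capacity.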
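One possible way to age such a table is sketched below. The class name `AccessRecordTable` and its methods are hypothetical, and the sketch makes a simplifying assumption that only the most recent valid access per user is kept, since that is all the validity check needs.

```python
from collections import OrderedDict
from typing import Optional, Tuple


class AccessRecordTable:
    """Bounded table of the last valid access per user (hypothetical sketch)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # user_id -> (access_time, page_id), kept in storage order
        self._records: "OrderedDict[str, Tuple[float, Optional[str]]]" = OrderedDict()

    def get(self, user_id: str) -> Optional[Tuple[float, Optional[str]]]:
        return self._records.get(user_id)

    def record(self, user_id: str, access_time: float,
               page_id: Optional[str] = None) -> None:
        # Re-insert so the entry counts as the most recently stored one.
        self._records.pop(user_id, None)
        self._records[user_id] = (access_time, page_id)
        # Aging: drop the earliest stored entries once capacity is exceeded.
        while len(self._records) > self.capacity:
            self._records.popitem(last=False)
```

With a capacity of 2, recording users `a`, `b`, and `c` in that order evicts `a`, so a later access by `a` would be treated as having no record.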
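After a network access behavior is determined to be valid, it can be analyzed and processed. The recorded network access information of valid network access behaviors may be provided to a user behavior analysis system, which can then process it, for example by collecting statistics on the visited addresses and access times, or by fetching the web page content at a visited address and collecting statistics on that content.
The network access behavior identification described in the embodiments of the present invention may be performed by the user behavior analysis system itself, or by a client, a server, or a request-forwarding gateway connected to the user behavior analysis system, after which the network access information is provided to the user behavior analysis system.
In this embodiment, after the network access information generated by the user's current network access behavior is obtained, it is checked whether a network access information record of the user is stored. If such a record exists, whether the user's current network access behavior is valid is determined according to the user's current network access time and the network access time of the previous network access behavior recorded in the record. Judging by access time improves the accuracy of identifying valid network access behaviors and can therefore improve the accuracy of user behavior analysis.
FIG. 2 is a flowchart of Embodiment 2 of a network access behavior identification method according to the present invention. The method may include: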
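Step 201: Obtain network access information of a user's current network access behavior, where the network access information includes the user's user identifier and the identifier of the visited web page.
The network access information may further include a network access time, so the network access information of the current network access behavior may include the identifier of the web page currently visited by the user and the current network access time.
Step 202: Query the user's network access information record according to the user identifier.
Step 203: Query, in a pre-stored set of web page identifier associations, the parent web page identifier associated with the web page identifier visited by the user's current network access behavior.
The set of web page identifier associations contains the correspondences between different parent web page identifiers and child web page identifiers.
The set may be pre-stored by the system. The sub-link addresses embedded in the content of different web pages may be parsed out in advance: the identifier of the parsed page is the parent web page identifier, and the embedded sub-link addresses are the child web page identifiers associated with it. Correspondences between different parent and child web page identifiers can thus be established to form the association set. One parent web page identifier usually corresponds to multiple associated child web page identifiers. The different web page content may be the content of the web pages visited by the user's historical valid network access behaviors.
In this step, therefore, the current web page identifier is treated as a child web page identifier, and the association set is queried for a parent web page identifier associated with it.
When web page identifiers are expressed as URLs, a hash-based lookup algorithm may be used to query whether a parent web page identifier associated with the current web page identifier exists. That is, every web page identifier in the association set has a corresponding hash code, or is stored in the form of a hash code. A child web page identifier may be represented as a Hash code and its associated parent web page identifier as a Parent Hash code. To perform a query, the current web page identifier is first converted to its hash code, and the lookup algorithm then checks whether a Parent Hash code corresponding to that hash code exists.
Because different web page identifiers may map to the same hash code, each web page identifier may be represented by at least two hash codes. For example, the hash codes of a child web page identifier are Hash code1 and Hash code2, and those of the corresponding parent web page identifier are Parent Hash code1 and Parent Hash code2. At least two hash codes together can uniquely determine a web page identifier.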
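The two-hash-code lookup can be illustrated with a short sketch. The text does not name any hash functions, so the choice of SHA-256 and BLAKE2b here is purely an assumption, as are the names `hash_pair` and `PageAssociationSet`.

```python
import hashlib
from typing import Dict, Iterable, Optional, Tuple

HashPair = Tuple[int, int]


def hash_pair(url: str) -> HashPair:
    """Represent a web page identifier by two independent 64-bit hash codes."""
    data = url.encode("utf-8")
    code1 = int.from_bytes(hashlib.sha256(data).digest()[:8], "big")
    code2 = int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")
    return code1, code2


class PageAssociationSet:
    """Child (Hash code1, Hash code2) -> parent (Parent Hash code1, code2)."""

    def __init__(self) -> None:
        self._parent_of: Dict[HashPair, HashPair] = {}

    def add(self, parent_url: str, child_urls: Iterable[str]) -> None:
        parent_key = hash_pair(parent_url)
        for child_url in child_urls:
            self._parent_of[hash_pair(child_url)] = parent_key

    def find_parent(self, page_url: str) -> Optional[HashPair]:
        # Hash the current identifier first, then look up its parent codes.
        return self._parent_of.get(hash_pair(page_url))
```

Comparing a found parent with the previously visited page then reduces to comparing two hash pairs, for example `find_parent(current_url) == hash_pair(last_url)`.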
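The operations of step 202 and step 203 are not limited to the order described in this embodiment; they may be performed simultaneously, or the identifier query may be performed before the network access information query.
Step 204: If the user's network access information record and the parent web page identifier are both found, determine the validity of the user's current network access behavior according to the parent web page identifier and the web page identifier visited by the user's previous network access behavior recorded in the network access information record.
When both the user's network access information record and the parent web page identifier are found, the web page identifier visited by the user's previous network access behavior can be obtained from the record. Whether the current network access behavior is valid can then be determined from the previously visited web page identifier and the parent web page identifier.
Specifically, the parent web page identifier is compared with the web page identifier visited by the user's previous network access behavior to see whether they differ. If a parent web page identifier exists for the current web page identifier, the current page is a sub-page associated with some parent page, but that sub-page may still have been actively requested by the user. Therefore, when the association set contains a parent web page identifier for the current web page identifier, it is further necessary to determine whether that parent identifier differs from the web page identifier visited by the user's previous network access behavior, that is, whether the currently visited identifier is a child of the previously visited one. If it is, the current web page was not actively requested by the user but was triggered by the client on its own.
If the parent web page identifier is the same as the previously visited web page identifier, the web page identifier visited by the current network access behavior is a child of the previously visited identifier. The current network access behavior can then be regarded as an action initiated by the client, so the user's current network access behavior is an invalid network access behavior. If the two identifiers differ, the current network access behavior is determined to be a valid network access behavior.
When web page identifiers are all expressed as hash codes, determining whether the parent web page identifier differs from the web page identifier in the most recent network access information record amounts to comparing whether their hash code values differ.
In addition, when no network access information record of the user is found, the user's current network access behavior may also be determined to be a valid network access behavior.
When no parent web page identifier is found for the currently visited web page identifier, the currently visited identifier is not a child web page identifier, or the current network access behavior is one that the user has actively triggered for the first time, so that the association set built by parsing the pages visited by historical network access behaviors contains neither the currently visited identifier nor an associated parent. In this case too, the user's current network access behavior may be determined to be a valid network access behavior.
Accordingly, after the user's current network access behavior is determined to be valid, the user's current network access information may be recorded. In addition, the web page content corresponding to the currently visited web page identifier may be obtained and parsed to obtain the child web page identifiers associated with it, and the currently visited identifier and its associated child identifiers may be saved to the association set.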
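Parsing a visited page for its child web page identifiers might look like the sketch below, which uses Python's standard `html.parser`. Which tags count as embedded sub-links is not specified in the text; treating the `src` attribute of `img`, `script`, `iframe`, and `embed` as child identifiers is an assumption made for illustration, as are the names `ChildLinkParser` and `extract_child_ids`.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class ChildLinkParser(HTMLParser):
    """Collect URLs of resources that a client would fetch on its own."""

    # Assumed set of tags whose src the browser loads without a user click.
    EMBED_TAGS = {"img", "script", "iframe", "embed"}

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.children: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.EMBED_TAGS:
            src = dict(attrs).get("src")
            if src:
                # Resolve relative addresses against the parent page URL.
                self.children.append(urljoin(self.base_url, src))


def extract_child_ids(page_url: str, html_text: str) -> List[str]:
    parser = ChildLinkParser(page_url)
    parser.feed(html_text)
    parser.close()
    return parser.children
```

Ordinary hyperlinks (`a href`) are deliberately left out, because following them requires a user click and they are therefore not client-initiated requests.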
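In this embodiment, after the network access information of the user's current network access behavior is obtained, the user's network access information record is queried, and the parent web page identifier associated with the currently visited web page identifier is queried in the pre-stored set of web page identifier associations. The validity of the user's current network access behavior is determined according to the parent web page identifier and the web page identifier visited by the user's previous network access behavior as recorded in the user's network access information record. Determining the validity of the current network access behavior from the user's historical network access behavior improves the accuracy of the judgment and makes user behavior analysis more accurate.
FIG. 3 is a flowchart of Embodiment 3 of a network access behavior identification method according to the present invention. The method may include:
Step 301: Obtain network access information of a user's current network access behavior.
The network access information may include the user's user identifier, the network access time, and the identifier of the visited web page.
Step 302: Query, according to the user identifier, whether a network access information record of the user is stored. If yes, perform step 303; if no, perform step 307.
Step 303: Obtain the network access time of the user's previous network access behavior from the user's network access information record.
Step 304: Determine whether the time difference between the user's current network access time and the network access time of the user's previous network access behavior is greater than a predetermined value. If yes, perform step 307; if no, perform step 305.
Step 305: Query, in the pre-stored set of web page identifier associations, whether a parent web page identifier associated with the web page identifier visited by the user's current network access behavior exists. If yes, perform step 306; if no, perform step 307.
The set of web page identifier associations contains the correspondences between different parent web page identifiers and child web page identifiers.
The set may be pre-stored by the system. The sub-link addresses embedded in the content of different web pages may be parsed out in advance: the identifier of the parsed page is the parent web page identifier, and the embedded sub-link addresses are the child web page identifiers associated with it. Correspondences between different parent and child web page identifiers can thus be established to form the association set. One parent web page identifier usually corresponds to multiple associated child web page identifiers. In this embodiment, the different web page content may be the content of the web pages visited by the user's historical valid network access behaviors.
When web page identifiers are expressed as URLs, a hash-based lookup algorithm may be used to query whether a parent web page identifier associated with the current web page identifier exists. The specific lookup procedure is as described in Embodiment 2 above.
Step 306: Determine whether the parent web page identifier differs from the web page identifier visited by the user's previous network access behavior. If yes, go to step 307; if no, go to step 309.
The web page identifier visited by the user's previous network access behavior is obtained from the user's network access information record. Because a user who quickly selects different web page identifiers while browsing may also cause the time difference between the current network access time and the network access time of the previous network access behavior to be less than or equal to the predetermined value, identification of the network access behavior must continue when the result of step 304 is no.
Step 307: Determine that the user's current network access behavior is a valid network access behavior, and record the network access information corresponding to the current network access behavior.
Step 308: Obtain the web page content corresponding to the web page identifier currently visited by the user, parse the content to obtain the child web page identifiers associated with the currently visited identifier, and record in the association set the correspondence between the currently visited web page identifier and its associated child web page identifiers.
The user's current network access behavior can be determined to be valid, and its network access information recorded, in any of the following cases: no network access information record of the user is found; no parent web page identifier associated with the currently visited web page identifier is found; the time difference between the current network access time and the network access time of the user's previous network access behavior is greater than the predetermined value; or that time difference is less than or equal to the predetermined value but the parent web page identifier differs from the web page identifier visited by the previous network access behavior. The visited web page content is also parsed to obtain the sub-pages associated with the currently visited page, and the correspondence between the currently visited web page identifier and the parsed child web page identifiers is recorded in the association set for use in judging the user's next network access behavior.
The network access information record may be stored in the form of a data table.
The set of web page identifier associations may also be stored in the form of a data table. To prevent table overflow from making queries inaccurate, the association table needs to be aged: when the data table reaches its set capacity, the earliest stored entries are deleted in the order in which the associations were stored, so that the table stays within capacity.
Step 309: Determine that the user's current network access behavior is an invalid network access behavior.
When a network access behavior is determined to be invalid, its network access information need not be recorded, and the web page content it visited need not be parsed to obtain child web page identifiers.
If the parent web page identifier is the same as the previously visited web page identifier, the web page identifier visited by the current network access behavior is a child of the previously visited identifier. The current network access behavior can then be regarded as an action initiated by the client, so the user's current network access behavior is an invalid network access behavior.
In addition, after the current network access behavior is determined to be valid, the user behavior analysis system can process the behavior information of valid network access behaviors, where the behavior information includes the network access information record. Because the network access information record in this embodiment stores the network access information of the user's historical valid network access behaviors, the record may be provided to the user behavior analysis system for processing, for example collecting statistics on visited web page identifiers and access times, or fetching the web page content for a web page identifier and collecting statistics on and analyzing that content. All user behavior information analyzed by the system is thus behavior information of valid network access behaviors, which improves the accuracy of user behavior analysis.
In this embodiment, after the user's network access information is obtained, the user's network access information record is queried. Once the record is found, the current network access time is compared with the user's previous access time. If the difference is greater than the predetermined value, the current access behavior is determined to be valid. If it is less than or equal to the predetermined value, the pre-stored association set is queried for the parent web page identifier associated with the currently visited identifier, and the parent identifier is compared with the identifier visited by the user's previous network access behavior, thereby determining whether the current network access behavior is valid. This embodiment further improves the accuracy of identification and thus the accuracy of user behavior analysis.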
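The decision flow of steps 302 to 309 can be condensed into a single function, sketched here under simplifying assumptions: the record is a `(last_time, last_page)` tuple or `None`, the association set is a plain dictionary from child identifier to parent identifier, and the name `classify_time_first` is hypothetical. Recording and page parsing (steps 307 and 308) are left to the caller.

```python
from typing import Dict, Optional, Tuple


def classify_time_first(record: Optional[Tuple[float, str]],
                        current_time: float,
                        current_page: str,
                        parent_of: Dict[str, str],
                        threshold: float = 2.0) -> bool:
    """Return True for a valid (user-initiated) access, False otherwise."""
    if record is None:
        return True                      # step 302: no record -> step 307
    last_time, last_page = record        # step 303
    if current_time - last_time > threshold:
        return True                      # step 304: yes -> step 307
    parent = parent_of.get(current_page)
    if parent is None:
        return True                      # step 305: no parent -> step 307
    # Step 306: a different parent means valid (307), the same means invalid (309).
    return parent != last_page
```

For example, an image requested 0.3 s after its parent page is classified as invalid, while a different page requested 0.3 s after the previous one is still classified as valid because it is not a child of that page.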
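It should be noted that, in practical applications, the embodiments of the present invention are not limited to the order of steps described in this embodiment: the web page identifier may be judged first and the current network access time afterwards, or both judgments may be made at the same time.
FIG. 4 is a flowchart of Embodiment 4 of a network access behavior identification method according to the present invention. The method may include:
Step 401: Obtain network access information of a user's current network access behavior.
The network access information may include the user's user identifier, the identifier of the visited web page, and the network access time.
Step 402: Query, according to the user identifier, whether a network access information record of the user exists. If yes, perform step 403; if no, perform step 407.
Step 403: Query, in the pre-stored set of web page identifier associations, whether a parent web page identifier associated with the web page identifier visited by the user's current network access behavior exists. If yes, perform step 404; if no, perform step 407.
Step 404: Obtain the web page identifier visited by the user's previous network access behavior from the user's network access information record.
Step 405: Compare the parent web page identifier with the web page identifier visited by the user's previous network access behavior to see whether they differ. If yes, perform step 407; if no, perform step 406.
Step 406: Determine whether the time difference between the user's current network access time and the network access time of the user's previous network access behavior is greater than a predetermined value. If yes, perform step 407; if no, perform step 409.
The network access time of the user's previous network access behavior is obtained from the network access information record.
Step 407: Determine that the user's current network access behavior is a valid network access behavior, and record the network access information corresponding to the current network access behavior.
The user's current network access behavior can be determined to be valid, and its network access information recorded, in any of the following cases: no network access information record of the user is found; no parent web page identifier associated with the currently visited web page identifier is found; the parent web page identifier differs from the web page identifier visited by the user's previous network access behavior (that is, the currently visited identifier is not a child of the previously visited one, so the current behavior can be determined to be actively triggered by the user); or the parent web page identifier is the same as the previously visited identifier but the time difference between the current network access time and the network access time of the previous network access behavior is greater than the predetermined value (that is, the child page of the previously visited page was actively requested by the user). The visited web page content is also parsed to obtain the sub-pages associated with the currently visited page, and the correspondence between the currently visited web page identifier and the parsed child web page identifiers is recorded in the association set for use in judging the user's next network access behavior.
Step 408: Obtain the web page content corresponding to the web page identifier currently visited by the user, parse the content to obtain the child web page identifiers associated with the currently visited identifier, and record in the association set the correspondence between the currently visited web page identifier and its associated child web page identifiers.
Step 409: Determine that the user's current network access behavior is an invalid network access behavior.
When a network access behavior is determined to be invalid, its network access information need not be recorded, and the web page content it visited need not be parsed to obtain child web page identifiers.
In addition, after the current network access behavior is determined to be valid, the user behavior analysis system can process the behavior information of valid network access behaviors, where the behavior information includes the network access information record. Because the network access information record in this embodiment stores the network access information of the user's historical valid network access behaviors, the record may be provided to the user behavior analysis system for processing, for example collecting statistics on visited web page identifiers and access times, or fetching the web page content for a web page identifier and collecting statistics on and analyzing that content. All user behavior information analyzed by the system is thus behavior information of valid network access behaviors, which improves the accuracy of user behavior analysis.
In this embodiment, after the network access information of the current network access behavior is obtained, the user's network access information record is queried. Once the record is found, the parent web page identifier associated with the currently visited web page identifier is determined from the pre-stored association set and compared with the web page identifier visited by the user's previous network access behavior. If they differ, the user's current network access behavior can be determined to be valid. If they are the same, the current network access behavior can still be determined to be valid when the time difference between the current network access time and the user's previous network access time is greater than the predetermined value. The embodiment does not simply identify the network access behavior of the first access request in a network connection as the valid one, which improves the accuracy of identification and makes user behavior analysis more accurate. It also removes the need to analyze the behavior information of every network access behavior, reducing the cost and difficulty of analysis.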
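Steps 402 to 409 can be sketched in the same style, with the page identifier checked before the access time. As before, the tuple record, the dictionary used for the association set, and the name `classify_page_first` are assumptions for illustration only.

```python
from typing import Dict, Optional, Tuple


def classify_page_first(record: Optional[Tuple[float, str]],
                        current_time: float,
                        current_page: str,
                        parent_of: Dict[str, str],
                        threshold: float = 2.0) -> bool:
    """Return True for a valid (user-initiated) access, False otherwise."""
    if record is None:
        return True                      # step 402: no record -> step 407
    parent = parent_of.get(current_page)
    if parent is None:
        return True                      # step 403: no parent -> step 407
    last_time, last_page = record        # step 404
    if parent != last_page:
        return True                      # step 405: different -> step 407
    # Step 406: same parent, so fall back to the time difference.
    return current_time - last_time > threshold
```

In this sketch an access is invalid only when a record exists, the current page is a child of the previously visited page, and the time difference does not exceed the threshold. The time-first order of Embodiment 3 reaches the same verdict; the two orders differ only in which lookup is performed first.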
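It should be noted that, in practical applications, the embodiments of the present invention are not limited to the order of steps described in this embodiment: the current network access time may be judged first and the web page identifier afterwards, or both judgments may be made at the same time.
FIG. 5 is a schematic structural diagram of Embodiment 1 of a network access behavior identification apparatus according to the present invention. The apparatus may include:
a network information obtaining module 501, configured to obtain network access information of a user's current network access behavior, where the network access information includes the user's user identifier and a network access time;
a record query module 502, configured to obtain the network access information from the network information obtaining module and to query the user's network access information record according to the user identifier contained in the network access information.
The network access information record refers to the recorded historical network access information.
a first determining module 503, configured to, after the record query module 502 finds the user's network access information record, determine the validity of the user's current network access behavior according to the network access time of the current network access behavior and the network access time of the user's previous network access behavior recorded in the network access information record.
The first determining module 503 may specifically include:
a time determining module 5031, configured to obtain the network access time of the user's previous network access behavior from the user's network access information record;
a time judging module 5032, configured to determine whether the time difference between the network access time of the user's current network access behavior and the network access time of the user's previous network access behavior is greater than a predetermined value;
a first determining submodule 5033, configured to determine that the user's current network access behavior is a valid network access behavior when the result of the time judging module 5032 is yes, and to determine that it is an invalid network access behavior when the result of the time judging module is no.
The predetermined value is obtained statistically according to conditions such as client performance, user habits, and network delay; it is a value derived from the interval between the user's current network access request and the most recent previous one. For example, after clicking one web page link in a browser, a user clicks the next link; the time difference between the two access requests issued by the browser is usually more than 2 s. If the link clicked by the user has associated sub-links, however, the time difference between the first network access request issued by the browser and the request it then issues on its own for the next associated sub-link is usually 2 s or less. The predetermined value may therefore be set to 2 s.
In addition, the first determining module is further configured to determine that the user's current network access behavior is a valid network access behavior when the record query module does not find a network access information record of the user.
The apparatus may further include a recording module, configured to record the user's current network access information after the user's current network access behavior is determined to be valid.
The network access information record may be stored in the form of a data table, in which each piece of network access information uniquely identifies one network access behavior of a user. To prevent table overflow from making queries inaccurate, the network access information record needs to be aged: when the data table reaches a set capacity threshold, the earliest recorded entries are deleted. That is, one or more of the longest-stored network access information records are deleted so that the record stays within the set capacity.
After a network access behavior is determined to be valid, it can be analyzed and processed. The recorded network access information of valid network access behaviors may be provided to a user behavior analysis system, which can then process it, for example by collecting statistics on the visited addresses and access times, or by fetching the web page content at a visited address and collecting statistics on that content.
In this embodiment, after the network information obtaining module obtains the network access information of the user's current network access behavior, the record query module checks whether a network access information record of the user exists. If such a record exists, the first determining module is triggered to judge the current network access behavior. Whether the current network access behavior is valid can thus be determined, which improves the accuracy of identification and of user behavior analysis.
The apparatus described in the embodiments of the present invention may be integrated into a client, a server, a gateway used for relaying messages, or a user behavior analysis system. When the apparatus is integrated into a client, a server, or a gateway, it may have a corresponding interface connected to the user behavior analysis system. The apparatus may also be connected, as a separate entity, to a client, server, or gateway and to the user behavior analysis system.
FIG. 6 is a schematic structural diagram of Embodiment 2 of a network access behavior identification apparatus according to the present invention. The apparatus may include: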
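a network information obtaining module 601, configured to obtain network access information of a user's current network access behavior, where the network access information includes the user's user identifier and the identifier of the visited web page.
The network access information may further include a network access time, so the network access information of the current network access behavior may include the identifier of the web page currently visited by the user and the current network access time.
a record query module 602, configured to obtain the network access information from the network information obtaining module and to query the user's network access information record according to the user identifier contained in the network access information.
an identifier query module 603, configured to query, in a set of web page identifier associations, the parent web page identifier associated with the web page identifier visited by the user's current network access behavior.
The set of web page identifier associations contains the correspondences between different parent web page identifiers and child web page identifiers.
The set may be pre-stored by the system. The sub-link addresses embedded in the content of different web pages may be parsed out in advance: the identifier of the parsed page is the parent web page identifier, and the embedded sub-link addresses are the child web page identifiers associated with it. Correspondences between different parent and child web page identifiers can thus be established to form the association set. One parent web page identifier usually corresponds to multiple associated child web page identifiers. The different web page content may be the content of the web pages visited by the user's historical valid network access behaviors.
When web page identifiers are expressed as URLs, a hash-based lookup algorithm may be used to query whether a parent web page identifier associated with the current web page identifier exists. That is, the identifier query module may be specifically configured to query, by means of a hash-based lookup algorithm, whether a parent web page identifier corresponding to the web page identifier currently visited by the user exists.
a second determining module 604, configured to, when the record query module 602 finds the user's network access information record and the identifier query module 603 finds the parent web page identifier, determine the validity of the user's current network access behavior according to the parent web page identifier and the web page identifier visited by the user's previous network access behavior recorded in the network access information record.
The second determining module 604 may specifically include:
an identifier determining module 6041, configured to obtain the web page identifier visited by the user's previous network access behavior from the user's network access information record.
an identifier judging module 6042, configured to determine whether the parent web page identifier differs from the web page identifier visited by the user's previous network access behavior.
a second determining submodule 6043, configured to determine that the user's current network access behavior is a valid network access behavior when the result of the identifier judging module 6042 is yes.
In addition, the second determining module is further configured to determine that the user's current network access behavior is a valid network access behavior when the record query module does not find a network access information record of the user, or when the identifier query module does not find a parent web page identifier associated with the web page identifier visited by the user's current network access behavior.
The apparatus may therefore further include:
a recording module, configured to record the user's current network access information after the user's current network access behavior is determined to be valid;
an identifier parsing module, configured to, after the user's current network access behavior is determined to be valid, obtain the web page content corresponding to the web page identifier currently visited by the user and parse the content to obtain the child web page identifiers associated with the currently visited identifier;
an identifier saving module, configured to record in the association set the correspondence between the web page identifier currently visited by the user and its associated child web page identifiers.
In this embodiment, after the network information obtaining module obtains the network access information of the user's current network access behavior, the record query module queries the user's network access information record, and the identifier query module queries the pre-stored association set for the parent web page identifier associated with the currently visited web page identifier. The second determining module can then determine the validity of the user's current network access behavior according to the parent web page identifier and the web page identifier visited by the user's previous network access behavior as recorded in the user's network access information record. Determining the validity of the current network access behavior from the user's historical network access behavior improves the accuracy of the judgment and makes user behavior analysis more accurate.
FIG. 7 is a schematic structural diagram of Embodiment 3 of a network access behavior identification apparatus according to the present invention. The apparatus may include: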
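a network information obtaining module 701, configured to obtain network access information of a user's current network access behavior.
The network access information may include the user identifier, the current network access time, and the current web page identifier.
a record query module 702, configured to query whether a network access information record containing the user identifier exists.
a time determining module 703, configured to obtain the network access time of the user's previous network access behavior from the user's network access information record;
a time judging module 704, configured to determine whether the time difference between the user's current network access time and the network access time of the user's previous network access behavior is greater than a predetermined value.
an identifier query module 705, configured to, when the result of the time judging module is no, query in the pre-stored set of web page identifier associations whether a parent web page identifier associated with the web page identifier visited by the user's current network access behavior exists.
The set of web page identifier associations contains the correspondences between different parent web page identifiers and child web page identifiers.
The set may be pre-stored by the system. The sub-link addresses embedded in the content of different web pages may be parsed out in advance: the identifier of the parsed page is the parent web page identifier, and the embedded sub-link addresses are the child web page identifiers associated with it. Correspondences between different parent and child web page identifiers can thus be established to form the association set. One parent web page identifier usually corresponds to multiple associated child web page identifiers. In this embodiment, the different web page content may be the content of the web pages visited by the user's historical valid network access behaviors.
an identifier judging module 706, configured to, when the query result of the identifier query module is yes, determine whether the parent web page identifier differs from the web page identifier visited by the user's previous network access behavior.
a second determining module 707, configured to determine that the user's current network access behavior is a valid network access behavior when the result of the time judging module is yes, or the result of the record query module is no, or the result of the identifier query module is no, or the result of the identifier judging module is yes; and to determine that the user's current network access behavior is an invalid network access behavior when the result of the identifier judging module is no.
a recording module 708, configured to record the user's current network access information after the user's current network access behavior is determined to be valid;
an identifier parsing module 709, configured to, after the user's current network access behavior is determined to be valid, obtain the web page content corresponding to the web page identifier currently visited by the user and parse the content to obtain the child web page identifiers associated with the currently visited identifier;
an identifier saving module 710, configured to record in the association set the correspondence between the web page identifier currently visited by the user and its associated child web page identifiers.
The network access information and the set of web page identifier associations are both stored in the form of data tables. When the entries exceed a table's maximum capacity, the earliest stored entries are deleted in order of storage time to keep the table within capacity.
After the current network access behavior is determined to be valid, the user behavior analysis system can process the behavior information of valid network access behaviors. The apparatus may therefore further include:
an information providing module, configured to provide the recorded network access information record to the user behavior analysis system for processing.
The user behavior analysis system may collect statistics on visited web page identifiers and access times, and may also fetch the web page content for a web page identifier and collect statistics on and analyze that content. All user behavior information analyzed by the system is thus behavior information of valid network access behaviors, which improves the accuracy of user behavior analysis.
In this embodiment, after the network information obtaining module obtains the network access information of the user's current network access behavior, the record query module checks whether a network access information record of the user is stored. If such a record exists, the time judging module is triggered to compare the current network access time with the user's previous network access time. When the identifier query module finds the parent web page identifier associated with the current web page identifier, the identifier judging module is triggered to compare the parent web page identifier with the web page identifier previously visited by the user. The second determining module can then determine whether the current network access behavior is valid. This embodiment improves the accuracy of identification and thus the accuracy of user behavior analysis.
FIG. 8 is a schematic structural diagram of Embodiment 4 of a network access behavior identification apparatus according to the present invention. The apparatus may include: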
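a network information obtaining module 801, configured to obtain network access information of the current network access behavior, where the network access information includes a user identifier and the current web page identifier;
a record query module 802, configured to query whether a network access information record containing the user identifier exists.
an identifier query module 803, configured to query the set of web page identifier associations to determine whether a parent web page identifier associated with the current web page identifier exists.
an identifier determining module 804, configured to, when the result of the record query module 802 is yes and the result of the identifier query module 803 is yes, obtain the web page identifier visited by the user's previous network access behavior from the user's network access information record.
an identifier judging module 805, configured to determine whether the parent web page identifier differs from the web page identifier visited by the user's previous network access behavior.
a time determining module 806, configured to, when the result of the identifier judging module 805 is no, obtain the network access time of the user's previous network access behavior from the network access information record.
a time judging module 807, configured to determine whether the time difference between the user's current network access time and the network access time of the user's previous network access behavior is greater than a predetermined value.
a second determining module 808, configured to determine that the user's current network access behavior is a valid network access behavior when the result of the record query module 802 is no, or the result of the identifier query module 803 is no, or the result of the identifier judging module 805 is yes, or the result of the time judging module 807 is yes; and to determine that the user's current network access behavior is an invalid network access behavior when the result of the time judging module is no.
a recording module 809, configured to record the user's current network access information after the second determining module determines that the user's current network access behavior is valid.
an identifier parsing module 810, configured to, after the second determining module 808 determines that the user's current network access behavior is valid, obtain the web page content corresponding to the web page identifier currently visited by the user and parse the content to obtain the child web page identifiers associated with the currently visited identifier.
an identifier saving module 811, configured to record in the association set the correspondence between the web page identifier currently visited by the user and its associated child web page identifiers.
After the current network access behavior is determined to be valid, the user behavior analysis system can process the behavior information of valid network access behaviors, where the behavior information includes the network access information record. The record may therefore be provided to the user behavior analysis system for processing, for example collecting statistics on visited web page identifiers and access times, or fetching the web page content for a web page identifier and collecting statistics on and analyzing that content. All user behavior information analyzed by the system is thus behavior information of valid network access behaviors, which improves the accuracy of user behavior analysis.
In this embodiment, after the network information obtaining module obtains the network access information of the user's current network access behavior, the record query module checks whether a network access information record of the user exists. If such a record exists, the identifier query module determines, from the set of web page identifier associations, the parent web page identifier associated with the currently visited web page identifier, and triggers the identifier judging module to compare the parent web page identifier with the web page identifier previously visited by the user. If they are the same, the time judging module is triggered to determine whether the time difference between the current network access time and the user's previous network access time is greater than the predetermined value, so that it can be determined whether the current network access behavior is valid. This embodiment does not simply treat the network access behavior of the first access request in a network connection as the valid one, which improves the accuracy of identification and makes user behavior analysis more accurate. It also removes the need to analyze the behavior information of every network access behavior, reducing the cost and difficulty of analysis.
The apparatus described in the embodiments of the present invention may be integrated into a client, a server, a gateway used for relaying messages, or a user behavior analysis system. When the apparatus is integrated into a client, a server, or a gateway, it may have a corresponding interface connected to the user behavior analysis system. The apparatus may also be connected, as a separate entity, to a client, server, or gateway and to the user behavior analysis system.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar the embodiments may be referred to one another. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, it is described relatively briefly; for the relevant details, refer to the description of the method.
The above description of the disclosed embodiments enables a person skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to a person skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. The present invention is therefore not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.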

Claims

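Claims
1. A network access behavior identification method, comprising:
obtaining network access information of a user's current network access behavior, wherein the network access information comprises the user's user identifier and a network access time;
querying a network access information record of the user according to the user identifier of the user; and
if the network access information record of the user is found, determining the validity of the user's current network access behavior according to the network access time of the user's current network access behavior and the network access time of the user's previous network access behavior recorded in the network access information record.
2. The method according to claim 1, wherein the determining the validity of the user's current network access behavior according to the network access time of the user's current network access behavior and the network access time of the user's most recent network access behavior recorded in the network access information record specifically comprises:
obtaining the network access time of the user's previous network access behavior from the network access information record of the user;
determining whether the time difference between the network access time of the user's current network access behavior and the network access time of the user's previous network access behavior is greater than a predetermined value;
if yes, determining that the user's current network access behavior is a valid network access behavior; and
if no, determining that the user's current network access behavior is an invalid network access behavior.
3. The method according to claim 1, further comprising: if the network access information record of the user is not found, determining that the user's current network access behavior is a valid network access behavior.
4. The method according to any one of claims 1 to 3, further comprising: after the user's current network access behavior is determined to be a valid network access behavior, recording the network access information corresponding to the user's current network access behavior.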
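5. A network access behavior identification method, comprising:
obtaining network access information of a user's current network access behavior, wherein the network access information comprises the user's user identifier and the identifier of the visited web page;
querying a network access information record of the user according to the user identifier of the user;
querying, in a pre-stored set of web page identifier associations, a parent web page identifier associated with the web page identifier visited by the user's current network access behavior; and
if the network access information record of the user and the parent web page identifier are found, determining the validity of the user's current network access behavior according to the parent web page identifier and the web page identifier visited by the user's previous network access behavior recorded in the network access information record.
6. The method according to claim 5, wherein the determining the validity of the user's current network access behavior according to the parent web page identifier and the web page identifier visited by the user's most recent network access behavior recorded in the network access information record specifically comprises:
obtaining the web page identifier visited by the user's previous network access behavior from the network access information record of the user; and
when the parent web page identifier differs from the web page identifier visited by the user's previous network access behavior, determining that the user's current network access behavior is a valid network access behavior.
7. The method according to claim 6, wherein the network access information further comprises a network access time, and the method further comprises:
when the parent web page identifier is the same as the web page identifier visited by the user's previous network access behavior, obtaining the network access time of the user's previous network access behavior from the network access information record;
determining whether the time difference between the network access time of the user's current network access behavior and the network access time of the user's previous network access behavior is greater than a predetermined value;
if yes, determining that the user's current network access behavior is a valid network access behavior; and
if no, determining that the user's current network access behavior is an invalid network access behavior.
8. The method according to any one of claims 5 to 7, further comprising: if the network access information record of the user is not found, or the parent web page identifier associated with the web page identifier visited by the user's current network access behavior is not found, determining that the user's current network access behavior is a valid network access behavior.
9. The method according to claim 8, wherein after the user's current network access behavior is determined to be a valid network access behavior, the method further comprises: recording the network access information corresponding to the user's current network access behavior.
10. The method according to any one of claims 5 to 9, wherein after the user's current network access behavior is determined to be a valid network access behavior, the method further comprises:
obtaining the web page content corresponding to the web page identifier currently visited by the user, and parsing the web page content to obtain child web page identifiers associated with the web page identifier currently visited by the user; and
recording, in the set of web page identifier associations, the correspondence between the web page identifier currently visited by the user and its associated child web page identifiers.
11. The method according to any one of claims 5 to 10, wherein the querying a parent web page identifier associated with the web page identifier visited by the user's current network access behavior specifically comprises:
querying, by means of a hash-based lookup algorithm, whether a parent web page identifier corresponding to the web page identifier currently visited by the user exists.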
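12. A network access behavior identification apparatus, comprising:
a network information obtaining module, configured to obtain network access information of a user's current network access behavior, wherein the network access information comprises the user's user identifier and a network access time;
a record query module, configured to obtain the network access information from the network information obtaining module and to query a network access information record of the user according to the user identifier of the user contained in the network access information; and
a first determining module, configured to, after the record query module finds the network access information record of the user, determine the validity of the user's current network access behavior according to the network access time of the user's current network access behavior and the network access time of the user's previous network access behavior recorded in the network access information record.
13. The apparatus according to claim 12, wherein the first determining module specifically comprises:
a time determining module, configured to obtain the network access time of the user's previous network access behavior from the network access information record of the user;
a time judging module, configured to determine whether the time difference between the network access time of the user's current network access behavior and the network access time of the user's previous network access behavior is greater than a predetermined value; and
a first determining submodule, configured to determine that the user's current network access behavior is a valid network access behavior when the result of the time judging module is yes, and to determine that the user's current network access behavior is an invalid network access behavior when the result of the time judging module is no.
14. The apparatus according to claim 12, wherein the first determining module is further configured to determine that the user's current network access behavior is a valid network access behavior when the record query module does not find the network access information record of the user.
15. The apparatus according to any one of claims 12 to 14, further comprising:
a recording module, configured to record the network access information corresponding to the user's current network access behavior after the user's current network access behavior is determined to be a valid network access behavior.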
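16. A network access behavior identification apparatus, comprising:
a network information obtaining module, configured to obtain network access information of a user's current network access behavior, wherein the network access information comprises the user's user identifier and a network access time;
a record query module, configured to obtain the network access information from the network information obtaining module and to query a network access information record of the user according to the user identifier of the user contained in the network access information;
an identifier query module, configured to query, in a set of web page identifier associations, a parent web page identifier associated with the web page identifier visited by the user's current network access behavior; and
a second determining module, configured to, when the record query module finds the network access information record of the user and the identifier query module finds the parent web page identifier, determine the validity of the user's current network access behavior according to the parent web page identifier and the web page identifier visited by the user's previous network access behavior recorded in the network access information record.
17. The apparatus according to claim 16, wherein the second determining module specifically comprises:
an identifier determining module, configured to obtain the web page identifier visited by the user's previous network access behavior from the network access information record of the user;
an identifier judging module, configured to determine whether the parent web page identifier differs from the web page identifier visited by the user's previous network access behavior; and
a second determining submodule, configured to determine that the user's current network access behavior is a valid network access behavior when the result of the identifier judging module is yes.
18. The apparatus according to claim 17, wherein the network access information record further comprises a network access time, and the apparatus further comprises:
a time determining module, configured to obtain the network access time of the user's previous network access behavior from the network access information record when the result of the identifier judging module is no; and
a time judging module, configured to determine whether the time difference between the network access time of the user's current network access behavior and the network access time of the user's previous network access behavior is greater than a predetermined value;
wherein the second determining module is further configured to determine that the user's current network access behavior is a valid network access behavior when the result of the time judging module is yes, and to determine that the user's current network access behavior is an invalid network access behavior when the result of the time judging module is no.
19. The apparatus according to any one of claims 16 to 18, wherein the second determining module is further configured to determine that the user's current network access behavior is a valid network access behavior when the record query module does not find the network access information record of the user, or the identifier query module does not find the parent web page identifier associated with the web page identifier visited by the user's current network access behavior.
20. The apparatus according to claim 19, further comprising:
a recording module, configured to record the network access information corresponding to the user's current network access behavior after the user's current network access behavior is determined to be a valid network access behavior.
21. The apparatus according to any one of claims 16 to 20, further comprising:
an identifier parsing module, configured to, after the user's current network access behavior is determined to be a valid network access behavior, obtain the web page content corresponding to the web page identifier currently visited by the user and parse the web page content to obtain child web page identifiers associated with the web page identifier currently visited by the user; and
an identifier saving module, configured to record, in the set of web page identifier associations, the correspondence between the web page identifier currently visited by the user and its associated child web page identifiers.
22. The apparatus according to any one of claims 16 to 21, wherein the identifier query module is specifically configured to query, by means of a hash-based lookup algorithm, whether a parent web page identifier corresponding to the web page identifier currently visited by the user exists.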
PCT/CN2013/074873 2012-06-06 2013-04-27 Network access behavior identification method and apparatus WO2013181972A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210189934.5A CN102752288B (zh) 2012-06-06 2012-06-06 Network access behavior identification method and apparatus
CN201210189934.5 2012-06-06

Publications (1)

Publication Number Publication Date
WO2013181972A1 true WO2013181972A1 (zh) 2013-12-12

Family

ID=47032188

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/074873 WO2013181972A1 (zh) 2012-06-06 2013-04-27 Network access behavior identification method and apparatus

Country Status (2)

Country Link
CN (1) CN102752288B (zh)
WO (1) WO2013181972A1 (zh)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102752288B (zh) * 2012-06-06 2015-07-08 华为技术有限公司 网络访问行为识别方法和装置
CN102984234B (zh) * 2012-11-19 2016-06-01 北京奇虎科技有限公司 一种通信系统和网络内容的访问控制方法
CN103020126B (zh) * 2012-11-19 2016-01-13 北京奇虎科技有限公司 网络内容的访问控制方法和装置
CN102970296B (zh) * 2012-11-22 2015-07-15 网宿科技股份有限公司 基于内容分发网络的网站内容智能防抓取方法和系统
CN103475543A (zh) * 2013-09-11 2013-12-25 北京思特奇信息技术股份有限公司 一种检测系统业务异常调用的方法及系统
CN104901930A (zh) * 2014-04-21 2015-09-09 孟俊 一种基于cpk标识认证的可追溯网络行为管理方法
CN104156232B (zh) * 2014-07-18 2018-09-07 百度在线网络技术(北京)有限公司 在线性页面结构下用于页面非线性跳转的方法和设备
CN104268217B (zh) * 2014-09-25 2017-08-01 张文铸 一种用户行为时间相关性的确定方法及装置
CN104580154A (zh) * 2014-12-09 2015-04-29 上海斐讯数据通信技术有限公司 Web服务安全访问方法、系统及相应的服务器
CN105808639B (zh) * 2016-02-24 2021-02-09 平安科技(深圳)有限公司 网络访问行为识别方法和装置
CN107438100B (zh) * 2017-07-25 2020-01-31 中国联合网络通信集团有限公司 网页访问方法及浏览器
CN110020239B (zh) * 2017-09-20 2023-05-12 腾讯科技(深圳)有限公司 恶意资源转移网页识别方法及装置
CN108280117A (zh) * 2017-11-22 2018-07-13 广州市动景计算机科技有限公司 画像元数据获得方法、装置、存储介质及电子设备
CN107888604A (zh) * 2017-11-27 2018-04-06 山东浪潮云服务信息科技有限公司 一种互联网数据获取方法及获取装置
CN109948087B (zh) * 2017-12-05 2021-11-16 Oppo广东移动通信有限公司 网页资源的获取方法、装置及终端
CN108124197B (zh) * 2017-12-18 2020-09-18 广东省电信规划设计院有限公司 终端访问行为的识别方法及装置
CN108880879B (zh) * 2018-06-11 2021-11-23 北京五八信息技术有限公司 用户身份识别方法、装置、设备及计算机可读存储介质
CN110891095B (zh) * 2018-09-07 2022-12-13 上海汽车集团股份有限公司 一种客户信息验证方法及装置
CN110417901B (zh) * 2019-07-31 2022-04-29 北京金山云网络技术有限公司 数据处理方法、装置及网关服务器
CN111818038B (zh) * 2020-07-01 2023-01-31 拉扎斯网络科技(上海)有限公司 一种网络数据获取识别方法以及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090222550A1 (en) * 2008-02-28 2009-09-03 Yahoo! Inc. Measurement of the effectiveness of advertisement displayed on web pages
CN101902438A (zh) * 2009-05-25 2010-12-01 北京启明星辰信息技术股份有限公司 一种自动识别网页爬虫的方法和装置
CN102088477A (zh) * 2010-11-25 2011-06-08 互动在线(北京)科技有限公司 网站内容防采集系统和方法
CN102098229A (zh) * 2011-03-04 2011-06-15 北京星网锐捷网络技术有限公司 统一资源定位符优化审计的方法、装置和网络侧设备
CN102752288A (zh) * 2012-06-06 2012-10-24 华为技术有限公司 网络访问行为识别方法和装置

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2610280C2 (ru) * 2014-10-31 2017-02-08 Общество С Ограниченной Ответственностью "Яндекс" Способ авторизации пользователя в сети и сервер, используемый в нем
US9871813B2 (en) 2014-10-31 2018-01-16 Yandex Europe Ag Method of and system for processing an unauthorized user access to a resource
US9900318B2 (en) 2014-10-31 2018-02-20 Yandex Europe Ag Method of and system for processing an unauthorized user access to a resource
CN106685680A (zh) * 2015-11-09 2017-05-17 北京国双科技有限公司 还原推介流量数据的方法和装置
CN106685680B (zh) * 2015-11-09 2019-09-20 北京国双科技有限公司 还原推介流量数据的方法和装置
CN111917663A (zh) * 2020-06-16 2020-11-10 深圳市风云实业有限公司 一种hsr重复报文过滤表哈希桶满覆盖方法
CN111917663B (zh) * 2020-06-16 2022-11-04 深圳市风云实业有限公司 一种hsr重复报文过滤表哈希桶满覆盖方法
CN111767315A (zh) * 2020-06-29 2020-10-13 北京奇艺世纪科技有限公司 黑产识别方法、装置、电子设备及存储介质
CN111767315B (zh) * 2020-06-29 2023-07-04 北京奇艺世纪科技有限公司 黑产识别方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
CN102752288A (zh) 2012-10-24
CN102752288B (zh) 2015-07-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13800465

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13800465

Country of ref document: EP

Kind code of ref document: A1