WO2017080454A1 - Method and apparatus for aggregating website access paths (网站访问路径的聚合方法和装置) - Google Patents


Info

Publication number
WO2017080454A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
attribute information
access
information
user
Application number
PCT/CN2016/105206
Other languages
English (en)
French (fr)
Inventor
詹晓强 (Zhan Xiaoqiang)
Original Assignee
北京国双科技有限公司 (Beijing Gridsum Technology Co., Ltd.)
Application filed by 北京国双科技有限公司 (Beijing Gridsum Technology Co., Ltd.)
Publication of WO2017080454A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F30/00 Computer-aided design [CAD]

Definitions

  • The present application relates to the field of computers, and in particular to a method and apparatus for aggregating website access paths.
  • A website's access logs are typically converted and stored in a relational database, with each record in the database representing one user access. Because a website access path captures a series of continuous, purposeful user actions, the website can be analyzed by analyzing its access paths.
  • In the prior art, a user's access path for a website is obtained by first finding all of the user's visits to the website within a period of time, then analyzing the visits one by one and storing the path nodes of each visit column by column in the relational database, and finally processing the column-stored path nodes to obtain the user's access path for the website.
  • Once each user's access path for the website has been obtained, each individual access path is easy to analyze.
  • However, user access paths are usually massive. Analyzing them one by one is not only inefficient but also cannot reveal the behavior of the website's user population as a whole. The access paths therefore need to be processed so that multiple access paths that are identical under certain conditions are aggregated into one for analysis of the website.
  • Embodiments of the present application provide a method and apparatus for aggregating website access paths, to at least solve the problem that the prior art can aggregate only access paths of limited length and cannot aggregate access paths of arbitrary length.
  • A method for aggregating website access paths is provided, including: obtaining access information of a target user for each visit to a target website within a first preset time period, where there is at least one target user; obtaining one or more pieces of attribute information included in each piece of access information and storing them in a relational database by row, where the attribute information represents a path node of an access path; processing the target attribute information corresponding to each target user to obtain the access path of each target user, where each piece of target attribute information consists of at least one of the one or more pieces of attribute information obtained from the corresponding piece of access information; storing each access path as one record in the relational database; and aggregating multiple records using an aggregate function of the relational database to obtain an aggregation result.
  • Optionally, before the access information of the target user for each visit to the target website within the first preset time period is obtained, the method further includes: obtaining, from the access log of the target website, the users who visited the target website within a second preset time period as the target users.
  • Optionally, obtaining the one or more pieces of attribute information included in each piece of access information and storing them in the relational database by row includes: sorting the access information of target user Ai by the access time included in the access information, where i takes 1 to n in turn and n is the number of target users; sequentially obtaining, from the sorted access information of target user Ai, the identity information and the one or more pieces of attribute information included in each piece of access information; and storing the identity information and attribute information obtained from each piece of access information of target user Ai into the relational database one by one.
  • Optionally, each target user corresponds to multiple pieces of target attribute information, and processing the target attribute information corresponding to each target user to obtain the access path of each target user includes: concatenating any two adjacent pieces of the multiple pieces of target attribute information corresponding to target user Ai with a preset symbol, where i takes 1 to n in turn and n is the number of target users; and using the concatenated target attribute information of target user Ai as the access path of target user Ai.
  • Optionally, concatenating any two adjacent pieces of target attribute information of target user Ai with the preset symbol includes: obtaining the multiple pieces of target attribute information corresponding to target user Ai; determining whether target attribute information A_i(j-1) of target user Ai is the same as target attribute information A_i(j), where j takes 2 to m(i)-2 in turn and m(i) is the number of pieces of target attribute information corresponding to target user Ai; if A_i(j-1) and A_i(j) are different, connecting them with the preset symbol; if they are the same, deleting A_i(j-1) and determining whether A_i(j) is the same as A_i(j+1); and if A_i(j) and A_i(j+1) are different, connecting them with the preset symbol.
  • the attribute information includes a source type, a source channel, a browser type, an operating system type, and a search engine.
  • An apparatus for aggregating website access paths is also provided, including: a first obtaining unit, configured to obtain access information of a target user for each visit to a target website within a first preset time period, where there is at least one target user; a second obtaining unit, configured to obtain one or more pieces of attribute information included in each piece of access information and store them in a relational database by row, where the attribute information represents a path node of an access path; a processing unit, configured to process the target attribute information corresponding to each target user to obtain the access path of each target user, where each piece of target attribute information consists of at least one of the one or more pieces of attribute information obtained from the corresponding piece of access information; a storage unit, configured to store each access path as one record in the relational database; and an aggregation unit, configured to aggregate multiple records using an aggregate function of the relational database to obtain an aggregation result.
  • Optionally, the apparatus further includes a third obtaining unit, configured to obtain, from the access log of the target website and before the access information of the target user for each visit to the target website within the first preset time period is obtained, the users who visited the target website within a second preset time period as the target users.
  • Optionally, the second obtaining unit includes: a sorting subunit, configured to sort the access information of target user Ai by the access time included in the access information, where i takes 1 to n in turn and n is the number of target users; an obtaining subunit, configured to sequentially obtain, from the sorted access information of target user Ai, the identity information and the one or more pieces of attribute information included in each piece of access information; and a storage subunit, configured to store the identity information and attribute information obtained from each piece of access information of target user Ai into the relational database one by one.
  • Optionally, each target user corresponds to multiple pieces of target attribute information, and the processing unit includes: a connection subunit, configured to concatenate any two adjacent pieces of the multiple pieces of target attribute information corresponding to target user Ai with a preset symbol, where i takes 1 to n in turn and n is the number of target users; and a determining subunit, configured to use the concatenated target attribute information of target user Ai as the access path of target user Ai.
  • In the embodiments of the present application, the access information of at least one target user for each visit to the target website within the first preset time period is obtained, and the one or more pieces of attribute information included in each piece of access information are obtained.
  • FIG. 1 is a flowchart of a method for aggregating website access paths according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an aggregation apparatus of a website access path according to an embodiment of the present application.
  • An embodiment of the method for aggregating website access paths is provided. It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from the one described herein.
  • FIG. 1 is a flowchart of a method for aggregating website access paths according to an embodiment of the present application. As shown in FIG. 1, the method includes steps S102 to S110:
  • Step S102: Obtain the access information of the target user for each visit to the target website within the first preset time period, where there is at least one target user.
  • the access information of the target user when visiting the target website in the first preset time period may be obtained from the access log of the target website.
  • the first preset time period can be set according to user requirements.
  • The target website may be any consumer or video website, such as Jingdong, Taobao, Suning, Vipshop, or Youku. The number of pieces of access information obtained for a target user equals the number of times that user visited the target website within the first preset time period.
  • For example, if the target website is Jingdong and the first preset time period is January 1, 2015 to September 1, 2015, the access information of User 01, User 02, and User 03 for each of their visits to Jingdong in that period is obtained from the log.
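The selection in step S102 can be sketched as a simple filter over parsed log entries. This is a minimal sketch, assuming each log entry has already been parsed into a record with a user ID, site name, access time, and attribute dictionary; the `AccessInfo` type, its field names, and the inclusive time bounds are hypothetical choices, not taken from the patent.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class AccessInfo:
    """Hypothetical parsed access-log entry (field names are illustrative)."""
    user_id: str
    site: str
    access_time: datetime
    attributes: dict = field(default_factory=dict)  # e.g. {"search_engine": "baidu"}


def access_info_in_period(log, site, target_users, start, end):
    """Return every visit by a target user to `site` with start <= access_time <= end."""
    return [
        entry
        for entry in log
        if entry.site == site
        and entry.user_id in target_users
        and start <= entry.access_time <= end
    ]
```

One entry is returned per visit, so a user who visited five times contributes five pieces of access information.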
  • Step S104: Obtain the one or more pieces of attribute information included in each piece of access information, and store them in the relational database by row, where the attribute information represents a path node of the access path.
  • Each piece of access information contains many pieces of attribute information, such as source type, source channel, browser type, operating system type, and search engine. One piece of attribute information, several pieces, or all of the attribute information may be obtained from each piece of access information.
  • For example, suppose a total of three pieces of access information are obtained in step S102: access information A1, access information B1, and access information C1. If the attribute information obtained from each piece is the search engine and the browser type, then "baidu" and "IE" are obtained from access information A1, "google" and "firefox" from access information B1, and "baidu" and "sogou" from access information C1.
  • Step S106: Process the target attribute information corresponding to each target user to obtain the access path of each target user, where each piece of target attribute information consists of at least one of the one or more pieces of attribute information obtained from the corresponding piece of access information.
  • A piece of target attribute information may consist of the single piece of attribute information obtained from a piece of access information, or of some or all of the multiple pieces obtained from it. A target user has as many pieces of target attribute information as pieces of access information. For example, if five pieces of access information of a target user are obtained in step S102, that user has five corresponding pieces of target attribute information, which are processed to obtain the user's access path; if only one piece of access information of the target user is obtained in step S102, the user has one piece of target attribute information, which is processed to obtain the user's access path.
  • the total number of target attribute information is equal to the total number of pieces of access information. For example, if a total of 20 pieces of access information are acquired in step S102, there are 20 pieces of target attribute information.
  • When one piece of attribute information is obtained from each piece of access information in step S104, each piece of target attribute information consists of that attribute information; when two pieces are obtained from each piece of access information in step S104, each piece of target attribute information may consist of both of them, or of either one.
  • When a piece of target attribute information consists of multiple pieces of attribute information, adjacent pieces of attribute information within it may be separated by a special character, for example “…”.
  • All pieces of target attribute information have the same composition, that is, each contains the same types of attribute information.
  • For example, suppose access information A1, B1, and C1 all belong to User 01. If each piece of target attribute information consists of all the attribute information obtained from its piece of access information, then the target attribute information for access information A1 is baidu…
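Building one piece of target attribute information from the attributes selected in step S104 amounts to joining them with a separator. A minimal sketch, assuming the attributes of one access are held in a dictionary; the function name `target_attribute_info` is hypothetical, and the `|` separator only stands in for the unspecified special character.

```python
def target_attribute_info(attributes, selected_keys, sep="|"):
    """Join the selected attribute values of one access, in the given key order.

    `sep` is a placeholder for the special character that separates adjacent
    attributes inside one piece of target attribute information.
    """
    return sep.join(attributes[key] for key in selected_keys)


# Access information A1 from the example: search engine "baidu", browser "IE".
a1 = {"search_engine": "baidu", "browser_type": "IE"}
print(target_attribute_info(a1, ["search_engine", "browser_type"]))  # baidu|IE
print(target_attribute_info(a1, ["search_engine"]))                  # baidu
```

Because every piece uses the same key list, all pieces of target attribute information share the same composition, as the text requires.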
  • Step S108: Store each access path as one record in the relational database.
  • Step S110: Aggregate multiple records using an aggregate function in the relational database to obtain an aggregation result, that is, use the aggregate function to aggregate the data corresponding to the multiple records (that is, the multiple access paths).
  • Optionally, information contained in each record, including the access path, is aggregated using an aggregate function in the relational database to obtain the aggregation result.
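Steps S108 and S110 can be sketched with Python's built-in `sqlite3` module: each access path becomes one row, so path length never affects the schema, and a standard SQL aggregate then groups identical paths. The table name `access_path`, its columns, the `>` separator inside the path strings, and the choice of `COUNT(*)` as the aggregate are assumptions for illustration; the patent does not name a specific aggregate function.

```python
import sqlite3


def aggregate_paths(paths):
    """paths: iterable of (user_id, access_path) pairs, one record per access path.

    Returns (access_path, number_of_records) rows, most frequent first.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Step S108: one record per access path; the path is a single TEXT value,
        # so a longer path needs no extra columns.
        conn.execute("CREATE TABLE access_path (user_id TEXT, path TEXT)")
        conn.executemany("INSERT INTO access_path VALUES (?, ?)", paths)
        # Step S110: aggregate identical paths with an aggregate function.
        return conn.execute(
            "SELECT path, COUNT(*) AS n FROM access_path "
            "GROUP BY path ORDER BY n DESC, path"
        ).fetchall()
    finally:
        conn.close()
```

For example, `aggregate_paths([("A1", "baidu>google"), ("A2", "360>baidu>bing"), ("A3", "baidu>google")])` merges the two identical paths into one row with a count of 2.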
  • In the embodiments of the present application, the access information of each target user is obtained; the attribute information obtained from each piece of the target user's access information is stored in the relational database by row; the target attribute information corresponding to each target user is processed to obtain each target user's access path; the resulting access paths are stored one by one in the relational database; and finally the aggregate function of the relational database is used to aggregate the access paths into an aggregation result. This avoids the situation in which, because of the limit on the number of columns in a relational database, long access paths cannot be processed, so that users' access paths cannot be obtained and cannot be aggregated. It thereby solves the prior-art problem that only access paths of limited length can be aggregated, and achieves the technical effect of aggregating access paths of arbitrary length.
  • Optionally, before the access information of the target user for each visit to the target website within the first preset time period is obtained, the method further includes step S101:
  • Step S101: Obtain, from the access log of the target website, the users who visited the target website within the second preset time period as the target users.
  • The second preset time period may be set according to user requirements. For example, if the second preset time period is September 1, 2015 to September 30, 2015 and the target website is Jingdong, the users who visited Jingdong from September 1, 2015 to September 30, 2015 are taken as the target users.
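Step S101 can be sketched as a `SELECT DISTINCT` over the access log. This assumes a hypothetical `access_log` table with `user_id`, `site`, and `access_time` columns, with times stored as ISO-8601 text so that `BETWEEN` compares them correctly; none of these names come from the patent.

```python
import sqlite3


def target_users(conn, site, start, end):
    """Step S101 sketch: distinct users who visited `site` between `start` and `end`.

    Times are ISO-8601 strings, so BETWEEN compares them correctly as text.
    """
    rows = conn.execute(
        "SELECT DISTINCT user_id FROM access_log "
        "WHERE site = ? AND access_time BETWEEN ? AND ? "
        "ORDER BY user_id",
        (site, start, end),
    )
    return [user_id for (user_id,) in rows]
```

If stored timestamps include a time of day, pass an end bound such as `"2015-09-30 23:59:59"` so that visits on the last day are not excluded by the text comparison.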
  • Optionally, obtaining the one or more pieces of attribute information included in each piece of access information and storing them in the relational database by row includes steps S1041 to S1045:
  • Step S1041: Sort the access information of target user Ai by the access time included in the access information, where i takes 1 to n in turn and n is the number of target users. The sorting may be in ascending or descending order of access time.
  • The sorted access information of target user A1 is shown in Table 1 below:
  • The sorted access information of target user A2 is shown in Table 2 below:
  • The sorted access information of target user A3 is shown in Table 3 below:
  • Step S1043: Sequentially obtain, from the sorted access information of target user Ai, the identity information and the one or more pieces of attribute information included in each piece of access information.
  • Step S1045: Store the identity information and the one or more pieces of attribute information obtained from each piece of access information of target user Ai into the relational database one by one.
  • In this way, the path nodes of a target user (that is, the obtained attribute information) are stored one by one in the relational database by row, so they are not subject to the relational database's limit on the number of columns.
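Steps S1041 to S1045 can be sketched as "sort by time, then insert one row per path node". This is a minimal sketch, assuming a hypothetical `path_node` table, ISO-8601 text timestamps, and the search engine as the single attribute; the `seq` column recording sort position is also an assumption.

```python
import sqlite3


def create_path_node_table(conn):
    # One row per path node: a longer path means more rows, never more columns.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS path_node ("
        "user_id TEXT, seq INTEGER, access_time TEXT, search_engine TEXT)"
    )


def store_nodes_by_row(conn, user_id, visits):
    """visits: list of (access_time, search_engine) tuples for one target user."""
    ordered = sorted(visits, key=lambda visit: visit[0])  # S1041: ascending time
    # S1043 + S1045: take identity and attribute from each visit, store row by row.
    conn.executemany(
        "INSERT INTO path_node VALUES (?, ?, ?, ?)",
        [(user_id, seq, t, engine) for seq, (t, engine) in enumerate(ordered, start=1)],
    )
```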
  • Optionally, each target user corresponds to multiple pieces of target attribute information, and processing the target attribute information corresponding to each target user to obtain the access path of each target user includes steps S1061 to S1063:
  • Step S1061: Concatenate any two adjacent pieces of the multiple pieces of target attribute information corresponding to target user Ai with a preset symbol, where i takes 1 to n in turn and n is the number of target users.
  • When there is only one target user, i equals 1; when there are multiple target users, i takes 1 to n in turn.
  • the preset symbol can be selected according to user requirements, for example, it can be the symbol " ⁇ ".
  • Step S1063: Use the concatenated target attribute information of target user Ai as the access path of target user Ai.
  • Optionally, a custom string aggregation function can be implemented programmatically, so that the relational database provides an extension function for processing the target attribute information corresponding to each target user.
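As one way to realize such a custom string aggregation function, SQLite lets Python register a user-defined aggregate with `Connection.create_aggregate`. This sketch covers only the plain concatenation of step S1061 (consecutive duplicates are kept); the aggregate name `path_concat`, the `path_node` table layout, and the `>` separator standing in for the preset symbol are all hypothetical.

```python
import sqlite3


class PathConcat:
    """Hypothetical user-defined aggregate: joins a user's path nodes in seq order."""

    SEP = ">"  # placeholder for the preset symbol

    def __init__(self):
        self.nodes = []  # (seq, node) pairs seen for the current group

    def step(self, seq, node):
        self.nodes.append((seq, node))

    def finalize(self):
        # Sort by seq so the result does not depend on the order SQLite feeds rows.
        return self.SEP.join(node for _, node in sorted(self.nodes))


def access_paths(conn):
    conn.create_aggregate("path_concat", 2, PathConcat)
    return conn.execute(
        "SELECT user_id, path_concat(seq, search_engine) FROM path_node "
        "GROUP BY user_id ORDER BY user_id"
    ).fetchall()
```

Sorting inside `finalize` is a deliberate choice: SQL does not promise the order in which rows reach an aggregate, so the node order is taken from the stored sequence number instead.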
  • Optionally, concatenating any two adjacent pieces of the multiple pieces of target attribute information corresponding to target user Ai with the preset symbol includes steps S1 to S9:
  • Step S1 Acquire a plurality of target attribute information corresponding to the target user Ai.
  • Step S3: Determine whether target attribute information A_i(j-1) corresponding to target user Ai is the same as target attribute information A_i(j), where j takes 2 to m(i)-2 in turn and m(i) is the number of pieces of target attribute information corresponding to target user Ai.
  • Step S5: If A_i(j-1) is determined to be different from A_i(j), connect A_i(j-1) and A_i(j) with the preset symbol.
  • Step S7: If A_i(j-1) is determined to be the same as A_i(j), delete A_i(j-1) and determine whether A_i(j) is the same as A_i(j+1).
  • Step S9: If A_i(j) is determined to be different from A_i(j+1), connect A_i(j) and A_i(j+1) with the preset symbol.
  • When a piece of target attribute information of a target user is deleted, it is not deleted from the relational database (for example, from the data in Table 4 and Table 5); it is only removed from the multiple pieces of target attribute information obtained for that target user.
  • Steps S1 to S9 may be performed repeatedly on the multiple pieces of target attribute information of each target user to obtain each target user's access path. Moreover, while the access path of a target user is obtained by performing steps S1 to S7, the number of consecutive occurrences of each path node in the access path can also be counted.
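Steps S1 to S9 collapse runs of identical adjacent nodes and can also count how many times each node occurs consecutively. The sketch below swaps the patent's pairwise compare-and-delete loop for Python's `itertools.groupby`, which groups runs of equal adjacent items and so yields the same collapsed path; the function name `collapse_path` and the `>` separator are assumptions.

```python
from itertools import groupby


def collapse_path(nodes, sep=">"):
    """Merge runs of identical adjacent nodes.

    Returns (access_path, consecutive_counts), where consecutive_counts[k] is how
    many times the k-th node of the collapsed path occurred in a row.
    """
    runs = [(node, sum(1 for _ in group)) for node, group in groupby(nodes)]
    path = sep.join(node for node, _ in runs)
    counts = [count for _, count in runs]
    return path, counts


# Target user A2 from the example: 360, then baidu twice, then bing twice.
print(collapse_path(["360", "baidu", "baidu", "bing", "bing"]))
# ('360>baidu>bing', [1, 2, 2])
```

Only adjacent duplicates are merged, so a path such as baidu, google, baidu keeps both baidu nodes, matching the comparison of neighbours A_i(j-1) and A_i(j) in the steps above.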
  • For example, suppose each piece of target attribute information consists of one piece of attribute information (for example, the search engine) and the preset symbol is ⁇; performing the above steps for target users A1, A2, and A3 in the above embodiment yields their access paths.
  • The path length of each access path is the sum of the consecutive counts of its nodes, where the consecutive count of a node is the number of times that path node occurs consecutively in the access path.
  • For example, as shown in Table 5, the user whose identity information is A2 (that is, target user A2 in the above embodiment) first accessed Jingdong via 360, then accessed Jingdong twice in succession via baidu, and then accessed it twice via bing.
  • The access path of target user A2 is therefore 360 ⁇ baidu ⁇ bing, and the consecutive occurrence counts of the path nodes in this access path are 1, 2, and 2, respectively, joined with a “…”, so the path length is 1 + 2 + 2 = 5.
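Stated as a formula (the symbols are introduced here for illustration only): if the access path $P_i$ of target user Ai has $r_i$ path nodes after consecutive duplicates are merged, and $c_{i,k}$ is the consecutive count of its $k$-th node, then the path length is

```latex
\ell(P_i) = \sum_{k=1}^{r_i} c_{i,k},
\qquad
\ell(P_{A2}) = 1 + 2 + 2 = 5 .
```

Since every visit contributes to exactly one run, $\ell(P_i)$ also equals the number of pieces of target attribute information of target user Ai.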
  • Alternatively, the path nodes may be input into a program and grouped by access path number, and each path node and its corresponding information concatenated in order of sort number, such as Baidu ⁇ google ⁇ bing ⁇ ...; the aggregate function is then used to concatenate the consecutive occurrence counts of the path nodes with a “…”.
  • The original information of the path nodes that needs to be analyzed, such as the dwell time at each path node, the time between two path nodes, and the access time of each path node in the access path, can likewise be retained.
  • Taking Table 7 as an example, aggregating the access paths and the node occurrence counts in Table 7 yields the aggregation result shown in Table 8:
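Tables 7 and 8 are not reproduced here, so the following is only one plausible shape of such an aggregation, not the patent's own query. It assumes a hypothetical `path_record` table holding each user's access path and its consecutive counts as text, groups records by path, counts them, and keeps every record's node-count string via SQLite's `group_concat`, so node-level data survives the aggregation.

```python
import sqlite3


def aggregate_with_counts(records):
    """records: iterable of (user_id, access_path, node_counts) tuples.

    Returns (access_path, number_of_records, concatenated_node_counts) rows.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE path_record (user_id TEXT, path TEXT, node_counts TEXT)")
        conn.executemany("INSERT INTO path_record VALUES (?, ?, ?)", records)
        # group_concat keeps each record's node counts; the concatenation order
        # within a group is not guaranteed by SQLite.
        return conn.execute(
            "SELECT path, COUNT(*), group_concat(node_counts, ';') "
            "FROM path_record GROUP BY path ORDER BY path"
        ).fetchall()
    finally:
        conn.close()
```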
  • Thus, the solution provided by the present application can not only aggregate access paths of arbitrary length but also preserve (store) the related data of the path nodes.
  • According to an embodiment of the present application, an apparatus for aggregating website access paths is further provided; it is used to perform the method for aggregating website access paths provided above. The apparatus is described in detail below:
  • FIG. 2 is a schematic diagram of an aggregation device for a website access path according to an embodiment of the present application.
  • The aggregation apparatus mainly includes a first obtaining unit 21, a second obtaining unit 23, a processing unit 25, a storage unit 27, and an aggregation unit 29, where:
  • The first obtaining unit 21 is configured to obtain the access information of the target user for each visit to the target website within the first preset time period, where there is at least one target user.
  • the access information of the target user when visiting the target website in the first preset time period may be obtained from the access log of the target website.
  • the first preset time period can be set according to user requirements.
  • The second obtaining unit 23 is configured to obtain the one or more pieces of attribute information included in each piece of access information and store them in the relational database by row, where the attribute information represents a path node of the access path.
  • Each piece of access information contains many pieces of attribute information, such as source type, source channel, browser type, operating system type, and search engine. One piece of attribute information, some of the attribute information, or all of it may be obtained from each piece of access information.
  • The processing unit 25 is configured to process the target attribute information corresponding to each target user to obtain the access path of each target user, where each piece of target attribute information consists of at least one of the one or more pieces of attribute information obtained from the corresponding piece of access information.
  • A piece of target attribute information may consist of the single piece of attribute information obtained from a piece of access information, or of some or all of the multiple pieces obtained from it. A target user has as many pieces of target attribute information as pieces of access information.
  • For example, if the first obtaining unit 21 obtains five pieces of access information of a target user, that user has five corresponding pieces of target attribute information, which are processed to obtain the user's access path; if the first obtaining unit 21 obtains one piece of access information of the target user, the user has one piece of target attribute information, which is processed to obtain the user's access path.
  • The total number of pieces of target attribute information equals the total number of pieces of access information. For example, if a total of 20 pieces of access information are obtained by the first obtaining unit 21, there are 20 pieces of target attribute information.
  • When the second obtaining unit 23 obtains one piece of attribute information from each piece of access information, each piece of target attribute information consists of that attribute information; when it obtains two pieces from each piece of access information, each piece of target attribute information may consist of both of them, or of either one.
  • When a piece of target attribute information consists of multiple pieces of attribute information, adjacent pieces of attribute information within it may be separated by a special character, for example “…”.
  • All pieces of target attribute information have the same composition, that is, each contains the same types of attribute information.
  • the storage unit 27 is for storing each access path as a record in a relational database.
  • The aggregation unit 29 is configured to aggregate multiple records using an aggregate function in the relational database to obtain an aggregation result, that is, to use the aggregate function to aggregate the data corresponding to the multiple records (that is, the multiple access paths).
  • Optionally, information contained in each record, including the access path, is aggregated using an aggregate function in the relational database to obtain the aggregation result.
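The unit structure of the apparatus can be sketched as a small pipeline in which each unit is a pluggable callable. This is a hypothetical wiring for illustration: the class name, the callable signatures, and the data shapes are assumptions, and the patent does not prescribe any particular software decomposition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class AggregationApparatus:
    """Hypothetical wiring of units 21 to 29; each unit is any callable with the shown role."""
    first_obtaining_unit: Callable[[], List[dict]]              # unit 21: access information
    second_obtaining_unit: Callable[[List[dict]], List[dict]]   # unit 23: attributes, stored by row
    processing_unit: Callable[[List[dict]], Dict[str, str]]     # unit 25: user -> access path
    storage_unit: Callable[[Dict[str, str]], None]              # unit 27: one record per path
    aggregation_unit: Callable[[], List[Tuple[str, int]]]       # unit 29: aggregate the records

    def run(self) -> List[Tuple[str, int]]:
        access_info = self.first_obtaining_unit()
        rows = self.second_obtaining_unit(access_info)
        paths = self.processing_unit(rows)
        self.storage_unit(paths)
        return self.aggregation_unit()
```

Keeping the units as injected callables mirrors the text's point that each unit may run on a processor of whatever computer terminal hosts the apparatus.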
  • The first obtaining unit 21, the second obtaining unit 23, the processing unit 25, the storage unit 27, and the aggregation unit 29 may run in a computer terminal as part of the apparatus, and the functions they implement may be executed by a processor in the computer terminal.
  • The computer terminal may also be a terminal device such as a smartphone (for example, an Android or iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD.
  • the access information of each target user is acquired, and the attribute information acquired from each piece of that user's access information is stored row by row in a relational database; the attribute information corresponding to each target user is then processed to obtain that user's access path, the resulting access paths are stored one per row in the relational database, and finally the access paths are aggregated by using an aggregate function of the relational database to obtain the aggregation result. This avoids the situation in which the column limit of the relational database prevents long access paths from being processed, so that users' access paths cannot be obtained and therefore cannot be aggregated.
  • this solves the problem that the prior art can aggregate only access paths of limited length, not access paths of arbitrary length, and achieves the technical effect of aggregating access paths of arbitrary length.
  • the device further includes a third acquisition unit, configured to acquire the target users from the access log of the target website before the access information of each target user's visits to the target website within the first preset time period is acquired.
  • the users who visited the target website within the second preset time period are acquired and taken as the target users.
  • the third acquisition unit may run in a computer terminal as part of the device, and its function may be executed by a processor in the computer terminal; the computer terminal may also be a smartphone (such as an Android or iOS phone), a tablet computer, a PDA, a mobile Internet device (MID), a PAD, or a similar terminal device.
  • the second acquisition unit 23 includes a sorting subunit, an acquisition subunit, and a storage subunit.
  • the sorting subunit is configured to sort the access information of target user Ai by the access time it contains, where i takes 1 to n in turn and n is the number of target users; the acquisition subunit is configured to acquire, in order, the identity information and the one or more pieces of attribute information contained in each piece of the sorted access information of target user Ai; and the storage subunit is configured to store the identity information and attribute information acquired from each piece of access information of target user Ai row by row in the relational database.
  • the sorting may be in ascending or descending order of access time.
  • the path nodes of a target user (that is, the acquired attribute information) are stored one per row in the relational database; because storage is by row, it is not subject to the column limit of the relational database.
  • the sorting subunit, the acquisition subunit, and the storage subunit may run in a computer terminal as part of the device, and their functions may be executed by a processor in the computer terminal; the computer terminal may also be a smartphone (such as an Android or iOS phone), a tablet computer, a PDA, a mobile Internet device (MID), a PAD, or a similar terminal device.
  • each target user corresponds to multiple pieces of target attribute information.
  • the processing unit 25 includes a connection subunit and a determining subunit, where:
  • the connection subunit is configured to connect any two adjacent target attribute information of the plurality of target attribute information corresponding to the target user Ai by a preset symbol, wherein i takes 1 to n in sequence, and n is the number of target users. Specifically, when there is only one target user, i is equal to 1; when there are multiple target users, i takes 1 to n in order.
  • the preset symbol can be chosen as required, for example, the symbol "→".
  • the determining subunit is configured to use the concatenated target attribute information of the target user Ai as the access path of the target user Ai.
  • the connection subunit and the determining subunit may run in a computer terminal as part of the device, and their functions may be executed by a processor in the computer terminal; the computer terminal may also be a smartphone (such as an Android or iOS phone), a tablet computer, a handheld computer, a mobile Internet device (MID), a PAD, or a similar terminal device.
  • the connection subunit includes an acquisition module, a first judging module, a first connection module, a second judging module, and a second connection module.
  • the acquisition module is configured to acquire the multiple pieces of target attribute information corresponding to target user Ai.
  • the first judging module is configured to judge whether target attribute information Ai(j-1) and target attribute information Ai(j) of target user Ai are the same, where j takes 2 to m(i)-2 in turn and m(i) is the number of pieces of target attribute information corresponding to target user Ai; the first connection module is configured to connect Ai(j-1) and Ai(j) with the preset symbol when they are judged to be different; the second judging module is configured to delete Ai(j-1) when Ai(j-1) and Ai(j) are judged to be the same, and then to judge whether Ai(j) and Ai(j+1) are the same.
  • the second connection module is configured to connect Ai(j) and Ai(j+1) with the preset symbol when they are judged to be different.
  • the acquisition module, the first judging module, the first connection module, the second judging module, and the second connection module may be invoked repeatedly for the target attribute information of each target user to obtain each target user's access path.
  • when the access path of a target user is obtained by invoking these modules, the number of consecutive occurrences of each path node in the access path may also be calculated.
  • the acquisition module, the first judging module, the first connection module, the second judging module, and the second connection module may run in a computer terminal as part of the device, and their functions may be executed by a processor in the computer terminal; the computer terminal may also be a smartphone (such as an Android or iOS phone), a tablet computer, a handheld computer, a mobile Internet device (MID), a PAD, or a similar terminal device.
  • the functional units provided by the embodiments of the present application may run in a mobile terminal, a computer terminal, or a similar computing device, or may be stored as part of a storage medium.
  • embodiments of the present invention may provide a computer terminal, which may be any computer terminal device in a group of computer terminals.
  • a computer terminal may also be replaced with a terminal device such as a mobile terminal.
  • the computer terminal may be located in at least one network device of the plurality of network devices of the computer network.
  • the computer terminal may execute program code for the following steps of the method for aggregating website access paths: acquiring the access information of a target user each time the target user accesses a target website within a first preset time period, where there is at least one target user; acquiring one or more pieces of attribute information contained in each piece of access information and storing them row by row in a relational database, where the attribute information represents the path nodes of the access path; processing the target attribute information corresponding to each target user to obtain each target user's access path, where each piece of target attribute information consists of at least one of the pieces of attribute information acquired from the corresponding piece of access information; storing each access path as one record in the relational database; and aggregating multiple records by using an aggregate function of the relational database to obtain an aggregation result.
  • the computer terminal may include one or more processors, a memory, and a transmission device.
  • the memory may be used to store software programs and modules, such as the program instructions/modules corresponding to the method and device for aggregating website access paths in the embodiments of the present invention; the processor runs the software programs and modules stored in the memory to execute various functional applications and data processing, that is, to implement the method for aggregating website access paths described above.
  • the memory may include a high speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory.
  • the memory can further include memory remotely located relative to the processor, which can be connected to the terminal over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the above transmission device is for receiving or transmitting data via a network.
  • Specific examples of the above network may include a wired network and a wireless network.
  • the transmission device includes a Network Interface Controller (NIC) that can be connected to other network devices and routers via a network cable to communicate with the Internet or a local area network.
  • the transmission device is a Radio Frequency (RF) module for communicating with the Internet wirelessly.
  • the memory is used to store information on preset action conditions and preset privileged users, as well as application programs.
  • the processor may call the information and application programs stored in the memory through the transmission device to execute the program code of the method steps of each optional or preferred embodiment in the method embodiments above.
  • the computer terminal can also be a smart phone (such as an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, and a mobile Internet device (MID), a PAD, and the like.
  • Embodiments of the present invention also provide a storage medium.
  • the foregoing storage medium may be used to save the program code executed by the aggregation method of the website access path provided by the foregoing method embodiment and the device embodiment.
  • the foregoing storage medium may be located in any one of the computer terminal groups in the computer network, or in any one of the mobile terminal groups.
  • the storage medium is configured to store program code for performing the following steps: acquiring the access information of a target user each time the target user accesses a target website within a first preset time period, where there is at least one target user; acquiring one or more pieces of attribute information contained in each piece of access information and storing them row by row in a relational database, where the attribute information represents the path nodes of the access path; processing the target attribute information corresponding to each target user to obtain each target user's access path, where each piece of target attribute information consists of at least one of the pieces of attribute information acquired from the corresponding piece of access information; storing each access path as one record in the relational database; and aggregating multiple records by using an aggregate function of the relational database to obtain an aggregation result.
  • the storage medium may also be configured to store program code for the various preferred or optional method steps provided by the method for aggregating website access paths.
  • the device for aggregating website access paths includes a processor and a memory; the first acquisition unit, the second acquisition unit, the processing unit, the storage unit, and the aggregation unit are stored in the memory as program units, and the processor executes the program units stored in the memory.
  • the processor contains a kernel, and the kernel retrieves the corresponding program unit from the memory.
  • the kernel can be set to one or more, and it is possible to aggregate access paths of any length by adjusting the kernel parameters.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer-readable medium, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
  • the present application also provides an embodiment of a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: acquiring the access information of a target user each time the target user accesses a target website within a first preset time period, where there is at least one target user; acquiring one or more pieces of attribute information contained in each piece of access information and storing them row by row in a relational database, where the attribute information represents the path nodes of the access path; processing the target attribute information corresponding to each target user to obtain each target user's access path, where each piece of target attribute information consists of at least one of the pieces of attribute information acquired from the corresponding piece of access information; storing each access path as one record in the relational database; and aggregating multiple records by using an aggregate function of the relational database to obtain an aggregation result.
  • the disclosed technical contents may be implemented in other manners.
  • the device embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, unit or module, and may be electrical or otherwise.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • the computer-readable storage medium includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application.
  • the foregoing storage medium includes a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

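A method and device for aggregating website access paths. The method includes: acquiring the access information of a target user each time the target user accesses a target website within a first preset time period, where there is at least one target user (S102); acquiring one or more pieces of attribute information contained in each piece of access information, and storing them row by row in a relational database (S104); processing the target attribute information corresponding to each target user to obtain each target user's access path (S106); storing each access path as one record in the relational database (S108); and aggregating multiple records by using an aggregate function of the relational database to obtain an aggregation result (S110). This solves the problem in the prior art that only access paths of limited length can be aggregated, while access paths of arbitrary length cannot.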

Description

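Method and Device for Aggregating Website Access Paths
Technical Field
The present application relates to the field of computers, and in particular to a method and device for aggregating website access paths.
Background
Generally, a website's access logs are transformed and stored in a relational database, where each record represents one visit by a user. Because a website's access paths capture a series of consecutive, purposeful actions by its users, the website can be analyzed by analyzing its access paths.
In the prior art, a user's access path on a website is obtained as follows: first, all of the user's visits to the website within a period of time are found; then each visit is analyzed in turn and its path nodes are stored column by column in a relational database; finally, the path nodes stored column by column are processed to obtain the user's access path on the website. Once each user's access path has been obtained, analyzing any single access path is easy. However, user access paths are usually massive in number, so analyzing them one by one is not only inefficient but also fails to reveal the behavior of the website's user population as a whole. The access paths therefore need to be processed so that multiple access paths that are identical under certain conditions are aggregated into one, for use in analyzing the website.
However, because of constraints such as the number of columns and the data types in a relational database, the prior art can aggregate only access paths of limited length, not access paths of arbitrary length.
No effective solution to this problem has yet been proposed.
Summary
Embodiments of the present application provide a method and device for aggregating website access paths, so as to at least solve the problem in the prior art that only access paths of limited length can be aggregated, while access paths of arbitrary length cannot.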
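According to one aspect of the embodiments of the present application, a method for aggregating website access paths is provided, including: acquiring the access information of a target user each time the target user accesses a target website within a first preset time period, where there is at least one target user; acquiring one or more pieces of attribute information contained in each piece of access information, and storing them row by row in a relational database, where the attribute information represents the path nodes of an access path; processing the target attribute information corresponding to each target user to obtain each target user's access path, where each piece of target attribute information consists of at least one of the one or more pieces of attribute information acquired from the corresponding piece of access information; storing each access path as one record in the relational database; and aggregating multiple records by using an aggregate function of the relational database to obtain an aggregation result.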
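Further, before the access information of the target user for each visit to the target website within the first preset time period is acquired, the method further includes: acquiring, from the access log of the target website, the users who accessed the target website within a second preset time period, and taking them as the target users.
Further, acquiring the one or more pieces of attribute information contained in each piece of access information and storing them row by row in the relational database includes: sorting the access information of target user Ai by the access time contained in the access information, where i takes 1 to n in turn and n is the number of target users; acquiring, in order, the identity information and the one or more pieces of attribute information contained in each piece of the sorted access information of target user Ai; and storing the identity information and the one or more pieces of attribute information acquired from each piece of access information of target user Ai row by row in the relational database.
Further, each target user corresponds to multiple pieces of target attribute information, and processing the target attribute information corresponding to each target user to obtain each target user's access path includes: concatenating every two adjacent pieces of target attribute information among the multiple pieces corresponding to target user Ai with a preset symbol, where i takes 1 to n in turn and n is the number of target users; and taking the concatenated target attribute information of target user Ai as the access path of target user Ai.
Further, concatenating every two adjacent pieces of target attribute information among the multiple pieces corresponding to target user Ai with a preset symbol includes: acquiring the multiple pieces of target attribute information corresponding to target user Ai; judging whether target attribute information Ai(j-1) and target attribute information Ai(j) of target user Ai are the same, where j takes 2 to m(i)-2 in turn and m(i) is the number of pieces of target attribute information corresponding to target user Ai; if Ai(j-1) and Ai(j) are judged to be different, connecting Ai(j-1) and Ai(j) with the preset symbol; if Ai(j-1) and Ai(j) are judged to be the same, deleting Ai(j-1) and judging whether Ai(j) and Ai(j+1) are the same; and if Ai(j) and Ai(j+1) are judged to be different, connecting Ai(j) and Ai(j+1) with the preset symbol.
Further, the attribute information includes source type, source channel, browser type, operating system type, and search engine.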
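According to another aspect of the embodiments of the present application, a device for aggregating website access paths is provided, including: a first acquisition unit, configured to acquire the access information of a target user each time the target user accesses a target website within a first preset time period, where there is at least one target user; a second acquisition unit, configured to acquire one or more pieces of attribute information contained in each piece of access information and store them row by row in a relational database, where the attribute information represents the path nodes of an access path; a processing unit, configured to process the target attribute information corresponding to each target user to obtain each target user's access path, where each piece of target attribute information consists of at least one of the one or more pieces of attribute information acquired from the corresponding piece of access information; a storage unit, configured to store each access path as one record in the relational database; and an aggregation unit, configured to aggregate multiple records by using an aggregate function of the relational database to obtain an aggregation result.
Further, the device further includes: a third acquisition unit, configured to acquire, before the access information of the target user for each visit to the target website within the first preset time period is acquired, the users who accessed the target website within a second preset time period from the access log of the target website, and take them as the target users.
Further, the second acquisition unit includes: a sorting subunit, configured to sort the access information of target user Ai by the access time contained in the access information, where i takes 1 to n in turn and n is the number of target users; an acquisition subunit, configured to acquire, in order, the identity information and the one or more pieces of attribute information contained in each piece of the sorted access information of target user Ai; and a storage subunit, configured to store the identity information and the one or more pieces of attribute information acquired from each piece of access information of target user Ai row by row in the relational database.
Further, each target user corresponds to multiple pieces of target attribute information, and the processing unit includes: a connection subunit, configured to concatenate every two adjacent pieces of target attribute information among the multiple pieces corresponding to target user Ai with a preset symbol, where i takes 1 to n in turn and n is the number of target users; and a determining subunit, configured to take the concatenated target attribute information of target user Ai as the access path of target user Ai.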
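In the embodiments of the present application, the following approach is adopted: the access information of a target user each time the target user accesses a target website within a first preset time period is acquired, where there is at least one target user; the one or more pieces of attribute information contained in each piece of access information are acquired and stored row by row in a relational database, where the attribute information represents the path nodes of an access path; the target attribute information corresponding to each target user is processed to obtain that user's access path, where each piece of target attribute information consists of at least one of the pieces of attribute information acquired from the corresponding piece of access information; each access path is stored as one record in the relational database; and multiple records are aggregated by using an aggregate function of the relational database to obtain an aggregation result. Because the attribute information acquired from each target user's access information is stored by row, each user's access path is derived from those rows, the resulting access paths are stored one per row, and the access paths are finally aggregated with the database's aggregate function, the column limit of the relational database no longer prevents long access paths from being processed, which would otherwise make users' access paths unobtainable and therefore impossible to aggregate. This solves the problem in the prior art that only access paths of limited length can be aggregated, and achieves the technical effect of aggregating access paths of arbitrary length.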
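Brief Description of the Drawings
The drawings described here are provided for a further understanding of the present application and constitute a part of it. The exemplary embodiments of the present application and their descriptions are used to explain the present application and do not constitute an improper limitation on it. In the drawings:
Fig. 1 is a flowchart of a method for aggregating website access paths according to an embodiment of the present application; and
Fig. 2 is a schematic diagram of a device for aggregating website access paths according to an embodiment of the present application.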
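Detailed Description
To enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first", "second", and the like in the specification, claims, and drawings of the present application are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such a process, method, product, or device.
According to an embodiment of the present application, a method embodiment of a method for aggregating website access paths is provided. It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order.
Fig. 1 is a flowchart of a method for aggregating website access paths according to an embodiment of the present application. As shown in Fig. 1, the method includes steps S102 to S110, as follows: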
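Step S102: acquire the access information of the target user each time the target user accesses the target website within a first preset time period, where there is at least one target user.
Specifically, the access information of the target user each time the target user accesses the target website within the first preset time period may be acquired from the access log of the target website. The first preset time period can be set as required.
Specifically, the target website may be any shopping or video website, such as JD.com, Taobao, Suning, Vipshop, or Youku. A target user yields as many pieces of access information as the number of times that user accessed the target website within the first preset time period.
For example, suppose the target website is JD.com, the first preset time period is January 1, 2015 to September 1, 2015, and there are three target users: user 01, user 02, and user 03. Then the access information of user 01, user 02, and user 03 for each of their visits to JD.com between January 1, 2015 and September 1, 2015 is acquired from JD.com's access log.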
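Step S104: acquire the one or more pieces of attribute information contained in each piece of access information, and store them row by row in the relational database, where the attribute information represents the path nodes of the access path.
Specifically, each piece of access information contains many pieces of attribute information, such as source type, source channel, browser type, operating system type, and search engine. One, several, or all of the pieces of attribute information contained in each piece of access information may be acquired.
It should be noted that, whether one or several pieces of attribute information are acquired, the types and the number of pieces of attribute information acquired from each piece of access information are the same.
For example, suppose three pieces of access information are acquired in step S102, namely A1, B1, and C1, and the attribute information to be acquired from each is the search engine and the browser type. Then access information A1 yields "baidu" and "IE", access information B1 yields "google" and "Firefox", and access information C1 yields "baidu" and "Sogou".
One or more pieces of attribute information are acquired from each piece of access information in the manner shown in the example above.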
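Step S106: process the target attribute information corresponding to each target user to obtain each target user's access path, where each piece of target attribute information consists of at least one of the one or more pieces of attribute information acquired from the corresponding piece of access information.
Specifically, a piece of target attribute information may consist of the single piece of attribute information acquired from a piece of access information, or of some or all of the multiple pieces of attribute information acquired from it. A target user has as many pieces of target attribute information as pieces of access information. For a given target user, if 5 pieces of that user's access information are acquired in step S102, the user has 5 corresponding pieces of target attribute information, and these 5 pieces are processed to obtain the user's access path; if only 1 piece of access information is acquired in step S102, the user has 1 corresponding piece of target attribute information, which is processed to obtain the user's access path.
It should be noted that the total number of pieces of target attribute information equals the total number of pieces of access information. For example, if a total of 20 pieces of access information are acquired in step S102, there are 20 pieces of target attribute information.
When one piece of attribute information is acquired from each piece of access information in step S104, each piece of target attribute information consists of that one piece; when two pieces are acquired from each piece of access information in step S104, each piece of target attribute information may consist of both pieces, or of either one of them. When a piece of target attribute information consists of multiple pieces of attribute information, adjacent pieces may be separated by a special character, for example "|". It should also be noted that all pieces of target attribute information have the same type, that is, they all contain the same types of attribute information. If one piece of target attribute information consists of the attribute "search engine", then the target attribute information of every target user consists of "search engine"; if it consists of "search engine" and "browser type", then the target attribute information of every target user consists of "search engine" and "browser type".
Continuing the example above, suppose access information A1, B1, and C1 all belong to user 01. If each piece of target attribute information consists of all the attribute information acquired from its piece of access information, then the target attribute information is baidu|IE for A1, google|Firefox for B1, and baidu|Sogou for C1. User 01 thus has three pieces of target attribute information, baidu|IE, google|Firefox, and baidu|Sogou, which are processed to obtain user 01's access path. Note that, for a given website, each user has only one access path.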
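The composition rule for target attribute information can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `compose_target_attribute` and the dictionary keys `search_engine` and `browser` are hypothetical, and each visit is assumed to be already parsed into a dict. Using one fixed `fields` list for every visit is what keeps all target attributes of the same type.

```python
from typing import Mapping, Sequence


def compose_target_attribute(visit: Mapping[str, str],
                             fields: Sequence[str],
                             sep: str = "|") -> str:
    """Join the selected attribute values of one visit into a target attribute.

    The same `fields` list must be used for every visit so that all target
    attributes have the same type.
    """
    return sep.join(visit[f] for f in fields)


# Hypothetical parsed visits A1, B1, C1 of user 01 from the example above.
visits = [
    {"search_engine": "baidu", "browser": "IE"},
    {"search_engine": "google", "browser": "Firefox"},
    {"search_engine": "baidu", "browser": "Sogou"},
]
targets = [compose_target_attribute(v, ["search_engine", "browser"]) for v in visits]
print(targets)  # ['baidu|IE', 'google|Firefox', 'baidu|Sogou']
```

Passing a one-element list such as `["search_engine"]` gives the single-attribute case, in which no separator appears.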
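Step S108: store each access path as one record in the relational database.
Step S110: aggregate the multiple records by using an aggregate function of the relational database to obtain an aggregation result; that is, the data corresponding to the multiple records (namely, the multiple access paths) are aggregated with the aggregate function of the relational database.
Specifically, the aggregate function of the relational database is applied to the information in each record, including the access path, to obtain the aggregation result.
In the embodiments of the present application, the access information of each target user is acquired; the attribute information acquired from each piece of that user's access information is stored row by row in the relational database; the attribute information corresponding to each target user is then processed to obtain that user's access path; the resulting access paths are stored one per row in the relational database; and finally the access paths are aggregated with an aggregate function of the relational database to obtain the aggregation result. This avoids the situation in which the column limit of the relational database prevents long access paths from being processed, so that users' access paths cannot be obtained and therefore cannot be aggregated. It solves the problem in the prior art that only access paths of limited length can be aggregated, and achieves the technical effect of aggregating access paths of arbitrary length.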
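Optionally, before the access information of the target user for each visit to the target website within the first preset time period is acquired, the method further includes step S101, as follows:
Step S101: acquire, from the access log of the target website, the users who accessed the target website within a second preset time period, and take them as the target users.
Specifically, the second preset time period can be set as required, for example, September 1, 2015 to September 30, 2015.
Suppose the second preset time period is September 1, 2015 to September 30, 2015 and the target website is JD.com. The users who visited JD.com between September 1, 2015 and September 30, 2015 are acquired; if there are three of them, user 01, user 02, and user 03, all three are target users.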
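Steps S101 and S102 amount to two queries over the access log. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table `access_log`, its columns, the function names, and the sample rows are all hypothetical. Dates are stored as ISO `YYYY-MM-DD` strings so that `BETWEEN`, which is inclusive, compares them correctly.

```python
import sqlite3


def target_users(conn, start, end):
    """Step S101: users who visited the site within the second preset period."""
    rows = conn.execute(
        "SELECT DISTINCT user_id FROM access_log "
        "WHERE visit_date BETWEEN ? AND ? ORDER BY user_id",
        (start, end),
    )
    return [user for (user,) in rows]


def visits_of(conn, users, start, end):
    """Step S102: every visit of the target users within the first preset period."""
    if not users:
        return []
    placeholders = ",".join("?" for _ in users)
    sql = (
        "SELECT user_id, visit_date, search_engine FROM access_log "
        f"WHERE user_id IN ({placeholders}) AND visit_date BETWEEN ? AND ? "
        "ORDER BY user_id, visit_date"
    )
    return conn.execute(sql, (*users, start, end)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (user_id TEXT, visit_date TEXT, search_engine TEXT)")
conn.executemany("INSERT INTO access_log VALUES (?, ?, ?)", [
    ("user01", "2015-01-05", "baidu"),
    ("user01", "2015-09-10", "google"),
    ("user02", "2015-03-02", "bing"),
    ("user02", "2015-09-20", "baidu"),
    ("user03", "2015-09-15", "360"),
    ("user04", "2015-05-01", "baidu"),  # no September visit, so not a target user
])

users = target_users(conn, "2015-09-01", "2015-09-30")
visits = visits_of(conn, users, "2015-01-01", "2015-09-01")
print(users)   # ['user01', 'user02', 'user03']
print(visits)  # [('user01', '2015-01-05', 'baidu'), ('user02', '2015-03-02', 'bing')]
```

The two periods are independent parameters, so a user can qualify as a target user through a September visit while only their earlier visits fall inside the first period.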
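Optionally, acquiring the one or more pieces of attribute information contained in each piece of access information and storing them row by row in the relational database includes steps S1041 to S1045, where:
Step S1041: sort the access information of target user Ai by the access time it contains, where i takes 1 to n in turn and n is the number of target users. Specifically, the sorting may be in ascending or descending order of access time.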
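For example, sorted in ascending order of access time, the access information of target user A1 is as shown in Table 1:
Table 1
Identity information | Access time | Search engine | Browser type
User A1 | 2015-1-1 | baidu | IE
User A1 | 2015-1-3 | google | IE
User A1 | 2015-2-1 | bing | Sogou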
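Sorted in ascending order of access time, the access information of target user A2 is as shown in Table 2:
Table 2
Identity information | Access time | Search engine | Browser type
User A2 | 2015-1-2 | 360 | Firefox
User A2 | 2015-1-9 | baidu | Firefox
User A2 | 2015-2-11 | baidu | UC
User A2 | 2015-3-9 | bing | UC
User A2 | 2015-4-11 | bing | UC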
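Sorted in ascending order of access time, the access information of target user A3 is as shown in Table 3:
Table 3
Identity information | Access time | Search engine | Browser type
User A3 | 2015-1-1 | baidu | IE
User A3 | 2015-7-4 | google | Firefox
User A3 | 2015-8-1 | bing | Sogou
User A3 | 2015-9-1 | bing | Sogou
Note: "…" in the tables above denotes other information contained in the access information.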
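Step S1043: from the sorted access information of target user Ai, acquire in order the identity information and the one or more pieces of attribute information contained in each piece of access information.
Step S1045: store the identity information and the one or more pieces of attribute information acquired from each piece of access information of target user Ai row by row in the relational database.
Specifically, if only one piece of attribute information (for example, the search engine) is acquired from each piece of access information, then for target user A1 the contents of Table 4 below are stored row by row in the relational database.
Table 4
Identity information | Search engine | Sequence number
User A1 | baidu | 1
User A1 | google | 2
User A1 | bing | 3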
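For target user A2, the contents of Table 5 below are stored row by row in the relational database.
Table 5
Identity information | Search engine | Sequence number
User A2 | 360 | 1
User A2 | baidu | 2
User A2 | baidu | 3
User A2 | bing | 4
User A2 | bing | 5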
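For target user A3, the contents of Table 6 below are stored row by row in the relational database.
Table 6
Identity information | Search engine | Sequence number
User A3 | baidu | 1
User A3 | google | 2
User A3 | bing | 3
User A3 | bing | 4
In the embodiments of the present application, a target user's path nodes (that is, the acquired attribute information) are stored one per row in the relational database. Because storage is by row, it is not subject to the column limit of the relational database.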
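Steps S1041 to S1045 can be sketched with Python's `sqlite3` module. The sketch sorts each user's visits by access time, numbers them, and writes one row per path node. The table `path_node`, its column names, and the function `store_nodes_by_row` are hypothetical. The sample data is target user A2 from Table 2, deliberately shuffled so that the sorting step actually has work to do.

```python
import sqlite3
from datetime import date
from itertools import groupby
from operator import itemgetter


def store_nodes_by_row(conn, visits):
    """Store one row per visit: (user_id, node, seq), with seq following access time."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS path_node (user_id TEXT, node TEXT, seq INTEGER)"
    )
    # Step S1041: sort by user, then by ascending access time.
    ordered = sorted(visits, key=itemgetter(0, 1))
    for user, group in groupby(ordered, key=itemgetter(0)):
        # Steps S1043/S1045: number the nodes and insert them one row at a time.
        for seq, (_, _, node) in enumerate(group, start=1):
            conn.execute("INSERT INTO path_node VALUES (?, ?, ?)", (user, node, seq))
    conn.commit()


# Target user A2 from Table 2 (identity, access time, search engine), shuffled.
visits = [
    ("A2", date(2015, 2, 11), "baidu"),
    ("A2", date(2015, 1, 2), "360"),
    ("A2", date(2015, 4, 11), "bing"),
    ("A2", date(2015, 1, 9), "baidu"),
    ("A2", date(2015, 3, 9), "bing"),
]

conn = sqlite3.connect(":memory:")
store_nodes_by_row(conn, visits)
rows = conn.execute(
    "SELECT user_id, node, seq FROM path_node ORDER BY user_id, seq"
).fetchall()
print(rows)
# [('A2', '360', 1), ('A2', 'baidu', 2), ('A2', 'baidu', 3), ('A2', 'bing', 4), ('A2', 'bing', 5)]
```

The result reproduces Table 5. However many visits a user has, the schema stays at three columns, and a longer path only adds rows.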
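Optionally, each target user corresponds to multiple pieces of target attribute information, and processing the target attribute information corresponding to each target user to obtain each target user's access path includes steps S1061 to S1063:
Step S1061: concatenate every two adjacent pieces of target attribute information among the multiple pieces corresponding to target user Ai with a preset symbol, where i takes 1 to n in turn and n is the number of target users.
Specifically, when there is only one target user, i equals 1; when there are multiple target users, i takes 1 to n in turn. The preset symbol can be chosen as required, for example, the symbol "→".
Step S1063: take the concatenated target attribute information of target user Ai as the access path of target user Ai.
Specifically, a custom string aggregate function can be implemented programmatically, using the extension facilities provided by the relational database, to process the target attribute information corresponding to each target user.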
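One possible realization of such a custom string aggregate uses Python's `sqlite3` module, which can register a user-defined aggregate with `Connection.create_aggregate`. In the sketch below, the aggregate name `path_concat`, the class `PathConcat`, and the `path_node` table are hypothetical. SQLite does not guarantee the order in which rows reach an aggregate, so the class collects `(seq, node)` pairs and sorts them itself before joining them with "→". The aggregate only concatenates nodes; collapsing repeated adjacent nodes (steps S1 to S9) is a separate step.

```python
import sqlite3


class PathConcat:
    """User-defined aggregate: joins path nodes in seq order with an arrow."""

    def __init__(self):
        self.items = []

    def step(self, seq, node):
        self.items.append((seq, node))

    def finalize(self):
        # Sort by sequence number, since SQLite does not fix the feed order.
        return "→".join(node for _, node in sorted(self.items))


conn = sqlite3.connect(":memory:")
conn.create_aggregate("path_concat", 2, PathConcat)
conn.execute("CREATE TABLE path_node (user_id TEXT, node TEXT, seq INTEGER)")
conn.executemany("INSERT INTO path_node VALUES (?, ?, ?)", [
    ("A1", "bing", 3), ("A1", "baidu", 1), ("A1", "google", 2),
    ("A3", "baidu", 1), ("A3", "google", 2), ("A3", "bing", 3), ("A3", "bing", 4),
])
paths = conn.execute(
    "SELECT user_id, path_concat(seq, node) FROM path_node "
    "GROUP BY user_id ORDER BY user_id"
).fetchall()
print(paths)  # [('A1', 'baidu→google→bing'), ('A3', 'baidu→google→bing→bing')]
```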
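Optionally, concatenating every two adjacent pieces of target attribute information among the multiple pieces corresponding to target user Ai with a preset symbol includes steps S1 to S9, as follows:
Step S1: acquire the multiple pieces of target attribute information corresponding to target user Ai.
Step S3: judge whether target attribute information Ai(j-1) and target attribute information Ai(j) of target user Ai are the same, where j takes 2 to m(i)-2 in turn and m(i) is the number of pieces of target attribute information corresponding to target user Ai.
Step S5: if Ai(j-1) and Ai(j) are judged to be different, connect Ai(j-1) and Ai(j) with the preset symbol.
Step S7: if Ai(j-1) and Ai(j) are judged to be the same, delete Ai(j-1) and judge whether Ai(j) and Ai(j+1) are the same.
Step S9: if Ai(j) and Ai(j+1) are judged to be different, connect Ai(j) and Ai(j+1) with the preset symbol.
It should be noted that deleting a piece of target attribute information of a target user does not delete it from the relational database (for example, from the data in Table 4, Table 5, and so on); it is deleted only from the acquired set of target attribute information of that target user.
Specifically, steps S1 to S9 can be repeated for the multiple pieces of target attribute information of each target user to obtain each target user's access path. In addition, when the access path of a target user is obtained through steps S1 to S7, the number of consecutive occurrences of each path node in the access path can also be calculated.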
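Suppose each piece of target attribute information consists of one piece of attribute information (for example, the search engine) and the preset symbol is →. Then, for target users A1, A2, and A3 in the embodiment above, the access paths shown in Table 7 below are obtained.
Table 7
[Table 7 is reproduced only as an image in the original publication.]
It should be noted that the length of each access path is the sum of its node consecutive-occurrence counts, where a node's consecutive-occurrence count is the number of times that path node appears consecutively in the access path. Take the user whose identity information is user A2 (target user A2 in the embodiment above) as an example: Table 5 shows that this user reached JD.com once through the search engine 360, then twice in a row through baidu, and then twice in a row through bing. According to steps S1 to S7, the access path of target user A2 is 360→baidu→bing, and the consecutive-occurrence counts of its path nodes are 1, 2, and 2 (a path length of 1 + 2 + 2 = 5). In the relational database, the counts are stored separated by "|".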
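The effect of steps S1 to S9, together with the consecutive-occurrence counts and the path length, can be sketched as a single pass that merges runs of identical adjacent nodes. This is an illustrative reformulation, not the patent's literal loop. It compares every adjacent pair, including those involving the first and last nodes, and the function name `build_access_path` is hypothetical. The input is one user's nodes already sorted by access time, as in Tables 4 to 6.

```python
from typing import List, Sequence, Tuple


def build_access_path(nodes: Sequence[str], arrow: str = "→",
                      sep: str = "|") -> Tuple[str, str, int]:
    """Collapse consecutive duplicates; return (path, run counts, path length)."""
    runs: List[List] = []  # each entry: [node, consecutive count]
    for node in nodes:
        if runs and runs[-1][0] == node:
            runs[-1][1] += 1  # same as the previous node: extend the current run
        else:
            runs.append([node, 1])  # new node: start a new run
    path = arrow.join(node for node, _ in runs)
    counts = sep.join(str(count) for _, count in runs)
    length = sum(count for _, count in runs)  # path length = sum of run counts
    return path, counts, length


# Search-engine nodes of target users A1, A2, A3 from Tables 4 to 6.
print(build_access_path(["baidu", "google", "bing"]))
# ('baidu→google→bing', '1|1|1', 3)
print(build_access_path(["360", "baidu", "baidu", "bing", "bing"]))
# ('360→baidu→bing', '1|2|2', 5)
print(build_access_path(["baidu", "google", "bing", "bing"]))
# ('baidu→google→bing', '1|1|2', 4)
```

The A2 result matches the worked example in the text: the path 360→baidu→bing with counts 1, 2, 2 and length 5.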
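Specifically, the path nodes can also be fed into a program, grouped by access path sequence number, and each path node and its associated information concatenated in order of sequence number, for example Baidu→google→bing→…; an aggregate function is then used to concatenate the consecutive-occurrence counts of the path nodes, separated by "|". By analogy, other data about each path node can be acquired from the access information and concatenated in the same way. This preserves the original path-node information that needs to be analyzed, such as the dwell time at each path node, the interval between two path nodes, and the total time taken by each access path.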
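The resulting access paths are aggregated with an aggregate function built into the relational database. Taking Table 7 as an example, aggregating the access paths and node occurrence counts in Table 7 yields the aggregation result shown in Table 8:
Table 8
[Table 8 is reproduced only as an image in the original publication.]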
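Once each access path is a single row, step S110 needs only the database's built-in aggregates. The sketch below groups the paths derived from Tables 4 to 6 using SQLite's `GROUP BY`, `COUNT`, and `SUM`. The table `access_path` and its columns are hypothetical. The output illustrates the idea and is not a reproduction of Table 8, whose contents survive only as an image.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One record per user: the access path, its run counts, and its path length.
conn.execute(
    "CREATE TABLE access_path "
    "(user_id TEXT, path TEXT, node_counts TEXT, path_length INTEGER)"
)
conn.executemany("INSERT INTO access_path VALUES (?, ?, ?, ?)", [
    ("A1", "baidu→google→bing", "1|1|1", 3),
    ("A2", "360→baidu→bing", "1|2|2", 5),
    ("A3", "baidu→google→bing", "1|1|2", 4),
])

# Identical paths collapse into one row, however long the path string is.
result = conn.execute(
    "SELECT path, COUNT(*) AS users, SUM(path_length) AS total_length "
    "FROM access_path GROUP BY path ORDER BY path"
).fetchall()
print(result)  # [('360→baidu→bing', 1, 5), ('baidu→google→bing', 2, 7)]
```

Because `path` is an ordinary `TEXT` column, a path with hundreds of nodes is grouped the same way as a short one. SQLite imposes no declared length on `TEXT` values beyond its build-wide maximum string length.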
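As can be seen from the above, the solution provided by the present application can not only aggregate access paths of arbitrary length but also retain (or store) data related to the path nodes.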
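According to an embodiment of the present application, a device for aggregating website access paths is further provided. The device is used to execute the method for aggregating website access paths provided above, and is described in detail below:
Fig. 2 is a schematic diagram of a device for aggregating website access paths according to an embodiment of the present application. As shown in Fig. 2, the aggregation device mainly includes a first acquisition unit 21, a second acquisition unit 23, a processing unit 25, a storage unit 27, and an aggregation unit 29, where:
The first acquisition unit 21 is configured to acquire the access information of the target user each time the target user accesses the target website within a first preset time period, where there is at least one target user.
Specifically, the access information of the target user each time the target user accesses the target website within the first preset time period may be acquired from the access log of the target website. The first preset time period can be set as required.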
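The second acquisition unit 23 is configured to acquire the one or more pieces of attribute information contained in each piece of access information and store them row by row in the relational database, where the attribute information represents the path nodes of the access path.
Specifically, each piece of access information contains many pieces of attribute information, such as source type, source channel, browser type, operating system type, and search engine. One, some, or all of the pieces of attribute information contained in each piece of access information may be acquired.
It should be noted that, whether one or several pieces of attribute information are acquired, the types and the number of pieces of attribute information acquired from each piece of access information are the same.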
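The processing unit 25 is configured to process the target attribute information corresponding to each target user to obtain each target user's access path, where each piece of target attribute information consists of at least one of the one or more pieces of attribute information acquired from the corresponding piece of access information.
Specifically, a piece of target attribute information may consist of the single piece of attribute information acquired from a piece of access information, or of some or all of the multiple pieces of attribute information acquired from it. A target user has as many pieces of target attribute information as pieces of access information. For a given target user, if the first acquisition unit 21 acquires 5 pieces of that user's access information, the user has 5 corresponding pieces of target attribute information, and these 5 pieces are processed to obtain the user's access path; if the first acquisition unit 21 acquires only 1 piece of access information, the user has 1 corresponding piece of target attribute information, which is processed to obtain the user's access path.
It should be noted that the total number of pieces of target attribute information equals the total number of pieces of access information. For example, if the first acquisition unit 21 acquires a total of 20 pieces of access information, there are 20 pieces of target attribute information.
When the second acquisition unit 23 acquires one piece of attribute information from each piece of access information, each piece of target attribute information consists of that one piece; when the second acquisition unit 23 acquires two pieces from each piece of access information, each piece of target attribute information may consist of both pieces, or of either one of them. It should be noted that, when a piece of target attribute information consists of multiple pieces of attribute information, adjacent pieces may be separated by a special character, for example "|". It should also be noted that all pieces of target attribute information have the same type, that is, they all contain the same types of attribute information. If one piece of target attribute information consists of the attribute "search engine", then the target attribute information of every target user consists of "search engine"; if it consists of "search engine" and "browser type", then the target attribute information of every target user consists of "search engine" and "browser type".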
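The storage unit 27 is configured to store each access path as one record in the relational database.
The aggregation unit 29 is configured to aggregate multiple records by using an aggregate function of the relational database to obtain an aggregation result; that is, the data corresponding to the multiple records (namely, the multiple access paths) are aggregated with the aggregate function of the relational database.
Specifically, the aggregate function of the relational database is applied to the information in each record, including the access path, to obtain the aggregation result.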
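It should be noted here that the first acquisition unit 21, the second acquisition unit 23, the processing unit 25, the storage unit 27, and the aggregation unit 29 may run in a computer terminal as part of the device, and the functions of these modules may be executed by a processor in the computer terminal. The computer terminal may also be a terminal device such as a smartphone (for example, an Android or iOS phone), a tablet computer, a handheld computer, a mobile Internet device (MID), or a PAD.
In the embodiments of the present application, the access information of each target user is acquired; the attribute information acquired from each piece of that user's access information is stored row by row in the relational database; the attribute information corresponding to each target user is then processed to obtain that user's access path; the resulting access paths are stored one per row in the relational database; and finally the access paths are aggregated with an aggregate function of the relational database to obtain the aggregation result. This avoids the situation in which the column limit of the relational database prevents long access paths from being processed, so that users' access paths cannot be obtained and therefore cannot be aggregated. It solves the problem in the prior art that only access paths of limited length can be aggregated, and achieves the technical effect of aggregating access paths of arbitrary length.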
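Optionally, in this embodiment of the present application, the device further includes a third acquisition unit, configured to acquire, before the access information of the target user for each visit to the target website within the first preset time period is acquired, the users who accessed the target website within a second preset time period from the access log of the target website, and take them as the target users.
It should be noted here that the third acquisition unit may run in a computer terminal as part of the device, and its function may be executed by a processor in the computer terminal. The computer terminal may also be a terminal device such as a smartphone (for example, an Android or iOS phone), a tablet computer, a handheld computer, a mobile Internet device (MID), or a PAD.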
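Optionally, in this embodiment of the present application, the second acquisition unit 23 includes a sorting subunit, an acquisition subunit, and a storage subunit. The sorting subunit is configured to sort the access information of target user Ai by the access time contained in the access information, where i takes 1 to n in turn and n is the number of target users; the acquisition subunit is configured to acquire, in order, the identity information and the one or more pieces of attribute information contained in each piece of the sorted access information of target user Ai; and the storage subunit is configured to store the identity information and the one or more pieces of attribute information acquired from each piece of access information of target user Ai row by row in the relational database.
Specifically, the sorting may be in ascending or descending order of access time.
In the embodiments of the present application, a target user's path nodes (that is, the acquired attribute information) are stored one per row in the relational database. Because storage is by row, it is not subject to the column limit of the relational database.
It should be noted here that the sorting subunit, the acquisition subunit, and the storage subunit may run in a computer terminal as part of the device, and their functions may be executed by a processor in the computer terminal. The computer terminal may also be a terminal device such as a smartphone (for example, an Android or iOS phone), a tablet computer, a handheld computer, a mobile Internet device (MID), or a PAD.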
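Optionally, each target user corresponds to multiple pieces of target attribute information, and the processing unit 25 includes a connection subunit and a determining subunit, where:
The connection subunit is configured to concatenate every two adjacent pieces of target attribute information among the multiple pieces corresponding to target user Ai with a preset symbol, where i takes 1 to n in turn and n is the number of target users. Specifically, when there is only one target user, i equals 1; when there are multiple target users, i takes 1 to n in turn. The preset symbol can be chosen as required, for example, the symbol "→".
The determining subunit is configured to take the concatenated target attribute information of target user Ai as the access path of target user Ai.
It should be noted here that the connection subunit and the determining subunit may run in a computer terminal as part of the device, and their functions may be executed by a processor in the computer terminal. The computer terminal may also be a terminal device such as a smartphone (for example, an Android or iOS phone), a tablet computer, a handheld computer, a mobile Internet device (MID), or a PAD.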
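Optionally, in this embodiment of the present application, the connection subunit includes an acquisition module, a first judging module, a first connection module, a second judging module, and a second connection module. The acquisition module is configured to acquire the multiple pieces of target attribute information corresponding to target user Ai; the first judging module is configured to judge whether target attribute information Ai(j-1) and target attribute information Ai(j) of target user Ai are the same, where j takes 2 to m(i)-2 in turn and m(i) is the number of pieces of target attribute information corresponding to target user Ai; the first connection module is configured to connect Ai(j-1) and Ai(j) with the preset symbol when they are judged to be different; the second judging module is configured to delete Ai(j-1) when Ai(j-1) and Ai(j) are judged to be the same, and then to judge whether Ai(j) and Ai(j+1) are the same; and the second connection module is configured to connect Ai(j) and Ai(j+1) with the preset symbol when they are judged to be different.
It should be noted that deleting a piece of target attribute information of a target user does not delete it from the relational database (for example, from the data in Table 4, Table 5, and so on above); it is deleted only from the acquired set of target attribute information of that target user.
Specifically, the acquisition module, the first judging module, the first connection module, the second judging module, and the second connection module can be invoked repeatedly for the multiple pieces of target attribute information of each target user to obtain each target user's access path. In addition, when the access path of a target user is obtained by invoking these modules, the number of consecutive occurrences of each path node in the access path can also be calculated.
It should be noted here that the acquisition module, the first judging module, the first connection module, the second judging module, and the second connection module may run in a computer terminal as part of the device, and their functions may be executed by a processor in the computer terminal. The computer terminal may also be a terminal device such as a smartphone (for example, an Android or iOS phone), a tablet computer, a handheld computer, a mobile Internet device (MID), or a PAD.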
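The functional units provided in the embodiments of the present application may run in a mobile terminal, a computer terminal, or a similar computing device, and may also be stored as part of a storage medium.
Accordingly, an embodiment of the present invention may provide a computer terminal, which may be any computer terminal device in a group of computer terminals. Optionally, in this embodiment, the computer terminal may be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one of multiple network devices in a computer network.
In this embodiment, the computer terminal may execute program code for the following steps of the method for aggregating website access paths: acquiring the access information for each visit that the target users make to a target website within a first preset time period, where there is at least one target user; acquiring the one or more pieces of attribute information contained in each piece of access information and storing them row by row in a relational database, where the attribute information represents the path nodes of an access path; processing the target attribute information corresponding to each target user to obtain that user's access path, where each piece of target attribute information consists of at least one of the pieces of attribute information acquired from a piece of access information; storing each access path as a record in the relational database; and aggregating the records using an aggregate function of the relational database to obtain an aggregation result.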
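The last two steps, storing each access path as one record and aggregating with a database aggregate function, can be sketched with SQLite. SQLite, the table `access_path`, its columns, and the choice of `COUNT(*) ... GROUP BY` are assumptions made for illustration. The text says only that "an aggregate function of the relational database" is used.

```python
import sqlite3


def aggregate_paths(paths_by_user):
    """Store one record per access path and count users per distinct path."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE access_path (user_id TEXT, path TEXT)")
        # Each user's whole path is a single string value in a single row.
        conn.executemany(
            "INSERT INTO access_path VALUES (?, ?)", paths_by_user.items()
        )
        # COUNT(*) with GROUP BY is the aggregate step; ties are broken by path.
        return conn.execute(
            "SELECT path, COUNT(*) AS users FROM access_path "
            "GROUP BY path ORDER BY users DESC, path"
        ).fetchall()
    finally:
        conn.close()
```

For example, `aggregate_paths({"u1": "direct→search", "u2": "ad", "u3": "direct→search"})` returns `[("direct→search", 2), ("ad", 1)]`. Because a path of any length is a single string value, the same `GROUP BY` query works however many nodes a path has.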
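Optionally, the computer terminal may include one or more processors, a memory, and a transmission device.
The memory can store software programs and modules, such as the program instructions and modules corresponding to the method and apparatus for aggregating website access paths in the embodiments of the present invention. By running these software programs and modules, the processor performs various functional applications and data processing, that is, it implements the method for aggregating website access paths described above. The memory may include high-speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor and connected to the terminal over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device receives or sends data over a network. Specific examples of the network include wired and wireless networks. In one example, the transmission device includes a network interface controller (NIC), which connects to other network devices and a router through a network cable so as to communicate with the Internet or a local area network. In another example, the transmission device is a radio frequency (RF) module, which communicates with the Internet wirelessly.
Specifically, the memory stores information on preset action conditions and preset authorized users, as well as application programs.
Through the transmission device, the processor can invoke the information and application programs stored in the memory to execute the program code of the method steps of each optional or preferred embodiment in the method embodiments above.
A person of ordinary skill in the art will understand that the computer terminal may also be a terminal device such as a smartphone (for example, an Android or iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD.
A person of ordinary skill in the art will understand that all or some of the steps of the methods in the embodiments above can be completed by a program instructing the relevant hardware of a terminal device. The program may be stored in a computer-readable storage medium, which may include a flash drive, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or the like.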
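An embodiment of the present invention further provides a storage medium. Optionally, in this embodiment, the storage medium may store the program code executed by the method for aggregating website access paths provided by the method and apparatus embodiments above.
Optionally, in this embodiment, the storage medium may be located in any computer terminal in a group of computer terminals in a computer network, or in any mobile terminal in a group of mobile terminals.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring the access information for each visit that the target users make to a target website within a first preset time period, where there is at least one target user; acquiring the one or more pieces of attribute information contained in each piece of access information and storing them row by row in a relational database, where the attribute information represents the path nodes of an access path; processing the target attribute information corresponding to each target user to obtain that user's access path, where each piece of target attribute information consists of at least one of the pieces of attribute information acquired from a piece of access information; storing each access path as a record in the relational database; and aggregating the records using an aggregate function of the relational database to obtain an aggregation result.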
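Each piece of target attribute information is described as consisting of at least one of the attribute fields extracted from a piece of access information. The sketch below shows one way to compose such a value. The field names follow the attribute list in claims 6 and 12 (source type, source channel, browser type, operating system type, search engine). The `AccessInfo` class, the `target_attribute` function, and the `|` joiner are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AccessInfo:
    # Field names follow the attribute list in claims 6 and 12; all-string
    # types are an assumption.
    source_type: str
    source_channel: str
    browser_type: str
    os_type: str
    search_engine: str


def target_attribute(info, fields, joiner="|"):
    """Compose one target attribute from the chosen subset of attribute fields.

    `fields` must name at least one AccessInfo field; the values are joined in
    the order given.
    """
    if not fields:
        raise ValueError("a target attribute needs at least one field")
    values = asdict(info)
    return joiner.join(values[name] for name in fields)
```

Choosing a single field (for example only `source_type`) produces paths over source types. Choosing several fields (for example `source_type` and `browser_type`) produces finer-grained nodes such as `"search|Chrome"`.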
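Optionally, in this embodiment, the storage medium may also be configured to store program code for the various preferred or optional method steps provided by the method for aggregating website access paths.
The method and apparatus for aggregating website access paths according to the present invention have been described above by way of example with reference to the accompanying drawings. However, those skilled in the art should understand that various improvements can be made to this method and apparatus without departing from the content of the present invention. Therefore, the scope of protection of the present invention shall be determined by the appended claims.
The apparatus for aggregating website access paths includes a processor and a memory. The first acquisition unit, second acquisition unit, processing unit, storage unit, aggregation unit, and so on are all stored in the memory as program units, and the processor executes these program units.
The processor contains a kernel, which retrieves the corresponding program units from the memory. One or more kernels may be provided, and adjusting the kernel parameters makes it possible to aggregate access paths of arbitrary length.
The memory may include non-persistent memory among computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.
The present application further provides an embodiment of a computer program product. When executed on a data processing device, it is adapted to execute program code initialized with the following method steps: acquiring the access information for each visit that the target users make to a target website within a first preset time period, where there is at least one target user; acquiring the one or more pieces of attribute information contained in each piece of access information and storing them row by row in a relational database, where the attribute information represents the path nodes of an access path; processing the target attribute information corresponding to each target user to obtain that user's access path, where each piece of target attribute information consists of at least one of the pieces of attribute information acquired from a piece of access information; storing each access path as a record in the relational database; and aggregating the records using an aggregate function of the relational database to obtain an aggregation result.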
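The serial numbers of the embodiments above are for description only and do not indicate the relative merits of the embodiments.
Each of the embodiments above emphasizes different aspects. For any part not described in detail in one embodiment, refer to the relevant descriptions in the other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units may be only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, units, or modules, and may be electrical or take other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, may each exist physically on their own, or two or more of them may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The storage media mentioned above include any medium that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present application. A person of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present application, and these improvements and refinements shall also fall within the scope of protection of the present application.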

Claims (12)

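  1. A method for aggregating website access paths, characterized by comprising:
    acquiring access information for each visit that a target user makes to a target website within a first preset time period, wherein there is at least one target user;
    acquiring one or more pieces of attribute information contained in each piece of the access information, and storing the one or more pieces of attribute information contained in each piece of the access information row by row in a relational database, wherein the attribute information is used to represent path nodes of an access path;
    processing target attribute information corresponding to each target user to obtain an access path of each target user, wherein each piece of the target attribute information consists of at least one of the one or more pieces of attribute information acquired from each piece of the access information;
    storing each access path as a record in the relational database; and
    aggregating multiple records by using an aggregate function in the relational database to obtain an aggregation result.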
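  2. The method according to claim 1, characterized in that, before acquiring the access information for each visit that the target user makes to the target website within the first preset time period, the method further comprises:
    acquiring, from an access log of the target website, users who visited the target website within a second preset time period as the target users.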
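  3. The method according to claim 1, characterized in that acquiring one or more pieces of attribute information contained in each piece of the access information, and storing the one or more pieces of attribute information contained in each piece of the access information row by row in a relational database, comprises:
    sorting access information of a target user Ai according to an access time contained in the access information, wherein i takes the values 1 through n in turn, and n is the number of target users;
    acquiring, in order from the sorted access information of the target user Ai, identity information and one or more pieces of attribute information contained in each piece of the access information;
    storing the identity information and the one or more pieces of attribute information acquired from each piece of the access information of the target user Ai in the relational database, one row per piece of access information.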
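  4. The method according to claim 1, characterized in that each target user corresponds to multiple pieces of the target attribute information, wherein processing the target attribute information corresponding to each target user to obtain the access path of each target user comprises:
    concatenating in series, by means of a preset symbol, any two adjacent pieces of the target attribute information among multiple pieces of target attribute information corresponding to a target user Ai, wherein i takes the values 1 through n in turn, and n is the number of target users;
    taking the concatenated target attribute information of the target user Ai as the access path of the target user Ai.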
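  5. The method according to claim 4, characterized in that concatenating in series, by means of a preset symbol, any two adjacent pieces of the target attribute information among multiple pieces of target attribute information corresponding to a target user Ai comprises:
    acquiring the multiple pieces of target attribute information corresponding to the target user Ai;
    judging whether target attribute information Ai(j-1) and target attribute information Ai(j) corresponding to the target user Ai are identical, wherein j takes the values 2 through m(i)-2 in turn, and m(i) is the number of pieces of target attribute information corresponding to the target user Ai;
    when it is judged that the target attribute information Ai(j-1) and the target attribute information Ai(j) are different, concatenating the target attribute information Ai(j-1) and the target attribute information Ai(j) by means of the preset symbol;
    when it is judged that the target attribute information Ai(j-1) and the target attribute information Ai(j) are identical, deleting the target attribute information Ai(j-1), and judging whether the target attribute information Ai(j) and target attribute information Ai(j+1) are identical;
    when it is judged that the target attribute information Ai(j) and the target attribute information Ai(j+1) are different, concatenating the target attribute information Ai(j) and the target attribute information Ai(j+1) by means of the preset symbol.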
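  6. The method according to claim 1, characterized in that the attribute information comprises source type, source channel, browser type, operating system type, and search engine.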
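  7. An apparatus for aggregating website access paths, characterized by comprising:
    a first acquisition unit, configured to acquire access information for each visit that a target user makes to a target website within a first preset time period, wherein there is at least one target user;
    a second acquisition unit, configured to acquire one or more pieces of attribute information contained in each piece of the access information, and to store the one or more pieces of attribute information contained in each piece of the access information row by row in a relational database, wherein the attribute information is used to represent path nodes of an access path;
    a processing unit, configured to process target attribute information corresponding to each target user to obtain an access path of each target user, wherein each piece of the target attribute information consists of at least one of the one or more pieces of attribute information acquired from each piece of the access information;
    a storage unit, configured to store each access path as a record in the relational database; and
    an aggregation unit, configured to aggregate multiple records by using an aggregate function in the relational database to obtain an aggregation result.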
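  8. The apparatus according to claim 7, characterized in that the apparatus further comprises:
    a third acquisition unit, configured to acquire, before the access information for each visit that the target user makes to the target website within the first preset time period is acquired, users who visited the target website within a second preset time period from an access log of the target website, as the target users.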
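  9. The apparatus according to claim 7, characterized in that the second acquisition unit comprises:
    a sorting subunit, configured to sort access information of a target user Ai according to an access time contained in the access information, wherein i takes the values 1 through n in turn, and n is the number of target users;
    an acquisition subunit, configured to acquire, in order from the sorted access information of the target user Ai, identity information and one or more pieces of attribute information contained in each piece of the access information;
    a storage subunit, configured to store the identity information and the one or more pieces of attribute information acquired from each piece of the access information of the target user Ai in the relational database, one row per piece of access information.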
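  10. The apparatus according to claim 7, characterized in that each target user corresponds to multiple pieces of the target attribute information, wherein the processing unit comprises:
    a concatenation subunit, configured to concatenate in series, by means of a preset symbol, any two adjacent pieces of the target attribute information among multiple pieces of target attribute information corresponding to a target user Ai, wherein i takes the values 1 through n in turn, and n is the number of target users;
    a determination subunit, configured to take the concatenated target attribute information of the target user Ai as the access path of the target user Ai.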
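  11. The apparatus according to claim 10, characterized in that the concatenation subunit comprises:
    an acquisition module, configured to acquire the multiple pieces of target attribute information corresponding to the target user Ai;
    a first judgment module, configured to judge whether target attribute information Ai(j-1) and target attribute information Ai(j) corresponding to the target user Ai are identical, wherein j takes the values 2 through m(i)-2 in turn, and m(i) is the number of pieces of target attribute information corresponding to the target user Ai;
    a first concatenation module, configured to concatenate the target attribute information Ai(j-1) and the target attribute information Ai(j) by means of the preset symbol when it is judged that the target attribute information Ai(j-1) and the target attribute information Ai(j) are different;
    a second judgment module, configured to delete the target attribute information Ai(j-1) when it is judged that the target attribute information Ai(j-1) and the target attribute information Ai(j) are identical, and to judge whether the target attribute information Ai(j) and target attribute information Ai(j+1) are identical;
    a second concatenation module, configured to concatenate the target attribute information Ai(j) and the target attribute information Ai(j+1) by means of the preset symbol when it is judged that the target attribute information Ai(j) and the target attribute information Ai(j+1) are different.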
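  12. The apparatus according to claim 7, characterized in that the attribute information comprises source type, source channel, browser type, operating system type, and search engine.
PCT/CN2016/105206 2015-11-12 2016-11-09 Method and apparatus for aggregating website access paths WO2017080454A1 (zh)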

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510771917.6 2015-11-12
CN201510771917.6A CN106708841B (zh) 2015-11-12 2015-11-12 Method and apparatus for aggregating website access paths

Publications (1)

Publication Number Publication Date
WO2017080454A1 true WO2017080454A1 (zh) 2017-05-18

Family

ID=58694487

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/105206 WO2017080454A1 (zh) 2015-11-12 2016-11-09 Method and apparatus for aggregating website access paths

Country Status (2)

Country Link
CN (1) CN106708841B (zh)
WO (1) WO2017080454A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177143A (zh) * 2021-03-31 2021-07-27 东软集团股份有限公司 Time-series data access method and apparatus, storage medium, and electronic device
CN113327146A (zh) * 2020-02-28 2021-08-31 北京沃东天骏信息技术有限公司 Information tracking method and apparatus

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
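CN107943679B (zh) * 2017-11-24 2021-02-26 阿里巴巴(中国)有限公司 Method, apparatus, and server for generating a path funnel
CN108108495A (zh) * 2018-01-19 2018-06-01 厦门欣旅通科技有限公司 Method and apparatus for identifying user access trajectories
CN110969472B (zh) * 2018-09-30 2023-07-04 北京国双科技有限公司 Method and apparatus for processing access behavior
CN111310061B (zh) * 2018-11-27 2023-12-15 百度在线网络技术(北京)有限公司 Full-link multi-channel attribution method and apparatus, server, and storage medium
CN111368146A (zh) * 2018-12-26 2020-07-03 北京国双科技有限公司 Path information query method and apparatus, storage medium, and processor
CN114970762A (zh) * 2022-06-22 2022-08-30 阿维塔科技(重庆)有限公司 Data processing method, apparatus, device, and computer storage medium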

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030231216A1 (en) * 2002-06-13 2003-12-18 International Business Machines Corp. Internet navigation tree with bookmarking and emailing capability
CN103605848A (zh) * 2013-11-19 2014-02-26 北京国双科技有限公司 Path analysis method and apparatus
CN103823883A (zh) * 2014-03-06 2014-05-28 焦点科技股份有限公司 Method and system for analyzing website user access paths
US20140244571A1 (en) * 2007-02-10 2014-08-28 Christopher Reid Error Bridge event analytics tools and techniques
CN104504136A (zh) * 2014-12-31 2015-04-08 北京国双科技有限公司 Method and apparatus for analyzing access paths of a website
CN104731807A (zh) * 2013-12-20 2015-06-24 北京风行在线技术有限公司 Method and apparatus for counting and analyzing page jump data

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
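CN113327146A (zh) * 2020-02-28 2021-08-31 北京沃东天骏信息技术有限公司 Information tracking method and apparatus
CN113177143A (zh) * 2021-03-31 2021-07-27 东软集团股份有限公司 Time-series data access method and apparatus, storage medium, and electronic device
CN113177143B (zh) * 2021-03-31 2023-10-27 东软集团股份有限公司 Time-series data access method and apparatus, storage medium, and electronic device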

Also Published As

Publication number Publication date
CN106708841A (zh) 2017-05-24
CN106708841B (zh) 2018-09-18

Similar Documents

Publication Publication Date Title
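WO2017080454A1 (zh) Method and apparatus for aggregating website access paths
US11710054B2 (en) Information recommendation method, apparatus, and server based on user data in an online forum
WO2016107523A1 (zh) Method and apparatus for analyzing access paths of a website
US20170322981A1 (en) Method and device for social platform-based data mining
CN107862022B (zh) Cultural resource recommendation system
CN104899220B (zh) Application program recommendation method and system
WO2015085948A1 (en) Method, device, and server for friend recommendation
JP7029003B2 (ja) Method and apparatus for setting password protection questions
CN110362727A (zh) Third-party search applications for a search system
CN108304410A (zh) Method and apparatus for detecting abnormally accessed pages, and data analysis method
WO2016078533A1 (zh) Search method, apparatus, device, and non-volatile computer storage medium
CN110046293B (zh) User identity association method and apparatus
WO2017101652A1 (zh) Method and apparatus for determining access paths between website pages
JP2018528517A (ja) Method, apparatus, and system for detecting fraudulent software promotion
CN108366012B (zh) Social relationship establishment method and apparatus, and electronic device
US20170004217A1 (en) Method and apparatus for deriving and using trustful application metadata
US20120166412A1 (en) Super-clustering for efficient information extraction
CN110222790B (zh) User identity recognition method and apparatus, and server
CN106933916B (zh) Method and apparatus for processing JSON strings
KR102427782B1 (ko) Adjacency-matrix-based apparatus for detecting and classifying malicious code, and method for detecting and classifying malicious code
US9336316B2 (en) Image URL-based junk detection
CN106933903B (zh) Storage method and apparatus applied to distributed storage
CN109981712B (zh) Method and apparatus for pushing information
WO2017075974A1 (zh) Input sequence processing method, apparatus, device, and non-volatile computer storage medium
CN110442616B (zh) Page access path analysis method and system for large data volumes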

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 16863630

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 16863630

Country of ref document: EP

Kind code of ref document: A1