WO2013044559A1 - Method and system for recommending website and network server - Google Patents

Method and system for recommending website and network server

Publication number
WO2013044559A1
WO2013044559A1 PCT/CN2011/083678 CN2011083678W WO2013044559A1 WO 2013044559 A1 WO2013044559 A1 WO 2013044559A1 CN 2011083678 W CN2011083678 W CN 2011083678W WO 2013044559 A1 WO2013044559 A1 WO 2013044559A1
Authority
WO
WIPO (PCT)
Prior art keywords
website
user
feature information
address
cluster
Prior art date
Application number
PCT/CN2011/083678
Other languages
French (fr)
Chinese (zh)
Inventor
吴军
王欣
金键
Original Assignee
中国科学院计算机网络信息中心
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院计算机网络信息中心
Publication of WO2013044559A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/535 Tracking the activity of the user

Definitions

  • With the development of electronic information technology, the Internet has changed people's lifestyles. For example, people can use it to find books, movies, music, and even products they are interested in. The Internet has thus made life more efficient and convenient, and people have become accustomed to using computers, mobile phones, and other Internet-enabled devices to study, find entertainment, and shop by browsing the web pages that interest them.
  • People can use the network to obtain rich information for study and entertainment more efficiently. Specifically, the network server recommends related websites of the same type to the user according to the type of website the user visits. For example, if the user visits a website of the information technology type, the network server recommends other websites of that type for the user's reference. The network server stores the types of website that the user frequently visits and recommends related websites, so that the user can obtain more information of interest.
  • However, a network server in the prior art recommends related websites only according to the type of website the user visits, so the information the user obtains is limited.
  • Summary of the invention
  • In view of the above deficiencies of the prior art, embodiments of the present invention provide a website recommendation method and system, and a network server.
  • An embodiment of the present invention provides a website recommendation method, including:
  • the network server obtains, in each of a plurality of preset time periods, the feature information corresponding to the websites accessed by users according to locally stored Internet access information;
  • the network server performs cluster analysis on the websites according to the feature information to obtain a plurality of website clusters, so that, when receiving a network access request that is sent by a user terminal and includes a web address, it determines whether the websites include a first website corresponding to the web address and, if so, determines the websites to recommend to the user according to the feature information corresponding to the websites in the website cluster where the first website is located, embeds the web addresses of the recommended websites into the network access response, and returns the response to the user terminal.
  • An embodiment of the present invention provides a network server, including:
  • a first obtaining module, configured to obtain, in each of a plurality of preset time periods, the feature information corresponding to the websites accessed by users according to locally stored Internet access information;
  • a second obtaining module, configured to perform cluster analysis on the websites according to the feature information to obtain a plurality of website clusters;
  • a determining module, configured to determine, when receiving a network access request that is sent by a user terminal and includes a web address, whether the websites include a first website corresponding to the web address;
  • a processing module, configured to: if it is determined that the websites include a first website corresponding to the web address, determine the websites to recommend to the user according to the feature information corresponding to the websites in the website cluster where the first website is located, embed the web addresses of the recommended websites into the network access response, and return the response to the user terminal.
  • An embodiment of the present invention provides a website recommendation system, which includes the above network server and a user terminal.
  • In the website recommendation method and system and the network server provided by the embodiments of the present invention, the network server obtains, in each of a plurality of preset time periods, the feature information corresponding to the websites visited by users according to locally stored Internet access information, and performs cluster analysis on the websites according to the feature information to obtain a plurality of website clusters. When a network access request including a web address is received from a user terminal, the network server determines whether the clustered websites include the first website corresponding to the web address. If so, it determines the websites to recommend to the user according to the feature information corresponding to the websites in the website cluster where the first website is located, embeds the web addresses of the recommended websites into the network access response, and returns the response to the user terminal. The network server can thus recommend more websites to the user, so that the user can obtain more information of interest.
  • Brief description of the drawings
  • FIG. 1 is a flowchart of an embodiment of a website recommendation method according to the present invention;
  • FIG. 2 is a flowchart of another embodiment of a website recommendation method according to the present invention;
  • FIG. 3 is a schematic structural diagram of an embodiment of a network server according to the present invention;
  • FIG. 4 is a schematic structural diagram of an embodiment of a website recommendation system according to the present invention.
  • Detailed description
  • FIG. 1 is a flowchart of an embodiment of a website recommendation method according to the present invention. As shown in FIG. 1, the method includes:
  • Step 100: The network server obtains, in each of a plurality of preset time periods, the feature information corresponding to the websites accessed by users according to locally stored Internet access information.
  • A user can send a network access request to the network server through a network-capable user terminal such as a mobile phone or a computer. The network server stores the Internet access information of the users who access the network over a period of time determined by a preset refresh time.
  • The refresh time of the network server in this embodiment is set according to the requirements of the specific application, for example three days or one week.
  • The Internet access information stored by the network server includes the IP address of the user terminal, the website visited on each access, and the corresponding start time and end time.
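The stored Internet access information described above can be pictured as a list of simple records. The following is a minimal sketch, not the patent's implementation: the names `AccessRecord` and `prune`, the field names, and the three-day default are illustrative assumptions based only on the fields and refresh times mentioned in the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AccessRecord:
    """One stored access: the fields listed in the text (names are assumed)."""
    ip: str            # IP address of the user terminal
    website: str       # website visited on this access
    start: datetime    # start time of the visit
    end: datetime      # end time of the visit


def prune(records, now, refresh=timedelta(days=3)):
    """Keep only records that started within the refresh window (assumed policy)."""
    return [r for r in records if now - r.start <= refresh]
```

With a three-day refresh time, a record from the previous day is kept while a record from nine days ago is dropped.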
  • The network server obtains, in each of the preset time periods, the feature information corresponding to the websites accessed by users according to the locally stored Internet access information. The feature information in this embodiment reflects the behavior of users accessing a website in the different preset time periods. It may include at least one of a frequency feature, a variance feature, and an entropy feature of the website being accessed by users in each preset time period.
  • The frequency feature reflects how often the website is accessed by users in each preset time period.
  • The variance feature reflects the variance of the number of times the website is accessed by users in each preset time period, and measures how much that number changes.
  • The entropy feature reflects the entropy of the IP addresses of the users who access the website in each preset time period, and measures the stability of the website's user base. For example, if website A is visited 5 times between 8:00 and 10:00, once by IP1, three times by IP2, and once by IP3, the entropy of the user IP addresses is: -((1/5)log(1/5) + (3/5)log(3/5) + (1/5)log(1/5)).
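The entropy example above can be checked with a few lines of code. This is a sketch only: the function name `ip_entropy` is hypothetical, and the natural logarithm is an assumption, since the text does not state the base of the logarithm.

```python
import math
from collections import Counter


def ip_entropy(ips):
    """Shannon entropy of the visiting IP addresses (one list entry per visit).

    The natural logarithm is assumed; the source does not give the base.
    """
    counts = Counter(ips)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Website A between 8:00 and 10:00: IP1 once, IP2 three times, IP3 once.
visits = ["IP1", "IP2", "IP2", "IP2", "IP3"]
entropy_a = ip_entropy(visits)
```

A website visited by a single IP address has entropy 0, and the value grows as visits are spread more evenly over more addresses.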
  • The preset time periods in this embodiment may be configured in the network server according to the specific application. For example, if the preset time periods are 8:00 to 10:00, 10:00 to 12:00, 18:00 to 21:00, and 21:00 to 24:00 every day, the network server uses the stored Internet access information to compute, for each of these periods, the feature information corresponding to each website accessed by users in that period. In a specific implementation, the network server converts at least one of the acquired features (for example, the frequency feature, the variance feature, or the entropy feature) into a numeric value, or weights the numeric values of several of these features, to obtain the corresponding feature information.
  • The feature information in this embodiment is not limited to the features above; other feature information may be derived from the specific Internet access information obtained. It is processed in the same way, which is not repeated here.
  • Table 1 shows the feature information corresponding to the websites visited by users in the preset time periods. Each value is obtained by converting the frequency, variance, and entropy features of a website in a preset time period into numeric values and weighting them.
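One way to picture how a single weighted value per website and time period might be produced is sketched below. Several points are assumptions rather than statements from the source: the variance is taken over per-IP visit counts within the period, the entropy uses the natural logarithm, and the weights in `WEIGHTS` are arbitrary placeholders. The function names are hypothetical.

```python
import math
from collections import Counter
from statistics import pvariance


def period_features(ips):
    """Features of one website in one time period.

    ips holds one entry per visit: the IP address of the visiting terminal.
    """
    counts = list(Counter(ips).values())
    total = sum(counts)
    if total == 0:
        return {"frequency": 0, "variance": 0.0, "entropy": 0.0}
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return {
        "frequency": total,              # number of visits in the period
        "variance": pvariance(counts),   # assumed: variance of per-IP visit counts
        "entropy": entropy,              # entropy of the visiting IP addresses
    }


# Placeholder weights; the source does not specify any weighting scheme.
WEIGHTS = {"frequency": 0.5, "variance": 0.25, "entropy": 0.25}


def weighted_feature(features, weights=WEIGHTS):
    """Combine the numeric features into one weighted value."""
    return sum(weights[name] * features[name] for name in weights)
```

Calling `period_features` once per website and per preset time period, then `weighted_feature` on each result, yields one row of values per website, which is the shape of data the clustering step consumes.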
  • Step 101: The network server performs cluster analysis on the websites according to the feature information to obtain a plurality of website clusters.
  • The network server performs cluster analysis on all the websites according to the feature information obtained in the preset time periods, yielding a plurality of website clusters. Cluster analysis, also known as group analysis, is the process of classifying data into different classes or clusters, so that objects in the same cluster are highly similar and objects in different clusters differ greatly.
  • The main computational approaches to cluster analysis are partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods.
  • The implementation of each clustering method belongs to the prior art. To illustrate the clustering process more clearly, K-means (a partitioning method) and Probabilistic Latent Semantic Analysis (PLSA, a model-based method) are described below as examples; the remaining clustering methods are not described.
  • Step (3): Recalculate the centroid of each class by averaging the weights of the websites in it. After the new centroid of each class has been calculated, compute the distance from every website to each centroid again, and repeat until the centroids no longer change.
  • Step (4): For each class, calculate the within-class mean square error, that is, the distance from all websites in the class to the centroid, and compare the mean square errors obtained for different values of K. The error should decrease gradually; the K value at which it changes from dropping significantly to no longer dropping significantly can be used as the final K, which is the number of website clusters.
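The K-means procedure can be sketched as follows. Steps (1) and (2) are not present in the text above, so this sketch fills them with the standard choices (random initial centroids, assignment of each website to its nearest centroid); that, the squared Euclidean distance, and all function names are assumptions. Each website is represented by a tuple of feature values.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise average of a non-empty list of vectors."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, within-cluster sum of squared errors)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # assumed initialisation
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                       # assign to the nearest centroid
            nearest = min(range(k), key=lambda j: dist2(p, centroids[j]))
            clusters[nearest].append(p)
        # Step (3): recompute each centroid as the average of its members.
        updated = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:               # centroids no longer change
            break
        centroids = updated
    sse = sum(min(dist2(p, c) for c in centroids) for p in points)
    return centroids, sse


def sse_by_k(points, k_values):
    """Step (4): error for each candidate K, to be inspected for the elbow."""
    return {k: kmeans(points, k)[1] for k in k_values}
```

The sketch leaves the final choice of K to inspection of the `sse_by_k` output, since the text describes the criterion only qualitatively.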
  • The E step uses the old parameters to calculate the posterior probability of the latent factor variable. The formula is as follows:
  • The M step obtains new parameters by maximizing the expectation of the likelihood function. The formula is as follows:
  • Q £ ⁇ i ( lc.g p ( j ' )
  • Step (3): The formulas for updating each parameter during maximization are as follows:
  • Step (4): Repeat the E step and the M step above; the likelihood increases monotonically during this process. When it reaches its maximum, the parameter values are determined and the update process stops.
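The formulas for these steps did not survive in this text. For orientation only, the standard PLSA expectation-maximization updates are given below; they are the textbook formulation and are not confirmed to be the exact formulas of this application. The notation is an assumption: $u$ is a user (or time period), $s$ a website, $z$ a latent factor, and $n(u,s)$ the number of times $u$ accessed $s$.

```latex
% E step: posterior of the latent factor under the old parameters
P(z \mid u, s) = \frac{P(z)\,P(u \mid z)\,P(s \mid z)}
                      {\sum_{z'} P(z')\,P(u \mid z')\,P(s \mid z')}

% M step: maximize the expected complete-data log-likelihood
Q = \sum_{u}\sum_{s} n(u,s) \sum_{z} P(z \mid u, s)\,
    \log\bigl[P(z)\,P(u \mid z)\,P(s \mid z)\bigr]

% Resulting parameter updates
P(s \mid z) = \frac{\sum_{u} n(u,s)\,P(z \mid u,s)}
                   {\sum_{u}\sum_{s'} n(u,s')\,P(z \mid u,s')}
\qquad
P(u \mid z) = \frac{\sum_{s} n(u,s)\,P(z \mid u,s)}
                   {\sum_{u'}\sum_{s} n(u',s)\,P(z \mid u',s)}
\qquad
P(z) = \frac{\sum_{u}\sum_{s} n(u,s)\,P(z \mid u,s)}
            {\sum_{u}\sum_{s} n(u,s)}
```

Under this formulation, each website $s$ can be assigned to the latent factor $z$ that maximizes $P(z \mid s) \propto P(z)\,P(s \mid z)$, which gives one website cluster per latent factor.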
  • Step 102: Determine whether the websites include a first website corresponding to the web address and, if so, determine the websites to recommend to the user according to the feature information corresponding to the websites in the website cluster where the first website is located, embed the web addresses of the recommended websites into the network access response, and return the response to the user terminal.
  • When receiving a network access request that is sent by the user terminal and includes a web address, the network server looks up the web address among the clustered websites to determine whether they include the first website corresponding to that address. If they do, the first website has already undergone cluster analysis, so the network server queries the website clusters obtained in step 101 by the web address and determines the website cluster where the first website is located.
  • The web address identifies the website visited by the user. It includes a domain name or an IP address, and the two can be converted into each other through a domain name server to determine the website visited. As described above, the websites within a website cluster are similar in terms of users' access behavior.
  • A certain number of websites may be randomly selected from the website cluster where the first website is located, excluding the first website itself, and recommended to the user. Because the user access behavior corresponding to the first website is similar to that corresponding to the other websites in the cluster, this embodiment can recommend websites that the user may be interested in based on the user access behavior corresponding to the websites.
  • The recommendation rule is set according to the specific application scenario and is not limited in this embodiment.
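The random-selection rule mentioned above can be sketched in a few lines. The function name `recommend`, the default of three recommendations, and the optional `rng` parameter are illustrative assumptions; the embodiment leaves the actual rule open.

```python
import random


def recommend(cluster, first_site, n=3, rng=None):
    """Randomly pick up to n sites from the cluster, excluding the first site."""
    rng = rng or random.Random()
    candidates = [site for site in cluster if site != first_site]
    return rng.sample(candidates, min(n, len(candidates)))


cluster = ["a.example", "b.example", "c.example", "d.example"]
picks = recommend(cluster, "a.example", n=2, rng=random.Random(42))
```

Passing a seeded `random.Random` makes the selection reproducible for testing; in normal use the default unseeded generator gives different recommendations on each request.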
  • The network server embeds the web address of the website recommended to the user into the network access response and returns it to the user terminal.
  • The web address includes a domain name and/or an IP address. If the network server determines from the Internet access information that the web address of the recommended website is an IP address, it can embed the IP address directly into the network access response and return it to the user terminal. Alternatively, it can send a reverse DNS query request containing the IP address to the domain name server. The domain name server returns the domain name corresponding to the IP address through PTR-type resolution, and the network server embeds both the IP address and the corresponding domain name into the network access response returned to the user terminal. Returning the domain name makes the address easier for the user to remember and write down, and therefore makes the recommended website easier to find and access.
  • If the web address of the recommended website is a domain name, the network server can embed the domain name directly into the network access response and return it to the user terminal. Alternatively, it can send a domain name query request containing the domain name to the domain name server. The domain name server returns the IP address corresponding to the domain name through A-type resolution, and the network server embeds both the IP address and the corresponding domain name into the network access response returned to the user terminal. Returning the IP address lets the user reach the recommended website more directly, without having to issue a domain name query to the domain name server.
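The two lookups described above correspond to reverse (PTR) and forward (A) resolution. The sketch below uses Python's standard `socket.gethostbyaddr` and `socket.gethostbyname` as default resolvers; the function `complete_address`, its return shape, and the fallback to `None` on lookup failure are illustrative assumptions. The resolvers are parameters so that the logic can be exercised without network access.

```python
import ipaddress
import socket


def is_ip(address):
    """True if the address string is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(address)
        return True
    except ValueError:
        return False


def complete_address(address,
                     reverse=socket.gethostbyaddr,
                     forward=socket.gethostbyname):
    """Return both forms of a recommended site's address where resolvable."""
    if is_ip(address):
        try:
            domain = reverse(address)[0]   # PTR-style lookup: IP to domain name
        except OSError:
            domain = None                  # no reverse record available
        return {"ip": address, "domain": domain}
    try:
        ip = forward(address)              # A-style lookup: domain name to IP
    except OSError:
        ip = None
    return {"ip": ip, "domain": address}
```

Either form of the result can then be embedded in the network access response; a `None` entry simply means only the original form is returned to the terminal.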
  • In the website recommendation method provided by this embodiment, the network server obtains, in each of a plurality of preset time periods, the feature information corresponding to the websites visited by users according to locally stored Internet access information, and performs cluster analysis on the websites according to the feature information to obtain a plurality of website clusters. When receiving a network access request that is sent by a user terminal and includes a web address, the network server determines whether the clustered websites include the first website corresponding to the web address. If so, it determines the websites to recommend to the user according to the feature information corresponding to the websites in the website cluster where the first website is located, embeds the web addresses of the recommended websites into the network access response, and returns the response to the user terminal. The network server can thus recommend more websites to users making network access, based on the user network access behavior corresponding to the websites, so that users can obtain more information of interest.
  • FIG. 2 is a flowchart of another embodiment of a website recommendation method according to the present invention. As shown in FIG. 2, the method includes:
  • Step 200: The network server obtains, in each of a plurality of preset time periods, the feature information corresponding to the websites accessed by users according to locally stored Internet access information.
  • Step 201: The network server performs cluster analysis on the websites according to the feature information to obtain a plurality of website clusters.
  • Step 202: When receiving a network access request that is sent by a user terminal and includes a web address, the network server determines whether the websites include a first website corresponding to the web address. If not, it broadcasts to the remaining network servers an Internet access information query request that includes the web address and the plurality of time periods. If it receives the Internet access information of the first website in those time periods returned by the remaining network servers, it obtains the feature information corresponding to the first website from that information.
  • When receiving a network access request that is sent by the user terminal and includes a web address, the network server looks up the web address among the clustered websites to determine whether they include the first website corresponding to that address. If they do not, the first website was not accessed by users through this network server in any of the preset time periods; that is, the websites accessed through this network server in those periods do not include the first website.
  • The network server broadcasts an Internet access information query request, which includes the web address of the first website and the preset time periods, to the remaining network servers in the Internet system. On receiving the request, each of these servers searches its locally stored Internet access information for the preset time periods, by the web address of the first website, for records of the first website. If the network server receives the Internet access information of the first website in the preset time periods returned by the remaining network servers, it obtains the feature information corresponding to the first website from that information. For the specific process of obtaining the feature information, refer to step 100 in the first embodiment; details are not repeated here.
  • Step 203: The network server obtains the aggregate profile information of each website cluster from the feature information corresponding to the websites in that cluster, and determines the website cluster to which the first website belongs by applying a similarity measure to the feature information corresponding to the first website and the aggregate profile information.
  • The network server obtains the aggregate profile information from the feature information of the websites in each website cluster obtained in step 201. The aggregate profile information is the average weight of the feature information corresponding to the websites in each website cluster.
  • The network server then measures the similarity between the feature information of the first website and the aggregate profile information of each cluster. The similarity measure may be, for example, the Pearson correlation coefficient or the cosine coefficient, and is not specifically limited in this embodiment. The website cluster with the highest matching score is taken as the cluster to which the first website belongs.
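The matching step can be sketched with the cosine coefficient, one of the two measures named above. The function names and the dictionary layout (cluster name mapped to the feature vectors of its websites) are illustrative assumptions, and the aggregate profile is taken to be the component-wise average of those vectors.

```python
import math


def aggregate_profile(vectors):
    """Component-wise average of the feature vectors in one website cluster."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine coefficient of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_cluster(site_vector, clusters):
    """Return the name of the cluster whose profile best matches the site."""
    profiles = {name: aggregate_profile(vs) for name, vs in clusters.items()}
    return max(profiles, key=lambda name: cosine(site_vector, profiles[name]))


clusters = {
    "A": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "B": [[0.0, 1.0, 1.0], [0.0, 0.8, 1.2]],
}
best = assign_cluster([0.1, 1.0, 0.9], clusters)
```

Swapping `cosine` for a Pearson correlation function would leave `assign_cluster` unchanged, which reflects the embodiment's statement that the measure itself is not limited.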
  • Step 204: The network server determines the websites to recommend to the user according to the feature information corresponding to the websites in the website cluster where the first website is located, embeds the web addresses of the recommended websites into the network access response, and returns the response to the user terminal.
  • The network server embeds the web address of each recommended website into the network access response and returns it to the user terminal for the user's reference.
  • For the specific implementation of step 201 and step 202 in this embodiment, refer to the embodiment shown in FIG. 1; details are not repeated here.
  • In the website recommendation method provided by this embodiment, the network server obtains, in each of a plurality of preset time periods, the feature information corresponding to the websites visited by users according to locally stored Internet access information, and performs cluster analysis on the websites according to the feature information to obtain a plurality of website clusters. When receiving a network access request that is sent by a user terminal and includes a web address, if the network server determines that the clustered websites do not include the first website corresponding to the web address, it broadcasts a query to the remaining network servers. If it receives the Internet access information of the first website returned by the remaining network servers, it determines the website cluster to which the first website belongs, determines the websites to recommend to the user according to the feature information of the websites in that cluster, embeds the web addresses of the recommended websites into the network access response, and returns the response to the user terminal. More websites can thus be recommended to the user, so that the user can obtain more information of interest.
  • The foregoing program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
  • As shown in FIG. 3, the network server includes a first obtaining module 11, a second obtaining module 12, a determining module 13, and a processing module 14. The first obtaining module 11 is configured to obtain, in each of a plurality of preset time periods, the feature information corresponding to the websites accessed by users according to locally stored Internet access information. The second obtaining module 12 is configured to perform cluster analysis on the websites according to the feature information to obtain a plurality of website clusters. The determining module 13 is configured to determine, when receiving a network access request that is sent by a user terminal and includes a web address, whether the websites include the first website corresponding to the web address. The processing module 14 is configured to, when it is determined that the websites include the first website corresponding to the web address, determine the websites to recommend to the user according to the feature information corresponding to the websites in the website cluster where the first website is located, embed the web addresses of the recommended websites into the network access response, and return the response to the user terminal.
  • The second obtaining module 12 may perform cluster analysis on the websites according to the feature information by a partitioning method, a hierarchical method, a density-based method, a grid-based method, or a model-based method.
  • The processing module 14 is further configured to: if it is determined that the websites do not include the first website corresponding to the web address, broadcast to the remaining network servers an Internet access information query request that includes the web address and the plurality of time periods; if the Internet access information of the first website in the plurality of time periods is received from the remaining network servers, obtain the feature information corresponding to the first website from that information; obtain the aggregate profile information of each website cluster from the feature information corresponding to the websites in that cluster; and determine the website cluster to which the first website belongs by applying a similarity measure to the feature information corresponding to the first website and the aggregate profile information.
  • FIG. 4 is a schematic structural diagram of an embodiment of a website recommendation system according to the present invention. As shown in FIG. 4, the system includes a network server 1 and a user terminal 2. The network server 1 may be the network server provided by the embodiments of the present invention, and the user terminal 2 is the user terminal involved in the embodiments of the present invention. For the functions and processing of the devices in the website recommendation system provided by this embodiment, refer to the foregoing method and apparatus embodiments; the implementation principle and technical effect are similar and are not repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention provides a method and a system for recommending a website, and a network server. The method comprises: a network server acquiring, in each of a plurality of preset time intervals, characteristic information corresponding to websites accessed by a user according to locally stored Internet access information; performing cluster analysis on the websites according to the characteristic information to acquire a plurality of website clusters; and, when a network access request comprising a website address and sent by a user terminal is received, if it is determined that the websites comprise a first website corresponding to the website address, determining a website to be recommended to the user according to the characteristic information corresponding to the websites in the website cluster where the first website is located, embedding the website address of the recommended website into a network access response, and returning the network access response to the user terminal. The network server can thus recommend more websites to the network access user based on the user network access behaviors corresponding to the websites, enabling the user to acquire more information of interest.

Description

网站推荐方法和系统以及网络服务器  Website recommendation method and system and web server
技术领域 本发明涉及通信技术, 尤其涉及一种网站推荐方法和系统以及网络服 务器。 背景技术 TECHNICAL FIELD The present invention relates to communication technologies, and in particular, to a website recommendation method and system, and a network server. Background technique
随着电子信息技术的发展, 网络已经改变了人们的生活方式, 举例来 说, 人们可以利用网络获取自己感兴趣的书籍、 电影、 音乐、 甚至商品, 因此, 网络带给了我们高效便捷的生活, 人们已经习惯利用计算机、 手机 等具有上网功能的设备, 通过浏览自己感兴趣的网页进行学习、 娱乐、 购 物来满足自身多方位的需求。  With the development of electronic information technology, the Internet has changed people's lifestyles. For example, people can use the Internet to get books, movies, music, and even products that they are interested in. Therefore, the Internet has brought us efficient and convenient life. People have become accustomed to using computers, mobile phones and other Internet-enabled devices to learn, entertain, and shop by browsing the web pages they are interested in to meet their multi-faceted needs.
人们利用网络可以更加高效的获取丰富的信息进行学习和娱乐, 具体 地, 网络服务器会根据用户访问的网站的类型向其推荐同一种类型的相关 网站供用户参考, 比如用户访问的是属于信息技术类型的网站, 网络服务 器会向用户推荐信息技术类型中的其他网站供用户参考; 网络服务器会存 储用户经常访问的网站类型并获取相关的网站推荐给用户, 从而使用户可 以获取更多感兴趣的资讯。  People can use the network to obtain rich information for learning and entertainment more efficiently. Specifically, the web server will recommend the same type of related website to the user for reference according to the type of website visited by the user, for example, the user accesses information technology. For a type of website, the web server will recommend other websites in the information technology type for users to refer to; the web server stores the type of website that the user frequently visits and obtains related website recommendations to the user, so that the user can obtain more interested parties. News.
但是, 现有技术中的网络服务器只是根据用户访问的网站的类型获取 相关的网站推荐给用户供用户参考, 使用户获得的信息有限, 具有一定的 局限性。 发明内容  However, the network server in the prior art only obtains the relevant website recommendation to the user for reference according to the type of the website accessed by the user, so that the information obtained by the user is limited, and has certain limitations. Summary of the invention
针对现有技术的上述缺陷, 本发明实施例提供一种网站推荐方法和系 统以及网络服务器。  In view of the above-mentioned deficiencies of the prior art, embodiments of the present invention provide a website recommendation method and system, and a network server.
本发明实施例提供一种网站推荐方法, 包括:  An embodiment of the present invention provides a website recommendation method, including:
网络服务器在预设的多个时间段内根据本地存储的上网信息分别获 取用户访问的网站对应的特征信息;  The network server obtains the feature information corresponding to the website accessed by the user according to the locally stored Internet access information in a preset plurality of time periods;
所述网络服务器根据所述特征信息对所述网站进行聚类分析获取多 个网站簇, 以便在接收用户终端发送的包括网址的网络访问请求时, 判断 所述网站是否包括与所述网址对应的第一网站, 若是, 则根据所述第一网 站所在的网站簇中网站对应的特征信息确定向用户推荐的网站, 并将所述 推荐的网站的网址嵌入到网络访问响应中返回给所述用户终端。 The network server performs cluster analysis on the website according to the feature information to obtain more a website cluster, in order to determine whether the website includes a first website corresponding to the website address when receiving a network access request including a web address sent by the user terminal, and if so, according to the website group of the website group where the first website is located The corresponding feature information determines a website recommended to the user, and embeds the website address of the recommended website into the network access response and returns it to the user terminal.
An embodiment of the present invention provides a network server, including:
a first acquiring module, configured to acquire, in each of a plurality of preset time periods and according to locally stored Internet access information, feature information corresponding to the websites accessed by users;
a second acquiring module, configured to perform cluster analysis on the websites according to the feature information to obtain a plurality of website clusters;
a determining module, configured to determine, upon receiving a network access request that is sent by a user terminal and includes a web address, whether the websites include a first website corresponding to the web address; and
a processing module, configured to, if it is determined that the websites include the first website corresponding to the web address, determine the websites to recommend to the user according to the feature information corresponding to the websites in the website cluster where the first website is located, embed the web addresses of the recommended websites in a network access response, and return the response to the user terminal.
An embodiment of the present invention provides a website recommendation system, including the above network server and a user terminal.
With the website recommendation method and system and the network server provided by the embodiments of the present invention, the network server acquires, in each of a plurality of preset time periods and according to locally stored Internet access information, the feature information corresponding to the websites that users have visited, and performs cluster analysis on the websites according to the feature information to obtain a plurality of website clusters. When a network access request including a web address is received from a user terminal, the network server determines whether the clustered websites include a first website corresponding to the web address. If so, it determines the websites to recommend according to the feature information corresponding to the websites in the website cluster where the first website is located, embeds the web addresses of the recommended websites in the network access response, and returns the response to the user terminal. The network server can thus recommend more websites to users, so that users obtain more information of interest.
Brief Description of the Drawings
FIG. 1 is a flowchart of one embodiment of the website recommendation method of the present invention;
FIG. 2 is a flowchart of another embodiment of the website recommendation method of the present invention;
FIG. 3 is a schematic structural diagram of one embodiment of the network server of the present invention;
FIG. 4 is a schematic structural diagram of one embodiment of the website recommendation system of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
FIG. 1 is a flowchart of one embodiment of the website recommendation method of the present invention. As shown in FIG. 1, the method includes:
Step 100: The network server acquires, in each of a plurality of preset time periods and according to locally stored Internet access information, feature information corresponding to the websites accessed by users.
A user can send a network access request to the network server from a user terminal with Internet access, such as a mobile phone or a computer. The network server stores the Internet access information of users who access the network during a period of time, and refreshes it at a preset refresh interval. The refresh interval in this embodiment is set according to the needs of the specific application, for example three days or one week. The Internet access information stored by the network server specifically includes the IP address of the user terminal, the website visited on each access, and the corresponding start time and end time.
In each of the preset time periods, the network server acquires, according to the locally stored Internet access information, the feature information corresponding to the websites that users have visited. The feature information in this embodiment reflects the behavior of users accessing a website in the different preset time periods. It may include at least one of a frequency feature, a variance feature, and an entropy feature of the website's visits in each preset time period. The frequency feature reflects how frequently the website is visited in each preset time period. The variance feature is the variance of the number of visits to the website in each preset time period and measures how sharply the number of visits changes. The entropy feature is the entropy of the IP addresses of the users who visit the website in each preset time period and measures the stability of the website's user base. For example, if website A is visited 5 times in total between 8:00 and 10:00 in the morning, once by IP1, three times by IP2, and once by IP3, the entropy of the user IP addresses is: -((1/5)log(1/5) + (3/5)log(3/5) + (1/5)log(1/5)).
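The entropy feature in the example above can be computed as in the following minimal sketch. The function name `ip_entropy` and the use of the natural logarithm are assumptions; the text does not state the base of the logarithm.

```python
import math
from collections import Counter

def ip_entropy(visitor_ips):
    """Shannon entropy (natural log, an assumption) of the visitor-IP
    distribution for one website in one time period."""
    counts = Counter(visitor_ips)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Website A between 8:00 and 10:00: IP1 once, IP2 three times, IP3 once.
visits = ["IP1", "IP2", "IP2", "IP2", "IP3"]
entropy_a = ip_entropy(visits)
```

A website visited by a single IP address has entropy 0, while visits spread evenly over many addresses give a higher value.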
The preset time periods in this embodiment can be configured in advance on the network server according to the specific application. For example, if the preset time periods are 8:00 to 10:00, 10:00 to 12:00, 18:00 to 21:00, and 21:00 to 24:00 each day, the network server compiles statistics from the stored Internet access information for each of these periods and obtains the feature information corresponding to each website visited in each period. In a specific implementation, the network server performs analog-to-digital conversion on at least one of the acquired features (the frequency feature, the variance feature, and the entropy feature), or weights the digital values of several of these features, to obtain the corresponding feature information. The feature information in this embodiment is not limited to the features listed above; other feature information can be obtained by adapting to the specific Internet access information available, using the same processing as described above. To illustrate the meaning of the feature information, Table 1 shows the feature information corresponding to the websites visited in the preset time periods. Each value is obtained by analog-to-digital conversion and weighting of the frequency, variance, and entropy features of a website visited in a preset time period.
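As a rough illustration of the per-period statistics and the weighting described above, the sketch below counts visits per preset time period and combines three features with weights. The record format `(ip, site, start_hour)`, the weights, and the choice to take the variance across the periods are all assumptions made for the example, not details given in the text.

```python
from statistics import pvariance

# Preset daily time periods from the text, as (start_hour, end_hour).
SLOTS = [(8, 10), (10, 12), (18, 21), (21, 24)]

def slot_counts(records, site):
    """Number of visits to `site` in each preset period.
    `records` holds (ip, site, start_hour) tuples (hypothetical format)."""
    counts = [0] * len(SLOTS)
    for _ip, visited, hour in records:
        if visited != site:
            continue
        for i, (start, end) in enumerate(SLOTS):
            if start <= hour < end:
                counts[i] += 1
    return counts

def combined_feature(frequency, variance, entropy, weights=(0.5, 0.25, 0.25)):
    """Weighted sum of the three features; the weights are illustrative only."""
    return weights[0] * frequency + weights[1] * variance + weights[2] * entropy

records = [("IP1", "A", 8), ("IP2", "A", 9), ("IP2", "A", 9),
           ("IP3", "A", 19), ("IP1", "B", 22)]
counts_a = slot_counts(records, "A")   # one visit count per period
variance_a = pvariance(counts_a)       # spread of the counts across periods
```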
Table 1
[Table 1 appears only as an image (imgf000005_0001) in the source and is not reproduced here.]
Step 101: The network server performs cluster analysis on the websites according to the feature information to obtain a plurality of website clusters.
The network server performs cluster analysis on all the websites, according to the feature information acquired for the visited websites in the preset time periods, to obtain a plurality of website clusters. Cluster analysis, also called group analysis, is the process of partitioning data into different classes or clusters, so that objects in the same cluster are highly similar and objects in different clusters are highly dissimilar. The main families of clustering methods are partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods. The implementation of each clustering method belongs to the prior art. To illustrate the clustering process, two examples are described below: K-means, a partitioning method, and Probabilistic Latent Semantic Analysis (PLSA), a model-based method. The other clustering methods are not described one by one.
1. The K-means algorithm proceeds as follows:
Step (1): Taking k = 2 website clusters as an example, two websites are randomly selected from website A to website Z as the initial centroids (the centers of the classes). Suppose website A and website B are selected.
Step (2): For each remaining website, its distance to each centroid is computed from the feature information. The distances to website A and to website B are compared, and the website is assigned to the class of the nearer centroid. Proceeding in the same way for the remaining websites, all websites are eventually divided into two classes with website A and website B as centroids.
Step (3): The centroid of each class is recomputed by averaging the weights of the websites in the class. After the new centroid of each class has been computed, the distance from every website to each centroid is computed again. This is repeated until the centroids no longer change.
Step (4): For each class, the within-class mean square error is computed from the distances of all websites in the class to the centroid. The mean square errors are compared across values of K; they should decrease gradually, and the value of K at which the error changes from dropping sharply to dropping only slightly can be taken as the final K, that is, the number of website clusters.
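The K-means steps above can be sketched in plain Python as follows. This is a minimal illustration under assumptions: squared Euclidean distance is used, each website is assigned to its nearest centroid, and the function names, the seed handling, and the sample feature vectors are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(features, k, seed=0, max_iter=100):
    """features: dict mapping website -> feature vector.
    Returns (clusters, centroids)."""
    rng = random.Random(seed)
    # Step (1): pick k websites at random as the initial centroids.
    centroids = [list(features[s]) for s in rng.sample(sorted(features), k)]
    for _ in range(max_iter):
        # Step (2): assign every website to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for site, vec in features.items():
            nearest = min(range(k), key=lambda c: sq_dist(vec, centroids[c]))
            clusters[nearest].append(site)
        # Step (3): recompute each centroid as the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if not members:  # keep the old centroid for an empty cluster
                new_centroids.append(centroids[c])
                continue
            dim = len(centroids[c])
            new_centroids.append(
                [sum(features[s][d] for s in members) / len(members) for d in range(dim)]
            )
        if new_centroids == centroids:  # centroids stopped changing
            break
        centroids = new_centroids
    return clusters, centroids

def within_cluster_mse(features, clusters, centroids):
    """Step (4): mean squared distance of all websites to their centroid."""
    total = sum(sq_dist(features[s], centroids[c])
                for c, members in enumerate(clusters) for s in members)
    return total / len(features)

features = {"A": [1, 1], "B": [1, 2], "C": [9, 9], "D": [9, 8]}
clusters, centroids = k_means(features, k=2)
mse = within_cluster_mse(features, clusters, centroids)
```

Running `k_means` for several values of `k` and comparing `within_cluster_mse` gives the curve from which the final K in step (4) would be read off.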
2. The PLSA model algorithm proceeds as follows:
Step (1): Define the likelihood function $L(D, T) = \sum_{i}\sum_{j} w_{ij} \log P(d_i, t_j)$.
Step (2): Determine the parameter values by iterative computation with the EM algorithm, whose specific steps are as follows:
E-step: compute the posterior probability of the latent factor variable using the old parameters. The formula is as follows:
[Equation image imgf000006_0001; not reproduced in this text.]
M-step: obtain the new parameters by maximizing the expectation of the likelihood function. The formula is as follows:
$Q = \sum_{i}\sum_{j} w_{ij} \sum_{k} P(z_k \mid d_i, t_j) \log\left[P(z_k)\,P(d_i \mid z_k)\,P(t_j \mid z_k)\right]$
Step (3): The formulas for updating each parameter during maximization are as follows:
[Equation image imgf000007_0001; not reproduced in this text.]
Step (4): Repeat the E-step and the M-step. The likelihood function increases monotonically throughout this process. When it reaches its maximum, the parameter values are determined and the update process stops.
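For reference, the textbook EM updates for a symmetric PLSA model are sketched below. This is the standard formulation and not a transcription of the equation images in the source. The latent factor $z_k$, the factorization $P(d_i, t_j) = \sum_k P(z_k) P(d_i \mid z_k) P(t_j \mid z_k)$, and the reading of $w_{ij}$ as the weight linking $d_i$ and $t_j$ are assumptions.

```latex
% E-step: posterior of the latent factor under the old parameters
P(z_k \mid d_i, t_j) =
  \frac{P(z_k)\, P(d_i \mid z_k)\, P(t_j \mid z_k)}
       {\sum_{l} P(z_l)\, P(d_i \mid z_l)\, P(t_j \mid z_l)}

% M-step: re-estimate each parameter from the posteriors
P(t_j \mid z_k) =
  \frac{\sum_{i} w_{ij}\, P(z_k \mid d_i, t_j)}
       {\sum_{i} \sum_{j'} w_{ij'}\, P(z_k \mid d_i, t_{j'})}

P(d_i \mid z_k) =
  \frac{\sum_{j} w_{ij}\, P(z_k \mid d_i, t_j)}
       {\sum_{i'} \sum_{j} w_{i'j}\, P(z_k \mid d_{i'}, t_j)}

P(z_k) =
  \frac{\sum_{i} \sum_{j} w_{ij}\, P(z_k \mid d_i, t_j)}
       {\sum_{i} \sum_{j} w_{ij}}
```

After convergence, one common way to form clusters is to assign each object to the latent factor with the highest conditional probability.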
Step 102: Upon receiving a network access request that is sent by a user terminal and includes a web address, the network server determines whether the websites include a first website corresponding to the web address. If so, it determines the websites to recommend to the user according to the feature information corresponding to the websites in the website cluster where the first website is located, embeds the web addresses of the recommended websites in a network access response, and returns the response to the user terminal.
When the network server receives a network access request including a web address from the user terminal, it looks up the web address among the clustered websites to determine whether they include a first website corresponding to that address. If the clustered websites include the first website, the first website has also been through cluster analysis, so the network server looks up the website clusters obtained in step 101 by web address and determines the website cluster where the first website is located. In the Internet access information, a website that a user has visited is represented by its web address, which may be a domain name or an IP address; domain names and IP addresses can be converted through a domain name server to identify the website visited. It follows that, judged by users' access behavior, the websites in this website cluster are similar to one another. The network server obtains the feature information corresponding to the websites in the website cluster where the first website is located and determines the websites to recommend according to that feature information and a configured recommendation rule. For example, a certain number of websites other than the first website can be selected at random from the website cluster where the first website is located and recommended to the user. Because the user access behavior associated with the first website is similar to that associated with the other websites in the cluster, this embodiment can recommend websites that the user may be interested in based on the user access behavior associated with websites. The recommendation rule is set according to the specific application scenario and is not limited in this embodiment.
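The random-selection rule given as an example above might look like the following sketch. The function name, the `count` parameter, and the representation of clusters as lists of website names are assumptions made for illustration.

```python
import random

def recommend_from_cluster(first_site, clusters, count=3, rng=None):
    """Pick up to `count` websites at random from the cluster that contains
    `first_site`, excluding `first_site` itself. Returns [] if the site
    has not been clustered."""
    rng = rng or random.Random()
    for members in clusters:
        if first_site in members:
            candidates = [s for s in members if s != first_site]
            return rng.sample(candidates, min(count, len(candidates)))
    return []

clusters = [["A", "B", "E"], ["C", "D"]]
picks = recommend_from_cluster("A", clusters, count=2)
```

An empty result corresponds to the case where the requested website is not among the clustered websites.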
The network server embeds the web addresses of the recommended websites in the network access response and returns it to the user terminal. The web address of a website includes a domain name and/or an IP address. If the network server determines from the Internet access information that the web address of a recommended website is an IP address, it may embed the IP address directly in the network access response and return it to the user terminal. Alternatively, it may send a reverse domain name query request including the IP address to a domain name server; the domain name server returns the domain name corresponding to the IP address through PTR-type resolution, and the network server embeds both the IP address and the corresponding domain name in the network access response returned to the user terminal. Returning the domain name makes the address easier for the user to remember and write down, and so makes it more convenient to search for and visit the recommended website. If the network server determines from the Internet access information that the web address of a recommended website is a domain name, it may embed the domain name directly in the network access response and return it to the user terminal. Alternatively, it may send a domain name query request including the domain name to a domain name server; the domain name server returns the IP address corresponding to the domain name through A-type resolution, and the network server embeds both the IP address and the corresponding domain name in the network access response returned to the user terminal. Returning the IP address lets the user reach the recommended website more directly, without initiating a domain name query to the domain name server.
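The two lookups described above (PTR-type for an IP address, A-type for a domain name) can be sketched with Python's standard `socket` and `ipaddress` modules. The function names and the fallback behavior on a failed lookup are assumptions; the resolver functions are passed as parameters so the sketch can be exercised without network access.

```python
import ipaddress
import socket

def is_ip(address):
    """True if `address` parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(address)
        return True
    except ValueError:
        return False

def complete_address(address, reverse=socket.gethostbyaddr, forward=socket.gethostbyname):
    """Return (domain, ip) for a recommended website.
    If the lookup fails, the unknown half is returned as None."""
    try:
        if is_ip(address):
            # Reverse (PTR-type) lookup; gethostbyaddr returns (hostname, aliases, ips).
            return reverse(address)[0], address
        # Forward (A-type) lookup; gethostbyname returns an IPv4 address string.
        return address, forward(address)
    except OSError:
        return (None, address) if is_ip(address) else (address, None)
```

With the default arguments the function queries the system resolver; both halves of the returned pair could then be embedded in the network access response.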
In the website recommendation method provided by this embodiment, the network server acquires, in each of a plurality of preset time periods and according to locally stored Internet access information, the feature information corresponding to the websites that users have visited, and performs cluster analysis on the websites according to the feature information to obtain a plurality of website clusters. When a network access request including a web address is received from a user terminal, the network server determines whether the clustered websites include a first website corresponding to the web address. If so, it determines the websites to recommend according to the feature information corresponding to the websites in the website cluster where the first website is located, embeds the web addresses of the recommended websites in the network access response, and returns the response to the user terminal. The network server can thus recommend more websites to users who access the network, based on the user network access behavior associated with websites, so that users obtain more information of interest.
FIG. 2 is a flowchart of another embodiment of the website recommendation method of the present invention. As shown in FIG. 2, the method includes:
Step 200: The network server acquires, in each of a plurality of preset time periods and according to locally stored Internet access information, feature information corresponding to the websites accessed by users.
Step 201: The network server performs cluster analysis on the websites according to the feature information to obtain a plurality of website clusters.
Step 202: Upon receiving a network access request that is sent by a user terminal and includes a web address, the network server determines whether the websites include a first website corresponding to the web address. If not, it broadcasts to the other network servers an Internet access information query request that includes the web address and the plurality of time periods. If it receives, from the other network servers, Internet access information for the first website in the plurality of time periods, it acquires the feature information corresponding to the first website according to that Internet access information.
When the network server receives a network access request including a web address from the user terminal, it looks up the web address among the clustered websites to determine whether they include a first website corresponding to that address. If the clustered websites do not include the first website, the first website was not visited through this network server in any of the preset time periods; that is, the websites that users visited through this network server in the preset time periods do not include the first website.
The network server broadcasts, to the other network servers in the Internet system, an Internet access information query request that includes the web address of the first website and the preset time periods. On receiving the request, each of the other network servers checks, according to the web address of the first website, whether the Internet access information it stores locally for the preset time periods includes Internet access information for the first website. If the network server receives Internet access information for the first website in the preset time periods from the other network servers, it acquires the feature information corresponding to the first website from that information. For the specific acquisition process, see step 100 in the first embodiment above; details are not repeated here.
Step 203: The network server obtains aggregate profile information for each website cluster according to the feature information corresponding to the websites in that cluster, and determines the website cluster to which the first website belongs by a similarity measure between the feature information corresponding to the first website and the aggregate profile information.
The network server obtains the aggregate profile information of each website cluster according to the feature information of the websites in the clusters obtained in step 201. The aggregate profile information of a cluster is the average weight of the feature information corresponding to the websites in that cluster.
The network server then applies a similarity measure to the feature information of the first website and the aggregate profile information. Many similarity measures exist, such as the Pearson correlation coefficient or the cosine coefficient; this embodiment does not impose a specific one. The similarity measure yields a matching score between the feature information of the first website and each aggregate profile. The larger the matching score, the more similar the first website is to the websites in that cluster, and the website cluster with the largest matching score is taken as the cluster to which the first website belongs.
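A minimal sketch of step 203 follows, using the cosine coefficient as the similarity measure (one of the options the text names). The function names and the sample feature vectors are assumptions for illustration.

```python
import math

def cluster_profile(features, members):
    """Aggregate profile: element-wise mean of the members' feature vectors."""
    dim = len(features[members[0]])
    return [sum(features[s][d] for s in members) / len(members) for d in range(dim)]

def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def assign_cluster(new_vec, features, clusters):
    """Return (index of the best-matching cluster, matching score per cluster)."""
    scores = [cosine(new_vec, cluster_profile(features, m)) for m in clusters]
    best = max(range(len(clusters)), key=scores.__getitem__)
    return best, scores

features = {"A": [9, 1, 1], "B": [8, 2, 1], "C": [1, 1, 9], "D": [2, 1, 8]}
clusters = [["A", "B"], ["C", "D"]]
best, scores = assign_cluster([7, 1, 2], features, clusters)
```

Here the new website's feature vector is closest in direction to the profile of the first cluster, so it is assigned there.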
Step 204: The network server determines the websites to recommend to the user according to the feature information corresponding to the websites in the website cluster where the first website is located, embeds the web addresses of the recommended websites in a network access response, and returns the response to the user terminal.
The network server obtains the feature information of the websites in the website cluster where the first website is located and determines the websites to recommend according to that feature information and a configured recommendation rule. Specifically, the feature information of each of the other websites in the cluster can be combined by weighted averaging to obtain a recommendation score for that website, and the websites to recommend are then determined from the recommendation scores according to a preset recommendation criterion. For example, the websites are ranked from the highest recommendation score to the lowest and selected in that order until the preset number of recommended websites is reached; the selected websites are the ones recommended to the user. The network server embeds the web addresses of the recommended websites in the network access response and returns it to the user terminal for the user's reference. For the specific process, see the embodiment above; details are not repeated here.
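The score-and-rank rule above can be sketched as follows. The weights, the function names, and the sample feature vectors are assumptions; the text leaves the weighting and the recommendation criterion to the application.

```python
def recommendation_scores(features, members, first_site, weights):
    """Weighted average of each website's feature values, for every
    cluster member except the website the user requested."""
    scores = {}
    for site in members:
        if site == first_site:
            continue
        vec = features[site]
        scores[site] = sum(w * v for w, v in zip(weights, vec)) / sum(weights)
    return scores

def top_n(scores, n):
    """Websites ordered from highest to lowest score, cut off at n."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

features = {"A": [5, 5], "B": [4, 2], "C": [1, 9], "D": [6, 6]}
scores = recommendation_scores(features, ["A", "B", "C", "D"], "A", weights=(3, 1))
recommended = top_n(scores, 2)
```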
For the specific implementation of step 201 and step 202 in this embodiment, see the embodiment shown in FIG. 1; details are not repeated here.
In the website recommendation method provided by this embodiment, the network server acquires, in each of a plurality of preset time periods and according to locally stored Internet access information, the feature information corresponding to the websites that users have visited, and performs cluster analysis on the websites according to the feature information to obtain a plurality of website clusters. When a network access request including a web address is received from a user terminal and the clustered websites are found not to include a first website corresponding to the web address, the network server broadcasts a query to the other network servers. If it receives Internet access information for the first website from the other network servers, it determines the website cluster to which the first website belongs, determines the websites to recommend according to the feature information of the websites in that cluster, embeds the web addresses of the recommended websites in the network access response, and returns the response to the user terminal. More websites can thus be recommended to the user, so that the user obtains more information of interest.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
FIG. 3 is a schematic structural diagram of one embodiment of the network server of the present invention. As shown in FIG. 3, the network server includes a first acquiring module 11, a second acquiring module 12, a determining module 13, and a processing module 14. The first acquiring module 11 is configured to acquire, in each of a plurality of preset time periods and according to locally stored Internet access information, feature information corresponding to the websites accessed by users. The second acquiring module 12 is configured to perform cluster analysis on the websites according to the feature information to obtain a plurality of website clusters. The determining module 13 is configured to determine, upon receiving a network access request that is sent by a user terminal and includes a web address, whether the websites include a first website corresponding to the web address. The processing module 14 is configured to, if it is determined that the websites include the first website corresponding to the web address, determine the websites to recommend to the user according to the feature information corresponding to the websites in the website cluster where the first website is located, embed the web addresses of the recommended websites in a network access response, and return the response to the user terminal.
In the embodiment shown in FIG. 3, the second obtaining module 12 may perform cluster analysis on the websites according to the feature information by means of a partitioning method, a hierarchical method, a density-based method, a grid-based method and a model-based method.
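The text names families of clustering methods without fixing an algorithm. As one possible instance of the first family, the following minimal sketch clusters website feature vectors with plain k-means. The choice of k-means, the Euclidean distance, the deterministic initialisation from the first `k` points and the function name `kmeans` are all assumptions made for illustration, not details taken from the patent.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster feature vectors with plain k-means.

    points: list of equal-length numeric feature vectors, one per website.
    Returns a list holding the cluster index assigned to each point.
    """
    # Deterministic initialisation: use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Websites that receive the same label form one website cluster. A production system would more likely use a tested library implementation and a less naive initialisation.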
For the functions and processing flow of the modules in the network server provided in this embodiment, refer to the method embodiment shown in FIG. 1. The implementation principle and technical effects are similar and are not repeated here.
Based on the embodiment shown in FIG. 3, the processing module 14 is further configured to: if it is determined that the websites do not include a first website corresponding to the web address, broadcast to the other network servers an Internet access information query request that includes the web address and the plurality of time periods; if Internet access information of the first website within the plurality of time periods is received from the other network servers, obtain the feature information corresponding to the first website according to that Internet access information; obtain corresponding aggregate profile information according to the feature information corresponding to the websites in each website cluster; and determine, by a similarity measure, the website cluster to which the first website belongs according to the feature information corresponding to the first website and the aggregate profile information.
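The last step above, placing a previously unseen website into an existing cluster, can be sketched as follows. This section does not define how the per-cluster aggregate information is computed or which similarity measure is used, so the sketch assumes the aggregate is the mean feature vector of the cluster members and the measure is cosine similarity. The function names are hypothetical.

```python
import math


def aggregate_profile(vectors):
    """Mean feature vector of the websites in one cluster (assumed aggregate)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_cluster(site_vector, clusters):
    """Return the id of the cluster whose aggregate is most similar.

    clusters: dict mapping cluster id -> list of member feature vectors.
    """
    profiles = {cid: aggregate_profile(vs) for cid, vs in clusters.items()}
    return max(
        profiles,
        key=lambda cid: cosine_similarity(site_vector, profiles[cid]),
    )
```

Comparing against one aggregate per cluster keeps the lookup cost proportional to the number of clusters rather than the number of websites, which matters when the decision is made while a request is being served.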
For the functions and processing flow of the modules in the network server provided in this embodiment, refer to the method embodiment shown in FIG. 2. The implementation principle and technical effects are similar and are not repeated here.
FIG. 4 is a schematic structural diagram of an embodiment of the website recommendation system of the present invention. As shown in FIG. 4, the system includes a network server 1 and a user terminal 2. The network server 1 may be the network server provided by the embodiments of the present invention, and the user terminal 2 is the user terminal referred to in the embodiments of the present invention. For the functions and processing flow of the devices in the website recommendation system provided in this embodiment, refer to the above method and apparatus embodiments. The implementation principle and technical effects are similar and are not repeated here.
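On the server side of this system, the recommendation step (scoring the other websites of the cluster and applying a preset criterion, as set out in claim 4) could look roughly like the sketch below. The scoring formula, the top-N criterion with a minimum score, and the plain-text response format are all assumptions introduced for illustration. The patent text in this section does not specify them, and `recommend` and `embed_recommendations` are hypothetical names.

```python
import math


def recommend(first_site, cluster_features, top_n=3, min_score=0.0):
    """Rank the other websites of a cluster for recommendation.

    cluster_features: dict mapping web address -> feature vector for every
    website in the cluster that contains first_site.
    """
    target = cluster_features[first_site]
    scores = {}
    for url, vector in cluster_features.items():
        if url == first_site:
            continue  # only the remaining websites are scored
        # Assumed score: closer feature vectors score higher, in (0, 1].
        scores[url] = 1.0 / (1.0 + math.dist(target, vector))
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Assumed criterion: keep at most top_n sites scoring above min_score.
    return [url for url in ranked if scores[url] > min_score][:top_n]


def embed_recommendations(response_body, urls):
    """Append recommended web addresses to a plain-text response body."""
    if not urls:
        return response_body
    return response_body + "\nRecommended: " + ", ".join(urls)
```

A usage example under the same assumptions: with a cluster holding `a.example`, `b.example` and `c.example`, a request for `a.example` yields a ranked list of the other two, which the server then attaches to the response it returns to the user terminal.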
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A website recommendation method, characterized by comprising:
a network server obtaining, for each of a plurality of preset time periods, feature information corresponding to websites accessed by users, according to locally stored Internet access information; and
the network server performing cluster analysis on the websites according to the feature information to obtain a plurality of website clusters, so that, upon receiving a network access request that is sent by a user terminal and includes a web address, the network server determines whether the websites include a first website corresponding to the web address and, if so, determines websites to recommend to the user according to the feature information corresponding to the websites in the website cluster to which the first website belongs, embeds the web addresses of the recommended websites in a network access response, and returns the response to the user terminal.
2. The website recommendation method according to claim 1, characterized in that, if it is determined that the websites do not include a first website corresponding to the web address, the method further comprises:
the network server broadcasting to the other network servers an Internet access information query request that includes the web address and the plurality of time periods and, if Internet access information of the first website within the plurality of time periods is received from the other network servers, obtaining the feature information corresponding to the first website according to the Internet access information; and
the network server obtaining corresponding aggregate profile information according to the feature information corresponding to the websites in each website cluster, and determining, by a similarity measure, the website cluster to which the first website belongs according to the feature information corresponding to the first website and the aggregate profile information.
3. The website recommendation method according to claim 1, characterized in that performing cluster analysis on the websites according to the feature information comprises:
performing cluster analysis on the websites according to the feature information by means of a partitioning method, a hierarchical method, a density-based method, a grid-based method and a model-based method.
4. The website recommendation method according to claim 1, characterized in that determining the websites to recommend to the user according to the feature information corresponding to the websites in the website cluster to which the first website belongs comprises:
obtaining recommendation scores of the remaining websites in the website cluster to which the first website belongs, according to the feature information corresponding to the remaining websites; and
determining the websites to recommend to the user from the recommendation scores according to a preset recommendation criterion.
5. The website recommendation method according to any one of claims 1 to 4, characterized in that the feature information comprises at least one of a frequency feature, a variance feature and an entropy feature of user access to the website within each preset time period.
6. The website recommendation method according to any one of claims 1 to 4, characterized in that the web address comprises a domain name and/or an IP address.
7. A network server, characterized by comprising:
a first obtaining module, configured to obtain, for each of a plurality of preset time periods, feature information corresponding to websites accessed by users, according to locally stored Internet access information;
a second obtaining module, configured to perform cluster analysis on the websites according to the feature information to obtain a plurality of website clusters;
a determining module, configured to determine, upon receiving a network access request that is sent by a user terminal and includes a web address, whether the websites include a first website corresponding to the web address; and
a processing module, configured to, if it is determined that the websites include the first website corresponding to the web address, determine websites to recommend to the user according to the feature information corresponding to the websites in the website cluster to which the first website belongs, embed the web addresses of the recommended websites in a network access response, and return the response to the user terminal.
8. The network server according to claim 7, characterized in that the processing module is further configured to:
if it is determined that the websites do not include a first website corresponding to the web address, broadcast to the other network servers an Internet access information query request that includes the web address and the plurality of time periods and, if Internet access information of the first website within the plurality of time periods is received from the other network servers, obtain the feature information corresponding to the first website according to the Internet access information; and
obtain corresponding aggregate profile information according to the feature information corresponding to the websites in each website cluster, and determine, by a similarity measure, the website cluster to which the first website belongs according to the feature information corresponding to the first website and the aggregate profile information.
9. The network server according to claim 8, characterized in that the second obtaining module is specifically configured to:
perform cluster analysis on the websites according to the feature information by means of a partitioning method, a hierarchical method, a density-based method, a grid-based method and a model-based method.
10. A website recommendation system, characterized by comprising the network server according to any one of claims 7, 8 or 9, and a user terminal.
PCT/CN2011/083678 2011-09-26 2011-12-08 Method and system for recommending website and network server WO2013044559A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110288443.1A CN102316166B (en) 2011-09-26 2011-09-26 Website recommending method and system and network server
CN201110288443.1 2011-09-26

Publications (1)

Publication Number Publication Date
WO2013044559A1 (en) 2013-04-04

Family

ID=45428972

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/083678 WO2013044559A1 (en) 2011-09-26 2011-12-08 Method and system for recommending website and network server

Country Status (2)

Country Link
CN (1) CN102316166B (en)
WO (1) WO2013044559A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139146A (en) * 2020-01-17 2021-07-20 中国移动通信集团浙江有限公司 Website quality evaluation method and device and computing equipment

Families Citing this family (14)

Publication number Priority date Publication date Assignee Title
CN103294692B (en) * 2012-02-24 2017-10-17 北京搜狗信息服务有限公司 A kind of information recommendation method and system
CN102646132B (en) * 2012-03-26 2014-03-12 中国联合网络通信集团有限公司 Method and device for recognizing attributes of broadband users
CN105868291A (en) * 2012-07-10 2016-08-17 北京奇虎科技有限公司 Website address recommendation method, apparatus and system
CN103678366B (en) * 2012-09-14 2017-11-24 腾讯科技(深圳)有限公司 The method and server of recommendation information are provided for browser
CN103812906B (en) * 2012-11-14 2015-03-18 腾讯科技(深圳)有限公司 Website recommendation method and device and communication system
CN105100165B (en) * 2014-05-20 2017-11-14 深圳市腾讯计算机系统有限公司 Network service recommends method and apparatus
CN104579773B (en) * 2014-12-31 2016-08-24 北京奇虎科技有限公司 Domain name system analyzes method and device
CN105989071A (en) * 2015-02-10 2016-10-05 阿里巴巴集团控股有限公司 Method and device for obtaining user network operation characteristics
CN106933885A (en) * 2015-12-31 2017-07-07 北京国双科技有限公司 The acquisition methods and device of website propagating influence
TWI626549B (en) * 2017-04-17 2018-06-11 Chunghwa Telecom Co Ltd Method of analyzing a URL to generate a user profile
CN107330718B (en) * 2017-06-09 2021-01-19 晶赞广告(上海)有限公司 Media anti-cheating method and device, storage medium and terminal
CN109492687A (en) * 2018-10-31 2019-03-19 北京字节跳动网络技术有限公司 Method and apparatus for handling information
CN110138599B (en) * 2019-04-24 2020-11-17 北京字节跳动网络技术有限公司 Domain Name System (DNS) query method, device, medium and electronic equipment based on domain name association degree
CN110300027A (en) * 2019-06-29 2019-10-01 西安交通大学 A kind of abnormal login detecting method

Citations (3)

Publication number Priority date Publication date Assignee Title
WO2002001419A1 (en) * 2000-06-28 2002-01-03 Quark, Inc. System and method for providing personalized recommendations
CN101551806A (en) * 2008-04-03 2009-10-07 北京搜狗科技发展有限公司 Personalized website navigation method and system
CN101814083A (en) * 2010-01-08 2010-08-25 上海复歌信息科技有限公司 Automatic webpage classification method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1873657A1 (en) * 2006-06-29 2008-01-02 France Télécom User-profile based web page recommendation system and method
US7836005B2 (en) * 2007-10-16 2010-11-16 Kuo-Hui Chien System and method for automatic generation of user-oriented homepage

Also Published As

Publication number Publication date
CN102316166B (en) 2015-07-08
CN102316166A (en) 2012-01-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 11873246; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 11873246; Country of ref document: EP; Kind code of ref document: A1)
Kind code of ref document: A1