CN102982079A - Method and device for personalized website navigation - Google Patents

Method and device for personalized website navigation

Info

Publication number
CN102982079A
CN102982079A CN2012104262856A CN201210426285A CN102982079A CN 102982079 A CN102982079 A CN 102982079A CN 2012104262856 A CN2012104262856 A CN 2012104262856A CN 201210426285 A CN201210426285 A CN 201210426285A CN 102982079 A CN102982079 A CN 102982079A
Authority
CN
China
Prior art keywords
classification
feature
interest model
feature words
client device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012104262856A
Other languages
Chinese (zh)
Other versions
CN102982079B (en)
Inventor
周浩
邓夏玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd and Qizhi Software Beijing Co Ltd
Priority to CN201210426285.6A
Publication of CN102982079A
Application granted
Publication of CN102982079B
Expired - Fee Related (current legal status)
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a personalized website navigation method and a personalized website navigation device. The method comprises: acquiring group historical behavior data of access events on a plurality of client devices; building a group interest model of the client device visitors according to the group historical behavior data, wherein the group interest model stores category information reflecting the access interest points of the visitors as a group; building an individual interest model of a client device visitor according to the individual historical behavior data of a single client device and the group interest model, wherein the individual interest model stores category information reflecting the access interest points of that individual visitor; and determining, according to the individual interest model, the website categories displayed on a website navigation page. The method and device better meet the browsing needs of client device visitors and thereby improve the utilization rate of the information provided by the navigation website.

Description

Personalized website navigation method and apparatus
Technical field
The present invention relates to the field of browser technology, and in particular to a personalized website navigation method and apparatus.
Background
With the development of computer technology and the steady growth of the Internet user base, more and more users obtain the information they need over the Internet from personal computers. At the same time, the number of websites providing information services keeps growing, the number of web pages increases rapidly every day, and the volume of information on the Internet is growing explosively. Users therefore often need some means, such as a website navigation service, to quickly locate the websites or information that suit their needs within this vast body of information.
A website navigation site gathers a large number of web addresses and classifies them according to certain criteria; its main purpose is to provide users with a website navigation service. With website navigation, users need not remember the addresses of websites: they can reach the sites they want to browse and retrieve information through the links provided on the website navigation page. Some existing website navigation services also provide practical functions such as number lookup, mailbox login, hot news, and search engine entry points, which further facilitate Internet use to some extent. According to statistics, a significant proportion of Internet users currently set a website navigation page as their browser home page, which also indicates that website navigation services do make browsing more convenient in practice and are therefore popular with many users.
However, in existing website navigation services, the navigation content page is mostly a static page of sites recommended manually by the operation and maintenance staff of the navigation product. Although the addresses on the page may cover many categories, the amount of information on the Internet is enormous and growing rapidly, so such navigation content often cannot keep up with the browsing needs of Internet users.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a personalized website navigation method, and a corresponding apparatus, that overcome or at least partially solve the above problems.
According to one aspect of the present invention, a personalized website navigation method is provided, comprising:
Obtaining group historical behavior data of access events on a plurality of client devices;
Establishing a group interest model of the client device accessing parties according to the group historical behavior data, the group interest model storing category information reflecting the access interest points of the accessing parties as a group;
Establishing an individual interest model of a client device accessing party according to the individual historical behavior data of a single client device and the group interest model, the individual interest model storing category information reflecting the access interest points of that individual accessing party;
Determining, according to the individual interest model, the website categories to display on a website navigation page.
Optionally, establishing the individual interest model of the client device accessing party according to the individual historical behavior data of a single client device and the group interest model comprises:
Extracting feature words from the individual historical behavior data;
Classifying each feature word according to the category information stored in the group interest model to obtain several feature categories;
Storing each feature category to obtain the individual interest model.
Optionally, the individual interest model also stores a weight for each feature category, the weight reflecting the individual accessing party's level of interest in that category, and determining the website categories to display on the website navigation page according to the individual interest model comprises:
Sorting the feature categories by their weights, and determining the website categories to display on the website navigation page, and their order, according to the sorting result.
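The weight-based ordering described above can be sketched as follows. This is only an illustrative sketch: the function name `pick_navigation_categories`, the `max_categories` cut-off, the alphabetical tie-break, and the sample weights are all assumptions and do not come from the patent text.

```python
def pick_navigation_categories(individual_model, max_categories=None):
    """Return category names ordered by descending weight.

    individual_model: dict mapping a category name to its weight.
    max_categories: optional cap on how many categories to show
                    (an assumption; the text does not specify a limit).
    """
    # Sort by weight, highest first; break ties alphabetically so the
    # result is deterministic (the tie-break rule is an assumption).
    ranked = sorted(individual_model.items(), key=lambda kv: (-kv[1], kv[0]))
    names = [name for name, _ in ranked]
    return names if max_categories is None else names[:max_categories]


# Hypothetical individual interest model.
model = {"news": 3.0, "sports": 7.5, "music": 1.0, "finance": 7.5}
order = pick_navigation_categories(model, max_categories=3)
print(order)
```

The returned list gives both which categories appear on the navigation page and the order in which they appear.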
Optionally, establishing the individual interest model of the client device accessing party according to the individual historical behavior data of a single client device and the group interest model comprises:
Extracting feature words from the individual historical behavior data, and obtaining the occurrence frequency of each feature word in the individual historical behavior data;
Classifying the feature words according to the category information stored in the group interest model to obtain several feature categories;
Obtaining the weight of each feature category according to the occurrence frequencies of the feature words it contains;
Storing each feature category and its corresponding weight to obtain the individual interest model.
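A minimal sketch of these four steps is given below, assuming the feature words have already been extracted and that the group interest model can be consulted through a simple word-to-category lookup. The lookup table `WORD_TO_CATEGORY`, the function name, and the sample data are hypothetical; the weight here is the plain sum of occurrence frequencies, which is one of the options the text describes.

```python
from collections import Counter

# Hypothetical lookup derived from the group interest model:
# feature word -> category it belongs to.
WORD_TO_CATEGORY = {
    "football": "sports",
    "basketball": "sports",
    "stock": "finance",
    "rock": "music",
}


def build_individual_model(feature_words, word_to_category):
    """Build {category: weight} from one device's extracted feature words."""
    # Occurrence frequency of each feature word in the individual history.
    word_freq = Counter(feature_words)
    weights = {}
    for word, freq in word_freq.items():
        category = word_to_category.get(word)
        if category is None:
            continue  # word not covered by the group model; skipped here
        # Category weight = sum of the frequencies of its feature words.
        weights[category] = weights.get(category, 0) + freq
    return weights


history = ["football", "stock", "football", "basketball", "weather"]
individual_model = build_individual_model(history, WORD_TO_CATEGORY)
print(individual_model)
```

Skipping words that the group model does not cover is a design choice of this sketch; the source does not say how such words are handled.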
Optionally, the categories in the group interest model comprise first-level categories and second-level categories, each first-level category containing a plurality of second-level categories, and the second-level categories are stored in the form of a two-dimensional matrix in which each row corresponds to the second-level categories under one first-level category; classifying the feature words according to the category information stored in the group interest model to obtain several feature categories comprises:
Classifying the feature words according to the second-level category information stored in the group interest model to obtain several second-level feature categories;
Storing each feature category and its corresponding weight comprises:
Storing the weight of each second-level feature category at the corresponding element of the two-dimensional matrix;
Sorting the feature categories by their weights comprises:
Adding up the weights of the second-level feature categories in each row of the two-dimensional matrix to obtain the weight of each first-level feature category;
Sorting the first-level feature categories by their weights, and sorting the second-level feature categories by their weights.
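The row-sum and two-level sort can be illustrated with a small sketch. The category names, the matrix values, and the function name `rank_categories` are assumptions chosen for illustration; the source only fixes the rule that a first-level weight is the sum of the second-level weights in its row.

```python
# Hypothetical labels: one first-level category per matrix row,
# one second-level category per element of that row.
FIRST_LEVEL = ["sports", "finance"]
SECOND_LEVEL = [
    ["football", "basketball", "tennis"],
    ["stock", "fund", "forex"],
]


def rank_categories(weight_matrix, first_level, second_level):
    """Return [(first_level_name, [second_level_names...]), ...] by weight."""
    # First-level weight = sum of the second-level weights in the row.
    row_totals = [sum(row) for row in weight_matrix]
    row_order = sorted(range(len(weight_matrix)), key=lambda i: -row_totals[i])
    ranking = []
    for i in row_order:
        # Within a row, order second-level categories by their own weight.
        col_order = sorted(
            range(len(weight_matrix[i])), key=lambda j: -weight_matrix[i][j]
        )
        ranking.append((first_level[i], [second_level[i][j] for j in col_order]))
    return ranking


weights = [
    [2, 5, 1],  # sports: football, basketball, tennis
    [6, 0, 4],  # finance: stock, fund, forex
]
ranking = rank_categories(weights, FIRST_LEVEL, SECOND_LEVEL)
print(ranking)
```

In this example the finance row sums to 10 and the sports row to 8, so finance is listed first, and within each row the second-level categories follow their individual weights.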
Optionally, obtaining the weight of each feature category according to the occurrence frequencies of the feature words it contains comprises:
Summing, for each feature category, the occurrence frequencies of the feature words it contains to obtain the weight of that feature category.
Optionally, obtaining the weight of each feature category according to the occurrence frequencies of the feature words it contains comprises:
Obtaining hotness information for each feature word from the feature word hotspot information compiled by a search engine server;
Calculating combined frequency information for each feature word from its occurrence frequency and its hotness information;
Summing, for each feature category, the combined frequency information of the feature words it contains to obtain the weight of that feature category.
Optionally, calculating combined frequency information for each feature word from its occurrence frequency and its hotness information comprises:
Multiplying the hotness information of the feature word by a weighting coefficient and adding the result to the occurrence frequency to obtain the combined frequency information of the feature word, wherein the weighting coefficient is less than 1.
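As a worked illustration of this rule, the combined value for a feature word is its occurrence frequency plus the weighting coefficient times its hotness value, with the coefficient below 1. The sketch below assumes hotness is available as a plain number per word and that a word with no hotness entry counts as 0; the coefficient value 0.5, the function names, and the sample numbers are all hypothetical.

```python
def combined_frequency(occurrence, hotness, coefficient=0.5):
    """Occurrence frequency plus a down-weighted hotness value."""
    if not coefficient < 1:
        raise ValueError("the weighting coefficient must be less than 1")
    return occurrence + coefficient * hotness


def category_weight(words, occurrences, hotness, coefficient=0.5):
    """Sum the combined frequencies of the feature words in one category."""
    return sum(
        # Treating a missing hotness value as 0 is an assumption.
        combined_frequency(occurrences[w], hotness.get(w, 0.0), coefficient)
        for w in words
    )


# Hypothetical per-word data for one category.
occurrences = {"football": 4, "basketball": 2}
hotness = {"football": 10.0, "basketball": 2.0}
weight = category_weight(["football", "basketball"], occurrences, hotness)
print(weight)
```

With these numbers, football contributes 4 + 0.5 × 10 = 9 and basketball contributes 2 + 0.5 × 2 = 3. Keeping the coefficient below 1 means a unit of search-engine hotness counts for less than a unit of the individual's own occurrence frequency.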
Optionally, establishing the group interest model of the client device accessing parties according to the group historical behavior data comprises:
Extracting feature words from the group historical behavior data;
Clustering the feature words extracted from the group historical behavior data to obtain a plurality of category labels, and storing the category labels to obtain the group interest model.
Optionally, clustering the feature words to obtain a plurality of category labels comprises:
Normalizing the feature words extracted from the group historical behavior data;
Clustering the normalized feature words to obtain a plurality of category labels.
According to another aspect of the present invention, a personalized website navigation apparatus is provided, comprising:
A data acquisition unit, configured to obtain group historical behavior data of access events on a plurality of client devices;
A group interest model establishing unit, configured to establish a group interest model of the client device accessing parties according to the group historical behavior data, the group interest model storing category information reflecting the access interest points of the accessing parties as a group;
An individual interest model establishing unit, configured to establish an individual interest model of a client device accessing party according to the individual historical behavior data of a single client device and the group interest model, the individual interest model storing category information reflecting the access interest points of that individual accessing party;
A category determining unit, configured to determine, according to the individual interest model, the website categories to display on a website navigation page.
Optionally, the individual interest model establishing unit comprises:
A first feature word extraction subunit, configured to extract feature words from the individual historical behavior data;
A first classification subunit, configured to classify each feature word according to the category information stored in the group interest model to obtain several feature categories;
A first storage subunit, configured to store each feature category to obtain the individual interest model.
Optionally, the individual interest model also stores a weight for each feature category, the weight reflecting the individual accessing party's level of interest in that category, and the category determining unit comprises:
A sorting subunit, configured to sort the feature categories by their weights and to determine the website categories to display on the website navigation page, and their order, according to the sorting result.
Optionally, the individual interest model establishing unit comprises:
A frequency acquisition subunit, configured to extract feature words from the individual historical behavior data and to obtain the occurrence frequency of each feature word in the individual historical behavior data;
A second classification subunit, configured to classify the feature words according to the category information stored in the group interest model to obtain several feature categories;
A weight acquisition subunit, configured to obtain the weight of each feature category according to the occurrence frequencies of the feature words it contains;
A second storage subunit, configured to store each feature category and its corresponding weight to obtain the individual interest model.
Optionally, the categories in the group interest model comprise first-level categories and second-level categories, each first-level category containing a plurality of second-level categories, and the second-level categories are stored in the form of a two-dimensional matrix in which each row corresponds to the second-level categories under one first-level category; the second classification subunit is specifically configured to:
Classify the feature words according to the second-level category information stored in the group interest model to obtain several second-level feature categories;
The second storage subunit is specifically configured to:
Store the weight of each second-level feature category at the corresponding element of the two-dimensional matrix;
The sorting subunit is specifically configured to:
Add up the weights of the second-level feature categories in each row of the two-dimensional matrix to obtain the weight of each first-level feature category;
Sort the first-level feature categories by their weights, and sort the second-level feature categories by their weights.
Optionally, the weight acquisition subunit comprises:
A first accumulation subunit, configured to sum, for each feature category, the occurrence frequencies of the feature words it contains to obtain the weight of that feature category.
Optionally, the weight acquisition subunit comprises:
A hotness acquisition subunit, configured to obtain hotness information for each feature word from the feature word hotspot information compiled by a search engine server;
A combined frequency calculation subunit, configured to calculate combined frequency information for each feature word from its occurrence frequency and its hotness information;
A second accumulation subunit, configured to sum, for each feature category, the combined frequency information of the feature words it contains to obtain the weight of that feature category.
Optionally, the combined frequency calculation subunit is specifically configured to:
Multiply the hotness information of the feature word by a weighting coefficient and add the result to the occurrence frequency to obtain the combined frequency information of the feature word, wherein the weighting coefficient is less than 1.
Optionally, the group interest model establishing unit comprises:
A second feature word extraction subunit, configured to extract feature words from the group historical behavior data;
A clustering subunit, configured to cluster the feature words extracted from the group historical behavior data to obtain a plurality of category labels, and to store the category labels to obtain the group interest model.
Optionally, the clustering subunit is specifically configured to:
Normalize the feature words extracted from the group historical behavior data, and cluster the normalized feature words to obtain a plurality of category labels.
According to the personalized website navigation method and apparatus of the present invention, a group interest model of the client device accessing parties can be established by compiling statistics on group historical behavior data from a plurality of client devices, thereby identifying which categories the accessing parties as a group are interested in. Then, based on the group interest model and the individual historical behavior data from a single client device, an individual interest model can be established that stores the categories each accessing party is interested in. When a navigation page needs to be provided to a given accessing party, the website categories to display on the page can be selected according to that party's individual interest model. The navigation page therefore differs from one accessing party to another, better matches each party's browsing needs, and improves the utilization rate of the information provided on the page.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the invention become more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the invention. Throughout the drawings, the same reference symbols denote the same parts. In the drawings:
Fig. 1 shows a flowchart of a method according to an embodiment of the invention;
Fig. 2 shows a schematic diagram of an apparatus according to an embodiment of the invention;
Fig. 3 shows a schematic diagram of a system according to an embodiment of the invention; and
Fig. 4 shows a schematic diagram of a website navigation page according to an embodiment of the invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope fully conveyed to those skilled in the art.
Referring to Fig. 1, the personalized website navigation method provided by an embodiment of the invention comprises the following steps:
S101: Obtain group historical behavior data of access events on a plurality of client devices;
Each user may correspond to a client device. As the accessing party of the client device, the user may be the person logged in to the device or the person entering input on it. Each accessing party may be assigned a unique identifier so that different accessing parties can be distinguished. The historical data of access events on a plurality of client devices may include, but is not limited to, the historical behavior data produced when a plurality of accessing parties use software such as browsers and input methods on client devices to browse web pages, search, enter text, and so on, as well as information about the documents accessed on the client devices. Examples include the uniform resource locators (URLs) of the web pages an accessing party visits, the keywords an accessing party enters when searching, and the words an accessing party enters through an input method.
The historical behavior data of access events on a plurality of client devices can be obtained in several ways, for example through a browser with a historical behavior data collection function, a browser plug-in with such a function, or other application software with such a function. When an accessing party visits a web page, these programs can collect that party's historical behavior data; specifically, when the accessing party browses a web page with a browser, the programs collect the data after the browser sends its request to the server. In addition, when a user visits a web page, the server providing the page can also collect the accessing party's historical behavior data, such as the behavior data produced when the party clicks, searches, or enters text on pages the server provides.
After the historical behavior data of access events on a plurality of client devices has been obtained, it can also be organized or aggregated, for example by deduplicating repeated web addresses and counting the number of visits to the same URL.
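The deduplication and visit counting mentioned above could look like the following sketch. The function name `summarize_visits` and the sample URLs are hypothetical, and real collected data would likely carry more fields than a bare URL.

```python
from collections import Counter


def summarize_visits(visited_urls):
    """Deduplicate a visit log and count visits per URL."""
    # Number of times each URL was visited.
    counts = Counter(visited_urls)
    # dict.fromkeys keeps the first occurrence of each URL, in order.
    unique_urls = list(dict.fromkeys(visited_urls))
    return unique_urls, counts


# Hypothetical visit log gathered from client devices.
log = [
    "http://a.example/",
    "http://b.example/",
    "http://a.example/",
]
unique_urls, counts = summarize_visits(log)
print(unique_urls, counts["http://a.example/"])
```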
S102: Establish a group interest model of the client device accessing parties according to the group historical behavior data, the group interest model storing category information reflecting the access interest points of the accessing parties as a group;
To disclose the specific implementation of this step more fully, the relevant technical features of the group interest model are introduced first.
The group interest model of the client device accessing parties is a data model reflecting the access interests of the accessing parties as a group, and may contain category information reflecting the group's access interest points. The model can take various forms, and the embodiments of the invention place no restriction on its form. For example, it can be represented as a set; the following is an example of a group interest model represented as a set:
{ news, sports, science and technology, entertainment, automobiles, video, ..., real estate, travel, music, fashion, military affairs, education }
Each element of the set represents one item of category information reflecting a group access interest point, such as news, sports, or science and technology. To make the group interest model convenient to represent and process on computer equipment, the set above can also be abstracted as a data set, for example $\{a_0, a_1, a_2, a_3, a_4, a_5, \ldots, a_i, \ldots\}$. Each data element of this set corresponds to an element of the preceding set: for example, $a_0$ corresponds to the news category, $a_1$ corresponds to the sports category, and so on. When the group interest model is represented as a data set, each element corresponds to one category reflecting a group access interest point, and each element can also be assigned a numerical value. Assigning values to the elements can be regarded as quantifying the group interest model. The value assigned to an element reflects the group's level of interest in the corresponding category and can be obtained from the historical behavior data of access events on the plurality of client devices.
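One way to picture the quantified data set is as a list of numbers aligned with a fixed list of categories. In the sketch below, the value for each category is its share of all recorded hits; that particular formula, the category list, and the hit counts are assumptions for illustration, since the text only says the values are obtained from the historical behavior data.

```python
# Hypothetical category order: index 0 is a_0 (news), index 1 is a_1, ...
CATEGORIES = ["news", "sports", "technology", "entertainment"]


def quantify_group_model(category_hits, categories=CATEGORIES):
    """Return [a_0, a_1, ...]: each category's share of all hits."""
    total = sum(category_hits.get(c, 0) for c in categories)
    if total == 0:
        return [0.0 for _ in categories]
    return [category_hits.get(c, 0) / total for c in categories]


# Hypothetical hit counts aggregated over many client devices.
hits = {"news": 50, "sports": 30, "technology": 20}
group_model = quantify_group_model(hits)
print(dict(zip(CATEGORIES, group_model)))
```

A category with no recorded hits, such as entertainment here, simply receives the value 0.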
The group interest model of the client device accessing parties can also be represented as a two-dimensional matrix. Because a two-dimensional matrix has two dimensions, rows and columns, it can represent the group interest model at a finer granularity, and the relationship between a row or column and the elements in it can convey richer information. The following is an example of a group interest model represented as a two-dimensional matrix:
[Figure BDA00002333307400091: example two-dimensional matrix of category names]
Each element in a row or column of this two-dimensional matrix represents one item of category information reflecting a group access interest point, such as football, stocks, or rock and roll. At the same time, each row can belong to a larger category: for example, the first row of this matrix can be classed as sports and the second row as finance. The matrix can of course be transposed in practice, in which case the elements of each column belong to one larger category. To make the group interest model convenient to represent and process on computer equipment, the group interest model represented as a two-dimensional matrix can likewise be abstracted as a two-dimensional matrix of data, such as:
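$$
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{in} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mj} & \cdots & a_{mn}
\end{pmatrix}
$$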
In this two-dimensional matrix, each data element can correspond to one category of group access interest point. For example, the group interest model in the preceding matrix example can be represented by a data matrix with 4 rows and 4 columns, in which $a_{11}$ corresponds to the football category in the example, $a_{22}$ corresponds to the stock category, and so on. When the group interest model is represented as a two-dimensional matrix of data, each element of the matrix corresponds to one category reflecting a group access interest point, and each element can also be assigned a numerical value. Assigning values to the matrix elements can likewise be regarded as quantifying the group interest model. The value assigned to an element again reflects the group's level of interest in the corresponding category and can be obtained from the historical behavior data of access events on the plurality of client devices.
More than introduced the correlation technique feature of the group interest model of client device access side in the embodiment of the invention, apparent, the group interest model of the client device access side that represents with the data acquisition form, group interest model with the client device access side that represents with the two-dimensional matrix form, it all is the example of the group interest model of expression client device access side, in actual applications, can also adopt according to actual needs other form to represent the group interest model of client device access side, the embodiment of the invention is to this not restriction.How the below introduces according to the historical behavior data of many stylobates in the Access Events of client device, sets up the group interest model of client device access side.
As mentioned above, the group historical behavior data of access events based on multiple client devices can include, but is not limited to, the historical behavior data generated when multiple client device accessing parties use software such as browsers and input methods on client devices for web browsing, searching, text input, and other access activities, as well as document information on the accessed client devices. After the group historical behavior data has been obtained, different methods can be used to establish the group interest model for different kinds of group historical behavior data. For example, feature words can first be extracted from the group historical behavior data and then clustered to obtain multiple category labels; saving the obtained category labels yields the group interest model. When extracting feature words from the group historical behavior data, the search keywords that a client device accessing party enters into a search engine can be used as feature words, or content keywords can be extracted from the visited web pages as feature words. The following describes the process in detail, taking as an example the historical behavior data generated when client device accessing parties browse web pages.
For the historical behavior data generated when a client device accessing party browses web pages on a client device, the URLs of the browsed web pages can first be obtained from the data. According to each URL, the corresponding web page content is requested from the corresponding web server, and the content is then analyzed to extract the feature words of the page. This process can also be carried out while the accessing party is visiting the page; that is, the content being browsed is analyzed during browsing to extract the feature words of the page. After the feature words have been extracted from the content of the visited web pages, they can be clustered to obtain multiple category labels. It should be noted that clustering is the process of dividing a set of objects into multiple categories composed of similar objects. It is a method of obtaining the category of each object in a target set, and it is suitable for cases where the categories of the objects in the set are not known in advance. In the embodiments of the present invention, the feature words extracted from the group historical behavior data of access events based on multiple client devices can be clustered to obtain the category labels of those feature words. For example, suppose the following feature words have been extracted from the group historical behavior data of access events based on multiple client devices:
Yao Ming, Liu Xiang, Sun Yang, Guo Jingjing, ..., European Cup, Barcelona, Chinese Super League, Balotelli, ...
Using clustering, the feature words related to athletes, such as "Yao Ming", "Liu Xiang", "Sun Yang", and "Guo Jingjing", can be clustered into an "athlete" category, and the feature words related to football, such as "European Cup", "Barcelona", "Chinese Super League", and "Balotelli", can be clustered into a "football" category. By analogy, clustering yields the category labels corresponding to all of the above feature words; in other words, the feature words are clustered and multiple category labels are obtained. The obtained category labels can then be saved in some representation structure, for example in the aforementioned set or two-dimensional matrix, which yields the group interest model.
In addition, a very large number of feature words may be extracted from the group historical behavior data, and some of them may have the same or similar meanings but be expressed differently. Therefore, the feature words obtained from the group historical behavior data can first be normalized, and the normalized feature words can then be clustered to obtain multiple category labels. Normalizing the feature words means unifying the feature words that express the same or similar meanings. For example, if the obtained feature words include "Barcelona", "Barça", "Universe Team", and "FCB", these can all be normalized to "Barcelona". After all feature words have been normalized, they can be clustered to obtain multiple category labels. For the specific normalization method, reference can be made to implementations in the prior art, which are not described in detail here.
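The normalization step described above can be illustrated with a minimal sketch. The text leaves the concrete method to the prior art, so the alias table `ALIASES`, the function name `normalize`, and the spelling of the variants are all assumptions made for illustration only.

```python
# Hypothetical alias table: each variant spelling maps to one canonical feature word.
ALIASES = {
    "Barça": "Barcelona",
    "Universe Team": "Barcelona",
    "FCB": "Barcelona",
}


def normalize(feature_words):
    """Replace every known variant with its canonical form; other words pass through."""
    return [ALIASES.get(word, word) for word in feature_words]


raw_words = ["Barcelona", "Barça", "Universe Team", "FCB", "Yao Ming"]
normalized = normalize(raw_words)

# After normalization, duplicates collapse, so clustering sees fewer distinct words.
unique_words = sorted(set(normalized))
```

A lookup table is only one possible approach; it keeps the example deterministic, whereas a production system might derive the aliases from data.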
When the group interest model of the client device accessing parties is represented as a two-dimensional matrix, the category labels obtained by the first clustering can be clustered again. The group interest model in two-dimensional matrix form is obtained from the results of the two rounds of clustering, which also determine how the category labels are distributed in the model. In a specific application, the category labels obtained by the first clustering can serve as the second-level categories in the group interest model, and the category labels obtained by the second clustering can serve as the first-level categories. For example, suppose the first clustering yields the following category labels:
Football, basketball, tennis, swimming, fund, stock, futures, gold, R&B, hip-hop, classical, rock, cat, dog, guinea pig, snake.
At this point, the above category labels can be used as the second-level categories in the group interest model, and they can then be clustered again, for example:
"football, basketball, tennis, swimming" can be clustered into the "sports" category;
"fund, stock, futures, gold" can be clustered into the "finance" category;
"R&B, hip-hop, classical, rock" can be clustered into the "music" category;
"cat, dog, guinea pig, snake" can be clustered into the "animals" category.
The category labels obtained by the second clustering, such as "sports, finance, music, animals", can serve as the first-level categories in the group interest model of the client device accessing parties. The second-level categories that belong to the same first-level category can then be saved in the same row or the same column of the two-dimensional matrix, which yields the group interest model in two-dimensional matrix form. For example, the group interest model obtained from the above category labels can be:
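$$
\begin{bmatrix}
\text{football} & \text{basketball} & \text{tennis} & \text{swimming} \\
\text{fund} & \text{stock} & \text{futures} & \text{gold} \\
\text{R\&B} & \text{hip-hop} & \text{classical} & \text{rock} \\
\text{cat} & \text{dog} & \text{guinea pig} & \text{snake}
\end{bmatrix}
$$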
As can be seen, the second-level categories in the same row of the two-dimensional matrix belong to one first-level category. In practical applications, the second-level categories that belong to one first-level category can instead be saved in the same column, so that the second-level categories in the same column belong to one first-level category. It should be noted that in some cases the number of second-level categories differs from one first-level category to another. The numbers of rows and columns of the matrix can then be determined by the number of first-level categories and by the number of second-level categories under the first-level category that contains the most of them. Some elements of the matrix may then be vacant, and the vacant positions can be filled with certain data, such as the number 0. The step of clustering the first-round category labels again is optional and can be decided according to specific needs. When the group interest model is represented in another form, for example as a set, a single round of clustering can be enough to obtain the category labels. Three or more rounds of clustering can also be performed to obtain three levels of category labels, for example when the group interest model is represented as a three-dimensional matrix. The specific method is similar to the two-dimensional case described above and is not repeated here.
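The row and column rule above, including the zero padding of vacant positions, can be sketched as follows. The function name `build_matrix` and the dictionary layout are assumptions for illustration; the "animals" row is deliberately shortened here (without "snake") so that the padding becomes visible.

```python
def build_matrix(hierarchy, fill=0):
    """Return (first_level_names, rows).

    Each row holds the second-level labels of one first-level category,
    padded with `fill` up to the length of the longest row.
    """
    width = max(len(children) for children in hierarchy.values())
    names = list(hierarchy)
    rows = [
        list(hierarchy[name]) + [fill] * (width - len(hierarchy[name]))
        for name in names
    ]
    return names, rows


# First-level category -> second-level categories (result of the two clustering rounds).
hierarchy = {
    "sports": ["football", "basketball", "tennis", "swimming"],
    "finance": ["fund", "stock", "futures", "gold"],
    "music": ["R&B", "hip-hop", "classical", "rock"],
    "animals": ["cat", "dog", "guinea pig"],  # one label fewer, to show padding
}

names, matrix = build_matrix(hierarchy)
```

The number of rows equals the number of first-level categories, and the number of columns equals the size of the largest first-level category, as the paragraph describes.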
As mentioned above, the group interest model established from the historical behavior data of the client device accessing parties can be abstracted into a form that is easier for computer equipment to represent and process. For example, the group interest model represented as a set can be abstracted as:
$$\{a_0, a_1, a_2, a_3, a_4, a_5, \ldots, a_i, \ldots\}$$
The group interest model represented as a two-dimensional matrix can be abstracted as:
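$$
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{in} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mj} & \cdots & a_{mn}
\end{bmatrix}
$$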
Here, each element of the group interest model can correspond to one category label of the group access interest points of the client device accessing parties.
S103: Establish an individual interest model of a client device accessing party according to the individual historical behavior data of a single client device and the group interest model, wherein the individual interest model stores category information reflecting the individual access interest points of the client device accessing party;
After the group interest model has been established, an individual interest model can be established on that basis for each single client device, using that device's individual historical behavior data, so that the navigation websites can be personalized. Specifically, when establishing an individual interest model, feature words can likewise be extracted from the individual historical behavior data (the extraction method is similar to that used for the group historical behavior data and is not repeated here). These feature words are then classified according to the category labels in the group interest model, and the categories to which they belong can be regarded as feature categories. For example, suppose the group interest model includes categories such as news, sports, technology, entertainment, automobiles, and video, and the feature words extracted from the individual historical behavior data of a certain client device include "Yao Ming", "Barcelona", "Golden Eagle TV Art Festival", and "Kangxi Lai Le". Then "Yao Ming" and "Barcelona" can be assigned to the "sports" category, and "Golden Eagle TV Art Festival" and "Kangxi Lai Le" to the "entertainment" category. Accordingly, "sports" and "entertainment" become feature categories of the current client device. In other words, it is determined which categories of the group interest model have feature words occurring in the data extracted from the current single client device, and those categories are taken as the feature categories of that device. If the feature words extracted from the historical behavior data of the current device contain none belonging to categories such as "automobiles" or "technology", those categories do not become feature categories of the device.
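Deriving the feature categories of a single device from the group model can be sketched as below. The word-to-category lookup `WORD_TO_CATEGORY` is a hypothetical stand-in for whatever classifier an implementation would use, and the English spellings of the feature words are assumed translations.

```python
# Categories stored in the group interest model (set form).
GROUP_CATEGORIES = ["news", "sports", "technology", "entertainment", "automobiles", "video"]

# Hypothetical lookup from feature word to group category.
WORD_TO_CATEGORY = {
    "Yao Ming": "sports",
    "Barcelona": "sports",
    "Golden Eagle TV Art Festival": "entertainment",
    "Kangxi Lai Le": "entertainment",
}


def feature_categories(individual_words):
    """Return the group categories that have at least one feature word
    in this device's data, in the order used by the group model."""
    hit = {WORD_TO_CATEGORY[w] for w in individual_words if w in WORD_TO_CATEGORY}
    return [category for category in GROUP_CATEGORIES if category in hit]


words = ["Yao Ming", "Barcelona", "Golden Eagle TV Art Festival", "Kangxi Lai Le"]
categories = feature_categories(words)
```

Categories such as "automobiles" and "technology" are absent from the result because no extracted feature word maps to them, so the individual model is a subset of the group model.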
As can be seen, when the category information stored in the group interest model is represented as a set, the category information stored in an individual interest model established for a single client device is equivalent to a subset of the group interest model. That is, the categories stored in the individual interest model are a part of the categories stored in the group interest model, and in some special cases may be all of them. When the category information stored in the group interest model is represented as a two-dimensional matrix, the feature category information stored in the individual interest model can also be represented as a two-dimensional matrix, except that some elements of the group interest model may not appear in the individual interest model. The elements that do not appear can be replaced with a value such as 0 to preserve the structure of the two-dimensional matrix. For example, suppose the group interest model is the two-dimensional matrix of sixteen second-level categories from the earlier example, with one row each for sports, finance, music, and animals.
Suppose further that, according to the feature words extracted from the individual historical behavior data of a certain client device accessing party, it is determined that the data contains only feature words of the categories football, basketball, swimming, fund, stock, R&B, hip-hop, rock, guinea pig, and snake, and contains no feature words of the categories tennis, futures, gold, classical, cat, or dog. The individual interest model of this accessing party can then be expressed as:
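$$
\begin{bmatrix}
\text{football} & \text{basketball} & 0 & \text{swimming} \\
\text{fund} & \text{stock} & 0 & 0 \\
\text{R\&B} & \text{hip-hop} & 0 & \text{rock} \\
0 & 0 & \text{guinea pig} & \text{snake}
\end{bmatrix}
$$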
When the individual interest model of an accessing party is represented as a two-dimensional matrix in this way, each row of the matrix can likewise hold the second-level categories under one first-level category. For example, in the above matrix the first-level category corresponding to the first row is sports, that of the second row is finance, that of the third row is music, and so on.
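The relationship between the two matrices can be sketched as a masking step: every group category that was not observed for the device is replaced with 0, so the individual matrix keeps the shape of the group matrix. The names `GROUP_MODEL` and `individual_model` are assumptions for illustration.

```python
GROUP_MODEL = [
    ["football", "basketball", "tennis", "swimming"],
    ["fund", "stock", "futures", "gold"],
    ["R&B", "hip-hop", "classical", "rock"],
    ["cat", "dog", "guinea pig", "snake"],
]


def individual_model(group_model, observed):
    """Keep a category label where the device has matching feature words, else put 0."""
    return [[label if label in observed else 0 for label in row] for row in group_model]


# Categories for which feature words were found in this device's history.
observed = {
    "football", "basketball", "swimming", "fund", "stock",
    "R&B", "hip-hop", "rock", "guinea pig", "snake",
}

model = individual_model(GROUP_MODEL, observed)
```

Because the positions are preserved, row `i` of the individual model still corresponds to the same first-level category as row `i` of the group model.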
When the website categories to be displayed in the navigation page are selected according to the above individual interest model, the feature category information contained in the individual interest model can be used as the website categories to be displayed. For example, if the individual interest model of an accessing party contains the feature categories "football", "stock", and "rock", these website categories can be displayed in the navigation page shown on that client device.
For a given user, the information accessed may fall into many categories, but the user is not necessarily interested in every one of them, and even among the categories of interest the degree of interest may differ. Therefore, to reflect the user's personal interests more accurately, each element in the individual interest model can also be assigned a numerical value; that is, the individual interest model of a single client device accessing party is quantified. The values assigned to the elements can be obtained from the individual historical behavior data. The following describes how to quantify the individual interest model according to the historical behavior data, and how to determine the website categories displayed in the website navigation page according to the category information stored in the individual interest model.
Specifically, when the weight of each feature category is obtained from the individual historical behavior data, the occurrence frequency of each feature word can be recorded while the feature words are being extracted. The weight of each feature category is obtained from the occurrence frequencies of the feature words under that category, and the weights corresponding to the multiple feature categories are then saved to obtain the individual interest model of the client device accessing party. For example, suppose the feature categories of interest to an accessing party are football, basketball, swimming, fund, stock, R&B, hip-hop, rock, guinea pig, and snake. If only the feature category information is represented and the weights are not considered, the model can be expressed as a two-dimensional matrix:
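$$
\begin{bmatrix}
\text{football} & \text{basketball} & 0 & \text{swimming} \\
\text{fund} & \text{stock} & 0 & 0 \\
\text{R\&B} & \text{hip-hop} & 0 & \text{rock} \\
0 & 0 & \text{guinea pig} & \text{snake}
\end{bmatrix}
$$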
Each feature category can correspond to several feature words. Using the occurrence frequency of each feature word, recorded when the feature words were extracted from the individual historical behavior data, the weight of a category can be determined from the frequencies of the feature words under it. Specifically, the occurrence frequencies of the feature words under each feature category can be added together, and the sum used as the weight of that category. Adding up the frequencies of the feature words under each feature category of the above matrix yields a quantified individual interest model, for example:
$$
\begin{bmatrix}
501 & 23 & 0 & 239 \\
200 & 209 & 0 & 0 \\
300 & 21 & 0 & 211 \\
0 & 0 & 600 & 586
\end{bmatrix}
$$
As this matrix shows, the accessing party is probably most interested in information about guinea pigs and snakes, followed by football, swimming, and so on. In short, the matrix reflects the accessing party's degree of interest in the different feature categories.
The following example shows how the weight of a feature category is obtained from the occurrence frequencies of its feature words. Suppose the "football" category in the above individual interest model corresponds to the following feature words:
"European Cup", "Barcelona", "Chinese Super League", "Balotelli".
Suppose the following information is counted from the individual historical behavior data of this accessing party: the feature word "European Cup" occurs 240 times, "Barcelona" occurs 200 times, "Chinese Super League" occurs 20 times, and "Balotelli" occurs 41 times. The sum of these frequencies, 501, can then be used as the weight of the "football" category. The weights of the other categories can be obtained in the same way and are not described again here. In some cases the occurrence frequencies obtained from the historical data are large numbers. To make the data easier for computer equipment to represent and process, the frequencies can be processed before the category weight is determined, for example by dividing each frequency by a specific value and rounding the result. Taking the "football" category in the above individual interest model as an example again, suppose it corresponds to the following feature words:
"European Cup", "Barcelona", "Chinese Super League", "Balotelli".
Suppose "European Cup" occurs 2,400,000 times, "Barcelona" occurs 2,000,000 times, "Chinese Super League" occurs 200,000 times, and "Balotelli" occurs 410,000 times. Each of these frequencies can be divided by 10000 and rounded, and the results added together, giving 501 as the weight of the "football" category. A beneficial effect of this way of determining category weights is the following. If an individual accessing party has accessed the feature words under some category only rarely, for example only 100 times, dividing by the specific value gives a small number, such as 100/10000 = 0.01, which becomes zero after rounding when it is saved in the individual interest model. The resulting individual interest model reduces the data processing load on the computer equipment, and it also ensures that the occurrence frequencies used to determine the category weights are meaningful.
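Both variants of the weight calculation, the plain sum and the sum after dividing by a fixed value and rounding, can be sketched in a few lines. The function name `category_weight` and the `divisor` parameter are assumptions; the text does not say which rounding rule is meant, so Python's built-in `round` is used here as one possible choice.

```python
def category_weight(frequencies, divisor=1):
    """Weight of one feature category: each word frequency is divided by
    `divisor`, rounded to an integer, and the results are summed."""
    # Assumption: round() (round-half-to-even) stands in for the unspecified rounding rule.
    return sum(round(freq / divisor) for freq in frequencies.values())


football_small = {
    "European Cup": 240,
    "Barcelona": 200,
    "Chinese Super League": 20,
    "Balotelli": 41,
}
football_large = {
    "European Cup": 2_400_000,
    "Barcelona": 2_000_000,
    "Chinese Super League": 200_000,
    "Balotelli": 410_000,
}

weight_small = category_weight(football_small)          # 240 + 200 + 20 + 41
weight_large = category_weight(football_large, 10_000)  # same sum after scaling
weight_rare = category_weight({"rarely visited word": 100}, 10_000)  # 0.01 rounds to 0
```

The last call shows the filtering effect mentioned in the text: a category with very few accesses ends up with weight zero.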
In an individual interest model represented as a two-dimensional matrix, the feature categories can include first-level feature categories and second-level feature categories, with each first-level feature category containing multiple second-level feature categories. When the feature categories and their weights are saved, the weights of the second-level feature categories can be stored in a two-dimensional matrix in which each row corresponds to the second-level feature categories under one first-level feature category. The weight of each first-level feature category can then be determined from the weights of its second-level feature categories. For example, in the aforementioned individual interest model, the weights of the second-level feature categories in each row can be added together to obtain the weight of the corresponding first-level feature category. The weight of the sports category represented by the first row can be the sum of the weights of the second-level feature categories in the first row, that is:
(501 + 23 + 0 + 239).
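The row-sum rule can be sketched with the quantified 4 by 4 matrix given earlier in this description, in which the third entry of the first row is 0. The names `FIRST_LEVEL` and `WEIGHTS` are assumptions, and the label "animals" for the fourth row follows the earlier clustering example.

```python
# First-level category of each row, in matrix order.
FIRST_LEVEL = ["sports", "finance", "music", "animals"]

# Second-level weights from the quantified individual interest model.
WEIGHTS = [
    [501, 23, 0, 239],
    [200, 209, 0, 0],
    [300, 21, 0, 211],
    [0, 0, 600, 586],
]

# Weight of a first-level category = sum of the second-level weights in its row.
first_level_weights = {name: sum(row) for name, row in zip(FIRST_LEVEL, WEIGHTS)}
```

Summation is only the option named in the text; other aggregations (for example a maximum) would fit the same structure.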
In addition, the weight of each feature category can be determined in combination with the hot-topic information about feature words collected by a search engine server. Specifically, the hotness information of each feature word is obtained from the hot-topic information collected by the search engine server; the composite frequency of each feature word is then calculated from its occurrence frequency and its hotness information; finally, the composite frequencies of the feature words under each feature category are added up to obtain the weight of that feature category.
A hot topic is a current news item or piece of information that the general public is following or finds popular; it can also be a word with a relatively high web search volume, such as "Beijing Auto Show", "London Olympic Games", or "the great earthquake in Japan". Hot topics can be obtained by crawling search engine data and the search and access records of one's own servers, and also from the hot-word lists published by some websites. Besides the hot topics themselves, the hot-topic information about feature words can include related information such as the rank and the number of visits of each hot topic. This information reflects how "hot" the topic is, that is, its hotness information. The hotness information can be used together with the occurrence frequency of a feature word to determine the composite frequency of that feature word, and the weight of the category to which the feature words belong can be determined from the composite frequencies of the feature words under that category. Taking the "football" category in the above individual interest model as an example, suppose it corresponds to the following feature words:
"European Cup", "Barcelona", "Chinese Super League", "Balotelli".
Suppose the occurrence frequencies of these feature words are 2,400,000, 2,000,000, 200,000, and 410,000 respectively. Suppose also that the feature word "European Cup", as a current hot topic, has 22,000,000 visits and "Balotelli", as a current hot topic, has 1,500,000 visits, whereas "Barcelona" and "Chinese Super League" have no corresponding hot-topic visit information. The hotness information can then be used together with the occurrence frequencies to determine the composite frequency of each feature word, giving, in order:
(2,400,000 + 22,000,000), 2,000,000, 200,000, (410,000 + 1,500,000).
The weight of each feature category can then be determined from the composite frequencies of its feature words; for example, the composite frequencies of the above feature words can be added together and used as the weight of the "football" category. That is: weight of the football category = (2,400,000 + 22,000,000) + 2,000,000 + 200,000 + (410,000 + 1,500,000). Here too, the composite frequency of each feature word can first be divided by a specific value before it is used to determine the weight of the feature category, to make the data easier for computer equipment to represent and process.
In addition, the hotness information of a feature word can be multiplied by a weighting coefficient and then added to the occurrence frequency to obtain the composite frequency of the feature word, where the weighting coefficient is less than 1. In this way, the influence of the hotness information on the category weight can be controlled flexibly, which makes the method provided by the embodiments of the present invention more adaptable. Taking the above calculation of the football category weight as an example, and using 0.1 as the weighting coefficient, the result is: weight of the football category = (2,400,000 + 22,000,000 × 0.1) + 2,000,000 + 200,000 + (410,000 + 1,500,000 × 0.1). By adjusting the value of the weighting coefficient as needed, the influence of the hotness information on the category weight can be tuned.
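The composite-frequency calculation, with and without a weighting coefficient, can be sketched as follows. The function names, the tuple layout `(occurrence, hotness)`, and the use of visit counts as the hotness measure are assumptions for illustration; a feature word without hot-topic information is given a hotness of 0.

```python
def composite_frequency(occurrence, hotness=0, coefficient=1.0):
    """Occurrence frequency plus the (optionally down-weighted) hotness value."""
    return occurrence + coefficient * hotness


def category_weight(words, coefficient=1.0):
    """Sum of the composite frequencies of all feature words in one category."""
    return sum(
        composite_frequency(occurrence, hotness, coefficient)
        for occurrence, hotness in words.values()
    )


# feature word -> (occurrence frequency, hot-topic visits)
football = {
    "European Cup": (2_400_000, 22_000_000),
    "Barcelona": (2_000_000, 0),
    "Chinese Super League": (200_000, 0),
    "Balotelli": (410_000, 1_500_000),
}

unweighted = category_weight(football)                 # hotness counted in full
damped = category_weight(football, coefficient=0.1)    # hotness scaled by 0.1
```

With a coefficient of 0.1 the hot-topic visits no longer dominate the weight, which illustrates how the coefficient controls the influence of hotness on the category weight.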
S104: Determine, according to the individual interest model, the website categories displayed in the website navigation page.
After the individual interest model of each accessing party has been determined, the website categories displayed in each accessing party's website navigation page can be determined according to that model. For example, if the individual interest model stores only the feature category information and does not include the weights of the feature categories, these website categories can simply be displayed in the page. If the model also stores the weight of each feature category, the weights can additionally be used to determine which website categories are displayed in the website navigation page and in what order. In a specific implementation, the feature categories can be sorted by weight, and the website categories displayed in the accessing party's website navigation page, together with their order, can be determined from the sorted result. For example, the top 10 categories in the sorted result are displayed in the website navigation page, and their order in the navigation page follows their order in the sorted result.
In an individual interest model represented as a two-dimensional matrix, sorting the categories by weight can mean sorting the first-level feature categories by their weights and sorting the second-level feature categories under each first-level feature category by their weights. The website categories displayed in the website navigation page, and their order, are then determined from the sorted result.
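The two-level sort can be sketched as below, using the example weights from the quantified matrix earlier in this description. The function `rank_categories`, the nested-dictionary layout, the row-sum weight for first-level categories, and the choice to drop second-level categories with weight 0 are all assumptions for illustration.

```python
def rank_categories(model, top_n=None):
    """model maps first-level category -> {second-level category: weight}.

    Returns a list of (first_level, [second_level, ...]) pairs, with first-level
    categories ordered by total weight and second-level categories by own weight.
    """
    ranked_first = sorted(
        model.items(), key=lambda item: sum(item[1].values()), reverse=True
    )
    result = []
    for first_level, children in ranked_first:
        ordered = sorted(children.items(), key=lambda kv: kv[1], reverse=True)
        # Assumption: categories with weight 0 are not shown at all.
        result.append((first_level, [name for name, weight in ordered if weight > 0]))
    return result[:top_n]


model = {
    "sports": {"football": 501, "basketball": 23, "tennis": 0, "swimming": 239},
    "finance": {"fund": 200, "stock": 209, "futures": 0, "gold": 0},
    "music": {"R&B": 300, "hip-hop": 21, "classical": 0, "rock": 211},
    "animals": {"cat": 0, "dog": 0, "guinea pig": 600, "snake": 586},
}

ranking = rank_categories(model)
top_two = rank_categories(model, top_n=2)
```

Passing `top_n` limits the number of first-level categories, in the same spirit as the "top 10" example in the text.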
It should be noted that the purpose of the embodiments of the present invention is to determine which website categories need to be presented in the website navigation page. Which websites are presented under each website category can be determined according to actual needs and is not limited in the embodiments of the present invention. For example, the website entries presented under each website category can be determined by manual curation. Alternatively, based on the data collected as described above, the visit frequency of each website under each website category can be counted and sorted, and the top-ranked websites can be used as the entries presented under the corresponding website category. After the website entries to be presented under each website category have been determined, the data package of the website navigation page can be generated. When a request from a client to browse the website navigation page is received, the data package is returned to the client, which parses and displays it. In this way, the website navigation page generated in the manner provided by the embodiments of the present invention is presented to the user.
It should also be noted that, in practical applications, the individual interest model of an accessing party could be established when that accessing party's request to access the navigation website is received, with the navigation page data returned accordingly. However, the amount of computation involved may be large. If the computation starts only after the access request is received, the response may be delayed, and the accessing party would have to wait a long time before the personalized website navigation page is displayed. To avoid this delay, the group interest model and the individual interest models can be established offline, and the individual interest model of each accessing party can be saved on the server after it has been established. When the server saves the individual interest models, it also needs to save the mapping between each individual interest model and the identifier of the client device accessing party. When an access request from a client device accessing party is received, the server can then use this mapping to find that accessing party's individual interest model and return the data of the navigation page. The identifier of a client device accessing party can be, for example, the account information the accessing party registered with a system such as the navigation website; if the accessing party is not registered, it can be represented by information such as the IP address of the client device accessing party.
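The server-side mapping between accessing-party identifiers and precomputed models can be sketched with an in-memory store. The class `ModelStore`, its method names, and the rule of trying the account first and the IP address second are assumptions for illustration; a real deployment would presumably use persistent storage.

```python
class ModelStore:
    """Keeps individual interest models that were computed offline,
    keyed by an identifier of the client device accessing party."""

    def __init__(self):
        self._models = {}

    def save(self, identifier, model):
        """Store (or overwrite) the model for an account name or IP address."""
        self._models[identifier] = model

    def lookup(self, account=None, ip_address=None):
        """Return the stored model, preferring the account over the IP address;
        return None if nothing is stored for either identifier."""
        for key in (account, ip_address):
            if key is not None and key in self._models:
                return self._models[key]
        return None


store = ModelStore()
store.save("alice", {"football": 501, "stock": 209})
store.save("203.0.113.7", {"rock": 211})

registered = store.lookup(account="alice", ip_address="203.0.113.7")
anonymous = store.lookup(ip_address="203.0.113.7")
missing = store.lookup(account="nobody")
```

Because the lookup is a simple dictionary access, the expensive model construction stays offline and the request path only performs a retrieval.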
As can be seen from the above description, in the website navigation method provided by the embodiment of the invention, the category information stored for each category in the group interest model, such as the weight of each category, can objectively reflect the client device accessing party's degree of interest in each category, so the website categories determined from that category information for display on the website navigation page better match the access needs of the client device accessing party.
Corresponding to the personalized website navigation method provided by the embodiment of the invention, the embodiment of the invention also provides a personalized website navigation device. Referring to Fig. 2, the device may include:
A data acquisition unit 201, configured to acquire group historical behavior data based on access events of multiple client devices;
A group interest model building unit 202, configured to build a group interest model of client device accessing parties from the group historical behavior data, the group interest model storing category information that reflects the access interest points of the client device accessing parties as a group;
An individual interest model building unit 203, configured to build an individual interest model of a client device accessing party from the individual historical behavior data of a single client device and the group interest model, the individual interest model storing category information that reflects the individual access interest points of the client device accessing party;
A category determining unit 204, configured to determine, according to the individual interest model, the website categories to be displayed on the website navigation page.
In one implementation, the individual interest model may store only which categories the client device accessing party is interested in. In this case, the individual interest model building unit 203 may specifically include:
A first feature word extraction subunit, configured to extract feature words from the individual historical behavior data;
A first classification subunit, configured to classify each feature word according to the category information stored in the group interest model, obtaining several feature categories;
A first saving subunit, configured to save each feature category, obtaining the individual interest model.
Alternatively, in another implementation, in order to reflect both the interest points of the client device accessing party and the degree of interest in each of them, the individual interest model may also store a weight for each feature category, the weight reflecting the individual accessing party's degree of interest in each category. In this case, the category determining unit 204 may specifically include:
A sorting subunit, configured to sort the feature categories by their weights and to determine, from the sorting result, the website categories to be displayed on the website navigation page and their order.
Where the weight of each category needs to be stored in the individual interest model, the weights can be obtained by collecting statistics on the individual historical behavior data of the single client device accessing party. Specifically, the individual interest model building unit 203 may include:
A frequency obtaining subunit, configured to extract feature words from the individual historical behavior data and to obtain the occurrence frequency of each feature word in the individual historical behavior data;
A second classification subunit, configured to classify the feature words according to the category information stored in the group interest model, obtaining several feature categories; the second-level feature categories so obtained are the categories in which the single client device accessing party is currently interested;
A weight obtaining subunit, configured to obtain the weight of each feature category from the occurrence frequencies of the feature words that the category contains;
A second saving subunit, configured to save each feature category and its weight, obtaining the individual interest model.
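The four subunits above can be sketched as one small Python function. This is only an illustration under stated assumptions: the group interest model is represented as a plain word-to-category dictionary (`word_to_category`, a hypothetical name), feature word extraction is assumed to have happened already, and the weight of a category is taken to be the sum of the occurrence frequencies of its feature words, which is one of the options the text describes.

```python
from collections import Counter
from typing import Dict, Iterable


def build_individual_model(
    feature_words: Iterable[str],
    word_to_category: Dict[str, str],
) -> Dict[str, int]:
    """Return {feature category: weight} for one client device accessing party."""
    # Frequency obtaining: count how often each feature word occurs.
    frequencies = Counter(feature_words)

    weights: Dict[str, int] = {}
    for word, count in frequencies.items():
        # Classification: look the word up in the group interest model.
        category = word_to_category.get(word)
        if category is None:
            continue  # words the group model does not know are ignored (assumption)
        # Weight obtaining: accumulate frequencies per category.
        weights[category] = weights.get(category, 0) + count

    # Saving: the returned dictionary stands in for the stored individual model.
    return weights


group_model = {"movie": "video", "trailer": "video", "game": "games"}
history_words = ["movie", "movie", "game", "trailer", "unknown"]
model = build_individual_model(history_words, group_model)
```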
In practical applications, the categories in the group interest model may include first-level categories and second-level categories, with multiple second-level categories under each first-level category, and the second-level categories may be stored in the form of a two-dimensional matrix in which each row corresponds to the second-level categories under one first-level category. In this case, the second classification subunit may specifically be configured to:
Classify the feature words according to the second-level category information stored in the group interest model, obtaining several second-level feature categories;
The second saving subunit is specifically configured to:
Save the weight of each second-level feature category at the corresponding element of the two-dimensional matrix;
The sorting subunit is specifically configured to:
Add up the weights of the second-level feature categories in each row of the two-dimensional matrix, obtaining the weight of each first-level feature category;
Sort the first-level feature categories by their weights, and sort the second-level feature categories by the weights of the second-level feature categories.
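The row-sum and two-level sorting just described can be sketched as follows. The list-of-lists layout, the category names, and the function name `rank_categories` are assumptions for illustration; the text only requires that each matrix row hold the second-level categories of one first-level category.

```python
from typing import List, Sequence, Tuple


def rank_categories(
    first_level: Sequence[str],
    second_level: Sequence[Sequence[str]],
    weights: Sequence[Sequence[float]],
) -> List[Tuple[str, List[str]]]:
    """Order first-level categories by row sum, and second-level ones by weight."""
    # Weight of a first-level category = sum of the weights in its row.
    row_totals = [sum(row) for row in weights]
    row_order = sorted(range(len(first_level)), key=lambda i: row_totals[i], reverse=True)

    ranked = []
    for i in row_order:
        # Within a row, sort second-level categories by their own weights.
        pairs = sorted(zip(second_level[i], weights[i]), key=lambda p: p[1], reverse=True)
        ranked.append((first_level[i], [name for name, _ in pairs]))
    return ranked


ranking = rank_categories(
    ["video", "shopping"],
    [["movies", "tv"], ["clothes", "books"]],
    [[1, 2], [5, 1]],
)
```

In this example the "shopping" row sums to 6 and the "video" row to 3, so "shopping" is listed first, and inside "video" the heavier "tv" precedes "movies".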
When obtaining the weight of a feature category, only the frequency information of the feature words under that feature category may be considered. In this case, the weight obtaining subunit may specifically include:
A first accumulation subunit, configured to add up, for each feature category, the occurrence frequencies of the feature words it contains, obtaining the weight of each feature category.
Alternatively, the information currently trending on the search engine server may also be taken into account, in which case the weight obtaining subunit includes:
A trending information obtaining subunit, configured to obtain the popularity information of each feature word from the feature word trending information collected by the search engine server;
A combined frequency calculation subunit, configured to calculate the combined frequency information of each feature word from its occurrence frequency and the popularity information;
A second accumulation subunit, configured to add up, for each feature category, the combined frequency information of the feature words it contains, obtaining the weight of each feature category.
The combined frequency calculation subunit is specifically configured to:
Multiply the popularity information of a feature word by a weighting coefficient and add the result to the occurrence frequency, obtaining the combined frequency information of the feature word, where the weighting coefficient is less than 1.
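Written as a formula, the combined frequency of a feature word is its occurrence frequency plus the weighting coefficient times its popularity, with the coefficient below 1. A minimal Python sketch follows; the default coefficient of 0.5, the dictionary inputs, and the treatment of words without popularity data as having popularity 0 are all assumptions, since the text fixes only the form of the calculation and the upper bound on the coefficient.

```python
from typing import Dict


def combined_frequency(occurrence: float, popularity: float, coefficient: float = 0.5) -> float:
    """occurrence + coefficient * popularity, with coefficient < 1 as the text requires."""
    if coefficient >= 1:
        raise ValueError("the weighting coefficient must be less than 1")
    return occurrence + coefficient * popularity


def category_weights(
    occurrences: Dict[str, float],
    popularity: Dict[str, float],
    word_to_category: Dict[str, str],
    coefficient: float = 0.5,
) -> Dict[str, float]:
    """Accumulate the combined frequencies of the feature words in each category."""
    weights: Dict[str, float] = {}
    for word, count in occurrences.items():
        category = word_to_category.get(word)
        if category is None:
            continue  # unknown words are skipped (assumption)
        score = combined_frequency(count, popularity.get(word, 0.0), coefficient)
        weights[category] = weights.get(category, 0.0) + score
    return weights


weights = category_weights(
    {"movie": 2, "game": 1},
    {"movie": 4.0},
    {"movie": "video", "game": "games"},
)
```

Keeping the coefficient below 1 means a word's own occurrence count always counts at full value while search-engine popularity is discounted.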
In a specific implementation, the group interest model may likewise be built by extracting feature words from the group historical behavior data, in which case the group interest model building unit includes:
A second feature word extraction subunit, configured to extract feature words from the group historical behavior data;
A clustering subunit, configured to cluster the feature words extracted from the group historical behavior data, obtaining multiple category labels, and to save the category labels, obtaining the group interest model.
The clustering subunit is specifically configured to:
Normalize the feature words extracted from the group historical behavior data, and cluster the normalized feature words, obtaining multiple category labels.
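The text names neither the normalization nor the clustering algorithm. Purely as a stand-in, the sketch below normalizes with Unicode NFKC folding, trimming, and lowercasing, and then clusters greedily by Jaccard similarity of character bigrams, using the first member of each cluster as its category label. The functions `normalize`, `bigrams`, and `cluster` and the 0.5 threshold are hypothetical choices, not the method of the embodiment.

```python
import unicodedata
from typing import Dict, Iterable, List, Set


def normalize(word: str) -> str:
    """Stand-in normalization: NFKC folding, trimming, lowercasing."""
    return unicodedata.normalize("NFKC", word).strip().lower()


def bigrams(word: str) -> Set[str]:
    """Character bigrams; a single-character word is its own only feature."""
    return {word[i:i + 2] for i in range(len(word) - 1)} or {word}


def cluster(words: Iterable[str], threshold: float = 0.5) -> Dict[str, List[str]]:
    """Greedy clustering: join the first cluster whose label is similar enough."""
    clusters: List[List[str]] = []
    # dict.fromkeys removes duplicates while keeping first-seen order.
    for word in dict.fromkeys(normalize(w) for w in words):
        if not word:
            continue
        for members in clusters:
            a, b = bigrams(word), bigrams(members[0])
            if len(a & b) / len(a | b) >= threshold:
                members.append(word)
                break
        else:
            clusters.append([word])
    # The first member of each cluster doubles as its category label.
    return {members[0]: members for members in clusters}


labels = cluster([" Movie", "movies", "MOVIE", "game"])
```

Normalizing first matters here: without it, " Movie" and "MOVIE" would be treated as different words before clustering even starts.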
Corresponding to the personalized website navigation method and device provided by the embodiment of the invention, the embodiment of the invention also provides a personalized website navigation system. Referring to Fig. 3, the system may include a server 301 and a client 302, where the server 301 may include:
A data acquisition unit 3011, configured to acquire group historical behavior data based on access events of multiple client devices;
A group interest model building unit 3012, configured to build a group interest model of client device accessing parties from the group historical behavior data, the group interest model storing category information that reflects the access interest points of the client device accessing parties as a group;
An individual interest model building unit 3013, configured to build an individual interest model of a client device accessing party from the individual historical behavior data of a single client device and the group interest model, the individual interest model storing category information that reflects the individual access interest points of the client device accessing party;
A storage unit 3014, configured to save the mapping between individual interest models and the identification information of client device accessing parties; the identification information may be the account information that the client device accessing party registered with the server, or information such as an IP address that can uniquely identify the client device accessing party;
The client 302 includes:
An access request sending unit 3021, configured to send to the server an access request for the website navigation page, the access request carrying the identification information of the client device accessing party; for example, where the client device accessing party has logged in to the server, when the user submits an access request the client sends the request to the server together with the user's account information;
The server 301 further includes:
An individual interest model query unit 3015, configured to obtain, on receiving the access request from the client, the individual interest model corresponding to the client device accessing party by querying the mapping;
A category determining unit 3016, configured to determine, according to the individual interest model, the website categories to be displayed on the website navigation page;
A to-be-displayed website determining unit 3017, configured to determine the website entries to be displayed under each website category shown on the website navigation page;
A data returning unit 3018, configured to return to the client the website categories to be displayed on the website navigation page and the website entries to be displayed under each category;
The client 302 further includes:
A display unit 3022, configured to parse the data returned by the data returning unit and display the website navigation page.
For the implementation details of the personalized website navigation system described above, reference may be made to the description of the personalized navigation device above; they are not repeated here.
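The request flow between the units of the server 301 and the client 302 can be sketched end to end. Everything concrete below is an assumption for illustration: the JSON packet format, the `top_n` cut-off, the example domains, and the function names `handle_request` and `render` do not come from the text, which leaves the data format and the number of displayed categories open.

```python
import json
from typing import Dict, List


def handle_request(
    identifier: str,
    models: Dict[str, Dict[str, float]],
    entries_by_category: Dict[str, List[str]],
    top_n: int = 3,
) -> str:
    """Server side: query the model, pick categories and entries, return a packet."""
    model = models.get(identifier, {})                       # query unit 3015
    categories = sorted(model, key=model.get, reverse=True)[:top_n]  # unit 3016
    page = [
        {"category": c, "entries": entries_by_category.get(c, [])}  # unit 3017
        for c in categories
    ]
    return json.dumps(page)                                  # data returning unit 3018


def render(packet: str) -> List[str]:
    """Client side: parse the returned data and produce display lines (unit 3022)."""
    return [
        section["category"] + ": " + ", ".join(section["entries"])
        for section in json.loads(packet)
    ]


models = {"account:alice": {"video": 3, "games": 1, "shopping": 2}}
entries = {"video": ["a.example"], "shopping": ["b.example", "c.example"]}
lines = render(handle_request("account:alice", models, entries, top_n=2))
```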
For a better understanding of the embodiment of the invention, an example from a practical application is described below. Fig. 4 is a schematic diagram of a website navigation page, in which the content of box 401 consists of the website categories and specific websites obtained by personalized recommendation according to the method of the embodiment of the invention. That is, in a specific implementation, the personalized recommendation may form one part of the whole website navigation page; of course, in other implementations, the whole website navigation page may consist of personalized recommendations for the current user. In the example shown in Fig. 4, the website categories recommended to the current user include "Colorful Film and TV", "Play Games", "Browse Shopping", "Hobbies", "Sports News", "Today's Hot Topics", and so on, and these categories can be obtained from the individual interest model built beforehand. The website links recommended under "Colorful Film and TV" include "New The Bride With White Hair", "Gangnam Style hit song", and so on; the website links recommended under "Browse Shopping" include "Taobao Mall", "VANCL T-shirts", and so on; and likewise for the other categories. The website links under each category may be obtained from experience or from information such as what is currently most visited.
The algorithms and displays provided here are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Moreover, the invention is not directed to any particular programming language. It should be understood that the content of the invention described herein can be implemented in a variety of programming languages, and that the description of a specific language above is given to disclose the best mode of the invention.
Numerous specific details are described in the specification provided here. It will be understood, however, that embodiments of the invention can be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single disclosed embodiment. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment can be combined into one module, unit, or component, and can furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include some features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the personalized website navigation device according to the embodiment of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.
The application can be applied to a computer system/server, which is operational with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations suitable for use with the computer system/server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems. The computer system/server may be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, object programs, components, logic, data structures, and so on, which perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in a distributed cloud computing environment in which tasks are performed by remote processing devices linked through a communications network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.

Claims (20)

1. A personalized website navigation method, comprising:
Acquiring group historical behavior data based on access events of multiple client devices;
Building a group interest model of client device accessing parties from the group historical behavior data, the interest model storing category information that reflects the access interest points of the client device accessing parties as a group;
Building an individual interest model of a client device accessing party from the individual historical behavior data of a single client device and the group interest model, the individual interest model storing category information that reflects the individual access interest points of the client device accessing party;
Determining, according to the individual interest model, the website categories to be displayed on a website navigation page.
2. The method of claim 1, wherein building the individual interest model of the client device accessing party from the individual historical behavior data of the single client device and the group interest model comprises:
Extracting feature words from the individual historical behavior data;
Classifying each feature word according to the category information stored in the group interest model, obtaining several feature categories;
Saving each feature category, obtaining the individual interest model.
3. the method for claim 1, also preserve other weight of each feature class in the described individual interest model, described weight be used for to embody client device access side's individuality to the interest level of each classification, describedly determines that according to described individual interest model the network address classification that shows comprises in website navigation page:
According to other weight of each feature class each feature classification is sorted, determine the network address classification that in website navigation page, shows and put in order according to ranking results.
4. method as claimed in claim 3, described individual historical behavior data and described group interest model according to single client device are set up the individual interest model of client device access side, comprising:
From described individual historical behavior extracting data Feature Words, and obtain the occurrence frequency of each Feature Words in described individual historical behavior data;
According to the classification information of preserving in the described group interest model, described Feature Words is classified, obtain several feature classifications;
According to the occurrence frequency of each Feature Words that comprises in each feature classification, obtain other weight of each feature class;
Preserve each feature classification and corresponding weight, obtain described individual interest model.
5. The method of claim 4, wherein the categories in the group interest model comprise first-level categories and second-level categories, each first-level category having multiple second-level categories under it, and the second-level categories are stored in the form of a two-dimensional matrix in which each row corresponds to the second-level categories under one first-level category, and wherein classifying the feature words according to the category information stored in the group interest model to obtain several feature categories comprises:
Classifying the feature words according to the second-level category information stored in the group interest model, obtaining several second-level feature categories;
Saving each feature category and its weight comprises:
Saving the weight of each second-level feature category at the corresponding element of the two-dimensional matrix;
Sorting the feature categories by their weights comprises:
Adding up the weights of the second-level feature categories in each row of the two-dimensional matrix, obtaining the weight of each first-level feature category;
Sorting the first-level feature categories by their weights, and sorting the second-level feature categories by the weights of the second-level feature categories.
6. The method of claim 4, wherein obtaining the weight of each feature category from the occurrence frequencies of the feature words contained in each feature category comprises:
Adding up, for each feature category, the occurrence frequencies of the feature words it contains, obtaining the weight of each feature category.
7. The method of claim 4, wherein obtaining the weight of each feature category from the occurrence frequencies of the feature words contained in each feature category comprises:
Obtaining the popularity information of each feature word from the feature word trending information collected by a search engine server;
Calculating the combined frequency information of each feature word from its occurrence frequency and the popularity information;
Adding up, for each feature category, the combined frequency information of the feature words it contains, obtaining the weight of each feature category.
8. The method of claim 7, wherein calculating the combined frequency information of each feature word from its occurrence frequency and the popularity information comprises:
Multiplying the popularity information of a feature word by a weighting coefficient and adding the result to the occurrence frequency, obtaining the combined frequency information of the feature word, where the weighting coefficient is less than 1.
9. the method for claim 1, the described group interest model of setting up client device access side according to described colony historical behavior data comprises:
From described colony historical behavior extracting data Feature Words;
Described Feature Words from described colony historical behavior extracting data is carried out cluster, obtain a plurality of class labels, preserve described a plurality of class label, obtain described group interest model.
10. method as claimed in claim 9 is describedly carried out cluster to described Feature Words, obtains a plurality of class labels and comprises:
Described Feature Words from described colony historical behavior extracting data is carried out normalized;
Feature Words after the normalized is carried out cluster, obtain a plurality of class labels.
11. A personalized website navigation device, comprising:
A data acquisition unit, configured to acquire group historical behavior data based on access events of multiple client devices;
A group interest model building unit, configured to build a group interest model of client device accessing parties from the group historical behavior data, the interest model storing category information that reflects the access interest points of the client device accessing parties as a group;
An individual interest model building unit, configured to build an individual interest model of a client device accessing party from the individual historical behavior data of a single client device and the group interest model, the individual interest model storing category information that reflects the individual access interest points of the client device accessing party;
A category determining unit, configured to determine, according to the individual interest model, the website categories to be displayed on a website navigation page.
12. The device of claim 11, wherein the individual interest model building unit comprises:
A first feature word extraction subunit, configured to extract feature words from the individual historical behavior data;
A first classification subunit, configured to classify each feature word according to the category information stored in the group interest model, obtaining several feature categories;
A first saving subunit, configured to save each feature category, obtaining the individual interest model.
13. The device of claim 11, wherein the individual interest model also stores a weight for each feature category, the weight reflecting the individual client device accessing party's degree of interest in each category, and wherein the category determining unit comprises:
A sorting subunit, configured to sort the feature categories by their weights and to determine, from the sorting result, the website categories to be displayed on the website navigation page and their order.
14. The device of claim 13, wherein the individual interest model building unit comprises:
A frequency obtaining subunit, configured to extract feature words from the individual historical behavior data and to obtain the occurrence frequency of each feature word in the individual historical behavior data;
A second classification subunit, configured to classify the feature words according to the category information stored in the group interest model, obtaining several feature categories;
A weight obtaining subunit, configured to obtain the weight of each feature category from the occurrence frequencies of the feature words that the category contains;
A second saving subunit, configured to save each feature category and its weight, obtaining the individual interest model.
15. The device of claim 14, wherein the categories in the group interest model comprise first-level categories and second-level categories, each first-level category having multiple second-level categories under it, and the second-level categories are stored in the form of a two-dimensional matrix in which each row corresponds to the second-level categories under one first-level category, and wherein the second classification subunit is specifically configured to:
Classify the feature words according to the second-level category information stored in the group interest model, obtaining several second-level feature categories;
The second saving subunit is specifically configured to:
Save the weight of each second-level feature category at the corresponding element of the two-dimensional matrix;
The sorting subunit is specifically configured to:
Add up the weights of the second-level feature categories in each row of the two-dimensional matrix, obtaining the weight of each first-level feature category;
Sort the first-level feature categories by their weights, and sort the second-level feature categories by the weights of the second-level feature categories.
16. device as claimed in claim 14, described weight obtain subelement and comprise:
The first cumulative subelement, the occurrence frequency that is used for respectively each Feature Words that each feature classification is comprised adds up, and obtains other weight of each feature class.
17. The device as claimed in claim 14, wherein the weight obtaining subunit comprises:
A hotness information obtaining subunit, configured to obtain the hotness information of each feature word according to the feature-word hotness statistics compiled by a search engine server;
A comprehensive frequency calculation subunit, configured to calculate the comprehensive frequency information of each feature word according to the occurrence frequency and the hotness information of that feature word;
A second accumulation subunit, configured to add up, for each feature category, the comprehensive frequency information of the feature words contained in that feature category, obtaining the weight of each feature category.
18. The device as claimed in claim 17, wherein the comprehensive frequency calculation subunit is specifically configured to:
Multiply the hotness information of a feature word by a weighting coefficient and add the product to the occurrence frequency, obtaining the comprehensive frequency information of the feature word; wherein the weighting coefficient is less than 1.
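The two weighting schemes of claims 16 to 18 can be sketched as follows. The function name, the sample words and the default coefficient `alpha=0.5` are assumptions for illustration; the claims only require the coefficient to be less than 1.

```python
from collections import defaultdict


def category_weights(word_freq, word_category, hotness=None, alpha=0.5):
    """Accumulate a weight per feature category.

    Without `hotness`, the weight is the sum of occurrence frequencies
    (claim 16). With `hotness`, each word contributes
    frequency + alpha * hotness (claims 17 and 18).
    """
    if alpha >= 1:
        raise ValueError("the weighting coefficient must be less than 1")
    weights = defaultdict(float)
    for word, freq in word_freq.items():
        category = word_category.get(word)
        if category is None:
            continue  # word does not belong to any known category
        score = float(freq)
        if hotness is not None:
            # Assumption: a word with no hotness statistic counts as 0.
            score += alpha * hotness.get(word, 0.0)
        weights[category] += score
    return dict(weights)


# Hypothetical sample data.
word_freq = {"football": 3, "basketball": 2, "laptop": 4}
word_category = {
    "football": "Sports",
    "basketball": "Sports",
    "laptop": "Electronics",
}
hotness = {"football": 10.0, "laptop": 2.0}

plain = category_weights(word_freq, word_category)
blended = category_weights(word_freq, word_category, hotness, alpha=0.5)
```

Keeping the coefficient below 1 means a unit of hotness never counts for more than one occurrence in the user's own history, so the individual's behavior remains the dominant signal.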
19. The device as claimed in claim 11, wherein the group interest model establishing unit comprises:
A second feature word extraction subunit, configured to extract feature words from the group historical behavior data;
A clustering subunit, configured to cluster the feature words extracted from the group historical behavior data to obtain a plurality of category labels, and to save the plurality of category labels, thereby obtaining the group interest model.
20. The device as claimed in claim 19, wherein the clustering subunit is specifically configured to:
Normalize the feature words extracted from the group historical behavior data; cluster the normalized feature words to obtain a plurality of category labels.
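Claim 20 does not specify how normalization or clustering is performed. Purely as an illustration, the sketch below assumes that normalization means Unicode NFKC folding, case folding and whitespace trimming, and swaps in a simple greedy clustering based on character-bigram Jaccard similarity. The threshold of 0.4 and the choice of the first word in a cluster as its category label are also assumptions.

```python
import unicodedata


def normalize(word):
    """Assumed normalization: NFKC form, case-folded, trimmed."""
    return unicodedata.normalize("NFKC", word).casefold().strip()


def bigrams(word):
    """Set of character bigrams; a one-character word maps to itself."""
    return {word[i:i + 2] for i in range(len(word) - 1)} or {word}


def jaccard(a, b):
    return len(a & b) / len(a | b)


def cluster_words(words, threshold=0.4):
    """Greedy single-pass clustering: each word joins the most similar
    existing cluster if the similarity reaches the threshold, otherwise
    it starts a new cluster and becomes that cluster's label."""
    clusters = {}  # category label -> member words
    # dict.fromkeys removes duplicates while preserving input order.
    for word in dict.fromkeys(normalize(w) for w in words):
        if not word:
            continue
        best_label, best_sim = None, 0.0
        for label in clusters:
            sim = jaccard(bigrams(word), bigrams(label))
            if sim > best_sim:
                best_label, best_sim = label, sim
        if best_label is not None and best_sim >= threshold:
            clusters[best_label].append(word)
        else:
            clusters[word] = [word]
    return clusters


# Hypothetical feature words.
model = cluster_words(["Football ", "football", "footballer", "Shopping", "shop"])
```

The keys of `model` play the role of the saved category labels. A real system would more plausibly cluster on semantic similarity than on spelling; surface similarity is used here only to keep the sketch self-contained.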
CN201210426285.6A 2012-10-30 2012-10-30 Personalized website navigation method and apparatus Expired - Fee Related CN102982079B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210426285.6A CN102982079B (en) 2012-10-30 2012-10-30 Personalized website navigation method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210426285.6A CN102982079B (en) 2012-10-30 2012-10-30 Personalized website navigation method and apparatus

Publications (2)

Publication Number Publication Date
CN102982079A (en) 2013-03-20
CN102982079B (en) 2017-03-15

Family

ID=47856099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210426285.6A Expired - Fee Related CN102982079B (en) 2012-10-30 2012-10-30 Personalized website navigation method and apparatus

Country Status (1)

Country Link
CN (1) CN102982079B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014183544A1 (en) * 2013-05-13 2014-11-20 Tencent Technology (Shenzhen) Company Limited Method and device for generating a personalized navigation webpage
CN106294500A (en) * 2015-06-09 2017-01-04 深圳市腾讯计算机系统有限公司 Content item pushing method, apparatus and system
CN106528737A (en) * 2016-10-27 2017-03-22 中企动力科技股份有限公司 Website navigation display method and system
CN108683734A (en) * 2018-05-15 2018-10-19 广州虎牙信息科技有限公司 Category pushing method and device, storage device, and computer equipment
CN110020139A (en) * 2017-11-14 2019-07-16 广州市动景计算机科技有限公司 Navigation website recommendation method, device, computing equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101551806A (en) * 2008-04-03 2009-10-07 北京搜狗科技发展有限公司 Personalized website navigation method and system
CN101819572A (en) * 2009-09-15 2010-09-01 电子科技大学 Method for establishing user interest model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
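Jiao Weiwei: "Research on a Personalized User Interest Model Based on Web Mining", China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology, no. 9, 15 September 2006 (2006-09-15), pages 26 - 30 *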

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014183544A1 (en) * 2013-05-13 2014-11-20 Tencent Technology (Shenzhen) Company Limited Method and device for generating a personalized navigation webpage
CN106294500A (en) * 2015-06-09 2017-01-04 深圳市腾讯计算机系统有限公司 Content item pushing method, apparatus and system
CN106528737A (en) * 2016-10-27 2017-03-22 中企动力科技股份有限公司 Website navigation display method and system
CN110020139A (en) * 2017-11-14 2019-07-16 广州市动景计算机科技有限公司 Navigation website recommendation method, device, computing equipment and storage medium
CN108683734A (en) * 2018-05-15 2018-10-19 广州虎牙信息科技有限公司 Category pushing method and device, storage device, and computer equipment
CN108683734B (en) * 2018-05-15 2021-04-09 广州虎牙信息科技有限公司 Method and device for pushing classes, storage equipment and computer equipment

Also Published As

Publication number Publication date
CN102982079B (en) 2017-03-15

Similar Documents

Publication Publication Date Title
CN102831199B (en) Method and device for establishing interest model
CN102822815B (en) Method and system for making action suggestions using browser history
CN102364473B (en) Netnews search system and method based on geographic information and visual information
CN111737582B (en) Content recommendation method and device
CN106686063A (en) Information recommendation method and apparatus, and electronic device
CN102054003B (en) Methods and systems for recommending network information and creating network resource index
CN102902753A (en) Method and device for complementing search terms and establishing individual interest models
CN111708740A (en) Mass search query log calculation analysis system based on cloud platform
CN103886090A (en) Content recommendation method and device based on user favorites
CN102915380A (en) Method and system for carrying out searching on data
WO2011008848A2 (en) Activity based users' interests modeling for determining content relevance
CN102982042A (en) Personalization content recommendation method and platform and system
CN102298616A (en) Method and device for providing related sub links in search result
CN102929939A (en) Personalized information supply method and device
CN102930009B (en) Individual website navigation system
CN102982134A (en) System enabling recommended web site information to be displayed in browser address bar
CN104423621A (en) Pinyin string processing method and device
CN102982079A (en) Method and device for personalized website navigation
CN112100221B (en) Information recommendation method and device, recommendation server and storage medium
CN112488781A (en) Search recommendation method and device, electronic equipment and readable storage medium
CN102521249A (en) Show method and device based on homogeneous resources
CN103745006A (en) Internet information searching system and internet information searching method
CN101840420B (en) Search aid system, search aid method and program
CN102955847A (en) System for loading website data on browser format page
CN103425767A (en) Method and system for determining prompt data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170315

Termination date: 20211030