CN103235824A - Method and system for determining web page texts users interested in according to browsed web pages - Google Patents

Info

Publication number
CN103235824A
Authority
CN
China
Prior art keywords
text
web page
user
classification
interest
Legal status
Pending
Application number
CN201310163619XA
Other languages
Chinese (zh)
Inventor
刘臻
吕琳媛
肖思源
刘润然
佘莉
Current Assignee
SHANGHAI HEGUANG INFORMATION TECHNOLOGY Co Ltd
Original Assignee
SHANGHAI HEGUANG INFORMATION TECHNOLOGY Co Ltd
Application filed by SHANGHAI HEGUANG INFORMATION TECHNOLOGY Co Ltd

Abstract

A method for determining the web page texts a user is interested in according to the URLs (Uniform Resource Locators) of browsed web pages comprises the steps of: filtering the web pages browsed by the user within a certain period, removing useless web pages and web pages that cannot be accessed, following the URL addresses that remain after filtering to obtain the text content of the pages, and extracting title and body-text information; assigning a category to every web page text in the web document collection according to predefined topic categories; and computing access-frequency statistics for every category and taking the web page set with the highest access frequency as the web page texts the user is interested in. The result serves as analysis data for targeted pushing of data services and improves the credibility of data-service pushes.

Description

Method and system for determining web page text of interest to a user according to browsed web pages
Technical field
The present invention relates to a method and system for determining the web page text a user is interested in according to the URLs of browsed web pages, for use in the field of pushing data services according to user interest preferences.
Background art
Data-service pushing began to flourish in 2011, and numerous organizations have emerged in the industry. Data-service pushing has moved from a first stage of website combination (media selection is paramount: media are combined and selected according to the characteristics of their audiences), through a second stage of contextual targeting (content optimization is paramount: content is combined according to the type of audience it attracts), to the current third stage of audience-targeted pushing built around audience-targeting technology, which places more emphasis on identifying the audience. In addition, location-based data-service pushing is developing and maturing along a separate dimension.
The object of the invention is to accurately determine the web page text a user is interested in according to the URLs of browsed web pages, so that each user's behavioural habits can be tracked, the user's behaviour and browsed content analysed, and the user's interest preferences predicted. Information is thereby concentrated on recipients who are interested and have a need, which realizes targeted pushing of data services, improves the credibility of data-service pushes, improves user satisfaction, and better reduces data noise.
Summary of the invention
The invention provides a method for determining the web page text a user is interested in according to the URLs of browsed web pages, comprising the steps of: filtering the web pages browsed by the user within a certain period, removing useless pages and pages that cannot be accessed, following the URL addresses that remain after screening to obtain the text content of the pages, and extracting title and body-text information; determining a category for each web document in the web document set according to predefined subject categories; and computing access-frequency statistics for each category, the web page set with the highest access frequency being taken as the web pages the user is interested in.
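The final step of the method, counting visits per category and keeping the pages of the most visited category, can be sketched as follows. This is a minimal illustration and not the patented implementation: the function name `most_visited_category` and the example URLs and category labels are hypothetical, and the pages are assumed to have been filtered and classified already.

```python
from collections import Counter


def most_visited_category(visits):
    """Return the most frequently visited category and its pages.

    `visits` is a list of (url, category) pairs for one user within one
    time period, assumed to be already filtered and classified.
    """
    counts = Counter(category for _, category in visits)
    if not counts:
        return None, []
    top_category, _ = counts.most_common(1)[0]
    pages = [url for url, category in visits if category == top_category]
    return top_category, pages


# Hypothetical browsing log for one user.
visits = [
    ("http://example.com/a", "sports"),
    ("http://example.com/b", "travel"),
    ("http://example.com/c", "sports"),
]
top, pages = most_visited_category(visits)
```

Here `top` is the category with the highest access frequency and `pages` is the corresponding web page set.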
In the web page classification step, a web page classifier must be built and trained. The input is a training text set; through text representation and feature selection, a classifier model is built according to the feature dictionary, and the output is a set of classification rules with a tree-like structure. Training the web page classifier amounts to repeatedly partitioning the training samples: a classification prediction model of the target variable with respect to each input variable is built, so that the data are grouped under the different values of the input variables and the target variable, and the model is then used to classify and predict new data objects.
The web page classifier uses the decision tree classification method, whose steps are:
Represent the test sample in the same form as the training samples;
t ← root node of the decision tree;
According to the test attribute and threshold of decision tree node t, compare the value of the corresponding feature of the sample under test with the threshold, and then decide according to the split criterion of node t whether
t ← left child of t or t ← right child of t;
Repeat the previous step until t is a leaf node;
The category of the test sample is the category represented by leaf t.
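The traversal described in these steps can be sketched as below. The `Node` class, the feature indices and the rule that values less than or equal to the threshold go to the left child are assumptions made for illustration; the source only states that the branch is chosen by the split criterion of node t.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    label: Optional[str] = None      # category, set on leaf nodes only
    feature: int = 0                 # index of the feature tested at this node
    threshold: float = 0.0           # threshold of the test attribute
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def classify(root: Node, sample: Sequence[float]) -> Optional[str]:
    """Walk from the root to a leaf and return the leaf's category."""
    t = root
    while not t.is_leaf:
        # Assumed split rule: value <= threshold goes left, otherwise right.
        t = t.left if sample[t.feature] <= t.threshold else t.right
    return t.label


# Hypothetical two-leaf tree testing feature 0 (e.g. count of "football").
tree = Node(feature=0, threshold=1.0,
            left=Node(label="travel"), right=Node(label="sports"))
sports_label = classify(tree, [3, 0])
travel_label = classify(tree, [0, 2])
```

The sample passed to `classify` must use the same feature order as the training samples, which corresponds to the first step of the procedure.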
In addition, in the web page classification step, the input is the text to be classified after processing by the text preprocessing module; after text representation, feature selection is performed according to the feature dictionary, the text is classified with the classification rules of the trained classifier model, and the output is the category to which each text belongs.
In addition, in the text representation step, a feature vector space is used to represent text features; document i can be expressed as the feature vector:
$$W_i = (W_{i1}, W_{i2}, \ldots, W_{im})$$
where $W_{ij}$ is a function of the frequency $f_{ij}$ with which term j occurs in document i. The frequency of the term in the document is used directly as the feature value:
$$W_{ij} = f_{ij}$$
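As a small illustration of this representation, the sketch below builds the raw term-frequency vector of one document over a fixed vocabulary. The function name `tf_vector`, the vocabulary and the tokens are hypothetical; the document is assumed to be already segmented into words.

```python
from collections import Counter


def tf_vector(tokens, vocabulary):
    """Return [f_i1, ..., f_im]: raw counts of each vocabulary term."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


vocabulary = ["football", "hotel", "match"]
doc = ["football", "match", "football", "ticket"]
vec = tf_vector(doc, vocabulary)  # -> [2, 0, 1]
```

Words outside the vocabulary, such as "ticket" here, are simply ignored.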
Furthermore, in the feature selection step, a feature dimension-reduction method based on an improved χ² statistic and pattern aggregation is used. The steps are:
⑴ Compute the improved χ² statistic of each term i with respect to each class j according to the formula
$$\chi'^2_{ij} = \operatorname{sign}(n_{11} n_{22} - n_{12} n_{21}) \times \frac{n \times (n_{11} n_{22} - n_{12} n_{21})^2}{(n_{11}+n_{12}) \times (n_{21}+n_{22}) \times (n_{11}+n_{21}) \times (n_{12}+n_{22})}$$
⑵ Compute the CHI value of each term according to the formula
$$\mathrm{CHI}_i = \max\{|\chi'^2_{i1}|, |\chi'^2_{i2}|, \ldots, |\chi'^2_{is}|\}$$
then sort the features by CHI value from high to low and take the M terms with the largest CHI values; the resulting feature matrix has M patterns;
⑶ To compare whether the patterns contribute to the classes in consistent proportions, first normalize the improved statistics of each pattern into the range [-1, 1] as follows:
$$A_{ij} = \chi'^2_{ij} / (\max - \min)$$
where max and min are respectively the maximum and minimum of the improved χ² statistics of pattern i;
⑷ Cluster the patterns of A with a simple clustering algorithm (each row of A represents one pattern), aggregating the patterns of one cluster into a single new pattern; this yields L new patterns, where L is much smaller than M. Agglomerative hierarchical clustering is used, with the commonly used Euclidean distance as the distance measure:
$$d(i,j) = \sqrt{(A_{i1}-A_{j1})^2 + (A_{i2}-A_{j2})^2 + \cdots + (A_{is}-A_{js})^2}$$
Patterns whose Euclidean distance d(i, j) is less than a certain threshold are clustered together. The clustering process is:
1. Using matrix A, find the patterns whose distance is less than the threshold and cluster them;
2. After clustering, merge the patterns in each cluster into one pattern that contains all the terms of the cluster and whose term frequency is the sum of the term frequencies of those terms; recompute the improved statistic of the new pattern and rebuild matrix A from the new patterns;
Repeat steps 1 and 2 until no more patterns can be aggregated;
⑸ Recompute the CHI value of each feature item and select the top L′ feature items by CHI value.
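Steps ⑴ and ⑵ can be sketched as follows. The source does not define the contingency counts, so the sketch assumes the usual 2×2 layout: n11 is the number of documents of the class that contain the term, n12 the number of documents of other classes that contain it, n21 and n22 the corresponding counts for documents that do not contain it, and n their sum. The function names are hypothetical.

```python
def signed_chi2(n11, n12, n21, n22):
    """Improved (signed) chi-square statistic of one term for one class."""
    n = n11 + n12 + n21 + n22
    diff = n11 * n22 - n12 * n21
    denom = (n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22)
    if denom == 0:
        return 0.0  # degenerate table: treat the term as uninformative
    sign = (diff > 0) - (diff < 0)
    return sign * n * diff ** 2 / denom


def chi_score(per_class_stats):
    """CHI value of a term: largest absolute statistic over all classes."""
    return max(abs(x) for x in per_class_stats)


def top_m_terms(table, m):
    """Rank terms by CHI value (high to low) and keep the first m."""
    ranked = sorted(table, key=lambda term: chi_score(table[term]), reverse=True)
    return ranked[:m]


positive = signed_chi2(10, 0, 0, 10)   # term occurs only inside the class
negative = signed_chi2(0, 10, 10, 0)   # term occurs only outside the class
selected = top_m_terms({"a": [20.0, -20.0], "b": [1.0, -1.0], "c": [5.0, -5.0]}, 2)
```

The sign keeps the direction of the association, so a term that is typical of a class and a term that is typical of its complement get statistics of opposite sign but the same CHI value.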
The invention also provides a system for determining the web page text a user is interested in according to the URLs of browsed web pages, comprising a web page text acquisition submodule, a web page text classification submodule, an access frequency statistics submodule, and a user current content interest determination submodule. The web page text acquisition submodule filters the web pages browsed by the user within a certain period, removes useless pages and pages that cannot be accessed, follows the URL addresses that remain after screening to obtain the text content of the pages, and extracts title and body-text information. The web page text classification submodule determines a category for each web document in the web document set according to predefined subject categories. The access frequency statistics submodule computes access-frequency statistics for each category, and the user current content interest determination submodule takes the web page set with the highest access frequency as the web pages the user is interested in.
In the web page text classification submodule, a web page classifier must be built and trained. The input is a training text set; through text representation and feature selection, a classifier model is built according to the feature dictionary, and the output is a set of classification rules with a tree-like structure. Training the web page classifier amounts to repeatedly partitioning the training samples: a classification prediction model of the target variable with respect to each input variable is built, so that the data are grouped under the different values of the input variables and the target variable, and the model is then used to classify and predict new data objects.
In addition, the input of the web page text classification submodule is the text to be classified after processing by the text preprocessing module; after text representation, feature selection is performed according to the feature dictionary, the text is classified with the classification rules of the trained classifier model, and the output is the category to which each text belongs.
Description of drawings
Fig. 1 is a structural diagram of a system in which a mobile terminal browses pages through a wireless network gateway;
Fig. 2 shows a method of obtaining the interest preferences of mobile terminal users in real time on a mobile server through the wireless network gateway;
Fig. 3 is an operational flowchart of the time window adjustment and web page data classification statistics module of the invention;
Fig. 4 is an operational flowchart of the web page classification/content information processing submodule of the invention;
Fig. 5a shows the method of building the web page text classifier of the invention;
Fig. 5b shows the method of using the web page text classifier of the invention;
Fig. 6 is an operational flowchart of the user content interest extraction submodule of the invention;
Fig. 7 is an exemplary tree structure of user interest preferences of the invention;
Fig. 8 is an operational flowchart of the data service push module;
Fig. 9 is an operational flowchart of the location analysis module of the invention;
Fig. 10 is a flowchart of location information association of the invention.
Detailed description of the embodiments
The method of the invention for determining the web page text a user is interested in according to the URLs of browsed web pages, and an embodiment of the data push service to which it applies, are further described below with reference to Figs. 1 to 10.
Fig. 1 is a structural diagram of a system in which a mobile terminal browses pages through a wireless network gateway such as a WAP gateway.
The invention provides a data-service push system based on a wireless network. After obtaining, through the wireless network gateway, the log information of a user's use of a mobile terminal such as a mobile phone, the system filters the user's mobile phone usage behaviour over a preceding period to obtain user behaviour features, combines the user's content interest with behavioural habits to form the user's interest preference, associates it in real time with the location information of the mobile terminal, and pushes information to the mobile terminal. The system is shown by the part marked with a dashed frame in Fig. 1 and comprises a time window adjustment and web page data classification statistics module, a user interest extraction module, a data service push module and a location analysis module, wherein:
The time window adjustment and web page data classification statistics module receives the URLs of browsed pages from the wireless network gateway and filters the web pages browsed by the user over a preceding period to obtain the web pages the user is interested in and the user behaviour features;
The user interest extraction module comprises a behaviour information analysis submodule, a content information analysis submodule and an ensemble learning submodule;
The behaviour information analysis submodule performs statistics, screening and dimension reduction on the time series according to the user behaviour features to form user behaviour interest, and outputs the user's current behaviour interest;
The content information analysis submodule performs text processing on web page content according to the URL addresses of the web pages the user is interested in, extracts the web page topic, forms user content interest from the web page topic and other attribute information of the web page, and outputs the user's current content interest;
The ensemble learning submodule uses ensemble learning techniques to form user interest from the user's current behaviour interest and current content interest, and outputs the user's current interest;
The location analysis module obtains the user's current browsing location information through the GMLC gateway;
The data service push module applies a rule association strategy to the current user interest output by the user interest extraction module to decide whether to perform a localized information push service. For current user interest that does not match the characteristics of localized services, the push module matches it against the corresponding pre-push information and selects the push information with the highest matching degree. For current user interest that matches the characteristics of localized services, it obtains location-associated information according to the user's current browsing location information from the location analysis module, then uses a matching strategy to match the user's current interest against the location-associated information, selects the location-associated information with the highest matching degree as the push information, and pushes it to the mobile terminal.
The wireless network gateway includes devices such as a WAP GW, an enhanced GGSN or a standalone integrated gateway; the description below uses a common WAP GW as the example.
The browsed pages are provided by SP/CP servers in the network, and the mobile terminal accesses these pages through the wireless network gateway.
The invention provides a data-service push method based on a wireless network, as shown in Fig. 2. After the log information of a user's use of a mobile terminal such as a mobile phone is obtained through the wireless network gateway, the user's mobile phone usage behaviour over a preceding period is filtered to obtain user behaviour features, the user's content interest and behavioural habits are combined to form the user's interest preference, the preference is associated in real time with the location information of the mobile terminal, and information is pushed to the mobile terminal. The method comprises:
Receiving the URLs of browsed pages from the wireless network gateway and filtering the web pages browsed by the user over a preceding period to obtain the web pages the user is interested in and the user behaviour features;
Performing statistics, screening and dimension reduction on the time series according to the user behaviour features to form user behaviour interest as the user's current behaviour interest; performing text processing on web page content according to the URL addresses of the web pages the user is interested in, extracting the web page topic, and forming user content interest from the web page topic and other attribute information of the web page as the user's current content interest; and using ensemble learning techniques to form user interest from the current behaviour interest and current content interest as the user's current interest;
Obtaining the user's current browsing location information through the GMLC gateway;
Applying a rule association strategy to the current user interest to decide whether to perform a localized information push service. Current user interest that does not match the characteristics of localized services is matched against the corresponding pre-push information, and the push information with the highest matching degree is selected. For current user interest that matches the characteristics of localized services, location-associated information is obtained according to the user's current browsing location information, a matching strategy is used to match the user's current interest against the location-associated information, and the location-associated information with the highest matching degree is selected as the push information and pushed to the mobile terminal.
The time window adjustment and web page data classification statistics module comprises a time window adjustment submodule and a web page data classification statistics submodule; the latter comprises a behaviour information statistics submodule and a web page classification submodule. Fig. 3 is the operational flowchart of the time window adjustment and web page data classification statistics module.
The time window adjustment submodule carries out the time window adjustment method: according to the user's browsing speed and habits, it determines and adjusts the time window so that it reflects the interest on which the user is concentrated in the current period.
To obtain the web pages the user is interested in and the user behaviour features, the system must filter the web pages browsed by the user over a preceding period. In the prior art the time range to be analysed is usually a fixed value. One approach processes the user's interest preferences over a long period such as one day, one month or even one year; this analyses user interest more comprehensively and accurately, but the volume of web page content to analyse is huge and real-time performance is poor. Another approach uses a single internet access or a single browsed page as the trigger, making one recommendation each time the user goes online or browses a page; this is real-time, but the system returns too many recommendations, which increases the load on the wireless communication network and degrades the user experience.
In view of these problems of the prior art, the invention adopts a time window adjustment method that takes both the user's long-term and short-term interest preferences into account and balances between them. The number of web pages obtained is controlled by adjusting the time window, and adjusting the size of the time window achieves a real-time effect that is more timely and accurate.
The time window adjustment method can be carried out by the time window adjustment submodule.
The purpose of the method is to take the user's current access time as the starting point and, over a time range that suits the user's browsing speed and habits, analyse the interest categories reflected by the user's internet activity within that range.
The time window adjustment method sets the initial value of the time window according to each user's browsing speed and habits; thereafter the time window adjusts automatically as the user's browsing habits change. The steps are:
Compute the user's historical internet access density
$$D = \frac{M}{T}$$
where T is a historical period and M is the number of the user's internet access actions within T;
The initial time value t is set as
[Formula image not recovered: initial time value t, expressed using α]
where α is an empirical value used to adjust the size of the time window. The time range is set so that the user has a certain amount of internet activity and access time within it; a shorter time range makes the user's interest more concentrated and keeps the user's range of movement small;
After a certain period, recompute the user's internet access density over the new period:
$$d = \frac{M'}{T'}$$
The time value is then set to:
$$t' = t + \frac{D - d}{D + d}$$
The size of α is adjustable:
[Formula image not recovered: adjustment formula for α]
After a longer period, the total amount of internet activity is counted and α is adjusted according to the above formula.
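The update of the time window from the two densities can be sketched as below. Because the formulas for the initial value and for α are not available in this text, the sketch takes the current window t as an input and implements only the density and update formulas; the function names, the time units and the example numbers are assumptions.

```python
def access_density(num_actions, period):
    """Internet access density: number of actions divided by period length."""
    return num_actions / period


def adjust_window(t, historical_density, recent_density):
    """Apply t' = t + (D - d) / (D + d)."""
    total = historical_density + recent_density
    if total == 0:
        return t  # no activity in either period: leave the window unchanged
    return t + (historical_density - recent_density) / total


D = access_density(120, 60.0)   # hypothetical: 120 actions in 60 time units
d = access_density(90, 30.0)    # hypothetical: 90 actions in 30 time units
new_t = adjust_window(10.0, D, d)
```

With these numbers the recent density exceeds the historical one, so the correction term is negative and the window shrinks slightly; when recent activity is sparser than usual the window grows.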
The web page data classification processing submodule comprises a behaviour information processing submodule and a web page classification/content information processing submodule, which process behaviour information and web page classification/content information respectively to obtain the web pages the user is interested in and the user behaviour features.
The behaviour information processing submodule comprises an SMS behaviour statistics submodule, a communication behaviour statistics submodule, an internet behaviour statistics submodule, a submodule that prunes user behaviour features by the PCA method, and a user current behaviour feature determination submodule. Based on the browsed pages obtained, it computes time statistics on these user behaviours within the time window to obtain the user's behaviour features.
The operation steps of the behaviour information processing submodule are: compute SMS behaviour statistics; compute communication behaviour statistics; compute internet behaviour statistics; prune the user behaviour features by the PCA method; determine the user's current behaviour features.
The web page classification/content information processing submodule comprises a web page text acquisition submodule, a web page text classification submodule, an access frequency statistics submodule and a user current content interest determination submodule. Within the time window, it filters the web pages browsed by the user to obtain a set of relevant pages, obtains the text content of each page from the URL address of the visited page, and classifies the text content; it then computes access-frequency statistics for each category and takes the web page set with the highest access frequency as the web pages the user is interested in. Fig. 4 is the operational flowchart of the web page classification/content information processing submodule.
The operation steps of the web page classification/content information processing submodule are: obtain the web page text; classify the web page text; compute access-frequency statistics; determine the web pages the user is interested in.
The web page text acquisition submodule takes the input URL addresses, removes useless pages and pages that cannot be accessed, follows the URL addresses that remain after screening, and extracts title and body-text information.
The textual information in a web page source file is generally distributed as follows:
Here link 4 and link 5 are link information that is also text information.
Through format analysis, the <title> tag is matched to obtain the title information; useless link information is removed, leaving the body text and the useful link information, for example text 1, link 4, text 2, link 5, text 3.
The web page text acquisition submodule outputs the title and body-text information of the web page to the web page text classification submodule.
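Title and text extraction of this kind can be sketched with Python's standard `html.parser` module. The class name `TitleAndTextParser` and the sample page are hypothetical, and the sketch makes simplifying assumptions: it keeps all visible text, including link text, skips only `<script>` and `<style>` content, and does not implement the removal of "useless" links, for which the source gives no criterion.

```python
from html.parser import HTMLParser


class TitleAndTextParser(HTMLParser):
    """Collect the <title> text and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title += text
        elif self._skip_depth == 0:
            self.text_parts.append(text)


page = ('<html><head><title>Match report</title><style>p {color: red}</style>'
        '</head><body><p>Text 1</p><a href="/x">Link 4</a>'
        '<script>var a = 1;</script></body></html>')
parser = TitleAndTextParser()
parser.feed(page)
parser.close()
```

After parsing, `parser.title` holds the title and `parser.text_parts` holds the text fragments in document order, which can then be passed on for classification.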
The web page text classification submodule determines a category for each web document in the web document set according to predefined subject categories such as sports, food and drink, IT, real estate, automobiles and travel. Fig. 5a shows the method of building the web page text classifier; Fig. 5b shows the method of using it.
The web page classifier comprises the following two parts:
The building and training part of the web page classifier: its input is a training text set; through text representation and feature selection, a classifier model is built according to the feature dictionary, and the output is a set of classification rules with a tree-like structure, as shown in Fig. 5a;
Training the web page classifier amounts to repeatedly partitioning the training samples: a classification prediction model of the target variable with respect to each input variable is built, so that the data are grouped under the different values of the input variables and the target variable, and the model is then used to classify and predict new data objects.
The classifier is trained as follows: when an attribute is selected at each level of decision tree nodes, the gain ratio is used as the attribute selection criterion.
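The gain ratio criterion can be illustrated with the standard definition: information gain divided by split information. The sketch below assumes discrete attribute values and is not taken from the patent; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (base 2) of a list of discrete values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of one attribute: information gain / split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


labels = ["sports", "sports", "travel", "travel"]
informative = gain_ratio(["hi", "hi", "lo", "lo"], labels)
uninformative = gain_ratio(["hi", "lo", "hi", "lo"], labels)
```

At each node the attribute with the largest gain ratio would be chosen for the split; in this toy data the first attribute separates the two categories perfectly and the second carries no information.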
The classification part of the web page classifier: its input is the text to be classified (a web document object) after processing by the text preprocessing module; after text representation, feature selection is performed according to the feature dictionary, the text is classified with the classification rules of the trained classifier model, and the output is the category to which each text belongs, as shown in Fig. 5b.
The web page classifier uses the decision tree classification method, whose steps are:
1. Represent the test sample in the same form as the training samples;
2. t ← root node of the decision tree;
3. According to the test attribute and threshold of decision tree node t, compare the value of the corresponding feature of the sample under test with the threshold, and then decide according to the split criterion of node t whether
t ← left child of t or t ← right child of t;
4. Repeat step 3 until t is a leaf node;
5. The category of the test sample is the category represented by leaf t.
In the text representation step, a feature vector space is used to represent text features; document i can be expressed as the feature vector:
$$W_i = (W_{i1}, W_{i2}, \ldots, W_{im})$$
where $W_{ij}$ is a function of the frequency $f_{ij}$ with which term j occurs in document i. The frequency of the term in the document is used directly as the feature value:
$$W_{ij} = f_{ij}$$
In the feature selection step, a feature dimension-reduction method based on an improved χ² statistic and pattern aggregation is used. The steps are:
⑴ Compute the improved χ² statistic of each term i with respect to each class j according to the formula
$$\chi'^2_{ij} = \operatorname{sign}(n_{11} n_{22} - n_{12} n_{21}) \times \frac{n \times (n_{11} n_{22} - n_{12} n_{21})^2}{(n_{11}+n_{12}) \times (n_{21}+n_{22}) \times (n_{11}+n_{21}) \times (n_{12}+n_{22})}$$
⑵ Compute the CHI value of each term according to the formula
$$\mathrm{CHI}_i = \max\{|\chi'^2_{i1}|, |\chi'^2_{i2}|, \ldots, |\chi'^2_{is}|\}$$
then sort the features by CHI value from high to low and take the M terms with the largest CHI values; the resulting feature matrix has M patterns;
⑶ To compare whether the patterns contribute to the classes in consistent proportions, first normalize the improved statistics of each pattern into the range [-1, 1] as follows:
$$A_{ij} = \chi'^2_{ij} / (\max - \min)$$
where max and min are respectively the maximum and minimum of the improved χ² statistics of pattern i;
⑷ Cluster the patterns of A with a simple clustering algorithm (each row of A represents one pattern), aggregating the patterns of one cluster into a single new pattern; this yields L new patterns, where L is much smaller than M. Agglomerative hierarchical clustering is used, with the commonly used Euclidean distance as the distance measure:
$$d(i,j) = \sqrt{(A_{i1}-A_{j1})^2 + (A_{i2}-A_{j2})^2 + \cdots + (A_{is}-A_{js})^2}$$
Patterns whose Euclidean distance d(i, j) is less than a certain threshold are clustered together. The clustering process is:
1. Using matrix A, find the patterns whose distance is less than the threshold and cluster them;
2. After clustering, merge the patterns in each cluster into one pattern that contains all the terms of the cluster and whose term frequency is the sum of the term frequencies of those terms; recompute the improved statistic of the new pattern and rebuild matrix A from the new patterns;
Repeat steps 1 and 2 until no more patterns can be aggregated;
⑸ Recompute the CHI value of each feature item and select the top L′ feature items by CHI value.
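The normalization of step ⑶ and the distance-based grouping of step ⑷ can be sketched as below. This is a deliberately simplified, single-pass illustration: it groups rows greedily by a distance threshold and does not recompute the statistics of merged patterns or iterate until convergence as the full procedure requires. The function names, the example statistics and the threshold are hypothetical.

```python
import math


def normalize_rows(chi):
    """A_ij = chi_ij / (max_i - min_i), computed row by row."""
    rows = []
    for row in chi:
        spread = max(row) - min(row)
        rows.append([x / spread if spread else 0.0 for x in row])
    return rows


def euclidean(a, b):
    """Euclidean distance between two rows of A."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_patterns(A, threshold):
    """Greedy one-pass grouping of row indices whose distance < threshold."""
    groups = []
    for i, row in enumerate(A):
        for group in groups:
            if any(euclidean(row, A[j]) < threshold for j in group):
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Hypothetical signed statistics of three terms over two classes.
chi = [[20.0, -20.0], [18.0, -18.0], [-10.0, 10.0]]
A = normalize_rows(chi)
groups = group_patterns(A, threshold=0.1)
```

Terms 0 and 1 contribute to the classes in the same proportions and end up in one group, while term 2 points the opposite way and stays separate.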
The user interest extraction module comprises a behaviour information analysis submodule, a content information analysis submodule and an ensemble learning submodule.
The behaviour information analysis submodule performs statistics, screening and dimension reduction on the time series according to the user behaviour features to form user behaviour interest, and outputs the user's current behaviour interest.
The content information analysis submodule performs text processing on web page content according to the URL addresses of the web pages the user is interested in, extracts the web page topic, forms user content interest from the web page topic and other attribute information of the web page, and outputs the user's current content interest.
The ensemble learning submodule uses ensemble learning techniques to form user interest from the user's current behaviour interest and current content interest, and outputs the user's current interest.
User interest is divided into two parts, behaviour interest and content interest, which are extracted by the behaviour information analysis submodule and the user content interest analysis submodule respectively and finally combined by the ensemble learning submodule.
User usage behaviour analysis submodule: classifies the user's current behaviour features with a decision tree algorithm to obtain the user's current behaviour interest.
User content interest extraction submodule: performs text analysis on the web pages in the user's current interest category to obtain web page text attribute information, from which the user's current content interest is obtained. The steps are:
(1) Obtain the corresponding keywords and their indexes;
(2) Compute the user's degree of attention to each keyword;
(3) Obtain the user's current content interest according to an attention threshold.
The keyword acquisition process comprises:
1. performing word segmentation on the full text (so that Chinese words are separated the way spaces separate English words, which simplifies processing);
2. filtering out stop words (words with little semantic content, such as function words and some high-frequency words; because stop words appear in many documents, they contribute nothing to information analysis);
3. extracting the text title and storing the title word set in vector V_h;
4. extracting the first paragraph, second paragraph and last paragraph of the body text and storing the content word set in vector V_c;
5. if |V_h ∩ V_c| < P, judging the text title to be an "abstract-type" title, where P is a given threshold, set to 3 on the basis of experiments;
6. (Condition given as formula image BDA00003146952400161, not reproduced here.) If x ∈ {query dictionary}, the text title is likewise judged to be an "abstract-type" title (x is any value taken from the title set Vk);
7. if the title has neither the feature in step 5 nor that in step 6, it is judged to be a "concrete-type" title;
For an "abstract-type" title, the TF-IDF method is used to find words in the text whose weight exceeds a certain threshold; these are the candidate words. Whether a candidate word is a keyword is then decided from its position (the higher the weight of the sentence containing it, the more likely it is a keyword).
For a "concrete-type" title, the title is segmented and the resulting nouns and verbs are the keywords of the text. When sentence weights are calculated, words in the title word list are given a larger weighting factor.
This method yields the weight of each sentence, which provides a basis for later processing, and updates the weights in the keyword list. The keyword linked list of each article holds that article's keywords sorted by weight.
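The title-type test of steps 5 to 7 can be sketched as follows. This is only an illustrative sketch: the overlap test and the threshold P = 3 come from the text, but the exact condition of step 6 is given as a formula image that is not available, so it is represented here by an optional, hypothetical callable `step6_test` supplied by the caller.

```python
def title_type(title_words, content_words, p=3, step6_test=None):
    """Classify a title as "abstract" or "concrete".

    title_words   -- segmented, stop-word-filtered title words (V_h)
    content_words -- words of the first, second and last paragraphs (V_c)
    p             -- overlap threshold P (3 in the text)
    step6_test    -- hypothetical predicate standing in for step 6
    """
    v_h = set(title_words)
    v_c = set(content_words)
    # Step 5: too little overlap between title and body means "abstract".
    if len(v_h & v_c) < p:
        return "abstract"
    # Step 6: placeholder for the dictionary-based condition.
    if step6_test is not None and step6_test(v_h):
        return "abstract"
    # Step 7: neither condition holds.
    return "concrete"
```

A "concrete" result means the nouns and verbs of the title can be used directly as keywords, while an "abstract" result sends the text down the TF-IDF candidate path.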
Attention calculation: by analyzing the content information and the behavior information of each of the user's browsing sessions, the user's attention to each interest topic can be calculated quantitatively. The calculation steps comprise:
1. add the keywords of all topic vectors under the same category A to that category's keyword list K;
2. normalize the duplicate keywords that appear while keywords are being added within the same category. A duplicate keyword triggers the gathering of candidate similar topics: all web pages under that word are collected into one candidate similar topic group;
3. for the candidate similar topic group of each duplicate keyword, compare the original weights of the word in the group's topic vectors, and take the topic vector with the largest weight as the core topic representing the group (and add it to K);
4. calculate the similarity between the core topic and every topic vector in its candidate similar topic group, and set a threshold; all vectors exceeding the threshold are added to topic group Ki, which thereby forms a topic Ki;
5. take the core topic found above as the representative of topic Ki, sum the frequencies of the topics of all topic vectors in Ki to obtain the adjusted core topic heat, and add the adjusted core topic to the candidate hot topic list;
6. calculate the attention of each topic in K according to the heat measurement method described earlier;
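Steps 3 to 5 above can be sketched as follows. This is a minimal sketch under stated assumptions: the text does not name the similarity measure, so cosine similarity is assumed; topic vectors are represented as word-to-weight dictionaries; and the record layout (`"vector"`, `"freq"`) and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse word->weight vectors (assumed measure)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_topic(keyword, themes, threshold=0.5):
    """Form one topic Ki around a duplicate keyword.

    themes -- list of {"vector": {word: weight}, "freq": int}
    """
    # Candidate similar topic group: every theme that contains the keyword.
    group = [t for t in themes if keyword in t["vector"]]
    if not group:
        return None
    # Step 3: the theme giving the keyword the largest weight is the core.
    core = max(group, key=lambda t: t["vector"][keyword])
    # Step 4: keep the themes that are similar enough to the core.
    members = [t for t in group
               if cosine(core["vector"], t["vector"]) >= threshold]
    # Step 5: the adjusted heat is the summed frequency of the members.
    heat = sum(t["freq"] for t in members)
    return {"core": core, "members": members, "heat": heat}
```

The threshold of 0.5 is arbitrary; in practice it would be tuned together with the heat measurement used in step 6.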
The ensemble learning submodule trains different classifiers, namely weak decision tree classifiers, on the same training set and then combines them into a stronger final classifier that produces the final classification of user interest. The AdaBoost algorithm iteratively adjusts the results of the user behavior classifier and the user content interest classifier to obtain the weights of the individual weak decision tree classifiers, and from these the user's current interest.
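The way AdaBoost weights weak decision tree classifiers can be illustrated with the following sketch. It is a generic binary AdaBoost with one-level decision trees (decision stumps) and labels +1/-1, written for illustration only; the feature layout, function names and number of rounds are assumptions, and the text does not specify how the behavior and content classifiers are encoded as inputs.

```python
import math


def train_stump(X, y, w):
    """Find the (error, feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                pred = [pol if row[f] <= thr else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best


def stump_predict(stump, row):
    _, f, thr, pol = stump
    return pol if row[f] <= thr else -pol


def adaboost(X, y, rounds=5):
    """Return a list of (alpha, stump) pairs; alpha is the weak classifier weight."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified samples, lower the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Weighted vote of all weak classifiers."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1
```

Each round concentrates on the samples the previous stumps got wrong, so a handful of weak classifiers can fit a pattern that no single stump can separate.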
User interest preference comprises interest item, interest category, attention and generation time. In a specific implementation, the user's interest preference can be represented as a tree: the upper layer of the tree represents the type of interest preference, and the lower layer represents interest subclasses or topics. The tree structure can store both the confidence of the user's interest types and information about the user's interest feature words. Fig. 7 shows an exemplary tree structure of user interest preference according to the present invention.
Data service push module: uses the rule association strategy to judge whether the user's interests and preferences are suited to local service. If the conditions for local service are met, it triggers the location analysis module to obtain the current browsing location; otherwise, it provides a general relevance-based information push service.
The judgment conditions for local service can be:
(1) the category of the website the user is currently browsing, such as catering, shopping, lodging or transport service websites, or city-edition value-added service providers;
(2) the category of the user's current interest, such as weather, traffic inquiry, ticket booking, discounts, tourist attractions or specialty products.
These conditions can be combined. For example, if the website the user is browsing is the city edition of a house-hunting site and the interest reflected by the browsed pages is renting a home, localized service recommendation is appropriate.
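The rule check for local service can be sketched as a simple set lookup. The category labels below are hypothetical identifiers chosen for this example, and whether the two conditions are combined with "or" or "and" is left as a parameter because the text only says that they can be combined.

```python
# Hypothetical category labels; a real system would use its own taxonomy.
LOCAL_SITE_CATEGORIES = {
    "catering", "shopping", "lodging", "transport", "city_value_added",
}
LOCAL_INTEREST_CATEGORIES = {
    "weather", "traffic_inquiry", "ticket_booking",
    "discount", "tourist_attraction", "specialty_product",
}


def is_local_service(site_category, interest_category, require_both=False):
    """Decide whether the current browsing context suits local service push."""
    site_ok = site_category in LOCAL_SITE_CATEGORIES
    interest_ok = interest_category in LOCAL_INTEREST_CATEGORIES
    if require_both:
        return site_ok and interest_ok
    return site_ok or interest_ok
```

When the function returns true, the location analysis module would be triggered; otherwise the general relevance-based push applies.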
The location analysis module obtains the current browsing location, that is, the user's geographic location while browsing the current web page, through the GMLC gateway. Fig. 9 is the operational flowchart of the location analysis module of the present invention.
Before the location analysis module sends the location information to the service push module, it also customizes, based on the obtained browsing location information, the URL or URL page content associated with the mobile terminal user's current location. Fig. 10 is the flowchart of location information association according to the present invention.
Location association information base: records the service information provided by geographically identical or nearby places, site attribute information, and so on, for example:
[Example table supplied as image BDA00003146952400181; not reproduced here]
Location search matching: the process of matching the user's interest preference and location information against the corresponding location association information. It comprises:
(1) using the user's current location information as the query keyword, perform a location association query and obtain the location information records consistent with the keyword;
(2) match the category of the user's current interest preference against the service information provided in the location association information and calculate the matching degree; if the matching degree exceeds a certain threshold, output that location association information;
1. if there are many matching results, match the topic of the user's current interest preference against the service information provided in the location association information and calculate the matching degree;
2. sort by matching degree;
3. output the location information whose matching degree exceeds the threshold.
(3) otherwise, use the core position in the user's location information as the query keyword, perform a location association query, obtain the location information records consistent with the keyword, and go to (2);
The steps above analyze and associate locations identical or close to the user's current location.
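The matching procedure above can be sketched as follows. The text does not define how the matching degree is computed, so this sketch assumes Jaccard overlap between the interest terms and a record's service terms; the record fields (`"location"`, `"services"`), the threshold value and the function names are likewise assumptions.

```python
def jaccard(a, b):
    """Assumed matching degree: overlap of two term sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def match_location(records, location, interest_terms, threshold=0.3):
    """Steps (1) and (2): query by location, score, filter, sort."""
    hits = [r for r in records if r["location"] == location]
    scored = [(jaccard(interest_terms, r["services"]), r) for r in hits]
    scored = [(s, r) for s, r in scored if s > threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored]


def match_with_fallback(records, location, core_location,
                        interest_terms, threshold=0.3):
    """Step (3): retry with the core position when nothing matches."""
    return (match_location(records, location, interest_terms, threshold)
            or match_location(records, core_location, interest_terms, threshold))
```

An empty result from both queries corresponds to the case in which no suitable place or service exists near the user.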
If the matching degrees of all the above information are below the preset threshold, there is no place or service matching the user's interest preference at the current location. A suitable place or service must therefore be found according to the user's interests and preferences.
Target location analysis: the target location is information that matches the user's interests and preferences, including an address or a scene. The process comprises:
(1) using the topic of the user's current interest preference as the query keyword, perform a location association query, obtain the location information records consistent with the keyword, and output that location association information;
(2) if there is no consistent location information record, calculate the matching degree between the topic of the user's current interest preference and the service information provided in the location association information,
1. sort by matching degree;
2. output the location information whose matching degree exceeds the threshold.
(3) pass the output location information to the route recommendation unit.
The route recommendation unit comprises:
(1) a recommended route generation unit, for calculating and selecting route data;
(2) a route data output, which generates the recommended route for travelling from the departure point to the destination;
(3) a display unit, for displaying the information.
Finally, it should be noted that the above embodiments only illustrate the technical solution of the present invention and do not limit it. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or replaced with equivalents, and that such modifications or equivalent replacements do not cause the modified technical solution to depart from the spirit and scope of the technical solution of the present invention.

Claims (9)

1. A method for determining, according to browsed web page URLs, the related web page texts a user is interested in, characterized by comprising the steps of:
filtering the web pages browsed by the user within a certain period to remove useless pages and pages that cannot be accessed, following the URL addresses that remain after filtering, obtaining the text content of the pages, and extracting the title and body text information;
determining, according to predefined topic categories, a category for each web document in the web document set;
computing access frequency statistics for each category; and
taking the set of web pages with the highest access frequency value as the related web pages the user is interested in.
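The frequency statistics of the last two steps can be illustrated with a short sketch. It assumes the classification step has already produced (URL, category) pairs; the function name and the data layout are hypothetical.

```python
from collections import Counter


def interesting_pages(classified_pages):
    """Return the most visited category and the pages that belong to it.

    classified_pages -- list of (url, category) pairs, one per page visit
    """
    counts = Counter(category for _, category in classified_pages)
    if not counts:
        return None, []
    top_category, _ = counts.most_common(1)[0]
    pages = [url for url, category in classified_pages
             if category == top_category]
    return top_category, pages
```

Repeated visits to the same URL count separately here, which matches a visit-frequency reading of the step; counting distinct pages instead would only require deduplicating the input.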
2. The method for determining, according to browsed web page URLs, the related web page texts a user is interested in as claimed in claim 1, characterized in that: the web page classification step requires building and training a web page classifier; a training text set is input, text representation and feature selection are applied, and a classifier model is built according to the feature dictionary, the output being a set of classification rules resembling a tree structure,
The training process of the web page classifier consists of repeatedly grouping the training samples: a classification prediction model of the target variable with respect to each input variable is established, so that the input variables are grouped according to the different values of the target variable, and the model is then used to classify and predict new data objects.
3. The method for determining, according to browsed web page URLs, the related web page texts a user is interested in as claimed in claim 2, characterized in that: the web page classifier uses a decision tree classification method, the steps of which are:
1. represent the test sample in the same form as the training samples;
2. t ← root node of the decision tree;
3. compare the value of the corresponding feature of the sample under test with the test attribute and threshold of decision tree node t,
and then, according to the splitting criterion of node t, set
t ← left child of t, or t ← right child of t;
4. repeat step 3 recursively until t is a leaf node;
5. the category of the test sample is the category represented by leaf t.
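The traversal in steps 2 to 5 can be sketched as follows. The node layout and the convention that values less than or equal to the threshold go to the left child are assumptions of this example; the text only states that the branch follows the splitting criterion of node t.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: Optional[str] = None       # set only on leaf nodes
    feature: Optional[int] = None     # index of the tested feature
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def classify(root, sample):
    """Walk from the root to a leaf and return the leaf's category."""
    t = root
    while t.label is None:
        # Assumed convention: "<= threshold" goes left, otherwise right.
        t = t.left if sample[t.feature] <= t.threshold else t.right
    return t.label
```

The loop form is equivalent to the recursive wording of step 4 and avoids recursion limits on deep trees.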
4. The method for determining, according to browsed web page URLs, the related web page texts a user is interested in as claimed in claim 2, characterized in that: in the web page classification step, the text to be classified, already processed by the text preprocessing module, is input; text representation is applied, feature selection is performed according to the feature dictionary, and the text is classified with the classification rules of the trained classifier model; the output is the category information of each text.
5. The method for determining, according to browsed web page URLs, the related web page texts a user is interested in as claimed in claim 2 or 4, characterized in that: in the text representation step, a feature vector space is used to represent text features, and document i can be expressed as the feature vector:
$$W_i = (W_{i1}, W_{i2}, \ldots, W_{im})$$
where $W_{ij}$ is a function of the frequency $f_{ij}$ with which term j occurs in document i; the frequency of the term in the document is used directly as the feature value, and the formula is:
$$W_{ij} = f_{ij}$$
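The raw term-frequency representation can be illustrated with a few lines of code. The function name is hypothetical, and the sketch assumes that the feature dictionary is an ordered list of terms and that the document has already been segmented into tokens.

```python
from collections import Counter


def tf_vector(tokens, dictionary):
    """Feature vector of one document: W_ij = f_ij for each dictionary term j."""
    counts = Counter(tokens)
    return [counts[term] for term in dictionary]
```

Terms absent from the document get a weight of zero, so every document maps to a vector of the same length m.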
6. The method for determining, according to browsed web page URLs, the related web page texts a user is interested in as claimed in claim 2 or 4, characterized in that: in the feature selection step, a feature dimension reduction method based on the improved χ² statistic and pattern aggregation is used, the steps of which are:
⑴ according to the formula
$$\chi'^{2}_{ij} = \operatorname{sign}(n_{11} n_{22} - n_{12} n_{21}) \times \frac{n \,(n_{11} n_{22} - n_{12} n_{21})^{2}}{(n_{11} + n_{12})(n_{21} + n_{22})(n_{11} + n_{21})(n_{12} + n_{22})}$$
calculate the improved χ² statistic of each term with respect to every class;
⑵ according to the formula $CHI_i = \max\{\,|\chi'^{2}_{i1}|, |\chi'^{2}_{i2}|, \ldots, |\chi'^{2}_{is}|\,\}$ calculate the CHI of each term, then sort the features from high to low by CHI value and choose the M feature terms with the largest CHI values; the feature matrix thus obtained has M patterns;
⑶ to compare whether each pattern's contribution proportions to the classes are consistent, first normalize the improved statistics of each pattern to the range [-1, 1], as follows:
$$A_{ij} = \chi'^{2}_{ij} / (\max - \min)$$
where max and min are respectively the maximum and minimum of the improved χ² statistics of pattern i;
⑷ cluster the patterns of A (each row of A represents one pattern) with a simple clustering algorithm, merging the patterns of one class into a new pattern; this yields L new patterns, where L is much smaller than M; agglomerative hierarchical clustering is used, with the commonly used Euclidean distance as the distance measure:
$$d(i, j) = \sqrt{(A_{i1} - A_{j1})^{2} + (A_{i2} - A_{j2})^{2} + \cdots + (A_{is} - A_{js})^{2}}$$
patterns whose Euclidean distance d(i, j) is below a certain threshold are clustered, and the clustering process is:
1. from matrix A, find the patterns whose distance is below the threshold and cluster them;
2. after clustering, merge the patterns in each class into a single pattern; this pattern contains all the terms in the class, and its term frequency is the sum of the term frequencies of those terms; recompute the improved statistic of the new pattern and rebuild matrix A from the new patterns;
repeat steps 1 and 2 until no further patterns can be merged;
⑸ recompute the CHI value of each feature item and select the top L′ feature items by CHI value.
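The normalization of step ⑶ and one clustering pass of step ⑷ can be sketched as follows. This is a simplified, greedy single pass rather than a full agglomerative hierarchy, it does not recompute the statistics of merged patterns, and the function names and the dictionary layout are assumptions made for this example.

```python
import math


def normalise(row):
    """A_ij = chi'^2_ij / (max - min) for one pattern (one row)."""
    span = max(row) - min(row)
    return [v / span for v in row] if span else [0.0] * len(row)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def merge_once(patterns, threshold):
    """Group patterns whose normalised rows lie closer than the threshold.

    patterns -- dict mapping a pattern name to its normalised row
    Returns a list of sets of pattern names; each set would be merged
    into one new pattern before the next pass.
    """
    groups = []
    for name, row in patterns.items():
        for group in groups:
            if any(euclidean(row, patterns[m]) < threshold for m in group):
                group.add(name)
                break
        else:
            groups.append({name})
    return groups
```

In the full procedure each group's term frequencies would be summed, the improved statistic recomputed, and the pass repeated until no two patterns fall within the threshold.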
7. A system for determining, according to browsed web page URLs, the related web page texts a user is interested in, characterized by comprising a web page text acquisition submodule, a web page text classification submodule, an access frequency statistics submodule and a user current content interest determination submodule, wherein
the web page text acquisition submodule filters the web pages browsed by the user within a certain period to remove useless pages and pages that cannot be accessed, follows the URL addresses that remain after filtering, obtains the text content of the pages, and extracts the title and body text information;
the web page text classification submodule determines, according to predefined topic categories, a category for each web document in the web document set;
the access frequency statistics submodule computes access frequency statistics for each category; and
the user current content interest determination submodule takes the set of web pages with the highest access frequency value as the related web pages the user is interested in.
8. The system for determining, according to browsed web page URLs, the related web page texts a user is interested in as claimed in claim 7, characterized in that: the web page text classification submodule requires building and training a web page classifier; a training text set is input, text representation and feature selection are applied, and a classifier model is built according to the feature dictionary, the output being a set of classification rules resembling a tree structure,
The training process of the web page classifier consists of repeatedly grouping the training samples: a classification prediction model of the target variable with respect to each input variable is established, so that the input variables are grouped according to the different values of the target variable, and the model is then used to classify and predict new data objects.
9. The system for determining, according to browsed web page URLs, the related web page texts a user is interested in as claimed in claim 7 or 8, characterized in that: the web page text classification submodule takes as input the text to be classified, already processed by the text preprocessing module, applies text representation, performs feature selection according to the feature dictionary, and classifies the text with the classification rules of the trained classifier model; the output is the category information of each text.
CN201310163619XA 2013-05-06 2013-05-06 Method and system for determining web page texts users interested in according to browsed web pages Pending CN103235824A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310163619XA CN103235824A (en) 2013-05-06 2013-05-06 Method and system for determining web page texts users interested in according to browsed web pages

Publications (1)

Publication Number Publication Date
CN103235824A true CN103235824A (en) 2013-08-07

Family

ID=48883865

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310163619XA Pending CN103235824A (en) 2013-05-06 2013-05-06 Method and system for determining web page texts users interested in according to browsed web pages

Country Status (1)

Country Link
CN (1) CN103235824A (en)

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810295A (en) * 2014-03-06 2014-05-21 北京邮电大学 Method and device for extracting internet data
CN103886090A (en) * 2014-03-31 2014-06-25 北京搜狗科技发展有限公司 Content recommendation method and device based on user favorites
CN104102650A (en) * 2013-04-07 2014-10-15 富士通株式会社 Content providing device, content providing method and electronic equipment
CN104268290A (en) * 2014-10-22 2015-01-07 武汉科技大学 Recommendation method based on user cluster
CN104536972A (en) * 2014-12-03 2015-04-22 北京邮电大学 CDN-based web page content perception system and method
CN104539678A (en) * 2014-12-19 2015-04-22 百度在线网络技术(北京)有限公司 Information pushing and receiving method and device
CN104735150A (en) * 2015-03-27 2015-06-24 努比亚技术有限公司 Message pushing method and device
CN104881458A (en) * 2015-05-22 2015-09-02 国家计算机网络与信息安全管理中心 Labeling method and device for web page topics
CN105512334A (en) * 2015-12-29 2016-04-20 成都陌云科技有限公司 Data mining method based on search words
CN106156259A (en) * 2015-04-28 2016-11-23 天脉聚源(北京)科技有限公司 A kind of user behavior information displaying method and system
CN106202294A (en) * 2016-07-01 2016-12-07 北京奇虎科技有限公司 The related news computational methods merged based on key word and topic model and device
CN106375369A (en) * 2016-08-18 2017-02-01 南京邮电大学 Mobile Web service recommendation method and collaborative recommendation system based on user behavior analysis
CN106445974A (en) * 2015-08-12 2017-02-22 腾讯科技(深圳)有限公司 Data recommendation method and apparatus
CN106557520A (en) * 2015-09-29 2017-04-05 百度在线网络技术(北京)有限公司 The recognition methods of the Type of website and device
CN106709756A (en) * 2016-12-08 2017-05-24 北京五八信息技术有限公司 User demand information acquisition method and apparatus thereof
CN106790570A (en) * 2016-12-27 2017-05-31 山东开创云软件有限公司 A kind of consumer behaviour analysis and management system and its analysis method
WO2017162031A1 (en) * 2016-03-22 2017-09-28 阿里巴巴集团控股有限公司 Method and device for collecting information, and intelligent terminal
CN107317870A (en) * 2017-07-11 2017-11-03 宁波公众信息产业有限公司 A kind of data analysis system based on portal website
CN107423308A (en) * 2016-05-24 2017-12-01 华为技术有限公司 subject recommending method and device
CN107463573A (en) * 2016-06-02 2017-12-12 广州市动景计算机科技有限公司 Content information provides method, equipment, browser, electronic equipment and server
CN108959329A (en) * 2017-05-27 2018-12-07 腾讯科技(北京)有限公司 A kind of file classification method, device, medium and equipment
CN109190024A (en) * 2018-08-20 2019-01-11 平安科技(深圳)有限公司 Information recommendation method, device, computer equipment and storage medium
CN110222191A (en) * 2019-04-19 2019-09-10 平安科技(深圳)有限公司 Construction method, device, computer equipment and the computer storage medium of user interest portrait
CN110895594A (en) * 2018-08-23 2020-03-20 武汉斗鱼网络科技有限公司 Page display method and related equipment
CN110990571A (en) * 2019-12-02 2020-04-10 精硕科技(北京)股份有限公司 Method and device for obtaining discussion occupation ratio, storage medium and electronic equipment
CN112291622A (en) * 2020-10-30 2021-01-29 中国建设银行股份有限公司 Method and device for determining favorite internet surfing time period of user
CN114971817A (en) * 2022-07-29 2022-08-30 中国电子科技集团公司第十研究所 Product self-adaptive service method, medium and device based on user demand portrait
US11531722B2 (en) 2018-12-11 2022-12-20 Samsung Electronics Co., Ltd. Electronic device and control method therefor

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101079063A (en) * 2007-06-25 2007-11-28 腾讯科技(深圳)有限公司 Method, system and apparatus for transmitting advertisement based on scene information
CN101866341A (en) * 2009-04-17 2010-10-20 华为技术有限公司 Information push method, device and system
US20110154209A1 (en) * 2009-12-22 2011-06-23 At&T Intellectual Property I, L.P. Platform for proactive discovery and delivery of personalized content to targeted enterprise users

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
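王煜等 (Wang Yu et al.): "基于模糊决策树的文本分类规则抽取" (Text classification rule extraction based on fuzzy decision trees), 《计算机应用》 (Journal of Computer Applications) *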

Cited By (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104102650A (en) * 2013-04-07 2014-10-15 富士通株式会社 Content providing device, content providing method and electronic equipment
CN104102650B (en) * 2013-04-07 2017-08-22 富士通株式会社 Content providing device, content providing and electronic equipment
CN103810295A (en) * 2014-03-06 2014-05-21 北京邮电大学 Method and device for extracting internet data
CN103886090A (en) * 2014-03-31 2014-06-25 北京搜狗科技发展有限公司 Content recommendation method and device based on user favorites
CN103886090B (en) * 2014-03-31 2018-01-02 北京搜狗科技发展有限公司 Content recommendation method and device based on user preferences
CN104268290A (en) * 2014-10-22 2015-01-07 武汉科技大学 Recommendation method based on user cluster
CN104268290B (en) * 2014-10-22 2017-08-08 武汉科技大学 A kind of recommendation method based on user clustering
CN104536972A (en) * 2014-12-03 2015-04-22 北京邮电大学 CDN-based web page content perception system and method
CN104536972B (en) * 2014-12-03 2018-08-14 北京邮电大学 Web page contents sensory perceptual system based on CDN and method
CN104539678B (en) * 2014-12-19 2018-08-07 百度在线网络技术(北京)有限公司 A kind of push of information, method of reseptance and device
CN104539678A (en) * 2014-12-19 2015-04-22 百度在线网络技术(北京)有限公司 Information pushing and receiving method and device
CN104735150A (en) * 2015-03-27 2015-06-24 努比亚技术有限公司 Message pushing method and device
CN106156259A (en) * 2015-04-28 2016-11-23 天脉聚源(北京)科技有限公司 A kind of user behavior information displaying method and system
CN104881458B (en) * 2015-05-22 2019-05-28 国家计算机网络与信息安全管理中心 A kind of mask method and device of Web page subject
CN104881458A (en) * 2015-05-22 2015-09-02 国家计算机网络与信息安全管理中心 Labeling method and device for web page topics
CN106445974A (en) * 2015-08-12 2017-02-22 腾讯科技(深圳)有限公司 Data recommendation method and apparatus
CN106557520A (en) * 2015-09-29 2017-04-05 百度在线网络技术(北京)有限公司 The recognition methods of the Type of website and device
CN105512334A (en) * 2015-12-29 2016-04-20 成都陌云科技有限公司 Data mining method based on search words
WO2017162031A1 (en) * 2016-03-22 2017-09-28 阿里巴巴集团控股有限公司 Method and device for collecting information, and intelligent terminal
CN107220230A (en) * 2016-03-22 2017-09-29 阿里巴巴集团控股有限公司 A kind of information collecting method and device, and a kind of intelligent terminal
US11830033B2 (en) 2016-05-24 2023-11-28 Huawei Technologies Co., Ltd. Theme recommendation method and apparatus
CN107423308A (en) * 2016-05-24 2017-12-01 华为技术有限公司 subject recommending method and device
US20190087884A1 (en) 2016-05-24 2019-03-21 Huawei Technologies Co., Ltd. Theme recommendation method and apparatus
CN107423308B (en) * 2016-05-24 2020-07-07 华为技术有限公司 Theme recommendation method and device
CN107463573A (en) * 2016-06-02 2017-12-12 广州市动景计算机科技有限公司 Content information provides method, equipment, browser, electronic equipment and server
CN107463573B (en) * 2016-06-02 2020-10-13 阿里巴巴(中国)有限公司 Content information providing method, device, browser, electronic device and server
CN106202294B (en) * 2016-07-01 2020-09-11 北京奇虎科技有限公司 Related news computing method and device based on keyword and topic model fusion
CN106202294A (en) * 2016-07-01 2016-12-07 北京奇虎科技有限公司 The related news computational methods merged based on key word and topic model and device
CN106375369A (en) * 2016-08-18 2017-02-01 南京邮电大学 Mobile Web service recommendation method and collaborative recommendation system based on user behavior analysis
CN106375369B (en) * 2016-08-18 2019-05-28 南京邮电大学 The business recommended method of mobile Web and Collaborative Recommendation system based on user behavior analysis
CN106709756A (en) * 2016-12-08 2017-05-24 北京五八信息技术有限公司 User demand information acquisition method and apparatus thereof
CN106790570A (en) * 2016-12-27 2017-05-31 山东开创云软件有限公司 A kind of consumer behaviour analysis and management system and its analysis method
CN108959329A (en) * 2017-05-27 2018-12-07 腾讯科技(北京)有限公司 A kind of file classification method, device, medium and equipment
CN108959329B (en) * 2017-05-27 2023-05-16 腾讯科技(北京)有限公司 Text classification method, device, medium and equipment
CN107317870A (en) * 2017-07-11 2017-11-03 宁波公众信息产业有限公司 A kind of data analysis system based on portal website
CN109190024A (en) * 2018-08-20 2019-01-11 平安科技(深圳)有限公司 Information recommendation method, device, computer equipment and storage medium
CN109190024B (en) * 2018-08-20 2023-04-07 平安科技(深圳)有限公司 Information recommendation method and device, computer equipment and storage medium
CN110895594A (en) * 2018-08-23 2020-03-20 武汉斗鱼网络科技有限公司 Page display method and related equipment
US11531722B2 (en) 2018-12-11 2022-12-20 Samsung Electronics Co., Ltd. Electronic device and control method therefor
CN110222191A (en) * 2019-04-19 2019-09-10 平安科技(深圳)有限公司 Construction method, device, computer equipment and the computer storage medium of user interest portrait
CN110222191B (en) * 2019-04-19 2023-08-22 平安科技(深圳)有限公司 User interest portrait construction method, device, computer equipment and computer storage medium
CN110990571A (en) * 2019-12-02 2020-04-10 精硕科技(北京)股份有限公司 Method and device for obtaining discussion occupation ratio, storage medium and electronic equipment
CN110990571B (en) * 2019-12-02 2024-04-02 北京秒针人工智能科技有限公司 Method and device for acquiring discussion duty ratio, storage medium and electronic equipment
CN112291622B (en) * 2020-10-30 2022-05-27 中国建设银行股份有限公司 Method and device for determining favorite internet surfing time period of user
CN112291622A (en) * 2020-10-30 2021-01-29 中国建设银行股份有限公司 Method and device for determining favorite internet surfing time period of user
CN114971817B (en) * 2022-07-29 2022-11-22 中国电子科技集团公司第十研究所 Product self-adaptive service method, medium and device based on user demand portrait
CN114971817A (en) * 2022-07-29 2022-08-30 中国电子科技集团公司第十研究所 Product self-adaptive service method, medium and device based on user demand portrait

Similar Documents

Publication Publication Date Title
CN103235824A (en) Method and system for determining web page texts users interested in according to browsed web pages
CN103235823A (en) Method and system for determining current interest of users according to related web pages and current behaviors
CN103246725A (en) Wireless network based data traffic pushing system and method
Ren et al. Context-aware probabilistic matrix factorization modeling for point-of-interest recommendation
CN106815297B (en) Academic resource recommendation service system and method
CN102982042B (en) A kind of personalization content recommendation method, platform and system
CN103235826B (en) A kind of control method of time window
CN101551806B (en) Personalized website navigation method and system
CN101866341A (en) Information push method, device and system
CN105005594B (en) Abnormal microblog users recognition methods
CN103870973B (en) Information push, searching method and the device of keyword extraction based on electronic information
Li et al. Community detection using hierarchical clustering based on edge-weighted similarity in cloud environment
CN105718579A (en) Information push method based on internet-surfing log mining and user activity recognition
CN106682686A (en) User gender prediction method based on mobile phone Internet-surfing behavior
CN109800350A (en) A kind of Personalize News recommended method and system, storage medium
CN103544188A (en) Method and device for pushing mobile internet content based on user preference
CN106970991B (en) Similar application identification method and device, application search recommendation method and server
CN106484764A (en) User's similarity calculating method based on crowd portrayal technology
CN104572797A (en) Individual service recommendation system and method based on topic model
CN103914478A (en) Webpage training method and system and webpage prediction method and system
CN109165367B (en) News recommendation method based on RSS subscription
TW201115370A (en) Systems and methods for capturing and managing collective social intelligence information
CN103049440A (en) Recommendation processing method and processing system for related articles
CN103324666A (en) Topic tracing method and device based on micro-blog data
Markou et al. Predicting taxi demand hotspots using automated internet search queries

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20130807