CN104750752A - Determination method and device of user community with internet-surfing preference - Google Patents

Determination method and device of user community with internet-surfing preference

Info

Publication number
CN104750752A
Authority
CN
China
Prior art keywords
user
url
keyword
inverted index
index information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310752439.5A
Other languages
Chinese (zh)
Other versions
CN104750752B (en)
Inventor
徐萌
何鸿凌
王彦峰
钱岭
孙少凌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Abstract

An embodiment of the invention discloses a method and device for determining a user group with an internet-surfing preference. Under the technical solution of the embodiment, when such a user group needs to be determined, the target URLs (uniform resource locators) corresponding to the keywords associated with the user group are determined, and the users whose user identifiers have access counts for the target URLs that satisfy the user filtering conditions are determined to constitute the user group. The high performance and flexibility of inverted index information are thereby fully exploited: the user group can be obtained quickly, the system resource consumption caused by recording and matching massive amounts of data is avoided, and both the processing efficiency and the filtering accuracy of the determination process are improved.

Description

A method and apparatus for determining a user group with an internet-surfing preference
Technical field
The present invention relates to the field of network technology, and in particular to a method and apparatus for determining a user group with an internet-surfing preference.
Background
In existing technical solutions, user behavior analysis is generally performed on the basis of web page content. Whenever a user browses web pages while online, the system can analyze the addresses the user accesses over the mobile or broadband access network, match and classify them in depth against a URL library, and summarize the user's interest attributes, so that websites can present content of value to the user in a personalized way according to those interests.
A concrete implementation example is as follows:
Step A: select one or more topic words, for example x86, BMW, a schoolmate, etc., and enter them into a search engine as search keywords, thereby obtaining a list of web page addresses related to the keywords;
Step B: match the address list from step A against the users' access log behavior, and find the user group that accesses the addresses in the list according to a certain rule.
This user group is the group of users interested in the selected topic words.
In the course of implementing the present invention, the inventors found that the prior art has at least the following problems:
The data volume is large. At the current number of users, the log data is very large and growing rapidly. If it is matched against the list of web page addresses related to the keywords, especially when a certain rule must also be matched, the following further problems arise:
a) Performing the association (join) directly performs extremely poorly. On the one hand, the log data is very large. On the other hand, the number of web page addresses to be associated with it fluctuates sharply with the selected keywords and with changes to the search rules, so its scale is very unstable, and the two data sets also differ greatly in size. Taking the traffic of one province as an example, 17 billion log records may be produced every day; multiplied by the computation period, such as one week or one month, the volume becomes enormous. The number of web page addresses to be associated, by contrast, may be only about 2 billion. Every user-group extraction requires an association between these two large tables.
b) The result of the association is stored with a high degree of redundancy. Using the same figures, the result is roughly 8 times the size of the 2-billion-row table (170/20 = 8.5). Moreover, user log data is updated continuously; if user groups are to be extracted over a given period, a large number of logs must be retained, which consumes a large amount of storage space.
Summary of the invention
The object of the embodiments of the present invention is to provide a method and apparatus for determining a user group with an internet-surfing preference, so that such a user group can be determined more accurately and quickly.
To achieve the above object, an embodiment of the present invention provides a method for determining a user group with an internet-surfing preference, comprising:
traversing the user internet access log records to be analyzed, and generating inverted index information for each URL contained in those log records, wherein the inverted index information of a URL comprises the user identifiers that accessed the URL and the access characteristic information of each user identifier for the URL;
when a user group with an internet-surfing preference needs to be determined, selecting one or more keywords corresponding to the user group, and determining the corresponding target URLs according to the selected keywords;
according to the inverted index information of the determined target URLs, determining that the users corresponding to the user identifiers whose access characteristic information for the target URLs satisfies the user filtering conditions constitute the user group.
Preferably, selecting one or more keywords corresponding to the user group when the user group needs to be determined, and determining the corresponding target URLs according to the selected keywords, specifically comprises:
according to the inverted index information of a selected keyword, determining that the URLs in which the number of occurrences of the keyword satisfies the URL filtering conditions are the target URLs corresponding to the keyword, wherein the inverted index information of a keyword comprises the URLs of the web pages containing the keyword and the number of occurrences of the keyword in each of those web pages; or,
according to the web search results of a selected keyword in a search engine, determining that the URLs of the web pages satisfying the second URL filtering conditions are the target URLs corresponding to the keyword.
Preferably, selecting one or more keywords corresponding to the user group when the user group needs to be determined, and determining the corresponding target URLs according to the selected keywords, further comprises:
filtering the determined target URLs according to the service characteristic information corresponding to the selected keywords.
Preferably, traversing the user internet access log records to be analyzed and generating inverted index information for each URL contained in those log records further comprises:
according to the needs of different analysis periods, generating for the same URL separate inverted index information for different time intervals, each carrying its own timestamp information.
Preferably, determining the user group according to the inverted index information of the determined target URLs specifically comprises:
according to the inverted index information of the determined target URLs and the timestamp information it carries, determining that the users corresponding to the user identifiers whose access count and access period for the target URLs satisfy the user filtering conditions constitute the user group.
Further, an embodiment of the present invention also proposes a network device, comprising:
a generation module, configured to traverse the user internet access log records to be analyzed and generate inverted index information for each URL contained in those log records, wherein the inverted index information of a URL comprises the user identifiers that accessed the URL and the access characteristic information of each user identifier for the URL;
a URL filtering module, configured to select, when a user group with an internet-surfing preference needs to be determined, one or more keywords corresponding to the user group, and to determine the corresponding target URLs according to the selected keywords;
a user filtering module, configured to determine, according to the inverted index information generated by the generation module for the target URLs determined by the URL filtering module, that the users corresponding to the user identifiers whose access characteristic information for the target URLs satisfies the user filtering conditions constitute the user group.
Preferably, the URL filtering module is specifically configured to:
according to the inverted index information of a selected keyword, determine that the URLs in which the number of occurrences of the keyword satisfies the URL filtering conditions are the target URLs corresponding to the keyword, wherein the inverted index information of a keyword comprises the URLs of the web pages containing the keyword and the number of occurrences of the keyword in each of those web pages; or,
according to the web search results of a selected keyword in a search engine, determine that the URLs of the web pages satisfying the second URL filtering conditions are the target URLs corresponding to the keyword.
Preferably, the URL filtering module is further configured to:
filter the determined target URLs according to the service characteristic information corresponding to the selected keywords.
Preferably, the generation module is further configured to:
according to the needs of different analysis periods, generate for the same URL separate inverted index information for different time intervals, each carrying its own timestamp information.
Preferably, the user filtering module is specifically configured to:
according to the inverted index information generated by the generation module for the target URLs determined by the URL filtering module, and the timestamp information it carries, determine that the users corresponding to the user identifiers whose access count and access period for the target URLs satisfy the user filtering conditions constitute the user group.
Compared with the prior art, the technical solution proposed by the embodiments of the present invention has the following advantages:
With the technical solution proposed by the embodiments of the present invention, when a user group with an internet-surfing preference needs to be determined, the corresponding target URLs are determined from the keywords associated with the user group, and, using the inverted index information of those target URLs, the users corresponding to the user identifiers whose access counts for the target URLs satisfy the user filtering conditions are determined to constitute the user group. The high performance and high flexibility of inverted index information are thereby fully exploited: the user group is obtained quickly, the system resource consumption caused by recording and matching massive amounts of data is avoided, and both the processing efficiency and the filtering accuracy of the determination process are improved.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a method for determining a user group with an internet-surfing preference provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of the method for determining a user group with an internet-surfing preference in a specific application scenario provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a network device proposed by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a schematic flowchart of the method for determining a user group with an internet-surfing preference provided by an embodiment of the present invention. The method specifically comprises:
Step S101: traverse the user internet access log records to be analyzed, and generate inverted index information for each URL contained in those log records.
The inverted index information of a URL comprises the user identifiers that accessed the URL and the access characteristic information of each user identifier for the URL.
In a specific application scenario, the access characteristic information of a user identifier for a URL may include the access count, the last access time, the first access time, and other information that characterizes access to the URL. The specific content can be adjusted according to actual needs, and such variations do not affect the scope of protection of the present invention.
It should be noted that this step in effect prepares the basis on which the subsequent user filtering is performed, and therefore needs to be completed before the subsequent steps are executed.
To meet this requirement, in a specific application scenario a statistical processing period can be set, so that the user internet access log records are analyzed periodically and the corresponding inverted index information is obtained.
With this arrangement, on the one hand the inverted index information is updated at a fixed period, giving the subsequent steps a more timely and accurate basis for analysis; on the other hand, the processing load of handling a large burst of data all at once is avoided, and the subsequent steps do not incur system delay by waiting for the result of this step.
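The index construction of step S101 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the record layout `(user_id, url, timestamp)`, the `AccessStats` class, the function name `build_url_index` and the example.com URLs are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AccessStats:
    """Access characteristic information of one user for one URL (assumed fields)."""
    count: int
    first_access: datetime
    last_access: datetime


def build_url_index(log_records):
    """Traverse (user_id, url, timestamp) records once.

    Returns {url: {user_id: AccessStats}}, i.e. an inverted index keyed by URL.
    """
    index = defaultdict(dict)
    for user_id, url, ts in log_records:
        stats = index[url].get(user_id)
        if stats is None:
            index[url][user_id] = AccessStats(1, ts, ts)
        else:
            stats.count += 1
            stats.first_access = min(stats.first_access, ts)
            stats.last_access = max(stats.last_access, ts)
    return dict(index)


# Hypothetical log records for illustration only.
logs = [
    ("userA", "http://example.com/a", datetime(2013, 7, 1, 12, 0)),
    ("userB", "http://example.com/a", datetime(2013, 7, 1, 12, 5)),
    ("userA", "http://example.com/a", datetime(2013, 7, 2, 9, 30)),
]
url_index = build_url_index(logs)
```

Because the log is traversed only once per statistical period, later group extractions read the index instead of rescanning the raw log.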
Once this step has been completed, step S102 is performed when a user group with an internet-surfing preference needs to be determined.
Step S102: select one or more keywords corresponding to the user group, and determine the corresponding target URLs according to the selected keywords.
It should be noted that, depending on how the target URLs are determined, this step can be implemented in either of the following two modes:
Mode one: according to the inverted index information of a selected keyword, determine that the URLs in which the number of occurrences of the keyword satisfies the URL filtering conditions are the target URLs corresponding to the keyword.
The inverted index information of a keyword comprises the URLs of the web pages containing the keyword and the number of occurrences of the keyword in each of those web pages.
This processing uses an inverted index technique similar to that of step S101: the number of times a keyword appears in the web page of a URL is counted, and the counts are organized by keyword. Compared with the prior art, this reduces the heavy data processing load and the volume of associated data caused by matching against the entire log.
It should further be noted that the URL filtering conditions in this mode are threshold conditions set to reject interfering information. Specifically, they may be a minimum number of occurrences of the keyword (rejecting web page records whose word frequency is too low), web page type information (rejecting web page types that should not be counted), or other data filtering conditions. This prevents web page records with too little relevance to the keyword from distorting the statistical results.
In a specific application scenario, the content of the URL filtering conditions can be set as required, and such variations do not affect the scope of protection of the present invention.
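Mode one can be illustrated with a short sketch. The index layout `{keyword: {url: (frequency, category)}}`, the function name `select_target_urls` and its parameters are assumptions chosen for the example; the patent leaves the exact filtering conditions open.

```python
def select_target_urls(keyword_index, keyword, min_occurrences=1, allowed_categories=None):
    """Return the URLs for `keyword` that pass the URL filtering conditions.

    keyword_index: {keyword: {url: (frequency, category)}} (assumed layout).
    min_occurrences: minimum word frequency; pages below it are rejected.
    allowed_categories: optional set of page categories to keep.
    """
    targets = []
    for url, (freq, category) in keyword_index.get(keyword, {}).items():
        if freq < min_occurrences:
            continue  # word frequency too low
        if allowed_categories is not None and category not in allowed_categories:
            continue  # page type excluded from the statistics
        targets.append(url)
    return targets


# Toy index for illustration only.
keyword_index = {
    "key1": {"pageA": (5, "entertainment"), "pageC": (2, "sports")},
    "key2": {"pageB": (1, "news"), "pageC": (4, "entertainment")},
}
frequent = select_target_urls(keyword_index, "key1", min_occurrences=3)
```

With these toy values, only `pageA` passes a minimum frequency of 3 for `key1`.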
Mode two: according to the web search results of a selected keyword in a search engine, determine that the URLs of the web pages satisfying the second URL filtering conditions are the target URLs corresponding to the keyword.
Compared with mode one, this mode does not filter URLs by the number of occurrences of the keyword. Instead, it uses the search function of a search engine and filters URLs from the perspective of the relevance between web page and keyword.
It should further be noted that the second URL filtering conditions in this mode are threshold conditions set to reject interfering information. Specifically, they may be the ordinal position of a URL in the search results (search results are generally ordered by relevance to the query or by access popularity, so this rejects web page records whose relevance or popularity is too low), web page type information (rejecting web page types that should not be counted), or other data filtering conditions. This prevents web page records with too little relevance to the keyword from distorting the statistical results.
In a specific application scenario, the content of the second URL filtering conditions can be set as required, and such variations do not affect the scope of protection of the present invention.
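A rank-based filter of the kind described for mode two might look like the sketch below. No real search engine API is used: `ranked_results` is assumed to be a list of `(url, page_type)` pairs already returned by some search engine, best match first, and the function name, parameters and example.com URLs are hypothetical.

```python
def targets_from_search_results(ranked_results, max_rank, excluded_types=frozenset()):
    """Keep the first `max_rank` results, then drop unwanted page types.

    ranked_results: list of (url, page_type) ordered by relevance or popularity.
    """
    return [
        url
        for url, page_type in ranked_results[:max_rank]
        if page_type not in excluded_types
    ]


# Hypothetical search results for illustration only.
results = [
    ("http://example.com/1", "article"),
    ("http://example.com/2", "forum"),
    ("http://example.com/3", "article"),
]
top_articles = targets_from_search_results(results, max_rank=2, excluded_types={"forum"})
```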
Further, taking into account the service characteristics of the keywords themselves, the determined target URLs can be filtered according to the service characteristic information corresponding to the selected keywords, thereby further improving the accuracy of the finally determined target URLs.
For example, additional keywords can be determined from other information associated with a keyword, so that URLs are further filtered by a combination of keywords. Alternatively, the type of web page associated with a keyword can be determined from the nature of the keyword itself, so that URLs are further filtered by web page type.
Specifically, the service characteristic information is not limited to the content listed above. Any processing that filters URLs more precisely, and thereby improves the accuracy of the finally determined target URLs, can be applied in the technical solution proposed by the embodiments of the present invention, and such variations do not affect the scope of protection of the present invention.
Step S103: according to the inverted index information of the determined target URLs, determine that the users corresponding to the user identifiers whose access characteristic information for the target URLs satisfies the user filtering conditions constitute the user group.
It should be noted that the user filtering conditions in this step are threshold conditions set to reject interfering information. Specifically, they may be an access count (rejecting access records with fewer than a certain number of accesses), an access time interval (rejecting access records whose access time interval is too large), or other data filtering conditions. This prevents access records that do not reflect a user's preference for the corresponding website, such as occasional or accidental accesses, from distorting the statistical results.
It should further be noted that, given the effect of the length of the statistical time interval on the results, step S101 can, according to the needs of different analysis periods, generate for the same URL separate inverted index information for different time intervals, each carrying its own timestamp information.
On this basis, the processing of step S103 can be adjusted as follows:
according to the inverted index information of the determined target URLs and the timestamp information it carries, determine that the users corresponding to the user identifiers whose access count and access period for the target URLs satisfy the user filtering conditions constitute the user group.
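The user filtering of step S103 can be sketched as below, under stated assumptions: each index entry holds `(access_count, last_access)`, counts are summed across all target URLs, and the "access time interval" condition is interpreted as a maximum idle time since the last access. The patent does not fix any of these choices, and all names are hypothetical.

```python
from datetime import datetime, timedelta


def select_users(url_index, target_urls, min_count, max_idle, now):
    """Return user IDs whose accesses to the target URLs pass the filtering conditions.

    url_index: {url: {user_id: (access_count, last_access)}} (assumed layout).
    min_count: minimum total accesses across the target URLs.
    max_idle: maximum time allowed since the user's most recent access.
    """
    totals = {}
    for url in target_urls:
        for user_id, (count, last_access) in url_index.get(url, {}).items():
            prev_count, prev_last = totals.get(user_id, (0, datetime.min))
            totals[user_id] = (prev_count + count, max(prev_last, last_access))
    return sorted(
        user_id
        for user_id, (count, last_access) in totals.items()
        if count >= min_count and now - last_access <= max_idle
    )


# Toy index for illustration only.
url_index = {
    "u1": {"userA": (4, datetime(2013, 8, 12)), "userB": (1, datetime(2013, 8, 13))},
    "u2": {"userA": (1, datetime(2013, 8, 1)), "userC": (5, datetime(2013, 7, 1))},
}
group = select_users(
    url_index, ["u1", "u2"],
    min_count=3, max_idle=timedelta(days=7), now=datetime(2013, 8, 14),
)
```

Here `userB` is rejected for too few accesses and `userC` for having been idle too long, which mirrors the two example conditions named in the text.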
Compared with the prior art, the technical solution proposed by the embodiments of the present invention has the following advantages:
With the technical solution proposed by the embodiments of the present invention, when a user group with an internet-surfing preference needs to be determined, the corresponding target URLs are determined from the keywords associated with the user group, and, using the inverted index information of those target URLs, the users corresponding to the user identifiers whose access counts for the target URLs satisfy the user filtering conditions are determined to constitute the user group. The high performance and high flexibility of inverted index information are thereby fully exploited: the user group is obtained quickly, the system resource consumption caused by recording and matching massive amounts of data is avoided, and both the processing efficiency and the filtering accuracy of the determination process are improved.
The technical solution is described in detail below through a specific embodiment, but is not limited to that embodiment.
Fig. 2 is a schematic flowchart of the method for determining a user group with an internet-surfing preference in a specific application scenario provided by an embodiment of the present invention. Of the two modes mentioned for step S102 above, this embodiment describes the processing using mode one, but this does not affect the scope of protection of the present invention.
Specifically, the method comprises:
Step S201: generate the inverted index information of keywords from web page information.
In a specific application scenario, this step can be implemented using pre-selected keywords and web page information obtained from the network.
The web page information may be obtained by targeted collection from a specified range of web pages, or by general collection from all web pages. The specific means of collection can be chosen according to actual needs, and such variations do not affect the scope of protection of the present invention.
In this embodiment, the implementation of this step is described using specified web pages.
First, three web pages are specified (in practice the number of specified web pages is far larger; this number is used only for ease of description and does not affect the scope of protection):
web page A, web page B, web page C.
Then the keywords to be counted are determined, that is, the keywords key1, key2 and key3 that the above web pages may contain.
For fast indexing, the web pages are first segmented into words and the word frequencies are counted. The inverted index information has the following form:
keyword: (web page address 1: word frequency, category, etc.); (web page address 2: word frequency); (web page address 3: word frequency).
For example:
key1: (web page A: 5, entertainment); (web page C: 2, sports)
key2: (web page B: 1, news); (web page C: 4, entertainment)
key3: (web page A: 1, finance); (web page B: 2, finance)
Specifically, in physical storage the information is stored under the pointer that follows the key, so that when information is associated, the whole string of information following key1 can be retrieved quickly.
An example is as follows:
Universiade:
(http://www.sz2011.org/: 5, sports); (http://zhidao.baidu.com/question/4602235: 7, sports).
This is the inverted index information of the keyword "Universiade": in the web page at http://www.sz2011.org/ the word "Universiade" appears 5 times, and in the web page at http://zhidao.baidu.com/question/4602235 it appears 7 times.
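A keyword index of this shape could be built roughly as follows. This is a sketch under simplifying assumptions: pages are given as `{url: (text, category)}`, and segmentation is a plain whitespace split on lowercased text, whereas real Chinese web pages would need a proper word segmenter. The function name and sample pages are hypothetical.

```python
from collections import Counter, defaultdict


def build_keyword_index(pages, keywords):
    """Count keyword frequencies per page.

    pages: {url: (text, category)}; keywords: iterable of lowercase words.
    Returns {keyword: {url: (frequency, category)}}.
    """
    index = defaultdict(dict)
    wanted = set(keywords)
    for url, (text, category) in pages.items():
        counts = Counter(text.lower().split())  # naive segmentation (assumption)
        for kw in wanted:
            if counts[kw] > 0:
                index[kw][url] = (counts[kw], category)
    return dict(index)


# Toy pages for illustration only.
pages = {
    "pageA": ("key1 key1 key3 other", "entertainment"),
    "pageB": ("key2 key3 key3", "news"),
}
kw_index = build_keyword_index(pages, ["key1", "key2", "key3"])
```

Looking up `kw_index["key3"]` then returns every page containing the keyword together with its frequency, which is the "string of information" retrieved by key in the text.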
Step S202: generate the inverted index information of user accesses to URLs from the user internet access log records.
Because subsequent processing extracts user information by URL, this step traverses the users' internet access log records once and builds an inverted index over the relationship between the URLs in the log records and the users.
In a specific application scenario, the log records have roughly the following format:
Field                 Example
Time                  2013-7-112:00.987
End time              2013-7-112:01.876
userID                User A
Access URL            http://www.soopat.com/Home/Result?Sort=&
Uplink or downlink
Traffic
Application type      App (WeChat, Weibo, QQ), web page, etc.
By traversing the above log information, user-oriented inverted index information is built in the following form:
URL: (user identifier: access count, last access time, first access time, access duration).
In physical storage, as with the keyword inverted index information described above, the inverted index information of user accesses to URLs is also stored in key-value form, where the key is the URL and the value is a list of (user identifier: access count, duration) entries.
For example:
http://www.soopat.com/Home/Result?Sort=&: (user A: 4, 1s); (user B: 2, 10s).
http://www.chinanews.com/shipin/2013/08-13/news2771.shtml: (user C: 5).
These are the inverted index information of "http://www.soopat.com/Home/Result?Sort=&" and of "http://www.chinanews.com/shipin/2013/08-13/news2771.shtml", respectively.
For the web page at http://www.soopat.com/Home/Result?Sort=&, user A accessed it 4 times and user B accessed it 2 times; for the web page at http://www.chinanews.com/shipin/2013/08-13/news2771.shtml, user C accessed it 5 times.
It should be noted that in this step the inverted index information of URLs and users stores the association between URLs and user accesses, and this relationship differs considerably from the keyword-to-web-page relationship of step S201. The inverted index information of keywords and web pages is relatively stable (once a web page has been generated, its body content does not change much); it is generally updated at a fixed period, multiple versions need not be kept, and the latest data is simply taken as authoritative. In the inverted index information of URLs and users, by contrast, user internet behavior accumulates over time and the corresponding log records change substantially, so multiple versions of the index information need to be kept.
In a specific application scenario, the inverted index information of URLs and users can be updated with daily data, with a timestamp indicating the date of the data. For example, a same-day version and a one-week version can be kept at the same time; there will then be two data records with the same key but different timestamps.
When data is read, the results can first be obtained by key and then filtered by timestamp.
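The two-stage read (by key, then by timestamp) can be sketched as follows. The storage layout `{url: [(timestamp, {user_id: access_count}), ...]}`, the use of a date as the version timestamp, and the function name are all assumptions for illustration; the patent does not prescribe a concrete store.

```python
from datetime import date


def read_version(versioned_index, url, stamp):
    """Fetch every version stored under the key, then filter by timestamp."""
    candidates = versioned_index.get(url, [])           # step 1: lookup by key
    matches = [users for ts, users in candidates if ts == stamp]  # step 2: timestamp filter
    return matches[0] if matches else {}


# Two records share one key but carry different timestamps (toy data).
versioned_index = {
    "http://example.com/a": [
        (date(2013, 8, 13), {"userA": 1}),               # same-day version
        (date(2013, 8, 7), {"userA": 4, "userB": 2}),    # one-week version
    ],
}
weekly = read_version(versioned_index, "http://example.com/a", date(2013, 8, 7))
```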
With the above two inverted indexes in place, a user group with an internet-surfing preference can be extracted quickly.
Step S203: select keywords according to the user group to be determined.
Step S204: determine the target URLs to be selected, according to the inverted index information generated in step S201.
Step S205: determine the users corresponding to the target URLs who satisfy the system rules, according to the inverted index information generated in step S202.
Steps S203 to S205 are, of course, background operations. In the foreground, they appear as follows:
System input: keywords.
System rules: for example, having accessed related web pages more than 3 times within one week, or being restricted to a certain class of websites, such as novels.
System output: the user group that has recently accessed web pages related to the keywords.
A concrete example of this processing is as follows:
Input keyword: BMW x5;
Output: the list of users who prefer the information corresponding to this keyword.
The internal processing flow of the system is as follows:
Step A: after the keyword is received, locate all the URLs corresponding to the keyword in the web page inverted index information generated in step S201.
Step B: narrow the URL range as appropriate according to preset filtering rules, for example by taking the top 100 or top 1000 URLs as the related content.
Step C: look up each URL in the list determined in step B in the inverted index information of URLs and users generated in step S202.
Step D: filter according to the system rules, such as access count and period.
Step E: the user list remaining after filtering is the required user group.
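Steps A to E can be put together in one short sketch. It assumes a keyword index of the form `{keyword: {url: frequency}}`, a URL index of the form `{url: {user_id: access_count}}` already restricted to the desired period, ranking of URLs by word frequency in step B, and a rule of "more than `threshold` accesses" in step D. These layouts, names and the ranking criterion are illustrative assumptions.

```python
def find_preference_group(keyword, keyword_index, url_index, top_n=100, threshold=3):
    """Return the sorted user IDs that accessed the keyword's top URLs more than `threshold` times."""
    # Step A: locate all URLs for the keyword.
    postings = keyword_index.get(keyword, {})
    # Step B: narrow the range to the top-N URLs (ranked by word frequency here).
    ranked = sorted(postings, key=lambda url: postings[url], reverse=True)[:top_n]
    # Step C: look up each URL in the URL-to-user index and sum accesses per user.
    visits = {}
    for url in ranked:
        for user_id, count in url_index.get(url, {}).items():
            visits[user_id] = visits.get(user_id, 0) + count
    # Steps D and E: apply the system rule; the survivors form the user group.
    return sorted(user_id for user_id, count in visits.items() if count > threshold)


# Toy indexes for illustration only.
keyword_index = {"BMW x5": {"u1": 9, "u2": 4, "u3": 1}}
url_index = {
    "u1": {"userA": 3, "userB": 1},
    "u2": {"userA": 1, "userB": 1},
    "u3": {"userC": 10},
}
group = find_preference_group("BMW x5", keyword_index, url_index, top_n=2, threshold=3)
```

No join between the raw log and the URL list is needed at query time: each step is a dictionary lookup on a prebuilt index.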
It should further be noted that extracting the user group directly by keyword, as described above, meets the great majority of requirements as a basic function. However, to further optimize the extraction of personalized, service-oriented user groups, the following refined filtering can be applied.
For example:
The keyword "I Am a Singer" is a label required by a service feature, and can in practice correspond to a series of keywords such as "HNTV" & "I Am a Singer" & "Saturday 10 p.m.".
As another example:
"Wuxia genre" is a label required by a service feature; it can in practice correspond to the keyword "wuxia" (martial-arts fiction), with the website category limited to novels in the system rules.
With this mapping definition method, the system can be service-oriented, and its configuration rules can be set flexibly and conveniently.
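One way to express such a label mapping is a small configuration table, sketched below. The `LABELS` structure, the function names, and the reading of "&" as "URL must match every keyword" are assumptions made for illustration only.

```python
# Hypothetical label configuration: label -> keywords plus an optional site category rule.
LABELS = {
    "I Am a Singer": {
        "keywords": ["HNTV", "I Am a Singer", "Saturday 10 p.m."],
        "site_category": None,
    },
    "wuxia genre": {"keywords": ["wuxia"], "site_category": "novel"},
}


def expand_label(label, labels=LABELS):
    """Map a service label to (keywords, site_category); unknown labels map to themselves."""
    rule = labels.get(label)
    if rule is None:
        return [label], None
    return list(rule["keywords"]), rule["site_category"]


def urls_matching_all(keyword_index, keywords):
    """Interpret '&' as intersection: keep URLs that appear under every keyword."""
    url_sets = [set(keyword_index.get(kw, {})) for kw in keywords]
    return set.intersection(*url_sets) if url_sets else set()


# Toy keyword index for illustration only.
keyword_index = {
    "HNTV": {"u1": 1, "u2": 1},
    "I Am a Singer": {"u1": 3, "u3": 2},
    "Saturday 10 p.m.": {"u1": 1},
}
keywords, category = expand_label("I Am a Singer")
matched = urls_matching_all(keyword_index, keywords)
```

Keeping the mapping in configuration rather than code is what lets new service labels be added without changing the extraction logic.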
Compared with the prior art, the technical solution proposed by the embodiments of the present invention has the following advantages:
With the technical solution proposed by the embodiments of the present invention, when a user group with an internet-surfing preference needs to be determined, the corresponding target URLs are determined from the keywords associated with the user group, and, using the inverted index information of those target URLs, the users corresponding to the user identifiers whose access counts for the target URLs satisfy the user filtering conditions are determined to constitute the user group. The high performance and high flexibility of inverted index information are thereby fully exploited: the user group is obtained quickly, the system resource consumption caused by recording and matching massive amounts of data is avoided, and both the processing efficiency and the filtering accuracy of the determination process are improved.
Further, to implement the above technical solution, an embodiment of the present invention also provides a network device, whose structure is shown schematically in Figure 3 and which specifically comprises:
A generation module 31, configured to traverse the user internet access log records to be analyzed and to generate inverted index information for each URL contained in the user internet access log records, wherein the inverted index information corresponding to a URL comprises the user identifiers that accessed the URL and each user identifier's access characteristic information for the URL;
A URL screening module 32, configured to, when an internet-preference user group needs to be determined, select one or more keywords corresponding to the internet-preference user group and determine the corresponding target URL according to the selected keywords;
A user screening module 33, configured to determine, according to the inverted index information generated by the generation module 31 for the target URL determined by the URL screening module 32, that the users corresponding to the user identifiers whose access characteristic information for the target URL meets the user screening condition form the internet-preference user group.
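The cooperation of the generation module and the user screening module can be illustrated with a short sketch. It is not the patented implementation: the function names, the `(user_id, url)` log format, the use of a plain access count as the access characteristic information, and the choice to sum visits across all target URLs are all assumptions made for illustration.

```python
from collections import defaultdict


def build_url_index(log_records):
    """Map each URL to {user_id: access_count}; log_records are (user_id, url) pairs."""
    index = defaultdict(lambda: defaultdict(int))
    for user_id, url in log_records:
        index[url][user_id] += 1
    return index


def screen_users(index, target_urls, min_visits):
    """Return user IDs whose total visits across target_urls reach min_visits."""
    totals = defaultdict(int)
    for url in target_urls:
        # .get avoids creating empty entries for URLs that never appear in the logs.
        for user_id, count in index.get(url, {}).items():
            totals[user_id] += count
    return {user_id for user_id, count in totals.items() if count >= min_visits}
```

Because the index is keyed by URL, screening touches only the postings of the target URLs instead of rescanning the whole log, which is the source of the efficiency gain the description claims.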
Preferably, the URL screening module 32 is specifically configured to:
According to the inverted index information corresponding to the selected keyword, determine as the target URLs for the keyword those URLs in which the number of occurrences of the keyword meets a URL screening condition, wherein the inverted index information corresponding to a keyword comprises the URLs of the web pages containing the keyword and the number of occurrences of the keyword in each web page; or,
According to the web search results returned by a search engine for the selected keyword, determine as the target URLs for the keyword the URLs of the web pages that meet a second URL screening condition.
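The first option, selecting target URLs from a keyword inverted index, might be sketched as follows. The function name, the dictionary layout `{keyword: {url: occurrences}}`, the minimum-occurrence threshold as the URL screening condition, and the AND combination of several keywords are assumptions for illustration only.

```python
def select_target_urls(keyword_index, keywords, min_occurrences):
    """Return URLs where every selected keyword occurs at least min_occurrences times."""
    candidate_sets = []
    for keyword in keywords:
        postings = keyword_index.get(keyword, {})
        candidate_sets.append(
            {url for url, occurrences in postings.items() if occurrences >= min_occurrences}
        )
    # With no keywords there is nothing to intersect, so return an empty set.
    return set.intersection(*candidate_sets) if candidate_sets else set()
```

Intersecting the per-keyword candidate sets corresponds to joining keywords with "&"; a union would be the natural alternative if any one keyword should suffice.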
Preferably, the URL screening module 32 is further configured to:
Screen the determined target URLs according to the service feature information corresponding to the selected keyword.
Preferably, the generation module 31 is further configured to:
According to the needs of different analysis periods, generate for the same URL separate inverted index information for different time intervals, each carrying its own timestamp information.
Preferably, the user screening module 33 is specifically configured to:
According to the inverted index information generated by the generation module 31 for the target URL determined by the URL screening module 32, together with the timestamp information it carries, determine that the users corresponding to the user identifiers whose access count and access period for the target URL meet the user screening condition form the internet-preference user group.
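One possible reading of the time-stamped inverted index is an index keyed by URL and time interval, as in the sketch below. The daily granularity, the `(user_id, url, day)` record format, and the function names are assumptions; the patent only states that separate index information is generated per time interval and carries timestamp information.

```python
from collections import defaultdict
from datetime import date


def build_periodic_index(log_records):
    """Map (url, day) to {user_id: access_count}; records are (user_id, url, day) triples."""
    index = defaultdict(lambda: defaultdict(int))
    for user_id, url, day in log_records:
        index[(url, day)][user_id] += 1
    return index


def screen_users_in_period(index, target_url, start: date, end: date, min_visits):
    """Return users whose visits to target_url from start to end (inclusive) reach min_visits."""
    totals = defaultdict(int)
    for (url, day), postings in index.items():
        if url == target_url and start <= day <= end:
            for user_id, count in postings.items():
                totals[user_id] += count
    return {user_id for user_id, count in totals.items() if count >= min_visits}
```

Storing one posting list per interval lets the same index answer queries for different analysis periods by combining only the intervals that fall inside the requested window.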
From the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software together with the necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
Those skilled in the art will understand that the accompanying drawings are schematic diagrams of a preferred embodiment, and that the modules or flows in the drawings are not necessarily required for implementing the present invention.
Those skilled in the art will understand that the modules of the device in an embodiment may be distributed in the device as described in the embodiment, or may be modified accordingly and located in one or more devices different from that of the embodiment. The modules of the above embodiments may be combined into one module or further split into multiple sub-modules.
The sequence numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments.
The above are only several specific embodiments of the present invention, but the present invention is not limited thereto; any variation that a person skilled in the art can conceive of shall fall within the protection scope of the present invention.

Claims (10)

1. A method for determining an internet-preference user group, characterized by comprising:
Traversing the user internet access log records to be analyzed, and generating inverted index information for each URL contained in the user internet access log records, wherein the inverted index information corresponding to a URL comprises the user identifiers that accessed the URL and each user identifier's access characteristic information for the URL;
When an internet-preference user group needs to be determined, selecting one or more keywords corresponding to the internet-preference user group, and determining the corresponding target URL according to the selected keywords;
According to the inverted index information corresponding to the determined target URL, determining that the users corresponding to the user identifiers whose access characteristic information for the target URL meets the user screening condition form the internet-preference user group.
2. The method as claimed in claim 1, characterized in that, when an internet-preference user group needs to be determined, selecting one or more keywords corresponding to the internet-preference user group and determining the corresponding target URL according to the selected keywords specifically comprises:
According to the inverted index information corresponding to the selected keyword, determining as the target URLs for the keyword those URLs in which the number of occurrences of the keyword meets a URL screening condition, wherein the inverted index information corresponding to a keyword comprises the URLs of the web pages containing the keyword and the number of occurrences of the keyword in each web page; or,
According to the web search results returned by a search engine for the selected keyword, determining as the target URLs for the keyword the URLs of the web pages that meet a second URL screening condition.
3. The method as claimed in claim 2, characterized in that, when an internet-preference user group needs to be determined, selecting one or more keywords corresponding to the internet-preference user group and determining the corresponding target URL according to the selected keywords further comprises:
Screening the determined target URLs according to the service feature information corresponding to the selected keyword.
4. The method as claimed in claim 1, characterized in that traversing the user internet access log records to be analyzed and generating inverted index information for each URL contained in the user internet access log records further comprises:
According to the needs of different analysis periods, generating for the same URL separate inverted index information for different time intervals, each carrying its own timestamp information.
5. The method as claimed in claim 4, characterized in that determining, according to the inverted index information corresponding to the determined target URL, that the users corresponding to the user identifiers whose access characteristic information for the target URL meets the user screening condition form the internet-preference user group specifically comprises:
According to the inverted index information corresponding to the determined target URL, together with the timestamp information it carries, determining that the users corresponding to the user identifiers whose access count and access period for the target URL meet the user screening condition form the internet-preference user group.
6. A network device, characterized by comprising:
A generation module, configured to traverse the user internet access log records to be analyzed and to generate inverted index information for each URL contained in the user internet access log records, wherein the inverted index information corresponding to a URL comprises the user identifiers that accessed the URL and each user identifier's access characteristic information for the URL;
A URL screening module, configured to, when an internet-preference user group needs to be determined, select one or more keywords corresponding to the internet-preference user group and determine the corresponding target URL according to the selected keywords;
A user screening module, configured to determine, according to the inverted index information generated by the generation module for the target URL determined by the URL screening module, that the users corresponding to the user identifiers whose access characteristic information for the target URL meets the user screening condition form the internet-preference user group.
7. The network device as claimed in claim 6, characterized in that the URL screening module is specifically configured to:
According to the inverted index information corresponding to the selected keyword, determine as the target URLs for the keyword those URLs in which the number of occurrences of the keyword meets a URL screening condition, wherein the inverted index information corresponding to a keyword comprises the URLs of the web pages containing the keyword and the number of occurrences of the keyword in each web page; or,
According to the web search results returned by a search engine for the selected keyword, determine as the target URLs for the keyword the URLs of the web pages that meet a second URL screening condition.
8. The network device as claimed in claim 7, characterized in that the URL screening module is further configured to:
Screen the determined target URLs according to the service feature information corresponding to the selected keyword.
9. The network device as claimed in claim 6, characterized in that the generation module is further configured to:
According to the needs of different analysis periods, generate for the same URL separate inverted index information for different time intervals, each carrying its own timestamp information.
10. The network device as claimed in claim 9, characterized in that the user screening module is specifically configured to:
According to the inverted index information generated by the generation module for the target URL determined by the URL screening module, together with the timestamp information it carries, determine that the users corresponding to the user identifiers whose access count and access period for the target URL meet the user screening condition form the internet-preference user group.
CN201310752439.5A 2013-12-31 2013-12-31 A kind of determining method and apparatus for the preferences user group that surfs the Internet Active CN104750752B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310752439.5A CN104750752B (en) 2013-12-31 2013-12-31 A kind of determining method and apparatus for the preferences user group that surfs the Internet

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310752439.5A CN104750752B (en) 2013-12-31 2013-12-31 A kind of determining method and apparatus for the preferences user group that surfs the Internet

Publications (2)

Publication Number Publication Date
CN104750752A true CN104750752A (en) 2015-07-01
CN104750752B CN104750752B (en) 2018-06-15

Family

ID=53590447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310752439.5A Active CN104750752B (en) 2013-12-31 2013-12-31 A kind of determining method and apparatus for the preferences user group that surfs the Internet

Country Status (1)

Country Link
CN (1) CN104750752B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145934A (en) * 2017-12-22 2019-01-04 北京数安鑫云信息技术有限公司 User behavior data processing method, medium, equipment and device based on log
CN109299084A (en) * 2018-10-24 2019-02-01 北京小米移动软件有限公司 User's representation data filter method and device
CN112291622A (en) * 2020-10-30 2021-01-29 中国建设银行股份有限公司 Method and device for determining favorite internet surfing time period of user

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090006371A1 (en) * 2007-06-29 2009-01-01 Fuji Xerox Co., Ltd. System and method for recommending information resources to user based on history of user's online activity
CN101814083A (en) * 2010-01-08 2010-08-25 上海复歌信息科技有限公司 Automatic webpage classification method and system
CN102402566A (en) * 2011-08-09 2012-04-04 江苏欣网视讯科技有限公司 Web user behavior analysis method based on Chinese webpage automatic classification technology
CN103338260A (en) * 2013-07-04 2013-10-02 武汉世纪金桥安全技术有限公司 Distributed analytical system and analytical method for URL logs in network auditing
CN103383685A (en) * 2012-05-02 2013-11-06 腾讯科技(深圳)有限公司 Method and device for keyword attribute quantification based on user click data

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090006371A1 (en) * 2007-06-29 2009-01-01 Fuji Xerox Co., Ltd. System and method for recommending information resources to user based on history of user's online activity
CN101814083A (en) * 2010-01-08 2010-08-25 上海复歌信息科技有限公司 Automatic webpage classification method and system
CN102402566A (en) * 2011-08-09 2012-04-04 江苏欣网视讯科技有限公司 Web user behavior analysis method based on Chinese webpage automatic classification technology
CN103383685A (en) * 2012-05-02 2013-11-06 腾讯科技(深圳)有限公司 Method and device for keyword attribute quantification based on user click data
CN103338260A (en) * 2013-07-04 2013-10-02 武汉世纪金桥安全技术有限公司 Distributed analytical system and analytical method for URL logs in network auditing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
易红: "基于数据挖掘的手机上网用户偏好应用模型和套餐生舱模型研究", 《中国优秀硕士学位论文全文数据库》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145934A (en) * 2017-12-22 2019-01-04 北京数安鑫云信息技术有限公司 User behavior data processing method, medium, equipment and device based on log
CN109145934B (en) * 2017-12-22 2019-05-21 北京数安鑫云信息技术有限公司 User behavior data processing method, medium, equipment and device based on log
CN109299084A (en) * 2018-10-24 2019-02-01 北京小米移动软件有限公司 User's representation data filter method and device
CN109299084B (en) * 2018-10-24 2022-04-01 北京小米移动软件有限公司 User portrait data filtering method and device
CN112291622A (en) * 2020-10-30 2021-01-29 中国建设银行股份有限公司 Method and device for determining favorite internet surfing time period of user
CN112291622B (en) * 2020-10-30 2022-05-27 中国建设银行股份有限公司 Method and device for determining favorite internet surfing time period of user

Also Published As

Publication number Publication date
CN104750752B (en) 2018-06-15

Similar Documents

Publication Publication Date Title
CN102999586B (en) A kind of method and apparatus of recommendation of websites
KR101463974B1 (en) Big data analysis system for marketing and method thereof
TWI533246B (en) Method and system for discovery of user unknown interests
CN106021583B (en) Statistical method and system for page flow data
CN103186539A (en) Method and system for confirming user groups, inquiring information and recommending
CN107329983B (en) Machine data distributed storage and reading method and system
US20130185429A1 (en) Processing Store Visiting Data
WO2014180130A1 (en) Method and system for recommending contents
CN103607496A (en) A method and an apparatus for deducting interests and hobbies of handset users and a handset terminal
CN102955810B (en) A kind of Web page classification method and equipment
CN104423621A (en) Pinyin string processing method and device
CN103617266A (en) Personalized extension search method, device and system
CN103744856A (en) Method, device and system for linkage extended search
CN107861981A (en) A kind of data processing method and device
CN103186666A (en) Method, device and equipment for searching based on favorites
CN111159563A (en) Method, device and equipment for determining user interest point information and storage medium
CN102955802A (en) Method and device for acquiring data from data reports
CN103200269A (en) Internet information statistical method and Internet information statistical system
KR101682659B1 (en) Method for customized news alarm based on keyword and management server for news search for the same
CN111368227A (en) URL processing method and device
CN104615723B (en) The determination method and apparatus of query word weighted value
CN106874509B (en) Resource recommendation method and device based on medium-granularity user grouping
CN104123321B (en) A kind of determining method and device for recommending picture
CN104750752A (en) Determination method and device of user community with internet-surfing preference
CN104484367A (en) Data mining and analyzing system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant