CN103607496B - A method and an apparatus for deducting interests and hobbies of handset users and a handset terminal
Abstract
The invention belongs to the technical field of communication and provides a method and an apparatus for inferring the interests and hobbies of mobile phone users, and a mobile phone terminal. The method comprises the following steps: reading the browsing history files in a mobile phone browser; parsing the browsing history files to obtain the keywords the user has searched for and the URLs the user has browsed; classifying the searched keywords and the browsed URLs separately; and counting the search frequency of the keywords and browsed URLs in each category, so that the user's interests and hobbies can be inferred from how high or low those frequencies are. By reading the browsing history files, obtaining the keywords and browsed URLs, and counting their search frequency in each category, the method, apparatus and mobile phone terminal of the invention can effectively infer the interests and hobbies of mobile phone users.
Description
Technical Field
The invention belongs to the field of communication technology, and in particular relates to a method and an apparatus for inferring the interests and hobbies of mobile phone users, and to a mobile phone terminal.
Background
As mobile phones have become widespread, the number of mobile phone users has kept growing, and so has the number of mobile Internet users. According to published statistics, the total number of domestic mobile phone users reached 930 million in 2011, of whom more than 390 million accessed the Internet from their phones. In addition, the DCCI Internet Data Center forecast that China's mobile Internet users would reach 720 million by 2013 and would outnumber computer-based Internet users. The mobile browser, as the tool with which users browse web pages on their phones, therefore has excellent development prospects.
In the prior art, an ordinary mobile phone terminal has no function for inferring the interests and hobbies of its user.
Summary of the Invention
The purpose of the embodiments of the present invention is to provide a method for inferring the interests and hobbies of mobile phone users, so as to solve the problem that an ordinary mobile phone terminal has no function for inferring the interests and hobbies of its user.
The embodiments of the present invention are realized as a method for inferring the interests and hobbies of a mobile phone user, the method comprising:
Reading the browsing history files in the mobile phone browser;
Parsing the browsing history files to obtain the keywords the user has searched for and the URLs the user has browsed;
Classifying the searched keywords and the browsed URLs separately;
Counting the search frequency of the keywords and browsed URLs in each category, so as to infer the user's interests and hobbies from how high or low the search frequencies are.
The embodiments of the present invention further provide an apparatus for inferring the interests and hobbies of a mobile phone user, the apparatus comprising:
A reading unit, configured to read the browsing history files in the mobile phone browser;
An acquiring unit, configured to parse the browsing history files and obtain the keywords the user has searched for and the URLs the user has browsed;
A classification unit, configured to classify the searched keywords and the browsed URLs separately;
A statistical inference unit, configured to count the search frequency of the keywords and browsed URLs in each category, so as to infer the user's interests and hobbies from how high or low the search frequencies are.
The embodiments of the present invention further provide a mobile phone terminal that includes the above apparatus.
Compared with the prior art, the embodiments of the present invention have the following beneficial effect: by reading the browsing history files, obtaining the keywords and browsed URLs, and counting the search frequency of the keywords and browsed URLs in each category, the interests and hobbies of the mobile phone user can be effectively inferred from how high or low the frequencies are.
Brief Description of the Drawings
Fig. 1 is a flow chart of the method for inferring the interests and hobbies of a mobile phone user provided by an embodiment of the present invention;
Fig. 2 is a first logical schematic diagram of the apparatus for inferring the interests and hobbies of a mobile phone user provided by an embodiment of the present invention;
Fig. 3 is a second logical schematic diagram of the apparatus for inferring the interests and hobbies of a mobile phone user provided by an embodiment of the present invention;
Fig. 4 is a third logical schematic diagram of the apparatus for inferring the interests and hobbies of a mobile phone user provided by an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
The embodiments provided by the present invention are as follows:
To make the embodiments easier to understand, several elements that appear in their description are introduced first:
Domain name (Domain Name):
A domain name is the name of a computer or group of computers on the Internet, made up of a string of names separated by dots. It identifies the electronic location of a computer during data transmission (it sometimes also denotes a geographic location; a geographic domain name refers to a local region with administrative autonomy). A domain name is a "mask" over an IP address. Its purpose is to give a group of servers (website, e-mail, FTP and so on) an address that is easy to remember and communicate. A domain name serves as a memorable name for an Internet participant, such as a computer, a mobile phone terminal, a network or a service.
A complete domain name consists of two or more parts separated by the English full stop ".", as in the following domain names: yahoo.com, yahoo.ca.us, yahoo.co.uk. The first domain name consists of two parts; the second and third consist of three parts. In a complete domain name, the part to the right of the last "." is called the top-level domain (TLD); in the examples above, com, us and uk are top-level domains. The part immediately to the left of the last "." is called the second-level domain (SLD); for example, yahoo is the second-level domain in yahoo.com, ca is the second-level domain in yahoo.ca.us, and co is the second-level domain in yahoo.co.uk. The part to the left of the second-level domain is called the third-level domain, the part to the left of the third-level domain is called the fourth-level domain, and so on. For example, yahoo is the third-level domain in yahoo.ca.us and yahoo.co.uk.
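The level numbering described above can be illustrated with a short Python sketch. The function name `domain_levels` is an illustrative assumption and does not appear in the patent; the sketch simply reads the dot-separated labels from right to left.

```python
def domain_levels(domain: str) -> dict:
    """Map each level number (1 = top-level domain) to its label."""
    labels = domain.lower().strip(".").split(".")
    # Labels are read right to left: the rightmost label is the top-level domain,
    # the one to its left is the second-level domain, and so on.
    return {level: label for level, label in enumerate(reversed(labels), start=1)}


print(domain_levels("yahoo.com"))    # level 1 is 'com', level 2 is 'yahoo'
print(domain_levels("yahoo.co.uk"))  # level 1 is 'uk', level 2 is 'co', level 3 is 'yahoo'
```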
Definition and characteristics of the B+ tree:
First, a B+ tree of order m is defined as follows:
(1) each node can hold at most m elements;
(2) except for the root node, each node holds at least m/2 elements;
(3) if the root node is not a leaf node, it has at least 2 child nodes;
(4) all leaf nodes are on the same level;
(5) a non-leaf node with k child nodes holds (k-1) elements, arranged in ascending order;
(6) all elements in the left subtree of an element are smaller than that element, and all elements in its right subtree are greater than or equal to it;
(7) non-leaf nodes store only keys and the indexes pointing to the next level of child nodes; records are stored only in leaf nodes;
(8) adjacent leaf nodes are linked by pointers.
Second, the characteristics of a B+ tree are:
(1) every key appears in the linked list of leaf nodes (a dense index), and the keys in the linked list are in sorted order;
(2) a lookup can never end at a non-leaf node;
(3) the non-leaf nodes act as an index over the leaf nodes (a sparse index), and the leaf nodes act as the data layer that stores the (key) data;
(4) the structure is well suited to file indexing systems.
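The properties listed above can be made concrete with a minimal insert-only B+ tree sketch in Python. This is an illustration under stated assumptions, not the patent's implementation: the class names `Node` and `BPlusTree`, the `order` parameter (taken here as the maximum number of keys per node) and the split policy are all choices made for this example, and deletion is omitted.

```python
import bisect


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # child nodes (non-leaf nodes only)
        self.values = []    # records (leaf nodes only, property 7)
        self.next = None    # link to the next leaf (property 8)


class BPlusTree:
    def __init__(self, order=4):
        self.order = order  # assumed meaning: maximum number of keys per node
        self.root = Node(leaf=True)

    def get(self, key, default=None):
        node = self.root
        while not node.leaf:
            # Smaller keys go left; greater-or-equal keys go right (property 6).
            node = node.children[bisect.bisect_right(node.keys, key)]
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.values[i]
        return default

    def put(self, key, value):
        split = self._insert(self.root, key, value)
        if split:  # the root itself overflowed: grow the tree by one level
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys, new_root.children = [sep], [self.root, right]
            self.root = new_root

    def _insert(self, node, key, value):
        if node.leaf:
            i = bisect.bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                node.values[i] = value  # existing key: overwrite the record
                return None
            node.keys.insert(i, key)
            node.values.insert(i, value)
            if len(node.keys) <= self.order:
                return None
            mid = len(node.keys) // 2  # split the leaf; the separator is copied up
            right = Node(leaf=True)
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.values, node.values = node.values[mid:], node.values[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right
        i = bisect.bisect_right(node.keys, key)
        split = self._insert(node.children[i], key, value)
        if split is None:
            return None
        sep, child = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, child)
        if len(node.keys) <= self.order:
            return None
        mid = len(node.keys) // 2  # split the non-leaf node; the middle key moves up
        right = Node(leaf=False)
        sep_up = node.keys[mid]
        right.keys, node.keys = node.keys[mid + 1:], node.keys[:mid]
        right.children, node.children = node.children[mid + 1:], node.children[:mid + 1]
        return sep_up, right

    def items(self):
        node = self.root
        while not node.leaf:  # walk down to the leftmost leaf
            node = node.children[0]
        while node:           # then follow the leaf chain in key order
            yield from zip(node.keys, node.values)
            node = node.next
```

A lookup with `get` always descends to a leaf, which matches the statement that a search never ends at a non-leaf node, while `items` shows the second access path: a sequential scan along the linked leaves.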
Referring to Fig. 1, an embodiment of the present invention provides a method for inferring the interests and hobbies of a mobile phone user, the method comprising:
101. Reading the browsing history files in the mobile phone browser.
The browsing history files in the mobile phone browser include log files and cache files.
Every browsing record that the user produces through the phone's built-in browser, or through other installed mobile browser software, is recorded in the log files and cache files of the corresponding browser.
Different browsers store their log files and cache files at different locations on the phone. When reading, the files can be told apart by their suffixes, because the log files and cache files of different browsers have different suffixes.
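Distinguishing history files by suffix could be sketched as follows. The suffix table and browser names below are purely hypothetical placeholders; the patent does not name any concrete suffix, and real browsers use their own file names and formats.

```python
from pathlib import Path

# Hypothetical suffix table (assumption for illustration only).
BROWSER_BY_SUFFIX = {
    ".alog": "browser_a",
    ".acache": "browser_a",
    ".bhist": "browser_b",
}


def group_history_files(paths):
    """Group candidate file paths by the browser their suffix belongs to."""
    groups = {}
    for path in paths:
        browser = BROWSER_BY_SUFFIX.get(Path(path).suffix.lower())
        if browser is not None:  # files with unknown suffixes are skipped
            groups.setdefault(browser, []).append(path)
    return groups
```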
102. Parsing the browsing history files to obtain the keywords the user has searched for and the URLs the user has browsed.
Parsing the log files and cache files yields the user's browsing records, which include the search keywords the user previously entered and the URLs the user previously browsed. These keywords and browsed URLs reflect the fields the user pays attention to and the user's interests and hobbies.
Obtaining the searched keywords and browsed URLs by parsing the browsing history files is simple and easy to carry out.
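As a rough sketch of this parsing step, assume that the history file has already been decoded into one URL per line and that search keywords travel in a query-string parameter. Both are assumptions: the patent does not specify the file format, and the parameter names in `SEARCH_PARAMS` are illustrative guesses.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed names of query parameters that carry a search keyword.
SEARCH_PARAMS = ("q", "wd", "word", "keyword")


def extract_keywords_and_urls(history_lines):
    """Return (keywords, urls) found in an iterable of history lines."""
    keywords, urls = [], []
    for line in history_lines:
        url = line.strip()
        parts = urlsplit(url)
        if not parts.scheme or not parts.hostname:
            continue  # not an absolute URL: ignore the line
        urls.append(url)
        query = parse_qs(parts.query)  # decodes %xx escapes and '+' as space
        for name in SEARCH_PARAMS:
            keywords.extend(query.get(name, []))
    return keywords, urls
```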
103. Classifying the searched keywords and the browsed URLs separately.
In this embodiment, the step of classifying the searched keywords and the browsed URLs separately includes:
Classifying the searched keywords according to their semantics;
Classifying the browsed URLs by domain name level, from high to low.
In this embodiment, preferably, the method further comprises:
Storing the keywords in each category, together with the frequency with which each keyword was entered as a search, in arrays, where different keywords within the same category are marked and distinguished by an index built on the array subscripts;
Storing the browsed URLs, classified by domain name level from high to low, together with the access frequency of each browsed URL, in a B+ tree.
In this embodiment, keywords and their search frequencies are stored in arrays, and an index can be built on the array subscripts; browsed URLs and their access frequencies are stored in a B+ tree, which can be queried either along the leaf linked list or by descending the tree. Both storage forms make it easy to build a search index, help sorting and lookup, and execute efficiently.
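The array-based keyword storage can be sketched as two parallel lists per category plus a subscript index. The class name `KeywordStore` and its methods are assumptions made for this illustration; the patent only states that arrays and a subscript index are used.

```python
class KeywordStore:
    """Per-category parallel arrays of keywords and search counts."""

    def __init__(self):
        self.keywords = {}  # category -> list of keywords
        self.counts = {}    # category -> list of counts, parallel to keywords
        self.index = {}     # (category, keyword) -> array subscript

    def record(self, category, keyword):
        key = (category, keyword)
        if key not in self.index:
            self.keywords.setdefault(category, []).append(keyword)
            self.counts.setdefault(category, []).append(0)
            self.index[key] = len(self.keywords[category]) - 1
        # The subscript index gives direct access to the counter to update.
        self.counts[category][self.index[key]] += 1

    def top(self, category):
        """Return (keyword, count) with the highest count in a category."""
        counts = self.counts[category]
        i = max(range(len(counts)), key=counts.__getitem__)
        return self.keywords[category][i], counts[i]
```

Keeping the subscript in a separate index means a repeated keyword is updated in place instead of being searched for linearly.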
In this embodiment, the keywords used by the user are classified by semantic similarity, and the most frequent keyword in each class is taken. When the user opens the browser again and is about to enter a keyword, these keywords can be recommended to the user.
In this embodiment, the following step is also performed before step 103:
Presetting the categories of keywords.
Preferably, a settings interface can be provided that lets users define commonly used categories according to their own needs. In practice, classification can be restricted to the categories the user has set; through this interaction with the user, the user's interests can be inferred more accurately. In other words, the user supplies his or her own interests directly, which is more convenient. Categories the user has not set are not considered.
When classifying keywords, the categories can be set in advance; for example, by semantics they can be divided into entertainment, study, office, leisure and so on. All keywords are first sorted into these specific broad categories, and synonymous keywords are merged into one entry. Next, the keywords within each broad category are ranked from high to low by frequency. Then the most frequent keyword of each broad category is taken out, and these top keywords are ranked again by frequency; the user's interests are determined from the resulting order. How often a keyword is used indicates the user's specific interests.
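The two-stage ranking described above (rank within each category, then rank the per-category winners) can be sketched as follows. The semantic classifier is replaced here by a hand-written lookup table, which is an assumption made for illustration; the tables `CATEGORY_OF` and `SYNONYM` and their entries are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical stand-ins for semantic classification and synonym merging.
CATEGORY_OF = {"movie": "entertainment", "song": "entertainment",
               "exam": "study", "spreadsheet": "office"}
SYNONYM = {"film": "movie"}


def rank_keyword_interests(searches):
    """Return [(category, top_keyword, frequency)] sorted by frequency, descending."""
    per_category = defaultdict(Counter)
    for word in searches:
        word = SYNONYM.get(word, word)     # merge synonymous keywords
        category = CATEGORY_OF.get(word)
        if category is None:
            continue                       # categories that were not set are ignored
        per_category[category][word] += 1
    # Take the most frequent keyword of each category, then rank those winners.
    tops = [(cat, *counter.most_common(1)[0]) for cat, counter in per_category.items()]
    return sorted(tops, key=lambda item: item[2], reverse=True)
```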
The number of visits to each website the user accesses is counted. Using a B+ tree as the data structure, URLs are counted by domain name level from high to low. For example, the user visits Baidu, and Baidu includes Baidu Baike, Baidu Images, Baidu News and so on; the website holding the specific content of Baidu Baike belongs to the highest domain-name level, and its content is the final result of the user's search. The frequency with which the user browses websites is counted, and the category of the highest-level-domain website that the user browsed is taken as a category of the user's interests. The websites at the highest domain-name level are ranked by access frequency, and the user's interests are inferred from the ranking.
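A simple way to picture counting by domain name level is to credit each visit to every level of the visited host name. In the sketch below a `collections.Counter` stands in for the B+ tree named in the text, and the function name is an assumption for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_by_domain_level(urls):
    """Count visits against each domain suffix of every visited host."""
    counts = Counter()
    for url in urls:
        host = urlsplit(url).hostname
        if not host:
            continue
        labels = host.split(".")
        # Credit the visit to 'com', then 'baidu.com', then 'baike.baidu.com'.
        for start in range(len(labels) - 1, -1, -1):
            counts[".".join(labels[start:])] += 1
    return counts
```

Sorting the entries for the most specific hosts by count then gives the per-site ranking that the paragraph above describes.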
104. Counting the search frequency of the keywords and browsed URLs in each category, so as to infer the user's interests and hobbies from how high or low the search frequencies are.
In this embodiment, step 104 specifically includes the following steps:
Counting the search frequency of the keywords in each category;
Ranking the most frequently searched keyword of each category from high to low by frequency;
Counting the search frequency of the browsed URLs in each category;
Ranking the URL of the highest domain name in each category from high to low by frequency;
Inferring the user's interests and hobbies from the two rankings.
In this embodiment, the keywords the user has used and the URLs the user has browsed are assigned to their corresponding categories, and the keywords and browsed URLs in each category are then ranked from high to low by frequency. This shows which categories the user pays most attention to and which the user pays least attention to, and from that the user's interests and hobbies are inferred.
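Step 104 produces two rankings and infers interests from them, but the text does not say how the two rankings are combined. The sketch below therefore makes an explicit assumption: the top frequencies of each category in the two rankings are simply added. The function and variable names are likewise illustrative.

```python
def infer_interests(keyword_counts, site_counts):
    """Both arguments map category -> {item: frequency}."""

    def top_per_category(table):
        # Most frequent item of each category, ranked from high to low.
        tops = {cat: max(items.items(), key=lambda kv: kv[1])
                for cat, items in table.items() if items}
        return sorted(tops.items(), key=lambda kv: kv[1][1], reverse=True)

    keyword_rank = top_per_category(keyword_counts)
    site_rank = top_per_category(site_counts)

    # Assumed combination rule (not from the patent): add the top frequencies.
    score = {}
    for ranking in (keyword_rank, site_rank):
        for category, (_, freq) in ranking:
            score[category] = score.get(category, 0) + freq
    interests = sorted(score, key=score.get, reverse=True)
    return keyword_rank, site_rank, interests
```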
In this embodiment, preferably, the method further comprises:
Recommending keywords, websites or applications to the mobile phone user according to the user's interests and hobbies.
Based on the user's interests and hobbies, keywords and related websites or applications can be recommended to the user. This is convenient for mobile browser developers: with the method of this embodiment, commonly used search keywords can easily be recommended to the user, and websites related to the user's interests can be recommended within the pages being browsed. This not only adds to the browser's functions and improves the experience of the phone's operating system and software, which is convenient for the user, but also lets browser developers earn advertising recommendation fees and thereby gain economic benefit.
In this embodiment, preferably, the method further comprises:
Recommending to the user the most frequently searched keyword in each category;
Recommending to the user the most frequently searched URL of the highest domain name in each category.
This brings convenience to users who frequently use the mobile browser to enter certain fixed keywords or to visit certain fixed websites.
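Recommending the top keyword of each category could look like the following sketch. Filtering by the prefix the user has typed so far is an assumption added for illustration, as are the function name and the `limit` parameter; the text only says that the most frequent keyword of each category is recommended.

```python
def suggest_keywords(keyword_counts, typed="", limit=3):
    """keyword_counts maps category -> {keyword: frequency}."""
    tops = []
    for items in keyword_counts.values():
        if items:
            word, freq = max(items.items(), key=lambda kv: kv[1])
            tops.append((freq, word))
    tops.sort(reverse=True)  # most frequent category winners first
    # Assumed refinement: keep only suggestions matching what was typed so far.
    return [word for _, word in tops if word.startswith(typed)][:limit]
```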
Referring to Fig. 2, an embodiment of the present invention further provides an apparatus for inferring the interests and hobbies of a mobile phone user, the apparatus comprising:
A reading unit 201, configured to read the browsing history files in the mobile phone browser;
An acquiring unit 202, configured to parse the browsing history files and obtain the keywords the user has searched for and the URLs the user has browsed;
A classification unit 203, configured to classify the searched keywords and the browsed URLs separately;
Preferably, the classification unit 203 further includes a category module, configured to preset the categories of keywords.
Preferably, a settings interface can be provided that lets users define commonly used categories according to their own needs. In practice, classification can be restricted to the categories the user has set; through this interaction with the user, the user's interests can be inferred more accurately. In other words, the user supplies his or her own interests directly, which is more convenient. Categories the user has not set are not considered.
A statistical inference unit 204, configured to count the search frequency of the keywords and browsed URLs in each category, so as to infer the user's interests and hobbies from how high or low the search frequencies are.
Preferably, the classification unit is specifically configured to classify the searched keywords according to their semantics and to classify the browsed URLs by domain name level, from high to low.
Referring to Fig. 3, in this embodiment, preferably, the apparatus further comprises:
An array storage unit 301, configured to store the keywords in each category, together with the frequency with which each keyword was entered as a search, in arrays, where different keywords within the same category are marked and distinguished by an index built on the array subscripts;
A B+ tree storage unit 302, configured to store the browsed URLs, classified by domain name level from high to low, together with the access frequency of each browsed URL, in a B+ tree.
Referring to Fig. 4, the apparatus further comprises:
A recommendation unit 401, configured to recommend keywords, websites or applications to the mobile phone user according to the user's interests and hobbies.
The details of the apparatus are as described for the method and are not repeated here.
An embodiment of the present invention further provides a mobile phone terminal that includes the above apparatus.
With the method, apparatus and mobile phone terminal of the present invention for inferring the interests and hobbies of mobile phone users, the browsing history files are read, the keywords and browsed URLs are obtained, and the search frequency of the keywords and browsed URLs in each category is counted, so that the user's interests and hobbies can be effectively inferred from how high or low the frequencies are.
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (8)
1. A method for inferring the interests and hobbies of a mobile phone user, characterized in that the method comprises:
Reading the browsing history files in the mobile phone browser;
Parsing the browsing history files to obtain the keywords the user has searched for and the URLs the user has browsed;
Classifying the searched keywords and the browsed URLs separately;
Counting the search frequency of the keywords and browsed URLs in each category, ranking the most frequently searched keyword of each keyword category from high to low by search frequency, and ranking the URL of the highest domain name in each browsed-URL category from high to low by search frequency, so as to infer the user's interests and hobbies from how high or low the search frequencies are;
The step of classifying the searched keywords and the browsed URLs separately includes:
Classifying the searched keywords according to their semantics;
Classifying the browsed URLs by domain name level, from high to low.
2. The method of claim 1, characterized in that the method further comprises:
Storing the keywords in each category, together with the frequency with which each keyword was entered as a search, in arrays, where different keywords within the same category are marked and distinguished by an index built on the array subscripts;
Storing the browsed URLs, classified by domain name level from high to low, together with the access frequency of each browsed URL, in a B+ tree.
3. The method of claim 1 or 2, characterized in that the method further comprises:
Recommending keywords, websites or applications to the mobile phone user according to the user's interests and hobbies.
4. The method of claim 1, characterized in that the browsing history files include log files and cache files.
5. An apparatus for inferring the interests and hobbies of a mobile phone user, characterized in that the apparatus comprises:
A reading unit, configured to read the browsing history files in the mobile phone browser;
An acquiring unit, configured to parse the browsing history files and obtain the keywords the user has searched for and the URLs the user has browsed;
A classification unit, configured to classify the searched keywords and the browsed URLs separately;
A statistical inference unit, configured to count the search frequency of the keywords and browsed URLs in each category, rank the most frequently searched keyword of each keyword category from high to low by search frequency, and rank the URL of the highest domain name in each browsed-URL category from high to low by search frequency, so as to infer the user's interests and hobbies from how high or low the search frequencies are;
The classification unit is specifically configured to classify the searched keywords according to their semantics and to classify the browsed URLs by domain name level, from high to low.
6. The apparatus of claim 5, characterized in that the apparatus further comprises:
An array storage unit, configured to store the keywords in each category, together with the frequency with which each keyword was entered as a search, in arrays, where different keywords within the same category are marked and distinguished by an index built on the array subscripts;
A B+ tree storage unit, configured to store the browsed URLs, classified by domain name level from high to low, together with the access frequency of each browsed URL, in a B+ tree.
7. The apparatus of claim 5 or 6, characterized in that the apparatus further comprises:
A recommendation unit, configured to recommend keywords, websites or applications to the mobile phone user according to the user's interests and hobbies.
8. A mobile phone terminal, characterized in that the mobile phone terminal includes the apparatus of claim 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310573351.7A CN103607496B (en) | 2013-11-15 | 2013-11-15 | A method and an apparatus for deducting interests and hobbies of handset users and a handset terminal |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103607496A CN103607496A (en) | 2014-02-26 |
CN103607496B true CN103607496B (en) | 2017-04-19 |
Family
ID=50125696
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310573351.7A Active CN103607496B (en) | 2013-11-15 | 2013-11-15 | A method and an apparatus for deducting interests and hobbies of handset users and a handset terminal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103607496B (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103810295A (en) * | 2014-03-06 | 2014-05-21 | 北京邮电大学 | Method and device for extracting internet data |
CN103955464B (en) * | 2014-03-25 | 2017-10-03 | 南京邮电大学 | It is a kind of that the recommendation method perceived is merged based on situation |
CN105095303B (en) * | 2014-05-19 | 2021-08-31 | 腾讯科技(深圳)有限公司 | Quick link pushing method and quick link pushing device |
CN103970891B (en) * | 2014-05-23 | 2017-08-25 | 三星电子(中国)研发中心 | A kind of user interest information querying method based on situation |
CN105095363A (en) * | 2015-06-26 | 2015-11-25 | 百度在线网络技术(北京)有限公司 | Invitation commenting method and device for sites |
CN105653686A (en) * | 2015-12-30 | 2016-06-08 | 赛尔网络有限公司 | Domain name network address activeness statistics method and system |
CN105791100A (en) * | 2016-05-11 | 2016-07-20 | 潘成军 | Chat information prompt method |
WO2018023683A1 (en) * | 2016-08-05 | 2018-02-08 | 吴晓敏 | Usage data statistical method for point of interest capturing technology and recognition system |
WO2018023684A1 (en) * | 2016-08-05 | 2018-02-08 | 吴晓敏 | Information pushing method during recognition of user's interests and recognition system |
CN108229991B (en) * | 2016-12-15 | 2022-04-29 | 北京奇虎科技有限公司 | Method and device for displaying aggregation promotion information, browser and terminal equipment |
CN108205555A (en) * | 2016-12-19 | 2018-06-26 | 北京奇虎科技有限公司 | Information recommendation method, device, browser and terminal device |
CN108804431A (en) * | 2017-04-26 | 2018-11-13 | 广东原昇信息科技有限公司 | A kind of keyword effect analysis method based on big data |
CN108595461B (en) * | 2018-01-05 | 2021-03-16 | 武汉斗鱼网络科技有限公司 | Interest exploration method, storage medium, electronic device and system |
CN108710622B (en) * | 2018-03-13 | 2022-12-27 | 星际数科科技股份有限公司 | Webpage information recommendation method and system based on machine learning |
CN115033146A (en) * | 2022-06-29 | 2022-09-09 | 深圳市沃特沃德信息有限公司 | Method and device for replacing application icon, computer equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7631032B1 (en) * | 1998-01-30 | 2009-12-08 | Net-Express, Ltd. | Personalized internet interaction by adapting a page format to a user record |
CN102831199A (en) * | 2012-08-07 | 2012-12-19 | 北京奇虎科技有限公司 | Method and device for establishing interest model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |