CN103607496B - Method and apparatus for inferring the interests and hobbies of mobile phone users, and mobile phone terminal - Google Patents


Info

Publication number: CN103607496B
Authority: CN (China)
Prior art keywords: keyword, web address, search, mobile phone user, hobby
Legal status: Active (as listed by Google Patents; an assumption, not a legal conclusion)
Application number: CN201310573351.7A
Other languages: Chinese (zh)
Other versions: CN103607496A (en)
Inventors: 李翔宇, 张潇
Current assignee: Shenzhen Institute of Advanced Technology of CAS (as listed by Google Patents; may be inaccurate)
Original assignee: Shenzhen Institute of Advanced Technology of CAS
Priority date: 2013-11-15 (as listed by Google Patents; an assumption, not a legal conclusion)
Filing date: 2013-11-15
Publication date: 2017-04-19
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN201310573351.7A
Publication of CN103607496A
Application granted
Publication of CN103607496B

Abstract

The invention belongs to the technical field of communication and provides a method and an apparatus for inferring the interests and hobbies of mobile phone users, as well as a mobile phone terminal. The method comprises the following steps: reading the browsing history files of a mobile browser; parsing the browsing history files to obtain the keywords the user has searched for and the web addresses the user has browsed; classifying the searched keywords and the browsed web addresses separately; and counting the search frequency of the keywords and of the browsed web addresses in each category, so as to infer the user's interests and hobbies from how high or low the search frequencies are. By reading the browsing history files, obtaining the keywords and browsed web addresses, and counting their search frequencies per category, the method, apparatus, and terminal of the invention can effectively infer the interests and hobbies of mobile phone users.

Description

Method and device for inferring the interests and hobbies of a mobile phone user, and mobile phone terminal
Technical field
The invention belongs to the field of communication technology, and in particular relates to a method and a device for inferring the interests and hobbies of a mobile phone user, and to a mobile phone terminal.
Background
With the spread of mobile phones, the number of mobile phone users keeps growing, and so does the number of mobile Internet users. According to published statistics, China had 930 million mobile phone users in 2011, of whom more than 390 million accessed the Internet from their phones. In addition, the DCCI Internet Data Center predicted that the number of mobile Internet users in China would reach 720 million in 2013, surpassing the number of computer-based Internet users. The mobile browser, as the tool with which users browse web pages on a phone, therefore has very good development prospects.
In the prior art, an ordinary mobile phone terminal has no function for inferring the interests and hobbies of its user.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a method for inferring the interests and hobbies of a mobile phone user, so as to solve the problem that an ordinary mobile phone terminal has no such function.
The embodiments of the present invention are implemented as a method for inferring the interests and hobbies of a mobile phone user, the method comprising:
reading the browsing history files of a mobile browser;
parsing the browsing history files to obtain the keywords the user has searched for and the web addresses the user has browsed;
classifying the searched keywords and the browsed web addresses separately;
counting the search frequency of the keywords and of the browsed web addresses in each category, so as to infer the user's interests and hobbies from how high or low the search frequencies are.
The embodiments of the present invention also provide a device for inferring the interests and hobbies of a mobile phone user, the device comprising:
a reading unit, configured to read the browsing history files of a mobile browser;
an acquiring unit, configured to parse the browsing history files and obtain the keywords the user has searched for and the web addresses the user has browsed;
a classification unit, configured to classify the searched keywords and the browsed web addresses separately;
a statistical inference unit, configured to count the search frequency of the keywords and of the browsed web addresses in each category, so as to infer the user's interests and hobbies from how high or low the search frequencies are.
The embodiments of the present invention also provide a mobile phone terminal that includes the above device.
Compared with the prior art, the embodiments of the present invention have the following beneficial effect: by reading the browsing history files, obtaining the keywords and browsed web addresses, and counting the search frequency of the keywords and browsed web addresses in each category, the user's interests and hobbies can be inferred effectively from how high or low the frequencies are.
Brief description of the drawings
Fig. 1 is a flow chart of the method for inferring the interests and hobbies of a mobile phone user provided by an embodiment of the present invention;
Fig. 2 is a first logical schematic diagram of the device for inferring the interests and hobbies of a mobile phone user provided by an embodiment of the present invention;
Fig. 3 is a second logical schematic diagram of the device for inferring the interests and hobbies of a mobile phone user provided by an embodiment of the present invention;
Fig. 4 is a third logical schematic diagram of the device for inferring the interests and hobbies of a mobile phone user provided by an embodiment of the present invention.
Detailed description of the embodiments
In order to make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
The embodiments provided by the present invention are as follows:
To make the embodiments easier to understand, several concepts used in their description are introduced first:
Domain name:
A domain name is the name of a computer or group of computers on the Internet, made up of a string of names separated by dots. It identifies the electronic location of a computer during data transmission (and sometimes also a geographical location; a geographic domain name refers to a local area with administrative autonomy). A domain name is a "mask" placed over an IP address. Its purpose is to provide an easy-to-remember address for a group of servers (website, e-mail, FTP, and so on). Domain names serve as memorable names for participants in the Internet, such as computers, mobile phone terminals, networks, and services.
A complete domain name consists of two or more parts separated by the ASCII full stop ".", as in the following domain names: yahoo.com, yahoo.ca.us, yahoo.co.uk. The first of these consists of two parts; the second and third consist of three parts. In a complete domain name, the part to the right of the last "." is called the top-level domain (TLD); in the examples above, com, us, and uk are top-level domains. The part immediately to the left of the last "." is called the second-level domain (SLD); for example, yahoo is the second-level domain in yahoo.com, ca is the second-level domain in yahoo.ca.us, and co is the second-level domain in yahoo.co.uk. The part to the left of the second-level domain is called the third-level domain, the part to the left of the third-level domain is called the fourth-level domain, and so on. For example, yahoo is the third-level domain in both yahoo.ca.us and yahoo.co.uk.
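The level naming above can be illustrated with a short Python sketch. The function name `domain_levels` is hypothetical and not part of the patent; it simply splits a domain name on "." and lists the labels from the top-level domain downward.

```python
def domain_levels(domain: str) -> list[str]:
    """Return the labels of a domain name ordered from the top level down.

    Index 0 is the top-level domain, index 1 the second-level domain,
    index 2 the third-level domain, and so on.
    """
    # Normalise case and drop an optional trailing dot before splitting.
    labels = [part for part in domain.strip().lower().rstrip(".").split(".") if part]
    return list(reversed(labels))


levels = domain_levels("yahoo.co.uk")
print("TLD:", levels[0], "| SLD:", levels[1], "| third level:", levels[2])
```

For yahoo.co.uk this lists uk, co, and yahoo in that order, matching the example in the text.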
Definition and features of the B+ tree:
1. A B+ tree of order m is defined as follows:
(1) each node can hold at most m elements;
(2) every node except the root holds at least (m/2) elements;
(3) if the root is not a leaf node, it has at least 2 child nodes;
(4) all leaf nodes are on the same level;
(5) a non-leaf node with k child nodes holds (k-1) elements, arranged in ascending order;
(6) the elements in the left subtree of an element are all less than that element, and the elements in its right subtree are all greater than or equal to it;
(7) non-leaf nodes store only keys and pointers to child nodes; records are stored only in leaf nodes;
(8) adjacent leaf nodes are connected by pointers.
2. The distinguishing features of the B+ tree are:
(1) all keys appear in the linked list of leaf nodes (a dense index), and the keys in that list are in sorted order;
(2) a lookup can never end with a hit at a non-leaf node;
(3) the non-leaf nodes act as an index over the leaf nodes (a sparse index), and the leaf nodes act as the data layer that stores the keyed data;
(4) it is well suited to file indexing systems.
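The properties listed above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the patent's implementation: the names `Node`, `BPlusTree`, `max_keys`, `put`, `get`, and `items` are hypothetical, `max_keys` plays the role of the order m, and deletion and minimum-occupancy rebalancing are omitted.

```python
from bisect import bisect_left, bisect_right


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # used by non-leaf nodes only
        self.values = []    # used by leaf nodes only (records live in leaves)
        self.next = None    # leaf nodes only: pointer to the right neighbour


class BPlusTree:
    def __init__(self, max_keys=4):
        self.max_keys = max_keys
        self.root = Node(leaf=True)

    def _find_leaf(self, key):
        node = self.root
        while not node.leaf:
            # Keys equal to a separator live in the right subtree.
            node = node.children[bisect_right(node.keys, key)]
        return node

    def get(self, key, default=None):
        leaf = self._find_leaf(key)  # a lookup always ends in a leaf
        i = bisect_left(leaf.keys, key)
        if i < len(leaf.keys) and leaf.keys[i] == key:
            return leaf.values[i]
        return default

    def put(self, key, value):
        split = self._insert(self.root, key, value)
        if split is not None:  # the root was split: grow the tree by one level
            separator, right = split
            new_root = Node(leaf=False)
            new_root.keys = [separator]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key, value):
        if node.leaf:
            i = bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                node.values[i] = value
                return None
            node.keys.insert(i, key)
            node.values.insert(i, value)
            if len(node.keys) <= self.max_keys:
                return None
            mid = len(node.keys) // 2
            right = Node(leaf=True)
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.values, node.values = node.values[mid:], node.values[:mid]
            right.next, node.next = node.next, right  # keep leaves linked
            return right.keys[0], right  # separator is copied up
        i = bisect_right(node.keys, key)
        split = self._insert(node.children[i], key, value)
        if split is None:
            return None
        separator, child = split
        node.keys.insert(i, separator)
        node.children.insert(i + 1, child)
        if len(node.keys) <= self.max_keys:
            return None
        mid = len(node.keys) // 2
        up = node.keys[mid]  # separator is moved up, not copied
        right = Node(leaf=False)
        right.keys, node.keys = node.keys[mid + 1:], node.keys[:mid]
        right.children, node.children = node.children[mid + 1:], node.children[:mid + 1]
        return up, right

    def items(self):
        node = self.root
        while not node.leaf:
            node = node.children[0]
        while node is not None:  # walk the leaf linked list in key order
            yield from zip(node.keys, node.values)
            node = node.next
```

`get` corresponds to a tree query, while `items` corresponds to a query along the leaf linked list, which returns all keys in sorted order.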
Referring to Fig. 1, an embodiment of the present invention provides a method for inferring the interests and hobbies of a mobile phone user, the method comprising:
101. Read the browsing history files of the mobile browser.
The browsing history files of a mobile browser include log files and cache files.
All of the browsing records a user generates through the phone's built-in browser, or through other installed browser software, are recorded in the log files and cache files of the corresponding browser.
Different browsers store their log files and cache files at different locations on the phone. When reading, the files can be told apart by their file name suffixes, because the log files and cache files of different browsers use different suffixes.
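Telling the files apart by suffix can be sketched as follows. The patent does not name any concrete suffixes or paths, so the table `SUFFIX_TABLE`, the browser labels, and the function `identify_history_files` are all hypothetical placeholders.

```python
from pathlib import PurePosixPath

# Hypothetical suffix table: real browsers use their own file names and formats.
SUFFIX_TABLE = {
    ".alog": ("browser_a", "log"),
    ".acache": ("browser_a", "cache"),
    ".blog": ("browser_b", "log"),
}


def identify_history_files(paths):
    """Return (browser, kind, path) for every path with a known suffix."""
    found = []
    for path in paths:
        entry = SUFFIX_TABLE.get(PurePosixPath(path).suffix.lower())
        if entry is not None:
            browser, kind = entry
            found.append((browser, kind, path))
    return found


print(identify_history_files(["/data/a/history.alog", "/data/x/readme.txt"]))
```

Files with unknown suffixes are simply skipped, so the reading step only passes recognised log and cache files on to the parser.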
102. Parse the browsing history files to obtain the keywords the user has searched for and the web addresses the user has browsed.
By parsing the log files and cache files, the user's browsing records can be obtained. These records include the search keywords the user previously entered and the web addresses the user previously browsed. The keywords and browsed web addresses reflect the fields the user pays attention to and the user's interests and hobbies.
Obtaining the searched keywords and browsed web addresses by parsing the browsing history files is simple and easy to carry out.
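As an illustration of the parsing step, the sketch below assumes a simplified history format with one URL per line and assumes that search keywords appear in a query parameter named `q`, `wd`, `word`, or `keyword`. Both assumptions, along with the names `SEARCH_PARAMS` and `parse_history`, are hypothetical; real browser log and cache files have their own formats.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed query parameter names that carry a search keyword.
SEARCH_PARAMS = ("q", "wd", "word", "keyword")


def parse_history(lines):
    """Split history lines into (searched keywords, browsed web addresses)."""
    keywords, urls = [], []
    for line in lines:
        url = line.strip()
        if not url:
            continue
        parts = urlsplit(url)
        if not parts.hostname:  # skip lines that are not absolute URLs
            continue
        urls.append(url)
        query = parse_qs(parts.query)
        for name in SEARCH_PARAMS:
            if name in query:
                keywords.append(query[name][0])
                break
    return keywords, urls


keywords, urls = parse_history([
    "http://www.example.com/s?wd=football+news",
    "http://news.example.com/a/1.html",
])
print(keywords, len(urls))
```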
103. Classify the searched keywords and the browsed web addresses separately.
In this embodiment, the step of classifying the searched keywords and the browsed web addresses separately includes:
classifying the searched keywords by semantics;
classifying the browsed web addresses by domain name level, from high to low.
In this embodiment, preferably, the method further includes:
storing, in arrays, the keywords in each category and the search frequency of each keyword, with different keywords in the same category distinguished by an index built on the array subscripts;
storing, in a B+ tree, the browsed web addresses classified by domain name level from high to low, together with the access frequency of each browsed web address.
In this embodiment, keywords and their search frequencies are stored in arrays, and an index can be built on the array subscripts; browsed web addresses and their access frequencies are stored in a B+ tree, which can be queried either along the leaf linked list or by descending the tree. Both structures make it easy to build search indexes, which helps sorting and lookup and gives high execution efficiency.
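The array-based keyword storage can be sketched as two parallel lists per category plus an index from keyword to array subscript. The class name `KeywordStore` and its methods are hypothetical, and the patent does not specify the index structure; a dictionary is used here as one plausible choice.

```python
class KeywordStore:
    """Per-category parallel arrays of keywords and search counts."""

    def __init__(self):
        self.keywords = {}  # category -> list of keywords
        self.counts = {}    # category -> list of counts, parallel to keywords
        self.index = {}     # (category, keyword) -> array subscript

    def record(self, category, keyword):
        """Count one search for keyword and return its array subscript."""
        slot = self.index.get((category, keyword))
        if slot is None:
            words = self.keywords.setdefault(category, [])
            counts = self.counts.setdefault(category, [])
            slot = len(words)
            words.append(keyword)
            counts.append(0)
            self.index[(category, keyword)] = slot
        self.counts[category][slot] += 1
        return slot

    def frequency(self, category, keyword):
        slot = self.index.get((category, keyword))
        return 0 if slot is None else self.counts[category][slot]


store = KeywordStore()
for word in ["movie", "music", "movie"]:
    store.record("entertainment", word)
print(store.frequency("entertainment", "movie"))
```

Because each keyword keeps a fixed subscript within its category, a repeated search only increments one array element.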
In this embodiment, the keywords the user has used are classified by semantic similarity and the most frequent keyword in each class is selected; when the user next opens the browser and is about to enter a keyword, these keywords can be recommended to the user.
In this embodiment, the following step is also performed before step 103:
presetting the keyword categories.
Preferably, a settings interface can be provided that lets users define frequently used categories according to their own needs. In practice, classification can be restricted to the categories the user has set, so that, through this interaction, the user's interests and hobbies can be inferred more accurately; in effect, users supply their own interests and hobbies, which is more direct and convenient. Categories the user has not set are not considered.
When keywords are classified, the categories can be set in advance; for example, keywords can be divided by meaning into entertainment, study, office work, leisure, and so on. First, all keywords are sorted into these broad categories, with synonymous keywords grouped into the same class. Next, the keywords within each broad category are ranked by frequency from high to low. Then the most frequent keyword of each broad category is taken out, and these top keywords are ranked again by frequency; the user's interests and hobbies are determined from the resulting order. How often a keyword is used indicates how strongly the user is interested in it.
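The classify-then-rank procedure just described might look like the following sketch. The patent does not say how semantic classification is performed, so a hand-written lookup table stands in for it here; `CATEGORY_TERMS`, `SYNONYMS`, and `rank_keywords` are hypothetical names, and `user_categories` models the optional user-defined category list.

```python
from collections import Counter, defaultdict

# Stand-in for semantic classification: a fixed table of categories and terms.
CATEGORY_TERMS = {
    "entertainment": {"movie", "music"},
    "study": {"dictionary", "exam"},
}
# Synonymous keywords are folded onto one representative term.
SYNONYMS = {"film": "movie", "song": "music"}


def rank_keywords(searches, user_categories=None):
    """Return (category, top keyword, frequency) rows, most frequent first."""
    per_category = defaultdict(Counter)
    for raw in searches:
        word = raw.strip().lower()
        word = SYNONYMS.get(word, word)
        for category, terms in CATEGORY_TERMS.items():
            if user_categories is not None and category not in user_categories:
                continue  # categories the user has not set are not considered
            if word in terms:
                per_category[category][word] += 1
                break
    # Take the most frequent keyword of each category, then rank those.
    top = [(category, *counts.most_common(1)[0])
           for category, counts in per_category.items()]
    return sorted(top, key=lambda row: (-row[2], row[0]))


print(rank_keywords(["movie", "film", "music", "exam", "movie"]))
```

Keywords that match no category are ignored in this sketch; a real system would need a richer semantic model to place them.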
The number of visits to each website the user has accessed is counted using a B+ tree data structure, with web addresses tallied by domain name level from high to low. For example, a user visits Baidu, which includes Baidu Baike, Baidu Images, Baidu News, and so on; the website holding the specific Baidu Baike content belongs to the highest domain name level, and the content it contains is the final result of the user's search. The frequency with which the user browses websites is counted, and the category of the highest-level domain name that the user browsed on a given website is taken as a category of the user's interests and hobbies. The websites visited at the highest domain name level are ranked by frequency, and the user's interests and hobbies are inferred from the ranking.
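Counting visits per domain name level can be sketched as below. For brevity this sketch uses `collections.Counter` rather than the B+ tree the text calls for, and it reads "highest domain name level" as the full, most specific host name; both choices, and the function name `count_visits`, are assumptions.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_visits(urls):
    """Count visits per domain-level prefix and per full host name."""
    by_prefix = Counter()  # e.g. ('com',), ('com', 'baidu'), ('com', 'baidu', 'baike')
    by_host = Counter()    # e.g. 'baike.baidu.com'
    for url in urls:
        host = urlsplit(url).hostname
        if not host:
            continue
        labels = host.split(".")[::-1]  # top-level domain first
        for depth in range(1, len(labels) + 1):
            by_prefix[tuple(labels[:depth])] += 1
        by_host[host] += 1
    return by_prefix, by_host


by_prefix, by_host = count_visits([
    "http://baike.baidu.com/view/1",
    "http://news.baidu.com/",
    "http://baike.baidu.com/view/2",
])
print(by_prefix[("com", "baidu")], by_host.most_common(1))
```

Keeping a count at every prefix makes it possible to rank either whole sites such as baidu.com or their specific sub-sites.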
104. Count the search frequency of the keywords and of the browsed web addresses in each category, so as to infer the user's interests and hobbies from how high or low the search frequencies are.
In this embodiment, step 104 specifically includes the following steps:
counting the search frequency of the keywords in each category;
ranking the most frequently searched keyword of each category by frequency, from high to low;
counting the search frequency of the browsed web addresses in each category;
ranking the web address with the highest domain name level in each category by frequency, from high to low;
inferring the user's interests and hobbies from the two rankings.
In this embodiment, the keywords the user has used and the web addresses the user has browsed are assigned to their categories, and the keywords and web addresses in each category are then ranked by frequency from high to low. This yields the categories the user cares about most and least, from which the user's interests and hobbies are inferred.
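The text does not state how the two rankings are combined into one conclusion. The sketch below assumes the simplest option, summing the frequencies per category; that rule and the function name `infer_interests` are assumptions made for illustration only.

```python
def infer_interests(keyword_ranking, site_ranking):
    """Order categories from most to least interesting.

    Each ranking is a list of (category, item, frequency) rows. The
    frequencies of both rankings are summed per category (an assumed rule).
    """
    totals = {}
    for category, _item, frequency in list(keyword_ranking) + list(site_ranking):
        totals[category] = totals.get(category, 0) + frequency
    return sorted(totals, key=lambda category: (-totals[category], category))


keyword_ranking = [("entertainment", "movie", 3), ("study", "exam", 1)]
site_ranking = [("study", "dict.example.com", 5),
                ("entertainment", "video.example.com", 2)]
print(infer_interests(keyword_ranking, site_ranking))
```

The first category in the result is the one the user cares about most, and the last is the one the user cares about least.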
In this embodiment, preferably, the method further includes:
recommending keywords, websites, or applications to the user according to the user's interests and hobbies.
According to the user's interests and hobbies, keywords, related websites, or applications can be recommended to the user. This offers a convenience to mobile browser developers: with the method of this embodiment, a browser can suggest frequently used search keywords and can recommend, within the pages being browsed, websites related to the user's interests. This not only extends the browser's functionality and improves the experience of the phone's operating system and software, making things easier for the user, but also lets browser developers earn advertising referral fees and thus brings economic benefit.
In this embodiment, preferably, the method further includes:
recommending to the user the most frequently searched keyword in each category;
recommending to the user the most frequently searched web address with the highest domain name level in each category.
This brings convenience to users who habitually use the mobile browser to enter certain fixed keywords or to visit certain fixed websites.
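A recommendation step built on the per-category top keywords and web addresses could be as simple as the following sketch. The function name `recommend`, the `limit` parameter, and the row format (category, item, frequency) are assumptions, and the rankings are assumed to be sorted by frequency already.

```python
def recommend(keyword_ranking, site_ranking, limit=3):
    """Pick the top entries of each ranking as recommendations."""
    return {
        "keywords": [item for _category, item, _frequency in keyword_ranking[:limit]],
        "sites": [item for _category, item, _frequency in site_ranking[:limit]],
    }


suggestions = recommend(
    [("entertainment", "movie", 3), ("study", "exam", 1)],
    [("study", "dict.example.com", 5)],
    limit=1,
)
print(suggestions)
```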
Referring to Fig. 2, an embodiment of the present invention also provides a device for inferring the interests and hobbies of a mobile phone user, the device comprising:
a reading unit 201, configured to read the browsing history files of a mobile browser;
an acquiring unit 202, configured to parse the browsing history files and obtain the keywords the user has searched for and the web addresses the user has browsed;
a classification unit 203, configured to classify the searched keywords and the browsed web addresses separately.
Preferably, the classification unit 203 also includes a category module, configured to preset the keyword categories.
Preferably, a settings interface can be provided that lets users define frequently used categories according to their own needs. In practice, classification can be restricted to the categories the user has set, so that, through this interaction, the user's interests and hobbies can be inferred more accurately; in effect, users supply their own interests and hobbies, which is more direct and convenient. Categories the user has not set are not considered.
a statistical inference unit 204, configured to count the search frequency of the keywords and of the browsed web addresses in each category, so as to infer the user's interests and hobbies from how high or low the search frequencies are.
Preferably, the classification unit is specifically configured to classify the searched keywords by semantics and to classify the browsed web addresses by domain name level, from high to low.
Referring to Fig. 3, in this embodiment, preferably, the device further includes:
an array storage unit 301, configured to store, in arrays, the keywords in each category and the search frequency of each keyword, with different keywords in the same category distinguished by an index built on the array subscripts;
a B+ tree storage unit 302, configured to store, in a B+ tree, the browsed web addresses classified by domain name level from high to low, together with the access frequency of each browsed web address.
Referring to Fig. 4, the device further includes:
a recommendation unit 401, configured to recommend keywords, websites, or applications to the user according to the user's interests and hobbies.
For details of the device, refer to the description of the method; they are not repeated here.
An embodiment of the present invention also provides a mobile phone terminal that includes the above device.
With the method, device, and mobile phone terminal of the present invention for inferring the interests and hobbies of a mobile phone user, the browsing history files are read, the keywords and browsed web addresses are obtained, and the search frequency of the keywords and browsed web addresses in each category is counted, so that the user's interests and hobbies can be inferred effectively from how high or low the frequencies are.
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (8)

1. A method for inferring the interests and hobbies of a mobile phone user, characterized in that the method comprises:
reading the browsing history files of a mobile browser;
parsing the browsing history files to obtain the keywords the user has searched for and the web addresses the user has browsed;
classifying the searched keywords and the browsed web addresses separately;
counting the search frequency of the keywords and of the browsed web addresses in each category, ranking the most frequently searched keyword of each keyword category by search frequency from high to low, and ranking the web address with the highest domain name level in each category of browsed web addresses by search frequency from high to low, so as to infer the user's interests and hobbies from how high or low the search frequencies are;
wherein the step of classifying the searched keywords and the browsed web addresses separately comprises:
classifying the searched keywords by semantics;
classifying the browsed web addresses by domain name level, from high to low.
2. The method of claim 1, characterized in that the method further comprises:
storing, in arrays, the keywords in each category and the search frequency of each keyword, with different keywords in the same category distinguished by an index built on the array subscripts;
storing, in a B+ tree, the browsed web addresses classified by domain name level from high to low, together with the access frequency of each browsed web address.
3. The method of claim 1 or 2, characterized in that the method further comprises:
recommending keywords, websites, or applications to the user according to the user's interests and hobbies.
4. The method of claim 1, characterized in that the browsing history files include log files and cache files.
5. A device for inferring the interests and hobbies of a mobile phone user, characterized in that the device comprises:
a reading unit, configured to read the browsing history files of a mobile browser;
an acquiring unit, configured to parse the browsing history files and obtain the keywords the user has searched for and the web addresses the user has browsed;
a classification unit, configured to classify the searched keywords and the browsed web addresses separately;
a statistical inference unit, configured to count the search frequency of the keywords and of the browsed web addresses in each category, to rank the most frequently searched keyword of each keyword category by search frequency from high to low, and to rank the web address with the highest domain name level in each category of browsed web addresses by search frequency from high to low, so as to infer the user's interests and hobbies from how high or low the search frequencies are;
wherein the classification unit is specifically configured to classify the searched keywords by semantics and to classify the browsed web addresses by domain name level, from high to low.
6. The device of claim 5, characterized in that the device further comprises:
an array storage unit, configured to store, in arrays, the keywords in each category and the search frequency of each keyword, with different keywords in the same category distinguished by an index built on the array subscripts;
a B+ tree storage unit, configured to store, in a B+ tree, the browsed web addresses classified by domain name level from high to low, together with the access frequency of each browsed web address.
7. The device of claim 5 or 6, characterized in that the device further comprises:
a recommendation unit, configured to recommend keywords, websites, or applications to the user according to the user's interests and hobbies.
8. A mobile phone terminal, characterized in that the mobile phone terminal includes the device of claim 5.
CN201310573351.7A 2013-11-15 2013-11-15 A method and an apparatus for deducting interests and hobbies of handset users and a handset terminal Active CN103607496B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310573351.7A CN103607496B (en) 2013-11-15 2013-11-15 A method and an apparatus for deducting interests and hobbies of handset users and a handset terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310573351.7A CN103607496B (en) 2013-11-15 2013-11-15 A method and an apparatus for deducting interests and hobbies of handset users and a handset terminal

Publications (2)

Publication Number Publication Date
CN103607496A CN103607496A (en) 2014-02-26
CN103607496B true CN103607496B (en) 2017-04-19

Family

ID=50125696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310573351.7A Active CN103607496B (en) 2013-11-15 2013-11-15 A method and an apparatus for deducting interests and hobbies of handset users and a handset terminal

Country Status (1)

Country Link
CN (1) CN103607496B (en)

Also Published As

Publication number Publication date
CN103607496A (en) 2014-02-26


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant