CN104462245A - User Internet surfing preference data recognition method


Info

Publication number
CN104462245A
Authority
CN
China
Legal status
Granted
Application number
CN201410664717.6A
Other languages
Chinese (zh)
Other versions
CN104462245B (en)
Inventor
Liu Lei (刘雷)
Current Assignee
Nanjing Yaxin Software Co. Ltd.
Original Assignee
AsiaInfo Technology (Nanjing) Co., Ltd.
Priority date
2014-11-19
Filing date
2014-11-19
Publication date
2015-03-25
Application filed by AsiaInfo Technology (Nanjing) Co., Ltd.
Priority to CN201410664717.6A
Publication of CN104462245A
Application granted
Publication of CN104462245B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for recognizing users' Internet-surfing preference data. Using the existing median concept and the H-index algorithm, user preferences and degrees of preference are recognized from users' Internet behavior characteristics, improving the accuracy and efficiency of data recognition. The method comprises the following steps: the obtained user Internet behavior log data are first aggregated separately for each application; the indicator types of the applications are then specified, and the maximum and minimum of each indicator are obtained; coefficients are computed with the preference mining algorithm from these maxima and minima, and the indicator values of each application are standardized; the preference degree of each application is calculated from the standardized values; the applications are ranked by preference degree, and the preference degree of the middle-ranked application is selected as the median; applications whose preference degree is greater than the median are added to the preference ranking, and the user is tagged with preference labels according to the H-index algorithm.

Description

User Internet-surfing preference data recognition method
Technical field
The present invention relates to data mining technology, and in particular to a method for recognizing users' Internet-surfing preference data that can be used to mine marketing targets based on users' Internet-surfing preferences.
Background art
When mining marketing targets based on users' Internet-surfing preferences, conventional methods usually take the users' Internet behavior logs, together with basic user information obtained from the business-analysis data warehouse, and tag users with preference labels by means of "rule restrictions".
For example, music-site preference: a user who accessed music sites more than 10 times this month via mobile Internet is a music-site preference user; Migu Music preference: a user who used the Migu Music client on a mobile phone more than 5 times this month is a Migu Music preference user.
In this "rule restriction" approach, a traditional database tags users with preference labels according to their mobile Internet records (for example, users who accessed music sites more than 10 times). High recognition precision cannot be guaranteed: there is no unified, standard algorithm, so accuracy cannot be ensured.
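To make the contrast with the proposed method concrete, the "rule restriction" tagging described above can be sketched as follows. This is an illustrative sketch only: the rule table, the event format, and the names `RULES` and `rule_based_tags` are hypothetical; only the two thresholds (more than 10 music-site accesses, more than 5 Migu Music client uses per month) come from the text.

```python
from collections import Counter

# Hypothetical rule table built from the two examples in the text:
# (label, access category, monthly count that must be exceeded).
RULES = [
    ("music-site preference", "music_site", 10),
    ("Migu Music preference", "migu_music_client", 5),
]


def rule_based_tags(monthly_events):
    """Prior-art style tagging: fixed count thresholds, no normalization or weighting.

    monthly_events: iterable of access-category strings, one entry per access
    during the month (a hypothetical format).
    """
    counts = Counter(monthly_events)
    # A label applies only when the monthly count strictly exceeds its threshold.
    return [label for label, category, threshold in RULES if counts[category] > threshold]
```

For example, 11 music-site accesses and 3 Migu Music uses yield only the music-site label. Each threshold is hand-picked per label, which is the lack of a unified, standard algorithm that the background criticizes.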
Summary of the invention
The technical problem to be solved by the present invention is as follows: using the existing median concept and the H-index algorithm, identify users' preferences and degrees of preference from their Internet behavior characteristics, thereby improving the accuracy and efficiency of data recognition.
The technical solution adopted by the present invention is a user Internet-surfing preference data recognition method comprising the following steps:
1) Obtain user Internet behavior log data, which include the applications the user uses, the content accessed, the time periods online, and the location while online;
2) Aggregate the user Internet behavior log data separately for each application;
3) Specify the indicator types for the user's access to each application, including number of accesses, traffic, and access frequency, and extract the maximum and minimum of each indicator from the aggregated data of each application;
4) For each indicator of each application, standardize the indicator value using its maximum and minimum: let the maximum of an indicator be a_max, its minimum a_min, and its standardized value index; with current_value the value being standardized, the standardized value of the indicator is:
index = (current_value - a_min) / (a_max - a_min);
5) For each application, obtain the weight of each standardized indicator value from step 4), and compute the application's preference score over all indicators from the standardized values and their weights:
score = index(1)*weight(1) + index(2)*weight(2) + index(3)*weight(3) + ... + index(n)*weight(n)
where n is the number of indicators, and index(i) and weight(i) are the standardized value and the weight of the i-th indicator;
6) Sort the applications by preference score;
7) From the sorted applications, take the preference degree of the middle-ranked application as the median;
8) Compare each application's preference degree with the median: if an application's preference degree is less than the median, discard the data corresponding to that application; otherwise, add the application to the preference ranking;
9) Tag the user with preference labels according to the H-index algorithm:
When the rank of the user's preference degree is <= (number of users of this application) * 5/6 and > (number of users) * 4/6, the application is a general-preference application;
When the rank of the user's preference degree is <= (number of users) * 4/6 and > (number of users) * 2/6, the application is a strong-preference application.
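Steps 2) and 3) above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the log schema (`user`, `app`, `traffic_mb`, `day`), the indicator names, and the reading of "access frequency" as the number of distinct active days are all hypothetical, and the maximum and minimum are assumed to be taken across the users of each application.

```python
from collections import defaultdict

# Hypothetical indicator names: access count, traffic, and access frequency
# (modelled here as the number of distinct days with at least one access).
INDICATORS = ("visits", "traffic_mb", "active_days")


def aggregate_logs(records):
    """Step 2: aggregate each user's log records separately per application.

    records: iterable of dicts with keys "user", "app", "traffic_mb", "day"
    (a hypothetical schema); every record counts as one access.
    Returns {app: {user: {indicator: value}}}.
    """
    visits = defaultdict(int)
    traffic = defaultdict(float)
    days = defaultdict(set)
    for r in records:
        key = (r["app"], r["user"])
        visits[key] += 1
        traffic[key] += r["traffic_mb"]
        days[key].add(r["day"])

    agg = defaultdict(dict)
    for (app, user), count in visits.items():
        agg[app][user] = {
            "visits": count,
            "traffic_mb": traffic[(app, user)],
            "active_days": len(days[(app, user)]),
        }
    return dict(agg)


def indicator_extremes(agg):
    """Step 3: (min, max) of every indicator across the users of each application."""
    return {
        app: {
            name: (min(u[name] for u in users.values()),
                   max(u[name] for u in users.values()))
            for name in INDICATORS
        }
        for app, users in agg.items()
    }
```

The (min, max) pairs returned by `indicator_extremes` are exactly the a_min and a_max that step 4) needs for each indicator of each application.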
In the present invention, the user's Internet logs can be obtained from the server, including key indicators such as number of accesses, traffic, and access frequency. These data are aggregated and fed into the preference mining model of the present invention; the model input is shown in Table 1:
Table 1
In step 4), the preference mining algorithm is an existing algorithm. Its principle is as follows: determine the score weight of each indicator for each application. The weights can be set from empirical values, or the coefficients of the principal-component expression obtained by existing principal component analysis (PCA) modeling can be used as the indicator score weights. When building the preference mining model, the raw data of each application are first written as a separate matrix whose elements are that application's indicator data. The p indicators of the raw data matrix X must be correlated, and positively so; a negatively correlated indicator must be transformed accordingly.
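The text leaves the PCA-based weighting open. One common reading, sketched below as an assumption rather than the patent's exact procedure, takes the loadings of the first principal component of the indicators' correlation matrix (found here by power iteration, to stay dependency-free) and rescales them to sum to 1. The function names are hypothetical; with positively correlated indicators, as the text requires, all loadings share one sign.

```python
import math


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of rows (one row per user)."""
    n, p = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two rows")
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
            for b in range(p)] for a in range(p)]
    sd = [math.sqrt(cov[j][j]) for j in range(p)]
    if any(s == 0 for s in sd):
        raise ValueError("every indicator needs non-zero variance")
    return [[cov[a][b] / (sd[a] * sd[b]) for b in range(p)] for a in range(p)]


def first_pc_weights(rows, iterations=200):
    """Assumed weighting: first principal component loadings, scaled to sum to 1."""
    corr = correlation_matrix(rows)
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p  # positive start vector
    for _ in range(iterations):
        # Power iteration: repeatedly multiply by the matrix and renormalize.
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("power iteration collapsed to the zero vector")
        v = [x / norm for x in w]
    if sum(v) < 0:  # an eigenvector's sign is arbitrary; orient it positively
        v = [-x for x in v]
    total = sum(v)
    return [x / total for x in v]
```

Using the correlation matrix rather than the raw covariance keeps indicators measured in different units (counts versus megabytes) from dominating the weights purely through their scale. Empirical weights supplied by business staff, the other option in the text, would simply replace the output of `first_pc_weights`.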
When the maximum and minimum of each indicator are computed for each application, the existing min-max normalization method is used to standardize the indicator data and obtain the standardized indicator index. The formula is as follows:
If the maximum of an indicator for an application is a_max and its minimum is a_min, the standardized value is index = (current_value - a_min) / (a_max - a_min), where current_value is the indicator value currently being processed. The channel preference score is then calculated, and, according to the application preference scores and the user's contact-channel preferences, the user is tagged with preference labels, for example: when the rank of the user's preference degree is > (number of users) * 5/6, the application is a weak-preference application;
When the rank of the user's preference degree is <= (number of users) * 5/6 and > (number of users) * 4/6, the application is a general-preference application;
When the rank of the user's preference degree is <= (number of users) * 4/6 and > (number of users) * 2/6, the application is a strong-preference application.
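The three labeling thresholds above can be sketched as a small function. This is a minimal sketch: the name `preference_label` is hypothetical, and it implements only the fractional thresholds stated in the text, not a bibliometric h-index computation. The text does not say whether rank 1 is the highest or lowest preference degree, so the function takes the rank as given, and it returns None for ranks <= (number of users) * 2/6 because the text assigns no label there.

```python
def preference_label(rank, n_users):
    """Map a user's rank among an application's users to the label stated in the text.

    Integer arithmetic (rank * 6 compared with k * n_users) avoids
    floating-point edge cases at the 5/6, 4/6 and 2/6 boundaries.
    Returns None for rank <= n_users * 2/6, a range the text does not label.
    """
    if not 1 <= rank <= n_users:
        raise ValueError("rank must lie between 1 and n_users")
    if rank * 6 > 5 * n_users:
        return "weak"
    if rank * 6 > 4 * n_users:
        return "general"
    if rank * 6 > 2 * n_users:
        return "strong"
    return None
```

With 12 users of an application, the boundaries fall at ranks 10, 8 and 4: rank 11 is weak, ranks 9 and 10 are general, ranks 5 to 8 are strong, and ranks 1 to 4 are unlabeled.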
The output of the preference mining model of the present invention is shown in Table 2:
Table 2 covers, for different times and different locations, the applications the user prefers and the user's preference-grade ranking for each application. The data mining is more efficient, which greatly facilitates subsequent marketing.
The beneficial effects of the present invention are as follows: user tagging moves from the "rule restriction" approach to a preference mining algorithm; at the same time, by obtaining standardized indicator values and weights for the different indicators within the preference mining algorithm, computation moves from a single machine to a distributed cluster, solving the problem of preference recognition precision.
Brief description of the drawings
Figure 1 is a schematic flow chart of the present invention.
Embodiment
The invention is further described below with reference to the drawing and a specific embodiment.
As shown in Figure 1, the user Internet-surfing preference data recognition method comprises the following steps:
1) Obtain user Internet behavior log data, which include the applications the user uses, the content accessed, the time periods online, and the location while online;
2) Aggregate the user Internet behavior log data separately for each application;
3) Specify the indicator types for the user's access to each application, including number of accesses, traffic, and access frequency, and extract the maximum and minimum of each indicator from the aggregated data of each application;
4) For each indicator of each application, standardize the indicator value using its maximum and minimum: let the maximum of an indicator be a_max, its minimum a_min, and its standardized value index; with current_value the value being standardized, the standardized value of the indicator is:
index = (current_value - a_min) / (a_max - a_min);
5) For each application, obtain the weight of each standardized indicator value from step 4), and compute the application's preference score from the standardized values and their weights:
score = index(1)*weight(1) + index(2)*weight(2) + index(3)*weight(3) + ... + index(n)*weight(n)
where n is the number of indicator categories, and index(i) and weight(i) are the standardized value and the weight of the i-th indicator category;
6) Sort the applications by preference score;
7) From the sorted applications, take the preference degree of the middle-ranked application as the median;
8) Compare each application's preference degree with the median: if an application's preference degree is less than the median, discard the data corresponding to that application; otherwise, add the application to the preference ranking;
9) Tag the user with preference labels according to the H-index algorithm:
When the rank of the user's preference degree is <= (number of users) * 5/6 and > (number of users) * 4/6, the application is a general-preference application;
When the rank of the user's preference degree is <= (number of users) * 4/6 and > (number of users) * 2/6, the application is a strong-preference application.
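Steps 6) to 8) can be sketched as a median filter over per-application scores. This is a minimal sketch with a hypothetical function name. The text takes "the preference degree of the middle-ranked application" as the median but does not cover an even number of applications; the sketch assumes `statistics.median_low`, which always returns an actual application's score (the lower of the two middle values when the count is even). Applications whose score is not below the median are kept, matching step 8).

```python
import statistics


def filter_by_median(app_scores):
    """Steps 6-8: rank applications by score and drop those below the median.

    app_scores: {application: preference score}.
    Returns (kept, median), where kept lists (application, score) pairs
    in descending score order.
    """
    ranked = sorted(app_scores.items(), key=lambda kv: kv[1], reverse=True)
    # Assumption: median_low picks a real middle element for even counts.
    median = statistics.median_low([score for _, score in ranked])
    kept = [(app, score) for app, score in ranked if score >= median]
    return kept, median
```

For five applications scored 0.9, 0.7, 0.5, 0.2 and 0.1, the median is 0.5 and the three highest-scoring applications stay in the preference ranking.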
In the present invention, the user's Internet logs can be obtained from the server. Besides the key indicators of number of accesses, traffic, and access frequency, the indicator types for the user's access to each application may also include other indicators. These data are aggregated and fed into the preference mining model of the present invention; the model input is shown in Table 1:
Table 1
In step 4), the preference mining algorithm is an existing algorithm. Its principle is as follows: determine the score weight of each sub-indicator within each application's indicator categories. The weights can be empirical values provided by business staff or, where needed, the coefficients of the principal-component expression obtained by existing principal component analysis (PCA) modeling. When building the preference mining model, the raw data are first written as a matrix. Note that the p indicators of the raw data matrix X must be related and positively correlated; a negatively correlated indicator must be transformed accordingly. When the maximum and minimum of each indicator are computed for each application, the existing min-max normalization method is used to standardize the indicator data and obtain the standardized indicator index. The formula is as follows:
If the maximum of an indicator for an application is a_max and its minimum is a_min, the standardized value is index = (current_value - a_min) / (a_max - a_min). The channel preference score is then calculated, and, according to the application preference scores and the user's contact-channel preferences, the user is tagged with preference labels.
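The min-max standardization and the weighted score of step 5) can be sketched as follows. The function names are hypothetical, and two behaviours are assumptions not taken from the text: `reverse=True` implements the "corresponding transformation" of a negatively correlated indicator as 1 - index, and an indicator whose maximum equals its minimum is standardized to 0.0 to avoid division by zero.

```python
def min_max(value, a_min, a_max, reverse=False):
    """index = (current_value - a_min) / (a_max - a_min).

    reverse=True is an assumed transformation for a negatively correlated
    indicator; a constant indicator (a_max == a_min) is assumed to give 0.0.
    """
    if a_max == a_min:
        return 0.0
    idx = (value - a_min) / (a_max - a_min)
    return 1.0 - idx if reverse else idx


def preference_score(values, extremes, weights, reverse=frozenset()):
    """score = index(1)*weight(1) + ... + index(n)*weight(n).

    values:   {indicator: current value} for one user and one application
    extremes: {indicator: (a_min, a_max)} for that application
    weights:  {indicator: weight}
    reverse:  names of indicators to invert before weighting (assumption)
    """
    return sum(
        min_max(values[name], *extremes[name], reverse=name in reverse) * w
        for name, w in weights.items()
    )
```

For example, values of 6 visits, 30 MB and 3 active days against ranges (2, 10), (10, 50) and (1, 5) each standardize to 0.5, so with weights 0.5, 0.3 and 0.2 the score is 0.5.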
The output of the preference mining model of the present invention is shown in Table 2:
Table 2
Table 2 covers, for different times and different locations, the applications the user prefers and the user's preference-grade ranking for each application. The data mining is more efficient, which greatly facilitates subsequent marketing.
By using the existing median concept and the H-index algorithm, the present invention identifies users' preferences and degrees of preference from their Internet behavior characteristics, improving the accuracy and efficiency of data recognition. It moves user tagging from the "rule restriction" approach to a preference mining algorithm and moves computation from a single machine to a distributed cluster, solving the problem of preference recognition precision.

Claims (1)

1. A user Internet-surfing preference data recognition method, characterized in that it comprises the following steps:
1) Obtain user Internet behavior log data, which include the applications the user uses, the content accessed, the time periods online, and the location while online;
2) Aggregate the user Internet behavior log data separately for each application;
3) Specify the indicator types for the user's access to each application, including number of accesses, traffic, and access frequency, and extract the maximum and minimum of each indicator from the aggregated data of each application;
4) For each indicator of each application, standardize the indicator value using its maximum and minimum: let the maximum of an indicator be a_max, its minimum a_min, and its standardized value index; with current_value the value being standardized, the standardized value of the indicator is:
index = (current_value - a_min) / (a_max - a_min);
5) For each application, obtain the weight of each standardized indicator value from step 4), and compute the application's preference score from the standardized values and their weights:
score = index(1)*weight(1) + index(2)*weight(2) + index(3)*weight(3) + ... + index(n)*weight(n)
where n is the number of indicator categories, and index(i) and weight(i) are the standardized value and the weight of the i-th indicator category;
6) Sort the applications by preference score;
7) From the sorted applications, take the preference degree of the middle-ranked application as the median;
8) Compare each application's preference degree with the median: if an application's preference degree is less than the median, discard the data corresponding to that application; otherwise, add the application to the preference ranking;
9) Tag the user with preference labels according to the H-index algorithm:
When the rank of the user's preference degree is <= (number of users) * 5/6 and > (number of users) * 4/6, the application is a general-preference application;
When the rank of the user's preference degree is <= (number of users) * 4/6 and > (number of users) * 2/6, the application is a strong-preference application.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410664717.6A CN104462245B (en) 2014-11-19 2014-11-19 User Internet-surfing preference data recognition method

Publications (2)

Publication Number Publication Date
CN104462245A 2015-03-25
CN104462245B 2017-09-05

Family

ID=52908281

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410664717.6A Active CN104462245B (en) 2014-11-19 2014-11-19 User Internet-surfing preference data recognition method

Country Status (1)

Country Link
CN (1) CN104462245B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101105815A (en) * 2007-09-06 2008-01-16 腾讯科技(深圳)有限公司 Internet music file sequencing method, system and search method and search engine
US20120290441A1 (en) * 2011-05-09 2012-11-15 Google Inc. Using Application Market Log Data To Identify Applications Of Interest
CN102890689A (en) * 2011-07-22 2013-01-23 北京百度网讯科技有限公司 Method and system for building user interest model
US8452797B1 (en) * 2011-03-09 2013-05-28 Amazon Technologies, Inc. Personalized recommendations based on item usage

Non-Patent Citations (1)

Title
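Xu Zeshui (徐泽水): "部分权重信息下多目标决策方法研究" [Research on multi-objective decision-making methods under partial weight information], Systems Engineering: Theory & Practice (《系统工程理论与实践》) *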

Cited By (17)

Publication number Priority date Publication date Assignee Title
CN106504099A (en) * 2015-09-07 2017-03-15 国家计算机网络与信息安全管理中心 A kind of system for building user's portrait
CN105677925B (en) * 2016-03-30 2021-10-15 北京京东尚科信息技术有限公司 Database user data processing method and device
CN105677925A (en) * 2016-03-30 2016-06-15 北京京东尚科信息技术有限公司 Method and device for processing user data in database
CN106066864B (en) * 2016-05-27 2019-04-30 重庆邮电大学 A kind of various dimensions mobile subscriber preference dynamic identifying method
CN106066864A (en) * 2016-05-27 2016-11-02 重庆邮电大学 A kind of various dimensions mobile subscriber's preference dynamic identifying method
WO2018023673A1 (en) * 2016-08-05 2018-02-08 吴晓敏 Method for recognizing user's interests on basis of site and recognition system
WO2018023672A1 (en) * 2016-08-05 2018-02-08 吴晓敏 Information pushing method during matching of site and user's interest and recognition system
CN107066512A (en) * 2017-01-23 2017-08-18 重庆邮电大学 A kind of user preference appraisal procedure and system based on Hadoop
WO2018157818A1 (en) * 2017-03-02 2018-09-07 广州市动景计算机科技有限公司 Method and apparatus for inferring preference of user, terminal device, and storage medium
CN107948015A (en) * 2017-11-29 2018-04-20 中国联合网络通信集团有限公司 A kind of Analysis on Quality of Service method, apparatus and network system
CN109840795A (en) * 2017-11-29 2019-06-04 北京京东尚科信息技术有限公司 Information generating method and device
CN110110176A (en) * 2018-02-01 2019-08-09 新奥科技发展有限公司 A kind of data display method and device
CN110717101A (en) * 2019-09-30 2020-01-21 上海淇玥信息技术有限公司 User classification method and device based on application behaviors and electronic equipment
CN112328644A (en) * 2020-10-12 2021-02-05 联通智网科技有限公司 Application preference degree generation method and device, storage medium and computer equipment
CN112398751A (en) * 2020-10-12 2021-02-23 联通智网科技有限公司 Flow speed control method and device, computer equipment and storage medium
CN112291622A (en) * 2020-10-30 2021-01-29 中国建设银行股份有限公司 Method and device for determining favorite internet surfing time period of user
CN112291622B (en) * 2020-10-30 2022-05-27 中国建设银行股份有限公司 Method and device for determining favorite internet surfing time period of user

Also Published As

Publication number Publication date
CN104462245B (en) 2017-09-05

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
C41 Transfer of patent application or patent right or utility model
CB03 Change of inventor or designer information

Inventor after: Liu Lei

Inventor after: Feng Xianhong

Inventor before: Liu Lei

COR Change of bibliographic data
TA01 Transfer of patent application right

Effective date of registration: 20161205

Address after: No. 180 Software Avenue, Building 02, 201, Yuhuatai District, Nanjing, Jiangsu 210013

Applicant after: Nanjing Yaxin Software Co. Ltd.

Address before: Building 16, No. 12 Dinghuaimen, Nanjing, Jiangsu 210013

Applicant before: AsiaInfo Technology (Nanjing) Co., Ltd.

GR01 Patent grant