CN105095450A - Method used for determining mobile Internet access interest point of user - Google Patents

Method used for determining mobile Internet access interest point of user

Info

Publication number
CN105095450A
Authority
CN
China
Prior art keywords
url
user
classification
access
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510444508.5A
Other languages
Chinese (zh)
Inventor
袁海
嵇正鹏
袁黎轶
汪敏娟
胡仲刚
张聪
马安华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Smart Family Technology Co Ltd
Original Assignee
JIANGSU PUBLIC INFORMATION CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JIANGSU PUBLIC INFORMATION CO Ltd filed Critical JIANGSU PUBLIC INFORMATION CO Ltd
Priority to CN201510444508.5A priority Critical patent/CN105095450A/en
Publication of CN105095450A publication Critical patent/CN105095450A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/955Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for determining a user's mobile Internet access interest points. The method comprises the following steps: acquiring the user's mobile Internet http access log from a DPI system, the log comprising at least the user MDN, the accessed URL, the access time, and similar information; preprocessing the acquired log by filtering out picture-browsing, software-download, and information-search URLs; looking the URL up in an effective URL information base and determining whether it exists there; if it exists, determining the user's access interest point from the classification recorded for the URL in the base; otherwise, outputting the URL as a URL to be classified by crawler, performing crawler classification on it, and determining the user's access interest point. The method has the advantages of speed and functionalization.

Description

A method for determining a user's mobile Internet access interest points
Technical Field
The present invention relates to the field of wireless technology, and in particular to a method for determining a user's mobile Internet access interest points.
Background
On the one hand, with the continuous increase in wireless network transmission bandwidth and the rapid spread of intelligent terminals, users can access the mobile Internet from their mobile phones anytime and anywhere. On the other hand, the volume of information on the Internet has grown explosively in recent years, and people must spend a great deal of time obtaining the information they need.
To let users quickly find content that interests them, their mobile Internet access behavior needs to be mined and analyzed to determine their access interest points, so that content can be recommended in a targeted way, improving user experience and increasing user loyalty.
The patents "Web page representative words recommending method" (200910010713.5) and "Website content combine recommendation system and method" (200910010593.9) start from the relevance between web pages, and only analyze and recommend the content a user accesses on certain websites (a website-centered approach).
Summary of the Invention
In view of the above defects or improvement needs of the prior art, the object of the present invention is to provide a system and method for determining a user's mobile Internet access interest points. The user's mobile Internet access log is acquired from a DPI system; invalid log data is filtered out; the remaining URLs are looked up and matched against an effective URL information base; URLs that fail to match undergo crawler classification; and the user's interest points are determined. This supports user-level (user-centered) precision marketing and the provision of personalized, differentiated services.
The technical solution of the present invention is a method for determining a user's mobile Internet access interest points, comprising the following steps:
Step A: acquire the user's mobile Internet http access log from a DPI system; the http access log comprises at least the user MDN, the accessed URL, the access time, and similar information;
Step B: preprocess the acquired user http access log by filtering out picture-browsing, software-download, information-search, and similar URLs;
Step C: look the URL up in the effective URL information base and judge whether it exists there;
Step D: if it exists, go to step E; otherwise go to step F1;
Step E: determine the user's access interest point from the classification recorded for the URL in the base;
Step F1: output the URL as a URL to be classified by crawler;
Step F2: perform crawler classification on the URL and determine the user's access interest point;
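The control flow of steps C to F2 can be illustrated with a minimal Python sketch. It assumes the effective URL information base behaves like a dictionary from URL to classification name and that crawler classification is available as a callable; the names `determine_interest`, `effective_urls`, and `crawl_classify` are hypothetical and do not come from the patent text.

```python
from typing import Callable, Dict, Optional


def determine_interest(
    url: str,
    effective_urls: Dict[str, str],
    crawl_classify: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the interest point for one preprocessed URL (steps C to F2)."""
    # Steps C and D: is the URL already in the effective URL information base?
    category = effective_urls.get(url)
    if category is not None:
        # Step E: reuse the classification stored for this URL.
        return category
    # Steps F1 and F2: hand the URL to crawler classification.
    category = crawl_classify(url)
    if category is not None:
        # Step F25 adds the classified URL to the base for later lookups.
        effective_urls[url] = category
    return category
```

Because a classified URL is written back to the base, a second visit to the same URL is answered by the lookup alone and the crawler is not invoked again.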
Further, in step B, the user http access log is preprocessed as follows:
Step B1, filter matching: filter out URLs with picture-browsing features, for example *.ico, *.bmp, *.gif;
Step B2, software matching: filter out URLs with software-download features, for example *.apk, *.ipa;
Step B3, search matching: filter out URLs with information-search features; such a URL usually contains a search engine and a search keyword;
Step B4: compare against the filter URL base and filter out URLs whose content cannot be crawled.
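As a rough illustration of steps B1 to B4, the following Python sketch checks a URL against the suffixes named above, a set of query-parameter names, and a blacklist. The parameter names in `SEARCH_PARAMS` and the choice of a host-level blacklist are assumptions made for this example; the patent does not say how search URLs are recognized or how the filter URL base is keyed.

```python
from urllib.parse import parse_qs, urlsplit

IMAGE_SUFFIXES = (".ico", ".bmp", ".gif")   # examples given in step B1
SOFTWARE_SUFFIXES = (".apk", ".ipa")        # examples given in step B2
# Assumed names of search-keyword query parameters (not from the patent).
SEARCH_PARAMS = {"q", "wd", "query", "keyword"}


def is_filtered(url: str, blacklist: set) -> bool:
    """Return True if the URL should be dropped during preprocessing."""
    parts = urlsplit(url)
    path = parts.path.lower()
    if path.endswith(IMAGE_SUFFIXES):        # B1: picture browsing
        return True
    if path.endswith(SOFTWARE_SUFFIXES):     # B2: software download
        return True
    if SEARCH_PARAMS & set(parse_qs(parts.query)):   # B3: information search
        return True
    # B4: host found in the filter URL base (host-level match is an assumption).
    return parts.netloc.lower() in blacklist
```

Matching on the parsed path rather than the raw string keeps a query string such as `?v=2` from hiding a `.apk` suffix.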
Further, in step F2, crawler classification is performed on the URL as follows:
Step F21: crawl the URL accessed by the user and obtain the web page content;
Step F22: analyze the title, meta information, and body text of the web page, segment the text into words and remove function words to obtain the effective words of the page content, and calculate the word frequency of each effective word;
Step F23: calculate the weights of each classification's representative words from the configured content classifications and the existing training text of each classification;
Step F24: compare against the content classification dictionary and, from the word frequencies of the effective words and the weights of the representative words, calculate the confidence of the URL for each of the classifications;
Step F25: take the classification with the largest confidence value as the user's access interest point, add the URL and its classification to the effective URL information base, and record the effective words of the URL.
The present invention can therefore obtain the following beneficial effects:
1. Compared with general user behavior analysis techniques, the present invention is more scientific, intelligent, and automated;
2. Compared with the patents "Web page representative words recommending method" (200910010713.5) and "Website content combine recommendation system and method" (200910010593.9), this patent application achieves user-level (user-centered) analysis of Internet access content.
Brief Description of the Drawings
The invention is further described below with reference to the drawings and embodiments. In the drawings:
Fig. 1 is a schematic diagram of the main flow of the present invention;
Fig. 2 is a schematic diagram of the http log data preprocessing flow of the present invention;
Fig. 3 is a schematic diagram of the crawler classification processing flow of the present invention;
Fig. 4 is a structural schematic diagram of one embodiment of the present invention.
Detailed Description
To make the object, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below can be combined with one another as long as they do not conflict.
Fig. 4 shows the system architecture of the present invention for determining a user's mobile Internet access interest points. In this embodiment the system comprises a data acquisition unit, a URL processing unit, a crawler and content processing unit, a representative word update and weight calculation unit, a point-of-interest determining unit, a management unit, and so on.
1. Data acquisition unit
Comprises a data acquisition module and a data preprocessing module.
Data acquisition module: acquires from the DPI system the http log data of the user's mobile Internet access; the http log comprises at least the user MDN (or IMSI), the accessed URL, the access time, and similar information. The module sends the data to the data preprocessing module.
Data preprocessing module: preprocesses the http log data, which includes filtering out URLs with picture-browsing, software-download, information-search, and similar features, and comparing against the filter URL base to filter out URLs whose content cannot be crawled, for example QQ farm and advertisement pages. After preprocessing, the data is sent to the URL processing unit.
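The record handed from the data acquisition module to the preprocessing module can be pictured as a small typed structure. The sketch below is only an illustration: the patent names the fields (user MDN, accessed URL, access time) but not the log layout, so the tab-separated line format, the timestamp format, and the names `AccessRecord` and `parse_record` are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessRecord:
    mdn: str               # user MDN (an IMSI could be stored here instead)
    url: str               # URL the user accessed
    accessed_at: datetime  # access time


def parse_record(line: str) -> AccessRecord:
    """Parse one log line in the assumed 'MDN<TAB>URL<TAB>time' layout."""
    mdn, url, stamp = line.rstrip("\n").split("\t")
    return AccessRecord(mdn, url, datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"))
```

A line with a different number of fields raises `ValueError`, which makes malformed records visible before they reach the filters.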
2. URL processing unit
Comprises an effective URL matching module, a URL crawler classification module, and an effective URL information base update module.
Effective URL matching module: looks up the user-accessed URL from the log data supplied by the data acquisition unit in the effective URL information base and compares it. If the URL exists in the base, the module finds the classification corresponding to the URL and sends it to the point-of-interest determining unit; otherwise, it sends the URL to the URL crawler classification module.
URL crawler classification module: sends the user access logs that failed to match in the effective URL information base to the crawler and content processing unit.
Effective URL information base update module: adds the user-accessed URLs that the point-of-interest determining unit was able to classify, together with their classification information, to the effective URL information base, and records the effective words of each URL.
3. Crawler and content processing unit
Comprises a web crawler module, a page content analysis module, and an effective word frequency statistics module.
Web crawler module: automatically obtains the web page information of the URL accessed by the user and sends it to the page content analysis module.
Page content analysis module: analyzes the title, meta information, and body text of the web page, performs word segmentation on the text content, removes words without concrete meaning such as interjections, adverbs, adjectives, and prepositions, and obtains N effective words $R = (r_1, r_2, \ldots, r_N)$.
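A simplified version of the page content analysis can be sketched with Python's standard `html.parser`. This is an illustration under stated assumptions, not the module described above: a regular expression over ASCII letters and digits stands in for real word segmentation (Chinese text would need a dedicated segmenter), the `STOP_WORDS` set is a tiny made-up list, and the names `PageText` and `effective_words` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Illustrative stop-word list; a real deployment would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "very", "oh"}


class PageText(HTMLParser):
    """Collects title/body text and keywords/description meta content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # nesting depth inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta":
            attr = dict(attrs)
            if attr.get("name") in ("keywords", "description") and attr.get("content"):
                self.chunks.append(attr["content"])

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def effective_words(html: str) -> list:
    """Return the page's words in order, with stop words removed."""
    parser = PageText()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())
    return [w for w in words if w not in STOP_WORDS]
```

Script and style contents are skipped so that code and CSS identifiers are not counted as page vocabulary.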
Effective word frequency statistics module: counts the number of times each effective word $r_k$ occurs in the web page, divides that count by the number of effective words $N$ to obtain the word frequency of each effective word in the page, and sends the statistics to the point-of-interest determining unit.
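The word-frequency statistic can be illustrated in a few lines of Python. The sketch assumes that the input is the full sequence of effective words found in the page, repeats included, and that $N$ is the length of that sequence; if $N$ is instead meant as the number of distinct effective words, only the denominator changes. The function name `word_frequencies` is hypothetical.

```python
from collections import Counter


def word_frequencies(words):
    """Map each effective word to (occurrences in the page) / N."""
    n = len(words)  # assumed meaning of N: total effective-word occurrences
    if n == 0:
        return {}
    return {word: count / n for word, count in Counter(words).items()}
```

With this reading of $N$ the frequencies of a page sum to 1, so pages of different length remain comparable.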
4. Representative word update and weight calculation unit
Representative word weight calculation module: calculates the weight of each classification's representative words according to a formula in which $N_i$ denotes the total number of URLs in the i-th classification and $n_{i,j}$ denotes the number of URLs whose page effective words contain the term $r_{i,j}$.
Representative word update module: updates the representative words of a classification with the effective words of the titles of URLs whose classification has been determined.
5. Point-of-interest determining unit
Comprises a classification confidence calculation module and a user access interest point determination module.
Classification confidence calculation module: calculates the confidence of the URL for each classification from the calculated representative word weights and the effective word frequencies of the page at the user-accessed URL. The method is as follows:
Let $D_i$ denote the intersection of the page's effective words $R$ and the representative words $C_i$ of the i-th classification, i.e. $D_i = R \cap C_i$. The matrix $D$ is then expressed as follows:
$$
D = \begin{bmatrix}
d_{1,1} & d_{1,2} & \cdots & d_{1,q_1} \\
d_{2,1} & d_{2,2} & \cdots & d_{2,q_2} \\
\vdots & \vdots & & \vdots \\
d_{M,1} & d_{M,2} & \cdots & d_{M,q_M}
\end{bmatrix}
$$
From the counted word frequencies of the page's effective words, the word frequency matrix corresponding to matrix $D$ can be determined, denoted $\alpha$:
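$$
\alpha = \begin{bmatrix}
\alpha_{1,1} & \alpha_{1,2} & \cdots & \alpha_{1,q_1} \\
\alpha_{2,1} & \alpha_{2,2} & \cdots & \alpha_{2,q_2} \\
\vdots & \vdots & & \vdots \\
\alpha_{M,1} & \alpha_{M,2} & \cdots & \alpha_{M,q_M}
\end{bmatrix}
$$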
Likewise, from the calculated weights of the content classifications' representative words, the weight matrix corresponding to matrix $D$ can be determined, denoted $\beta$:
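$$
\beta = \begin{bmatrix}
\beta_{1,1} & \beta_{1,2} & \cdots & \beta_{1,q_1} \\
\beta_{2,1} & \beta_{2,2} & \cdots & \beta_{2,q_2} \\
\vdots & \vdots & & \vdots \\
\beta_{M,1} & \beta_{M,2} & \cdots & \beta_{M,q_M}
\end{bmatrix}
$$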
Here $d_{i,j}$ is an effective word of the page the user accessed that also appears among the representative words of the i-th content classification. The larger $\alpha_{i,j}$ is, the better $d_{i,j}$ represents the page; the larger $\beta_{i,j}$ is, the better $d_{i,j}$ distinguishes this classification from the others. The confidence of this URL for the i-th classification is denoted $\eta_i$; the larger $\eta_i$ is, the stronger the relation between the page content of this URL and the i-th classification.
User access interest point determination module: from the calculated confidence of the URL for each classification, finds the index $I$ for which $\eta_I = \max(\eta)$; the name of the $I$-th classification is taken as the user's access interest point.
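The confidence calculation and the final selection can be sketched as follows. The text does not reproduce the expression for $\eta_i$, so this example assumes a weighted sum over the shared words, $\eta_i = \sum_j \alpha_{i,j}\,\beta_{i,j}$, which fits the description of $\alpha$ and $\beta$ but may differ from the formula in the original filing. The function name `classify` and the dictionary layout are likewise hypothetical.

```python
def classify(freqs, class_weights):
    """Pick the classification with the largest confidence for one page.

    freqs:         effective word -> word frequency in the page (alpha)
    class_weights: classification name -> {representative word -> weight (beta)}
    """
    confidences = {}
    for name, weights in class_weights.items():
        shared = freqs.keys() & weights.keys()  # D_i = R intersect C_i
        # Assumed confidence: sum of alpha * beta over the shared words.
        confidences[name] = sum(freqs[w] * weights[w] for w in shared)
    best = max(confidences, key=confidences.get)  # index I with eta_I = max(eta)
    return best, confidences
```

A classification that shares no words with the page receives a confidence of 0, and an empty `class_weights` mapping raises `ValueError` from `max`, so at least one configured classification is required.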
6. Management unit
Comprises a content classification maintenance module and a filter URL base maintenance module.
Content classification maintenance module: presets the content classifications based on everyday knowledge; classifications can be configured at multiple levels. For example:
Filter URL base maintenance module: configures the blacklist of URLs whose content cannot be crawled.
Those skilled in the art will readily understand that the foregoing is only a preferred embodiment of the present invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (3)

1. A method for determining a user's mobile Internet access interest points, characterized in that it comprises the following steps:
Step A: acquire the user's mobile Internet http access log from a DPI system; the http access log comprises at least the user MDN, the accessed URL, and the access time;
Step B: preprocess the acquired user http access log by filtering out picture-browsing, software-download, information-search, and similar URLs;
Step C: look the URL up in the effective URL information base and judge whether it exists there;
Step D: if it exists, go to step E; otherwise go to step F1;
Step E: determine the user's access interest point from the classification recorded for the URL in the base;
Step F1: output the URL as a URL to be classified by crawler;
Step F2: perform crawler classification on the URL and determine the user's access interest point.
2. A method for determining a user's mobile Internet access interest points, characterized in that, in said step B, the user http access log is preprocessed as follows:
Step B1, filter matching: filter out URLs with picture-browsing features, for example *.ico, *.bmp, *.gif;
Step B2, software matching: filter out URLs with software-download features, for example *.apk, *.ipa;
Step B3, search matching: filter out URLs with information-search features; such a URL usually contains a search engine and a search keyword;
Step B4: compare against the filter URL base and filter out URLs whose content cannot be crawled.
3. A method for determining a user's mobile Internet access interest points, characterized in that, in said step F2, crawler classification is performed on the URL as follows:
Step F21: crawl the URL accessed by the user and obtain the web page content;
Step F22: analyze the title, meta information, and body text of the web page, segment the text into words and remove function words to obtain the effective words of the page content, and calculate the word frequency of each effective word;
Step F23: calculate the weights of each classification's representative words from the configured content classifications and the existing training text of each classification;
Step F24: compare against the content classification dictionary and, from the word frequencies of the effective words and the weights of the representative words, calculate the confidence of the URL for each of the classifications;
Step F25: take the classification with the largest confidence value as the user's access interest point, add the URL and its classification to the effective URL information base, and record the effective words of the URL.
CN201510444508.5A 2015-07-24 2015-07-24 Method used for determining mobile Internet access interest point of user Pending CN105095450A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510444508.5A CN105095450A (en) 2015-07-24 2015-07-24 Method used for determining mobile Internet access interest point of user

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510444508.5A CN105095450A (en) 2015-07-24 2015-07-24 Method used for determining mobile Internet access interest point of user

Publications (1)

Publication Number Publication Date
CN105095450A true CN105095450A (en) 2015-11-25

Family

ID=54575887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510444508.5A Pending CN105095450A (en) 2015-07-24 2015-07-24 Method used for determining mobile Internet access interest point of user

Country Status (1)

Country Link
CN (1) CN105095450A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202312A (en) * 2016-07-01 2016-12-07 江苏省公用信息有限公司 A kind of interest point search method for mobile Internet and system
CN106933883A (en) * 2015-12-31 2017-07-07 中移(苏州)软件技术有限公司 Point of interest Ordinary search word sorting technique, device based on retrieval daily record
CN107590169A (en) * 2017-04-14 2018-01-16 南方科技大学 A kind of preprocess method and system of carrier gateway data
CN110334321A (en) * 2019-06-24 2019-10-15 天津城建大学 A kind of city area Gui Jiaozhan identification of function method based on interest point data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101493832A (en) * 2009-03-06 2009-07-29 辽宁般若网络科技有限公司 Website content combine recommendation system and method
CN101499091A (en) * 2009-03-17 2009-08-05 辽宁般若网络科技有限公司 Web page representative words recommending method
CN104573021A (en) * 2015-01-12 2015-04-29 浪潮软件集团有限公司 Method for analyzing internet behaviors

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
周岳: "基于兴趣分类的用户行为分析系统的研究与设计", 《中国优秀硕士学位论文全文数据库信息科技辑》 *
张宇等: "基于URL主题的查询分类方法", 《计算机研究与发展》 *
肖艳炜: "Web访问行为分析及其在搜索引擎精准营销中的应用", 《中国优秀硕士学位论文全文数据库信息科技辑》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106933883A (en) * 2015-12-31 2017-07-07 中移(苏州)软件技术有限公司 Point of interest Ordinary search word sorting technique, device based on retrieval daily record
CN106933883B (en) * 2015-12-31 2019-12-27 中移(苏州)软件技术有限公司 Method and device for classifying common search terms of interest points based on search logs
CN106202312A (en) * 2016-07-01 2016-12-07 江苏省公用信息有限公司 A kind of interest point search method for mobile Internet and system
CN107590169A (en) * 2017-04-14 2018-01-16 南方科技大学 A kind of preprocess method and system of carrier gateway data
CN107590169B (en) * 2017-04-14 2020-03-06 南方科技大学 Operator gateway data preprocessing method and system
CN110334321A (en) * 2019-06-24 2019-10-15 天津城建大学 A kind of city area Gui Jiaozhan identification of function method based on interest point data
CN110334321B (en) * 2019-06-24 2023-03-31 天津城建大学 City rail transit station area function identification method based on interest point data

Similar Documents

Publication Publication Date Title
US11847612B2 (en) Social media profiling for one or more authors using one or more social media platforms
CA2865187C (en) Method and system relating to salient content extraction for electronic content
CN103218431B (en) A kind ofly can identify the system that info web gathers automatically
CN109376273B (en) Enterprise information map construction method, enterprise information map construction device, computer equipment and storage medium
US10083222B1 (en) Automated categorization of web pages
CN104008109A (en) User interest based Web information push service system
CN102456054B (en) A kind of searching method and system
CN102831199A (en) Method and device for establishing interest model
CN102622375A (en) Intelligent matching system and method for third-party lawyer recommendations
CN103617266A (en) Personalized extension search method, device and system
CN108572990A (en) Information-pushing method and device
CN105095450A (en) Method used for determining mobile Internet access interest point of user
WO2017114282A1 (en) Information search device and method, search server and machine-readable storage medium
CN102315953A (en) Method and device for detecting junk posts based on occurrence rule of posts
US20150347503A1 (en) Multi-domain query completion
CN106021418A (en) News event clustering method and device
CN108228760A (en) Method, apparatus, mobile terminal and the storage medium of filtering sensitive words
CN111447575B (en) Short message pushing method, device, equipment and storage medium
CN105512300A (en) Information filtering method and system
CN106202312B (en) A kind of interest point search method and system for mobile Internet
CN104572719A (en) Information collecting method and device
CN105824884A (en) User internet surfing information processing method and device
CN111324725B (en) Topic acquisition method, terminal and computer readable storage medium
CN109064067B (en) Financial risk operation subject determination method and device based on Internet
CN105491136A (en) Message sending method and apparatus

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190321

Address after: 210006 Tongyu Building, 501 Zhongshan South Road, Nanjing City, Jiangsu Province

Applicant after: Tianyi Smart Family Technology Co., Ltd.

Address before: 210008 No. 260 Central Road, Xuanwu District, Nanjing City, Jiangsu Province 1901

Applicant before: Jiangsu Public Information Co., Ltd.

TA01 Transfer of patent application right
RJ01 Rejection of invention patent application after publication

Application publication date: 20151125

RJ01 Rejection of invention patent application after publication