CN102143224A - Mobile phone Internet accessing-based user behavior analysis method and device - Google Patents

Mobile phone Internet accessing-based user behavior analysis method and device

Info

Publication number
CN102143224A
Authority
CN
China
Prior art keywords
url data
website
classification
user
described url
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011100319009A
Other languages
Chinese (zh)
Inventor
程皓
李培华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JIANG JIELI
Original Assignee
JIANG JIELI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JIANG JIELI filed Critical JIANG JIELI
Priority to CN2011100319009A priority Critical patent/CN102143224A/en
Publication of CN102143224A publication Critical patent/CN102143224A/en
Pending legal-status Critical Current

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a mobile phone Internet accessing-based user behavior analysis method and a wireless application protocol-based user behavior analysis device. The method of one embodiment of the invention comprises the following steps of: acquiring uniform resource locator (URL) data for the Internet accessing of a user; and performing multilevel identification and tagging the URL data, and counting tags to analyze the characteristics of user behaviors. By the method provided by the invention, the mobile phone Internet accessing user behaviors can be deeply and accurately analyzed, and personalized services can be provided for specific users by associating the characteristics of the user behaviors with the services so as to improve user experiences.

Description

Method and device for analyzing user behavior based on mobile phone Internet access
Technical field
The present invention relates to technologies associated with mobile phone Internet access, and in particular to a method and a device for analyzing user behavior based on mobile phone Internet access.
Background art
With the development of science and technology, more and more people own mobile phones, and as wireless Internet access becomes faster, more and more of them use their phones to go online. At present, however, behavior analysis for mobile Internet users can only perform simple matching against the uniform resource locators (URLs) of some websites. As a result, the precision of user behavior matching is low and the statistical means are limited. Simple URL matching can neither determine which websites a specific user visits nor reveal the user's habits and preferences, so the resulting data are of little use for business statistics and analysis.
Therefore, an improved method for analyzing user behavior based on mobile phone Internet access is needed, one that can analyze user behavior accurately.
Summary of the invention
In view of the above problems, one object of the present invention is to provide a method for analyzing the behavior of mobile phone Internet users based on URL data. A method according to an embodiment of the present invention comprises: acquiring uniform resource locator (URL) data of a user's Internet access; performing multi-level identification on the URL data and marking it with labels; and counting the marked labels to analyze the characteristics of the user's behavior.
Another object of the present invention is to provide a device for analyzing user behavior based on mobile phone Internet access. An analysis device according to an embodiment of the present invention comprises: an acquisition unit for acquiring the URL data of a user's Internet access; an identification unit for performing multi-level identification on the URL data and marking it with labels; and a statistical analysis unit for counting the marked labels to analyze the characteristics of the user's behavior.
According to embodiments of the present invention, the behavior of mobile phone Internet users can be analyzed deeply and accurately. By associating the characteristics of user behavior with services, personalized services can be offered to specific users, improving the user experience. In addition, association analysis between user behavior characteristics and services can uncover potential users and expand the user base.
Brief description of the drawings
The above and other features of the present invention will become more apparent from the following detailed description of the embodiments shown in the accompanying drawings, in which identical reference numerals denote identical or similar parts. In the drawings:
Fig. 1 shows a flow chart of a method for analyzing user behavior based on mobile phone Internet access according to an embodiment of the present invention.
Fig. 2 shows a flow chart of a process for identifying a website from URL data according to an embodiment of the present invention.
Fig. 3 shows a flow chart of a process for identifying the topic classification from the URL data of a user's Internet access according to an embodiment of the present invention.
Fig. 4 shows a flow chart of a process for identifying the content classification from the URL data of a user's Internet access according to an embodiment of the present invention.
Fig. 5 shows a flow chart of a process for identifying the client from the URL data of a user's Internet access according to an embodiment of the present invention.
Fig. 6 shows the classification tree of the portal website Kongzhong.
Fig. 7 shows an example of a crawled web page.
Fig. 8 shows an exemplary tree-shaped feature library of the QQ client.
Fig. 9 shows a flow chart of a method for analyzing user behavior based on mobile phone Internet access according to another embodiment of the present invention.
Fig. 10 shows a flow chart of a method for analyzing user behavior based on mobile phone Internet access according to yet another embodiment of the present invention.
Fig. 11 shows a structural diagram of a device for analyzing user behavior based on mobile phone Internet access according to an embodiment of the present invention.
Detailed description of the embodiments
Hereinafter, the analysis method and device of the present invention are described in detail by way of embodiments with reference to the accompanying drawings.
Fig. 1 shows a flow chart of a method 100 for analyzing user behavior based on mobile phone Internet access according to an embodiment of the present invention.
After the method starts, it enters step S110. In step S110, the uniform resource locator (URL) data of a mobile phone user's Internet access is acquired. The URL data comprises a URL address or part of it, together with the data that the mobile phone sends along with the URL address when going online.
Method 100 then proceeds to step S120. In step S120, multi-level identification is performed on the URL data and labels are marked. In embodiments of the present invention, the website visited by the user can be identified from the URL data, as can the category to which the visited web page belongs. Further, the specific content of the visited page can also be identified. Preferably, the client that the user uses to go online can be identified as well. A site label, a topic label, a content label and/or a client label are then marked accordingly. Website identification, topic classification identification and client identification are described in detail later with reference to Figs. 2-4.
Method 100 then proceeds to step S130. In step S130, the marked labels are counted to analyze the characteristics of the user's behavior. For a specific phone number, counting how often and/or during which time periods the marked labels occur reveals the user's behavioral characteristics. For example, the websites the user likes can be derived from the marked site labels. From the marked topic labels, the user's preferences and interests can be derived, for example whether the user is a music lover, a games enthusiast, an e-book reader and so on. From the marked content labels, it can further be derived whether the user is a fan of a particular singer (for example Zhou Jielun), of a particular game, of a particular popular book and so on. The analysis of user behavior can thus be as precise as a specific e-book or a specific piece of music. In addition, the marked client labels show which client the user uses, which helps to judge whether the user is an advanced fan of a given category. For example, a user of a dedicated client is usually a more advanced user than one who uses a generic browser.
Method 100 then ends.
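The counting in step S130 can be sketched as follows. This is only an illustrative sketch: the record layout, the label strings such as "topic:information" and the helper names count_labels and top_labels are assumptions introduced here, not part of the described method.

```python
from collections import Counter, defaultdict

# Hypothetical tagged records: (phone number, labels marked on one URL).
records = [
    ("13800000001", ["site:3g", "topic:information"]),
    ("13800000001", ["site:qq", "topic:information"]),
    ("13800000001", ["site:ko", "topic:games"]),
    ("13800000002", ["site:ko", "topic:games", "client:generic-browser"]),
]

def count_labels(records):
    """Count how often each label occurs for each phone number."""
    stats = defaultdict(Counter)
    for phone, labels in records:
        stats[phone].update(labels)
    return stats

def top_labels(stats, phone, prefix, n=1):
    """Return the n most frequent labels of one kind (e.g. 'topic:') for a user."""
    counts = Counter({k: v for k, v in stats[phone].items() if k.startswith(prefix)})
    return counts.most_common(n)

stats = count_labels(records)
favourite_topic = top_labels(stats, "13800000001", "topic:")
```

Counting per time period, which the description also mentions, would need a timestamp in each record; it is left out here to keep the sketch short.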
It should be understood that the above method is merely exemplary and may include further steps. For example, method 100 preferably also includes the following step: aggregating similar users according to the analyzed characteristics of user behavior. After a large number of mobile phone users have been analyzed as described above and their behavioral characteristics obtained, users with identical or similar preferences can be grouped together and aggregated into groups of similar users. For example, users can be divided into a music preference group, a games preference group, a stock trading preference group and so on. Optionally, the preference group of a particular category can be mined further to find, for example, the group of users who like a specific item. The characteristics of user groups can be determined in advance and stored in a library to build a user group feature library. Preferably, user group characteristics can be collected dynamically, and the user group feature library can be dynamically updated and maintained. The users belonging to a particular group, for example the users who like Zhou Jielun, can then be found by matching the labels counted for each user against the user group feature library.
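A minimal sketch of matching per-user label counts against a user group feature library is given below. The library layout (one required label and a minimum count per group), the threshold values and the function names are hypothetical choices made for illustration.

```python
# Hypothetical user group feature library: group -> (required label, minimum count).
GROUP_FEATURES = {
    "music preference group": ("topic:music", 20),
    "games preference group": ("topic:games", 20),
}

def assign_groups(label_counts):
    """Return the groups whose feature is satisfied by one user's label counts."""
    return sorted(
        group
        for group, (label, minimum) in GROUP_FEATURES.items()
        if label_counts.get(label, 0) >= minimum
    )

def aggregate(users):
    """Map each group to the users (phone numbers) that belong to it."""
    groups = {}
    for phone, counts in users.items():
        for group in assign_groups(counts):
            groups.setdefault(group, []).append(phone)
    return groups

users = {
    "A": {"topic:music": 35},
    "B": {"topic:music": 22, "topic:games": 40},
    "C": {"topic:games": 3},
}
groups = aggregate(users)
```

A user can fall into several groups at once, which matches the idea that one user may have more than one preference.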
The method according to the present invention yields accurate user behavior characteristics. By associating these characteristics with services, personalized services can be offered to specific users, improving the user experience. In addition, association analysis between user behavior characteristics and services can uncover potential users and expand the user base. For example, related music and video content can be recommended to the group of users who like Zhou Jielun.
The multi-level identification and labeling performed on the URL data in step S120 are described in detail below with reference to Figs. 2-4.
For ease of explanation, the content of a URL is reviewed first. A URL is an identifier that completely describes the address of a web page on the Internet. URLs share the same basic syntax regardless of which web page is addressed or which mechanism is used to retrieve it.
The syntax of a URL is usually as follows:
protocol://hostname(:port)/path?query
The protocol part specifies the transport protocol used. The most common one is the HTTP protocol, for which the URL address starts with http://. The hostname is the Domain Name System (DNS) host name or the IP address of the server that holds the resource. The hostname may sometimes be preceded by the user name and password required to connect to the server (in the form username:password@). The port is an integer and is optional. If it is omitted, the default port is used; every transport protocol has a default port, for example 80 for HTTP. The path part defines a hierarchical path consisting of zero or more strings separated by the "/" symbol, and generally denotes a directory or file address on the host. The query part is optional and is used to pass parameters to dynamic web pages. It can contain several parameters separated by the "&" symbol, with the name and value of each parameter separated by the "=" symbol. A complete URL with an authority part typically looks as follows: protocol://username:password@subdomain.domain.tld:port/directory/filename.suffix?parameter=value#fragment
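The parts listed above can be separated with Python's standard urllib.parse module. This is a minimal sketch; the example address and the helper name split_url are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

def split_url(url: str) -> dict:
    """Split a URL into the parts described above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,        # protocol, e.g. "http"
        "username": parts.username,    # optional user name
        "password": parts.password,    # optional password
        "host": parts.hostname,        # DNS host name or IP address
        "port": parts.port,            # None when the default port is used
        "path": parts.path,            # hierarchical path
        "query": dict(parse_qsl(parts.query, keep_blank_values=True)),
        "fragment": parts.fragment,    # part after "#"
    }

example = split_url(
    "http://user:secret@news.example.com:8080/dir/page.html?id=7&lang=zh#top"
)
```

Setting keep_blank_values=True keeps parameters with empty values, which do occur in real mobile portal links.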
Fig. 2 shows a flow chart of a process 200 for identifying a website from URL data according to an embodiment of the present invention. In step S210, the URL data of the user's Internet access is matched against a website library, in which each website is associated with a site label.
According to one embodiment of the present invention, the website library is predetermined. According to another embodiment, the website library is dynamically collected, updated and maintained. Unlike conventional user behavior analysis, the website library does not contain the URL addresses of websites but keywords of the domain name of each website. Preferably, the website library also contains the IP address of each website or a keyword of it. In other words, a website in the website library can be identified by a domain name keyword, an IP address or a keyword of an IP address. For example, the website library can contain the domain name keywords sina, qq, taobao and ko for the websites Sina, www.qq.com, Taobao and Kongzhong. Alternatively or additionally, the website library can contain one or more IP addresses of a website or keywords of them. The domain name keyword of a specific website and the one or more IP addresses of that website, or their keywords, are all associated with the site label of the website, and a mapping between them can be established through this site label. Optionally, the domain name keyword of a website can itself serve as its site label.
In one embodiment, the domain name part of the URL address accessed by the user is matched against the website library. If a domain name keyword, IP address or IP keyword is found in the website library that is identical to the domain name part of the URL address, or to a part of it, a matching website is considered found. In another embodiment, the domain name part of the URL address is parsed: prefixes such as www and suffixes such as .cn, .org and .com are removed to obtain the core domain name of the URL address. The core domain name is then matched against the website library. If a keyword identical to the core domain name of the URL address is found in the website library, a matching website is considered found.
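The second matching variant (strip prefix and suffix, then look up the core domain name) might be sketched as follows. The keywords sina, qq, taobao and ko come from the description; the label strings, the prefix and suffix sets and the function names are assumptions made for this sketch.

```python
from urllib.parse import urlsplit

# Hypothetical website library: domain name keyword -> site label.
SITE_LIBRARY = {"sina": "Sina", "qq": "QQ", "taobao": "Taobao", "ko": "Kongzhong"}

# Assumed prefix and suffix sets; a real library would be more complete.
PREFIXES = {"www", "wap", "m"}
SUFFIXES = {"com", "cn", "org", "net"}

def core_labels(url):
    """Return the host name labels left after removing prefixes and suffixes."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    while labels and labels[0] in PREFIXES:
        labels.pop(0)
    while labels and labels[-1] in SUFFIXES:
        labels.pop()
    return labels

def match_site(url):
    """Return the site label for a URL, or None if no website matches."""
    # Check the label closest to the suffix first (e.g. "qq" in news.qq.com).
    for label in reversed(core_labels(url)):
        if label in SITE_LIBRARY:
            return SITE_LIBRARY[label]
    return None
```

A URL for which match_site returns None corresponds to the no-match case, which the description handles by discarding the data or forwarding it for further processing.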
If a website matching the URL data of the user's Internet access is found in the website library, the process advances to step S220. If no matching website is found, the URL data can simply be discarded, or the process can advance to the optional step S230 for further processing.
In step S220, the URL data of the user's Internet access is marked with the corresponding site label, that is, the site label associated with the matching website found in the website library.
In step S230, the URL data for which no matching website was found can be sent to an early-warning platform as early-warning data. By analyzing the early-warning data, new websites can be added to the website library, enriching the website library that represents user behavior.
Process 200 then ends.
Fig. 3 shows a flow chart of a process 300 for identifying the topic classification from the URL data of the user's Internet access according to an embodiment of the present invention. Process 300 is based on matching the URL data against a classification tree. Process 300 can be performed only when the classification tree library contains a classification tree for the website visited by the user. If it does not, the classification can be identified in another way, for example by the content classification process described later with reference to Fig. 4.
Process 300 is described in detail below. In step S310, the URL data is matched against the classification tree of the corresponding website in the classification tree library, where each node of the classification tree is associated with a topic label. The classification tree library is a library of topic classifications of the websites that users may visit. According to one embodiment of the present invention, the classification tree library is predetermined. According to another embodiment, the classification tree library is dynamically collected, updated and maintained.
A topic label indicates a topic classification. Each node of a classification tree can be associated with a topic label according to its topic classification. The kinds of topic classification can be predetermined. For example, five categories are commonly used: music, e-books, video, software and games. Alternatively or additionally, other or further topic categories can be defined, such as shopping, community, information and so on. Optionally, the kinds and number of topic classifications can be dynamically updated and maintained.
The classification tree of each website can be organized according to the structure of that website. For the 3G portal, for example, the classification tree can take the 3G portal (3g.cn) as the top node (the root node), with news (news), sports (sports), entertainment (ent) and so on as child nodes of the top node (first-level nodes). The next level of nodes can be generated according to the structure of the website. For example, the entertainment node can have child nodes such as film, music and picture gallery, and so on down to the leaf nodes of the classification tree (that is, nodes without child nodes). As described above, the nodes of a classification tree, including the root node and every child node, can be represented by keywords in the corresponding URL address. Alternatively, the nodes can be represented by their complete URL addresses or in other ways (for example by IP address).
As another example, Fig. 6 shows the classification tree of the portal website Kongzhong. This classification tree takes Kongzhong (kong.net) as the top node (root node), with mobile phone games (ko.cn), pictures (stockpics), lottery (lottery), games (ko.cn) and so on as child nodes of the top node (first-level nodes). The second-level nodes include, for example, the game download node and the help node under the games node, and so on.
Each node of a classification tree (at least each leaf node) is usually assigned an associated topic label. The type and other attributes of each node can also be marked. For example, the news, sports, film and music nodes of the 3G portal classification tree described above can be assigned the information label, the information label, the video label and the music label respectively. Likewise, the mobile phone games node and the games node of the Kongzhong classification tree can be assigned the games label.
A node in a classification tree may also be assigned several topic labels. For example, the game download node in the Kongzhong classification tree can be assigned both the games label and the software label.
Optionally, a node in a classification tree may also be assigned multi-level topic labels. For example, a lower-level node can inherit the topic label of its parent node and additionally have a sub-label of its own. The help node under the games node of Kongzhong shown in Fig. 6, for example, can inherit the games label of the games node and also have its own game-help sub-label.
If a node matching the URL data of the user's Internet access is found in the classification tree library, the process advances to step S320. If no matching node is found, the URL data can simply be discarded, or the process can advance to the optional step S330 for further processing.
In step S320, the URL data of the user's Internet access is marked with the corresponding topic label, that is, the topic label associated with the matching node found in the classification tree.
In step S330, the URL data for which no matching node was found can be sent to the early-warning platform as early-warning data. By analyzing the early-warning data, new nodes can be added to the classification tree, enriching the classification tree library that represents user behavior.
Process 300 then ends.
Process 300 is illustrated below by two specific embodiments.
In one embodiment, the acquired URL data of the user's Internet access is, for example, http://news.3g.cn. In step S310, this URL data is matched against the classification tree of the 3G portal in the classification tree library, and the matching news (news) node is found. Because the news node is associated with the information label, the URL data is marked in step S320 with the corresponding topic label, namely the information label.
In another embodiment, the acquired URL data is, for example, http://ko.cn/game/help.wml. In step S310, this URL data is matched against the classification tree of Kongzhong in the classification tree library, and the matching node is found to be the help node under the games node. In step S320, the URL can therefore be marked with the games label and the game-help sub-label.
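The two examples above can be reproduced with a small sketch of classification tree matching. The nested-dictionary layout, the label strings and the function names are assumptions; the Kongzhong tree is also simplified here by rooting it directly at ko.cn instead of modelling the full tree of Fig. 6.

```python
from urllib.parse import urlsplit

# Hypothetical fragment of a classification tree library, keyed by root domain.
# Each node carries its topic labels and its child nodes, keyed by URL keyword.
TREE = {
    "3g.cn": {
        "labels": [],
        "children": {
            "news": {"labels": ["information"], "children": {}},
            "sports": {"labels": ["information"], "children": {}},
        },
    },
    "ko.cn": {
        "labels": ["games"],
        "children": {
            "game": {
                "labels": ["games"],
                "children": {
                    "help": {"labels": ["games", "game-help"], "children": {}},
                },
            },
        },
    },
}

def url_keywords(url, root):
    """Keywords below the root: subdomains (nearest first), then path segments."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    sub = host[: -len(root)].strip(".")
    keys = list(reversed(sub.split("."))) if sub else []
    for segment in parts.path.split("/"):
        if segment:
            keys.append(segment.split(".")[0])  # drop a file suffix such as .wml
    return keys

def match_topic(url):
    """Return the topic labels of the deepest matching node, or None."""
    host = urlsplit(url).hostname or ""
    for root, node in TREE.items():
        if host == root or host.endswith("." + root):
            for key in url_keywords(url, root):
                child = node["children"].get(key)
                if child is None:
                    break
                node = child
            return node["labels"]
    return None
```

Returning the labels of the deepest node reached gives the inherited label plus the sub-label for the help page, in line with the multi-level labels described above.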
According to embodiments of the present invention, classification tree matching makes it possible to dig deeper once the website visited by the user has been identified, and to analyze the user's preference for specific channels. For example, if user A reads the news on the 3G portal and on www.qq.com every day during a certain period (one month or longer), and the information label occurs more often than a certain threshold, it can be determined that user A belongs to the news user group.
Fig. 4 shows a flow chart of a process 400 for identifying the content classification from the URL data of the user's Internet access according to an embodiment of the present invention. Process 400 crawls the web page corresponding to the URL data and processes its text in order to classify the content, and then marks the URL data with a content label. Process 400 can be performed on its own or after the website has been identified. Preferably, process 400 is performed only for URL data whose website has been identified but has no corresponding classification tree in the classification tree library.
In step S410, the web page accessed by the user is fetched with a crawler according to the URL data of the user's Internet access. Preferably, the crawler is a distributed multi-node crawler that can cooperate across several servers to carry out the crawling task. By configuring crawlers, content can be crawled from designated websites (this is also called a node crawler); if a link accessed by the user does not appear in the crawler library, a crawler targeted at the user's behavior (a directed crawler) can be used to fetch it.
Preferably, a tree crawler can be used to crawl the latest content of selected channels of a single website at regular intervals. The rules of the tree crawler effectively prevent duplicate pages from being crawled and allow incremental data to be fetched automatically on a schedule.
Then, in step S420, the fetched page can be parsed according to predetermined parsing rules to determine its content classification, and a content label is marked accordingly. The parsing rules can be expressed, for example, in a form similar to regular expressions, or in any other suitable form. Preferably, the parsing rules can be dynamically updated and maintained.
A content label indicates a content classification. The kinds of content classification can be predetermined. Content classifications can be similar to topic classifications and are likewise commonly divided into five categories: music, e-books, video, software and games. Alternatively or additionally, other or further content categories can be defined, such as shopping, community, information and so on. Optionally, the kinds and number of content classifications can be dynamically updated and maintained. Where a topic label and a content label are identical, they need not be distinguished in the statistics. Preferably, the content classification can also be associated with user group characteristics, for example down to a specific song title, singer, book title and so on.
In step S420, the parsed objects include the title of the page and preferably also the content of the page. Preferably, content relevant to user group characteristics, such as a song title, a singer or an e-book title, can be extracted specifically.
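Rule-based parsing of a page title, as in step S420, might look like the sketch below. The two rules and the title formats they expect are invented for illustration; the description only states that rules can take a form similar to regular expressions.

```python
import re

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)

# Hypothetical parsing rules: (pattern applied to the page title, content label).
# Named groups capture details that may relate to user group characteristics.
RULES = [
    (re.compile(r"(?P<song>.+?) - (?P<singer>.+?) - MP3"), "music"),
    (re.compile(r"(?P<book>.+?) - read online"), "e-book"),
]

def classify_title(html):
    """Return (content label, extracted details) for a page, or None."""
    found = TITLE_RE.search(html)
    if not found:
        return None
    title = found.group(1).strip()
    for pattern, label in RULES:
        hit = pattern.search(title)
        if hit:
            return label, hit.groupdict()
    return None

page = "<html><head><title>Song A - Singer B - MP3 download</title></head></html>"
result = classify_title(page)
```

Because the rules live in a plain list, they can be replaced at run time, which fits the statement that parsing rules can be dynamically updated and maintained.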
In step S430, the fetched page content can be classified with any known web page text classification algorithm. First, the page content is segmented into words (for example, using a dictionary); the occurrences of these words are then compared with thresholds in a corpus. If a threshold is exceeded, the page is assigned to the corresponding category.
In one embodiment, the user accesses the following URL: http://sports.3g.cn/nba/NewsContent.aspx?sid=bb23c58aee93c24606d1949ff0e4e2cb&waped=2&gaid=T3BlcmE%3d&wid=&pz=&nid=315562
The content of this link, fetched in step S410, is shown in Fig. 7. First, the content is segmented into words and keywords are extracted, as marked by the red circles. The keywords are then compared with the corpus. The circled keywords are found to belong to sports news, so it can be determined that this web page is sports news, in other words that the user is reading sports news. Accordingly, the URL can be marked with the information label or, more finely, with a sports news label.
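The segment-then-threshold idea of step S430 can be sketched as follows. Forward maximum matching is used here as one common dictionary-based segmentation technique; the description does not name a specific algorithm, and the tiny corpus, the threshold and the function names are assumptions.

```python
# Hypothetical corpus: content category -> characteristic words.
CORPUS = {
    "sports news": {"湖人", "篮球", "比赛", "季后赛"},  # Lakers, basketball, match, playoffs
    "music": {"专辑", "歌手", "演唱会"},                # album, singer, concert
}

def segment(text, dictionary, max_len=8):
    """Dictionary-based forward maximum matching word segmentation."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word fits.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words

def classify(text, threshold=2):
    """Return the category whose words occur at least `threshold` times, or None."""
    dictionary = set().union(*CORPUS.values())
    words = segment(text, dictionary)
    best, best_hits = None, 0
    for category, vocabulary in CORPUS.items():
        hits = sum(1 for word in words if word in vocabulary)
        if hits >= threshold and hits > best_hits:
            best, best_hits = category, hits
    return best
```

For the sentence "湖人赢得篮球比赛" (the Lakers win the basketball match), three corpus words are found, so the text is assigned to the sports news category under the assumed threshold of 2.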
Preferably, step S430 is performed only for URL data whose content classification was not determined in step S420.
It should be understood that steps S420 and S430 are each optional. Process 400 can include only one of these two steps.
Once the content classification of the URL data of the user's Internet access has been determined in step S420 or step S430 and the content label has been marked, process 400 can end. If the content classification of the URL cannot be determined, the data can simply be discarded, or it can be sent to the early-warning platform as early-warning data for further processing.
Fig. 5 shows a flow chart of a process 500 for identifying the client from the URL data of the user's Internet access according to an embodiment of the present invention.
As shown in the figure, in step S510 the URL data of the user's Internet access is matched against a client identifier library. Through extensive testing and long-term analysis, the inventors have found that Internet access through a client carries identifiers that are specific to that client. These client-specific identifiers can be recorded and stored to form the client identifier library. According to one embodiment of the present invention, the client identifier library is predetermined. According to another embodiment, the client identifier library is dynamically collected, updated and maintained.
In step S520, if a matching client is found in the client identifier library, the URL data is marked with the corresponding client label. Otherwise, the user is assumed by default to be using a generic browser. Process 500 then ends.
The QQ client is used below as a concrete example. By testing and analyzing the QQ client, the network identifiers triggered by each user action can be listed in order, and a tree-shaped library can then be built, as shown in Fig. 8. If a series of acquired URL data of the user's Internet access matches any path in the tree shown in Fig. 8, it can be determined that the user has used the QQ client.
As another example, the inventors have found that when a user accesses a page with the UCWEB client, UCWEB first issues an HTTP POST request to http://uc.ucweb.com:80. This expression can therefore be recorded in the client identifier library as an identifier specific to the UCWEB client. If the acquired URL data of the user contains information that matches this expression, it can be determined that the user is using the UCWEB client.
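Client identification against an identifier library might be sketched as below. The UCWEB entry follows the description (an HTTP POST to uc.ucweb.com on port 80); the record layout, the (method, URL) request format and the function name are assumptions, and the QQ path tree of Fig. 8 is not modelled.

```python
from urllib.parse import urlsplit

# Hypothetical client identifier library.
CLIENT_SIGNATURES = [
    {"client": "UCWEB", "method": "POST", "host": "uc.ucweb.com", "port": 80},
]

def identify_client(requests):
    """Return the client label for a series of (HTTP method, URL) pairs."""
    for method, url in requests:
        parts = urlsplit(url)
        port = parts.port or 80  # assumes plain HTTP when no port is given
        for signature in CLIENT_SIGNATURES:
            if (
                method.upper() == signature["method"]
                and parts.hostname == signature["host"]
                and port == signature["port"]
            ):
                return signature["client"]
    # No signature matched: default to a generic browser, as in step S520.
    return "generic-browser"
```

For example, identify_client([("POST", "http://uc.ucweb.com:80"), ("GET", "http://news.3g.cn")]) yields the UCWEB label, while a session without the POST falls back to the generic browser label.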
By identify customer end, can determine which specific client end whether the user use and use carry out access to netwoks.In conjunction with the result of website identification, can statistical separate out user's which website that used client-access.In conjunction with subject classification identification, can statistical separate out user's which channel of which website that used client-access.
Should be appreciated that Fig. 2-identifying shown in Figure 5 only is exemplary, rather than restrictive.Those skilled in the art
It will be understood by those skilled in the art that Fig. 2-identifying shown in Figure 5 only is the optional example of the identifying that can comprise of the multistage identification in the process 120 shown in Figure 1.Shown in can comprising in the process 120 shown in Figure 1 at least two of identifying, can also comprise other unshowned identifying.
Fig. 9 shows the flow chart based on the analytical method 900 of surfing Internet with cell phone user behavior according to a preferred embodiment of the present invention.Method 900 can be regarded a specific implementation of method 100 as.Step S910 and step S930 are identical with step S110 and S130 among Fig. 1 respectively, do not repeat them here.Step S921-S922 is corresponding with step S120 among Fig. 1.In step S921, the url data and the website storehouse of user's online are mated, with the identification website.If identify website, mark site tags then.Then, advance to step S922.In step S922, the classification tree of described url data with the corresponding website of classification in the treebank mated, with identification subject classification and mark theme label correspondingly.
Figure 10 shows the flow chart based on the analytical method 1000 of surfing Internet with cell phone user behavior according to a further advantageous embodiment of the invention.Method 1000 can be regarded a specific implementation of method 100 as.Step S1010 and step S1030 are identical with step S110 and S130 among Fig. 1 respectively, do not repeat them here.Step S1021-S1024 is corresponding with step S120 among Fig. 1.In step S1021, the url data and the website storehouse of user's online are mated, with the identification website.If identify website, mark site tags then.Then, advance to step S1022.In step S1022, the classification tree of described url data with the corresponding website of classification in the treebank mated, with identification subject classification and mark theme label correspondingly.If in step S1022, it fails to match, then may be advanced to step S1023.In step S1023, the webpage of described url data correspondence is carried out the page grasp and text-processing, carrying out classifying content identification, thereby give described url data tag content label.Then, may be advanced to step S1024, described url data and client identification storehouse are mated, with identify customer end and mark client tag correspondingly.Should be appreciated that the order of step S1024 is not fixed, it can occur in after the step S1010 and any time between step S1030.
Those skilled in the art will understand that the analysis method of the present invention is not limited to the specific embodiments shown in Figs. 9-10. For example, in a variant of the method shown in Fig. 10, step S1022 and/or step S1024 may be omitted.
Fig. 11 illustrates an analyzer 1100 based on mobile Internet user behavior according to an embodiment of the present invention. Analyzer 1100 can be used to carry out the analysis method based on mobile Internet user behavior according to the present invention, for example method 100. As shown in the figure, analyzer 1100 comprises an acquisition device 1110, a recognition device 1120 and a statistical analysis device 1130. The acquisition device 1110 is used to obtain the uniform resource locator (URL) data of the user's Internet access. The recognition device 1120 is used to perform multi-level identification on the URL data and mark tags. The statistical analysis device 1130 is used to count the marked tags in order to analyze the features of the user's behavior.
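The counting performed by the statistical analysis device can be sketched as below. The record layout (a user identifier paired with the tags marked on one visited URL) and the use of the most frequent tags as a behavior feature are assumptions made for illustration; the specification does not fix either.

```python
from collections import Counter, defaultdict

def count_tags(records):
    """Count tags per user.

    records: iterable of (user_id, tags) pairs, one pair per labeled URL visit.
    """
    per_user = defaultdict(Counter)
    for user_id, tags in records:
        per_user[user_id].update(tags)
    return per_user

def top_tags(counter, n=3):
    """Return the n most frequent tags as a simple behavior feature."""
    return [tag for tag, _ in counter.most_common(n)]
```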
According to a preferred embodiment of the present invention, the recognition device 1120 may comprise at least two of the following: a website recognition device, a topic classification device, a content classification device and a client recognition device. The website recognition device is used to perform website identification according to the URL data and mark a site tag. The topic classification device is used to perform topic classification identification according to the URL data and mark a topic tag. The content classification device is used to perform content classification identification according to the URL data and mark a content tag. The client recognition device is used to identify the client according to the URL data and mark a client tag.
In one embodiment of the present invention, the website recognition device comprises a device for matching the URL data against the classification tree of the corresponding website in a classification tree library, wherein each node of the classification tree is associated with a topic tag. The website recognition device further comprises a device for marking the URL data with the corresponding topic tag if a matching node is found in the classification tree library.
In another embodiment of the present invention, the topic classification device comprises a device for matching the URL data against the classification tree of the corresponding website in a classification tree library, wherein each node of the classification tree is associated with a topic tag. The topic classification device further comprises a device for marking the URL data with the corresponding topic tag if a matching node is found in the classification tree library. In a preferred embodiment of the present invention, the classification tree is a website tree structure corresponding to the structure of the website.
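One way to picture a classification tree whose hierarchy mirrors the website structure is a tree keyed by URL path segments. The following is a minimal sketch under that assumption; the `Node` class, the example tree and the rule of returning the deepest matched node are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

@dataclass
class Node:
    tag: str                                       # topic tag associated with this node
    children: dict = field(default_factory=dict)   # path segment -> child Node

# Hypothetical classification tree for one website (assumed for illustration).
MUSIC_TREE = Node("topic:music", {
    "pop": Node("topic:music/pop"),
    "rock": Node("topic:music/rock", {"live": Node("topic:music/rock/live")}),
})

def match_tree(url, root):
    """Return the tag of the deepest node matched by the URL path, or None."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    node, matched = root, None
    for segment in segments:
        node = node.children.get(segment)
        if node is None:
            break
        matched = node.tag
    return matched
```

In this sketch a URL that matches no child of the root counts as a failed match and returns None, which is the case in which a fallback such as content classification would apply.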
In another embodiment of the present invention, the content classification device comprises a page crawling and analysis device, which crawls the web page corresponding to the URL data and processes its text in order to perform content classification, thereby marking the URL data with a content tag. In a preferred embodiment of the present invention, the page crawling and analysis device further comprises a crawler for fetching the page corresponding to the URL data visited by the user. The page crawling and analysis device may also comprise at least one of the following: a parser and/or a page text classifier. The parser is used to parse the fetched page according to predetermined parsing rules in order to determine the classification of the web page. The page text classifier is used to classify the page by a web page text classification algorithm applied to the text content of the fetched page.
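The text-processing half of this device can be sketched as follows. The sketch assumes the crawler has already fetched the page and supplies its HTML as a string, and it substitutes simple keyword counting for the web page text classification algorithm, which the specification does not name. The keyword lists and tag names are hypothetical.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

# Hypothetical keyword lists standing in for a trained text classifier.
KEYWORDS = {
    "content:music": {"album", "song", "singer"},
    "content:stocks": {"stock", "shares", "index"},
}

def classify_page(html):
    """Return the content tag with the most keyword hits, or None if there are none."""
    extractor = TextExtractor()
    extractor.feed(html)
    words = " ".join(extractor.parts).lower().split()
    scores = {tag: sum(w in kws for w in words) for tag, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```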
In another embodiment of the present invention, the client recognition device comprises a device for matching the URL data against a client identification library, and a device for marking the URL data with the corresponding client tag if a matching client is found.
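The specification does not say what the client identification library contains. As one possible reading, the sketch below assumes it holds URL patterns that are characteristic of particular client applications; the patterns, host names and tag names are hypothetical.

```python
import re

# Hypothetical client identification library: URL pattern -> client tag.
CLIENT_LIBRARY = [
    (re.compile(r"^https?://api\.example-reader\.com/"), "client:ExampleReader"),
    (re.compile(r"[?&]app=examplechat(&|$)"), "client:ExampleChat"),
]

def identify_client(url):
    """Return the tag of the first client whose pattern matches, or None."""
    for pattern, tag in CLIENT_LIBRARY:
        if pattern.search(url):
            return tag
    return None
```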
An analyzer according to another embodiment of the present invention further comprises an aggregation device. The aggregation device is used to aggregate similar users according to the analyzed features of user behavior.
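A very simple form of aggregation groups users by their most frequent topic tag. This is a sketch under assumptions: the specification does not define the similarity measure, and the "topic:" tag prefix and the alphabetical tie-break are illustrative choices.

```python
from collections import Counter, defaultdict

def dominant_tag(counter, prefix="topic:"):
    """Return the user's most frequent tag with the given prefix, or None."""
    candidates = {t: c for t, c in counter.items() if t.startswith(prefix)}
    if not candidates:
        return None
    # Iterate in sorted order so that ties resolve to the alphabetically first tag.
    return max(sorted(candidates), key=candidates.get)

def group_users(per_user_counts):
    """Group user ids by dominant topic tag; users without a topic tag are skipped."""
    groups = defaultdict(list)
    for user_id, counter in per_user_counts.items():
        tag = dominant_tag(counter)
        if tag is not None:
            groups[tag].append(user_id)
    return dict(groups)
```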
According to embodiments of the present invention, in-depth analysis of wireless Internet users based on their mobile Internet usage can provide operators with marketing strategy support, so as to achieve precise marketing and promote leapfrog development of related services. The analysis method according to an embodiment of the present invention has at least the following advantages:
1). Identifying users who favor mobile Internet access.
2). Identifying and classifying user behavior. For example, it can be determined which specific number (user) visited which website in which time period, down to a specific e-book or piece of music, yielding each user's preferences and other useful data.
3). Mining the classified data and aggregating users with similar preferences. For example, users are divided into a music preference group, a game preference group, a stock trading preference group, and so on. The users in the music preference group can in turn be mined further, for example to identify the group of users who like Zhou Jielun (Jay Chou).
4). Performing service association analysis based on user behavior, discovering potential users and expanding the user base. For example, related music, videos, films and other content can be recommended to the group of users who like Zhou Jielun (Jay Chou).
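The drill-down and recommendation described in items 3) and 4) can be sketched as a lookup from a tag to related content. The catalogue, the artist tag and the threshold are all hypothetical; the specification does not describe how the association between behavior features and services is stored.

```python
# Hypothetical catalogue associating a tag with related content.
RELATED_CONTENT = {
    "topic:music/artist-x": ["artist-x new album", "artist-x concert video"],
}

def users_with_tag(per_user_counts, tag, min_count=1):
    """Return the sorted ids of users who carry the tag at least min_count times."""
    return sorted(u for u, c in per_user_counts.items() if c.get(tag, 0) >= min_count)

def recommend(per_user_counts, tag):
    """Map each user in the tag's group to the content associated with that tag."""
    items = RELATED_CONTENT.get(tag, [])
    return {u: list(items) for u in users_with_tag(per_user_counts, tag)}
```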
From the above description of specific embodiments, those skilled in the art will understand that the above devices and methods may be implemented using computer-executable instructions and/or processor control code, for example code provided on a carrier medium such as a magnetic disk, CD or DVD-ROM, in a programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The devices and their units in this embodiment may be implemented by hardware circuits such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices. They may also be implemented by software executed by various types of processors, or by a combination of the above hardware circuits and software.
Although the analysis method and analyzer based on mobile Internet user behavior according to the present invention have been described in detail above in conjunction with specific embodiments, the present invention is not limited thereto. Those of ordinary skill in the art may, under the teaching of the specification, make various changes, substitutions and modifications to the present invention without departing from its spirit and scope. It should be understood that all such changes, substitutions and modifications still fall within the protection scope of the present invention. The protection scope of the present invention is defined by the claims.

Claims (18)

1. An analysis method based on mobile Internet user behavior, comprising:
obtaining uniform resource locator (URL) data of a user's Internet access,
performing multi-level identification on the URL data and marking tags, and
counting the marked tags to analyze features of the user's behavior.
2. The method according to claim 1, wherein the identification step comprises at least two of the following:
performing website identification according to the URL data, and marking a site tag;
performing topic classification identification according to the URL data, and marking a topic tag;
performing content classification identification according to the URL data, and marking a content tag; and
identifying a client according to the URL data, and marking a client tag.
3. The method according to claim 2, wherein the website identification step comprises:
matching the URL data against a website library, wherein the websites in the website library are associated with site tags, and
if a matching website is found in the website library, marking the URL data with the corresponding site tag.
4. The method according to claim 2, wherein the topic classification identification step comprises:
matching the URL data against the classification tree of the corresponding website in a classification tree library, wherein each node of the classification tree is associated with a topic tag;
if a matching node is found in the classification tree library, marking the URL data with the corresponding topic tag.
5. The method according to claim 4, wherein the hierarchical structure of the classification tree corresponds to the structure of the website.
6. The method according to claim 2, wherein the content classification identification step comprises:
crawling the web page corresponding to the URL data and processing its text in order to perform content classification, thereby marking the URL data with a content tag.
7. The method according to claim 6, wherein the crawling and text processing comprise:
using a crawler to fetch the page corresponding to the URL data visited by the user; and
at least one of the following steps:
parsing the fetched page according to predetermined parsing rules to determine the content classification of the web page;
performing content classification on the page by a web page text classification algorithm applied to the text content of the fetched page.
8. The method according to claim 2, wherein identifying the client comprises:
matching the URL data against a client identification library, and
if a matching client is found, marking the URL data with the corresponding client tag.
9. The method according to claim 1, further comprising:
aggregating similar users according to the analyzed features of user behavior.
10. An analyzer based on mobile Internet user behavior, comprising:
an acquisition device for obtaining uniform resource locator (URL) data of a user's Internet access,
a recognition device for performing multi-level identification on the URL data and marking tags, and
a statistical analysis device for counting the marked tags to analyze features of the user's behavior.
11. The analyzer according to claim 10, wherein the recognition device comprises at least two of the following:
a website recognition device for performing website identification according to the URL data and marking a site tag;
a topic classification device for performing topic classification identification according to the URL data and marking a topic tag;
a content classification device for performing content classification identification according to the URL data and marking a content tag; and
a client recognition device for identifying a client according to the URL data and marking a client tag.
12. The analyzer according to claim 11, wherein the website recognition device further comprises:
a device for matching the URL data against the classification tree of the corresponding website in a classification tree library, wherein each node of the classification tree is associated with a topic tag;
a device for marking the URL data with the corresponding topic tag if a matching node is found in the classification tree library.
13. The analyzer according to claim 11, wherein the topic classification device further comprises:
a device for matching the URL data against the classification tree of the corresponding website in a classification tree library, wherein each node of the classification tree is associated with a topic tag;
a device for marking the URL data with the corresponding topic tag if a matching node is found in the classification tree library.
14. The analyzer according to claim 13, wherein the classification tree is a website tree structure corresponding to the structure of the website.
15. The analyzer according to claim 11, wherein the content classification device further comprises:
a page crawling and analysis device for crawling the web page corresponding to the URL data and processing its text in order to perform content classification, thereby marking the URL data with a content tag.
16. The analyzer according to claim 15, wherein the page crawling and analysis device further comprises:
a crawler for fetching the page corresponding to the URL data visited by the user; and
at least one of the following devices:
a parser for parsing the fetched page according to predetermined parsing rules to determine the classification of the web page;
a page text classifier for classifying the page by a web page text classification algorithm applied to the text content of the fetched page.
17. The analyzer according to claim 11, wherein the client recognition device comprises:
a device for matching the URL data against a client identification library, and
a device for marking the URL data with the corresponding client tag if a matching client is found.
18. The analyzer according to claim 10, further comprising:
an aggregation device for aggregating similar users according to the analyzed features of user behavior.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011100319009A CN102143224A (en) 2011-01-25 2011-01-25 Mobile phone Internet accessing-based user behavior analysis method and device

Publications (1)

Publication Number Publication Date
CN102143224A true CN102143224A (en) 2011-08-03

Family

ID=44410439

Country Status (1)

Country Link
CN (1) CN102143224A (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102547663A (en) * 2012-03-09 2012-07-04 北京神州数码思特奇信息技术股份有限公司 Method for optimizing wireless application protocol based on business matrix
CN102760163A (en) * 2012-06-12 2012-10-31 奇智软件(北京)有限公司 Personalized recommendation method and device of characteristic information
CN103188347A (en) * 2013-03-15 2013-07-03 亿赞普(北京)科技有限公司 Internet event analyzing method and internet event analyzing device
CN103780414A (en) * 2012-10-22 2014-05-07 中国移动通信集团公司 Website access summarizing method and apparatus
CN103905489A (en) * 2012-12-27 2014-07-02 腾讯科技(深圳)有限公司 Network information service processing method and system
CN104063453A (en) * 2014-06-24 2014-09-24 晶赞广告(上海)有限公司 Method for extracting key words of marketing based on URL (uniform resource locator) analysis
CN104732425A (en) * 2015-03-24 2015-06-24 浪潮集团有限公司 E-commerce platform customer behavior analytical method based on big data
CN104866474A (en) * 2014-02-20 2015-08-26 阿里巴巴集团控股有限公司 Personalized data searching method and device
CN105302844A (en) * 2014-08-01 2016-02-03 腾讯科技(深圳)有限公司 Internet monitoring method, device and system
CN105429976A (en) * 2015-11-13 2016-03-23 厦门安胜网络科技有限公司 Net citizen behavior analysis method and system based on cell phone number
CN105511933A (en) * 2015-12-03 2016-04-20 深圳市创维软件有限公司 Compiling method of source code and related equipment
CN105610665A (en) * 2015-07-29 2016-05-25 哈尔滨工业大学(威海) VPN protocol for mobile devices
CN105824884A (en) * 2016-03-10 2016-08-03 海信集团有限公司 User internet surfing information processing method and device
CN105893581A (en) * 2016-04-03 2016-08-24 北京设集约科技有限公司 Method and system for effectively sharing and collecting
CN106230809A (en) * 2016-07-27 2016-12-14 南京快页数码科技有限公司 A kind of mobile Internet public sentiment monitoring method based on URL and system
CN106339422A (en) * 2016-08-15 2017-01-18 南方科技大学 Method and device for determining user behavior characteristics through webpage addresses
CN106446115A (en) * 2016-09-18 2017-02-22 成都九鼎瑞信科技股份有限公司 Mobile Internet user classification method and device
CN106878438A (en) * 2017-03-03 2017-06-20 久远谦长(北京)技术服务有限公司 The method and system of user behavior analysis under a kind of https environment
CN106919625A (en) * 2015-12-28 2017-07-04 中国移动通信集团公司 A kind of internet customer attribute recognition methods and device
CN107368598A (en) * 2017-07-26 2017-11-21 北京锐安科技有限公司 The acquisition method and device of user data
CN107818145A (en) * 2017-10-18 2018-03-20 南京邮数通信息科技有限公司 A kind of user behavior tag along sort extracting method based on dynamic reptile
CN107870986A (en) * 2017-10-13 2018-04-03 平安科技(深圳)有限公司 User behavior analysis method, application server and computer-readable recording medium based on reptile data
CN108062337A (en) * 2016-11-09 2018-05-22 北京国双科技有限公司 A kind of method and device to label to reptile seed

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070239716A1 (en) * 2006-04-07 2007-10-11 Google Inc. Generating Specialized Search Results in Response to Patterned Queries
CN101847160A (en) * 2010-05-19 2010-09-29 深圳市五巨科技有限公司 Method and device for pushing personalized pages to mobile terminal
CN101923545A (en) * 2009-06-15 2010-12-22 北京百分通联传媒技术有限公司 Method for recommending personalized information

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20110803