CN108804431A - Keyword effect analysis method based on big data - Google Patents

Keyword effect analysis method based on big data

Info

Publication number
CN108804431A
Authority
CN
China
Prior art keywords
keyword
path
browsing
aggregators
effect analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710281439.XA
Other languages
Chinese (zh)
Inventor
林正春
梁文庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Original Mdt Infotech Ltd
Original Assignee
Guangdong Original Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-04-26
Filing date
2017-04-26
Publication date
2018-11-13
Application filed by Guangdong Original Mdt Infotech Ltd
Priority to CN201710281439.XA
Publication of CN108804431A
Current legal status: Pending

Links

Abstract

The invention discloses a keyword effect analysis method based on big data, comprising the following steps: A. record each keyword occurrence, together with the results retrieved using that keyword; B. group the different browse paths that belong to the same keyword, merge browse paths whose similarity exceeds a first threshold, and compute the browsing time and browsing hit rate of each distinct browse path; C. group the different browse paths that belong to the same retrieval result, merge browse paths whose similarity exceeds the first threshold, and compute the browsing time and browsing hit rate of each distinct browse path; D. analyze the statistics obtained in steps B and C to obtain the keyword effect analysis result. The invention overcomes the shortcomings of the prior art and increases the speed of data analysis.

Description

Keyword effect analysis method based on big data
Technical field
The present invention relates to the technical field of big data analysis, and in particular to a keyword effect analysis method based on big data.
Background
Search engines are online tools used by a large number of Internet users. The search results a search engine displays play a crucial role in increasing a website's page views. Big data technology provides comprehensive information and highly valid results, so using it to analyze how effectively search keywords drive page views to a target website is a common approach. However, existing analysis methods compute statistics directly over the entire mass of data. The computational load is large, and the analysis therefore has poor real-time performance.
Summary of the invention
The technical problem to be solved by the present invention is to provide a keyword effect analysis method based on big data that overcomes the shortcomings of the prior art and increases the speed of data analysis.
To solve the above technical problem, the present invention adopts the following technical solution.
A keyword effect analysis method based on big data, comprising the following steps:
A. Record each keyword occurrence, together with the results retrieved using that keyword;
B. Group the different browse paths that belong to the same keyword, merge browse paths whose similarity exceeds a first threshold, and compute the browsing time and browsing hit rate of each distinct browse path;
C. Group the different browse paths that belong to the same retrieval result, merge browse paths whose similarity exceeds the first threshold, and compute the browsing time and browsing hit rate of each distinct browse path;
D. Analyze the statistics obtained in steps B and C to obtain the keyword effect analysis result.
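Steps B and C share the same group-and-merge pattern. Only the grouping key differs: the keyword in step B and the retrieval result in step C. The method defines neither the path similarity measure nor the browsing hit rate, so the minimal Python sketch below rests on three assumptions:

- path similarity is the Jaccard similarity over visited pages;
- paths are merged greedily into the first sufficiently similar representative path;
- hit rate is the share of a group's sessions that fall into each merged path.

`Session`, `MergedPath`, `merge_paths` and `first_threshold` are hypothetical names used for illustration only.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Session:
    key: str                  # keyword (step B) or retrieval result (step C)
    path: tuple[str, ...]     # pages visited, in order
    dwell_seconds: float      # browsing time of this session


@dataclass
class MergedPath:
    representative: tuple[str, ...]
    sessions: int = 0
    total_seconds: float = 0.0


def jaccard(a, b) -> float:
    """Assumed similarity measure: overlap of the visited page sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def merge_paths(sessions, first_threshold):
    """Group sessions by key, merge similar paths, return per-path statistics.

    Returns {key: [(representative_path, browsing_time, hit_rate), ...]}.
    """
    groups = defaultdict(list)
    for s in sessions:
        groups[s.key].append(s)

    result = {}
    for key, group in groups.items():
        merged: list[MergedPath] = []
        for s in group:
            # Greedy merge: join the first path whose similarity exceeds the threshold.
            target = next(
                (m for m in merged
                 if jaccard(m.representative, s.path) > first_threshold),
                None,
            )
            if target is None:
                target = MergedPath(s.path)
                merged.append(target)
            target.sessions += 1
            target.total_seconds += s.dwell_seconds
        result[key] = [
            (m.representative, m.total_seconds, m.sessions / len(group))
            for m in merged
        ]
    return result
```

For example, with `first_threshold=0.4` the paths home→list→item1 and home→list→item2 (Jaccard similarity 0.5) merge into one path. A session that only visits a blog page stays separate.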
Preferably, in step A, taking each keyword in turn as the reference, the remaining keywords are divided into an associated group and a non-associated group according to the similarity of their retrieval results.
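Step A needs two pieces: a log of the results each keyword retrieved, and, for each reference keyword, a split of the other keywords by result similarity. No similarity measure or threshold is given, so this Python sketch assumes Jaccard similarity over result identifiers and a hypothetical `assoc_threshold` parameter. `record_search` and `partition_keywords` are illustrative names.

```python
from collections import defaultdict


def record_search(log, keyword, results):
    """Step A: record a keyword occurrence and the results it retrieved."""
    log[keyword].update(results)


def jaccard(a, b) -> float:
    """Assumed similarity of two result sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def partition_keywords(log, base, assoc_threshold):
    """Split all other keywords into (associated, non_associated) relative to `base`."""
    base_results = log.get(base, set())
    associated, non_associated = [], []
    for kw, results in log.items():
        if kw == base:
            continue
        if jaccard(base_results, results) >= assoc_threshold:
            associated.append(kw)
        else:
            non_associated.append(kw)
    return sorted(associated), sorted(non_associated)
```

Using `defaultdict(set)` for the log lets `record_search` accumulate results across repeated searches for the same keyword.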
Preferably, in step B, the browse paths of the associated group of the keyword being processed are merged with the browse paths of that keyword. Several aggregators are set on each browse path, and each aggregator is given a session permission for merging with aggregators on other browse paths. When data passes through an aggregator and satisfies that aggregator's session permission, a temporary data mapping is established for the data at this aggregator, mapping the data to the corresponding aggregator. The aggregator that receives the data stores the mapping path and the mapping result.
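The form of the session permission and of the temporary data mapping is left open. The Python sketch below follows one hypothetical reading:

- the permission is the set of aggregator IDs on other browse paths that a node may merge with;
- when a record passes through a node toward a permitted peer, a temporary mapping is built;
- the receiving aggregator stores the mapping path together with the mapped record.

`AggregatorNode`, `permitted_peers` and `pass_through` are illustrative names, not part of the described method.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AggregatorNode:
    """Hypothetical aggregator placed on a browse path."""
    node_id: str
    # Assumed form of the session permission: IDs of aggregators on other
    # browse paths that this node is allowed to merge with.
    permitted_peers: set[str] = field(default_factory=set)
    # What a receiving aggregator preserves: (mapping path, mapping result).
    stored: list[tuple[tuple[str, str], dict]] = field(default_factory=list)

    def pass_through(self, record: dict, peer: AggregatorNode) -> bool:
        """Map `record` to `peer` if this node's session permission allows it."""
        if peer.node_id not in self.permitted_peers:
            return False  # permission not met: the record stays on its own path
        # Temporary data mapping, created only for this record.
        mapping = {"from": self.node_id, "to": peer.node_id, "record": record}
        # The receiving aggregator keeps the mapping path and the mapped result.
        peer.stored.append(((mapping["from"], mapping["to"]), mapping["record"]))
        return True
```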
Preferably, in step C, the entropy of each merged browse path is calculated, and browse paths whose entropy exceeds a second threshold are deleted.
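The distribution over which the entropy is computed is not stated. This Python sketch assumes Shannon entropy, in bits, of the page-visit frequencies of the sessions merged into each path. Under that assumption, a path whose visits scatter across many pages scores high and is discarded. `second_threshold` is a hypothetical parameter name.

```python
import math
from collections import Counter


def path_entropy(page_visits):
    """Shannon entropy (bits) of the page-visit distribution of one merged path."""
    counts = Counter(page_visits)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def filter_by_entropy(merged_paths, second_threshold):
    """Keep only paths whose entropy does not exceed the second threshold."""
    return {
        path_id: visits
        for path_id, visits in merged_paths.items()
        if path_entropy(visits) <= second_threshold
    }
```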
Preferably, in step C, the hinged nodes in the retained browse paths are traversed to obtain the characteristic function F(x, y) of each hinged node. Here x is a link pointing into the hinged node from outside, and y is a link pointing from the hinged node to the outside. The characteristic function F(x, y) is then labeled with the browsing time and browsing hit rate obtained from the statistics.
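The term "hinged node" is not defined. One reading is an articulation point (cut vertex) of the browse graph: a page whose removal disconnects the paths that run through it. Under that assumption, the Python sketch below does three things:

- finds such nodes with Tarjan's depth-first search on the undirected view of the graph;
- collects x (pages linking into the node) and y (pages it links out to);
- labels the resulting record with browsing time and hit rate.

Representing F(x, y) as a dictionary, and the names `articulation_points` and `characterize`, are illustrative assumptions.

```python
from collections import defaultdict


def articulation_points(edges):
    """Cut vertices of the undirected view of a directed browse graph (Tarjan)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    disc, low, points = {}, {}, set()
    counter = 0

    def dfs(u, parent):
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        children = 0
        for w in adj[u]:
            if w not in disc:
                children += 1
                dfs(w, u)
                low[u] = min(low[u], low[w])
                # A non-root u is a cut vertex if w's subtree cannot reach above u.
                if parent is not None and low[w] >= disc[u]:
                    points.add(u)
            elif w != parent:
                low[u] = min(low[u], disc[w])
        # The DFS root is a cut vertex if it has more than one child subtree.
        if parent is None and children > 1:
            points.add(u)

    for node in list(adj):
        if node not in disc:
            dfs(node, None)
    return points


def characterize(edges, stats):
    """Build a labeled F(x, y) record for every hinged node.

    `stats` maps node -> (browse_time, hit_rate) from steps B and C.
    """
    features = {}
    for h in articulation_points(edges):
        x = sorted({u for u, v in edges if v == h and u != h})  # links into h
        y = sorted({v for u, v in edges if u == h and v != h})  # links out of h
        browse_time, hit_rate = stats.get(h, (0.0, 0.0))
        features[h] = {"x": x, "y": y, "browse_time": browse_time, "hit_rate": hit_rate}
    return features
```

Because the search is recursive, very large browse graphs would need an iterative version or a raised recursion limit.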
Preferably, the browsing times and browsing hit rates obtained in steps B and C are normalized and then combined by weighted summation. The result is proportional to the keyword effect.
The beneficial effects of the above technical solution are as follows. The present invention analyzes browse paths bidirectionally, which speeds up data processing. In the forward analysis, merging browse paths effectively reduces repeated computation. In the reverse analysis, feature analysis of the hinged nodes effectively removes invalid data and improves processing efficiency. The big data processing method of the present invention therefore avoids the slow processing that large data volumes otherwise cause in big data processing.
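Neither the normalization scheme nor the weights are named. This Python sketch assumes min-max normalization of each metric across keywords and equal weights. The weight parameters `w_time` and `w_hit` are hypothetical. Under these assumptions, a higher score indicates a more effective keyword.

```python
def min_max(values):
    """Scale values to [0, 1]; constant inputs map to 0."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def keyword_effect_scores(stats, w_time=0.5, w_hit=0.5):
    """stats: {keyword: (browse_time, hit_rate)} -> {keyword: score}.

    Weights are hypothetical; the score is meant to rise with keyword effect.
    """
    keywords = list(stats)
    times = min_max([stats[k][0] for k in keywords])
    hits = min_max([stats[k][1] for k in keywords])
    return {k: w_time * t + w_hit * h for k, t, h in zip(keywords, times, hits)}
```

Min-max scaling puts browsing time (seconds) and hit rate (a fraction) on a common [0, 1] range, so the weights express relative importance rather than compensating for units.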
Detailed description of the embodiments
A specific embodiment of the present invention comprises the following steps:
A. Record each keyword occurrence, together with the results retrieved using that keyword;
B. Group the different browse paths that belong to the same keyword, merge browse paths whose similarity exceeds a first threshold, and compute the browsing time and browsing hit rate of each distinct browse path;
C. Group the different browse paths that belong to the same retrieval result, merge browse paths whose similarity exceeds the first threshold, and compute the browsing time and browsing hit rate of each distinct browse path;
D. Analyze the statistics obtained in steps B and C to obtain the keyword effect analysis result.
In step A, taking each keyword in turn as the reference, the remaining keywords are divided into an associated group and a non-associated group according to the similarity of their retrieval results.
In step B, the browse paths of the associated group of the keyword being processed are merged with the browse paths of that keyword. Several aggregators are set on each browse path, and each aggregator is given a session permission for merging with aggregators on other browse paths. When data passes through an aggregator and satisfies that aggregator's session permission, a temporary data mapping is established for the data at this aggregator, mapping the data to the corresponding aggregator. The aggregator that receives the data stores the mapping path and the mapping result.
In step C, the entropy of each merged browse path is calculated, and browse paths whose entropy exceeds a second threshold are deleted.
In step C, the hinged nodes in the retained browse paths are traversed to obtain the characteristic function F(x, y) of each hinged node. Here x is a link pointing into the hinged node from outside, and y is a link pointing from the hinged node to the outside. The characteristic function F(x, y) is labeled with the browsing time and browsing hit rate obtained from the statistics.
The browsing times and browsing hit rates obtained in steps B and C are normalized and then combined by weighted summation. The result is proportional to the keyword effect.
When an aggregator coincides with a hinged node, x and y are replaced with the corresponding temporary data mappings to obtain a new characteristic function F'. F' is then used to correct F, which improves the consistency between the forward analysis and the reverse analysis.
The foregoing description merely presents implementable technical solutions of the present invention and does not constitute the sole limitation on the technical solution itself.
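How F' corrects F is not specified. In the hypothetical interpretation sketched below in Python:

- F is represented numerically by four values: the distinct in-link count (from x), the distinct out-link count (from y), the browsing time, and the hit rate;
- F' recomputes the two link counts from the temporary data mappings that pass through the coinciding node;
- the corrected function is a convex blend of F and F', controlled by a hypothetical coefficient `alpha`.

The mapping format (dicts with `"from"` and `"to"` keys) and the function names are assumptions.

```python
def characteristic(x_links, y_links, browse_time, hit_rate):
    """Hypothetical numeric form of F(x, y) for one hinged node."""
    return {
        "in_links": len(set(x_links)),
        "out_links": len(set(y_links)),
        "browse_time": browse_time,
        "hit_rate": hit_rate,
    }


def corrected_characteristic(f, mappings, node_id, alpha=0.5):
    """Blend F with F', where F' takes x and y from the temporary data mappings.

    `mappings` holds dicts with "from" and "to" aggregator IDs (assumed format).
    """
    x_mapped = [m["from"] for m in mappings if m["to"] == node_id]
    y_mapped = [m["to"] for m in mappings if m["from"] == node_id]
    f_prime = characteristic(x_mapped, y_mapped, f["browse_time"], f["hit_rate"])
    # Convex combination: alpha = 0 keeps F unchanged, alpha = 1 replaces it with F'.
    return {k: (1 - alpha) * f[k] + alpha * f_prime[k] for k in f}
```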

Claims (6)

1. A keyword effect analysis method based on big data, characterized by comprising the following steps:
A. recording each keyword occurrence, together with the results retrieved using that keyword;
B. grouping the different browse paths that belong to the same keyword, merging browse paths whose similarity exceeds a first threshold, and computing the browsing time and browsing hit rate of each distinct browse path;
C. grouping the different browse paths that belong to the same retrieval result, merging browse paths whose similarity exceeds the first threshold, and computing the browsing time and browsing hit rate of each distinct browse path;
D. analyzing the statistics obtained in steps B and C to obtain a keyword effect analysis result.
2. The keyword effect analysis method based on big data according to claim 1, characterized in that: in step A, taking each keyword in turn as the reference, the remaining keywords are divided into an associated group and a non-associated group according to the similarity of their retrieval results.
3. The keyword effect analysis method based on big data according to claim 2, characterized in that: in step B, the browse paths of the associated group of the keyword being processed are merged with the browse paths of that keyword; several aggregators are set on each browse path, each aggregator being given a session permission for merging with aggregators on other browse paths; when data passes through an aggregator and satisfies that aggregator's session permission, a temporary data mapping is established for the data at this aggregator, mapping the data to the corresponding aggregator, and the aggregator that receives the data stores the mapping path and the mapping result.
4. The keyword effect analysis method based on big data according to claim 1, characterized in that: in step C, the entropy of each merged browse path is calculated, and browse paths whose entropy exceeds a second threshold are deleted.
5. The keyword effect analysis method based on big data according to claim 4, characterized in that: in step C, the hinged nodes in the retained browse paths are traversed to obtain the characteristic function F(x, y) of each hinged node, where x is a link pointing into the hinged node from outside and y is a link pointing from the hinged node to the outside; the characteristic function F(x, y) is labeled with the browsing time and browsing hit rate obtained from the statistics.
6. The keyword effect analysis method based on big data according to claim 1, characterized in that: the browsing times and browsing hit rates obtained in steps B and C are normalized and then combined by weighted summation, the result being proportional to the keyword effect.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710281439.XA CN108804431A (en) 2017-04-26 2017-04-26 Keyword effect analysis method based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710281439.XA CN108804431A (en) 2017-04-26 2017-04-26 Keyword effect analysis method based on big data

Publications (1)

Publication Number Publication Date
CN108804431A 2018-11-13

Family

ID=64068882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710281439.XA Pending CN108804431A (en) 2017-04-26 2017-04-26 Keyword effect analysis method based on big data

Country Status (1)

Country Link
CN (1) CN108804431A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101114287A (en) * 2006-07-27 2008-01-30 国际商业机器公司 Method and device for generating browsing paths for data and method for browsing data
US20100161406A1 (en) * 2008-12-23 2010-06-24 Motorola, Inc. Method and Apparatus for Managing Classes and Keywords and for Retrieving Advertisements
CN103607496A (en) * 2013-11-15 2014-02-26 中国科学院深圳先进技术研究院 A method and an apparatus for deducting interests and hobbies of handset users and a handset terminal
CN103744869A (en) * 2013-12-18 2014-04-23 天脉聚源(北京)传媒科技有限公司 Method, device and browser for displaying hotspot keyword


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Peng Zhaohui et al.: "S-CBR: Presenting database keyword search results based on the database schema", Journal of Software (软件学报) *

Similar Documents

Publication Publication Date Title
CN110753064B Security detection system combining machine learning and rule matching
CN105930727B Web-based crawler identification method
CN104050178B Anti-cheating method and device for Internet monitoring
CN103927307B Method and device for identifying website users
US9223968B2 Determining whether virtual network user is malicious user based on degree of association
CN101414939B Internet application recognition method based on dynamic deep packet inspection
CN103559235B Method for detecting and identifying malicious web pages in online social networks
EP2530874B1 Method and apparatus for detecting network attacks using a flow based technique
CN106453438B Network attack identification method and device
EP1918832A2 Session based web usage reporter
CN111143415B Data processing method, device and computer readable storage medium
CN107483488A Malicious HTTP detection method and system
CN105281973A Webpage fingerprint identification method for specific website categories
CN103746982B Automatic generation method and system for HTTP network signature codes
CN107302534A DDoS network attack detection method and device based on a big data platform
CN110708339B Correlation analysis method based on WEB logs
CN104348642B Spam filtering method and device
CN107578263A Detection method and device for abnormal advertisement access, and electronic equipment
CN109275045B DFI-based method for identifying encrypted video advertisement traffic on mobile terminals
CN106878314A Confidence-based network malicious behavior detection method
CN106802904A Log processing method, apparatus and system
CN108289125A TCP session reassembly and statistical data extraction method based on stream processing
CN108055227B WAF unknown attack defense method based on site self-learning
CN109361575A Method and system for obtaining and analyzing DNS traffic data
CN109981389A Phone number recognition method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20181113