CN107016121A - Search engine keyword optimization using a Bayes-based fuzzy C-means algorithm - Google Patents

Info

Publication number
CN107016121A
Authority
CN
China
Prior art keywords
keyword
fuzzy
bayes
search engine
algorithm based
Prior art date
Legal status
Pending
Application number
CN201710268840.XA
Other languages
Chinese (zh)
Inventor
金平艳
Current Assignee
Sichuan Yonglian Information Technology Co Ltd
Original Assignee
Sichuan Yonglian Information Technology Co Ltd
Priority date
2017-04-23
Filing date
2017-04-23
Publication date
2017-08-04
Application filed by Sichuan Yonglian Information Technology Co Ltd
Priority to CN201710268840.XA
Publication of CN107016121A

Classifications

    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/951 Indexing; Web crawling techniques
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A Bayes-based fuzzy C-means algorithm for search engine keyword optimization. Core keywords are determined according to business activities, and the data items corresponding to the searched keywords are collected, such as national monthly search volume, competition level, and estimated cost per click. The keyword set is then reduced in dimensionality: the number of first-page web pages and the total number of retrieved pages are added so that each keyword is represented by a five-dimensional vector, which is in turn reduced from five to four dimensions. The keywords are clustered with the Bayes-based fuzzy C-means algorithm, and a suitable keyword optimization strategy is then selected according to the specific circumstances of the enterprise. By combining the fuzzy C-means algorithm with Bayes' principle, the invention yields classification results that agree more closely with empirical values, reduces the influence of outliers, and avoids premature convergence into a local optimum. Its runtime complexity is also low and its processing is fast, so keyword rankings can be raised quickly.

Description

Search engine keyword optimization using a Bayes-based fuzzy C-means algorithm
Technical field
The present invention relates to the field of Semantic Web technology, and in particular to search engine keyword optimization using a Bayes-based fuzzy C-means algorithm.
Background art
With the rapid development of the Internet economy and the wide adoption of the network, search engines have become a very important platform for enterprises to promote themselves. To rank their websites near the top, many enterprises, especially small and medium-sized ones, have chosen search engine optimization, which is low-cost, easy to operate, and consistent with users' search preferences. Theoretical research on search engine optimization methods is relatively abundant, but few studies empirically analyze the effects these methods actually produce. How to obtain a good natural ranking in search engines, increase a website's exposure and conversion rates, and ultimately achieve direct marketing is a central concern of small and medium-sized enterprises. In popular terms, search engine optimization is the technique of optimizing a website's overall architecture, page content, keywords, and in-page links so as to improve its ranking in the search results of a particular search engine, thereby increasing site traffic and ultimately boosting the site's sales or publicity capacity. Search engine optimization techniques include black-hat and white-hat techniques. Black-hat techniques are malicious optimization techniques that violate search engines' optimization rules; in keyword optimization they appear as keyword stuffing on a page or the placement of unrelated keywords to raise search rankings, and search engines have introduced technologies and rules to penalize websites that use them. White-hat techniques are optimization techniques approved by search engines. There is considerable theoretical research on, and technical application of, keyword optimization both in China and abroad, but no effective method has yet been proposed to simplify the keyword analysis workflow, nor is there a sound mechanism for managing keyword optimization strategies and their progress. To meet this need, the present invention provides a Bayes-based fuzzy C-means algorithm for search engine keyword optimization.
Summary of the invention
To address the technical problem of achieving search engine optimization through keyword optimization, the invention provides a Bayes-based fuzzy c-means clustering algorithm for search engine keyword optimization.
To solve the above problems, the present invention adopts the following technical solution:
Step 1: Determine core keywords according to business activities and collect related keywords using a search engine. These keywords have corresponding data items in the search engine, such as national monthly search volume, competition level, and estimated cost per click (CPC).
Step 2: Screen the related keyword set collected above and reduce its dimensionality in light of the enterprise's products and market analysis;
Step 3: For the screened keyword set, search each keyword with the search engine and record the number of first-page web pages and the total number of retrieved pages; each keyword is thus represented by a five-dimensional vector, which is then reduced to four dimensions.
Step 4: Cluster the keywords with the Bayes-based fuzzy C-means algorithm. The sub-steps are as follows:
Step 4.1: Initialize c classes using a k-means algorithm based on ε-neighborhoods.
Step 4.2: Initialize the membership matrix J with random numbers in [0, 1] such that it satisfies the total membership constraint.
Step 4.3: Initialize the probability distribution of each ε-neighborhood, construct the objective function for the c classes, combine it with the membership constraints to build a system of m equations, and solve the system to obtain the clustering result.
Step 4.4: Recompute each cluster center according to the convergence of w_ij, c_j and p_new(c_k) from the formulas above.
Step 4.5: If p_new(c_k) has changed, return to Step 4.2 and recompute the membership matrix J; otherwise, stop iterating and output the clustering result.
Step 5: Based on the specific circumstances of the enterprise, and taking both keyword-efficiency optimization and value-rate optimization into account, select a suitable keyword optimization strategy to achieve the online marketing goal.
The present invention has the following advantages:
1. The algorithm simplifies the keyword analysis workflow and thereby reduces the overall online marketing workload.
2. The algorithm has low runtime complexity and processes data faster.
3. The algorithm has greater practical value.
4. It can help a website raise the ranking of its keywords within a short time.
5. It brings a certain amount of traffic and inquiries to the enterprise website, helping it achieve the desired online marketing goal.
6. By applying Bayes' principle, the algorithm produces classification results that agree more closely with empirical values.
7. It reduces the influence of outliers on the clustering result.
8. Combining it with the fuzzy C-means algorithm avoids premature convergence and keeps the search from becoming trapped in a local optimum.
Brief description of the drawings
Fig. 1 is a structural flow chart of search engine keyword optimization using the Bayes-based fuzzy C-means algorithm.
Fig. 2 is a flow chart of the application of the Bayes-based fuzzy C-means algorithm to clustering.
Detailed description of the embodiments
To solve the technical problem of achieving search engine optimization through keyword optimization, the present invention is described in detail below with reference to Figs. 1 and 2. The implementation steps are as follows:
Step 1: Determine core keywords according to business activities and collect related keywords using a search engine. These keywords have corresponding data items in the search engine, such as national monthly search volume, competition level, and estimated cost per click (CPC).
Step 2: Screen the related keyword set collected above and reduce its dimensionality in light of the enterprise's products and market analysis;
Step 3: For the screened keyword set, search each keyword with the search engine and record the number of first-page web pages and the total number of retrieved pages; each keyword is thus represented by a five-dimensional vector, which is then reduced to four dimensions. The specific calculation process is as follows:
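Here the number of related keywords is m, which gives the following m × 5 matrix:
$$
\begin{pmatrix}
N_1 & Ld_1 & CPC_1 & N_{1S} & N_{1Y} \\
N_2 & Ld_2 & CPC_2 & N_{2S} & N_{2Y} \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
N_m & Ld_m & CPC_m & N_{mS} & N_{mY}
\end{pmatrix}
$$
where N_i, Ld_i, CPC_i, N_iS and N_iY are, respectively, the national monthly search volume, competition level, estimated cost per click (CPC), number of first-page web pages, and total number of retrieved pages of the i-th keyword.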
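As a minimal sketch of how the m × 5 matrix above could be assembled, the following Python assumes that each keyword's five data items have already been collected. The `KeywordStats` class, its field names and the `build_matrix` function are hypothetical illustrations, not part of the source.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KeywordStats:
    """Hypothetical container for the five data items of one keyword."""
    keyword: str
    monthly_searches: float   # N_i: national monthly search volume
    competition: float        # Ld_i: competition level
    cpc: float                # CPC_i: estimated cost per click
    first_page_count: float   # N_iS: number of first-page web pages
    total_pages: float        # N_iY: total number of retrieved pages


def build_matrix(records: List[KeywordStats]) -> List[List[float]]:
    """Return the m x 5 matrix whose i-th row is (N_i, Ld_i, CPC_i, N_iS, N_iY)."""
    return [
        [r.monthly_searches, r.competition, r.cpc, r.first_page_count, r.total_pages]
        for r in records
    ]
```

The row order follows the column definitions above, so later steps can refer to each keyword by its row index i.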
This is then reduced to four dimensions, in which X_i (i ∈ {1, 2, ..., m}) denotes the search efficiency and Z_i (i ∈ {1, 2, ..., m}) the value rate of the i-th keyword, defined by the following formulas:
Step 4: Cluster the keywords with the Bayes-based fuzzy c-means clustering algorithm. The sub-steps are as follows:
Step 4.1: Initialize c classes using a k-means algorithm based on ε-neighborhoods.
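The source does not spell out how the ε-neighborhood enters the k-means initialization. The sketch below is one possible reading, stated as an assumption: seeds are taken greedily from the points with the densest ε-neighborhoods, skipping any point within ε of an existing seed, and standard Lloyd's k-means iterations then refine them. The names `eps_seeds` and `kmeans` are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two keyword vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def eps_seeds(points: List[Point], c: int, eps: float) -> List[List[float]]:
    """Greedy seeds from the densest eps-neighbourhoods (assumed heuristic)."""
    density = [sum(dist(p, q) <= eps for q in points) for p in points]
    order = sorted(range(len(points)), key=lambda i: -density[i])
    seeds: List[List[float]] = []
    for i in order:  # first pass: keep seeds more than eps apart
        if len(seeds) == c:
            break
        if all(dist(points[i], s) > eps for s in seeds):
            seeds.append(list(points[i]))
    for i in order:  # second pass: pad with dense points if too few were found
        if len(seeds) == c:
            break
        if list(points[i]) not in seeds:
            seeds.append(list(points[i]))
    return seeds


def kmeans(points: List[Point], c: int, eps: float,
           max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's k-means started from eps_seeds; returns (centers, labels)."""
    centers = eps_seeds(points, c, eps)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign every point to its nearest current center.
        labels = [min(range(len(centers)), key=lambda j: dist(p, centers[j])) for p in points]
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Mean of the members; an empty cluster keeps its previous center.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

Seeding from dense, well-separated neighborhoods is meant to keep isolated points from becoming initial centers, which is in the spirit of the outlier-reduction advantage claimed above.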
Step 4.2: Initialize the membership matrix J with random numbers in [0, 1] such that it satisfies the total membership constraint. The specific calculation process is as follows:
Partition the data object set D into C classes according to the ε-neighborhood initialization;
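Initialize the m × C membership matrix J:
$$
J = \begin{pmatrix}
w_{11} & w_{12} & \cdots & w_{1C} \\
w_{21} & w_{22} & \cdots & w_{2C} \\
\vdots & \vdots & \ddots & \vdots \\
w_{m1} & w_{m2} & \cdots & w_{mC}
\end{pmatrix}
$$
where w_ij is the degree to which keyword i belongs to class j, with j ∈ {1, 2, ..., C} and i ∈ {1, 2, ..., m}.
The total membership constraint is:
$$
\sum_{j=1}^{C} w_{ij} = 1, \qquad 0 \le w_{ij} \le 1, \qquad i = 1, 2, \dots, m
$$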
Step 4.3: Initialize the probability distribution of each ε-neighborhood, construct the objective function for the c classes, combine it with the membership constraints to build a system of m equations, and solve the system to obtain the clustering result. The specific calculation process is as follows:
In the formula above, x_i is a keyword and c_j is class j.
The formula above also uses the center vector of class j and the global optimal position of the data object set; p(c_k) is the probability of class k, n is the total number of data objects, and n_ε is the number of data objects in cluster j.
Combining the membership constraints, build the following system of m equations:
Here λ_i (i = 1, ..., m) are the Lagrange multipliers of the m constraints. Differentiating the above expression with respect to all input parameters yields the necessary conditions on c_j and w_ij for the objective function to reach its maximum:
$$
w_{ij} = p(c_j \mid x_i)
$$
In the formula above, the remaining symbol is the vector corresponding to keyword i;
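The necessary condition w_ij = p(c_j | x_i) means that each membership is a Bayesian posterior, proportional to p(c_j) p(x_i | c_j). The source does not reproduce its likelihood, so the sketch below assumes, purely for illustration, an isotropic Gaussian likelihood with a shared width `sigma` around each class center. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def posterior_memberships(points: Sequence[Sequence[float]],
                          centers: Sequence[Sequence[float]],
                          priors: Sequence[float],
                          sigma: float = 1.0) -> List[List[float]]:
    """w_ij = p(c_j | x_i), proportional to p(c_j) * p(x_i | c_j); priors must be > 0.

    p(x_i | c_j) is an assumed isotropic Gaussian with width sigma; its normalising
    constant is the same for every class and cancels in the ratio.
    """
    W = []
    for x in points:
        logs = [math.log(p) - sum((a - b) ** 2 for a, b in zip(x, cj)) / (2 * sigma ** 2)
                for cj, p in zip(centers, priors)]
        top = max(logs)  # subtract the max before exp() to avoid underflow
        unnorm = [math.exp(l - top) for l in logs]
        z = sum(unnorm)
        W.append([u / z for u in unnorm])  # each row sums to 1
    return W
```

Because each row is a posterior distribution over classes, the total membership constraint from Step 4.2 holds automatically.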
Step 4.4: Recompute each cluster center according to the convergence of w_ij, c_j and p_new(c_k) from the formulas above. The specific calculation process is as follows:
When p_new(c_k) converges to a fixed value, w_ij also converges to a fixed value, and c_j in turn converges; at that point the optimal clustering result has been found. Otherwise, it has not yet been found.
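A sketch of the Step 4.4 updates under stated assumptions: the centers are taken as membership-weighted means (the usual fuzzy c-means form with fuzzifier 1), and p_new(c_k) as the average membership of class k, a soft analogue of a count ratio such as n_ε / n. Neither formula is confirmed by the source, and the function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def update_centers(points: Sequence[Sequence[float]], W: Matrix) -> Matrix:
    """c_j as the membership-weighted mean of the keyword vectors (assumed form)."""
    c, d = len(W[0]), len(points[0])
    centers = []
    for j in range(c):
        weight = sum(row[j] for row in W)
        if weight == 0:
            raise ValueError(f"class {j} has zero total membership")
        centers.append([sum(row[j] * x[k] for row, x in zip(W, points)) / weight
                        for k in range(d)])
    return centers


def update_priors(W: Matrix) -> List[float]:
    """p_new(c_k) as the average membership of class k (assumed form)."""
    n = len(W)
    return [sum(row[k] for row in W) / n for k in range(len(W[0]))]


def has_converged(p_old: Sequence[float], p_new: Sequence[float], tol: float = 1e-6) -> bool:
    """Step 4.5 test: p_new(c_k) no longer changes (within tol)."""
    return max(abs(a - b) for a, b in zip(p_old, p_new)) < tol
```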
Step 4.5: If p_new(c_k) has changed, return to Step 4.2 and recompute the membership matrix J; otherwise, stop iterating and output the clustering result.
The detailed structural flow of the Bayes-based fuzzy c-means clustering algorithm is shown in Fig. 2.
Step 5: Based on the specific circumstances of the enterprise, and taking both keyword-efficiency optimization and value-rate optimization into account, select a suitable keyword optimization strategy to achieve the online marketing goal.
Pseudocode of search engine keyword optimization using the Bayes-based fuzzy C-means algorithm:
Input: the core keywords extracted from the website, initialized into c clusters based on ε-neighborhoods.
Output: the c clusters at which w_ij, c_j and p_new(c_k) converge, i.e., the c clusters that maximize the objective function.
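Putting the steps together, the following self-contained Python sketch runs the loop described by the pseudocode: starting from the Step 4.1 centers with uniform class probabilities, it alternates posterior memberships (Step 4.3), center and probability updates (Step 4.4), and the convergence test (Step 4.5). It carries the same assumptions as the per-step sketches (Gaussian likelihood with shared `sigma`, weighted-mean centers, average-membership priors), so it behaves much like EM for a Gaussian mixture with fixed variance rather than a verified reproduction of the patent's formulas. Besides the source's p_new(c_k) test, it also requires the centers to stop moving, because on symmetric data the class probabilities can stay constant while the centers still shift. `bayes_fcm` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def bayes_fcm(points: Sequence[Sequence[float]], init_centers: Sequence[Sequence[float]],
              sigma: float = 1.0, tol: float = 1e-6,
              max_iter: int = 200) -> Tuple[Matrix, Matrix, List[float]]:
    """Return (memberships W, centers, class probabilities) after convergence."""
    n, d, c = len(points), len(points[0]), len(init_centers)
    centers = [list(cj) for cj in init_centers]
    priors = [1.0 / c] * c  # initial class probabilities (assumed uniform)
    W: Matrix = []
    for _ in range(max_iter):
        # Step 4.3: w_ij = p(c_j | x_i) by Bayes' rule with an assumed Gaussian likelihood.
        W = []
        for x in points:
            logs = [math.log(max(p, 1e-300))
                    - sum((a - b) ** 2 for a, b in zip(x, cj)) / (2 * sigma ** 2)
                    for cj, p in zip(centers, priors)]
            top = max(logs)
            u = [math.exp(l - top) for l in logs]
            z = sum(u)
            W.append([v / z for v in u])
        # Step 4.4: p_new(c_k) as average membership; centers as weighted means.
        new_priors = [sum(row[j] for row in W) / n for j in range(c)]
        new_centers = []
        for j in range(c):
            wj = max(sum(row[j] for row in W), 1e-300)  # guard against an empty class
            new_centers.append([sum(row[j] * x[k] for row, x in zip(W, points)) / wj
                                for k in range(d)])
        shift = max(abs(a - b) for cj, nj in zip(centers, new_centers) for a, b in zip(cj, nj))
        dp = max(abs(a - b) for a, b in zip(priors, new_priors))
        centers, priors = new_centers, new_priors
        # Step 4.5: stop once p_new(c_k) (and, as a safeguard, the centers) stop changing.
        if dp < tol and shift < tol:
            break
    return W, centers, priors
```

A hard cluster label for keyword i can be read off as the class with the largest w_ij, for example `max(range(c), key=lambda j: W[i][j])`.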

Claims (2)

1. A Bayes-based fuzzy C-means algorithm for search engine keyword optimization, the invention relating to the field of Semantic Web technology, and in particular to search engine keyword optimization using a Bayes-based fuzzy C-means algorithm,
characterized in that it comprises the following steps:
Step 1: Determine core keywords according to business activities and collect related keywords using a search engine; these keywords have corresponding data items in the search engine, such as national monthly search volume, competition level, and estimated cost per click (CPC);
Step 2: Screen the related keyword set collected above and reduce its dimensionality in light of the enterprise's products and market analysis;
Step 3: For the screened keyword set, search each keyword with the search engine and record the number of first-page web pages and the total number of retrieved pages, so that each keyword is represented by a five-dimensional vector that is then reduced to four dimensions; the specific calculation process is as follows:
Here the number of related keywords is m, which gives the following m × 5 matrix:
N_i, Ld_i, CPC_i, N_iS and N_iY are, respectively, the national monthly search volume, competition level, estimated cost per click (CPC), number of first-page web pages, and total number of retrieved pages of the i-th keyword; this is then reduced to four dimensions,
in which X_i is the search efficiency and Z_i is the value rate, given by the following formulas:
Step 4: Cluster the keywords with the Bayes-based fuzzy C-means algorithm; the sub-steps are as follows:
Step 4.1: Initialize c classes using a k-means algorithm based on ε-neighborhoods;
Step 4.2: Initialize the membership matrix J with random numbers in [0, 1] such that it satisfies the total membership constraint;
Step 4.3: Initialize the probability distribution of each ε-neighborhood, construct the objective function for the c classes, combine it with the membership constraints to build a system of m equations, and solve the system to obtain the clustering result;
Step 4.4: Recompute each cluster center according to the convergence of w_ij, c_j and p_new(c_k) from the formulas above;
Step 4.5: If p_new(c_k) has changed, return to Step 4.2 and recompute the membership matrix J; otherwise, stop iterating and output the clustering result;
Step 5: Based on the specific circumstances of the enterprise, and taking both keyword-efficiency optimization and value-rate optimization into account, select a suitable keyword optimization strategy to achieve the online marketing goal.
2. The Bayes-based fuzzy C-means algorithm for search engine keyword optimization according to claim 1, characterized in that the specific calculation process of Step 4 is as follows:
Step 4: Cluster the keywords with the Bayes-based fuzzy c-means clustering algorithm; the sub-steps are as follows:
Step 4.1: Initialize c classes using a k-means algorithm based on ε-neighborhoods;
Step 4.2: Initialize the membership matrix J with random numbers in [0, 1] such that it satisfies the total membership constraint; the specific calculation process is as follows:
Partition the data object set D into C classes according to the ε-neighborhood initialization;
Initialize the membership matrix J as an m × C matrix:
w_ij is the degree to which keyword i belongs to class j, with j ∈ {1, 2, ..., C} and i ∈ {1, 2, ..., m};
The total membership constraint is:
Step 4.3: Initialize the probability distribution of each ε-neighborhood, construct the objective function for the c classes, combine it with the membership constraints to build a system of m equations, and solve the system to obtain the clustering result; the specific calculation process is as follows:
In the formula above, x_i is a keyword and c_j is class j;
The formula above also uses the center vector of class j and the global optimal position of the data object set; p(c_k) is the probability of class k, n is the total number of data objects, and n_ε is the number of data objects in cluster j;
Combining the membership constraints, build the following system of m equations:
λ_i (i = 1, ..., m) are the Lagrange multipliers of the m constraints; differentiating the above expression with respect to all input parameters yields the necessary conditions on c_j and w_ij for the objective function to reach its maximum:
In the formula above, the remaining symbol is the vector corresponding to keyword i;
Step 4.4: Recompute each cluster center according to the convergence of w_ij, c_j and p_new(c_k) from the formulas above; the specific calculation process is as follows:
When p_new(c_k) converges to a fixed value, w_ij also converges to a fixed value, and c_j in turn converges; at that point the optimal clustering result has been found. Otherwise, it has not yet been found;
Step 4.5: If p_new(c_k) has changed, return to Step 4.2 and recompute the membership matrix J; otherwise, stop iterating and output the clustering result;
The detailed structural flow of the Bayes-based fuzzy c-means clustering algorithm is shown in Fig. 2.
CN201710268840.XA 2017-04-23 2017-04-23 Fuzzy C-Mean Algorithm based on Bayes realizes that search engine keywords optimize Pending CN107016121A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710268840.XA CN107016121A (en) 2017-04-23 2017-04-23 Fuzzy C-Mean Algorithm based on Bayes realizes that search engine keywords optimize

Publications (1)

Publication Number Publication Date
CN107016121A true CN107016121A (en) 2017-08-04

Family

ID=59447711

Country Status (1)

Country Link
CN (1) CN107016121A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218435A (en) * 2013-04-15 2013-07-24 上海嘉之道企业管理咨询有限公司 Method and system for clustering Chinese text data
CN103258000A (en) * 2013-03-29 2013-08-21 北界创想(北京)软件有限公司 Method and device for clustering high-frequency keywords in webpages

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
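林元国 et al.: "Application of the K-means algorithm in keyword optimization" (K-means算法在关键词优化中的应用), Computer Systems & Applications (《计算机系统应用》) *
邓健爽 et al.: "An automatic keyword clustering method based on search engines" (基于搜索引擎的关键词自动聚类法), Computer Science (《计算机科学》) *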

Similar Documents

Publication Publication Date Title
CN110674407B (en) Hybrid recommendation method based on graph convolution neural network
Jadhav et al. Comparative study of K-NN, naive Bayes and decision tree classification techniques
WO2021109464A1 (en) Personalized teaching resource recommendation method for large-scale users
CN112613602A (en) Recommendation method and system based on knowledge-aware hypergraph neural network
CN106021457A (en) Keyword-based RDF distributed semantic search method
CN106933954A (en) Search engine optimization technology is realized based on Decision Tree Algorithm
Yu et al. Graph neural network based model for multi-behavior session-based recommendation
CN106649616A (en) Clustering algorithm achieving search engine keyword optimization
CN106933953A (en) A kind of fuzzy K mean cluster algorithm realizes search engine optimization technology
CN111080551A (en) Multi-label image completion method based on depth convolution characteristics and semantic neighbor
CN106909626A (en) Improved Decision Tree Algorithm realizes search engine optimization technology
Mehrotra et al. Comparative analysis of K-Means with other clustering algorithms to improve search result
Lu et al. Grouped multi-attention network for hyperspectral image spectral-spatial classification
Nezamabadi-pour et al. Concept learning by fuzzy k-NN classification and relevance feedback for efficient image retrieval
Wang et al. Weakly supervised object detection based on active learning
Zhu et al. Attribute-image person re-identification via modal-consistent metric learning
Xia et al. Clothing classification using transfer learning with squeeze and excitation block
CN107622071A (en) By indirect correlation feedback without clothes image searching system and the method looked under source
Dey et al. A quantum inspired differential evolution algorithm for automatic clustering of real life datasets
CN106897356A (en) Improved Fuzzy C mean algorithm realizes that search engine keywords optimize
CN106874376A (en) A kind of method of verification search engine keyword optimisation technique
CN106933950A (en) New Model tying algorithm realizes search engine optimization technology
CN106874377A (en) The improved clustering algorithm based on constraints realizes that search engine keywords optimize
CN106802945A (en) Fuzzy c-Means Clustering Algorithm based on VSM realizes that search engine keywords optimize
CN106897376A (en) Fuzzy C-Mean Algorithm based on ant colony realizes that keyword optimizes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170804