CN103399918B - A method for improving the search rate of a website - Google Patents


Info

Publication number: CN103399918B
Authority: CN (China)
Prior art keywords: website, size, page, score, ratio
Legal status: Expired - Fee Related (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201310330651.2A
Other languages: Chinese (zh)
Other versions: CN103399918A (en)
Inventors: 王冬琦, 魏小淞, 黄新宇, 王静
Current assignee: Northeastern University China (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original assignee: Northeastern University China
Priority date: 2013-07-31 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2013-07-31
Publication date: 2016-08-17

Events:
2013-07-31: Application filed by Northeastern University China; priority to CN201310330651.2A
2013-11-20: Publication of CN103399918A
Application granted
2016-08-17: Publication of CN103399918B
Current legal status: Expired - Fee Related
Anticipated expiration


Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method for improving the search rate of a website, in the field of the Internet and search engines. The method collects the factors and patterns that affect search-engine ranking and uses a linear iterative algorithm to obtain a weight for each factor, which yields a quantifiable score for the website's ranking. Based on the product of each factor and its weight, the method gives targeted suggestions for optimizing the website, which reduces cost and makes website promotion more efficient.

Description

A method for increasing the search rate of a website

Technical field

The invention belongs to the field of the Internet and search engines, and specifically relates to a method for increasing the search rate of a website.

Background

With the continued growth of the Internet economy, "using a search engine as a platform and bringing in traffic by adjusting a web page's position on the search results page" has become a marketing method in its own right, known as search engine marketing (SEM). SEM comprises two modes: paid commercial promotion and search engine optimization (SEO). Paid SEM generally refers to the bid-ranking mechanism, whereas SEO refers to optimization aimed at increasing the number of a site's pages included in a search engine's natural (non-commercial) results and improving their ranking positions. SEO improves a website's user experience and conversion rate and gives it a self-sustaining marketing solution that can help it lead its industry and gain brand value. Especially when a company lacks the funds for advertising, SEO is regarded as the best way to promote a website: it is low-cost, quick to take effect, and lasting, so more and more companies pay attention to it. Moreover, as Baidu has tightened its review of bid-ranking (paid placement) listings, many paid rankings have been adjusted, which has led more website operators to seek higher search rankings through website optimization.

In practice, however, website administrators often carry out SEO purely from experience, which consumes their effort, is slow and costly, and takes away from the company's own creative work. A tool that automatically analyzes a website's SEO quality is therefore needed to help administrators optimize their sites.

Summary of the invention

To address the shortcomings of the prior art, the invention proposes a method for increasing the search rate of a website that quantifies the website's ranking, reduces cost, yields explicit optimization measures, and improves the efficiency of website promotion.

A method for increasing the search rate of a website comprises the following steps:

Step 1. Determine the factors that affect a website's search-engine quality: the page file size of the website under test; the ratio of script code lines to total code lines in the page; the degree of match between the name of the website under test and the keywords in its pages; the ratio of the size of the dynamic picture files in the website to the page file size; and the ratio of the size of the animation files in the website to the page size. Then determine, iteratively, the threshold for the score of each of these five factors;

Step 1-1. The user sets, as required, the initial weights of the five factors of a first known website (page file size, script-line ratio, name-keyword match, dynamic-picture size ratio, and animation size ratio), ensuring that the five initial weights sum to 1;

Step 1-2. From the values of the five factors of this website and their initial weights, compute the initial score of the first known website;

Step 1-3. From the values of the five factors of a second known website and the same initial weights, compute that website's initial score;

Step 1-4. Compute the ratio of each factor's score to the total score of the first known website, giving five new weights for the five factors;

Step 1-5. Recompute the score of the second known website with the five new weights, and take the difference between this recomputed score and the second website's initial score;

Step 1-6. Determine whether the difference from Step 1-5 is approximately 0. If so, the weights are final: use them with the five factors of the first or second known website to obtain the threshold for each factor's score, and end the iteration. Otherwise, substitute the new weights into the first known website and repeat Steps 1-4 and 1-5;

Step 2. From the URL of the website under test, collect seven parameters: the page file size, the number of script code lines in the page, the total number of code lines, the name of the website, the keywords in its pages, the size of its dynamic picture files, and the size of its animation files. From these seven parameters, determine the five factors of the website under test: page file size, ratio of script code lines to total code lines, match between the website name and the page keywords, ratio of dynamic picture file size to page file size, and ratio of animation file size to page size;

Step 3. Compare the computed score of each of the five factors with its threshold, and adjust the website's pages accordingly:

If the page file size score of the website under test is greater than its threshold, reduce the page file size; otherwise, display the result;

If the score for the ratio of script code lines to total code lines is greater than its threshold, reduce the amount of script code in the page; otherwise, display the result;

If the score for the match between the website name and the page keywords is greater than its threshold, optimize the website keywords; otherwise, display the result;

If the score for the ratio of dynamic picture file size to page file size is greater than its threshold, reduce the number of dynamic picture files; otherwise, display the result;

If the score for the ratio of animation file size to page size is greater than its threshold, reduce the number of animation files; otherwise, display the result.
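The iterative part of Step 1 (Steps 1-2 to 1-6) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: factor values and weights are plain lists of five floats, "approximately 0" is taken to mean "below a caller-chosen tolerance `tol`", and a `max_iter` cap (not in the source) guarantees termination. The names `weighted_score`, `reweight` and `fit_weights` are hypothetical.

```python
from typing import List, Tuple


def weighted_score(p: List[float], w: List[float]) -> float:
    """Formulas (1)/(2): sum over factors of value times weight."""
    return sum(pi * wi for pi, wi in zip(p, w))


def reweight(p1: List[float], w: List[float]) -> List[float]:
    """Formula (3): each factor's share of the first site's weighted score."""
    total = weighted_score(p1, w)
    if total == 0:
        raise ValueError("first site's weighted score is 0; weights undefined")
    return [pi * wi / total for pi, wi in zip(p1, w)]


def fit_weights(
    p1: List[float],
    p2: List[float],
    w0: List[float],
    tol: float = 1e-3,    # assumed meaning of "approximately 0"
    max_iter: int = 100,  # assumed safety cap, not part of the source
) -> Tuple[List[float], float, int, bool]:
    """Return (weights, last difference s, iterations used, converged?)."""
    b = weighted_score(p2, w0)              # Step 1-3: second site's initial score
    w = list(w0)
    s = float("inf")
    for k in range(1, max_iter + 1):
        w = reweight(p1, w)                 # Step 1-4: new weights from first site
        s = abs(weighted_score(p2, w) - b)  # Step 1-5: difference for second site
        if s < tol:                         # Step 1-6: stop when difference ~ 0
            return w, s, k, True
    return w, s, max_iter, False
```

Read literally with the embodiment's values (p1 = [0.48, 0.048, 0.8, 0, 0], p2 = [1.02, 0.1875, 0.7, 0.3, 25.6], w0 = [0.2] * 5), repeated application of formula (3) shifts the weight onto the third factor, so s levels off near 4.86 instead of approaching 0; the `max_iter` cap is what ends the loop in that case.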

In Step 1-2, the website's initial score is computed from the values of its five factors and their initial weights as follows:

$$ a = \sum_{i=1}^{5} \left( p_{1i} \times w_i \right) \tag{1} $$

where $a$ is the score computed with the initial weights, $p_{1i}$ is the computed value of the i-th factor of the first known website, and $w_i$ is the weight of that factor.

In Step 1-3, the initial score of the second known website is computed from the values of its five factors and the initial weights as follows:

$$ b = \sum_{i=1}^{5} \left( p_{2i} \times w_i \right) \tag{2} $$

where $b$ is the score of the second known website, $p_{2i}$ is the computed value of its i-th factor, and $w_i$ is the weight of that factor.

In Step 1-4, the five new weights are obtained as the ratio of each factor's score to the total score of the first known website:

$$ w_i' = \frac{p_{1i} \times w_i}{\sum_{j=1}^{5} \left( p_{1j} \times w_j \right)} \tag{3} $$

where $w_i'$ is the adjusted weight of the i-th factor $p_{1i}$ and $w_i$ is its previous weight.
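Two properties of formula (3) follow directly from its form and help when reading the iteration in Steps 1-4 to 1-6. The short derivation below assumes all $p_{1i} \ge 0$, positive initial weights $w_i^{(0)}$, and that the zero-weight redistribution rule described in the embodiment is not applied; $w_i^{(k)}$, the weights after $k$ applications of (3), is notation introduced here.

```latex
\begin{aligned}
\sum_{i=1}^{5} w_i' &= \frac{\sum_{i=1}^{5} p_{1i} w_i}{\sum_{j=1}^{5} p_{1j} w_j} = 1, \\[4pt]
w_i^{(k)} = \frac{p_{1i}^{\,k}\, w_i^{(0)}}{\sum_{m=1}^{5} p_{1m}^{\,k}\, w_m^{(0)}}
\;\Longrightarrow\;
w_i^{(k+1)} &= \frac{p_{1i}\, w_i^{(k)}}{\sum_{j=1}^{5} p_{1j}\, w_j^{(k)}}
             = \frac{p_{1i}^{\,k+1}\, w_i^{(0)}}{\sum_{j=1}^{5} p_{1j}^{\,k+1}\, w_j^{(0)}} .
\end{aligned}
```

The common denominator $\sum_m p_{1m}^{k} w_m^{(0)}$ cancels in the second line, so by induction each re-weighting keeps the weights summing to 1, and as $k$ grows the weight concentrates on the factor (or factors) with the largest $p_{1i}$ of the first known website.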

Advantages of the invention

By collecting the factors and patterns that affect search engines and obtaining their weights with a linear iterative algorithm, the method produces a quantifiable score for a website's ranking and, from the product of each factor and its weight, gives targeted optimization suggestions, reducing cost and improving the efficiency of website promotion.

Description of drawings

Fig. 1 is a flow chart of the method for increasing the search rate of a website according to an embodiment of the invention;

Fig. 2 plots the product of each factor and its initial weight for the first website in an embodiment of the invention;

Fig. 3 plots the product of each factor and its weight after one iteration for the second website in an embodiment of the invention;

Fig. 4 plots the scoring thresholds in an embodiment of the invention.

Detailed description

An embodiment of the invention is described in further detail below with reference to the drawings.

Development environment of this embodiment: operating system Microsoft Windows 7; CPU Intel Centrino 2; memory 2 GB; hard disk 320 GB.

A method for increasing the search rate of a website, whose flow chart is shown in Fig. 1, comprises the following steps:

Step 1. Determine the factors that affect a website's search-engine quality: the page file size of the website under test; the ratio of script code lines to total code lines in the page; the degree of match between the name of the website under test and the keywords in its pages; the ratio of the size of the dynamic picture files in the website to the page file size; and the ratio of the size of the animation files in the website to the page size. Then determine, iteratively, the threshold for the score of each of these five factors;

This embodiment uses an iterative algorithm. Data are obtained by examining the websites that a search engine ranks for the same keyword, and are plotted in Figs. 2 to 4 with the influencing factors on the horizontal axis and each factor's computed value on the vertical axis. The difference between the maximum and minimum values of each factor is taken as the initial value of that factor's weight. The data set is then fed into a BP (back-propagation) iterative algorithm for training; after training, the BP weights are added to the initial weights and the two are averaged. Finally, the weights are balanced to avoid obvious errors caused by chance. The main steps of the weight-balancing process are as follows:
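The weight initialization described above (each factor's range as its starting weight, then averaging with trained weights) can be sketched as follows. This is a hypothetical illustration: the BP training itself is not specified in the source, so its output is simply taken as an input list, and rescaling the averaged weights to sum to 1 is an assumed stand-in for the unspecified "balancing" step. The function names are illustrative.

```python
from typing import List, Sequence


def range_initial_weights(samples: Sequence[Sequence[float]]) -> List[float]:
    """Initial weight of each factor = max - min of that factor across sampled sites."""
    columns = zip(*samples)  # one tuple per factor
    return [max(col) - min(col) for col in columns]


def blend_weights(initial: Sequence[float], trained: Sequence[float]) -> List[float]:
    """Average trained (e.g. BP) weights with the initial weights, then rescale.

    Rescaling to a sum of 1 is an assumption standing in for the
    unspecified weight-balancing step.
    """
    averaged = [(a + b) / 2 for a, b in zip(initial, trained)]
    total = sum(averaged)
    if total == 0:
        raise ValueError("averaged weights sum to 0")
    return [x / total for x in averaged]


# The two reference sites of the embodiment, used here only as sample rows.
sites = [
    [0.48, 0.048, 0.8, 0.0, 0.0],
    [1.02, 0.1875, 0.7, 0.3, 25.6],
]
init = range_initial_weights(sites)
print([round(x, 4) for x in init])  # [0.54, 0.1395, 0.1, 0.3, 25.6]
```

Because raw ranges are not scale-free, the animation ratio (range 25.6) dominates the other factors here, which is one reason some balancing step is needed.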

Step 1-1. Determine the first known website; this embodiment uses the website of Northeastern University. Its five factors are: page file size $p_1 = 0.48$; ratio of script code lines to total code lines $p_2 = 12/250 = 0.048$; match between the website name and the page keywords $p_3 = 0.8$; ratio of dynamic picture file size to page file size $p_4 = 0$; and ratio of animation file size to page size $p_5 = 0$. The initial weights of the five factors are $w_i = 0.2$ $(i = 1, 2, \ldots, 5)$ (see Table 1), so they sum to 1;

Table 1

The initial weights are defined by the system and have no practical meaning; they are only the starting values of the weights in the iterative algorithm.

Step 1-2. Compute the website's initial score from the values of its five factors and their initial weights, using the formula:

$$ a = \sum_{i=1}^{5} \left( p_{1i} \times w_i \right) = 0.2656 \tag{1} $$

where $a$ is the score computed with the initial weights, $p_{1i}$ is the computed value of the i-th factor of the first known website, and $w_i$ is the weight of that factor.

Formula (1) gives the website's ranking score. It is a linear model applied iteratively: simply by changing the initial values, it can produce weight and threshold vectors with a wide range of widely differing values. Once initial values $w_i$ are chosen, repeated iteration therefore refines the weight of each factor step by step; the invention uses this pattern-recognition approach to progressively refine the unknown weights.

Step 1-3. For the second known website, this embodiment uses the website of the School of Software of Northeastern University, whose five factor values are $p_1 = 1.02$, $p_2 = 0.1875$, $p_3 = 0.7$, $p_4 = 0.3$ and $p_5 = 25.6$. Using these values and the initial weights $w_i = 0.2$ $(i = 1, 2, \ldots, 5)$ from Step 1-1, compute this website's initial score. Both the first known website (Step 1-1) and the second known website (Step 1-3) have a high search rate and rank near the top of search results. The formula is:

$$ b = \sum_{i=1}^{5} \left( p_{2i} \times w_i \right) = 5.5615 \tag{2} $$

where $b$ is the score of the second known website, $p_{2i}$ is the computed value of its i-th factor, and $w_i$ is the weight of that factor.
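Formulas (1) and (2) with the embodiment's values can be checked with a few lines of Python. This is a minimal sketch with illustrative variable names; it uses $p_1 = 0.48$ for the first site, which is the value that reproduces $a = 0.2656$.

```python
from typing import List

# Factor values from Steps 1-1 and 1-3 of the embodiment.
P1 = [0.48, 0.048, 0.8, 0.0, 0.0]    # first known site (Northeastern University)
P2 = [1.02, 0.1875, 0.7, 0.3, 25.6]  # second known site (School of Software)
W0 = [0.2] * 5                        # initial weights, summing to 1


def weighted_score(p: List[float], w: List[float]) -> float:
    """Formulas (1) and (2): sum over factors of value times weight."""
    return sum(pi * wi for pi, wi in zip(p, w))


a = weighted_score(P1, W0)
b = weighted_score(P2, W0)
print(f"a = {a:.4f}, b = {b:.4f}")  # a = 0.2656, b = 5.5615
```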

Step 1-4. Compute the ratio of each factor's score to the total score of the first known website to obtain the five new weights. If a factor's adjusted weight is 0, subtract the sum of the nonzero weights from the total (1) and divide the remainder equally among the zero-weight factors. The formula is:

$$ w_i' = \frac{p_{1i} \times w_i}{\sum_{j=1}^{5} \left( p_{1j} \times w_j \right)} \tag{3} $$

The results are:

$w_1' = 0.361$

$w_2' = 0.036$

$w_3' = 0.602$

$w_4' = 0.0005$

$w_5' = 0.0005$

where $w_i'$ is the adjusted weight of the i-th factor $p_{1i}$.
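The Step 1-4 results, including the zero-weight rule, can be reproduced as below. This is a sketch under one labeled assumption: each share is rounded to three decimals before the leftover is redistributed. Without that rounding, the nonzero shares already sum to 1 and nothing would remain for $w_4'$ and $w_5'$; with it, the leftover 0.001 is split into 0.0005 and 0.0005, matching the listed values. The function name is hypothetical.

```python
from typing import List

P1 = [0.48, 0.048, 0.8, 0.0, 0.0]  # first known site, Step 1-1
W0 = [0.2] * 5                      # initial weights


def reweight_step_1_4(p1: List[float], w: List[float], ndigits: int = 3) -> List[float]:
    """Formula (3) plus the zero-weight rule of Step 1-4.

    Rounding to `ndigits` decimals is an assumption: without it the nonzero
    shares sum to 1 (up to floating-point error) and nothing is left over.
    """
    total = sum(pi * wi for pi, wi in zip(p1, w))
    shares = [round(pi * wi / total, ndigits) for pi, wi in zip(p1, w)]
    zero_idx = [i for i, x in enumerate(shares) if x == 0]
    if zero_idx:
        # Remainder of the total weight (1) split evenly over the zero-weight factors.
        leftover = (1.0 - sum(shares)) / len(zero_idx)
        for i in zero_idx:
            shares[i] = leftover
    return shares


result = reweight_step_1_4(P1, W0)
print([round(x, 4) for x in result])  # [0.361, 0.036, 0.602, 0.0005, 0.0005]
```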

Step 1-5. Recompute the score of the second known website with the five new weights, subtract the second website's initial score from it, and take the absolute value:

$$ s = \left| \sum_{i=1}^{5} \left( p_{2i} \times w_i' \right) - b \right| = 4.75218 \tag{4} $$

where $s$ is the difference between the score of the second website after the weights are modified and its score before modification.
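Formula (4) is read here as subtracting $b$ once from the whole weighted sum. That reading reproduces the reported $s = 4.75218$, whereas subtracting $b$ inside each of the five terms does not. A minimal check, with illustrative variable names:

```python
P2 = [1.02, 0.1875, 0.7, 0.3, 25.6]            # second known site, Step 1-3
W_NEW = [0.361, 0.036, 0.602, 0.0005, 0.0005]  # weights from Step 1-4
B = 5.5615                                      # initial score b, formula (2)

rescored = sum(p * w for p, w in zip(P2, W_NEW))  # second site under new weights
s = abs(rescored - B)                             # subtract b once, then |.|
print(f"rescored = {rescored:.5f}, s = {s:.5f}")  # rescored = 0.80932, s = 4.75218

# Subtracting b inside every term instead gives a value that does not match:
literal = abs(sum(p * w - B for p, w in zip(P2, W_NEW)))
print(f"{literal:.5f}")  # 26.99818
```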

Step 1-6. Determine whether the difference from Step 1-5 is approximately 0. If so, the weights are final: use them with the five factors of the first or second known website to obtain the threshold for each factor's score, and end the iteration. Otherwise, substitute the new weights into the first known website and repeat Steps 1-4 and 1-5;

From the computed score of each factor of the target website, determine the directions in which the website should be optimized and provide suggestions. The score threshold of each factor is:

$$ S_i = p_{2i} \times w_i'' \tag{5} $$

where $S_i$ is the score threshold of the i-th factor, computed from the second known website, and $w_i''$ is the final weight of the i-th factor:

$w_1'' = 0.104$

$w_2'' = 0.39$

$w_3'' = 0.286$

$w_4'' = 0.109$

$w_5'' = 0.201$

Taking the second website as the standard website, the thresholds are:

$S_1 = 0.10608$

$S_2 = 0.073125$

$S_3 = 0.2002$

$S_4 = 0.0327$

$S_5 = 5.1456$
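Formula (5) applied to the listed final weights can be recomputed directly, as in the illustrative sketch below. The recomputed thresholds are $S_2 = 0.073125$ and $S_3 = 0.2002$; note also that the listed $w_i''$ sum to 1.09 rather than 1.

```python
P2 = [1.02, 0.1875, 0.7, 0.3, 25.6]           # second (standard) site
W_FINAL = [0.104, 0.39, 0.286, 0.109, 0.201]  # final weights w''_i

thresholds = [p * w for p, w in zip(P2, W_FINAL)]  # formula (5)
print([round(t, 6) for t in thresholds])
# [0.10608, 0.073125, 0.2002, 0.0327, 5.1456]
```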

Step 2. From the URL of the website under test (this embodiment uses the home page of China National Knowledge Infrastructure (CNKI), http://epub.cnki.net/kns/default.htm), collect the page file size, the number of script code lines in the page, the total number of code lines, the name of the website, the keywords in its pages, the size of its dynamic picture files, and the size of its animation files. From these seven parameters, determine the five factors of the website under test: page file size, ratio of script code lines to total code lines, match between the website name and the page keywords, ratio of dynamic picture file size to page file size, and ratio of animation file size to page size;
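The patent does not say exactly how the five factors are computed from the seven collected parameters, so the following is a hypothetical sketch of Step 2 for a page whose HTML has already been downloaded. The 50 KB reference for the page-size factor, the regular-expression detection of script blocks, and the keyword-match definition are all assumptions, and the embodiment's own values may have been computed differently.

```python
import re
from typing import Iterable, Sequence

# Matches one <script ...>...</script> element, across line breaks.
SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>.*?</script>", re.IGNORECASE | re.DOTALL)


def page_size_factor(html: str, reference_kb: float = 50.0) -> float:
    """Page size in KB relative to an assumed 50 KB reference (Step 3 targets 50K)."""
    return len(html.encode("utf-8")) / 1024 / reference_kb


def script_line_ratio(html: str) -> float:
    """Lines inside <script> elements divided by total lines of the page."""
    total = len(html.splitlines())
    if total == 0:
        return 0.0
    script = sum(len(m.group(0).splitlines()) for m in SCRIPT_BLOCK.finditer(html))
    return script / total


def name_keyword_match(site_name: str, keywords: Sequence[str]) -> float:
    """Hypothetical match degree: share of page keywords that occur in the site name."""
    if not keywords:
        return 0.0
    hits = sum(1 for k in keywords if k.lower() in site_name.lower())
    return hits / len(keywords)


def size_ratio(asset_sizes: Iterable[int], page_size: int) -> float:
    """Total size of dynamic-picture (or animation) files relative to the page size."""
    return sum(asset_sizes) / page_size if page_size else 0.0


page = "<html>\n<head>\n<script>\nvar x = 1;\n</script>\n</head>\n<body>hi</body>\n</html>"
print(script_line_ratio(page))  # 0.375 (3 of 8 lines are inside <script>)
```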

$$ C_i = p_i' \times w_i'' \tag{6} $$

where $C_i$ is the score of the i-th factor of the target website, $w_i''$ is the final weight, and $p_i'$ is the i-th factor of the website under test:

$C_1 = 0.1248$

$C_2 = 0$

$C_3 = 0$

$C_4 = 0$

$C_5 = 0$

Step 3. Compare the computed score of each of the five factors with its threshold, and adjust the website's pages accordingly:

If the page file size score of the website under test is greater than its threshold, reduce the page file size to 50 KB; otherwise, display the result;

An oversized page downloads slowly. In addition, some search engines crawl only part of a page's content, so the page may not achieve the expected ranking.

If the score for the ratio of script code lines to total code lines is greater than its threshold, reduce the amount of script code in the page; otherwise, display the result;

Too many scripts interfere with search-engine crawlers' analysis of the page content, effectively lowering the keyword density and affecting how weight is distributed across the page.

If the score for the match between the website name and the page keywords is greater than its threshold, optimize the website keywords; otherwise, display the result;

If the website name does not match the page keywords, the website's relevance in searches for those keywords is reduced.

If the score for the ratio of dynamic picture file size to page file size is greater than its threshold, reduce the number of dynamic picture files; otherwise, display the result;

Search engines cannot parse the content of dynamic pictures well, so part of the website's content is not captured by the search engine.

If the score for the ratio of animation file size to page size is greater than its threshold, reduce the number of animation files; otherwise, display the result.

Search engines handle Flash poorly and cannot find the links hidden in it, so Flash should be avoided. Although Google has begun indexing content inside Flash, important links such as the main navigation must never be built in Flash; Flash is also not intuitive and downloads slowly, which makes it unfriendly to both search engines and users.

Solutions are provided for the score of each factor and fed back to the user, who then modifies the website according to these suggestions.
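The Step 3 comparison can be sketched as a simple rule table. This is an illustrative sketch: the action strings paraphrase the rules above, the function name is hypothetical, and the thresholds are those recomputed from formula (5). With the embodiment's scores, only the page-size factor exceeds its threshold ($0.1248 > 0.10608$).

```python
from typing import List, Sequence

# Step 3 actions, in factor order (1 page size ... 5 animation ratio).
ACTIONS = [
    "reduce the page file size (the embodiment targets 50 KB)",
    "reduce the amount of script code in the page",
    "optimize the website keywords",
    "reduce the number of dynamic picture files",
    "reduce the number of animation files",
]


def step3_suggestions(scores: Sequence[float], thresholds: Sequence[float]) -> List[str]:
    """Return the action for every factor whose score C_i exceeds its threshold S_i.

    An empty list corresponds to the "otherwise, display the result" branch.
    """
    return [action for action, c, s in zip(ACTIONS, scores, thresholds) if c > s]


C = [0.1248, 0.0, 0.0, 0.0, 0.0]                 # scores of the tested site, formula (6)
S = [0.10608, 0.073125, 0.2002, 0.0327, 5.1456]  # thresholds from formula (5)
print(step3_suggestions(C, S))
# ['reduce the page file size (the embodiment targets 50 KB)']
```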

Claims (4)

1. A method for improving the search rate of a website, characterized in that it comprises the following steps:
Step 1, determining the factors that affect the search-engine quality of a website, including the page file size of the website under test, the ratio of script code lines to total code lines in the page, the degree of match between the name of the website under test and the keywords in the website pages, the ratio of the size of the dynamic picture files in the website to the website page file size, and the ratio of the size of the animation files in the website to the website page size, and determining iteratively the threshold of the score of each of the above five factors;
Step 1-1, the user setting, as required, the initial weights of the five factors of a first known website, namely the page file size, the ratio of script code lines to total code lines in the page, the degree of match between the name of the website under test and the keywords in the website pages, the ratio of the size of the dynamic picture files to the website page file size, and the ratio of the size of the animation files to the website page size, such that the initial weights of the five factors sum to 1;
Step 1-2, calculating the initial score of the first known website from the values of its five factors and their initial weights;
Step 1-3, calculating the initial score of a second known website from the values of its five factors and the initial weights;
Step 1-4, calculating the ratio of each factor's score to the total score of the first known website to obtain five new weights for the five factors respectively;
Step 1-5, recalculating the score of the second known website with the five new weights, and taking the difference between the recalculated score and the initial score of the second website;
Step 1-6, judging whether the difference obtained in Step 1-5 is approximately 0; if so, finalizing the weights, obtaining the threshold of the score of each of the five factors from the five factors of the first or second known website, and ending the iteration; otherwise, substituting the obtained weights into the first known website and repeating Steps 1-4 and 1-5;
Step 2, collecting, according to the URL of the website to be tested, the page file size of the website, the number of script code lines in the page, the total number of code lines, the name of the website under test, the keywords in the website pages, the size of the dynamic picture files in the website, and the size of the animation files in the website, and determining from these seven parameters the five factors of the website under test: the page file size, the ratio of script code lines to total code lines in the page, the degree of match between the name of the website under test and the keywords in the website pages, the ratio of the size of the dynamic picture files to the website page file size, and the ratio of the size of the animation files to the website page size;
Step 3, comparing the calculated scores of the five factors with their respective thresholds to adjust the website and its web pages:
if the page file size score of the website under test is greater than the threshold, reducing the page file size; otherwise, displaying the result;
if the score for the ratio of script code lines to total code lines in the page of the website under test is greater than the threshold, reducing the amount of script code in the page; otherwise, displaying the result;
if the score for the degree of match between the name of the website under test and the keywords in the website pages is greater than the threshold, optimizing the website keywords; otherwise, displaying the result;
if the score for the ratio of the size of the dynamic picture files in the website under test to the website page file size is greater than the threshold, reducing the number of dynamic picture files; otherwise, displaying the result;
if the score for the ratio of the size of the animation files in the website under test to the website page size is greater than the threshold, reducing the number of animation files; otherwise, displaying the result.
The method of the searched rate in raising website the most according to claim 1, it is characterised in that: described in step 1-2 according to this net The numerical value of five factors of standing and every initial weight thereof, calculate website and initialize score, and formula is as follows:
$$a = \sum_{i=1}^{5} \left( p_{1i} \times w_i \right) \qquad (1)$$
where $a$ is the score computed with the initial weights, $p_{1i}$ is the computed value of the $i$-th influence factor of the first known website, and $w_i$ is the weight corresponding to that influence factor.
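Equation (1) is a plain weighted sum of the five factor values. A minimal sketch with the hypothetical helper name `weighted_score`; called with the second known website's factor values, the same function yields $b$ of equation (2).

```python
def weighted_score(factors, weights):
    """Equation (1): a = sum over i of p_{1i} * w_i."""
    if len(factors) != len(weights):
        raise ValueError("factors and weights must have the same length")
    return sum(p * w for p, w in zip(factors, weights))
```

For instance, `weighted_score([0.8, 0.25, 0.6, 0.5, 0.2], [0.2] * 5)` returns approximately 0.47 (0.16 + 0.05 + 0.12 + 0.10 + 0.04).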
The method for improving the search rate of a website according to claim 1, characterized in that, in step 1-3, an initial score of the second known website is calculated from the values of its five factors and the initial weights, using the following formula:
$$b = \sum_{i=1}^{5} \left( p_{2i} \times w_i \right) \qquad (2)$$
where $b$ is the detection score of the second known website, $p_{2i}$ is the computed value of the $i$-th influence factor of the second known website, and $w_i$ is the weight corresponding to that influence factor.
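As a worked illustration of equation (2), take hypothetical factor values $p_2 = (0.4, 0.5, 0.5, 0.25, 0.1)$ and equal initial weights $w_i = 0.2$ (the claims do not state the initial weights):

$$b = 0.2 \times (0.4 + 0.5 + 0.5 + 0.25 + 0.1) = 0.2 \times 1.75 = 0.35$$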
The method for improving the search rate of a website according to claim 1, characterized in that, in step 1-4, five new weights for the five factors are obtained by taking the ratio of each factor's score to the total score of the first known website, using the following formula:
$$w'_i = \frac{p_{1i} \times w_i}{\sum_{j=1}^{5} \left( p_{1j} \times w_j \right)} \qquad (3)$$
where $w'_i$ is the adjusted weight corresponding to the $i$-th influence factor $p_{1i}$, and $w_i$ is the weight corresponding to that influence factor before adjustment.
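Equation (3) replaces each weight with that factor's share of the first known website's total score, so the new weights always sum to 1. A minimal sketch of one such update, using the hypothetical name `reweight`; the iteration that repeats this update across steps 1-2 to 1-4 is not shown.

```python
def reweight(factors, weights):
    """Equation (3): w'_i = p_{1i} * w_i / sum over j of p_{1j} * w_j."""
    if len(factors) != len(weights):
        raise ValueError("factors and weights must have the same length")
    contributions = [p * w for p, w in zip(factors, weights)]
    total = sum(contributions)  # equals the score a of equation (1)
    if total == 0:
        raise ValueError("total score is zero; weights cannot be normalized")
    return [c / total for c in contributions]
```

For example, `reweight([1.0, 0.5, 0.25, 0.5, 0.75], [0.2] * 5)` gives approximately `[1/3, 1/6, 1/12, 1/6, 1/4]`: the contributions 0.2, 0.1, 0.05, 0.1 and 0.15 are each divided by their total of 0.6.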
CN201310330651.2A 2013-07-31 2013-07-31 A kind of method improving the searched rate in website Expired - Fee Related CN103399918B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310330651.2A CN103399918B (en) 2013-07-31 2013-07-31 A kind of method improving the searched rate in website

Publications (2)

Publication Number Publication Date
CN103399918A CN103399918A (en) 2013-11-20
CN103399918B true CN103399918B (en) 2016-08-17

Family

ID=49563546

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156033B (en) * 2015-03-25 2019-09-03 Alibaba Group Holding Ltd. A kind of search engine optimization SEO page generation method and equipment
CN107229631B (en) * 2016-03-24 2020-11-03 Beijing Jingdong Shangke Information Technology Co., Ltd. Method and device for capturing website data

Citations (1)

Publication number Priority date Publication date Assignee Title
CN102968447A (en) * 2012-10-24 2013-03-13 Xi'an Polytechnic University SEO (search engine optimization) keyword competition level computing method based on decision tree algorithm

Non-Patent Citations (3)

Title
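Analysis of SEO strategies based on Google ranking factors; Zhang Huan, Xia Yuan, Qi Xiangnan; Modern Information (现代情报); 2011-11-15; Vol. 31, No. 11; full text *
Factors influencing website ranking in search engines; Wang Yu; China Science and Technology Information (中国科技信息); 2007-02-01; No. 3; full text *
Research and application of search engine technology; Fan Junjian; China Master's Theses Full-text Database, Information Science and Technology series (中国优秀硕士学位论文全文数据库信息科技辑); 2013-07-15; No. 7; full text *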

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160817