Summary of the invention
To address the deficiencies of the prior art, the invention provides a search data quality statistics method. The quality of the search information and data captured and indexed by a crawler is quantified and expressed as index items. The search information and data captured and indexed by the crawler are subjected to quality-impact data testing and deduplication, to obtain the index items that have a larger influence on the judgment of overall search data quality.
The obtained index items are then classified according to their index attributes:
Comprehensiveness factor index: "average number of news widget data items pushed to the front end";
Accuracy factor index: "dead-link ratio of page links" ....;
Intelligence factor index: "unique visitors of the visited page";
Interactivity factor index: "ratio of the total number of interactive widget (Q&A widget, rating widget, voting widget) columns to the total number of SRP columns";
Aesthetics factor index: "dead-link ratio of picture (including video thumbnail) links".
A base score is set for each single index item, for each category of index items, and for the whole set of index items. A score weight formula is likewise set for each single index item, for each category of index items, and for the whole set of index items. Data are then pushed and feedback data are obtained for each single index item; the preset index weight formulas and the preset index base scores are combined in calculation to obtain the single-item index scores, the category index scores, and the overall index score. Finally, the search data quality is graded from the calculated results according to predetermined judgment criteria.
The object of the invention is achieved by the following technical solution:
A search data quality statistics method, the improvement of which is that the method comprises:
(1) subjecting the search information and data captured and indexed by a crawler to quality data testing and deduplication, to obtain the index items that have a larger influence on the judgment of overall search data quality;
(2) classifying the obtained index items according to their index attributes;
(3) setting base scores for the index items and setting index item score weight formulas;
(4) obtaining the single-item index scores, the category index scores, and the overall index score;
(5) grading the search data quality from the calculated results according to predetermined judgment criteria.
Preferably, the classification in step (2) comprises five major classes: the comprehensiveness factor index, the aesthetics factor index, the intelligence factor index, the accuracy factor index, and the interactivity factor index.
Preferably, in step (4), the index weight multiplied by the base score of the single index item gives the final index item score.
Preferably, the base score of a single index item is one of four tiers: 10 points, 20 points, 30 points, or 40 points.
Preferably, the base score of a single index item has a minimum of 10 points and a maximum of 40 points; it varies with the importance of that single index item within the overall quality evaluation system and may be changed at any time.
Preferably, the base score of a category of index items has a minimum of 10 points but no maximum; it varies as single index items are added to or removed from the category, or as the base scores of the single index items in the category are raised or lowered, and may be changed at any time.
Preferably, the base score of the whole set of index items has a minimum of 10 points but no maximum; it varies as single index items are added to or removed from the whole set, or as the base scores of the single index items are raised or lowered, and may be changed at any time.
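By way of non-limiting illustration, the relationship between base scores, index weights, and scores described above may be sketched in Python as follows. The names `IndexItem`, `category_scores`, and `overall_score` are hypothetical and do not appear in the described method. Obtaining a category score by summing the scores of its single index items is an assumption made for this sketch.

```python
from dataclasses import dataclass

# The four base-score tiers named in the text for a single index item.
ALLOWED_BASE_SCORES = (10, 20, 30, 40)


@dataclass(frozen=True)
class IndexItem:
    """One single index item (hypothetical structure for illustration)."""
    name: str
    category: str
    base_score: int   # must be one of the four tiers
    weight: float     # index weight as a fraction, 0.0 to 1.0

    def __post_init__(self):
        if self.base_score not in ALLOWED_BASE_SCORES:
            raise ValueError(f"base score must be one of {ALLOWED_BASE_SCORES}")
        if not 0.0 <= self.weight <= 1.0:
            raise ValueError("weight must lie between 0.0 and 1.0")

    @property
    def score(self) -> float:
        # Final index item score = index weight x single-item base score.
        return self.weight * self.base_score


def category_scores(items):
    """Sum single-item scores per category (summation is an assumption)."""
    totals = {}
    for item in items:
        totals[item.category] = totals.get(item.category, 0.0) + item.score
    return totals


def overall_score(items):
    """Overall score as the sum of all single-item scores."""
    return sum(item.score for item in items)
```

Keeping the base score and the weight as separate fields reflects the statement that base scores may be changed at any time without changing the weight formulas.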
Preferably, step (5) comprises:
if the sum of all single-item scores is greater than or equal to 0 but less than 60% of the minimum single-item base score, the result is defined as a rejection warning;
if the sum of all single-item scores is greater than or equal to 60% of the minimum single-item base score but less than the minimum single-item base score, the result is defined as an improvement prompt;
if the sum of all single-item scores is greater than or equal to the minimum single-item base score but less than the maximum single-item base score, the result is defined as acceptable;
if the sum of all single-item scores is greater than or equal to the maximum single-item base score, the result is defined as excellent quality.
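The four grading bands of step (5) may be sketched, purely as an illustration, by the following Python function. The function name `quality_grade` and its parameter names are hypothetical. The sketch assumes that the third band is bounded above by the maximum base score, since the bands would otherwise overlap.

```python
def quality_grade(total_score: float, min_base: float, max_base: float) -> str:
    """Map the sum of all single-item scores to one of the four grades.

    min_base and max_base stand for the minimum and maximum base scores
    used as thresholds in step (5); both names are illustrative.
    """
    if total_score < 0:
        raise ValueError("total score must be non-negative")
    if min_base > max_base:
        raise ValueError("min_base must not exceed max_base")

    if total_score < 0.6 * min_base:
        return "rejection warning"
    if total_score < min_base:
        return "improvement prompt"
    if total_score < max_base:   # assumed upper bound of the third band
        return "acceptable"
    return "excellent quality"
```

Testing the bands in ascending order means each branch only needs to check its upper bound.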
Compared with the prior art, the beneficial effects of the invention are as follows:
The invention makes it possible to quantify the quality of a keyword concretely and to score the keyword from different angles, so that the keyword can be improved and optimized in a more targeted way, raising keyword quality and giving users a better experience.
The data produced by the algorithm of the invention are credible. Applied to the Zhongsou comprehensive search (http://www.zhongsou.com), the algorithm has cumulatively identified tens of thousands of search results of unsatisfactory quality, and has played an irreplaceable role in improving the search data quality of the Zhongsou comprehensive search (http://www.zhongsou.com).
Detailed description of the embodiments
The specific embodiments of the invention are described in further detail below with reference to the accompanying drawings.
The comprehensiveness factor index, the intelligence factor index, and the interactivity factor index are taken as examples:
1. Comprehensiveness factor index: the category total is 50 points, divided into 4 single index items.
1) Average number of news widget data items pushed to the front end
The period is 72 hours and the base score of the single index item is 20 points. The timeliness of pushing is used to judge whether the search results are complete and comprehensive. The calculation formula is Yc = YC / AA, where YC = total number of news widget data items pushed to the front end within 72 hours, and AA = total number of news widget columns. If 0 ≤ Yc < 1, the index weight is 0; if 1 ≤ Yc < 2, the index weight is 60%; if 2 ≤ Yc < 3, the index weight is 80%; if Yc ≥ 3, the index weight is 100%. The index weight multiplied by the base score of the single index item gives the final score.
2) Average number of web page widget data items pushed to the front end
The period is 72 hours and the base score of the single index item is 20 points. The timeliness of pushing is used to judge whether the search results are complete and comprehensive. The calculation formula is Zc = ZC / BB, where ZC = total number of web page widget data items pushed to the front end within 72 hours, and BB = total number of web page widget columns. If 0 ≤ Zc < 1, the index weight is 0; if 1 ≤ Zc < 2, the index weight is 60%; if 2 ≤ Zc < 3, the index weight is 80%; if Zc ≥ 3, the index weight is 100%. The index weight multiplied by the base score of the single index item gives the final score.
3) Average number of picture widget data items pushed to the front end
The period is 72 hours and the base score of the single index item is 10 points. The timeliness of pushing is used to judge whether the search results are complete and comprehensive. The calculation formula is Xc = XC / CC, where XC = total number of picture widget data items pushed to the front end within 72 hours, and CC = total number of picture widget columns. If 0 ≤ Xc < 1, the index weight is 0; if 1 ≤ Xc < 2, the index weight is 60%; if 2 ≤ Xc < 3, the index weight is 80%; if Xc ≥ 3, the index weight is 100%. The index weight multiplied by the base score of the single index item gives the final score.
4) Average number of picture widget data items pushed to the front end
The period is 72 hours and the base score of the single index item is 10 points. The timeliness of pushing is used to judge whether the search results are complete and comprehensive. The calculation formula is Rc = RC / DD, where RC = total number of picture widget data items pushed to the front end within 72 hours, and DD = total number of picture widget columns. If 0 ≤ Rc < 1, the index weight is 0; if 1 ≤ Rc < 2, the index weight is 60%; if 2 ≤ Rc < 3, the index weight is 80%; if Rc ≥ 3, the index weight is 100%. The index weight multiplied by the base score of the single index item gives the final score.
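The push-average index items above share one calculation: a quotient of pushed items over column count, mapped to a weight by fixed thresholds at 1, 2, and 3. A minimal Python sketch of that calculation follows. The function and parameter names are hypothetical, and rejecting a column count of zero is an assumption, since the text does not say how that case is handled.

```python
def push_average(pushed_in_period: int, column_count: int) -> float:
    """Quotient such as Yc = YC / AA: items pushed in 72 hours per column."""
    if pushed_in_period < 0:
        raise ValueError("pushed_in_period must be non-negative")
    if column_count <= 0:
        # Assumption: the text does not define the case of zero columns.
        raise ValueError("column_count must be positive")
    return pushed_in_period / column_count


def push_weight(average: float) -> float:
    """Index weight for a push average, using the thresholds in the text."""
    if average < 1:
        return 0.0
    if average < 2:
        return 0.6
    if average < 3:
        return 0.8
    return 1.0


def push_score(pushed_in_period: int, column_count: int, base_score: int) -> float:
    """Final score = index weight x base score of the single index item."""
    return push_weight(push_average(pushed_in_period, column_count)) * base_score
```

For instance, 150 items pushed across 60 columns gives an average of 2.5, which falls in the 80% band; with a base score of 20 points the final score is 0.8 × 20.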
2. Intelligence factor index: the category total is 50 points, divided into 4 single index items.
1) Number of entry improvement prompts left outstanding
The period is 24 hours and the base score of the single index item is 20 points. The index value is defined as E, and the E mean is the average of the index totals on the statistics day for keywords in the same lowest-level category. If E < 0.6 × E mean, the index weight is 0; if 0.6 × E mean ≤ E < 0.8 × E mean, the index weight is 60%; if 0.8 × E mean ≤ E < 1 × E mean, the index weight is 80%; if E ≥ 1 × E mean, the index weight is 100%. The index weight multiplied by the base score of the single index item gives the final score.
2) Bounce rate
The period is 24 hours and the base score of the single index item is 10 points. This index item belongs to user behavior analysis. The calculation formula is N = PV / UV, where PV = number of searches on exit pages, and UV = unique visitors of the visited page. If N < 0.6 × N mean, the index weight is 0; if 0.6 × N mean ≤ N < 0.8 × N mean, the index weight is 60%; if 0.8 × N mean ≤ N < 1 × N mean, the index weight is 80%; if N ≥ 1 × N mean, the index weight is 100%. The index weight multiplied by the base score of the single index item gives the final score.
3) Unique visitors of the visited page
The period is 24 hours and the base score of the single index item is 10 points. The number of unique visitors of the visited page is defined as UV. If UV < 0.6 × UV mean, the index weight is 0; if 0.6 × UV mean ≤ UV < 0.8 × UV mean, the index weight is 60%; if 0.8 × UV mean ≤ UV < 1 × UV mean, the index weight is 80%; if UV ≥ 1 × UV mean, the index weight is 100%. The index weight multiplied by the base score of the single index item gives the final score.
4) New unique visitors of the visited page
The period is 24 hours and the base score of the single index item is 10 points. The number of new unique visitors of the visited page is defined as UV (NEW). If UV (NEW) < 0.6 × UV (NEW) mean, the index weight is 0; if 0.6 × UV (NEW) mean ≤ UV (NEW) < 0.8 × UV (NEW) mean, the index weight is 60%; if 0.8 × UV (NEW) mean ≤ UV (NEW) < 1 × UV (NEW) mean, the index weight is 80%; if UV (NEW) ≥ 1 × UV (NEW) mean, the index weight is 100%. The index weight multiplied by the base score of the single index item gives the final score.
3. Interactivity factor index: the category total is 20 points, divided into 1 single index item.
1) Ratio of the total number of interactive widget (Q&A widget, rating widget, voting widget) columns to the total number of SRP columns
The period is 24 hours and the base score of the single index item is 10 points. The calculation formula is T = T1 / T0, where T1 = total number of interactive widget columns, and T0 = total number of SRP widget columns. If T < 0.6 × T mean, the index weight is 0; if 0.6 × T mean ≤ T < 0.8 × T mean, the index weight is 60%; if 0.8 × T mean ≤ T < 1 × T mean, the index weight is 80%; if T ≥ 1 × T mean, the index weight is 100%. The index weight multiplied by the base score of the single index item gives the final score.
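The index items with a 24-hour period compare a measured value against a mean, with bands at 0.6, 0.8, and 1 times that mean. The following Python sketch illustrates that comparison. The function names are hypothetical, and computing the mean with `statistics.mean` over a list of peer values is an assumption about how the mean is obtained.

```python
from statistics import mean


def relative_weight(value: float, peer_mean: float) -> float:
    """Index weight for a value compared with its mean, per the bands in the text."""
    if value < 0 or peer_mean < 0:
        raise ValueError("value and peer_mean must be non-negative")
    if value < 0.6 * peer_mean:
        return 0.0
    if value < 0.8 * peer_mean:
        return 0.6
    if value < 1.0 * peer_mean:
        return 0.8
    return 1.0


def relative_score(value: float, peer_values, base_score: int) -> float:
    """Final score = index weight x base score of the single index item.

    peer_values is a hypothetical list of the values from which the mean
    is taken, for example those of keywords in the same category.
    """
    return relative_weight(value, mean(peer_values)) * base_score
```

The same function serves E, N, UV, UV (NEW), and T, since only the measured quantity differs between these index items.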
As shown in Figure 2, the sum of all single-item index scores is the overall score, which is judged as follows:
1. If the sum of all single-item scores is greater than or equal to 0 but less than 60% of the minimum single-item base score, the result is defined as a rejection warning. This is a serious negative judgment of the existing search result quality: the status quo cannot be maintained and the results must be taken offline.
2. If the sum of all single-item scores is greater than or equal to 60% of the minimum single-item base score but less than the minimum single-item base score, the result is defined as an improvement prompt. This is a mild negative judgment of the existing search result quality: the status quo can be maintained, but optimization is required.
3. If the sum of all single-item scores is greater than or equal to the minimum single-item base score but less than the maximum single-item base score, the result is defined as acceptable. This is a basic affirmation of the existing search result quality: the status quo can be maintained and no optimization is needed.
4. If the sum of all single-item scores is greater than or equal to the maximum single-item base score, the result is defined as excellent quality. This is a full affirmation of the existing search result quality: the status quo can be maintained and the results must be recommended.
Example
On August 1, 2013, this search data quality statistics algorithm was applied to the Zhongsou comprehensive search (http://www.zhongsou.com). After nearly 2 months of repeated verification against the resulting data, the data produced by the algorithm proved credible. For the Zhongsou comprehensive search (http://www.zhongsou.com), the algorithm has cumulatively identified tens of thousands of search results of unsatisfactory quality, and has played an irreplaceable role in improving the search data quality of the Zhongsou comprehensive search (http://www.zhongsou.com).
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the invention and not to limit it. Although the invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the specific embodiments of the invention may still be modified or equivalently replaced, and that any modification or equivalent replacement that does not depart from the spirit and scope of the invention shall fall within the scope of the claims of the invention.