CN103279529A

CN103279529A - Unstructured data retrieval method and system

Info

Publication number: CN103279529A
Application number: CN2013102105709A
Authority: CN
Inventors: 鄂海红; 宋美娜; 韩晶; 许可; 宋俊德; 黎燕; 毕建鹏
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2013-05-30
Filing date: 2013-05-30
Publication date: 2013-09-04

Abstract

The invention provides an unstructured data retrieval method comprising the following steps: collecting user behavior data; periodically processing the user behavior data so as to merge the task attributes of the user behavior data within a preset time period into a task list; obtaining a plurality of search results by keyword search according to a user's search request; calculating a task score, an access-count score and an editing-duration score for each search result, where the task score is the similarity between the task attribute of the search result and the task attributes in the task list; calculating the data heat of each search result from the task score, the access-count score and the editing-duration score; and reordering the search results according to the calculated data heat. The method can improve the retrieval efficiency and retrieval accuracy of unstructured data. The invention further provides an unstructured data retrieval system.

Description

Unstructured data retrieval method and system

Technical field

The present invention relates to the field of computer technology, and in particular to an unstructured data retrieval method and system.

Background art

In the big-data era, enterprises have accumulated large amounts of unstructured data such as office documents, PDF files and videos. These data arise from the many operations employees perform in the course of their work: some are created by the employees themselves, some come from e-mail, and some are downloaded from the network. Finding a needed file among this accumulated unstructured data often takes repeated search attempts and considerable time. Many studies on search ranking methods have been proposed to improve the effectiveness of unstructured data retrieval: some establish links between files to improve the ranking, some record the user's search history to improve retrieval efficiency, and some let users add their own recollections of the target data to help retrieval. However, existing research essentially ranks search results under the premise that "all data are equal"; it does not consider how the relationship between data and user behavior affects the retrieval ranking, and the prior art rarely addresses the question of data importance.

The main existing approaches to ranking unstructured data retrieval results, in China and abroad, are: 1) mining the individual access history and access frequency of files and proposing an algorithm that creates new links between desktop resources; 2) a learning-based ranking algorithm, whose efficiency is much better than that of ranking algorithms based on basic file attributes; 3) a desktop search method based on user memory, which improves ranking efficiency by having the user supply recollections of the target file (such as the file name or last access time) at search time; 4) locating data resources based on tasks, without considering how tasks can be used during retrieval; 5) desktop resource location based on user activity analysis, which, in addition to automatically extracting user tasks, supports fuzzy search by mining links between desktop resources.

Existing solutions to unstructured data retrieval ranking focus mainly on the data itself: they do not consider how the relationship between data and user behavior affects the ranking, and the prior art rarely addresses data importance. In fact, however, every operation on data takes place within some task of the user. If the task context of the data is identified, and data whose task attribute is similar to the task behind the user's current search are ranked near the top of the search results, the user will clearly be better served. Moreover, while a user completes a job or task, some data are operated on frequently (for example, the requirements document of a project), while other data are operated on only a few times (for example, a technical article from the network); that is, different data belonging to the same task differ in importance.

Summary of the invention

The present invention aims to solve at least one of the technical problems described above.

To this end, one object of the present invention is to propose an unstructured data retrieval method that can improve the retrieval efficiency and retrieval accuracy of unstructured data.

Another object of the present invention is to propose an unstructured data retrieval system.

To achieve these objects, an embodiment of the first aspect of the present invention discloses an unstructured data retrieval method comprising the following steps: collecting user behavior data; periodically processing the user behavior data so as to merge the task attributes of the user behavior data within a predetermined time period into a task list; obtaining a plurality of search results by keyword search according to a user's search request; calculating a task score, an access-count score and an editing-duration score for each search result, where the task score is the similarity between the task attribute of the search result and the task attributes in the task list; calculating the data heat of the search results based on the task score, the access-count score and the editing-duration score; and reordering the search results according to the calculated data heat.

The unstructured data retrieval method according to the embodiment of the invention can improve the retrieval efficiency of unstructured data. Besides retaining the accuracy of keyword retrieval, it takes into account the task similarity of the unstructured data, its importance within the task, the keyword matching degree, the access count and the editing duration, which effectively improves retrieval accuracy and the match with the purpose of the search.

In addition, the unstructured data retrieval method according to the above embodiment may have the following additional technical features:

In some examples, the method further comprises the step of displaying the reordered search results.

In some examples, the user behavior data are obtained by analyzing the user's behavior log.

In some examples, the method further comprises the step of calculating an initial ranking score for each search result, where the data heat of the search results is calculated based on the task score, the access-count score, the editing-duration score and the initial ranking score.

In some examples, the data heat is calculated as:

$$\mathrm{heat\_score} = p \cdot \mathrm{taskScore} \cdot \left(t_1 + t_2 \cdot \mathrm{accessScore} + t_3 \cdot \mathrm{edittimeScore}\right) + q \cdot \mathrm{initScore}$$

where $p$, $q$, $t_1$, $t_2$ and $t_3$ are weights.

In some examples, $p:q:t_1:t_2:t_3 = 95:5:0.9:0.07:0.03$.

In some examples, the method further comprises adjusting the weights used in the data heat calculation according to the data type of each search result and the application scenario.

In some examples, the method further comprises the steps of clustering the search results and sorting the search results within each cluster separately.

In some examples, clustering the search results specifically comprises: obtaining the relevance between each search result and the task; and clustering the search results according to that relevance.

An embodiment of the second aspect of the present invention discloses an unstructured data retrieval system, comprising: an acquisition module for collecting user behavior data; a processing module for periodically processing the user behavior data so as to merge the task attributes of the user behavior data within a predetermined time period into a task list, for calculating a task score, an access-count score and an editing-duration score for each search result, where the task score is the similarity between the task attribute of the search result and the task attributes in the task list, and for calculating the data heat of the search results based on the task score, the access-count score and the editing-duration score; and a retrieval module for obtaining a plurality of search results by keyword search according to a user's search request and reordering the search results according to the calculated data heat.

The unstructured data retrieval system according to the embodiment of the invention can improve the retrieval efficiency of unstructured data. Besides retaining the accuracy of keyword retrieval, it takes into account the task similarity of the unstructured data, its importance within the task, the keyword matching degree, the access count and the editing duration, which effectively improves retrieval accuracy and the match with the purpose of the search.

In addition, the unstructured data retrieval system according to the above embodiment may have the following additional technical features:

In some examples, the system further comprises a display module for displaying the reordered search results.

In some examples, the processing module is further used to calculate an initial ranking score for each search result, where the data heat of the search results is calculated based on the task score, the access-count score, the editing-duration score and the initial ranking score.

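In some examples, the data heat is calculated as:

$$\mathrm{heat\_score} = p \cdot \mathrm{taskScore} \cdot \left(t_1 + t_2 \cdot \mathrm{accessScore} + t_3 \cdot \mathrm{edittimeScore}\right) + q \cdot \mathrm{initScore}$$

where $p$, $q$, $t_1$, $t_2$ and $t_3$ are weights.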

In some examples, $p:q:t_1:t_2:t_3 = 95:5:0.9:0.07:0.03$.

In some examples, the processing module is further used to adjust the weights used in the data heat calculation according to the data type of each search result and the application scenario.

In some examples, the system further comprises a clustering module for clustering the search results, and the retrieval module sorts the search results within each cluster separately.

In some examples, the clustering module clusters the search results by: obtaining the relevance between each search result and the task; and clustering the search results according to that relevance.

Additional aspects and advantages of the present invention will be given in part in the following description, and in part will become apparent from the following description or be learned through practice of the invention.

Brief description of the drawings

The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of embodiments in conjunction with the accompanying drawings, in which:

Fig. 1 is a flowchart of an unstructured data retrieval method according to an embodiment of the invention;

Fig. 2 is a structural diagram of an unstructured data retrieval system according to an embodiment of the invention; and

Fig. 3 is a schematic diagram of the retrieval process of an unstructured data retrieval system according to an embodiment of the invention.

Detailed description of the embodiments

Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, where the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present invention, and should not be construed as limiting it. On the contrary, the embodiments of the invention include all changes, modifications and equivalents falling within the spirit and scope of the appended claims.

In the description of the present invention, it should be understood that the terms "first", "second" and the like are used for descriptive purposes only and should not be construed as indicating or implying relative importance. It should also be noted that, unless otherwise expressly specified and limited, the terms "linked" and "connected" should be understood broadly: for example, a connection may be fixed, detachable or integral; mechanical or electrical; direct, or indirect through an intermediary. Those of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances. In addition, unless otherwise stated, "a plurality of" means two or more.

Any process or method described in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code comprising one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the invention belong.

The unstructured data retrieval method and system according to embodiments of the invention are described below with reference to the accompanying drawings.

Fig. 1 is a flowchart of an unstructured data retrieval method according to an embodiment of the invention. As shown in Fig. 1, the method comprises the following steps:

Step S101: collect user behavior data.

Here, user behavior data are the data generated by the user's operations on unstructured data, for example the behavior data of operations such as editing and browsing unstructured data.

In one embodiment of the invention, the user behavior data are obtained by analyzing the user's behavior log. Specifically, the behavior data of the user's operations on unstructured data are stored in log files, referred to here as the behavior log, and the user behavior data described above can be extracted by analyzing this log.
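The text does not specify the log format. As a minimal sketch, assuming a hypothetical tab-separated log line with the fields timestamp, file path, action and duration in seconds, the per-file access counts and editing durations used in later steps could be extracted as follows. `FileBehavior` and `parse_behavior_log` are illustrative names, not part of the patent.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FileBehavior:
    """Per-file user behavior data (hypothetical structure)."""
    access_count: int = 0
    edit_seconds: float = 0.0


def parse_behavior_log(lines):
    """Aggregate user behavior data from behavior-log lines.

    Assumed (hypothetical) format, tab-separated fields:
    timestamp, file path, action (open | browse | edit), duration in seconds.
    Every logged operation counts as one access; only "edit" lines add to
    the editing duration.
    """
    stats = defaultdict(FileBehavior)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        _timestamp, path, action, duration = line.split("\t")
        record = stats[path]
        record.access_count += 1
        if action == "edit":
            record.edit_seconds += float(duration)
    return dict(stats)


log = [
    "2013-05-28T09:00:00\t/projA/review.ppt\topen\t0",
    "2013-05-28T09:01:00\t/projA/review.ppt\tedit\t600",
    "2013-05-28T10:00:00\t/web/article.pdf\tbrowse\t120",
]
stats = parse_behavior_log(log)
print(stats["/projA/review.ppt"])  # FileBehavior(access_count=2, edit_seconds=600.0)
```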

Step S102: periodically process the user behavior data so as to merge the task attributes of the user behavior data within a predetermined time period into the task list.

Specifically, research in human dynamics shows that human behavior (for example, a user's operations on unstructured data) can be viewed as the processing of a series of tasks, with effort concentrated on a given task over a period of time. A user's operations are therefore closely related to the tasks the user has been performing recently, and the user behavior data can be regarded as related to one task or a series of tasks.

Accordingly, the user behavior data are processed periodically so as to merge the task attributes of the user behavior data within a predetermined time period into a predefined task list. In the above example, the predetermined time period is, for example but not limited to, 2 days: the union of the task attributes of the user interaction data (user behavior data) over those days is taken and written into the task list. For example, if user behavior data A are related to task B, task B is added to the task list.
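A minimal sketch of the periodic merge in Step S102, using the 2-day window from the example. It assumes each behavior record has already been linked to a set of task labels; the record shape and the name `tasks_in_window` are hypothetical. The text does not say whether stale tasks are ever removed from the task list, so this sketch only adds to it.

```python
from datetime import datetime, timedelta


def tasks_in_window(records, now, window=timedelta(days=2)):
    """Union of task attributes over behavior records inside the time window.

    records: iterable of (timestamp, task_labels) pairs; this shape is an
    assumption about how behavior data are linked to tasks.
    """
    cutoff = now - window
    merged = set()
    for ts, tasks in records:
        if cutoff <= ts <= now:
            merged |= set(tasks)  # set union of task attributes
    return merged


now = datetime(2013, 5, 30, 12, 0)
records = [
    (datetime(2013, 5, 29, 9, 0), {"projectA"}),
    (datetime(2013, 5, 30, 8, 0), {"projectA", "review"}),
    (datetime(2013, 5, 20, 8, 0), {"old-task"}),  # outside the 2-day window
]
task_list = {"weekly-report"}
task_list |= tasks_in_window(records, now)  # merge into the existing task list
print(sorted(task_list))  # ['projectA', 'review', 'weekly-report']
```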

It should be noted that the attributes of unstructured data need to be identified in advance. For example, an unstructured data galaxy model can be used to identify the features of unstructured data by their attributes, such as the task to which a file belongs (task attribute), the file access count and the file editing duration.

For example, the attributes of unstructured data are described with the unstructured data galaxy model. As shown in Table 1, an unstructured data item, denoted $f_i$, is defined by its attributes, which identify its features. Typical attributes include:

Table 1

Step S103: obtain a plurality of search results by keyword search according to the user's search request.

For example, a user working on the review PPT for project A wants to consult other PPTs produced during the project, and therefore searches with the keywords "project A; PPT", obtaining a plurality of search results, each of which is an unstructured data item.

Step S104: calculate the task score, the access-count score and the editing-duration score of each search result, where the task score is the similarity between the task attribute of the search result and the task attributes in the task list. The access-count score is derived, for example, from the number of times the user has accessed the unstructured data item of the search result, and the editing-duration score from the time the user has spent editing it.

Of course, in other examples of the invention, an initial ranking score can also be calculated for each search result, and the data heat of the search results calculated based on the task score, the access-count score, the editing-duration score and the initial ranking score. The initial ranking score can be generated from the rank each result received in the keyword search described above: for example, the higher a result is ranked, the higher its initial ranking score.
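The text only requires that a higher keyword-search rank gives a higher initial ranking score. One possible mapping, assumed here purely for illustration, is a linear decay from 1.0 for the first result to 1/n for the last; `init_scores` is a hypothetical helper.

```python
def init_scores(ranked_ids):
    """Initial ranking score (initScore) from the keyword-search order.

    The linear decay (first result 1.0, last result 1/n) is an assumption;
    any mapping that decreases with rank would satisfy the description.
    """
    n = len(ranked_ids)
    return {doc: (n - rank) / n for rank, doc in enumerate(ranked_ids)}


print(init_scores(["a.ppt", "b.pdf", "c.doc", "d.doc"]))
# {'a.ppt': 1.0, 'b.pdf': 0.75, 'c.doc': 0.5, 'd.doc': 0.25}
```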

Step S105: calculate the data heat of the search results based on the task score, the access-count score and the editing-duration score.

Specifically, the data heat (data heat score) of a file (for example, an unstructured data item) represents the importance of that data within the task to which it belongs, and is calculated comprehensively from the file's access count, editing duration and task matching degree.

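For example, let $\mathrm{sim}(\mathrm{fileTask}, \mathrm{recentTask})$ denote the similarity between the task attribute vector $\mathrm{fileTask}$ of file $f_i$ and the recent task vector $\mathrm{recentTask}$. Then

$$\mathrm{taskScore} = \mathrm{sim}(\mathrm{fileTask}, \mathrm{recentTask}) \tag{1}$$

Let the access count of file $f_i$ be $a_i$, and let $A = \{a_j \mid 0 < j < n,\ f_i.\mathrm{taskScore} = f_j.\mathrm{taskScore}\}$. Then

$$\mathrm{accessScore} = \frac{a_i}{2\,\max A} \tag{2}$$

Let the editing duration of file $f_i$ be $et_i$, and let $ET = \{et_j \mid 0 < j < n,\ f_i.\mathrm{taskScore} = f_j.\mathrm{taskScore}\}$. Then

$$\mathrm{edittimeScore} = \frac{et_i}{2\,\max ET} \tag{3}$$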
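Equations (2) and (3) can be sketched as below. This reads "2Max_A" as twice the maximum of A (and likewise for ET), which keeps both scores in [0, 0.5], and groups files by exactly equal taskScore as the set definitions state. The input dictionary shape and the name `normalized_scores` are assumptions.

```python
from collections import defaultdict


def normalized_scores(files):
    """accessScore and edittimeScore per eqs. (2) and (3).

    files: dict file_id -> (taskScore, access_count, edit_seconds).
    Each value is divided by twice the maximum among files with the same
    taskScore, so both scores fall in [0, 0.5].
    """
    max_access = defaultdict(float)
    max_edit = defaultdict(float)
    for task, access, edit in files.values():
        max_access[task] = max(max_access[task], access)
        max_edit[task] = max(max_edit[task], edit)
    scores = {}
    for fid, (task, access, edit) in files.items():
        a_max, e_max = max_access[task], max_edit[task]
        # A zero maximum means no file in the group was accessed/edited.
        scores[fid] = (access / (2 * a_max) if a_max else 0.0,
                       edit / (2 * e_max) if e_max else 0.0)
    return scores


files = {"f1": (0.9, 10, 300.0), "f2": (0.9, 5, 600.0), "f3": (0.2, 40, 0.0)}
print(normalized_scores(files))
# {'f1': (0.5, 0.25), 'f2': (0.25, 0.5), 'f3': (0.5, 0.0)}
```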

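Combining these scores, the data heat is calculated as:

$$\mathrm{heat\_score} = p \cdot \mathrm{taskScore} \cdot \left(t_1 + t_2 \cdot \mathrm{accessScore} + t_3 \cdot \mathrm{edittimeScore}\right) + q \cdot \mathrm{initScore} \tag{4}$$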

where $p$, $q$, $t_1$, $t_2$ and $t_3$ are weights; in one embodiment of the invention, $p:q:t_1:t_2:t_3 = 95:5:0.9:0.07:0.03$.
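Formula (4) with the embodiment's weights is sketched below. Reading the 95:5 ratio as p = 0.95 and q = 0.05 is an assumption; because heat_score is used only for ordering, scaling p and q by the same positive factor leaves the ranking unchanged. `heat_score` is a hypothetical helper name.

```python
def heat_score(task_score, access_score, edittime_score, init_score,
               p=0.95, q=0.05, t1=0.9, t2=0.07, t3=0.03):
    """Data heat per eq. (4); default weights follow the embodiment's ratios."""
    return p * task_score * (t1 + t2 * access_score + t3 * edittime_score) + q * init_score


# Perfect task match, most-used file in its group, top keyword-search rank:
print(round(heat_score(1.0, 0.5, 0.5, 1.0), 4))  # 0.9525
# No task match: only the q * initScore term remains.
print(heat_score(0.0, 0.5, 0.5, 1.0))  # 0.05
```

Since accessScore and edittimeScore each lie in [0, 0.5] under eqs. (2) and (3), the bracketed factor stays between 0.9 and 0.95, so taskScore dominates and the behavior scores act as a fine adjustment among files of similar task relevance.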

In addition, when the data heat of different types of unstructured data is calculated with the above formula, the weights of attribute scores such as the access count, editing duration and task matching degree can be adjusted according to the data characteristics and the application scenario; that is, the weights used in the data heat calculation are adjusted according to the data type of each search result and the application scenario, so as to achieve the best ranking.
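The text gives no concrete per-type weights. A minimal configuration sketch, assuming a default weight set taken from the embodiment (read as p = 0.95, q = 0.05) and a purely hypothetical override for video files, could look like this; `DEFAULT_WEIGHTS`, `TYPE_OVERRIDES` and `weights_for` are illustrative names.

```python
DEFAULT_WEIGHTS = {"p": 0.95, "q": 0.05, "t1": 0.9, "t2": 0.07, "t3": 0.03}

# Hypothetical per-type overrides; the patent gives no concrete values.
TYPE_OVERRIDES = {
    # Videos are mostly viewed rather than edited, so this example shifts
    # weight from editing duration to access count (t1 + t2 + t3 is still 1).
    "video": {"t2": 0.09, "t3": 0.01},
}


def weights_for(file_type):
    """Return the weight set for a data type, falling back to the defaults."""
    weights = dict(DEFAULT_WEIGHTS)  # copy so the defaults are never mutated
    weights.update(TYPE_OVERRIDES.get(file_type, {}))
    return weights


print(weights_for("video"))
```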

As a concrete example, to determine how relevant each file in the search results is to the tasks the user has recently been performing, the user log needs to be recorded and analyzed so as to calculate the recent task vector.

The set F of files the user has accessed recently is computed, and the recent task vector recentTask = (rtask1, rtask2, ...) is built from the task attributes of the files in F.
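A sketch of building recentTask from the set F, assuming the access log is available as (timestamp, file_id) pairs and each file's task attribute as a list of labels. Representing recentTask as a Counter of label frequencies, and reusing a 2-day window as the meaning of "recently", are both assumptions; `recent_task_vector` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timedelta


def recent_task_vector(access_log, file_tasks, now, window=timedelta(days=2)):
    """Build recentTask from the task attributes of recently accessed files.

    access_log: iterable of (timestamp, file_id) pairs (assumed shape).
    file_tasks: dict mapping file_id -> list of task labels (its task attribute).
    Returns a Counter: task label -> number of files in F carrying that label.
    """
    cutoff = now - window
    recent_files = {fid for ts, fid in access_log if cutoff <= ts <= now}  # the set F
    vector = Counter()
    for fid in recent_files:
        vector.update(file_tasks.get(fid, []))
    return vector


now = datetime(2013, 5, 30, 12, 0)
access_log = [
    (datetime(2013, 5, 30, 9, 0), "f1"),
    (datetime(2013, 5, 30, 10, 0), "f1"),  # repeated access: f1 is in F once
    (datetime(2013, 5, 29, 15, 0), "f2"),
    (datetime(2013, 5, 1, 8, 0), "f3"),    # too old to be in F
]
file_tasks = {"f1": ["projectA"], "f2": ["projectA", "review"], "f3": ["old-task"]}
print(recent_task_vector(access_log, file_tasks, now))
```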

After the user submits a query, the query keywords are extracted and denoted as the vector userQuery = (keyw1, keyw2, ...), where keyw1 and keyw2 are query keywords. After userQuery is submitted to a search engine (for example, the search engine built into Windows) for retrieval, the initial result set InitF is returned. Each result file has a task attribute, which can be written as the vector fileTask = (ftask1, ftask2, ...), where ftask1 and ftask2 are the labels of the tasks the file belongs to. The data heat is then calculated comprehensively from the task score taskScore, the access-count score accessScore and the editing-duration score edittimeScore.
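The patent does not define sim in eq. (1). As one assumed choice, cosine similarity over task-label counts gives a taskScore in [0, 1]; `task_score` is a hypothetical helper that accepts either a list of labels or a Counter for fileTask and recentTask.

```python
import math
from collections import Counter


def task_score(file_task, recent_task):
    """taskScore = sim(fileTask, recentTask), eq. (1).

    Cosine similarity over task-label counts is an assumed choice of sim;
    both arguments are iterables of labels (repeats allowed) or Counters.
    """
    a, b = Counter(file_task), Counter(recent_task)
    dot = sum(count * b[label] for label, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0  # an empty vector on either side -> no match


recent = ["projectA", "projectA", "review"]  # recentTask built from the set F
print(round(task_score(["projectA"], recent), 3))  # 0.894
print(task_score(["projectB"], recent))            # 0.0
```

A file whose only task is projectA scores about 0.894 against a recent vector dominated by projectA, while a file from an unrelated project scores 0.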

Step S106: reorder the search results according to the calculated data heat. For example, search results with higher data heat scores can be ranked higher.
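Step S106 amounts to a descending sort on heat_score. A minimal sketch, where `rerank`, `init_f` and `heat` are illustrative names; keeping the keyword-search order for ties is a choice of this sketch, not something the text specifies.

```python
def rerank(initial_results, heat):
    """Order results by descending data heat.

    sorted() is stable even with reverse=True, so results with equal heat
    keep their original keyword-search order.
    """
    return sorted(initial_results, key=lambda doc: heat[doc], reverse=True)


init_f = ["b.pdf", "a.ppt", "c.doc"]  # InitF from the search engine
heat = {"a.ppt": 0.95, "b.pdf": 0.05, "c.doc": 0.05}
print(rerank(init_f, heat))  # ['a.ppt', 'b.pdf', 'c.doc']
```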

In addition, because the access count and editing duration used in the data heat calculation are numeric values with a wide range between minimum and maximum, excessively large or small attribute values can have an undue influence on the ranking score. For example, log files generated by software have very large access counts but usually have little to do with the user's tasks.

Therefore, embodiments of the invention can first cluster the search results and then sort the results within each cluster separately. Specifically, clustering the search results comprises: obtaining the relevance between each search result and the task; and clustering the search results according to that relevance. For example, using an existing clustering method, the result files are first divided into 3 grades according to their task relevance scores; the maxima of A and ET are then taken over the results in the same task grade; and finally the access-count score and editing-duration score of each result are calculated.
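The text refers to "an existing clustering method" without naming one. The sketch below substitutes simple equal-width binning of taskScore into 3 grades (an assumption), then takes the maxima of A and ET within each grade before applying eqs. (2) and (3). The results within each grade can then be sorted by heat. `task_grade` and `grade_and_normalize` are hypothetical names.

```python
from collections import defaultdict


def task_grade(task_score, levels=3):
    """Bucket a taskScore in [0, 1] into one of `levels` equal-width grades."""
    return min(int(task_score * levels), levels - 1)  # 1.0 falls in the top grade


def grade_and_normalize(files, levels=3):
    """Cluster results by task grade, then compute eqs. (2) and (3) per grade.

    files: dict file_id -> (taskScore, access_count, edit_seconds).
    Returns dict grade -> {file_id: (accessScore, edittimeScore)}; the maxima
    of A and ET are taken only over results in the same grade.
    """
    clusters = defaultdict(dict)
    for fid, (task, access, edit) in files.items():
        clusters[task_grade(task, levels)][fid] = (access, edit)
    result = {}
    for grade, members in clusters.items():
        a_max = max(a for a, _ in members.values())
        e_max = max(e for _, e in members.values())
        result[grade] = {
            fid: (a / (2 * a_max) if a_max else 0.0,
                  e / (2 * e_max) if e_max else 0.0)
            for fid, (a, e) in members.items()
        }
    return result


files = {
    "app.log": (0.1, 5000, 0.0),  # software log: huge access count, weak task match
    "spec.doc": (0.95, 40, 3600.0),
    "draft.ppt": (0.8, 10, 7200.0),
}
print(grade_and_normalize(files))
# {0: {'app.log': (0.5, 0.0)}, 2: {'spec.doc': (0.5, 0.25), 'draft.ppt': (0.125, 0.5)}}
```

Because the software log sits in its own grade, its 5000 accesses no longer compress the access scores of the task-relevant files.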

Further, after reordering, the method may also comprise the step of displaying the reordered search results, making them convenient for the user to view.

Fig. 2 is a structural diagram of an unstructured data retrieval system according to an embodiment of the invention. As shown in Fig. 2, the system comprises an acquisition module 210, a processing module 220 and a retrieval module 230.

Specifically, with reference to the retrieval flow of the system shown in Fig. 3, the acquisition module 210 is used to collect user behavior data. Here, user behavior data are the data generated by the user's operations on unstructured data, for example the behavior data of operations such as editing and browsing unstructured data.

In one embodiment of the invention, the user behavior data are obtained by analyzing the user's behavior log. Specifically, the behavior data of the user's operations on unstructured data are stored in a behavior log repository made up of log files, and the user behavior data described above can be extracted by analyzing this log. As shown in Fig. 2, as a concrete example, the behavior log, the unstructured data and the task list (i.e., the recent task list) can all be stored in a storage module 240.

The processing module 220 is used to periodically process the user behavior data so as to merge the task attributes of the user behavior data within a predetermined time period into the task list; to calculate the task score, the access-count score and the editing-duration score of each search result, where the task score is the similarity between the task attribute of the search result and the task attributes in the task list; and to calculate the data heat of the search results based on the task score, the access-count score and the editing-duration score.

Specifically, the processing module 220 periodically processes the user behavior data so as to merge the task attributes of the user behavior data within a predetermined time period into a predefined task list. In the above example, the predetermined time period is, for example but not limited to, 2 days: the union of the task attributes of the user interaction data (user behavior data) over those days is taken and written into the task list. For example, if user behavior data A are related to task B, task B is added to the task list.

For example, the attributes of unstructured data are described with the unstructured data galaxy model. As shown in Table 1, an unstructured data item, denoted $f_i$, is defined by its attributes, which identify its features.

In the above example, the access-count score is derived, for example, from the number of times the user has accessed the unstructured data item of the search result, and the editing-duration score from the time the user has spent editing it.

Of course, in other examples of the invention, the processing module 220 can also calculate an initial ranking score for each search result and calculate the data heat of the search results based on the task score, the access-count score, the editing-duration score and the initial ranking score. The initial ranking score can be generated from the rank each result received in the keyword search described above: for example, the higher a result is ranked, the higher its initial ranking score.

The data heat (data heat score) of a file (for example, an unstructured data item) represents the importance of that data within the task to which it belongs, and is calculated comprehensively from the file's access count, editing duration and task matching degree:

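$$\mathrm{taskScore} = \mathrm{sim}(\mathrm{fileTask}, \mathrm{recentTask}) \tag{1}$$

$$\mathrm{accessScore} = \frac{a_i}{2\,\max A} \tag{2}$$

$$\mathrm{edittimeScore} = \frac{et_i}{2\,\max ET} \tag{3}$$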

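The data heat is then calculated as:

$$\mathrm{heat\_score} = p \cdot \mathrm{taskScore} \cdot \left(t_1 + t_2 \cdot \mathrm{accessScore} + t_3 \cdot \mathrm{edittimeScore}\right) + q \cdot \mathrm{initScore} \tag{4}$$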

In addition, when the data heat of different types of unstructured data is calculated with the above formula, the weights of attribute scores such as the access count, editing duration and task matching degree can be adjusted according to the data characteristics and the application scenario; that is, the processing module 220 is further used to adjust the weights used in the data heat calculation according to the data type of each search result and the application scenario, so as to achieve the best ranking.

The retrieval module 230 is used to obtain a plurality of search results by keyword search according to the user's search request, and to reorder the search results according to the calculated data heat. For example, search results with higher data heat scores can be ranked higher.

Embodiments of the invention further provide a clustering module (not shown), which is used to cluster the search results; the retrieval module 230 then sorts the results within each cluster separately. Specifically, the clustering module clusters the search results by: obtaining the relevance between each search result and the task; and clustering the search results according to that relevance. For example, using an existing clustering method, the result files are first divided into 3 grades according to their task relevance scores; the maxima of A and ET are then taken over the results in the same task grade; and finally the access-count score and editing-duration score of each result are calculated.

Further, the system also comprises a display module (not shown) for displaying the reordered search results.
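The module split of Fig. 2 can be sketched as three cooperating classes. This is an illustrative sketch under several assumptions: Jaccard similarity stands in for sim, the maxima for eqs. (2) and (3) are taken over the whole result list rather than per task grade, the initial ranking score decays linearly, the weights are fixed at the embodiment's values (read as p = 0.95, q = 0.05), and the keyword search engine is any callable returning a ranked list of file IDs. The storage module 240 and the display module are omitted, and all class and method names are hypothetical.

```python
from typing import Callable, Dict, List, Set


class AcquisitionModule:
    """Module 210: collects user behavior data as per-file totals."""

    def __init__(self) -> None:
        self.access: Dict[str, int] = {}
        self.edit_seconds: Dict[str, float] = {}

    def record(self, file_id: str, edit_seconds: float = 0.0) -> None:
        self.access[file_id] = self.access.get(file_id, 0) + 1
        self.edit_seconds[file_id] = self.edit_seconds.get(file_id, 0.0) + edit_seconds


class ProcessingModule:
    """Module 220: maintains the task list and computes data heat."""

    def __init__(self, acq: AcquisitionModule, file_tasks: Dict[str, Set[str]]) -> None:
        self.acq = acq
        self.file_tasks = file_tasks  # task attribute of each file
        self.task_list: Set[str] = set()

    def merge_tasks(self, recent_files: List[str]) -> None:
        for fid in recent_files:  # union of task attributes into the task list
            self.task_list |= self.file_tasks.get(fid, set())

    def heat(self, results: List[str]) -> Dict[str, float]:
        n = len(results)
        # Simplification: maxima over all results; "or 1" avoids division by zero.
        a_max = max((self.acq.access.get(f, 0) for f in results), default=0) or 1
        e_max = max((self.acq.edit_seconds.get(f, 0.0) for f in results), default=0.0) or 1.0
        scores = {}
        for rank, f in enumerate(results):
            tasks = self.file_tasks.get(f, set())
            union = tasks | self.task_list
            task = len(tasks & self.task_list) / len(union) if union else 0.0  # Jaccard
            acc = self.acq.access.get(f, 0) / (2 * a_max)
            edt = self.acq.edit_seconds.get(f, 0.0) / (2 * e_max)
            init = (n - rank) / n
            scores[f] = 0.95 * task * (0.9 + 0.07 * acc + 0.03 * edt) + 0.05 * init
        return scores


class RetrievalModule:
    """Module 230: keyword search, then reordering by data heat."""

    def __init__(self, search: Callable[[List[str]], List[str]], proc: ProcessingModule) -> None:
        self.search = search
        self.proc = proc

    def query(self, keywords: List[str]) -> List[str]:
        results = self.search(keywords)  # initial result set InitF
        heat = self.proc.heat(results)
        return sorted(results, key=lambda f: heat[f], reverse=True)


def engine(keywords: List[str]) -> List[str]:
    """Stand-in keyword search engine returning a fixed ranked list."""
    return ["b.pdf", "c.doc", "a.ppt"]


acq = AcquisitionModule()
acq.record("a.ppt", edit_seconds=600)
acq.record("a.ppt")
acq.record("b.pdf")
proc = ProcessingModule(acq, {"a.ppt": {"projectA"}, "b.pdf": {"projectB"},
                              "c.doc": {"projectA", "review"}})
proc.merge_tasks(["a.ppt"])  # task list becomes {"projectA"}
print(RetrievalModule(engine, proc).query(["project A", "PPT"]))  # ['a.ppt', 'c.doc', 'b.pdf']
```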

It should be understood that each part of the present invention can be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they can be implemented with any of the following techniques known in the art, or a combination thereof: discrete logic circuits having logic gate circuits for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gate circuits, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and so on.

In this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims

1. An unstructured data retrieval method, characterized by comprising the following steps:

collecting user behavior data;

periodically processing the user behavior data so as to merge the task attributes of the user behavior data within a predetermined time period into a task list;

obtaining a plurality of search results by keyword search according to a user's search request;

calculating a task score, an access-count score and an editing-duration score for each search result, wherein the task score is the similarity between the task attribute of each search result and the task attributes in the task list;

calculating the data heat of the plurality of search results based on the task score, the access-count score and the editing-duration score; and

reordering the plurality of search results according to the calculated data heat.

2. The method according to claim 1, characterized by further comprising the step of displaying the reordered plurality of search results.

3. The method according to claim 1, characterized in that the user behavior data are obtained by analyzing the user's behavior log.

4. The method according to claim 1, further comprising the step of:

Calculating an initial ranking score of each search result, wherein the data popularity calculation on the plurality of search results is performed based on the task score, the access-count score, the editing-duration score and the initial ranking score.

5. The method according to claim 4, wherein the data popularity is calculated by the formula:

$$\text{heat\_score} = p \cdot \text{taskScore} \cdot \left(t_1 + t_2 \cdot \text{accessScore} + t_3 \cdot \text{edittimeScore}\right) + q \cdot \text{initScore},$$

wherein p, q, t1, t2 and t3 are weighting values.
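The formula of claim 5 translates directly into code. In the sketch below, `Weights`, `heat_score` and `rerank` are illustrative names, and each search result is assumed to be a dict that already holds its four component scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Weighting values p, q, t1, t2, t3 of the claim 5 formula."""
    p: float
    q: float
    t1: float
    t2: float
    t3: float


def heat_score(task, access, edittime, init, w):
    """heat_score = p*taskScore*(t1 + t2*accessScore + t3*edittimeScore) + q*initScore"""
    return w.p * task * (w.t1 + w.t2 * access + w.t3 * edittime) + w.q * init


def rerank(results, w):
    """Return the results sorted by descending heat_score (original order kept on ties)."""
    return sorted(
        results,
        key=lambda r: heat_score(r["task"], r["access"], r["edittime"], r["init"], w),
        reverse=True,
    )
```

Because `initScore` is added outside the product, a result with a task score of zero still keeps a contribution from its original keyword ranking, while every other component is gated by the task score.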

6. The method according to claim 5, wherein p:q:t1:t2:t3 = 95:5:0.9:0.07:0.03.
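A worked example shows how these weights balance the terms. It assumes that the ratio is read as the literal values p = 95, q = 5, t1 = 0.9, t2 = 0.07, t3 = 0.03 (note that p + q = 100 and t1 + t2 + t3 = 1), that all component scores are normalized to [0, 1], and it uses hypothetical scores taskScore = 0.8, accessScore = 0.5, edittimeScore = 0.2 and initScore = 0.6:

```latex
\begin{aligned}
t_1 + t_2 \cdot 0.5 + t_3 \cdot 0.2 &= 0.9 + 0.035 + 0.006 = 0.941 \\
\text{heat\_score} &= 95 \cdot 0.8 \cdot 0.941 + 5 \cdot 0.6 = 71.516 + 3 = 74.516
\end{aligned}
```

Under these assumptions the bracketed factor always lies between 0.9 and 1.0, so the task score acts almost as a direct multiplier, the access-count and editing-duration scores can move that factor by at most 0.1, and the initial ranking score contributes at most 5 of a maximum 100 points.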

7. The method according to claim 6, further comprising:

Adjusting the weights used in the data popularity calculation according to the data type of each search result and the application scenario.
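Claim 7 states only that the weights are adjusted according to data type and application scenario. One simple way to do that, sketched below under the assumption that weight profiles are looked up from a table, is to fall back from an exact (data type, scenario) match to a type-wide profile and finally to the claim 6 values read literally. `register_profile`, `weights_for` and the `"*"` wildcard are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    p: float
    q: float
    t1: float
    t2: float
    t3: float


# Default weights from claim 6, read as literal values (an assumption).
DEFAULT_WEIGHTS = Weights(p=95, q=5, t1=0.9, t2=0.07, t3=0.03)

# Hypothetical registry of per-(data type, scenario) overrides; empty by default.
_PROFILES: dict[tuple[str, str], Weights] = {}


def register_profile(data_type, scenario, weights):
    """Store a weight profile; use scenario "*" to cover every scenario of a data type."""
    _PROFILES[(data_type, scenario)] = weights


def weights_for(data_type, scenario):
    """Most specific profile first, then the type-wide profile, then the defaults."""
    return (
        _PROFILES.get((data_type, scenario))
        or _PROFILES.get((data_type, "*"))
        or DEFAULT_WEIGHTS
    )
```

The patent gives no per-type values, so any registered profile would have to be tuned for the deployment.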

8. The method according to claim 1, further comprising the steps of:

Clustering the plurality of search results; and

Sorting the search results within each resulting cluster separately.
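Claim 8 orders results inside each cluster rather than across the whole list. A minimal sketch of that grouping step follows; it assumes the cluster label and the sort score (for example the data popularity score) are supplied by caller functions, and `sort_within_clusters` is a hypothetical name.

```python
from collections import defaultdict


def sort_within_clusters(results, cluster_of, score_of):
    """Group results by cluster label and sort each group by descending score.

    cluster_of(result) -> label and score_of(result) -> number are caller-supplied
    (hypothetical interface). Returns a dict mapping label -> ordered list of results.
    """
    clusters = defaultdict(list)
    for r in results:
        clusters[cluster_of(r)].append(r)
    return {label: sorted(items, key=score_of, reverse=True) for label, items in clusters.items()}
```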

9. The method according to claim 8, wherein the step of clustering the plurality of search results specifically comprises:

Obtaining the relevance between each of the plurality of search results and a task; and

Clustering the search results according to the relevance.
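Claim 9 does not name a clustering algorithm. The sketch below substitutes the simplest relevance-based grouping: each result is assigned to the task with which it has the highest Jaccard similarity, and results with no relevance above the threshold go into an unassigned cluster keyed by `None`. This is a nearest-task partition, not a general clustering method such as k-means; `jaccard`, `cluster_by_task` and the set-of-attributes representation are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (assumed relevance measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_task(results, tasks, threshold=0.0):
    """Assign each result to the task it is most relevant to.

    results: dict doc_id -> set of task attributes
    tasks:   dict task_name -> set of task attributes
    Results whose best relevance does not exceed `threshold` go to the None cluster.
    """
    clusters = {name: [] for name in tasks}
    clusters[None] = []
    for doc_id, attrs in results.items():
        best_name, best_rel = None, threshold
        for name, task_attrs in tasks.items():
            rel = jaccard(attrs, task_attrs)
            if rel > best_rel:  # strict comparison: the first task wins ties
                best_name, best_rel = name, rel
        clusters[best_name].append(doc_id)
    return clusters
```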

10. An unstructured data retrieval system, characterized by comprising:

An acquisition module configured to collect user behavior data;

A processing module configured to periodically process the user behavior data so as to merge the task attributes of the user behavior data within a predetermined time period into a task list; to calculate a task score, an access-count score and an editing-duration score for each search result, wherein the task score is the similarity between the task attribute of that search result and the task attributes in the task list; and to perform a data popularity calculation on the plurality of search results based on the task score, the access-count score and the editing-duration score; and

A retrieval module configured to perform a keyword search according to a search request of a user to obtain the plurality of search results, and to reorder the plurality of search results according to the data popularity calculation.
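The module split of claim 10 can be sketched as three cooperating objects. In the Python sketch below the class names mirror the claim, but everything else is an assumption: documents are dicts with a title, a task-attribute set and pre-normalized access, editing-duration and initial scores; keyword search is a case-insensitive substring match on the title; the task score is Jaccard similarity; the weights are the claim 6 values read literally; and time-window filtering of the behavior data is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class AcquisitionModule:
    """Collects user behavior events as (doc_id, task-attribute set) pairs (assumed shape)."""
    events: list = field(default_factory=list)

    def collect(self, doc_id, task_attrs):
        self.events.append((doc_id, frozenset(task_attrs)))


@dataclass
class ProcessingModule:
    """Merges task attributes into a task list and computes the data popularity score."""
    task_list: set = field(default_factory=set)

    def merge_tasks(self, events):
        for _doc_id, attrs in events:
            self.task_list |= attrs

    def heat(self, doc, p=95, q=5, t1=0.9, t2=0.07, t3=0.03):
        union = doc["tasks"] | self.task_list
        task = len(doc["tasks"] & self.task_list) / len(union) if union else 0.0  # Jaccard
        return p * task * (t1 + t2 * doc["access"] + t3 * doc["edittime"]) + q * doc["init"]


@dataclass
class RetrievalModule:
    """Keyword search followed by reordering on the processing module's popularity score."""
    corpus: dict
    processor: ProcessingModule

    def search(self, keyword):
        hits = [d for d in self.corpus.values() if keyword.lower() in d["title"].lower()]
        return sorted(hits, key=self.processor.heat, reverse=True)
```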

11. The system according to claim 10, further comprising:

A display module configured to display the reordered plurality of search results.

12. The system according to claim 10, wherein the user behavior data is obtained by analyzing a behavior log of the user.

13. The system according to claim 10, wherein the processing module is further configured to calculate an initial ranking score of each search result, and the data popularity calculation on the plurality of search results is performed based on the task score, the access-count score, the editing-duration score and the initial ranking score.

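14. The system according to claim 13, wherein the data popularity is calculated by the formula:

$$\text{heat\_score} = p \cdot \text{taskScore} \cdot \left(t_1 + t_2 \cdot \text{accessScore} + t_3 \cdot \text{edittimeScore}\right) + q \cdot \text{initScore},$$

wherein p, q, t1, t2 and t3 are weighting values.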

15. The system according to claim 14, wherein p:q:t1:t2:t3 = 95:5:0.9:0.07:0.03.

16. The system according to claim 15, wherein the processing module is further configured to adjust the weights used in the data popularity calculation according to the data type of each search result and the application scenario.

17. The system according to claim 10, further comprising:

A clustering module configured to cluster the plurality of search results, the search results within each resulting cluster then being sorted separately by the retrieval module.

18. The system according to claim 17, wherein the clustering of the plurality of search results by the clustering module specifically comprises: