CN105528430A - Method and device for determining weights of search terms - Google Patents

Publication number
CN105528430A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510917486.XA
Other languages
Chinese (zh)
Other versions
CN105528430B (en
Inventor
陈进平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Application filed by Beijing Qihoo Technology Co Ltd and Qizhi Software Beijing Co Ltd
Priority to CN201510917486.XA
Publication of CN105528430A
Application granted
Publication of CN105528430B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques

Abstract

The invention discloses a method and a device for determining the weights of search terms. The method comprises: obtaining a set of search data pairs, in which each search data pair comprises a search word and the corresponding search result content; determining, according to the set of search data pairs, the probability that each search term included in each search word occurs in the search result content; and determining the weight of each search term according to that probability. With this technical scheme, the importance of each search term, as reflected by its appearance in the search results, can be fully taken into account; the probabilities with which the fragments in the search data pairs, and the search terms those fragments contain, occur in the search results can be mined on a large scale; and the weights of the search terms can be determined from the mined probabilities.

Description

A method and device for determining the weights of search terms
Technical field
The present invention relates to the technical field of data processing, and in particular to a method and device for determining the weights of search terms.
Background art
With the development of computer network technology, searching for data over the network has become increasingly common. As the volume of information on the network keeps growing, the amount of data a user can find grows as well. How to provide the most accurate information from massive data according to the user's needs, and thereby improve search efficiency, has become a problem that every major search engine must solve.
In the prior art, search results are provided according to the weight of each search term (term) in a search word, so that the most accurate search results can be offered to the user from massive data. However, how to calculate the weight of each search term in a search word so as to provide accurate search results remains a problem in urgent need of a solution.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a method and device for determining the weights of search terms that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a method for determining the weights of search terms is provided, the method comprising:
Obtaining a set of search data pairs, wherein each search data pair comprises a search word and the corresponding search result content;
Determining, according to the set of search data pairs, the probability that each search term included in each search word occurs in the search result content;
Determining the weight of each search term according to the probability that the search term occurs in the search result content.
Optionally, determining, according to the set of search data pairs, the probability that each search term included in each search word occurs in the search result content comprises:
For each search data pair in the set, determining every contiguous fragment that can be obtained from the search word of that search data pair;
Taking the fragment as the key, taking as the value whether each search term in the fragment occurs in the search result content corresponding to the search word containing the fragment, and outputting the key-value pair;
In the set of output key-value pairs, obtaining the probability that each search term in a key occurs in the search result content by counting the values of the key-value pairs that share that key.
Optionally, obtaining the set of search data pairs comprises:
Obtaining search data pairs from search engine click logs to form the set.
Optionally, taking as the value whether each search term in the fragment occurs in the search result content corresponding to the search word containing the fragment comprises:
Determining the number N of search terms included in the fragment, N being a natural number;
Using an N-bit binary number as the value, the two possible values of each bit indicating whether the corresponding search term occurs in the corresponding search result content.
Optionally, obtaining the probability that each search term in a key occurs in the search result content by counting the values of the key-value pairs that share that key comprises:
For each search term in the shared key, counting the number of key-value pairs sharing the key whose value indicates that the search term occurs in the search result content, and recording this count as the first value;
Counting the number of key-value pairs that share the key, and recording this count as the second value;
Determining the probability that the search term occurs in the search result content according to the ratio of the first value to the second value.
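The ratio described in this step can be written compactly as follows. The symbols are notation introduced here for illustration and do not appear in the original text: $k$ is a fragment used as a key, $t_i$ is the $i$-th search term in $k$, $V_k$ is the collection of values output for key $k$, and $v[i]$ is the $i$-th bit of a value $v$.

```latex
% First value:  number of key-value pairs with key k whose i-th bit marks an occurrence
% Second value: total number of key-value pairs with key k
P(t_i \mid k)
  = \frac{\text{first value}}{\text{second value}}
  = \frac{\bigl|\{\, v \in V_k : v[i] = 1 \,\}\bigr|}{|V_k|}
```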
Optionally, taking as the value whether each search term in the fragment occurs in the search result content corresponding to the search word containing the fragment comprises: determining the number N of search terms included in the fragment, N being a natural number; and using an N-bit binary number as the value, a bit value of 1 indicating that the corresponding search term occurs in the corresponding search result content and a bit value of 0 indicating that it does not;
Obtaining the probability that each search term in a key occurs in the search result content by counting the values of the key-value pairs that share that key comprises: for each search term in the shared key, counting the number of key-value pairs sharing the key in whose value the bit for that search term is 1, and recording this count as the first value;
Counting the number of key-value pairs that share the key, and recording this count as the second value;
Determining the probability that the search term occurs in the search result content according to the ratio of the first value to the second value.
Optionally, the search result content is any one of the following:
The title of a search result page;
The summary of a search result page;
The full content of a search result page.
Optionally, the method further comprises:
Saving each search term and its corresponding weight in a weight database;
When a search word is received, segmenting the search word into multiple search terms;
Obtaining the weights corresponding to the multiple search terms from the weight database;
Performing search processing according to the weights corresponding to the multiple search terms.
According to another aspect of the present invention, a device for determining the weights of search terms is provided, the device comprising:
A data acquisition unit, adapted to obtain a set of search data pairs, wherein each search data pair comprises a search word and the corresponding search result content;
A probability determining unit, adapted to determine, according to the set of search data pairs, the probability that each search term included in each search word occurs in the search result content;
A weight determining unit, adapted to determine the weight of each search term according to the probability that the search term occurs in the search result content.
Optionally, the probability determining unit further comprises:
A key-value pair output unit, adapted to determine, for each search data pair in the set, every contiguous fragment that can be obtained from the search word of that search data pair; and to take the fragment as the key, take as the value whether each search term in the fragment occurs in the search result content corresponding to the search word containing the fragment, and output the key-value pair;
A statistics unit, adapted to obtain, in the set of output key-value pairs, the probability that each search term in a key occurs in the search result content by counting the values of the key-value pairs that share that key.
Optionally, the data acquisition unit is adapted to obtain search data pairs from search engine click logs to form the set.
Optionally, the key-value pair output unit is adapted to determine the number N of search terms included in the fragment, N being a natural number; and to use an N-bit binary number as the value, the two possible values of each bit indicating whether the corresponding search term occurs in the corresponding search result content.
Optionally, the statistics unit is adapted to count, for each search term in the shared key, the number of key-value pairs sharing the key whose value indicates that the search term occurs in the search result content, and to record this count as the first value;
To count the number of key-value pairs that share the key, and to record this count as the second value; and to determine the probability that the search term occurs in the search result content according to the ratio of the first value to the second value.
Optionally, the key-value pair output unit is adapted to determine the number N of search terms included in the fragment, N being a natural number; and to use an N-bit binary number as the value, a bit value of 1 indicating that the corresponding search term occurs in the corresponding search result content and a bit value of 0 indicating that it does not;
The statistics unit is adapted to count, for each search term in the shared key, the number of key-value pairs sharing the key in whose value the bit for that search term is 1, and to record this count as the first value; to count the number of key-value pairs that share the key, and to record this count as the second value; and to determine the probability that the search term occurs in the search result content according to the ratio of the first value to the second value.
Optionally, the search result content is any one of the following:
The title of a search result page;
The summary of a search result page;
The full content of a search result page.
Optionally,
The weight determining unit is further adapted to save each search term and its corresponding weight in a weight database;
The device further comprises:
A storage unit, adapted to store the weight database;
A search processing unit, adapted to segment a search word into multiple search terms when the search word is received; to obtain the weights corresponding to the multiple search terms from the weight database; and to perform search processing according to the weights corresponding to the multiple search terms.
According to the technical scheme of the present invention, a set of search data pairs is obtained; the probability that each search term included in each search word occurs in the search result content is determined according to the set of search data pairs; and the weight of each search term is determined according to that probability. With this technical scheme, the importance of each search term, as reflected by its appearance in the search results, can be fully taken into account; the probabilities with which the fragments in the search data pairs, and the search terms those fragments contain, occur in the search results can be mined on a large scale; and the weight of each search term can be determined from the mined probabilities.
The above description is only an overview of the technical scheme of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the content of the specification, and so that the above and other objects, features, and advantages of the invention become more apparent, specific embodiments of the invention are set forth below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
Fig. 1 shows a flowchart of a method for determining the weights of search terms according to an embodiment of the present invention;
Fig. 2 shows a schematic diagram of a device for determining the weights of search terms according to an embodiment of the present invention;
Fig. 3 shows a schematic diagram of the probability determining unit of a device for determining the weights of search terms according to another embodiment of the present invention;
Fig. 4 shows a schematic diagram of a device for determining the weights of search terms according to another embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
Fig. 1 shows a flowchart of a method for determining the weights of search terms according to an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step S110: obtaining a set of search data pairs, wherein each search data pair comprises a search word and the corresponding search result content.
The obtained set of search data pairs comprises one or more search data pairs.
Step S120: determining, according to the set of search data pairs, the probability that each search term included in each search word occurs in the search result content.
Step S130: determining the weight of each search term according to the probability that the search term occurs in the search result content.
The method shown in Fig. 1 provides a way to accurately calculate the probability that each search term in a search word occurs in the search results. From the probabilities obtained by this method, the weight of each search term can be further determined, and search results can then be provided according to those weights, which greatly improves the accuracy of the search engine.
In one embodiment of the present invention, step S120 of the method shown in Fig. 1, determining according to the set of search data pairs the probability that each search term included in each search word occurs in the search result content, comprises:
Step S121: for each search data pair in the set, determining every contiguous fragment that can be obtained from the search word of that search data pair.
Step S122: taking the fragment as the key, taking as the value whether each search term in the fragment occurs in the search result content corresponding to the search word containing the fragment, and outputting the key-value pair.
Step S123: in the set of output key-value pairs, obtaining the probability that each search term in a key occurs in the search result content by counting the values of the key-value pairs that share that key.
In one embodiment of the present invention, step S110 of the method shown in Fig. 1, obtaining the set of search data pairs, comprises: obtaining search data pairs from search engine click logs to form the set.
Obtaining the search data pairs from search engine click logs has several advantages: a large volume of data is available, the data is easy to collect, and because click logs reflect the correlation between user needs and clicked results, the data better matches user needs and is highly relevant.
Here, in each search data pair obtained from the search engine click logs, the search word is the query entered by the user, and the search result content is the search result that the user finally clicked. It can be seen that a set of search data pairs obtained from search engine click logs matches users' expectations of the search results.
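As a rough illustration of how such pairs might be collected, the sketch below reads click-log records into (search word, clicked result content) pairs. The log format is purely an assumption made for this example (one record per line, with the query and the clicked result title separated by a tab); the source does not specify any log layout, and the function name `load_search_data_pairs` is hypothetical.

```python
def load_search_data_pairs(lines):
    """Parse hypothetical click-log lines of the form 'query<TAB>clicked result title'.

    The tab-separated layout is an assumption for illustration only.
    """
    pairs = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2 or not all(fields):
            continue  # skip blank or malformed records
        query, clicked_title = fields
        pairs.append((query, clicked_title))
    return pairs


# Made-up sample records, not real log data.
sample_log = [
    "cheap flights beijing\tBeijing flights and schedules\n",
    "malformed line without a tab\n",
    "python tutorial\tThe Python Tutorial\n",
]
pairs = load_search_data_pairs(sample_log)
print(len(pairs))  # 2
```

A real click log would likely carry more fields (timestamps, URLs, result positions); only the query and the clicked result content are needed to form a search data pair.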
In one embodiment of the present invention, taking as the value, in step S122, whether each search term in the fragment occurs in the search result content corresponding to the search word containing the fragment comprises:
Step S1221: determining the number N of search terms included in the fragment, N being a natural number.
Step S1222: using an N-bit binary number as the value, the two possible values of each bit indicating whether the corresponding search term occurs in the corresponding search result content.
For example, if a fragment comprises three search terms, a three-bit binary number is used as the value, where 1 indicates that the corresponding search term occurs in the search result of the search data pair and 0 indicates that it does not. The value 110 then indicates that the first and second search terms of the fragment occur in the search result of the search data pair, and that the third search term does not.
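The N-bit encoding can be sketched in a few lines. This is a minimal illustration, assuming that both the fragment and the search result content have already been segmented into lists of terms; the function name `encode_presence` and the sample terms are hypothetical.

```python
def encode_presence(fragment_terms, result_terms):
    """Return an N-bit string for a fragment of N search terms.

    Bit i is '1' if the i-th term of the fragment occurs in the search
    result content and '0' otherwise.
    """
    present = set(result_terms)
    return "".join("1" if term in present else "0" for term in fragment_terms)


# A three-term fragment whose first two terms occur in the result content.
value = encode_presence(["t1", "t2", "t3"], ["t2", "x", "t1"])
print(value)  # 110
```

A string is used here instead of an integer so that leading zeros (for example `011`) are preserved and each bit stays aligned with its term.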
In one embodiment of the present invention, obtaining in step S123 the probability that each search term in a key occurs in the search result content by counting the values of the key-value pairs that share that key comprises:
Step S1231: for each search term in the shared key, counting the number of key-value pairs sharing the key whose value indicates that the search term occurs in the search result content, and recording this count as the first value.
Step S1232: counting the number of key-value pairs that share the key, and recording this count as the second value.
Step S1233: determining the probability that the search term occurs in the search result content according to the ratio of the first value to the second value.
Taking the fragment "ABC" as an example, suppose the following key-value pairs exist: [ABC, 111], [ABC, 100], [ABC, 011], [ABC, 101], [ABC, 110]. Then:
The probability that search term A occurs in the search results = 4/5 = 0.8;
The probability that search term B occurs in the search results = 3/5 = 0.6;
The probability that search term C occurs in the search results = 3/5 = 0.6.
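The counting in this example can be reproduced with a short sketch. It assumes that values are stored as bit strings aligned with the terms of the key; the function name `term_probabilities` is hypothetical.

```python
def term_probabilities(key_terms, values):
    """Compute, for one key, the occurrence probability of each of its terms.

    key_terms: the search terms of the fragment used as the key.
    values:    the bit strings of all key-value pairs that share this key.
    """
    second_value = len(values)  # number of key-value pairs sharing the key
    result = []
    for i, term in enumerate(key_terms):
        # First value: number of pairs in which the bit for this term is 1.
        first_value = sum(1 for v in values if v[i] == "1")
        result.append((term, first_value / second_value))
    return result


probs = term_probabilities("ABC", ["111", "100", "011", "101", "110"])
print(probs)  # [('A', 0.8), ('B', 0.6), ('C', 0.6)]
```

The result is returned as a list of pairs rather than a dictionary so that a fragment containing the same term twice keeps one entry per position.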
In one embodiment of the present invention, taking as the value, in step S122, whether each search term in the fragment occurs in the search result content corresponding to the search word containing the fragment comprises:
Step S1221': determining the number N of search terms included in the fragment, N being a natural number.
Step S1222': using an N-bit binary number as the value, a bit value of 1 indicating that the corresponding search term occurs in the corresponding search result content and a bit value of 0 indicating that it does not.
In that case, obtaining in step S123 the probability that each search term in a key occurs in the search result content by counting the values of the key-value pairs that share that key comprises:
Step S1231': for each search term in the shared key, counting the number of key-value pairs sharing the key in whose value the bit for that search term is 1, and recording this count as the first value.
Step S1232': counting the number of key-value pairs that share the key, and recording this count as the second value.
Step S1233': determining the probability that the search term occurs in the search result content according to the ratio of the first value to the second value.
In one embodiment of the present invention, the search result content in the method shown in Fig. 1 is any one of the following: the title of the search result page clicked by the user; the summary of the search result page clicked by the user; the full content of the search result page clicked by the user.
For example, suppose a set of search data pairs is obtained from search engine click logs, and consider one search data pair whose search word is "ABCDE" and whose search result page title is "FGACDHJ". The contiguous fragments obtainable from the search word "ABCDE" are:
1. Fragments comprising 1 search term: A, B, C, D, E.
2. Fragments comprising 2 search terms: AB, BC, CD, DE.
3. Fragments comprising 3 search terms: ABC, BCD, CDE.
4. Fragments comprising 4 search terms: ABCD, BCDE.
5. Fragments comprising 5 search terms: ABCDE.
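The enumeration above can be sketched as follows. As in the example, each letter stands for one search term, so the search word is represented as a list of terms; the function name `contiguous_fragments` is hypothetical.

```python
def contiguous_fragments(terms):
    """List every contiguous fragment of a segmented search word, shortest first."""
    n = len(terms)
    return [
        tuple(terms[start:start + size])
        for size in range(1, n + 1)          # fragment length: 1..n
        for start in range(n - size + 1)     # every start position for that length
    ]


fragments = ["".join(f) for f in contiguous_fragments(list("ABCDE"))]
print(fragments)
# ['A', 'B', 'C', 'D', 'E', 'AB', 'BC', 'CD', 'DE',
#  'ABC', 'BCD', 'CDE', 'ABCD', 'BCDE', 'ABCDE']
```

A search word of n terms yields n(n+1)/2 fragments, which is 15 for the five-term example.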
Each fragment is taken as a key, and whether each search term in the fragment occurs in the search result content of the search data pair is taken as the value, so that the key-value pairs for this search data pair are output. In this example, the search terms that occur in the search result are A, C, and D, so the bits corresponding to A, C, and D are 1, and the bits corresponding to the remaining search terms, which do not occur, are 0. Processing this search word and its search result content outputs the following key-value pairs:
1. Fragments comprising 1 search term: A:1, B:0, C:1, D:1, E:0.
2. Fragments comprising 2 search terms: AB:10, BC:01, CD:11, DE:10.
3. Fragments comprising 3 search terms: ABC:101, BCD:011, CDE:110.
4. Fragments comprising 4 search terms: ABCD:1011, BCDE:0110.
5. Fragments comprising 5 search terms: ABCDE:10110.
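The key-value pairs listed above can be regenerated with a small sketch that combines fragment enumeration with the bit encoding. It treats each letter as one search term, as the example does; the function name `emit_key_values` is hypothetical.

```python
def emit_key_values(query_terms, result_terms):
    """Output (fragment, bit string) pairs for one search data pair."""
    present = set(result_terms)
    n = len(query_terms)
    pairs = []
    for size in range(1, n + 1):
        for start in range(n - size + 1):
            fragment = tuple(query_terms[start:start + size])
            # One bit per term: 1 if it occurs in the result content, else 0.
            value = "".join("1" if term in present else "0" for term in fragment)
            pairs.append((fragment, value))
    return pairs


emitted = {
    "".join(fragment): value
    for fragment, value in emit_key_values(list("ABCDE"), list("FGACDHJ"))
}
print(emitted["ABC"], emitted["BCDE"], emitted["ABCDE"])  # 101 0110 10110
```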
The above processing is carried out for every search data pair. Once all search words in the set of search data pairs and their corresponding search result content have been processed, a set of key-value pairs is obtained. The values of the key-value pairs sharing a key are then counted: for each search term in the shared key, the number of key-value pairs sharing the key in whose value the bit for that search term is 1 is counted and recorded as the first value; the number of key-value pairs that share the key is counted and recorded as the second value; and the probability that the search term occurs in the search result content is determined according to the ratio of the first value to the second value. Taking A, B, and C as an example, suppose the probability that each search term occurs in the search result content is computed over the key-value pairs sharing the key "ABC". Data such as the following can be obtained:
ABC: 0.7, 0.3, 0.9.
This data has the following meaning: among all search data pairs whose search word contains the fragment "ABC", the probability that the clicked search result contains A is 0.7, the probability that it contains B is 0.3, and the probability that it contains C is 0.9. It can therefore be concluded that A and C are relatively important, while B is relatively unimportant.
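The whole mining pass over a set of search data pairs might look like the sketch below, which emits key-value pairs and accumulates the counts in one loop. The toy data is invented for illustration and does not reproduce the 0.7 / 0.3 / 0.9 figures above; the function name `fragment_probabilities` is hypothetical, and an in-memory dictionary stands in for whatever large-scale grouping mechanism an actual deployment would use.

```python
from collections import defaultdict


def fragment_probabilities(data_pairs):
    """Map each fragment (key) to the occurrence probabilities of its terms.

    data_pairs: iterable of (query_terms, result_terms), both already segmented.
    """
    totals = defaultdict(int)  # key -> number of key-value pairs (second value)
    ones = {}                  # key -> per-position counts of bit 1 (first values)
    for query_terms, result_terms in data_pairs:
        present = set(result_terms)
        n = len(query_terms)
        for size in range(1, n + 1):
            for start in range(n - size + 1):
                key = tuple(query_terms[start:start + size])
                totals[key] += 1
                counts = ones.setdefault(key, [0] * size)
                for i, term in enumerate(key):
                    if term in present:
                        counts[i] += 1
    return {key: [c / totals[key] for c in ones[key]] for key in totals}


# Invented toy data: (segmented search word, segmented clicked result content).
toy_pairs = [
    (list("ABC"), list("AXC")),
    (list("ABC"), list("ABC")),
    (list("ABCD"), list("CD")),
    (list("BC"), list("C")),
]
probabilities = fragment_probabilities(toy_pairs)
print(probabilities[("B", "C")])  # [0.25, 1.0]
```

Because the probabilities are kept per fragment, the same term can receive different values in different contexts: here B has probability 0.25 within the fragment "BC" but 1/3 within "ABC".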
In one embodiment of the present invention, each search term and its corresponding weight are saved in a weight database, and on this basis the method further comprises:
Step S140: when a search word is received, segmenting the search word into multiple search terms.
Step S150: obtaining the weights corresponding to the multiple search terms from the weight database.
Step S160: performing search processing according to the weights corresponding to the multiple search terms.
The weight of each search term in a fragment is calculated from the obtained probabilities. Because the probability calculation takes into account the importance of each search term in the search results, the resulting weights better match user needs and are highly accurate. The weights are saved in the weight database and used for search processing during online search, which can effectively improve the search quality of the search engine.
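Steps S140 to S160 can be sketched as below. The source does not say how a weight is derived from a probability, how the search word is segmented, or how weights enter the ranking, so everything specific here is an assumption for illustration: the weight database is a plain dictionary with made-up weights, segmentation is whitespace splitting, unknown terms get a hypothetical default weight, and a document is scored by summing the weights of the query terms it contains.

```python
def score_document(search_word, doc_terms, weight_db, default_weight=0.5):
    """Score one document for a search word using stored term weights.

    Assumptions: whitespace segmentation, a default weight for terms missing
    from the database, and a sum-of-matched-weights scoring rule.
    """
    terms = search_word.split()                                      # S140: segment
    weights = {t: weight_db.get(t, default_weight) for t in terms}   # S150: look up
    present = set(doc_terms)
    return sum(w for t, w in weights.items() if t in present)        # S160: use weights


# Hypothetical weight database and documents.
weight_db = {"cheap": 0.3, "flights": 0.9, "beijing": 0.8}
docs = {
    "d1": ["flights", "beijing", "schedule"],
    "d2": ["cheap", "hotels"],
}
query = "cheap flights beijing"
ranked = sorted(docs, key=lambda d: score_document(query, docs[d], weight_db), reverse=True)
print(ranked)  # ['d1', 'd2']
```

Under this scoring rule a document matching the two high-weight terms outranks one matching only the low-weight term, which is the kind of effect the stored weights are meant to produce.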
Fig. 2 shows a schematic diagram of a device for determining the weights of search terms according to an embodiment of the present invention. As shown in Fig. 2, the device 200 for determining the weights of search terms comprises:
A data acquisition unit 210, adapted to obtain a set of search data pairs, wherein each search data pair comprises a search word and the corresponding search result content.
A probability determining unit 220, adapted to determine, according to the set of search data pairs, the probability that each search term included in each search word occurs in the search result content.
A weight determining unit 230, adapted to determine the weight of each search term according to the probability that the search term occurs in the search result content.
Fig. 3 shows a schematic diagram of the probability determining unit of a device for determining the weights of search terms according to another embodiment of the present invention. As shown in Fig. 3, the probability determining unit 220 further comprises:
A key-value pair output unit 221 and a statistics unit 222.
The key-value pair output unit 221 is adapted to determine, for each search data pair in the set, every contiguous fragment that can be obtained from the search word of that search data pair; and to take the fragment as the key, take as the value whether each search term in the fragment occurs in the search result content corresponding to the search word containing the fragment, and output the key-value pair.
The statistics unit 222 is adapted to obtain, in the set of output key-value pairs, the probability that each search term in a key occurs in the search result content by counting the values of the key-value pairs that share that key.
In one embodiment of the present invention, the data acquisition unit 210 is adapted to obtain search data pairs from search engine click logs to form the set.
In one embodiment of the present invention, the key-value pair output unit 221 is adapted to determine the number N of search terms included in the fragment, N being a natural number; and to use an N-bit binary number as the value, the two possible values of each bit indicating whether the corresponding search term occurs in the corresponding search result content.
In one embodiment of the present invention, the statistics unit 222 is adapted to count, for each search term in the shared key, the number of key-value pairs sharing the key whose value indicates that the search term occurs in the search result content, and to record this count as the first value; to count the number of key-value pairs that share the key, and to record this count as the second value; and to determine the probability that the search term occurs in the search result content according to the ratio of the first value to the second value.
For example, the key-value pair output unit 221 is adapted to determine the number N of search terms included in the fragment, N being a natural number; and to use an N-bit binary number as the value, a bit value of 1 indicating that the corresponding search term occurs in the corresponding search result content and a bit value of 0 indicating that it does not.
Correspondingly, the statistics unit 222 is adapted to count, for each search term in the shared key, the number of key-value pairs sharing the key in whose value the bit for that search term is 1, and to record this count as the first value; to count the number of key-value pairs that share the key, and to record this count as the second value; and to determine the probability that the search term occurs in the search result content according to the ratio of the first value to the second value.
In one embodiment of the present invention, the search result content is any one of the following:
The title of a search result page;
The summary of a search result page;
The full content of a search result page.
Fig. 4 shows a schematic diagram of a device for determining the weights of search terms according to another embodiment of the present invention. As shown in Fig. 4, the device 300 for determining the weights of search terms comprises: a data acquisition unit 310, a probability determining unit 320, a weight determining unit 330, a storage unit 340, and a search processing unit 350.
The data acquisition unit 310, the probability determining unit 320, and the weight determining unit 330 correspond to and are identical with the data acquisition unit 210, the probability determining unit 220, and the weight determining unit 230 described above, respectively, and are not described again here.
In this embodiment, the weight determining unit 330 is further adapted to save each search term and its corresponding weight in a weight database;
The storage unit 340 is adapted to store the weight database;
The search processing unit 350 is adapted to segment a search word into multiple search terms when the search word is received; to obtain the weights corresponding to the multiple search terms from the weight database; and to perform search processing according to the weights corresponding to the multiple search terms.
It should be noted that the embodiments of the devices shown in Fig. 2 to Fig. 4 correspond to and are identical with the embodiments of the method shown in Fig. 1, which have been described in detail above and are not repeated here.
In summary, according to the technical scheme of the present invention, a set of search data pairs is obtained; the probability that each search term included in each search word occurs in the search result content is determined according to the set of search data pairs; and the weight of each search term is determined according to that probability and saved in a weight database. The present invention obtains the set of search data pairs from search engine click logs: a large volume of data is available, the data is easy to collect, and because click logs reflect the correlation between user needs and clicked results, the data better matches user needs and is highly relevant. With this technical scheme, the importance of each search term, as reflected by its appearance in the search results, can be fully taken into account, and the probabilities with which the fragments in the search data pairs, and the search terms those fragments contain, occur in the search results can be mined on a large scale. The weight of each search term in a fragment is calculated from the obtained probabilities, so the resulting weights better match user needs and are highly accurate. The weights are saved in the weight database and used for search processing during online search, which can effectively improve the search quality of the search engine.
It should be noted that:
The algorithms and displays provided herein are not inherently related to any particular computer, virtual device, or other equipment. Various general-purpose devices may also be used with the teachings herein. From the description above, the structure required to construct such devices is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in various programming languages, and that the description above of a specific language is given to disclose the best mode of the invention.
In the specification provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments of the invention. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Therefore, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components of an embodiment may be combined into one module or unit or component, and may furthermore be divided into multiple sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include some features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the device for determining the weights of search terms according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order. These words may be interpreted as names.

Claims (10)

1. A method for determining the weights of search terms, wherein the method comprises:
Obtaining a set of search data pairs, wherein each search data pair comprises a search word and corresponding search result content;
Determining, according to the set of search data pairs, the probability that each search term contained in each search word appears in the search result content;
Determining the weight of each search term according to the probability that the search term appears in the search result content.
2. The method of claim 1, wherein determining, according to the set of search data pairs, the probability that each search term contained in each search word appears in the search result content comprises:
For each search data pair in the set of search data pairs, determining each continuous fragment that can be obtained from the search word of that search data pair;
Outputting a key-value pair in which the fragment is the key and the value indicates whether each search term contained in the fragment appears in the search result content corresponding to the search word containing the fragment;
In the set of output key-value pairs, obtaining the probability that each search term in a key appears in the search result content by aggregating the values of the key-value pairs that share that key.
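The key-value output step can be sketched as follows. This is an illustrative reading only: it assumes the search word has already been segmented into lowercase search terms, that a "continuous fragment" is any contiguous run of those terms, and that a term "appears" when it is a substring of the lowercased result content. The function names are hypothetical.

```python
from typing import Iterator, List, Tuple

Fragment = Tuple[str, ...]
Flags = Tuple[int, ...]


def continuous_fragments(terms: List[str]) -> Iterator[Fragment]:
    """Yield every contiguous run of search terms in the search word."""
    n = len(terms)
    for start in range(n):
        for end in range(start + 1, n + 1):
            yield tuple(terms[start:end])


def emit_key_values(terms: List[str], result_content: str) -> List[Tuple[Fragment, Flags]]:
    """Emit (fragment, flags) pairs for one search data pair.

    flags[i] is 1 if the i-th term of the fragment occurs in the result
    content and 0 otherwise (assumed: plain substring matching).
    """
    content = result_content.lower()
    pairs = []
    for fragment in continuous_fragments(terms):
        flags = tuple(1 if term in content else 0 for term in fragment)
        pairs.append((fragment, flags))
    return pairs
```

For a search word segmented as `["beijing", "weather"]` and the result title "Beijing travel guide", the sketch emits three pairs: `(("beijing",), (1,))`, `(("beijing", "weather"), (1, 0))`, and `(("weather",), (0,))`.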
3. The method of claim 1 or 2, wherein obtaining the set of search data pairs comprises:
Obtaining search data pairs from search engine click logs to form the set.
4. The method of any one of claims 1-3, wherein taking as the value whether each search term contained in the fragment appears in the search result content corresponding to the search word containing the fragment comprises:
Determining the number N of search terms contained in the fragment, N being a natural number;
Using an N-bit binary number as the value, the two possible values of each bit indicating whether the corresponding search term appears in the corresponding search result content.
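One way to picture the N-bit value is the small sketch below. It assumes that a bit of 1 means "appears" and that the first search term of the fragment maps to the most significant bit; the claim itself fixes neither convention, and the helper names are hypothetical.

```python
from typing import List


def encode_flags(appears: List[bool]) -> int:
    """Pack per-term appearance flags into an N-bit integer.

    Assumed convention: the first term maps to the most significant bit,
    and a bit of 1 means the term appears in the result content.
    """
    value = 0
    for flag in appears:
        value = (value << 1) | int(flag)
    return value


def decode_flags(value: int, n: int) -> List[bool]:
    """Unpack an N-bit value back into per-term appearance flags."""
    return [bool((value >> (n - 1 - i)) & 1) for i in range(n)]
```

For a three-term fragment whose first and third terms appear in the result content, `encode_flags([True, False, True])` gives the binary number 101, and `decode_flags(0b101, 3)` recovers the original flags. N must be passed to the decoder because leading zero bits are otherwise lost.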
5. The method of any one of claims 1-4, wherein obtaining the probability that each search term in the key appears in the search result content by aggregating the values of the key-value pairs that share that key comprises:
For each search term in the shared key, counting the number of times the search term is indicated as appearing in the search result content in the values of the key-value pairs sharing the key, and recording the count as a first value;
Counting the number of key-value pairs sharing the key, and recording the count as a second value;
Determining the probability that the search term appears in the search result content according to the ratio of the first value to the second value.
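The counting step can be sketched as a simple aggregation over the emitted key-value pairs. The sketch assumes each value is a tuple of 0/1 flags aligned with the terms of its key and takes the probability to be exactly the ratio of the first value to the second value; the function name and data layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Fragment = Tuple[str, ...]
Flags = Tuple[int, ...]


def term_probabilities(pairs: Iterable[Tuple[Fragment, Flags]]) -> Dict[Fragment, List[float]]:
    """For each key, return the appearance probability of each of its terms."""
    hits: Dict[Fragment, List[int]] = {}
    totals: Dict[Fragment, int] = defaultdict(int)
    for key, flags in pairs:
        counts = hits.setdefault(key, [0] * len(key))
        for i, flag in enumerate(flags):
            counts[i] += flag   # first value: times term i is marked as appearing
        totals[key] += 1        # second value: number of pairs sharing this key
    # Probability of each term = first value / second value.
    return {key: [c / totals[key] for c in counts] for key, counts in hits.items()}
```

If the key `("a", "b")` is emitted four times with values `(1, 0)`, `(1, 1)`, `(0, 0)`, and `(1, 0)`, the first values are 3 and 1 and the second value is 4, so the probabilities are 0.75 for "a" and 0.25 for "b".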
6. The method of any one of claims 1-5, wherein:
Taking as the value whether each search term contained in the fragment appears in the search result content corresponding to the search word containing the fragment comprises: determining the number N of search terms contained in the fragment, N being a natural number; and using an N-bit binary number as the value, in which a bit value of 1 indicates that the corresponding search term appears in the corresponding search result content and a bit value of 0 indicates that it does not;
Obtaining the probability that each search term in the key appears in the search result content by aggregating the values of the key-value pairs that share that key comprises: for each search term in the shared key, counting the number of key-value pairs sharing the key in whose value the bit for the search term is 1, and recording the count as a first value;
Counting the number of key-value pairs sharing the key, and recording the count as a second value;
Determining the probability that the search term appears in the search result content according to the ratio of the first value to the second value.
7. The method of any one of claims 1-6, wherein the search result content is any one of the following:
The title of a search result page;
The summary of a search result page;
The full content of a search result page.
8. The method of any one of claims 1-7, wherein the method further comprises:
Saving each search term and its corresponding weight in a weight database;
Upon receiving a search word, segmenting the search word into multiple search terms;
Obtaining the weight corresponding to each of the search terms from the weight database;
Performing search processing according to the weights corresponding to the search terms.
9. A device for determining the weights of search terms, wherein the device comprises:
A data acquisition unit, adapted to obtain a set of search data pairs, wherein each search data pair comprises a search word and corresponding search result content;
A probability determining unit, adapted to determine, according to the set of search data pairs, the probability that each search term contained in each search word appears in the search result content;
A weight determining unit, adapted to determine the weight of each search term according to the probability that the search term appears in the search result content.
10. The device of claim 9, wherein the probability determining unit further comprises:
A key-value pair output unit, adapted to, for each search data pair in the set of search data pairs, determine each continuous fragment that can be obtained from the search word of that search data pair; and to output a key-value pair in which the fragment is the key and the value indicates whether each search term contained in the fragment appears in the search result content corresponding to the search word containing the fragment;
A statistics unit, adapted to obtain, in the set of output key-value pairs, the probability that each search term in a key appears in the search result content by aggregating the values of the key-value pairs that share that key.
CN201510917486.XA 2015-12-10 2015-12-10 A kind of method and apparatus of the weight of determining search terms Active CN105528430B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510917486.XA CN105528430B (en) 2015-12-10 2015-12-10 A kind of method and apparatus of the weight of determining search terms

Publications (2)

Publication Number Publication Date
CN105528430A true CN105528430A (en) 2016-04-27
CN105528430B CN105528430B (en) 2019-05-31

Family

ID=55770653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510917486.XA Active CN105528430B (en) 2015-12-10 2015-12-10 A kind of method and apparatus of the weight of determining search terms

Country Status (1)

Country Link
CN (1) CN105528430B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109933714A (en) * 2019-03-18 2019-06-25 北京搜狗科技发展有限公司 A kind of calculation method, searching method and the relevant apparatus of entry weight
CN109933714B (en) * 2019-03-18 2021-04-20 北京搜狗科技发展有限公司 Entry weight calculation method, entry weight search method and related device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right
Effective date of registration: 20220726
Address after: Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015
Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.
Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)
Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.
Patentee before: Qizhi software (Beijing) Co.,Ltd.