CN100557612C - Search result ranking method and device based on a search engine - Google Patents

A search result ranking method and device based on a search engine

Info

Publication number
CN100557612C
Authority
CN
China
Prior art keywords
keyword
segmented word
weight
network resources
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CNB2007101872765A
Other languages
Chinese (zh)
Other versions
CN101158971A (en
Inventor
刘汉洲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhigu Ruituo Technology Services Co Ltd
Original Assignee
Shenzhen Xunlei Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xunlei Network Technology Co Ltd filed Critical Shenzhen Xunlei Network Technology Co Ltd
Priority to CNB2007101872765A priority Critical patent/CN100557612C/en
Publication of CN101158971A publication Critical patent/CN101158971A/en
Application granted granted Critical
Publication of CN100557612C publication Critical patent/CN100557612C/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a search result ranking method and device based on a search engine. It relates to the field of search engines and brings ranking results closer to users' needs. The method comprises: performing word segmentation on the search term entered by the user; looking up each resulting segment in a keyword index to determine the keyword weight of the search term in each network resource to be ranked; determining the total weight of the search term in each network resource to be ranked; and ranking the network resources according to total weight and presenting them to the user. The device comprises: a segmentation unit, a keyword weight determination unit, a total weight determination unit, a ranking unit and a presentation unit.

Description

A search result ranking method and device based on a search engine
Technical field
The present invention relates to the field of search engines, and in particular to a search result ranking method and device based on a search engine.
Background
With the continuing development of search engine technology and advances in information processing, demand for search engines has grown ever broader, and the types of search engine have diversified. At present, mainstream search engines fall into full-text search engines, directory search engines and meta search engines. Recently, vertical search engines have also come into view.
In the search engine field, the main criterion for judging a search engine is whether it lets users find the information they need, that is, the various information relevant to the user's search topic, as quickly as possible.
In recent years, the major search engines have all optimized relevance-based ranking of search results. The relevance of a search result means the degree of correlation between the user's search term and the page. Relevance is usually an important basis on which a search engine ranks results. The main methods of computing page relevance include Google's PageRank, Bharat's HillTop and Baidu's hyperlink analysis. Their basic principle is to rank pages according to how they are cited by other pages.
However, Chinese search engines face the problem of word segmentation, and the dictionary is the basis on which a search engine turns search terms into query words. The quality of the dictionary determines, to some extent, the quality of search result ranking. If the dictionary is too small, too much irrelevant information appears; if it is too large, some words sometimes return too few on-topic results. How to add a new extended dictionary on top of the existing one, so that search engine results are more accurate and more user-friendly, has therefore become a problem of wide concern.
Summary of the invention
Embodiments of the invention provide a search result ranking method and device based on a search engine, which bring ranking results closer to users' needs.
In the search result ranking method based on a search engine according to an embodiment of the invention, a keyword dictionary is customized from words and their attributes; according to the keyword dictionary, the subject information of each network resource is segmented by the maximum matching principle, the resulting segments are filtered by their attributes, and the keywords of the subject information of each network resource are extracted; the keywords of the subject information of each network resource are segmented, and a keyword index from each segment of a keyword to network resources is built; and the subject information of the network resources is segmented according to a basic word-segmentation dictionary, and a resource index from each segment to network resources is built. The ranking method comprises the following steps: performing word segmentation on the search term entered by the user; looking up each resulting segment in the resource index to determine the set of network resources to which each segment belongs, and taking the intersection of these sets as the network resources to be ranked; looking up each resulting segment in the keyword index to determine the keyword weight of the search term in each network resource to be ranked; determining the total weight of the search term in each network resource to be ranked; and ranking the network resources to be ranked according to total weight and presenting them to the user.
The search result ranking device based on a search engine according to an embodiment of the invention comprises: a customization unit, configured to customize a keyword dictionary with a word and its attributes as the basic structure, the customized keyword dictionary containing each valid word with its corresponding attributes and each invalid word with its corresponding attributes; an extraction unit, configured to segment the subject information of each network resource by the maximum matching principle according to the keyword dictionary, and to filter the resulting segments by their attributes so as to extract the keywords of the subject information of each network resource; a keyword index building unit, configured to segment each keyword of the subject information of each network resource and to build a keyword index from each segment of a keyword to network resources, for use by the keyword weight determination unit; a resource index building unit, configured to segment the subject information of the network resources according to a basic word-segmentation dictionary and to build a resource index from each segment to network resources; a segmentation unit, configured to perform word segmentation on the search term entered by the user; a determination unit, configured to look up each segment of the search term in the resource index to determine the set of network resources to which each segment belongs, and to take the intersection of these sets as the network resources to be ranked; a keyword weight determination unit, configured to look up each segment of the search term in the keyword index to determine the keyword weight of the search term in each network resource to be ranked; a total weight determination unit, configured to determine the total weight of the search term in each network resource to be ranked; a ranking unit, configured to rank the network resources to be ranked according to total weight; and a presentation unit, configured to present the ranking results to the user.
In summary, in embodiments of the invention the search term entered by the user is segmented; each resulting segment is looked up in the keyword index to determine the keyword weight of the search term in each network resource to be ranked; and the total weight of the search term in each network resource to be ranked is determined. Because the total weight takes into account how well the search term matches the keywords, ranking the network resources by total weight and presenting them to the user brings the results closer to the user's needs.
Description of the drawings
Fig. 1 is a flowchart of the method steps of an embodiment of the invention;
Fig. 2 is a schematic structural diagram of the device of an embodiment of the invention;
Fig. 3 is a schematic diagram of a preferred structure of the device of an embodiment of the invention;
Fig. 4 is a schematic diagram of the indexes of an embodiment of the invention;
Fig. 5 is a schematic diagram of determining the network resources to be ranked in an embodiment of the invention;
Fig. 6 is a schematic diagram of querying segment weights in an embodiment of the invention.
Embodiments
To bring ranking results closer to users' needs, embodiments of the invention provide a search result ranking method and device based on a search engine, each of which is briefly outlined below.
In the search result ranking method based on a search engine provided by an embodiment of the invention, once certain presettings have been made, the user has entered a search term and the network resources to be ranked have been determined, the following main steps are carried out, as shown in Fig. 1:
S1. Perform word segmentation on the search term entered by the user (this step may also be carried out before the network resources to be ranked are determined).
S2. Look up each segment obtained by word segmentation in the keyword index to determine the keyword weight of the search term in each network resource to be ranked (network resources include, but are not limited to, web page resources and download resources; this is not repeated below).
S3. Determine the total weight of the search term in each network resource to be ranked.
S4. Rank the network resources to be ranked according to total weight and present them to the user.
Before the user enters a search term, the presetting steps are carried out in advance. They specifically comprise:
Customizing the keyword dictionary: the customized keyword dictionary takes a word and its attributes as its basic structure, and contains each valid word with its corresponding attributes and each invalid word with its corresponding attributes. The set of invalid words and the set of valid words are mutually exclusive, and the characters of an invalid word cover the characters of a valid word. The attributes of a word are represented by a character-type number, each bit of which represents one attribute of the word.
Extracting keywords: according to the keyword dictionary, the subject information of each network resource is segmented by the maximum matching principle; the resulting segments are filtered by their attributes to extract the keywords of the subject information of each network resource. The subject information of a web page may be its title or may be extracted from its content; for a download resource, the information describing the resource may serve as the subject information; and so on.
Building the keyword index: each keyword of the subject information of each network resource is segmented using the basic word-segmentation dictionary, and a keyword index from each segment of a keyword to network resources is built.
Building the resource index: the subject information of the network resources is segmented according to the basic word-segmentation dictionary, and a resource index from each segment to network resources is built.
Configuring weights: a segment weight is configured for each segment of a keyword according to the ratio of the segment's length to the length of the keyword; alternatively, a static weight is configured for each network resource according to information about the resource (including but not limited to the number of views and/or citations and/or the number of downloads and/or the file format; this is not repeated below), and a segment weight is configured for each segment of a keyword according to the ratio of the segment's length to the length of the keyword. The configured weights may be recorded in the resource index and the keyword index described above. Once the weights are configured, in S2 each segment of the search term is looked up in the keyword index to determine its segment weight in the keywords of the subject information of each network resource to be ranked, and the segment weights found within the subject information of the same network resource are summed to give the keyword weight of the search term in that resource. In S3, the keyword weight of the search term in the current network resource may be taken as its total weight; alternatively, the static weight configured from the information of the current network resource and the keyword weight of the search term in that resource may be combined into the total weight of the resource; or other relevant weights may be combined with the keyword weight to form the total weight of the resource.
After the user enters a search term, the network resources to be ranked are determined as follows: each segment of the search term is looked up in the resource index to determine the set of network resources to which each segment belongs, and the intersection of these sets is taken as the network resources to be ranked.
An embodiment of the invention also provides a search result ranking device based on a search engine. As shown in Fig. 2, it comprises: a segmentation unit, a keyword weight determination unit, a total weight determination unit, a ranking unit and a presentation unit.
The segmentation unit is configured to perform word segmentation on the search term entered by the user.
The keyword weight determination unit is configured to look up each segment obtained by word segmentation in the keyword index to determine the keyword weight of the search term in each network resource to be ranked.
The total weight determination unit is configured to determine the total weight of the search term in each network resource to be ranked.
The ranking unit is configured to rank the network resources to be ranked according to total weight.
The presentation unit is configured to present the ranking results to the user.
Further, to supply the information required by the above units, the device also comprises, as shown in Fig. 3: a customization unit, an extraction unit, a keyword index building unit, a resource index building unit, a determination unit and a configuration unit.
The customization unit is configured to customize a keyword dictionary with a word and its attributes as the basic structure; the customized keyword dictionary contains each valid word with its corresponding attributes and each invalid word with its corresponding attributes.
The extraction unit is configured to segment the subject information of each network resource by the maximum matching principle according to the keyword dictionary, and to filter the resulting segments by their attributes so as to extract the keywords of the subject information of each network resource.
The keyword index building unit is configured to segment each keyword of the subject information of each network resource according to the basic word-segmentation dictionary, and to build a keyword index from each segment of a keyword to network resources, for use by the keyword weight determination unit.
The resource index building unit is configured to segment the subject information of the network resources according to the basic word-segmentation dictionary, and to build a resource index from each segment to network resources.
The determination unit is configured to look up each segment of the search term in the resource index to determine the set of network resources to which each segment belongs, and to take the intersection of these sets as the network resources to be ranked.
The configuration unit is configured to assign a segment weight to each segment of a keyword according to the ratio of the segment's length to the length of the keyword; or to assign a static weight to each network resource according to information about the resource, and to assign a segment weight to each segment of a keyword according to the ratio of the segment's length to the length of the keyword. After the configuration unit has configured the weights, the keyword weight determination unit looks up each segment of the search term in the keyword index to determine its segment weight in the keywords of the subject information of each network resource to be ranked, and sums the segment weights found within the subject information of the same network resource to give the keyword weight of the search term in that resource. The total weight determination unit may take the keyword weight of the search term in the current network resource as its total weight; or it may combine the static weight configured from the information of the current network resource with the keyword weight of the search term in that resource to form the total weight of the resource; or it may combine other relevant weights with the keyword weight to form the total weight of the resource.
This completes the overview of the method and device of the embodiments of the invention. The invention is described in further detail below by way of one embodiment.
Embodiment 1. This embodiment comprises a presetting step, a step of determining the network resources to be ranked, a weight calculation step, a ranking step and a presentation step. The presetting step comprises: a sub-step of customizing the keyword dictionary, a sub-step of extracting keywords, a sub-step of building the keyword index, a sub-step of building the resource index and a sub-step of configuring weights.
101. Customizing the keyword dictionary.
A keyword is a word that identifies the subject information of a network resource (a web page resource or a download resource). For example, search engine users often enter phrases such as a software name + "download" or a film title + "high definition"; the software name and film title here can be defined as the keywords of these phrases.
To extract the keywords of the subject information of network resources effectively, a keyword dictionary must first be built. Statistics on users' everyday search habits show that, in film and television search engines, music search engines and general search engines, users commonly enter film and television titles, song titles, singers' names and the like as search terms. The keyword dictionary can therefore be built from information such as currently popular films, TV series, songs, singers and actors. The basic structure of the dictionary is (word, attribute), where the attribute describes the validity and category of the word, for example whether it is valid and whether it is a film title, song title, software name and so on.
In this embodiment (though not limited to this approach) attributes are described as follows: the attribute information is encoded bit by bit in a one-byte character-type number, 8 bits in total, each bit representing one attribute of the word; 1 means the word has the attribute and 0 means it does not. For example, "Hero" can be both a film title and a TV series title, so its attribute can be expressed as 11100000. The meaning of each bit is shown in Table 1:
Bit:        7         6     5          4           3       2         1      0
Attribute:  Validity  Film  TV series  Song title  Singer  Director  Actor  Software name
Table 1
The most significant bit (bit 7 in Table 1) is defined as follows: it records whether a word in the keyword dictionary is valid, and the set of invalid words and the set of valid words are mutually exclusive. A word A in the invalid set may literally contain a word B in the valid set; for example, the film title "East" is a valid word, whereas words that contain it, such as "east gate", are invalid words. The preferred principle for identifying an invalid word is: the word literally contains some valid word, does not belong to the valid set, and is not itself a film title, song title or other word that can serve as a keyword.
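The one-byte attribute encoding of Table 1 can be sketched as a set of bit flags. The following is a minimal illustration only: the names `WordAttr`, `parse_attr` and `is_valid` are hypothetical and do not come from the patent, and the bit layout simply follows Table 1.

```python
from enum import IntFlag


class WordAttr(IntFlag):
    """One bit per attribute, following Table 1 (bit 7 is the validity bit)."""
    VALID = 1 << 7
    FILM = 1 << 6
    TV_SERIES = 1 << 5
    SONG_TITLE = 1 << 4
    SINGER = 1 << 3
    DIRECTOR = 1 << 2
    ACTOR = 1 << 1
    SOFTWARE = 1 << 0


def parse_attr(bits: str) -> WordAttr:
    """Parse an 8-digit bit string such as '11100000'; spaces are ignored."""
    cleaned = bits.replace(" ", "")
    if len(cleaned) != 8 or set(cleaned) - {"0", "1"}:
        raise ValueError(f"expected 8 binary digits, got {bits!r}")
    return WordAttr(int(cleaned, 2))


def is_valid(attr: WordAttr) -> bool:
    """A word can serve as a keyword only if its validity bit is set."""
    return bool(attr & WordAttr.VALID)


# "Hero" from the text: valid, a film title and a TV series title.
hero = parse_attr("11100000")
# An invalid word has the validity bit cleared.
invalid_word = parse_attr("0000 0000")
```

Here `is_valid(hero)` holds and `WordAttr.FILM in hero` tests a single category bit, which is the kind of check the later filtering step relies on.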
102. Extracting keywords.
For each network resource in the search engine database, the corresponding keywords must be extracted from its subject information.
First, the subject information of the network resource is segmented with the keyword dictionary by the maximum matching principle, and the resulting segments are filtered by their attributes: words whose attribute is invalid are removed, words whose attribute is valid are retained, and the retained words are taken as the keywords of the subject information of the network resource.
For example, suppose the keyword dictionary contains the following group of words:
East 1100 0000
East 0000 0000
High Road to China 1010 0000
Northeast 0000 0000
……
The extraction results for the following web page titles are (title ----- extracted keyword):
Behind-the-scenes footage of the film East ----- East
High Road to China, high-definition version ----- High Road to China
The road in the Northeast ----- (no keyword)
For a vertical search engine, such as a film and television search engine, the final keywords may be further filtered according to the other attributes of the extracted keywords. For example, the keywords extracted from the web page title "Dragon Tiger Gate, starring Zhen Zidan" are "Dragon Tiger Gate" and "Zhen Zidan"; but "Zhen Zidan" is not a film or television title, it is a person's name, so in this case "Zhen Zidan" should be filtered out. This filtering can be chosen according to the specific search category of the search engine.
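The extraction step can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the text only says "maximum matching", so forward maximum matching is assumed here; the function names and the English stand-in words "east" (valid) and "eastgate" (invalid, literally containing the valid word) are hypothetical.

```python
VALID_BIT = 0b1000_0000  # bit 7 of the attribute byte marks a valid keyword


def max_match(text: str, dictionary: dict[str, int]) -> list[str]:
    """Forward maximum matching: at each position take the longest dictionary word."""
    longest = max((len(word) for word in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
        else:
            i += 1  # no dictionary word starts here; skip one character
    return tokens


def extract_keywords(text: str, dictionary: dict[str, int]) -> list[str]:
    """Keep only the matched words whose validity bit is set."""
    return [t for t in max_match(text, dictionary) if dictionary[t] & VALID_BIT]


# Hypothetical dictionary: the invalid word covers the characters of the valid one.
DICTIONARY = {"east": 0b1100_0000, "eastgate": 0b0000_0000}

film_keywords = extract_keywords("film east trailer", DICTIONARY)
other_keywords = extract_keywords("eastgate news", DICTIONARY)
```

Because the longest match wins, "eastgate" is matched as a whole and then discarded as invalid, so the shorter valid word inside it is never extracted by mistake. This appears to be the purpose of keeping invalid words in the dictionary.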
103. Building the keyword index.
Using the basic word-segmentation dictionary (though not limited to it), each keyword of the subject information of each network resource is segmented, and a keyword index from each segment of a keyword to network resources is built.
For example, consider the subject information of the following group of network resources:
Doc1: Ineffable Secret, complete collection, Chinese subtitles;
Doc2: Ineffable Secret, complete collection;
Doc3: Iron Triangle, DVD, Chinese subtitles;
Doc4: Iron Triangle, complete collection;
Doc5: Iron Triangle (starring Ren Dahua);
Doc6: Secret, complete collection;
Their keywords are respectively:
Doc1: Ineffable Secret;
Doc2: Ineffable Secret;
Doc3: Iron Triangle;
Doc4: Iron Triangle;
Doc5: Iron Triangle;
Doc6: Secret.
Segmenting each keyword yields the following segments: "cannot", "say", "of", "secret", "Iron Triangle".
The keyword index is built as follows:
"cannot" is associated with Doc1 and Doc2;
"say" is associated with Doc1 and Doc2;
"of" is associated with Doc1 and Doc2;
"secret" is associated with Doc1, Doc2 and Doc6;
"Iron Triangle" is associated with Doc3, Doc4 and Doc5.
104. Building the resource index (this step and building the keyword index may be performed in either order).
The subject information of the network resources is segmented according to the basic word-segmentation dictionary (though not limited to it), and a resource index from each segment to network resources is built.
For example, consider the subject information of the following group of network resources:
Doc1: Ineffable Secret, complete collection, Chinese subtitles;
Doc2: Ineffable Secret, complete collection;
Doc3: Iron Triangle, DVD, Chinese subtitles;
Doc4: Iron Triangle, complete collection;
Doc5: Iron Triangle (starring Ren Dahua);
Doc6: Secret, complete collection;
After word segmentation, the resource index is built as follows:
"cannot" is associated with Doc1, Doc2;
"say" is associated with Doc1, Doc2;
"of" is associated with Doc1, Doc2;
"secret" is associated with Doc1, Doc2, Doc6;
"complete collection" is associated with Doc1, Doc2, Doc4, Doc6;
"Chinese" is associated with Doc1, Doc3;
"subtitles" is associated with Doc1, Doc3;
"Iron Triangle" is associated with Doc3, Doc4, Doc5;
"DVD" is associated with Doc3;
"starring" is associated with Doc5;
"Ren Dahua" is associated with Doc5.
105. Configuring weights.
Weight configuration has two parts: configuring a static weight for each network resource, and configuring a weight for each segment of a keyword.
The static weight of a web page resource is determined from information such as the number of views of the page, the source of the page and how the page is cited; the static weight of a download resource is determined from information such as the number of downloads, the file size and the file format. For example, for a download resource docid1, the static weight of the resource can be determined as W1 from information such as the number of downloads of docid1 and the size of docid1.
The weight configuration for each segment of a keyword comprises the following steps. First, the keyword is segmented according to the basic word-segmentation dictionary (though not limited to it); for example, the keyword "Ineffable Secret" is divided into four words, that is, the segmentation result is: "cannot", "say", "of", "secret". Next, assuming the weight of each keyword is weight = 1, the weight of word1 "cannot" is W11, the weight of word2 "say" is W21, the weight of word3 "of" is W31 and the weight of word4 "secret" is W41, with W11 = W41 = 1/3 and W21 = W31 = 1/6; that is, each segment weight is the ratio of the segment's length to the keyword's length (in the original Chinese the keyword has six characters, "cannot" and "secret" having two characters each and "say" and "of" one character each, so the four weights sum to 1).
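Applying the length-ratio rule to a six-character keyword split into segments of 2, 1, 1 and 2 characters gives 1/3, 1/6, 1/6 and 1/3. The sketch below checks that arithmetic with exact fractions. The function name `segment_weights` is hypothetical, and the Chinese strings are an assumed reconstruction of the original-language segments, used only so that the character counts come out as described.

```python
from fractions import Fraction


def segment_weights(segments, keyword_weight=Fraction(1)):
    """Split keyword_weight across segments in proportion to their character length."""
    total_length = sum(len(segment) for segment in segments)
    if total_length == 0:
        raise ValueError("keyword has no characters")
    return [
        (segment, keyword_weight * Fraction(len(segment), total_length))
        for segment in segments
    ]


# Assumed original-language segments for "cannot", "say", "of", "secret".
ineffable_secret = segment_weights(["不能", "说", "的", "秘密"])
# A keyword that is a single segment keeps the whole keyword weight.
secret_only = segment_weights(["秘密"])
```

With these inputs the two-character segments each receive 1/3 and the one-character segments each receive 1/6, so the segment weights of a keyword always add up to the keyword weight.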
The configured static weights and the weights of the segments of each keyword may be added to the resource index and the keyword index described above. As shown in Fig. 4, in a specific implementation the static weight information of all network resources is recorded together and indexed by the docid of each network resource. Word1, Word2, ..., Wordn each record the segment weight of that word in the keywords of the subject information of each network resource, indexed by the docid of the subject information of the network resource to which the keyword belongs.
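One possible in-memory layout for the weighted indexes described for Fig. 4 is a table of static weights keyed by docid plus, for each word, a table of segment weights keyed by docid. The sketch below is an assumption about the layout, not a description of the actual figure; `build_indexes`, the docids and all numeric weights are hypothetical.

```python
def build_indexes(resources):
    """Build the two weighted tables.

    resources maps docid -> (static_weight, [(segment, segment_weight), ...]).
    Returns (static_index, keyword_index), where static_index maps
    docid -> static weight and keyword_index maps segment -> {docid: weight}.
    """
    static_index = {}
    keyword_index = {}
    for docid, (static_weight, weighted_segments) in resources.items():
        static_index[docid] = static_weight
        for segment, weight in weighted_segments:
            keyword_index.setdefault(segment, {})[docid] = weight
    return static_index, keyword_index


# Hypothetical resources: static weights are made up for illustration.
RESOURCES = {
    "docid1": (0.5, [("cannot", 1 / 3), ("say", 1 / 6), ("of", 1 / 6), ("secret", 1 / 3)]),
    "docid6": (0.2, [("secret", 1.0)]),
}
STATIC_INDEX, KEYWORD_INDEX = build_indexes(RESOURCES)
```

Keying the inner tables by docid means the weight of a query segment in a given resource can be read with two dictionary lookups, which is all the weight calculation step needs.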
106. Determining the network resources to be ranked.
As shown in Fig. 5, when the user enters a word as the search term and searches, the search term is first segmented using the basic word-segmentation dictionary, giving the segment sequence word1, word2, ..., wordn. Then, in the resource index shown in Fig. 4, the intersection of the docid sequences corresponding to the segments wordk (k = 1, 2, ..., n) is found, for example docid2, docid4, docid5 and so on, and the network resources corresponding to this intersection are taken as the network resources to be ranked.
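The candidate selection of this step amounts to intersecting the docid sets of the query segments. A minimal sketch follows, assuming the resource index is held as a mapping from segment to a set of docids; the function name `candidates` and the English segment strings are illustrative, with the index contents taken from the example of step 104.

```python
def candidates(query_segments, resource_index):
    """Return the docids that contain every query segment."""
    postings = [resource_index.get(segment, set()) for segment in query_segments]
    if not postings:
        return set()
    return set.intersection(*postings)


# Part of the resource index from the example in step 104.
RESOURCE_INDEX = {
    "secret": {"Doc1", "Doc2", "Doc6"},
    "complete collection": {"Doc1", "Doc2", "Doc4", "Doc6"},
    "Chinese": {"Doc1", "Doc3"},
    "Iron Triangle": {"Doc3", "Doc4", "Doc5"},
}

secret_collection = candidates(["secret", "complete collection"], RESOURCE_INDEX)
secret_chinese = candidates(["secret", "Chinese"], RESOURCE_INDEX)
```

A segment that is missing from the index contributes an empty set, so a query containing it yields no candidates under this strict intersection.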
107. Calculating weights.
The total weight of each network resource to be ranked is calculated; docid2 is taken as the example below.
As shown in Fig. 6, the segment weights of word1, word2, ..., wordn in the subject information of the network resource corresponding to docid2 are looked up in the keyword index (see Fig. 4); the segment weights W12, W22, ..., Wn2 are taken out and summed to give the keyword weight of the search term in the subject information of the network resource corresponding to docid2, i.e. Wk(docid2) = W12 + W22 + ... + Wn2. If the docids associated with some wordk do not include docid2, its weight is Wk2 = 0, meaning that this word is not a keyword segment of the subject information of the network resource corresponding to docid2.
The static weight Ws(docid) of the network resource corresponding to docid2 is then read from the index shown in Fig. 4.
Finally, the total weight W(docid) of the network resource corresponding to docid2 is calculated. The proportions that Ws(docid) and Wk(docid) contribute to W(docid) can be chosen as circumstances require; for example, if Ws(docid) contributes q1 and Wk(docid) contributes q2, then W(docid) = q1*Ws(docid) + q2*Wk(docid).
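The two formulas of this step, the sum of segment weights and the weighted combination W(docid) = q1*Ws(docid) + q2*Wk(docid), can be sketched as below. The function names, the index contents and the values chosen for q1, q2 and the static weights are assumptions made for illustration; the patent leaves q1 and q2 adjustable.

```python
def keyword_weight(query_segments, keyword_index, docid):
    """Wk(docid): sum of the segment weights of the query segments in this resource.

    A segment that is not a keyword segment of the resource contributes 0.
    """
    return sum(keyword_index.get(seg, {}).get(docid, 0.0) for seg in query_segments)


def total_weight(static_w, keyword_w, q1, q2):
    """W(docid) = q1 * Ws(docid) + q2 * Wk(docid)."""
    return q1 * static_w + q2 * keyword_w


# Hypothetical index contents and mixing proportions.
KEYWORD_INDEX = {
    "cannot": {"docid2": 1 / 3},
    "secret": {"docid2": 1 / 3, "docid6": 1.0},
}
STATIC_INDEX = {"docid2": 0.5, "docid6": 0.2}
Q1, Q2 = 0.4, 0.6

wk_docid2 = keyword_weight(["secret", "trailer"], KEYWORD_INDEX, "docid2")
w_docid2 = total_weight(STATIC_INDEX["docid2"], wk_docid2, Q1, Q2)
```

In this example "trailer" is not in the keyword index, so only "secret" contributes to the keyword weight of docid2.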
108. Sort.
After the total weight of each network resource to be sorted has been calculated, the resources are sorted by total weight in descending order.
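A descending sort over the computed total weights might look like the following sketch. The weights are assumed sample values and `sort_by_total_weight` is a hypothetical helper name.

```python
def sort_by_total_weight(total_weights):
    """Return docids ordered from highest to lowest total weight."""
    return sorted(total_weights, key=lambda docid: total_weights[docid], reverse=True)


# Assumed total weights for the three candidate resources.
total_weights = {"docid2": 0.875, "docid4": 0.5, "docid5": 0.625}
ranking = sort_by_total_weight(total_weights)
```

Python's `sorted` is stable, so resources with equal total weight keep their original relative order; the source does not say how ties should be broken.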
Ordering the search results with the above scheme yields more satisfactory results. For example, suppose the user searches for "secret trailer" and the results include web page title 1, "secret trailer", and web page title 2, "ineffable secret trailer". The weight of "secret trailer" will then be greater than that of "ineffable secret trailer". This is because the keyword of "secret trailer" is "secret" and the keyword of "ineffable secret trailer" is "ineffable secret", while "trailer" is an invalid keyword. When the keywords are segmented, "ineffable secret" is divided into four words: "cannot", "say", a particle with no direct English equivalent, and "secret". In the keyword index, the weight of "secret" in the keyword of web page title 1 is weight, and its weight in the keyword of web page title 2 is weight/3.
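The weight/3 figure follows from the length-ratio rule for segment weights stated in claim 6. The sketch below reproduces it under a loud assumption: the Chinese strings are not given in this text, and it is assumed here that "secret" is the two-character word 秘密 and "ineffable secret" is the six-character title 不能说的秘密. The function name and the substring check are likewise illustrative simplifications.

```python
def length_ratio_weight(segment, keyword, base_weight=1.0):
    """Weight of a segment as base_weight * len(segment) / len(keyword).

    Returns 0.0 when the segment does not occur in the keyword. The substring
    test is a simplification of a real segmentation-based lookup.
    """
    if not keyword or segment not in keyword:
        return 0.0
    return base_weight * len(segment) / len(keyword)


# Assumed original strings (not present in the translated source).
SECRET = "秘密"                # "secret", 2 characters
INEFFABLE_SECRET = "不能说的秘密"  # "ineffable secret", 6 characters

w_title1 = length_ratio_weight(SECRET, SECRET)            # keyword of title 1
w_title2 = length_ratio_weight(SECRET, INEFFABLE_SECRET)  # keyword of title 2
```

Under these assumptions the segment covers the whole keyword of title 1 and one third of the keyword of title 2, which is why title 1 ranks higher for the same query.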
109. Present the sorted results to the user.
The network resource with the highest total weight is placed first, so that the ordering better matches what the user needs.
As embodiment 1 shows, q1 and q2 are adjustable. In special cases, because of how keywords are extracted, a single-word query that happens to be a film title, for example "east", may match many results whose keyword is exactly "east". The search results then become too uniform: the whole page shows only films related to "east", which may differ from what the user actually wants. For such cases, q2 can be lowered and q1 raised.
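The effect of shifting weight from q2 to q1 can be seen in a small sketch. Everything here is assumed for illustration: the two resources, their (static weight, keyword weight) pairs, and the two q1/q2 settings are hypothetical values chosen to make the change in order visible.

```python
def rank(resources, q1, q2):
    """Order docids by q1*Ws + q2*Wk, highest first.

    `resources` maps docid -> (static weight Ws, keyword weight Wk).
    """
    scored = {docid: q1 * ws + q2 * wk for docid, (ws, wk) in resources.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Assumed data: one resource whose keyword matches the query exactly but has a
# low static weight, and one popular resource with a weaker keyword match.
resources = {
    "exact_title_match": (0.25, 1.0),
    "popular_resource": (1.0, 0.25),
}

keyword_heavy = rank(resources, q1=0.25, q2=0.75)
static_heavy = rank(resources, q1=0.75, q2=0.25)
```

With the keyword-heavy setting the exact title match comes first; after lowering q2 and raising q1 the popular resource overtakes it, which is the kind of diversification the paragraph describes.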
In summary, in the embodiment of the invention the search term entered by the user is segmented; each resulting segment is looked up in the keyword index to determine the keyword weight of the search term in each network resource to be sorted, and the total weight of the search term in each such resource is then determined. Because the total weight takes into account factors such as how well the search term matches the keywords, sorting the network resources by total weight and presenting them to the user brings the ranking closer to the user's needs.
Further, the embodiment of the invention provides specific implementations of the setup step, the step of determining the network resources to be sorted, the weight calculation step, the sorting step, and the presentation step. The setup step comprises the following substeps: customizing the keyword dictionary, extracting keywords, building the keyword index, building the resource index, and configuring weights. These implementations better support the invention.
Further, q1 and q2 in embodiment 1 are adjustable, so they can be tuned to the situation at hand to meet users' varying needs.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Thus, if such changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to encompass them as well.

Claims (15)

1. A search result ordering method based on a search engine, characterized in that: a keyword dictionary is customized from words and their attributes; according to the keyword dictionary, the subject information of each network resource is segmented under the maximum-matching principle, and the resulting segments are filtered by their attributes to extract the keywords of the subject information of each network resource; the keywords of the subject information of each network resource are segmented, and a keyword index from each keyword segment to network resources is built; the subject information of the network resources is segmented according to a basic segmentation dictionary, and a resource index from each segment of the network resources to network resources is built; the ordering method comprising the following steps:
segmenting the search term entered by the user;
looking up each segment obtained from the search term in the resource index to determine, for each segment, the set of network resources it belongs to, taking the intersection of these sets as the network resources to be sorted, and looking up each of the segments in the keyword index to determine the keyword weight of the search term in each network resource to be sorted;
determining the total weight of the search term in each network resource to be sorted; and
sorting the network resources to be sorted according to total weight, and presenting them to the user.
2. The method of claim 1, characterized in that the keyword dictionary comprises:
each valid word and its corresponding attribute, and each invalid word and its corresponding attribute.
3. The method of claim 2, characterized in that the set of invalid words and the set of valid words are mutually exclusive.
4. The method of claim 3, characterized in that the characters contained in an invalid word cover the characters contained in a valid word.
5. The method of claim 2, characterized in that the attributes of the valid words and invalid words are represented by character-type digits, each character representing one attribute of a valid word or invalid word.
6. The method of claim 1, characterized in that, after the keyword index from each keyword segment to network resources is built and before the search term entered by the user is segmented, the method further comprises:
assigning each segment a segment weight according to the ratio of the segment's length to the length of the keyword, and adding the segment weights to the keyword index built from each keyword segment to network resources; or
assigning each network resource a static weight according to information about the network resource, assigning each segment a segment weight according to the ratio of the segment's length to the length of the keyword, and adding the segment weights to the keyword index built from each keyword segment to network resources.
7. The method of claim 6, characterized in that the information about the network resource comprises: the number of times it has been viewed, and/or how it has been cited, and/or the number of times it has been downloaded, and/or its file format, and/or its file size.
8. The method of claim 6, characterized in that determining the keyword weight of the search term in each network resource to be sorted comprises:
looking up each segment obtained by segmenting the user's search term in the keyword index to determine the segment weight of each segment in the keywords of the subject information of each network resource to be sorted;
adding together the segment weights of the segments in the subject information of the same network resource to be sorted, the sum being the keyword weight of the search term in that network resource.
9. The method of claim 8, characterized in that the total weight comprises at least the keyword weight of the search term in the network resource to be sorted.
10. The method of claim 8, characterized in that determining the total weight of the search term in each network resource to be sorted comprises the following steps:
obtaining the static weight configured according to the information about the current network resource to be sorted;
obtaining the keyword weight of the search term in the current network resource to be sorted;
combining the static weight and the keyword weight of the current network resource to be sorted into its total weight.
11. The method of claim 10, characterized in that the total weight of the current network resource to be sorted is W(docid) = q1*Ws(docid) + q2*Wk(docid),
where docid denotes the current network resource to be sorted;
q1 denotes the proportion of the total weight contributed by the static weight;
Ws(docid) denotes the static weight;
q2 denotes the proportion of the total weight contributed by the keyword weight;
Wk(docid) denotes the keyword weight.
12. The method of claim 1, characterized in that the network resources to be sorted are sorted by total weight in descending order, and the ranking results are presented to the user in that order.
13. A search result ordering device based on a search engine, characterized by comprising:
a customization unit, configured to customize a keyword dictionary with words and their attributes as the basic structure, the customized keyword dictionary comprising each valid word and its corresponding attribute, and each invalid word and its corresponding attribute;
an extraction unit, configured to segment the subject information of each network resource according to the keyword dictionary under the maximum-matching principle, and to filter the resulting segments by their attributes so as to extract the keywords of the subject information of each network resource;
a keyword index building unit, configured to segment each keyword of the subject information of each network resource and to build a keyword index from each keyword segment to network resources, for use by the keyword weight determining unit;
a resource index building unit, configured to segment the subject information of the network resources according to a basic segmentation dictionary and to build a resource index from each segment of the network resources to network resources;
a segmentation unit, configured to segment the search term entered by the user;
a determining unit, configured to look up each segment obtained from the search term in the resource index to determine, for each segment, the set of network resources it belongs to, and to take the intersection of these sets as the network resources to be sorted;
a keyword weight determining unit, configured to look up each segment obtained by segmenting the user's search term in the keyword index, to determine the keyword weight of the search term in each network resource to be sorted;
a total weight determining unit, configured to determine the total weight of the search term in each network resource to be sorted;
a sorting unit, configured to sort the network resources to be sorted according to total weight;
a display unit, configured to present the ranking results to the user.
14. The device of claim 13, characterized by further comprising:
a configuration unit, configured to assign each segment a segment weight according to the ratio of the segment's length to the length of the keyword; or
to assign each network resource a static weight according to information about the network resource, and to assign each segment a segment weight according to the ratio of the segment's length to the length of the keyword;
wherein, specifically, the keyword weight determining unit is further configured to look up each segment obtained by segmenting the user's search term in the keyword index to determine the segment weight of each segment in the keywords of the subject information of each network resource to be sorted, and to add together the segment weights of the segments in the subject information of the same network resource to be sorted, the sum being the keyword weight of the search term in that network resource.
15. The device of claim 13, characterized in that the sorting unit sorts the network resources to be sorted by total weight in descending order, and the display unit then presents the ranking results to the user in that order.
CNB2007101872765A 2007-11-15 2007-11-15 A kind of search result ordering method and device based on search engine Active CN100557612C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2007101872765A CN100557612C (en) 2007-11-15 2007-11-15 A kind of search result ordering method and device based on search engine

Publications (2)

Publication Number Publication Date
CN101158971A CN101158971A (en) 2008-04-09
CN100557612C true CN100557612C (en) 2009-11-04

Family

ID=39307073

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2007101872765A Active CN100557612C (en) 2007-11-15 2007-11-15 A kind of search result ordering method and device based on search engine

Country Status (1)

Country Link
CN (1) CN100557612C (en)

Also Published As

Publication number Publication date
CN101158971A (en) 2008-04-09

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: BEIJING Z-GOOD RUITUO TECHNOLOGY SERVICE CO., LTD.

Free format text: FORMER OWNER: XUNLEI NETWORK TECHNOLOGY CO., LTD., SHENZHEN

Effective date: 20131030

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 518057 SHENZHEN, GUANGDONG PROVINCE TO: 100085 HAIDIAN, BEIJING

TR01 Transfer of patent right

Effective date of registration: 20131030

Address after: 100085 Beijing city Haidian District No. 33 Xiaoying Road 1 1F05 room

Patentee after: Beijing Zhigu Ruituo Technology Service Co., Ltd.

Address before: 518057 Guangdong, Shenzhen, Nanshan District science and technology in the road, Shenzhen, No. 11, software park, building 7, level 8, two

Patentee before: Xunlei Network Technology Co., Ltd., Shenzhen