CN103631929A - Intelligent prompt method, module and system for search - Google Patents

Info

Publication number
CN103631929A
Authority
CN
China
Prior art keywords
word
candidate word
suffix
hot
prefix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310653732.6A
Other languages
Chinese (zh)
Other versions
CN103631929B (en)
Inventor
罗晶
尹岩
严敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JIANGSU WISEDU INFORMATION TECHNOLOGY Co Ltd
Original Assignee
JIANGSU WISEDU INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JIANGSU WISEDU INFORMATION TECHNOLOGY Co Ltd filed Critical JIANGSU WISEDU INFORMATION TECHNOLOGY Co Ltd
Priority to CN201310653732.6A priority Critical patent/CN103631929B/en
Publication of CN103631929A publication Critical patent/CN103631929A/en
Application granted granted Critical
Publication of CN103631929B publication Critical patent/CN103631929B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2425 Iterative querying; Query formulation based on the results of a preceding query

Abstract

The invention discloses an intelligent prompt method, module and system for search. In the method, a server performs the following steps: a tokenizer splits the input into a prefix word and a suffix word; synonym expansion produces a prefix synonym list and a suffix synonym list; a hot word suffix tree is traversed to find hot words that match the prefix and/or the suffix, yielding candidate words; and the probability of each candidate word is calculated by analyzing user historical search behavior. A client then calculates the local relevance of each candidate word, calculates a predicted click value for each candidate word, and selects the candidate words to display according to those values. Because the prompt words are obtained by matching both prefix words and suffix words, incorporate synonyms, integrate the search intentions of many users and take local relevance into account, they come closer to the user's search intention.

Description

An intelligent prompt method, module and system for search
Technical field
The present invention relates to keyword search in data search and data mining, and in particular to artificial intelligence applied to keyword input.
Background
Intelligent prompting is a technique that helps users clarify what they intend to enter, speeds up input and improves the user experience. It is mainly used in search engines and development platforms, where prompts are offered automatically in response to the user's input through presentation forms such as drop-down boxes or labels.
At present, mainstream search engines first compile statistics on the user search history stored on the server and build a popular-word dictionary according to the search frequency of each search word. When a user enters a keyword, candidate prompt words are looked up in the popular-word dictionary by string prefix matching, filtered by search frequency, and displayed in order below the search box. This approach has two weaknesses. First, looking up candidates by string prefix matching alone may miss candidate prompt words that are relevant to the search keyword. Second, filtering candidates by their search frequency in the popular-word dictionary, without taking into account the search history stored locally for the current user, may cause the prompt words offered to deviate from the user's search intention. The root of these problems lies in how the language is habitually used. In Chinese, a word that modifies a noun always precedes the noun it modifies. In "casual pants", for example, "casual" is the modifier and "pants" is the head noun. After the user enters "casual pants" on the client, prefix matching returns only content related to "casual", whereas the user actually wants content related to "pants". The result is an obvious deviation between the prompt words and the user's search intention.
Summary of the invention
The problem to be solved by the present invention is how to make the prompt words offered by a search engine more reasonable.
To solve this problem, the present invention adopts the following scheme.
An intelligent prompt method for search according to the present invention involves a client and a server connected through a network, and comprises the following steps:
S21: the client obtains an initial string;
S22: the client sends the initial string to the server;
S29: the server receives the initial string;
S3: the server searches the hot words according to the initial string and obtains a candidate word information list;
S41: the server sends the candidate word information list to the client;
S49: the client receives the candidate word information list;
S5: the client obtains a candidate word list from the candidate word information list;
S91: the client displays the candidate word list;
The method is characterized in that step S3 comprises:
S31: the server splits the initial string with a tokenizer to obtain a prefix word and a suffix word;
S32: the server looks up the prefix word and the suffix word in a thesaurus to obtain prefix synonyms and suffix synonyms;
S33: the server traverses a hot word suffix tree to find hot words that satisfy prefix matching and/or suffix matching, and obtains the candidate word information list;
Here, the thesaurus is a database in which the server stores synonym relationships between keywords; the hot word suffix tree is built by the server from the frequently searched hot words in the hot word dictionary, using the generalized suffix tree data structure; the hot word dictionary is a database in which the server stores hot word information; hot word information comprises the hot word, the hot word sequence number and the hot word search frequency; prefix matching means that the prefix of a hot word matches the prefix word or a prefix synonym; suffix matching means that the suffix of a hot word matches the suffix word or a suffix synonym.
Further, the intelligent prompt method for search according to the present invention is characterized in that the method also comprises:
S34: the server calculates the probability of each candidate word by analyzing a user historical search behavior database;
Here, the user historical search behavior database is used to store historical behavior information.
Further, the intelligent prompt method for search according to the present invention is characterized in that step S34 comprises:
S34a1: the server looks up, in the user historical search behavior database, the historical behavior information whose original string is identical to the initial string and whose clicked hot word is identical to the candidate word, and obtains the click frequency of the candidate word;
S34a2: the server normalizes the click frequencies of the candidate words to obtain the probability of each candidate word;
Here, the historical behavior information comprises the original string, the clicked hot word and the click frequency.
Further, the intelligent prompt method for search according to the present invention is characterized in that step S34 comprises:
S34b1: look up the historical behavior information for the candidate word in the user historical search behavior database;
S34b2: from this historical behavior information, count the click frequency under each prefix matching mode and each suffix matching mode;
S34b3: take the natural logarithm of the click frequencies under the different prefix matching modes and suffix matching modes to obtain the logit value under each mode;
S34b4: using the parameter equations of binary linear regression, calculate the values of the parameters [formula image] in the formula [formula image];
S34b5: calculate the probability of the candidate word according to the formula [formula image], where [formula image];
S34b6: normalize the probabilities of the candidate words;
Here, the historical behavior information comprises the clicked hot word and the click frequencies for the nine candidate word match types.
Further, the intelligent prompt method for search according to the present invention is characterized in that step S5 comprises:
S51: the client calculates the local relevance of each candidate word in the candidate word information list from a local historical search database;
S52: the client calculates the predicted click value of each candidate word from the candidate word's local relevance and its candidate word information;
S53: the client selects the candidate word list from the candidate word information list according to the predicted click values of the candidate words;
Here, the local historical search database is where the client stores local historical search information, which comprises the local historical search string, the local historical search time and the local historical search frequency. Step S51 comprises:
S511: use the tokenizer to split the local historical search strings in the local historical search database and the candidate words in the candidate word information list into a keyword list, and calculate the statistical frequency of each keyword;
S512: build a keyword space vector from the statistical frequencies of the keywords in the keyword list;
S513: build a candidate word space vector from the statistical frequencies, in the keyword list, of the keywords obtained by splitting the candidate word;
S514: calculate the cosine of the keyword space vector and the candidate word space vector to obtain the local relevance of the candidate word.
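Steps S511 to S514 can be illustrated with a minimal Python sketch. It is only one reading of the steps, not the patented implementation: the function names are hypothetical, whitespace splitting stands in for a real tokenizer, and no time weighting is applied to the statistical frequencies.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def local_relevance(history, candidates, tokenize):
    """Return {candidate word: local relevance} following S511-S514."""
    # S511: split history strings and candidate words into keywords and count them.
    counts = Counter()
    for text in list(history) + list(candidates):
        counts.update(tokenize(text))
    vocabulary = sorted(counts)
    # S512: keyword space vector holds the statistical frequency of every keyword.
    keyword_vector = [counts[k] for k in vocabulary]
    relevance = {}
    for candidate in candidates:
        own_keywords = set(tokenize(candidate))
        # S513: candidate vector keeps the frequency only for the candidate's own keywords.
        candidate_vector = [counts[k] if k in own_keywords else 0 for k in vocabulary]
        # S514: the cosine of the two vectors is the local relevance.
        relevance[candidate] = cosine(keyword_vector, candidate_vector)
    return relevance


# Hypothetical local history and candidates; str.split is a stand-in tokenizer.
scores = local_relevance(
    ["casual pants", "sports pants"],
    ["casual pants", "casual shoes"],
    str.split,
)
```

With this toy history, the candidate that shares more keywords with past searches receives the higher relevance.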
Further, the intelligent prompt method for search according to the present invention is characterized in that the calculation of the statistical frequency of a keyword in step S511 includes a step of weighting the statistical frequency by time.
Further, the intelligent prompt method for search according to the present invention is characterized in that, in step S52:
CTR = A * R * C, where CTR is the predicted click value of the candidate word; A is the probability of the candidate word; R is the local relevance of the candidate word; and C is a constant determined by the type of the candidate word.
Further, the intelligent prompt method for search according to the present invention is characterized in that, in step S52:
CTR = A * R * C * P, where CTR is the predicted click value of the candidate word; A is the probability of the candidate word; R is the local relevance of the candidate word; C is a constant determined by the type of the candidate word; and P is the search frequency of the candidate word. In this case the candidate word information also comprises the search frequency of the candidate word.
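The two CTR formulas and the selection in step S53 can be sketched as follows. The dictionary keys, the match type names, the constant values and the `top_k` cut-off are all hypothetical; the text only states that C depends on the type of the candidate word and that candidates are selected by predicted click value.

```python
def predicted_click_value(a, r, c, p=None):
    """CTR = A * R * C, or A * R * C * P when a search frequency P is supplied."""
    ctr = a * r * c
    return ctr if p is None else ctr * p


def choose_candidates(infos, type_constants, top_k=5):
    """S52-S53: score each candidate, then keep the top_k highest-scoring words."""
    scored = []
    for info in infos:
        c = type_constants[info["match_type"]]  # C: constant per candidate type (assumed values)
        ctr = predicted_click_value(
            info["probability"], info["relevance"], c, info.get("search_frequency")
        )
        scored.append((ctr, info["word"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [word for _, word in scored[:top_k]]


# Hypothetical candidate word information and type constants.
infos = [
    {"word": "casual shoes", "probability": 0.2, "relevance": 0.7, "match_type": "prefix"},
    {"word": "casual pants", "probability": 0.5, "relevance": 0.9, "match_type": "prefix"},
    {"word": "sports pants", "probability": 0.3, "relevance": 0.8, "match_type": "suffix"},
]
chosen = choose_candidates(infos, {"prefix": 1.0, "suffix": 0.8}, top_k=2)
```

Passing `search_frequency` in every candidate's information switches the score to the four-factor form.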
An intelligent prompt device for search according to the present invention is characterized by comprising:
A tokenizer, for splitting the initial string to obtain a prefix word and a suffix word;
A synonym expansion device, for looking up the prefix word and the suffix word in the thesaurus to obtain prefix synonyms and suffix synonyms;
A suffix tree traversal device, for traversing the hot word suffix tree to find hot words that satisfy prefix matching and/or suffix matching and obtaining the candidate word information list; prefix matching means that the prefix of a hot word matches the prefix word or a prefix synonym; suffix matching means that the suffix of a hot word matches the suffix word or a suffix synonym;
A hot word dictionary construction device, for managing and maintaining the database that stores hot word information;
A suffix tree construction device, for managing and maintaining the hot word suffix tree; the hot word suffix tree is built by the server from the frequently searched hot words in the hot word dictionary, using the generalized suffix tree data structure;
A historical behavior analysis device, for calculating the probability of each candidate word by analyzing the user historical search behavior database;
A user historical search behavior database device, for storing historical behavior information.
Further, an intelligent prompt system for search according to the present invention comprises a client and a server connected through a network, and is characterized in that:
The server comprises:
A tokenization module, for splitting the initial string to obtain a prefix word and a suffix word;
A synonym expansion module, for looking up the prefix word and the suffix word in the thesaurus to obtain prefix synonyms and suffix synonyms;
A suffix tree traversal module, for traversing the hot word suffix tree to find hot words that satisfy prefix matching and/or suffix matching and obtaining the candidate word information list; prefix matching means that the prefix of a hot word matches the prefix word or a prefix synonym; suffix matching means that the suffix of a hot word matches the suffix word or a suffix synonym;
A hot word dictionary construction module, for managing and maintaining the database that stores hot word information;
A suffix tree construction module, for managing and maintaining the hot word suffix tree; the hot word suffix tree is built by the server from the frequently searched hot words in the hot word dictionary, using the generalized suffix tree data structure;
A historical behavior analysis module, for calculating the probability of each candidate word by analyzing the user historical search behavior database;
A user historical search behavior database module, for storing historical behavior information;
The client comprises:
A local relevance calculation module, for calculating the local relevance of each candidate word in the candidate word information list from the local historical search database;
A predicted click value calculation module, for calculating the predicted click value of each candidate word from the candidate word's local relevance and its candidate word information;
A candidate word selection module, for selecting the candidate word list from the candidate word information list according to the predicted click values of the candidate words;
A local historical search database storage module, for storing local historical search information, which comprises the local historical search string, the local historical search time and the local historical search frequency;
The local relevance calculation module comprises:
A keyword distribution statistics module, for using the tokenizer to split the local historical search strings in the local historical search database and the candidate words in the candidate word information list into a keyword list and calculating the statistical frequency of each keyword;
A keyword space vector construction module, for building a keyword space vector from the statistical frequencies of the keywords in the keyword list;
A candidate word space vector construction module, for building a candidate word space vector from the statistical frequencies, in the keyword list, of the keywords obtained by splitting the candidate word;
A vector cosine calculation module, for calculating the cosine of the keyword space vector and the candidate word space vector to obtain the local relevance of the candidate word.
The technical effects of the present invention are as follows:
1. In the present invention, prompt words are obtained by matching both the prefix word and the suffix word, combined with synonyms, so they come closer to the meaning the language expresses.
2. In the present invention, matching of prefix words and suffix words is implemented by building a generalized suffix tree of hot words, together with hot word sequence numbers, so the search is fast and consumes little CPU time.
3. In the present invention, the final prompt words incorporate a probability calculation; the calculated probability aggregates the search intentions of many users, bringing the prompt words closer to the user's search intention.
4. In the present invention, the final prompt words incorporate local relevance, which analyzes the user's search intention from the user's own search history, bringing the prompt words closer to the user's search intention.
Detailed description of the embodiments
The summary of the invention and the claims are described in further detail below.
1. Application scenario and environment of the present invention
The present invention is applied to intelligent prompting in search engines. When searching, the user enters the string to be searched in a text edit box on a web page; the device, method or system of the present invention then displays, in a drop-down box below the text edit box, several prompt words the user may be searching for. Once the user selects a prompt word from the drop-down box, the search engine searches for that prompt word. The user may also ignore the drop-down box and continue typing, in which case the search engine searches for the text entered. The benefit of the intelligent prompt drop-down box is that it makes input easier and reduces the effort and time the user spends typing. The main process of obtaining prompt words can be summarized in the following steps:
S21: the client obtains an initial string;
S22: the client sends the initial string to the server;
S29: the server receives the initial string;
S3: the server searches the hot words according to the initial string and obtains a candidate word information list;
S41: the server sends the candidate word information list to the client;
S49: the client receives the candidate word information list;
S5: the client obtains a candidate word list from the candidate word information list;
S91: the client displays the candidate word list;
In the above process, the client usually takes the form of a web page, although it can also be implemented as a dedicated application. A web page client generally runs on the user terminal, and the user accesses the search engine's server through the web page. In the present invention the client may also be located on the server side. That case can be understood as an application divided into a client module and a server module, which are respectively the client and the server of the present invention. The "network" connecting the client module and the server module is then to be understood as communication in a broader sense, for example through local memory, a pipe or a socket.
In step S21 of the above process, "the client obtains an initial string" can be understood as the aforementioned "the user enters the string to be searched in a text edit box on a web page". Given the above understanding of the client, this step can also take other forms. In general, the initial string is typed by the user and captured by the client while the user is still typing, so it is usually not the string the user ultimately wants to search for.
In step S91 of the above process, "the client displays the candidate word list" can be understood as the aforementioned "displays, in a drop-down box below the text edit box, several prompt words the user may be searching for". A candidate word is a prompt word, and several prompt words make up the candidate word list.
The above process can be regarded as prior art, since many search engines do implement intelligent prompting according to these steps. The present invention solves its problem through the specific implementation of steps S3 and S5. The remainder of this specification therefore concentrates on the specific implementation of steps S3 and S5 and the technical content related to them. The other steps in the above process will be understood by those skilled in the art and are not described in detail.
Two, the key concept in this instructions
The keyword of indication of the present invention is the word that can express certain semanteme obtaining after character string being split by participle device.Such as " casual pants " obtains two keywords, " leisure " and " trousers " after splitting.
The prefix word of indication of the present invention is first keyword in the keyword obtaining after character string being split by participle device.Such as " casual pants " obtains two keywords, " leisure " and " trousers " after splitting.Wherein " leisure " is prefix word.
The suffix word of indication of the present invention is last keyword in the keyword obtaining after character string being split by participle device.Such as " casual pants " obtains two keywords, " leisure " and " trousers " after splitting.Wherein " trousers " are suffix word.
It will be appreciated by those skilled in the art that this keyword is that prefix word is again suffix word if character string can only obtain a keyword after splitting by participle device.
The character string of the candidate word of indication of the present invention for being formed by one or more keywords.
The candidate word list of indication of the present invention can be understood as the array that a plurality of candidate word form.
The candidate word information of indication of the present invention comprises the attribute information of candidate word and candidate word or is only candidate word.The attribute information of candidate word can comprise the search frequency of candidate word, the local degree of correlation of the probability of candidate word and/or candidate word.
The candidate word information list of indication of the present invention can be understood as the array that a plurality of candidate word information forms.
The participle device of indication of the present invention, for for character string being split into module or the device of a plurality of keywords, mainly splits into a plurality of keywords by dictionary lookup by character string.It will be appreciated by those skilled in the art that participle device is prior art.In specific embodiment of the invention process, participle device can be bought acquisition by market, also can oneself construct.
The character string of the hot word of indication of the present invention for consisting of one or more keywords, for server is for preserving the character string of user search history.
The hot word information of indication of the present invention comprises hot word, hot word sequence number, the hot word search frequency.Wherein hot word sequence number is for setting up the index of fast finding, and the hot word search frequency is for adding up the searched number of times of hot word.
3. Embodiment 1
In this embodiment, step S3 is implemented through the following steps:
S31: the server splits the initial string with a tokenizer to obtain a prefix word and a suffix word;
S32: the server looks up the prefix word and the suffix word in a thesaurus to obtain prefix synonyms and suffix synonyms;
S33: the server traverses a hot word suffix tree to find hot words that satisfy prefix matching and/or suffix matching, and obtains the candidate word information list.
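The flow of steps S31 to S33 can be sketched in Python as below. This is a simplified illustration under stated assumptions: a linear scan over the hot words stands in for the traversal of the generalized suffix tree, whitespace splitting stands in for the tokenizer, and the function names and the shape of the returned records are hypothetical.

```python
def expand(word, thesaurus):
    """S32: the word itself plus any synonyms stored for it in the thesaurus."""
    return {word, *thesaurus.get(word, ())}


def search_hot_words(initial_string, hot_words, thesaurus, tokenize):
    """Return hot words whose prefix and/or suffix matches the expanded prefix/suffix word."""
    keywords = tokenize(initial_string)          # S31: split into keywords
    if not keywords:
        return []
    prefixes = tuple(expand(keywords[0], thesaurus))    # prefix word + prefix synonyms
    suffixes = tuple(expand(keywords[-1], thesaurus))   # suffix word + suffix synonyms
    candidates = []
    for hot_word in hot_words:                   # S33: a plain scan replaces the tree traversal
        prefix_match = hot_word.startswith(prefixes)
        suffix_match = hot_word.endswith(suffixes)
        if prefix_match or suffix_match:
            candidates.append(
                {"word": hot_word, "prefix_match": prefix_match, "suffix_match": suffix_match}
            )
    return candidates


# Hypothetical thesaurus and hot words.
thesaurus = {"pants": ["trousers"]}
hot_words = ["casual shoes", "sports pants", "wool trousers", "running shoes"]
matches = search_hot_words("casual pants", hot_words, thesaurus, str.split)
```

Here "casual shoes" is found through prefix matching, while "sports pants" and "wool trousers" are found through suffix matching, the latter through the synonym of the suffix word. A real implementation would replace the scan with suffix tree lookups to avoid touching every hot word.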
In this embodiment, the thesaurus is a database in which the server stores synonym relationships between keywords. The thesaurus is usually supplied as a commercial dictionary, but can also be built in-house.
In this embodiment, step S31 is implemented by a tokenization module or device, which is the tokenizer described above. Those skilled in the art will understand that the prefix word and the suffix word produced by step S31 may be identical. In that case the prefix synonyms and the suffix synonyms are also identical, so step S32 can be simplified: only the synonyms of the prefix word, or equivalently of the suffix word, need to be looked up.
In this embodiment, step S32 is implemented by a synonym expansion module or device. Those skilled in the art will understand that a word may have several synonyms, so the prefix synonyms and the suffix synonyms obtained in step S32 are each generally a list.
In this embodiment, step S33 is implemented by a suffix tree traversal module or device. Here, prefix matching means that the prefix of a hot word matches the prefix word or a prefix synonym, and suffix matching means that the suffix of a hot word matches the suffix word or a suffix synonym. The "and/or" in "prefix matching and/or suffix matching" means that a hot word found may satisfy prefix matching, suffix matching, or both. The suffix tree traversal module or device works by traversing the hot word suffix tree. The hot word suffix tree is built by the server from the frequently searched hot words in the hot word dictionary, using the generalized suffix tree data structure. It is built by a suffix tree construction module or device, which manages and maintains the hot word suffix tree. As is well known, a suffix tree is a tree data structure that supports efficient string matching and querying. A suffix tree represents a single string, whereas a generalized suffix tree can represent multiple strings. Construction and traversal of generalized suffix trees are prior art and are not repeated in this specification. Note that the hot words in the hot word suffix tree come from the hot word dictionary, but the tree does not contain every hot word in the dictionary, only the frequently searched ones. These can be obtained by sorting all hot words in the hot word dictionary by search frequency: sort the hot words in descending order of search frequency, then take the first N. In practice N is generally set in advance, for example to 10000 or 100000. A more efficient approach is to apply a threshold filter on search frequency before sorting, so that only hot words whose search frequency exceeds a set threshold are sorted.
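The selection of frequently searched hot words, with the optional threshold filter applied before ranking, might look like the following sketch. The function name and the plain dictionary mapping hot words to search frequencies are assumptions made for illustration.

```python
import heapq


def high_frequency_hot_words(hot_dictionary, n, min_frequency=0):
    """Drop hot words at or below min_frequency, then return the n most searched."""
    eligible = (
        (word, frequency)
        for word, frequency in hot_dictionary.items()
        if frequency > min_frequency
    )
    # nlargest returns the items in descending order of search frequency.
    top = heapq.nlargest(n, eligible, key=lambda item: item[1])
    return [word for word, _ in top]


# Hypothetical hot word dictionary: hot word -> search frequency.
frequencies = {"casual pants": 5, "wool trousers": 1, "sports pants": 9, "casual shoes": 3}
top_two = high_frequency_hot_words(frequencies, n=2, min_frequency=2)
```

Using `heapq.nlargest` avoids sorting the whole dictionary when N is much smaller than the number of hot words.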
In this embodiment, the hot word dictionary is a database in which the server stores hot word information; this data also serves as the record of user search history. Saving the user search history is handled by the hot word dictionary construction module or device, which manages and maintains the database that stores hot word information. Hot word information comprises the hot word, the hot word sequence number and the hot word search frequency. The user search history is saved as follows: the user submits a search string to the server through the client to request a search; on receiving the search string, the server performs the search and also adds the search string to the hot word dictionary as a hot word. If the hot word dictionary already contains this search string, the corresponding hot word search frequency is increased by 1; otherwise the search string is saved to the hot word dictionary and its search frequency is set to 1.
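The update rule for the hot word dictionary can be sketched as follows. An in-memory dictionary stands in for the database, and assigning the hot word sequence number as a running index is an assumption, since the text does not say how sequence numbers are allocated.

```python
def record_search(hot_dictionary, search_string):
    """Increase the search frequency of a known hot word, or store a new one with frequency 1."""
    entry = hot_dictionary.get(search_string)
    if entry is None:
        # Assumed: the sequence number is simply the next running index.
        entry = {"sequence": len(hot_dictionary) + 1, "frequency": 1}
        hot_dictionary[search_string] = entry
    else:
        entry["frequency"] += 1
    return entry


# Hypothetical usage: two searches for one string, one for another.
hot_dictionary = {}
record_search(hot_dictionary, "casual pants")
record_search(hot_dictionary, "casual pants")
record_search(hot_dictionary, "sports pants")
```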
It should be noted that the candidate word information list obtained in step S33 is an array of candidate word information items. In this embodiment the candidate word information is just the hot word, so the candidate word list obtained in step S5 is the same as the candidate word information list. In other implementations and in the embodiments that follow, candidate word information may contain more, such as the hot word sequence number and the attribute information of the candidate word.
Four, embodiment 2
The present embodiment is based upon on the basis of embodiment 1, is specially, and has increased a step, i.e. step S34 after the step S33 of embodiment 1: server is according to the probability of each candidate word of analytical calculation of user's historical search behavior database.
The step S34 of the present embodiment is realized by historical behavior analytical equipment or device, and problem to be solved is the statistical study of a certain specific candidate word user's historical search to obtain the probability that user view under condition that user inputs init string is inputted this candidate word.The input of the present embodiment is the candidate word information list that step S33 obtains, and output be also candidate word information list, but candidate word information in the candidate word information list of exporting has increased the probability of candidate word.
The calculating of the probability of candidate word is calculated and is obtained by user's historical search behavioural analysis.User's historical search behavioral data is kept in user's historical search behavior database, and this process is realized by device or the module of user's historical search behavior database.User's historical search behavior database has been preserved historical behavior information.The method of performing step S34 has a variety of.Instructions of the present invention provides two kinds of embodiments wherein: embodiment 1 and embodiment 2.Wherein embodiment 1 is a kind of simple embodiment.Embodiment 2 is for passing through the method for logistic regression algorithm to the match-type statistical study of candidate word.
Implementation 1 of step S34
Suppose the historical behavior information comprises an original string, a clicked hot word and a click frequency. The server looks up, in the user historical search behavior database, the historical behavior information whose original string equals the initial string and whose clicked hot word equals the candidate word. The click frequency in that historical behavior information can be used directly as the probability of the candidate word. Because the click frequency is an integer greater than 0, whereas a probability in the usual sense is a value between 0 and 1, the click frequencies of the candidate words can instead be normalized and the normalized values used as the probabilities. The normalization can be done as follows: suppose the candidate word information list contains K candidate words whose click frequencies are $c_1, c_2, \ldots, c_K$; the probability of the i-th candidate word is then
$$p_i = \frac{c_i}{\sum_{j=1}^{K} c_j}.$$
In this implementation, the above process can be summarized as:
S34a1: the server looks up, in the user historical search behavior database, the historical behavior information whose original string equals the initial string and whose clicked hot word equals the candidate word, and obtains the click frequency of the candidate word;
S34a2: the server normalizes the click frequencies of the candidate words to obtain the probability of each candidate word.
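Step S34a2 can be sketched as follows. This is only an illustration, assuming that the normalization divides each click frequency by the sum of the click frequencies of all K candidate words; the function name NormalizeClickCounts is hypothetical and does not appear in the original disclosure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: converts the click frequencies of K candidate words
// into probabilities that sum to 1 (assumed normalization: c_i / sum of c_j).
std::vector<double> NormalizeClickCounts(const std::vector<int>& counts)
{
    long long total = 0;
    for (int c : counts) {
        assert(c >= 0);  // click frequencies are never negative
        total += c;
    }
    std::vector<double> probs(counts.size(), 0.0);
    if (total == 0) return probs;  // no clicks recorded: all probabilities stay 0
    for (std::size_t i = 0; i < counts.size(); ++i)
        probs[i] = static_cast<double>(counts[i]) / static_cast<double>(total);
    return probs;
}
```

With this choice, a candidate word that has never been clicked for the initial string receives probability 0, which a practical system might prefer to smooth.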
In this implementation, the historical behavior information is generated as follows. After the client executes step S91, the user can select a candidate word from the displayed candidate word list. Once the user has made a selection, the initial string and the selected candidate word are sent together to the server with a retrieval request. On receiving the initial string and the selected candidate word, the server performs the retrieval and the aforementioned step of adding the selected candidate word to the hot dictionary, and additionally adds the initial string and the selected candidate word to the user historical search behavior database. Here the initial string is the original string of the historical behavior information, and the selected candidate word is the clicked hot word. They are added as follows: if the user historical search behavior database already holds a record for this pair of original string and clicked hot word, the corresponding click frequency is incremented by 1; otherwise a new record of the original string and the clicked hot word is saved and the corresponding click frequency is set to 1.
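The update rule for the click frequency can be sketched as follows. The class SearchHistoryStore is a hypothetical in-memory stand-in for the user historical search behavior database, chosen only for illustration; a real system would use persistent storage.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical in-memory stand-in for the user historical search behavior database.
// Key: (original string, clicked hot word); value: click frequency.
class SearchHistoryStore {
public:
    // Records that the user typed initialString and then selected selectedWord.
    // Returns the click frequency after the update.
    int RecordSelection(const std::string& initialString, const std::string& selectedWord)
    {
        assert(!selectedWord.empty());
        // operator[] creates a missing record with frequency 0, so a new record ends at 1
        return ++m_clicks[std::make_pair(initialString, selectedWord)];
    }

    // Returns the stored click frequency, or 0 when no record exists.
    int ClickCount(const std::string& initialString, const std::string& word) const
    {
        auto it = m_clicks.find(std::make_pair(initialString, word));
        return it == m_clicks.end() ? 0 : it->second;
    }

private:
    std::map<std::pair<std::string, std::string>, int> m_clicks;
};
```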
Implementation 2 of step S34
Suppose the historical behavior information comprises a clicked hot word and the click frequencies of nine candidate word match types. The nine match types comprise five basic types: no match, prefix match, suffix match, prefix synonym match and suffix synonym match; and four composite types: prefix and suffix match, prefix and suffix synonym match, prefix match with suffix synonym match, and prefix synonym match with suffix match. The nine match types are reduced to two independent variables, $x_1$ and $x_2$. $x_1$ represents the prefix matching mode; its possible values are prefix not matched, prefix synonym matched and prefix matched, encoded as 1, 4 and 5 respectively. $x_2$ represents the suffix matching mode; its possible values are suffix not matched, suffix synonym matched and suffix matched, encoded as 1, 4 and 5 respectively. The probability that a candidate word is selected is:
$$P = \frac{1}{1 + e^{-z}},$$
where $z = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, and $\beta_0$, $\beta_1$, $\beta_2$ are parameters to be determined. The parameters are computed as follows.
The probability that a candidate word is not selected is:
$$1 - P = \frac{1}{1 + e^{z}}$$
The ratio of the probability that a candidate word is selected to the probability that it is not selected is:
$$\frac{P}{1 - P} = e^{z}$$
After the logit transformation, this gives:
$$\operatorname{logit}(P) = \ln\frac{P}{1 - P} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$
In this implementation, the logit values and the values of $x_1$ and $x_2$ can be obtained from the click frequencies of the candidate word match types in the historical behavior information.
Suppose the click frequencies of the nine candidate word match types stored in the historical behavior information of a clicked hot word are:
{73, 98, 119, 67, 89, 342, 137, 123, 99}.
The following data can then be obtained:
x1 | x2 | Click frequency | Logit value
1 (prefix not matched) | 1 (suffix not matched) | 73 | 4.29
1 (prefix not matched) | 4 (suffix synonym matched) | 89+137+123=349 | 5.86
1 (prefix not matched) | 5 (suffix matched) | 119+342+99=560 | 6.33
4 (prefix synonym matched) | 1 (suffix not matched) | 67+137+99=303 | 5.71
4 (prefix synonym matched) | 4 (suffix synonym matched) | 137 | 4.92
4 (prefix synonym matched) | 5 (suffix matched) | 99 | 4.60
5 (prefix matched) | 1 (suffix not matched) | 98+342+123=563 | 6.33
5 (prefix matched) | 4 (suffix synonym matched) | 123 | 4.81
5 (prefix matched) | 5 (suffix matched) | 342 | 5.83
From the data in the above table, the binary linear regression equation yields the values of the regression parameters $\beta_0$, $\beta_1$, $\beta_2$ (the intercept and the coefficients of $x_1$ and $x_2$) for this clicked hot word. The probability of the current candidate word is then computed with the aforementioned formula for the probability that a candidate word is selected. The probabilities so obtained can further be normalized. In this implementation, the above process can be summarized as the following steps:
S34b1: look up the historical behavior information of the candidate word in the user historical search behavior database;
S34b2: count the click frequencies under the different prefix matching modes and the different suffix matching modes in that historical behavior information;
S34b3: take the natural logarithm of the click frequency under each prefix matching mode and suffix matching mode to obtain the corresponding logit value;
S34b4: compute the values of the parameters $\beta_0$, $\beta_1$, $\beta_2$ of the binary linear regression equation $\operatorname{logit} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$;
S34b5: compute the probability of the candidate word by the formula $P = 1 / (1 + e^{-z})$, where $z = \beta_0 + \beta_1 x_1 + \beta_2 x_2$;
S34b6: normalize the probabilities of the candidate words.
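Steps S34b3 to S34b5 can be sketched as follows. This is only an illustration under stated assumptions: the selection probability is taken to have the standard logistic form P = 1 / (1 + e^(-z)) with z = b0 + b1*x1 + b2*x2, the parameters are fitted by ordinary least squares, and the names MatchSample, LogitOf, FitParameters and SelectionProbability are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// One aggregated observation: prefix code x1 and suffix code x2 (1, 4 or 5)
// together with the click frequency counted for that combination.
struct MatchSample {
    double x1;
    double x2;
    double clicks;
};

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

// Determinant of a 3x3 matrix.
double Det3(const Mat3& m)
{
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// S34b3: the logit value is taken as the natural logarithm of the click frequency.
double LogitOf(double clicks)
{
    assert(clicks > 0.0);
    return std::log(clicks);
}

// S34b4: ordinary least-squares fit of logit = b0 + b1*x1 + b2*x2.
// Builds the 3x3 normal equations and solves them with Cramer's rule.
Vec3 FitParameters(const std::vector<MatchSample>& samples)
{
    Mat3 a{};  // accumulates row^T * row
    Vec3 r{};  // accumulates row^T * logit
    for (const MatchSample& s : samples) {
        const Vec3 row = {{1.0, s.x1, s.x2}};
        const double y = LogitOf(s.clicks);
        for (int i = 0; i < 3; ++i) {
            r[i] += row[i] * y;
            for (int j = 0; j < 3; ++j) a[i][j] += row[i] * row[j];
        }
    }
    const double d = Det3(a);
    assert(std::fabs(d) > 1e-12);  // the samples must vary in both x1 and x2
    Vec3 b{};
    for (int k = 0; k < 3; ++k) {
        Mat3 m = a;
        for (int i = 0; i < 3; ++i) m[i][k] = r[i];  // replace column k by the right-hand side
        b[k] = Det3(m) / d;
    }
    return b;
}

// S34b5: probability that a candidate word with codes (x1, x2) is selected.
double SelectionProbability(const Vec3& b, double x1, double x2)
{
    const double z = b[0] + b[1] * x1 + b[2] * x2;
    return 1.0 / (1.0 + std::exp(-z));
}
```

The nine rows of the table above, for example {1, 1, 73} and {1, 4, 349}, can be passed to FitParameters as samples; the probabilities returned by SelectionProbability would then be normalized over the candidate words as in step S34b6.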
In this implementation, the user historical search behavior database can be an independent database, or it can be merged with the aforementioned hot dictionary so that the hot dictionary records the historical behavior information. When the hot dictionary records the historical behavior information, it also stores the click frequencies of the candidate word match types; the hot word in the hot dictionary is the clicked hot word of the historical behavior information, and the sum of the click frequencies of the nine candidate word match types is the search frequency of the hot word in Embodiment 1. In this implementation, each candidate word information entry in the candidate word information list that is output by step S33 and input to step S34 contains at least two items: the hot word sequence number and the match type of the candidate word. The leaf nodes of the aforementioned hot word suffix tree store hot word sequence numbers. When step S33 is executed, each candidate word obtained by traversing and matching the hot word suffix tree carries the hot word sequence number stored in the matching leaf node, and the match type of the candidate word is attached to the candidate word information according to the way the match was made.
In this implementation, the historical behavior information is produced by the process of saving user search history. This process differs from the process of saving user search history in Embodiment 1 in that the search frequency must also be counted separately for each candidate word match type. In this implementation, after the user selects a candidate word from the list displayed in step S91, the initial string and the selected candidate word are sent together to the server with a retrieval request (for this process, see the first implementation of step S34 in the present embodiment).
From the above two implementations, those skilled in the art will understand that different implementations of step S34 generally require different mathematical methods, and that many similar estimation methods exist in mathematics, so step S34 can be implemented in many other ways. Those skilled in the art will also understand that the probability of a candidate word obtained in step S34 is only an estimate and cannot in practice be fully accurate, so a permissible error should be allowed, as should differences in the parameters between the two implementations. Those skilled in the art will further understand that the probability computed in step S34 merely serves as input to subsequent processing; in practical applications, the product of the probability obtained above and the search frequency of the candidate word can therefore also be used as the probability of the candidate word.
Five, embodiment 3
The present embodiment builds on Embodiment 1 or Embodiment 2 and further improves and optimizes step S5 thereof. In the present embodiment, step S5 comprises the following steps:
S51: the client calculates the local degree of correlation of each candidate word in the candidate word information list according to the local historical search database;
S52: the client calculates the click estimate of each candidate word according to the local degree of correlation of the candidate word and the candidate word information;
S53: the client selects the candidate word list from the candidate word information list according to the click estimates of the candidate words.
The local historical search database is used by the client to store local historical search information, which comprises a local historical search string, a local historical search time and a local historical search frequency. Step S51 is implemented by a local degree of correlation calculation device or module; step S52 by a click estimate calculation device or module; step S53 by a candidate word selection device or module. Step S51 comprises the following steps:
S511: using the word segmentation device, split the local historical search strings in the local historical search database and the candidate words in the candidate word information list into a keyword list, and calculate the statistical frequency of each keyword;
S512: build the keyword space vector from the statistical frequencies of the keywords in the keyword list;
S513: build the candidate word space vector from the statistical frequencies, in the keyword list, of the keywords into which the candidate word is split;
S514: calculate the cosine of the keyword space vector and the candidate word space vector to obtain the local degree of correlation of the candidate word.
Step S511 is implemented by a keyword distribution statistics device or module; step S512 by a keyword space vector construction device or module; step S513 by a candidate word space vector construction device or module; step S514 by a vector cosine calculation device or module. Step S511 is further divided into two steps. Step S511a: using the word segmentation device, split the local historical search strings in the local historical search database into the keyword list and calculate the statistical frequency of each keyword. Step S511b: using the word segmentation device, split the candidate words in the candidate word information list into the keyword list and calculate the statistical frequency of each keyword. After steps S511a and S511b have been executed, a single shared keyword list is obtained. The process by which step S51 calculates the local degree of correlation of a candidate word is illustrated by the following example.
Suppose the content of the local historical search database is an array lhi of length n, defined as follows:
struct LocalHistInfo
{
String sSearch;
DateTime tRecent;
int nCount;
} lhi[n];
Each member of the array lhi is one item of local historical search information, represented by the structure LocalHistInfo. Here sSearch is the local historical search string; tRecent is the local historical search time, which records the time of the most recent search; nCount is the local historical search frequency. Step S511a can be implemented by the following process:
for (int i = 0; i < n; i++)
{
    struct LocalHistInfo item = lhi[i];
    StringArray arKeys;
    WordSplit(item.sSearch, arKeys); // split the local historical search string into keywords with the word segmentation device
    item.nCount = TimeWeightCount(item.tRecent, item.nCount); // compute the time-weighted frequency
    for (int j = 0; j < arKeys.GetCount(); j++)
    {   // add each keyword to vKey with the time-weighted local historical search frequency
        vKey.Add(arKeys[j], item.nCount);
    }
}
The above process constitutes the aforementioned step S511a. Here vKey is an instance of the class VecterKey, which represents the keyword list, and Add is a method of the class VecterKey. It is defined as follows:
class VecterKey {
    Array<KeyItem *> m_arData;
public:
    int Add(string sKey, int nCount)
    {
        KeyItem * pItem = NULL;
        bool bFind = FindKey(sKey, pItem); // check whether the keyword already exists
        if (!bFind) // the keyword does not exist: create a new entry
        {
            pItem = new KeyItem;
            pItem->sKey = sKey;
            pItem->nCount = nCount; // initialize the statistical frequency of this keyword
            m_arData.Add(pItem);
        }
        else // the keyword exists: update its entry
        {
            pItem->nCount += nCount; // accumulate the statistical frequency of this keyword
        }
        return bFind;
    } // end of Add
}; // end of VecterKey
Here KeyItem is the structure that represents a keyword, which can be expressed as:
struct KeyItem
{
string sKey;
int nCount;
};
In the above structure, sKey is the keyword and nCount is the statistical frequency of that keyword.
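The listings above rely on container types and helper functions (Array, StringArray, WordSplit, FindKey) that are not defined in this text. For illustration only, the same accumulation can be sketched with the C++ standard library. The whitespace tokenizer SplitWords is a hypothetical stand-in for the word segmentation device, and KeywordCounts and AddSearchString are likewise hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the word segmentation device: splits on whitespace.
std::vector<std::string> SplitWords(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);
    return words;
}

// Keyword list: keyword -> accumulated statistical frequency.
using KeywordCounts = std::unordered_map<std::string, int>;

// Splits one search string and adds weight to the frequency of each keyword found.
void AddSearchString(KeywordCounts& keys, const std::string& text, int weight)
{
    assert(weight >= 0);
    for (const std::string& w : SplitWords(text))
        keys[w] += weight;  // a keyword seen for the first time starts at 0
}
```

A hash map replaces the linear FindKey lookup, at the cost of losing the insertion order of the keywords; an ordered container would be needed if the vector dimensions must follow a fixed order.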
Similarly, step S511b follows a process like that of step S511a: each candidate word is split and its keywords are added to the aforementioned vKey, except that the local historical search frequency of a candidate word can be a fixed value of 1 or the search frequency of the hot word stored in the hot dictionary of the server (see Embodiment 1). It should be noted that, when the local historical search strings of the local historical search database are added to the keyword list vKey, there is a step TimeWeightCount that computes a time-weighted frequency. Those skilled in the art will understand that this step can be omitted; computing the time-weighted frequency is a preferred implementation of the present invention. Time weighting adjusts the local historical search frequency according to the interval between the local historical search time and the current time. A simple method is: if the interval exceeds one month, the weighting coefficient is 1; if the interval is between two weeks and one month, the weighting coefficient is 2; if the interval is between one week and two weeks, the weighting coefficient is 3; if the interval is less than one week, the weighting coefficient is 5.
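The simple weighting scheme can be sketched as follows. Several points are assumptions made only for this illustration: one week is taken as 7 days and one month as 30 days, the interval is passed as a whole number of days rather than as a DateTime, and the weighted frequency is taken to be the product of the coefficient and the stored frequency. The names TimeWeight and TimeWeightedCount are hypothetical.

```cpp
#include <cassert>

// Weighting coefficient for a search whose most recent occurrence was daysAgo days ago.
// Assumed boundaries: one week = 7 days, one month = 30 days.
int TimeWeight(int daysAgo)
{
    assert(daysAgo >= 0);
    if (daysAgo < 7)   return 5;  // less than one week
    if (daysAgo < 14)  return 3;  // between one week and two weeks
    if (daysAgo <= 30) return 2;  // between two weeks and one month
    return 1;                     // more than one month
}

// Time-weighted frequency (assumed to be coefficient * stored frequency).
int TimeWeightedCount(int daysAgo, int count)
{
    return TimeWeight(daysAgo) * count;
}
```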
After steps S511a and S511b have been executed, the keyword list vKey is obtained. Extracting the statistical frequencies of all keywords in vKey gives the keyword space vector of step S512, Ks_Vector = $\{v_1, v_2, v_3, \ldots, v_m\}$, where m is the number of keywords in vKey, so that Ks_Vector is an m-dimensional keyword space vector, and each component $v_i$ is the statistical frequency of the corresponding keyword.
After a candidate word in the candidate word information list is split by the word segmentation device, several keywords are obtained, denoted HintKeys. For each keyword in the aforementioned vKey, if the keyword is present in HintKeys, the corresponding component is set to the statistical frequency of that keyword; otherwise the component is set to 0. This likewise yields an m-dimensional candidate word space vector Hs_Vector = $\{w_1, w_2, w_3, \ldots, w_m\}$. In Hs_Vector, if a component $w_i$ is 0, the keyword corresponding to $w_i$ is not in the keyword list HintKeys into which the candidate word was split; otherwise the keyword is present in HintKeys. The above process of obtaining the candidate word space vector Hs_Vector is the aforementioned step S513.
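The construction of Ks_Vector and Hs_Vector can be sketched as follows. This is only an illustration; the keyword list is modeled as an ordered list of (keyword, frequency) pairs, and the names KeywordEntry, BuildKeywordVector and BuildCandidateVector are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Keyword list entry: keyword and its statistical frequency (mirrors KeyItem).
typedef std::pair<std::string, int> KeywordEntry;

// S512: the keyword space vector holds the frequency of every keyword, in list order.
std::vector<double> BuildKeywordVector(const std::vector<KeywordEntry>& vKey)
{
    std::vector<double> v;
    for (const KeywordEntry& e : vKey) v.push_back(e.second);
    return v;
}

// S513: a component keeps the keyword's frequency when the keyword occurs in the
// candidate word's keywords (HintKeys); otherwise the component is 0.
std::vector<double> BuildCandidateVector(const std::vector<KeywordEntry>& vKey,
                                         const std::set<std::string>& hintKeys)
{
    std::vector<double> w;
    for (const KeywordEntry& e : vKey)
        w.push_back(hintKeys.count(e.first) ? e.second : 0.0);
    assert(w.size() == vKey.size());  // both vectors share the dimension m
    return w;
}
```

Because both vectors are built by walking the same keyword list in the same order, component i of each vector always refers to the same keyword.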
From the m-dimensional keyword space vector Ks_Vector obtained in step S512 and the m-dimensional candidate word space vector Hs_Vector obtained in step S513, the cosine value λ is obtained with the vector cosine formula:
$$\lambda = \frac{\sum_{i=1}^{m} v_i w_i}{\sqrt{\sum_{i=1}^{m} v_i^2}\,\sqrt{\sum_{i=1}^{m} w_i^2}}$$
where $v_i$ and $w_i$ are the components of Ks_Vector and Hs_Vector respectively. Computing the cosine value λ with this formula is the aforementioned step S514. The cosine value λ can be used as the local degree of correlation of the candidate word. In practice, the cosine values of the candidate words can also be normalized before being used as the local degree of correlation: let the cosine values of the candidate words be $\{\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_K\}$, where K is the number of candidate words; the local degree of correlation of candidate word i is then:
$$R_i = \frac{\lambda_i}{\sum_{j=1}^{K} \lambda_j}.$$
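Step S514 and the optional normalization can be sketched as follows. This is only an illustration: returning 0 for a zero-length vector and normalizing by the sum of the cosine values are assumptions, and the names CosineSimilarity and NormalizeRelevance are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// S514: cosine of the angle between two m-dimensional vectors.
// Returns 0 when either vector has zero length (assumed convention).
double CosineSimilarity(const std::vector<double>& v, const std::vector<double>& w)
{
    assert(v.size() == w.size());
    double dot = 0.0, vv = 0.0, ww = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        dot += v[i] * w[i];
        vv += v[i] * v[i];
        ww += w[i] * w[i];
    }
    if (vv == 0.0 || ww == 0.0) return 0.0;
    return dot / (std::sqrt(vv) * std::sqrt(ww));
}

// Optional normalization over the K candidate words (assumed: lambda_i / sum of lambda_j).
std::vector<double> NormalizeRelevance(const std::vector<double>& cosines)
{
    double total = 0.0;
    for (double c : cosines) total += c;
    std::vector<double> r(cosines.size(), 0.0);
    if (total <= 0.0) return r;  // no candidate is related to the local history
    for (std::size_t i = 0; i < cosines.size(); ++i) r[i] = cosines[i] / total;
    return r;
}
```

Since all frequencies are non-negative, the cosine value always lies between 0 and 1, so it can be used directly as a degree of correlation even without the normalization.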
The calculation of the click estimate of a candidate word in step S52 follows step S51: the input of step S52 depends on the local degree of correlation calculated in step S51. This specification gives two implementations of the click estimate calculation of step S52:
Implementation 1: CTR = A × R × C, where CTR is the click estimate of the candidate word; A is the probability of the candidate word; R is the local degree of correlation of the candidate word; C is a constant determined by the type of the candidate word.
Implementation 2: CTR = A × R × C × P, where CTR is the click estimate of the candidate word; A is the probability of the candidate word; R is the local degree of correlation of the candidate word; C is a constant determined by the type of the candidate word; P is the search frequency of the candidate word.
In these two implementations, the probability of the candidate word is the probability obtained in Embodiment 2, so both implementations build on Embodiment 2. The type of the candidate word in "C is a constant determined by the type of the candidate word" is the aforementioned match type of the candidate word, which is obtained during step S33 of Embodiment 1 and generally takes one of nine values. For the nine candidate word match types, see Embodiment 2; they are not repeated here. C in these two implementations is thus a constant determined by the nine candidate word match types, and those skilled in the art can set its concrete values according to the specific application of the present invention. "P is the search frequency of the candidate word" in Implementation 2 refers to the search frequency stored in the aforementioned hot dictionary. From these two implementations, those skilled in the art can derive others, for example:
Implementation 3: CTR = A × R, where CTR is the click estimate of the candidate word; A is the probability of the candidate word; R is the local degree of correlation of the candidate word.
Implementation 4: CTR = A × R × P, where CTR is the click estimate of the candidate word; A is the probability of the candidate word; R is the local degree of correlation of the candidate word; P is the search frequency of the candidate word.
Implementation 5: CTR = R × P, where CTR is the click estimate of the candidate word; R is the local degree of correlation of the candidate word; P is the search frequency of the candidate word.
It should be noted that, as stated at the end of Embodiment 2, the product of the probability of a candidate word and its search frequency can be used as the probability of the candidate word. In that case A stands for A × P, so Implementation 1 is equivalent to Implementation 2, and Implementation 3 is equivalent to Implementation 4.
Implementation 5 does not need the probability of the candidate word as input, so it need not build on Embodiment 2 and only needs to build on Embodiment 1. The present embodiment is aimed at implementing step S5, whereas Embodiment 1 and Embodiment 2 are aimed at implementing step S3. Therefore, if the inputs and outputs of an implementation of step S5 do not involve step S3 or are unrelated to step S3, the present embodiment need not build on Embodiment 1 or Embodiment 2, and can form an independent and complete technical solution that achieves the object of the present invention.
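The click estimate formulas of step S52 can be sketched as follows. This is only an illustration; the structure CandidateScore and the function names are hypothetical, and the values of the constant C are left to the application as stated above.

```cpp
#include <cassert>

// Inputs for the click estimate of one candidate word.
struct CandidateScore {
    double probability;   // A: probability of the candidate word (step S34)
    double relevance;     // R: local degree of correlation (step S51)
    double typeConstant;  // C: constant chosen per match type (application-specific)
    double searchCount;   // P: search frequency stored in the hot dictionary
};

// CTR = A * R * C
double ClickEstimateARC(const CandidateScore& s)
{
    assert(s.probability >= 0.0 && s.relevance >= 0.0);
    return s.probability * s.relevance * s.typeConstant;
}

// CTR = A * R * C * P
double ClickEstimateARCP(const CandidateScore& s)
{
    return ClickEstimateARC(s) * s.searchCount;
}

// CTR = R * P (needs no probability, so it does not depend on step S34)
double ClickEstimateRP(const CandidateScore& s)
{
    assert(s.relevance >= 0.0);
    return s.relevance * s.searchCount;
}
```

Because the click estimate is only used to rank the candidate words against each other, its absolute scale does not matter, which is why the variants with and without the factor P can coexist.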
A simple implementation of step S53 is: sort the candidate words in descending order of click estimate to obtain a sorted candidate word queue, then select the first 10 or 20 candidate words from the queue as the final candidate word list. The candidate words obtained in step S5 come from the candidate word information list of the aforementioned step S41. Those skilled in the art will understand that, by analogy with step S53, Embodiment 1 or Embodiment 2 can also include a step S39 before step S41: sort the candidate word information in descending order of the probability or the search frequency of the candidate words to obtain a sorted candidate word information queue, then select the first 20 or 30 entries from the queue as the final candidate word information list on which step S41 is executed. Whether step S39 is present does not affect the technical solutions of Embodiment 1, Embodiment 2 or the present embodiment, nor does it affect the scope of protection of the present invention.
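The simple implementation of step S53 can be sketched as follows. This is only an illustration; the structure ScoredCandidate and the function SelectTopCandidates are hypothetical names, and a stable sort is chosen here so that candidate words with equal click estimates keep their incoming order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A candidate word together with its click estimate.
struct ScoredCandidate {
    std::string word;
    double clickEstimate;
};

// Sorts by click estimate, highest first, and keeps at most limit words.
std::vector<std::string> SelectTopCandidates(std::vector<ScoredCandidate> items,
                                             std::size_t limit)
{
    assert(limit > 0);
    std::stable_sort(items.begin(), items.end(),
        [](const ScoredCandidate& a, const ScoredCandidate& b) {
            return a.clickEstimate > b.clickEstimate;
        });
    if (items.size() > limit) items.erase(items.begin() + limit, items.end());
    std::vector<std::string> words;
    for (const ScoredCandidate& c : items) words.push_back(c.word);
    return words;
}
```

The same routine, with the probability or the search frequency in place of the click estimate, would serve for the optional step S39 on the server side.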

Claims (10)

1. An intelligent prompt method for search, involving a client and a server connected by a network, the method comprising the following steps:
S21: the client obtains an initial string;
S22: the client sends the initial string to the server;
S29: the server receives the initial string;
S3: the server searches hot words according to the initial string to obtain a candidate word information list;
S41: the server sends the candidate word information list to the client;
S49: the client receives the candidate word information list;
S5: the client obtains a candidate word list from the candidate word information list;
S91: the client displays the candidate word list;
characterized in that step S3 comprises:
S31: the server splits the initial string with a word segmentation device to obtain a prefix word and a suffix word;
S32: the server looks up the prefix word and the suffix word in a thesaurus to obtain prefix synonyms and suffix synonyms;
S33: the server traverses a hot word suffix tree to search for hot words with a prefix match and/or a suffix match, obtaining the candidate word information list;
wherein the thesaurus is a database used by the server to store synonym relations between keywords; the hot word suffix tree is built by the server from the high-frequency search hot words in a hot dictionary according to the data structure of a generalized suffix tree; the hot dictionary is a database used by the server to store hot word information; the hot word information comprises a hot word, a hot word sequence number and a hot word search frequency; a prefix match means that the prefix of a hot word matches the prefix word or a prefix synonym; a suffix match means that the suffix of a hot word matches the suffix word or a suffix synonym.
2. The intelligent prompt method for search as claimed in claim 1, characterized in that the method further comprises:
S34: the server calculates the probability of each candidate word by analyzing a user historical search behavior database;
wherein the user historical search behavior database is used to record historical behavior information.
3. The intelligent prompt method for search as claimed in claim 2, characterized in that step S34 comprises:
S34a1: the server looks up, in the user historical search behavior database, the historical behavior information whose original string equals the initial string and whose clicked hot word equals the candidate word, and obtains the click frequency of the candidate word;
S34a2: the server normalizes the click frequencies of the candidate words to obtain the probability of each candidate word;
wherein the historical behavior information comprises an original string, a clicked hot word and a click frequency.
4. The intelligent prompt method for search as claimed in claim 2, characterized in that step S34 comprises:
S34b1: looking up the historical behavior information of the candidate word in the user historical search behavior database;
S34b2: counting the click frequencies under the different prefix matching modes and the different suffix matching modes in that historical behavior information;
S34b3: taking the natural logarithm of the click frequency under each prefix matching mode and suffix matching mode to obtain the corresponding logit value;
S34b4: computing the values of the parameters $\beta_0$, $\beta_1$, $\beta_2$ of the binary linear regression equation $\operatorname{logit} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$;
S34b5: computing the probability of the candidate word by the formula $P = 1 / (1 + e^{-z})$, where $z = \beta_0 + \beta_1 x_1 + \beta_2 x_2$;
S34b6: normalizing the probabilities of the candidate words;
wherein the historical behavior information comprises a clicked hot word and the click frequencies of nine candidate word match types.
5. The intelligent prompt method for search as claimed in claim 1, 2, 3 or 4, characterized in that step S5 comprises:
S51: the client calculates the local degree of correlation of each candidate word in the candidate word information list according to a local historical search database;
S52: the client calculates the click estimate of each candidate word according to the local degree of correlation of the candidate word and the candidate word information;
S53: the client selects the candidate word list from the candidate word information list according to the click estimates of the candidate words;
wherein the local historical search database is used by the client to store local historical search information; the local historical search information comprises a local historical search string, a local historical search time and a local historical search frequency; and step S51 comprises:
S511: using the word segmentation device, splitting the local historical search strings in the local historical search database and the candidate words in the candidate word information list into a keyword list, and calculating the statistical frequency of each keyword;
S512: building a keyword space vector from the statistical frequencies of the keywords in the keyword list;
S513: building a candidate word space vector from the statistical frequencies, in the keyword list, of the keywords into which the candidate word is split;
S514: calculating the cosine of the keyword space vector and the candidate word space vector to obtain the local degree of correlation of the candidate word.
6. The intelligent prompt method for search as claimed in claim 5, characterized in that calculating the statistical frequency of the keywords in step S511 comprises a step of computing a time-weighted frequency.
7. The intelligent prompt method for search as claimed in claim 5, characterized in that, in step S52:
CTR = A × R × C, where CTR is the click estimate of the candidate word; A is the probability of the candidate word; R is the local degree of correlation of the candidate word; C is a constant determined by the type of the candidate word.
8. The intelligent prompt method for search as claimed in claim 5, characterized in that, in step S52:
CTR = A × R × C × P, where CTR is the click estimate of the candidate word; A is the probability of the candidate word; R is the local degree of correlation of the candidate word; C is a constant determined by the type of the candidate word; P is the search frequency of the candidate word.
9. An intelligent prompt device for search, characterized by comprising:
a word segmentation device, for splitting an initial string to obtain a prefix word and a suffix word;
a synonym expansion device, for looking up the prefix word and the suffix word in a thesaurus to obtain prefix synonyms and suffix synonyms;
a suffix tree traversal device, for traversing a hot word suffix tree to search for hot words with a prefix match and/or a suffix match, obtaining a candidate word information list; a prefix match means that the prefix of a hot word matches the prefix word or a prefix synonym; a suffix match means that the suffix of a hot word matches the suffix word or a suffix synonym;
a hot dictionary construction device, for managing and maintaining the database that stores hot word information;
a suffix tree construction device, for managing and maintaining the hot word suffix tree; the hot word suffix tree is built by the server from the high-frequency search hot words in the hot dictionary according to the data structure of a generalized suffix tree;
a historical behavior analysis device, for calculating the probability of each candidate word by analyzing a user historical search behavior database;
a user historical search behavior database device, for recording historical behavior information.
10. An intelligent prompt system for search, comprising a client and a server connected to each other through a network, characterized in that:
The server comprises:
A word segmentation module, for splitting the initial character string to obtain a prefix word and a suffix word;
A synonym expansion module, for looking up the prefix word and the suffix word in a thesaurus to obtain prefix synonyms and suffix synonyms;
A suffix tree traversal module, for traversing the hot word suffix tree to find hot words that match by prefix and/or by suffix, and obtaining a candidate word information list; a prefix match means that the prefix of a hot word matches the prefix word or a prefix synonym; a suffix match means that the suffix of a hot word matches the suffix word or a suffix synonym;
A hot word dictionary construction module, for managing and maintaining the database that stores hot word information;
A suffix tree construction module, for managing and maintaining the hot word suffix tree; the hot word suffix tree is built by the server from the high-frequency search hot words in the hot word dictionary, using the data structure of a generalized suffix tree;
A historical behavior analysis module, for calculating the probability of each candidate word by analyzing the user historical search behavior database;
A user historical search behavior database module, for recording historical behavior information;
The client comprises:
A local relatedness calculation module, for calculating, from the local historical search database, the local relatedness of each candidate word in the candidate word information list;
A click estimate calculation module, for calculating the click estimate of each candidate word from the local relatedness of the candidate word and the candidate word information;
A candidate word selection module, for selecting a candidate word list from the candidate word information list according to the click estimates of the candidate words;
A local historical search database storage module, for storing local historical search information, the local historical search information comprising local historical search strings, local historical search times, and local historical search frequencies;
The local relatedness calculation module comprises:
A keyword distribution statistics module, for using a word segmenter to split the local historical search strings in the local historical search database and the candidate words in the candidate word information list into a keyword list, and calculating the statistical frequency of each keyword;
A keyword space vector construction module, for constructing a keyword space vector from the statistical frequencies of the keywords in the keyword list;
A candidate word space vector construction module, for constructing a candidate word space vector from the statistical frequencies, in the keyword list, of the keywords obtained by splitting the candidate word;
A vector cosine calculation module, for calculating the cosine of the keyword space vector and the candidate word space vector to obtain the local relatedness of the candidate word.
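The four sub-modules above (keyword statistics, keyword space vector, candidate word space vector, cosine) can be sketched as follows. This is one possible reading of the claim rather than the patented implementation: whitespace splitting stands in for the word segmenter, the candidate word vector takes the keyword's statistical frequency where the candidate contains that keyword and zero elsewhere, and the names `tokenize` and `local_relatedness` are hypothetical.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Assumption: whitespace splitting replaces the word segmenter of the claim.
    return text.lower().split()


def local_relatedness(candidate: str, history: List[str], candidates: List[str]) -> float:
    """Cosine of the keyword space vector and the candidate word space vector."""
    # Keyword distribution statistics over local history and all candidate words.
    counts: Counter = Counter()
    for text in list(history) + list(candidates):
        counts.update(tokenize(text))

    keywords = sorted(counts)  # fixed axis order shared by both vectors
    keyword_vec = [counts[k] for k in keywords]

    # Candidate vector: frequency of each keyword the candidate contains, else 0.
    cand_tokens = set(tokenize(candidate))
    cand_vec = [counts[k] if k in cand_tokens else 0 for k in keywords]

    dot = sum(a * b for a, b in zip(keyword_vec, cand_vec))
    norm = math.sqrt(sum(a * a for a in keyword_vec)) * math.sqrt(sum(b * b for b in cand_vec))
    return dot / norm if norm else 0.0


history = ["exam schedule", "exam results"]
candidates = ["exam schedule", "library hours"]
scores = {c: local_relatedness(c, history, candidates) for c in candidates}
# "exam schedule" scores higher than "library hours" because its keywords
# occur more often in the local search history.
```

Under this reading, a candidate word whose keywords dominate the user's local search history obtains a cosine close to 1, while a candidate sharing no keyword with the list obtains 0.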
CN201310653732.6A 2013-12-09 2013-12-09 Intelligent prompt method, module and system for search Active CN103631929B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310653732.6A CN103631929B (en) 2013-12-09 2013-12-09 Intelligent prompt method, module and system for search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310653732.6A CN103631929B (en) 2013-12-09 2013-12-09 Intelligent prompt method, module and system for search

Publications (2)

Publication Number Publication Date
CN103631929A (en) 2014-03-12
CN103631929B (en) 2016-08-31

Family

ID=50212970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310653732.6A Active CN103631929B (en) 2013-12-09 2013-12-09 Intelligent prompt method, module and system for search

Country Status (1)

Country Link
CN (1) CN103631929B (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914569A (en) * 2014-04-24 2014-07-09 百度在线网络技术(北京)有限公司 Input prompt method and device and dictionary tree model establishing method and device
CN104750873A (en) * 2015-04-22 2015-07-01 百度在线网络技术(北京)有限公司 Popular search term push method and device
CN105224554A (en) * 2014-06-11 2016-01-06 阿里巴巴集团控股有限公司 Search word is recommended to carry out method, system, server and the intelligent terminal searched for
CN105488121A (en) * 2015-11-24 2016-04-13 魏强 Accurate retrieval system
CN106126500A (en) * 2016-06-22 2016-11-16 广东亿迅科技有限公司 A kind of statistical method associating hot word
CN107665217A (en) * 2016-07-29 2018-02-06 苏宁云商集团股份有限公司 A kind of vocabulary processing method and system for searching service
CN108227954A (en) * 2017-12-29 2018-06-29 北京奇虎科技有限公司 A kind of method, apparatus and electronic equipment that search input associational word is provided
CN108241740A (en) * 2017-12-29 2018-07-03 北京奇虎科技有限公司 The generation method and device of a kind of search input associational word of timeliness
CN108319376A (en) * 2017-12-29 2018-07-24 北京奇虎科技有限公司 A kind of input association recommendation method and device that optimization business word is promoted
WO2018133624A1 (en) * 2017-01-17 2018-07-26 腾讯科技(深圳)有限公司 Object recommendation method and apparatus, server, and storage medium
CN108536763A (en) * 2018-03-21 2018-09-14 阿里巴巴集团控股有限公司 A kind of drop-down reminding method and device
CN108846016A (en) * 2018-05-05 2018-11-20 复旦大学 A kind of searching algorithm towards Chinese word segmentation
CN109739367A (en) * 2018-12-28 2019-05-10 北京金山安全软件有限公司 Candidate word list generation method and device
CN110286775A (en) * 2018-03-19 2019-09-27 北京搜狗科技发展有限公司 A kind of dictionary management method and device
CN111488426A (en) * 2020-04-17 2020-08-04 支付宝(杭州)信息技术有限公司 Query intention determining method and device and processing equipment
WO2020182123A1 (en) * 2019-03-12 2020-09-17 北京字节跳动网络技术有限公司 Method and device for pushing statement
CN111782947A (en) * 2020-06-29 2020-10-16 北京达佳互联信息技术有限公司 Search content display method and device, electronic equipment and storage medium
CN112925900A (en) * 2021-02-26 2021-06-08 北京百度网讯科技有限公司 Search information processing method, device, equipment and storage medium
CN113032819A (en) * 2019-12-09 2021-06-25 阿里巴巴集团控股有限公司 Method and system for determining search prompt words and information processing method
CN114817690A (en) * 2022-06-28 2022-07-29 江西医之健科技有限公司 Data searching method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930022A (en) * 2012-10-31 2013-02-13 中国运载火箭技术研究院 User-oriented information search engine system and method
CN103258023A (en) * 2013-05-07 2013-08-21 百度在线网络技术(北京)有限公司 Recommendation method and search engine for search candidate words
CN103425687A (en) * 2012-05-21 2013-12-04 阿里巴巴集团控股有限公司 Retrieval method and system based on queries

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103425687A (en) * 2012-05-21 2013-12-04 阿里巴巴集团控股有限公司 Retrieval method and system based on queries
CN102930022A (en) * 2012-10-31 2013-02-13 中国运载火箭技术研究院 User-oriented information search engine system and method
CN103258023A (en) * 2013-05-07 2013-08-21 百度在线网络技术(北京)有限公司 Recommendation method and search engine for search candidate words

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914569A (en) * 2014-04-24 2014-07-09 百度在线网络技术(北京)有限公司 Input prompt method and device and dictionary tree model establishing method and device
CN103914569B (en) * 2014-04-24 2018-09-07 百度在线网络技术(北京)有限公司 Input creation method, the device of reminding method, device and dictionary tree-model
CN105224554A (en) * 2014-06-11 2016-01-06 阿里巴巴集团控股有限公司 Search word is recommended to carry out method, system, server and the intelligent terminal searched for
CN104750873A (en) * 2015-04-22 2015-07-01 百度在线网络技术(北京)有限公司 Popular search term push method and device
CN105488121A (en) * 2015-11-24 2016-04-13 魏强 Accurate retrieval system
CN106126500A (en) * 2016-06-22 2016-11-16 广东亿迅科技有限公司 A kind of statistical method associating hot word
CN106126500B (en) * 2016-06-22 2019-02-22 广东亿迅科技有限公司 A kind of statistical method being associated with hot word
CN107665217A (en) * 2016-07-29 2018-02-06 苏宁云商集团股份有限公司 A kind of vocabulary processing method and system for searching service
WO2018133624A1 (en) * 2017-01-17 2018-07-26 腾讯科技(深圳)有限公司 Object recommendation method and apparatus, server, and storage medium
CN108319376A (en) * 2017-12-29 2018-07-24 北京奇虎科技有限公司 A kind of input association recommendation method and device that optimization business word is promoted
CN108241740A (en) * 2017-12-29 2018-07-03 北京奇虎科技有限公司 The generation method and device of a kind of search input associational word of timeliness
CN108227954A (en) * 2017-12-29 2018-06-29 北京奇虎科技有限公司 A kind of method, apparatus and electronic equipment that search input associational word is provided
CN108319376B (en) * 2017-12-29 2021-11-26 北京奇虎科技有限公司 Input association recommendation method and device for optimizing commercial word promotion
CN110286775A (en) * 2018-03-19 2019-09-27 北京搜狗科技发展有限公司 A kind of dictionary management method and device
CN108536763A (en) * 2018-03-21 2018-09-14 阿里巴巴集团控股有限公司 A kind of drop-down reminding method and device
WO2019179208A1 (en) * 2018-03-21 2019-09-26 阿里巴巴集团控股有限公司 Drop-down suggestion list
CN108846016B (en) * 2018-05-05 2021-08-20 复旦大学 Chinese word segmentation oriented search algorithm
CN108846016A (en) * 2018-05-05 2018-11-20 复旦大学 A kind of searching algorithm towards Chinese word segmentation
CN109739367A (en) * 2018-12-28 2019-05-10 北京金山安全软件有限公司 Candidate word list generation method and device
US11030405B2 (en) 2019-03-12 2021-06-08 Beijing Bytedance Network Technology Co., Ltd. Method and device for generating statement
WO2020182123A1 (en) * 2019-03-12 2020-09-17 北京字节跳动网络技术有限公司 Method and device for pushing statement
CN113032819A (en) * 2019-12-09 2021-06-25 阿里巴巴集团控股有限公司 Method and system for determining search prompt words and information processing method
CN111488426A (en) * 2020-04-17 2020-08-04 支付宝(杭州)信息技术有限公司 Query intention determining method and device and processing equipment
CN111488426B (en) * 2020-04-17 2024-02-02 支付宝(杭州)信息技术有限公司 Query intention determining method, device and processing equipment
CN111782947A (en) * 2020-06-29 2020-10-16 北京达佳互联信息技术有限公司 Search content display method and device, electronic equipment and storage medium
CN112925900A (en) * 2021-02-26 2021-06-08 北京百度网讯科技有限公司 Search information processing method, device, equipment and storage medium
CN112925900B (en) * 2021-02-26 2023-10-03 北京百度网讯科技有限公司 Search information processing method, device, equipment and storage medium
CN114817690A (en) * 2022-06-28 2022-07-29 江西医之健科技有限公司 Data searching method and system

Also Published As

Publication number Publication date
CN103631929B (en) 2016-08-31

Similar Documents

Publication Publication Date Title
CN103631929A (en) Intelligent prompt method, module and system for search
Chung A Brief Survey of PageRank Algorithms.
US10528662B2 (en) Automated discovery using textual analysis
CN103838833A (en) Full-text retrieval system based on semantic analysis of relevant words
Wang et al. Retrieving complex tables with multi-granular graph representation learning
Du et al. An approach for selecting seed URLs of focused crawler based on user-interest ontology
CN106547864A (en) A kind of Personalized search based on query expansion
Singhal et al. Leveraging web intelligence for finding interesting research datasets
CN105447131B (en) Internet resources relatedness determines method and apparatus
CN104281565A (en) Semantic dictionary constructing method and device
Ahmadi et al. Unsupervised matching of data and text
CN107832319B (en) Heuristic query expansion method based on semantic association network
Wang et al. TSMH Graph Cube: A novel framework for large scale multi-dimensional network analysis
CN102708104B (en) Method and equipment for sorting document
CN108932247A (en) A kind of method and device optimizing text search
CN107133274A (en) A kind of distributed information retrieval set option method based on figure knowledge base
Xia et al. Graph-based web query classification
CN104794200A (en) Event publishing and subscribing method supporting fuzzy matching based on ontology
Slaninov et al. Web site community analysis based on suffix tree and clustering algorithm
Xu et al. Query recommendation based on improved query flow graph
Du et al. A novel page ranking algorithm based on triadic closure and hyperlink-induced topic search
Wang et al. Knowledge graph-based semantic ranking for efficient semantic query
Naik et al. Tweet analytics and tweet summarization using graph mining
Liu et al. A cascade information diffusion prediction model integrating topic features and cross-attention
Bama et al. Improved pagerank algorithm for web structure mining

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: No. 100, General Road, Jiangning Economic Development Zone, Nanjing, Jiangsu 211100

Applicant after: JIANGSU WISEDU EDUCATION INFORMATION TECHNOLOGY CO., LTD.

Address before: No. 100, General Road, Jiangning Economic Development Zone, Nanjing, Jiangsu 211100

Applicant before: Jiangsu Wisedu Information Technology Co., Ltd.

COR Change of bibliographic data
C14 Grant of patent or utility model
GR01 Patent grant