CN103631929B - Intelligent prompting method, module and system for search - Google Patents
- Publication number: CN103631929B
- Application number: CN201310653732.6A
- Authority: CN (China)
- Prior art keywords: word, candidate word, suffix, search, prefix
- Legal status: Active (as listed by Google, which notes that the status is an assumption, not a legal conclusion)
Classifications
- G06F16/2457: Query processing with adaptation to user needs (G: Physics > G06: Computing; calculating or counting > G06F: Electric digital data processing > G06F16/00: Information retrieval; database structures therefor; file system structures therefor > G06F16/20: of structured data, e.g. relational data > G06F16/24: Querying > G06F16/245: Query processing)
- G06F16/2425: Iterative querying; query formulation based on the results of a preceding query (same hierarchy down to G06F16/24: Querying > G06F16/242: Query formulation)
Abstract
The invention discloses an intelligent prompting method, module and system for search. According to the method, the server performs the following steps: splitting the input into a prefix word and a suffix word with a word segmenter; expanding these with synonyms into a prefix synonym list and a suffix synonym list; traversing a hot-word suffix tree to find hot words whose prefix and/or suffix match, which become the candidate words; and computing the probability of each candidate word by analyzing users' historical search behavior. The client performs the following steps: computing the local relevance of each candidate word; computing an estimated click value for each candidate word; and selecting the candidate words to display according to their estimated click values. Because prompt words are obtained by matching both the prefix word and the suffix word, extended with synonyms, and weighted by the search intent of many users together with local relevance, the prompt words come closer to the user's search intent.
Description
Technical field
The present invention relates to keyword search in data retrieval and data mining, and in particular to keyword input in artificial intelligence.
Background art
Intelligent prompting is a method that helps users clarify their input intent, lets them type faster and improves the user experience. It is used mainly in search engines and development platforms: based on what the user types, prompts are shown automatically in various forms such as drop-down boxes or tags.
Mainstream search engines currently first compile statistics over the users' search history stored on the server and build a popular-word dictionary according to search frequency. After the user enters a keyword, candidate prompt words are looked up in the popular-word dictionary by string prefix matching, filtered by search frequency, and displayed in order below the search box. This kind of intelligent prompting has two weaknesses. Looking up candidates only by string prefix matching may miss candidate prompt words that are relevant to the search keyword. Screening candidates by their search frequency in the popular-word dictionary ignores the current user's local search history, so the prompts may deviate from the user's search intent. Both problems stem from habitual linguistic expression: in Chinese, a word that modifies a noun always precedes the word it modifies. For example, in "casual pants", "casual" is only a modifier, while "pants" is the head noun. When the user types "casual pants" in the client, prefix matching returns only content related to "casual", whereas the user mainly wants content related to "pants". The prompt words therefore deviate noticeably from the user's search intent.
Summary of the invention
The problem addressed by the present invention is that the prompt words offered by search engines often fail to reflect what the user actually intends to search for.
To solve this problem, the present invention adopts the following solution.
An intelligent prompting method for search according to the present invention involves a client and a server connected through a network, and comprises the following steps:
S21: the client obtains the initial string;
S22: the client sends the initial string to the server;
S29: the server receives the initial string;
S3: the server searches the hot words according to the initial string and obtains a candidate-word information list;
S41: the server sends the candidate-word information list to the client;
S49: the client receives the candidate-word information list;
S5: the client derives a candidate-word list from the candidate-word information list;
S91: the client displays the candidate-word list.
The method is characterized in that step S3 comprises:
S31: the server splits the initial string with a word segmenter to obtain a prefix word and a suffix word;
S32: the server looks up the prefix word and the suffix word in a thesaurus to obtain prefix synonyms and suffix synonyms;
S33: the server traverses a hot-word suffix tree to find hot words whose prefix matches and/or whose suffix matches, obtaining the candidate-word information list.
Here, the thesaurus is a database in which the server stores synonym relations between keywords. The hot-word suffix tree is built by the server, using the generalized suffix tree data structure, from the high-frequency hot words in a hot-word bank. The hot-word bank is a database in which the server stores hot-word information; hot-word information comprises the hot word, its sequence number and its search frequency. A prefix match means that the prefix of a hot word matches the prefix word or a prefix synonym; a suffix match means that the suffix of a hot word matches the suffix word or a suffix synonym.
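Steps S32 and S33 can be illustrated with a minimal sketch. It assumes the segmenter has already produced the prefix and suffix words, models the thesaurus as a plain dict (hypothetical), and replaces the generalized suffix tree with a naive, uncompressed suffix trie: every suffix of every hot word is inserted character by character, and each node records which (hot word, start offset) pairs pass through it. A real generalized suffix tree (for example one built with Ukkonen's algorithm) answers the same prefix and suffix queries with far less memory; the class and function names here are illustrative only.

```python
class _Node:
    __slots__ = ("children", "occurrences")

    def __init__(self):
        self.children = {}        # character -> child node
        self.occurrences = set()  # (hot-word index, start offset) of each suffix through this node


class HotWordSuffixTrie:
    """Naive generalized suffix trie over a list of hot words (quadratic build time)."""

    def __init__(self, hot_words):
        self.hot_words = list(hot_words)
        self.root = _Node()
        for idx, word in enumerate(self.hot_words):
            for start in range(len(word)):  # insert the suffix word[start:]
                node = self.root
                for ch in word[start:]:
                    node = node.children.setdefault(ch, _Node())
                    node.occurrences.add((idx, start))

    def _occurrences(self, pattern):
        """All (index, start) with hot_words[index][start:start + len(pattern)] == pattern."""
        node = self.root
        for ch in pattern:
            node = node.children.get(ch)
            if node is None:
                return set()
        return node.occurrences

    def prefix_matches(self, pattern):
        # an occurrence starting at offset 0 means the pattern is a prefix of the hot word
        return {idx for idx, start in self._occurrences(pattern) if start == 0}

    def suffix_matches(self, pattern):
        # an occurrence ending at the last character means the pattern is a suffix
        return {idx for idx, start in self._occurrences(pattern)
                if start + len(pattern) == len(self.hot_words[idx])}


def find_candidates(trie, prefix_word, suffix_word, thesaurus):
    """S32 + S33: expand both words with synonyms, then keep prefix and/or suffix matches."""
    prefix_forms = {prefix_word, *thesaurus.get(prefix_word, ())}
    suffix_forms = {suffix_word, *thesaurus.get(suffix_word, ())}
    by_prefix = set().union(*(trie.prefix_matches(w) for w in prefix_forms))
    by_suffix = set().union(*(trie.suffix_matches(w) for w in suffix_forms))
    return sorted(trie.hot_words[i] for i in by_prefix | by_suffix)


hot_words = ["casual pants", "casual shoes", "jeans pants", "cargo trousers", "pants hanger"]
trie = HotWordSuffixTrie(hot_words)
thesaurus = {"pants": ["trousers"]}  # hypothetical synonym table
print(find_candidates(trie, "casual", "pants", thesaurus))
# ['cargo trousers', 'casual pants', 'casual shoes', 'jeans pants']
```

The hot words reached only through the suffix word or its synonym ("jeans pants", "cargo trousers") are exactly the candidates that pure prefix matching would miss, while "pants hanger" is excluded because "pants" is neither its prefix nor its suffix.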
Further, the intelligent prompting method for search according to the present invention is characterized in that it also comprises:
S34: the server computes the probability of each candidate word by analyzing a user historical search behavior database,
wherein the user historical search behavior database stores historical behavior information.
Further, the method is characterized in that step S34 comprises:
S34a1: the server looks up, in the user historical search behavior database, the historical behavior information whose original string equals the initial string and whose clicked hot word equals the candidate word, obtaining the click count of the candidate word;
S34a2: the server normalizes the click counts of the candidate words to obtain their probabilities,
wherein the historical behavior information comprises the original string, the clicked hot word and the click count.
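Steps S34a1 and S34a2 amount to a lookup followed by a sum-to-one normalization. The sketch below assumes the behavior database is a dict keyed by (original string, clicked hot word); that layout, the function name and the uniform fallback when no clicks exist are assumptions, not details from the patent.

```python
def candidate_probabilities(history, initial_string, candidates):
    """S34a1-S34a2: click counts for (initial string, candidate) pairs, normalized to sum to 1.

    history maps (original string, clicked hot word) -> click count; this layout is an
    assumed stand-in for the user historical search behavior database.
    """
    counts = {w: history.get((initial_string, w), 0) for w in candidates}
    total = sum(counts.values())
    if total == 0:
        # no clicks recorded for this initial string: uniform fallback (assumption)
        return {w: 1 / len(counts) for w in counts} if counts else {}
    return {w: c / total for w, c in counts.items()}


history = {("casual pants", "casual pants"): 30, ("casual pants", "jeans pants"): 10}
print(candidate_probabilities(history, "casual pants", ["casual pants", "jeans pants", "casual shoes"]))
# {'casual pants': 0.75, 'jeans pants': 0.25, 'casual shoes': 0.0}
```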
Further, the method is characterized in that step S34 alternatively comprises:
S34b1: looking up the historical behavior information for the candidate word in the user historical search behavior database;
S34b2: counting, within this historical behavior information, the clicks under each prefix match mode and each suffix match mode;
S34b3: taking the natural logarithm of the click counts under the different prefix and suffix match modes to obtain the logit value for each prefix match mode and each suffix match mode;
S34b4: computing the parameter values in the two-variable linear regression equation …;
S34b5: computing the probability of the candidate word with the formula …;
S34b6: normalizing the probabilities of the candidate words,
wherein the historical behavior information comprises the clicked hot word and the click counts for nine candidate-word match types.
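Step S34b2 needs each candidate's prefix match mode and suffix match mode. Reading the nine match types listed in Embodiment 2 below as every combination of three prefix modes and three suffix modes (no match, exact match, synonym match) is an interpretation; the sketch below classifies a hot word on that basis. Giving an exact match precedence over a synonym match is an assumption, and because the regression formulas of S34b4 and S34b5 are not reproduced here, the sketch stops at classification.

```python
from itertools import product

MODES = ("none", "exact", "synonym")


def match_mode(hot_word, word, synonyms, side):
    """How one end of the hot word matches: the word itself, one of its synonyms, or nothing."""
    ends = str.startswith if side == "prefix" else str.endswith
    if ends(hot_word, word):
        return "exact"
    if any(ends(hot_word, s) for s in synonyms):
        return "synonym"
    return "none"


def match_type(hot_word, prefix_word, prefix_syns, suffix_word, suffix_syns):
    """A candidate's match type is the pair (prefix mode, suffix mode)."""
    return (match_mode(hot_word, prefix_word, prefix_syns, "prefix"),
            match_mode(hot_word, suffix_word, suffix_syns, "suffix"))


# 3 prefix modes x 3 suffix modes = nine match types; ("none", "none") is "no match"
ALL_TYPES = list(product(MODES, MODES))
print(len(ALL_TYPES))  # 9
print(match_type("jeans trousers", "casual", ["leisure"], "pants", ["trousers"]))
# ('none', 'synonym')
```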
Further, the method is characterized in that step S5 comprises:
S51: the client computes, from a local historical search database, the local relevance of each candidate word in the candidate-word information list;
S52: the client computes the estimated click value of each candidate word from its local relevance and its candidate-word information;
S53: the client selects the candidate-word list from the candidate-word information list according to the estimated click values of the candidate words,
wherein the local historical search database is a database in which the client stores local historical search information, comprising the locally searched strings, the local search times and the local search frequencies; and step S51 comprises:
S511: splitting, with the word segmenter, the locally searched strings in the local historical search database and the candidate words in the candidate-word information list into a keyword list, and computing the statistical frequency of each keyword;
S512: building a keyword space vector from the statistical frequencies of the keywords in the keyword list;
S513: building a candidate-word space vector from the statistical frequencies, within the keyword list, of the keywords split from the candidate word;
S514: computing the cosine between the keyword space vector and the candidate-word space vector to obtain the local relevance of the candidate word.
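Steps S511 to S514 describe a cosine similarity between two keyword vectors. The sketch below takes one reading of S513: the candidate vector keeps the statistical frequency of each keyword the candidate contains and is zero elsewhere. That reading, weighting keyword statistics by local search count, and the `(string, count)` history layout are assumptions; `segment` can be any callable that splits a string into keywords.

```python
import math
from collections import Counter


def local_relevance(history, candidates, segment):
    """S511-S514: cosine between the local keyword vector and each candidate's vector.

    history is a list of (locally searched string, local search count) pairs (assumed layout).
    """
    # S511: keyword statistics over the local history, weighted by search count
    freq = Counter()
    for text, count in history:
        for kw in segment(text):
            freq[kw] += count
    cand_keywords = {c: set(segment(c)) for c in candidates}
    keyword_list = sorted(set(freq).union(*cand_keywords.values()))

    # S512: keyword space vector (Counter returns 0 for keywords it has not seen)
    history_vec = [freq[kw] for kw in keyword_list]
    history_norm = math.sqrt(sum(v * v for v in history_vec))

    scores = {}
    for cand, kws in cand_keywords.items():
        # S513: keep the statistical frequency of the candidate's own keywords only
        cand_vec = [freq[kw] if kw in kws else 0 for kw in keyword_list]
        cand_norm = math.sqrt(sum(v * v for v in cand_vec))
        dot = sum(a * b for a, b in zip(history_vec, cand_vec))
        # S514: cosine similarity, defined as 0 when either vector is all zeros
        scores[cand] = dot / (history_norm * cand_norm) if history_norm and cand_norm else 0.0
    return scores


history = [("casual pants", 3), ("running shoes", 1)]
scores = local_relevance(history, ["jeans pants", "leather shoes", "wool hat"], str.split)
print({c: round(s, 3) for c, s in scores.items()})
# {'jeans pants': 0.671, 'leather shoes': 0.224, 'wool hat': 0.0}
```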
Further, the method is characterized in that computing the statistical frequency of a keyword in step S511 includes weighting the frequency by search time.
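The patent does not say how the time weighting is done. One common choice is exponential decay, sketched below purely as an assumption: a search made one half-life ago counts half as much as a search made now. The `half_life` parameter and the `(string, count, time)` history layout are hypothetical.

```python
from collections import Counter


def time_weighted_frequency(history, segment, now, half_life):
    """Keyword statistics in which older local searches count for less (assumed scheme).

    history holds (search string, search count, search time) triples; times and
    half_life share one unit, e.g. days.
    """
    freq = Counter()
    for text, count, when in history:
        weight = 0.5 ** ((now - when) / half_life)  # 1.0 for now, 0.5 one half-life ago
        for kw in segment(text):
            freq[kw] += count * weight
    return freq


history = [("casual pants", 2, 30), ("casual shoes", 1, 0)]
freq = time_weighted_frequency(history, str.split, now=30, half_life=30)
print(dict(freq))  # {'casual': 2.5, 'pants': 2.0, 'shoes': 0.5}
```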
Further, the method is characterized in that in step S52:
CTR = A × R × C,
where CTR is the estimated click value of the candidate word, A is the probability of the candidate word, R is its local relevance, and C is a constant determined by the type of the candidate word.
Further, the method is characterized in that in step S52:
CTR = A × R × C × P,
where CTR, A, R and C are as above and P is the search frequency of the candidate word; in this case the candidate-word information also includes the search frequency of the candidate word.
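Steps S52 and S53 reduce to scoring each candidate with one of the two CTR formulas and keeping the best ones. In the sketch below, the type labels, the values of C and the cut-off `k` are hypothetical; the patent only says C is a constant determined by the candidate's type.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    word: str
    probability: float         # A: probability from step S34
    relevance: float           # R: local relevance from step S51
    kind: str                  # candidate type, used to look up C
    search_frequency: int = 1  # P: only used by the second formula


def estimated_ctr(c, type_constants, use_frequency=False):
    """CTR = A x R x C, or A x R x C x P when use_frequency is set."""
    ctr = c.probability * c.relevance * type_constants[c.kind]
    return ctr * c.search_frequency if use_frequency else ctr


def choose_candidates(cands, type_constants, k, use_frequency=False):
    """S53: keep the k candidates with the highest estimated click value."""
    ranked = sorted(cands, key=lambda c: estimated_ctr(c, type_constants, use_frequency),
                    reverse=True)
    return [c.word for c in ranked[:k]]


TYPE_CONSTANTS = {"prefix+suffix": 1.0, "suffix": 0.8, "prefix": 0.5}  # hypothetical C values
cands = [
    Candidate("casual pants", 0.5, 0.6, "prefix+suffix", search_frequency=10),
    Candidate("jeans pants", 0.3, 0.67, "suffix", search_frequency=100),
    Candidate("casual shoes", 0.2, 0.1, "prefix", search_frequency=50),
]
print(choose_candidates(cands, TYPE_CONSTANTS, k=2))                      # ['casual pants', 'jeans pants']
print(choose_candidates(cands, TYPE_CONSTANTS, k=2, use_frequency=True))  # ['jeans pants', 'casual pants']
```

Including P lets a widely searched candidate overtake one that scores better on probability and local relevance alone, which is the practical difference between the two formulas.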
An intelligent prompting device for search according to the present invention is characterized in that it comprises:
a word segmentation device, for splitting the initial string to obtain the prefix word and the suffix word;
a synonym expansion device, for looking up the prefix word and the suffix word in the thesaurus to obtain prefix synonyms and suffix synonyms;
a suffix tree traversal device, for traversing the hot-word suffix tree to find hot words whose prefix matches and/or whose suffix matches, obtaining the candidate-word information list; a prefix match means that the prefix of a hot word matches the prefix word or a prefix synonym, and a suffix match means that the suffix of a hot word matches the suffix word or a suffix synonym;
a hot-word bank construction device, for managing and maintaining the database that stores hot-word information;
a suffix tree construction device, for managing and maintaining the hot-word suffix tree, which the server builds, using the generalized suffix tree data structure, from the high-frequency hot words in the hot-word bank;
a historical behavior analysis device, for computing the probability of each candidate word by analyzing the user historical search behavior database;
a user historical search behavior database device, for storing historical behavior information.
Further, an intelligent prompting system for search according to the present invention comprises a client and a server connected through a network, and is characterized in that:
the server comprises:
a word segmentation module, for splitting the initial string to obtain the prefix word and the suffix word;
a synonym expansion module, for looking up the prefix word and the suffix word in the thesaurus to obtain prefix synonyms and suffix synonyms;
a suffix tree traversal module, for traversing the hot-word suffix tree to find hot words whose prefix matches and/or whose suffix matches, obtaining the candidate-word information list; a prefix match means that the prefix of a hot word matches the prefix word or a prefix synonym, and a suffix match means that the suffix of a hot word matches the suffix word or a suffix synonym;
a hot-word bank construction module, for managing and maintaining the database that stores hot-word information;
a suffix tree construction module, for managing and maintaining the hot-word suffix tree, which the server builds, using the generalized suffix tree data structure, from the high-frequency hot words in the hot-word bank;
a historical behavior analysis module, for computing the probability of each candidate word by analyzing the user historical search behavior database;
a user historical search behavior database module, for storing historical behavior information;
the client comprises:
a local relevance computing module, for computing, from the local historical search database, the local relevance of each candidate word in the candidate-word information list;
an estimated click value computing module, for computing the estimated click value of each candidate word from its local relevance and its candidate-word information;
a candidate-word selection module, for selecting the candidate-word list from the candidate-word information list according to the estimated click values of the candidate words;
a local historical search database storage module, for storing local historical search information, which comprises the locally searched strings, the local search times and the local search frequencies;
the local relevance computing module comprises:
a keyword distribution statistics module, for splitting, with the word segmenter, the locally searched strings in the local historical search database and the candidate words in the candidate-word information list into a keyword list, and computing the statistical frequency of each keyword;
a keyword space vector construction module, for building the keyword space vector from the statistical frequencies of the keywords in the keyword list;
a candidate-word space vector construction module, for building the candidate-word space vector from the statistical frequencies, within the keyword list, of the keywords split from the candidate word;
a vector cosine computing module, for computing the cosine between the keyword space vector and the candidate-word space vector to obtain the local relevance of the candidate word.
The technical effects of the present invention are as follows:
1. Prompt words are obtained by matching both the prefix word and the suffix word, extended with synonyms, so they come closer to the meaning the query expresses.
2. Prefix and suffix matching is performed on a generalized suffix tree of hot words, combined with hot-word sequence numbers, so the search is fast and consumes little CPU time.
3. The final prompt words take into account a probability computed from the search intent of many users, bringing them closer to the user's search intent.
4. The final prompt words also take into account local relevance, which infers the user's search intent from the user's own search history, bringing them closer still to that intent.
Detailed description of the invention
The present invention is described in further detail below.
1. Application scenario and environment of the invention
The present invention is applied to intelligent prompting in a search engine. When searching, the user types the string to be searched into a text box on a web page; the device, method or system of the invention then displays, in a drop-down box under that text box, several prompt words the user may want to search for. Once the user selects a prompt word in the drop-down box, the search engine searches for that prompt word. The user may also ignore the drop-down box and keep typing, in which case the search engine searches for the typed text. A drop-down box of intelligent prompts makes input easier and reduces the typing effort and time spent by the user. The main process of obtaining prompt words can be summarized as the following steps:
S21: the client obtains the initial string;
S22: the client sends the initial string to the server;
S29: the server receives the initial string;
S3: the server searches the hot words according to the initial string and obtains a candidate-word information list;
S41: the server sends the candidate-word information list to the client;
S49: the client receives the candidate-word information list;
S5: the client derives a candidate-word list from the candidate-word information list;
S91: the client displays the candidate-word list.
In this process the client is typically a web page, although it can also be implemented as a dedicated application. A web-page client usually runs on the user terminal, and the user accesses the search engine server through the web page. The client may, however, also reside on the server side. In that case a program can be understood as being divided into a client module and a server module, which act respectively as the client and the server of the invention, and the "network" connecting them can be understood broadly as any means of communication, such as local memory, a pipe or a socket.
In step S21, "the client obtains the initial string" can be understood as the user typing the string to be searched into the text box of the web page, as described above. Given the broader notion of a client, the initial string may also be obtained in other ways. In general, the initial string is a manually typed string that the client captures while the user is still typing, so it is not necessarily the string the user will finally search for.
In step S91, "the client displays the candidate-word list" can be understood as displaying, in a drop-down box under the text box of the web page, several prompt words that the user may search for, as described above. Candidate words are prompt words, and these prompt words together form the candidate-word list.
This process can be regarded as prior art, since many search engines already implement intelligent prompting along these steps. The present invention solves its problem through the specific implementation of steps S3 and S5, so the remainder of this specification focuses on those two steps and the technical content related to them. The other steps are understood by those skilled in the art and are not described in detail.
2. Basic concepts in this specification
A keyword, in the sense of the present invention, is a word with a definite meaning obtained by splitting a string with the word segmenter. For example, splitting "casual pants" yields two keywords, "casual" and "pants".
The prefix word is the first of the keywords obtained by splitting a string with the word segmenter; in the example above, "casual" is the prefix word.
The suffix word is the last of those keywords; in the example above, "pants" is the suffix word.
Those skilled in the art will understand that if splitting a string yields only one keyword, that keyword is both the prefix word and the suffix word.
A candidate word is a string composed of one or more keywords.
The candidate-word list can be understood as an array of candidate words.
Candidate-word information consists of a candidate word together with its attribute information, or of the candidate word alone. The attribute information can include the search frequency, the probability and/or the local relevance of the candidate word.
The candidate-word information list can be understood as an array of candidate-word information items.
The word segmenter is a module or device that splits a string into multiple keywords, mainly by dictionary lookup. Word segmenters are prior art; in a concrete implementation of the invention the segmenter can be purchased commercially or built in-house.
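Since the segmenter is treated as prior art, any dictionary-based method fits. Forward maximum matching is one common choice for text without spaces such as Chinese; the sketch below uses it as a stand-in (not necessarily the segmenter the patent has in mind), together with the prefix and suffix word rule from the definitions above. The unspaced English input and the small dictionary are illustrative assumptions.

```python
def fmm_segment(text, dictionary):
    """Forward maximum matching: at each position take the longest dictionary word.

    Characters not covered by any dictionary word become single-character keywords.
    """
    max_len = max(1, max(map(len, dictionary), default=1))
    keywords, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                keywords.append(piece)
                i += size
                break
    return keywords


def prefix_and_suffix(keywords):
    """Prefix word = first keyword, suffix word = last; a single keyword is both."""
    return keywords[0], keywords[-1]


dictionary = {"casual", "pant", "pants"}
kws = fmm_segment("casualpants", dictionary)
print(kws)                           # ['casual', 'pants']
print(prefix_and_suffix(kws))        # ('casual', 'pants')
print(prefix_and_suffix(["pants"]))  # ('pants', 'pants')
```

Preferring the longest match is why "pants" wins over "pant" here; backward maximum matching or statistical segmenters are equally valid substitutes.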
A hot word is a string composed of one or more keywords that the server uses to store users' search history.
Hot-word information comprises the hot word, its sequence number and its search frequency. The sequence number is used to build an index for fast lookup, and the search frequency counts how many times the hot word has been searched.
3. Embodiment 1
In this embodiment, step S3 is implemented by the following steps:
S31: the server splits the initial string with the word segmenter to obtain the prefix word and the suffix word;
S32: the server looks up the prefix word and the suffix word in the thesaurus to obtain prefix synonyms and suffix synonyms;
S33: the server traverses the hot-word suffix tree to find hot words whose prefix matches and/or whose suffix matches, obtaining the candidate-word information list.
In this embodiment, the thesaurus is a database in which the server stores synonym relations between keywords. It is usually obtained from a commercial dictionary, but can also be built in-house.
In this embodiment, step S31 is implemented by the word segmentation module or device, i.e. the word segmenter described above. Those skilled in the art will understand that the prefix word and the suffix word obtained in step S31 may be identical. In that case the prefix synonyms and the suffix synonyms are also identical, so step S32 can be simplified to look up the synonyms of only one of them.
In this embodiment, step S32 is implemented by the synonym expansion module or device. Those skilled in the art will understand that a word may have several synonyms, so the prefix synonyms and the suffix synonyms obtained in step S32 are usually lists.
In this embodiment, step S33 is implemented by the suffix tree traversal module or device. Here, a prefix match means that the prefix of a hot word matches the prefix word or a prefix synonym, and a suffix match means that the suffix of a hot word matches the suffix word or a suffix synonym. The "and/or" in "prefix match and/or suffix match" means that a retrieved hot word may satisfy the prefix match, the suffix match, or both. The suffix tree traversal module or device works by traversing the hot-word suffix tree, which the server builds, using the generalized suffix tree data structure, from the high-frequency hot words in the hot-word bank. The tree is built and maintained by the suffix tree construction module or device. As is well known, a suffix tree is a tree data structure that supports efficient string matching and queries; a suffix tree represents one string, while a generalized suffix tree represents several strings. Building and traversing generalized suffix trees is prior art and is not repeated here.
Note that the hot words in the hot-word suffix tree come from the hot-word bank but do not include all of its hot words, only the high-frequency ones. These can be obtained by sorting all hot words in the hot-word bank by search frequency in descending order and taking the top N, where N is usually preset, for example 10000 or 100000. A more efficient method first filters by search frequency, so that only hot words whose search frequency exceeds a preset threshold are sorted.
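Selecting the high-frequency hot words is a filter-then-sort over the hot-word bank. The minimal sketch below assumes the bank is available as a dict from hot word to search frequency; the function name and the optional `threshold` parameter are illustrative.

```python
def high_frequency_hot_words(bank, n, threshold=None):
    """Top-n hot words by search frequency, as used to build the hot-word suffix tree.

    bank maps hot word -> search frequency. With a threshold, only words whose
    frequency exceeds it are sorted (the cheaper variant).
    """
    items = bank.items()
    if threshold is not None:
        items = [(w, f) for w, f in items if f > threshold]
    ranked = sorted(items, key=lambda item: item[1], reverse=True)
    return [w for w, _ in ranked[:n]]


bank = {"casual pants": 120, "socks": 5, "jeans": 80, "hat": 40}
print(high_frequency_hot_words(bank, 2))                 # ['casual pants', 'jeans']
print(high_frequency_hot_words(bank, 10, threshold=50))  # ['casual pants', 'jeans']
```

For very large banks, `heapq.nlargest(n, bank.items(), key=lambda item: item[1])` avoids a full sort and serves the same purpose as the threshold pre-filter.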
In this embodiment, the hot-word bank is the database in which the server stores hot-word information; this data also records the users' search history. Recording search history is handled by the hot-word bank construction module or device, which manages and maintains the database storing hot-word information (hot word, sequence number and search frequency). Search history is recorded as follows: when the user submits a search string to the server through the client, the server, while executing the search, also adds the string to the hot-word bank as a hot word. If the hot-word bank already contains the string, its search frequency is incremented by 1; otherwise the string is saved to the hot-word bank with a search frequency of 1.
Note that the candidate-word information list obtained in step S33 is an array of candidate-word information items. In this embodiment the candidate-word information is just the hot word, and the candidate-word list obtained in step S5 is the candidate-word information list itself. In other embodiments, including those below, candidate-word information can carry more content, such as the candidate word's hot-word sequence number and its attribute information.
4. Embodiment 2
This embodiment builds on Embodiment 1 by adding a step after step S33, namely step S34: the server computes the probability of each candidate word by analyzing the user historical search behavior database.
Step S34 is implemented by the historical behavior analysis module or device. The problem it solves is estimating, through statistical analysis of users' historical searches for a specific candidate word, the probability that a user who has typed the initial string intends to enter that candidate word. Its input is the candidate-word information list obtained in step S33, and its output is also a candidate-word information list, in which each item additionally carries the probability of the candidate word.
The probability of a candidate word is computed by analyzing users' historical search behavior. This behavior data is kept in the user historical search behavior database, which is maintained by the user historical search behavior database module or device and stores historical behavior information. Step S34 can be implemented in many ways; this specification gives two of them, labeled Embodiment 1 and Embodiment 2 below (these are variants of step S34, not to be confused with the numbered embodiments of this description). Embodiment 1 is a simple implementation. Embodiment 2 applies a logistic regression algorithm to statistics on the match types of candidate words.
Embodiment 1
If historical behavior information includes original character string, clicks on hot word and click on the frequency.Server is in user's historical search
Behavior database is searched the historical behavior letter that original character string is identical with init string and click hot word is identical with candidate word
Breath.The click frequency in historical behavior information can be as the probability of candidate word.It is greater than the integer of 0 owing to clicking on the frequency, and
Probability in general sense is the value between 0 ~ 1, can also click on each candidate word after the frequency does normalized for this and make
For the probability of candidate word, click on frequency normalized and be referred to following method: set and candidate word information list includes K
Candidate word, the click frequency of each candidate word is respectively as follows:, then the probability of i-th candidate word is:.Under present embodiment, said process can be summarized as:
S34a1: the server searches the user historical search behavior database for historical behavior information whose original string is identical to the initial string and whose clicked hot word is identical to the candidate word, and thereby obtains the click frequency of the candidate word;
S34a2: the server normalizes the click frequencies of the candidate words to obtain their probabilities.
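Steps S34a1 and S34a2 can be sketched in C++ as below. The sketch assumes that step S34a1 has already collected the click frequencies of the K candidate words into a vector, in list order. The helper name `NormalizeClicks` is hypothetical, and so is the choice to return all zeros when there is no click history. The normalization itself follows the sum-based formula above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for step S34a2: turns the click frequencies c_1..c_K
// of the K candidate words into probabilities p_i = c_i / (c_1 + ... + c_K).
std::vector<double> NormalizeClicks(const std::vector<int>& clicks) {
    long long total = 0;
    for (int c : clicks) {
        assert(c >= 0);  // click frequencies are counts
        total += c;
    }
    std::vector<double> probs(clicks.size(), 0.0);
    if (total == 0) return probs;  // no click history: every probability stays 0
    for (std::size_t i = 0; i < clicks.size(); ++i) {
        probs[i] = static_cast<double>(clicks[i]) / static_cast<double>(total);
    }
    return probs;
}
```

For example, click frequencies {1, 3} become probabilities {0.25, 0.75}.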
In this embodiment, historical behavior information is generated as follows. After the client performs step S91, the user can select from the candidate word list displayed in step S91. Once the user selects a candidate word, the initial string and the selected candidate word are sent to the server together with a retrieval request. After receiving them, the server performs the retrieval and adds the selected candidate word to the hot word bank. It also adds the initial string and the selected candidate word to the user historical search behavior database. Here the initial string is the original string in the historical behavior information, and the selected candidate word is the clicked hot word. They are added to the user historical search behavior database as follows:
- if the database already holds a record associating this original string with this clicked hot word, the corresponding click frequency is incremented by 1;
- otherwise, a new record of the original string and clicked hot word is saved, with its click frequency set to 1.
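A minimal in-memory sketch of this bookkeeping follows. A `std::map` keyed by (original string, clicked hot word) stands in for the user historical search behavior database. The class `ClickHistory` and its methods are hypothetical names. A real system would persist these records in a database.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical in-memory stand-in for the user historical search behavior
// database of Embodiment 1: (original string, clicked hot word) -> click frequency.
class ClickHistory {
public:
    // Called when the user selects `clickedHotWord` after typing `original`:
    // an existing record's click frequency is incremented, a new record starts at 1.
    void Record(const std::string& original, const std::string& clickedHotWord) {
        assert(!clickedHotWord.empty());
        ++counts_[std::make_pair(original, clickedHotWord)];  // new keys start at 0
    }

    // Click frequency looked up in step S34a1; 0 if the pair was never selected.
    int ClickFrequency(const std::string& original, const std::string& clickedHotWord) const {
        const auto it = counts_.find(std::make_pair(original, clickedHotWord));
        return it == counts_.end() ? 0 : it->second;
    }

private:
    std::map<std::pair<std::string, std::string>, int> counts_;
};
```

The map's `operator[]` value-initializes a missing count to 0 before the increment. This single line therefore covers both the "increment" and the "create with 1" branches.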
Embodiment 2
Suppose the historical behavior information includes the clicked hot word and the click frequencies of nine candidate word match types. The nine match types comprise:
- five basic types: no match, prefix match, suffix match, prefix synonym match and suffix synonym match;
- four composite types: prefix-and-suffix match, prefix-and-suffix synonym match, prefix match with suffix synonym match, and prefix synonym match with suffix match.

These nine match types are grouped into two independent variables, $x_1$ and $x_2$. $x_1$ represents the prefix matching mode. Its possible values are no prefix match, prefix synonym match and prefix match, represented by 1, 4 and 5 respectively. $x_2$ represents the suffix matching mode. Its possible values are no suffix match, suffix synonym match and suffix match, likewise represented by 1, 4 and 5. The probability that a candidate word is chosen is then
$$P = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}},$$
where $\beta_0$, $\beta_1$ and $\beta_2$ are undetermined parameters.
The undetermined parameters are computed as follows.
The probability that the candidate word is not chosen is
$$1 - P = \frac{1}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}.$$
The ratio of the probability that the candidate word is chosen to the probability that it is not chosen is
$$\frac{P}{1 - P} = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}.$$
Taking the logit transform gives
$$\operatorname{logit}(P) = \ln\frac{P}{1 - P} = \beta_0 + \beta_1 x_1 + \beta_2 x_2.$$
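The coding of the nine match types into the pair ($x_1$, $x_2$) can be written as a small lookup. In this sketch the enumerator names are ours. They follow the order in which the text lists the nine types, which is also the order of the sample click frequencies in the example below. The pairing of each composite type with its codes is consistent with the rows of that example table.

```cpp
#include <cassert>
#include <utility>

// The nine candidate word match types, in the order the text lists them
// (enumerator names are hypothetical).
enum class MatchType {
    kNone,                      // no match
    kPrefix,                    // prefix match
    kSuffix,                    // suffix match
    kPrefixSynonym,             // prefix synonym match
    kSuffixSynonym,             // suffix synonym match
    kPrefixSuffix,              // prefix and suffix both match
    kPrefixSuffixSynonym,       // prefix and suffix both match via synonyms
    kPrefixExactSuffixSynonym,  // prefix match with suffix synonym match
    kPrefixSynonymSuffixExact,  // prefix synonym match with suffix match
};

// Returns (x1, x2): prefix and suffix matching modes coded as
// 1 = no match, 4 = synonym match, 5 = match.
std::pair<int, int> EncodeMatchType(MatchType t) {
    switch (t) {
        case MatchType::kNone:                      return {1, 1};
        case MatchType::kPrefix:                    return {5, 1};
        case MatchType::kSuffix:                    return {1, 5};
        case MatchType::kPrefixSynonym:             return {4, 1};
        case MatchType::kSuffixSynonym:             return {1, 4};
        case MatchType::kPrefixSuffix:              return {5, 5};
        case MatchType::kPrefixSuffixSynonym:       return {4, 4};
        case MatchType::kPrefixExactSuffixSynonym:  return {5, 4};
        case MatchType::kPrefixSynonymSuffixExact:  return {4, 5};
    }
    assert(false && "unknown match type");
    return {1, 1};
}
```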
In this embodiment, the logit values and the values of $x_1$ and $x_2$ can be obtained from the click frequencies of the various candidate word match types in the historical behavior information.
Suppose that, for one clicked hot word, the historical behavior information stores the following click frequencies for the nine candidate word match types, in the order listed above:
{73, 98, 119, 67, 89, 342, 137, 123, 99}.
The following table can then be obtained. The logit value in the table is the natural logarithm of the click frequency, as in step S34b3.
x1 | x2 | Click frequency | Logit value |
---|---|---|---|
1 (no prefix match) | 1 (no suffix match) | 73 | 4.29 |
1 (no prefix match) | 4 (suffix synonym match) | 89+137+123=349 | 5.86 |
1 (no prefix match) | 5 (suffix match) | 119+342+99=560 | 6.33 |
4 (prefix synonym match) | 1 (no suffix match) | 67+137+99=303 | 5.71 |
4 (prefix synonym match) | 4 (suffix synonym match) | 137 | 4.92 |
4 (prefix synonym match) | 5 (suffix match) | 99 | 4.60 |
5 (prefix match) | 1 (no suffix match) | 98+342+123=563 | 6.33 |
5 (prefix match) | 4 (suffix synonym match) | 123 | 4.81 |
5 (prefix match) | 5 (suffix match) | 342 | 5.83 |
Fitting a linear regression in the two variables $x_1$ and $x_2$ to the data in the table above gives the values of $\beta_0$, $\beta_1$ and $\beta_2$ for this clicked hot word. The probability of the current candidate word then follows from the formula for the probability that a candidate word is chosen.
The resulting candidate word probabilities can further be normalized. In this embodiment, the above process can be summarized in the following steps:
S34b1: look up the historical behavior information for the candidate word in the user historical search behavior database;
S34b2: from this historical behavior information, tally the click frequencies under the different prefix matching modes and suffix matching modes;
S34b3: take the natural logarithm of the click frequency under each combination of prefix matching mode and suffix matching mode, giving the corresponding logit value;
S34b4: fit a linear regression in two variables to obtain the parameters $\beta_0$, $\beta_1$ and $\beta_2$ in $\operatorname{logit}(P) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$;
S34b5: calculate the probability of the candidate word as $P = \frac{e^{z}}{1 + e^{z}}$, where $z = \beta_0 + \beta_1 x_1 + \beta_2 x_2$;
S34b6: normalize the probabilities of all candidate words.
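Steps S34b3 to S34b5 amount to an ordinary least-squares fit of ln(click frequency) on $x_1$ and $x_2$, followed by the logistic formula. The sketch below rests on two assumptions: step S34b2 has already produced one aggregated click frequency per ($x_1$, $x_2$) combination, and every such frequency is positive. Solving the 3×3 normal equations with Cramer's rule is our choice, not a detail from the text. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// One aggregated row from step S34b2: prefix mode x1, suffix mode x2
// (1 = no match, 4 = synonym match, 5 = match) and its click frequency.
struct MatchObservation {
    double x1;
    double x2;
    double clicks;
};

using Matrix3 = std::array<std::array<double, 3>, 3>;

// Determinant of a 3x3 matrix (cofactor expansion along the first row).
static double Det3(const Matrix3& m) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1]) -
           m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0]) +
           m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// Steps S34b3 and S34b4: fit logit = b0 + b1*x1 + b2*x2 by least squares,
// where each row's logit value is ln(clicks). Returns {b0, b1, b2}.
std::array<double, 3> FitLogitRegression(const std::vector<MatchObservation>& rows) {
    Matrix3 xtx{};                // X^T X for design rows (1, x1, x2)
    std::array<double, 3> xty{};  // X^T y
    for (const MatchObservation& r : rows) {
        assert(r.clicks > 0.0);   // ln(0) is undefined
        const double v[3] = {1.0, r.x1, r.x2};
        const double y = std::log(r.clicks);  // step S34b3
        for (int i = 0; i < 3; ++i) {
            xty[i] += v[i] * y;
            for (int j = 0; j < 3; ++j) xtx[i][j] += v[i] * v[j];
        }
    }
    const double d = Det3(xtx);
    assert(std::fabs(d) > 1e-12);  // needs at least 3 rows that are not collinear
    std::array<double, 3> beta{};
    for (int k = 0; k < 3; ++k) {  // Cramer's rule: replace column k by X^T y
        Matrix3 m = xtx;
        for (int i = 0; i < 3; ++i) m[i][k] = xty[i];
        beta[k] = Det3(m) / d;
    }
    return beta;
}

// Step S34b5: P = e^z / (1 + e^z) with z = b0 + b1*x1 + b2*x2.
double ChosenProbability(const std::array<double, 3>& beta, double x1, double x2) {
    const double z = beta[0] + beta[1] * x1 + beta[2] * x2;
    return 1.0 / (1.0 + std::exp(-z));  // algebraically equal to e^z / (1 + e^z)
}
```

Passing the nine rows of the example table to `FitLogitRegression` gives the parameters for that clicked hot word. `ChosenProbability` then evaluates step S34b5 for any candidate's ($x_1$, $x_2$), and the results can be normalized as in step S34b6.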
In this embodiment, the user historical search behavior database may be a separate database. It may also be merged with the aforementioned hot word bank, so that the historical behavior information is stored in the hot word bank. In that case:
- the hot word bank also stores the click frequencies of the various candidate word match types;
- the hot words in the hot word bank are the clicked hot words of the historical behavior information;
- the sum of the click frequencies of the nine match types is the search frequency of the hot word described in the aforementioned Embodiment 1.

In this embodiment, step S33 outputs the candidate word information list that serves as input to step S34. Each piece of candidate word information in that list contains at least two items: the hot word sequence number and the match type of the candidate word. The leaf nodes of the hot word suffix tree store hot word sequence numbers. When step S33 traverses the hot word suffix tree, each matched candidate word carries the hot word sequence number stored in its leaf node. Depending on how the match was made, the candidate word information also records the candidate word's match type.
In this embodiment, the historical behavior information is produced by the process that saves users' search history. This process differs from the corresponding process in the aforementioned Embodiment 1 in one respect: the click frequency must additionally be counted separately for each match type of the candidate word. After the user selects a candidate word from the list displayed in step S91, the initial string and the selected candidate word are sent to the server together with a retrieval request. This process is as described in Embodiment 1 above.
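The sketch below shows one way to hold the per-match-type counts when the hot word bank doubles as the historical behavior store. Several details are assumptions not specified in the text: the record layout, the index order 0 to 8 (the order in which the nine match types are listed), and how the server learns which match type a clicked candidate had. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <string>

// Hypothetical hot word bank record when the bank also stores historical
// behavior information: one click frequency per candidate word match type.
struct HotWordRecord {
    int serial = 0;                     // hot word sequence number
    std::array<int, 9> clicksByType{};  // index 0..8 = the nine match types, in listed order

    // The hot word's search frequency is the sum of the nine click frequencies.
    int SearchFrequency() const {
        return std::accumulate(clicksByType.begin(), clicksByType.end(), 0);
    }
};

// Counts one click on `hotWord`, which was offered to the user with match type `type`.
void RecordClick(std::map<std::string, HotWordRecord>& bank,
                 const std::string& hotWord, std::size_t type) {
    assert(type < 9);
    ++bank[hotWord].clicksByType[type];
}
```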
From the two embodiments above, those skilled in the art will understand that different implementations of step S34 generally require different mathematical methods. Because many comparable estimation methods exist, step S34 can be implemented in many ways. The candidate word probability obtained in step S34 is only an estimate and can never be fully accurate in practice. Errors should therefore be tolerated, as should differences in the parameters between the two embodiments. The probability computed in step S34 merely serves as input to subsequent processing. In practice, the product of the probability obtained above and the search frequency of the candidate word may therefore also be used as the probability of the candidate word.
5. Embodiment 3
This embodiment builds on Embodiment 1 or Embodiment 2. Specifically, it further improves and optimizes step S5 of Embodiment 1 or Embodiment 2. In this embodiment, step S5 comprises the following steps:
S51: the client calculates the local relevance of each candidate word in the candidate word information list according to the local historical search database;
S52: the client calculates the estimated click value of each candidate word according to its local relevance and its candidate word information;
S53: the client selects the candidate word list from the candidate word information list according to the estimated click values of the candidate words.
Here, the local historical search database is used by the client to store local historical search information. This information comprises the local historical search string, the local historical search time and the local historical search frequency. The steps are implemented as follows:
- step S51 by a local relevance calculation device or module;
- step S52 by an estimated click value calculation device or module;
- step S53 by a candidate word selection device or module.

Step S51 comprises the following steps:
S511: use the segmenter to split the local historical search strings in the local historical search database and the candidate words in the candidate word information list into a keyword list, and calculate the statistical frequency of each keyword;
S512: build the keyword space vector from the statistical frequencies of the keywords in the keyword list;
S513: build the candidate word space vector from the statistical frequencies, in the keyword list, of the keywords obtained by splitting the candidate word;
S514: calculate the cosine of the angle between the keyword space vector and the candidate word space vector to obtain the local relevance of the candidate word.
The steps are implemented as follows:
- step S511 by a keyword distribution statistics device or module;
- step S512 by a keyword space vector construction device or module;
- step S513 by a candidate word space vector construction device or module;
- step S514 by a vector cosine calculation device or module.

Step S511 is further divided into two steps:
- step S511a: the segmenter splits the local historical search strings in the local historical search database into the keyword list, and the statistical frequency of each keyword is calculated;
- step S511b: the segmenter splits the candidate words in the candidate word information list into the keyword list, and the statistical frequency of each keyword is calculated.

Steps S511a and S511b both populate the same keyword list. The following example illustrates how step S51 calculates the local relevance of a candidate word.
Suppose the local historical search database holds an array lhi of length n, defined as follows:
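#include <ctime>
#include <string>

struct LocalHistInfo
{
    std::string sSearch;  // local historical search string
    std::time_t tRecent;  // local historical search time: when it was last searched
    int nCount;           // local historical search frequency
} lhi[n];                 // the n records of the local historical search database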
Each element of lhi is one item of local historical search information, represented by the structure LocalHistInfo. In this structure, sSearch is the local historical search string. tRecent is the local historical search time, i.e., the time of the most recent search. nCount is the local historical search frequency. Step S511a can be implemented as follows:
for (int i = 0; i < n; i++)
{
    LocalHistInfo item = lhi[i];  // copy, so the stored record is left unchanged
    std::vector<std::string> arKeys;
    WordSplit(item.sSearch, arKeys);  // segmenter: split the local historical search string into keywords
    item.nCount = TimeWeightCount(item.tRecent, item.nCount);  // weight the frequency by recency
    for (size_t j = 0; j < arKeys.size(); j++)
    {
        // add each keyword, with the time-weighted local historical search frequency, to vKey
        vKey.Add(arKeys[j], item.nCount);
    }
}
The above process constitutes step S511a. Here vKey is an instance of the class VecterKey and represents the keyword list. Add is a method of VecterKey, defined as follows:
class VecterKey {
    std::vector<KeyItem*> m_arData;
    …
public:
    // Adds keyword sKey with frequency nCount; returns true if sKey already existed
    bool Add(const std::string& sKey, int nCount)
    {
        KeyItem* pItem = nullptr;
        bool bFind = FindKey(sKey, pItem);  // check whether the keyword already exists
        if (!bFind)  // not found: create a new keyword entry
        {
            pItem = new KeyItem;
            pItem->sKey = sKey;
            pItem->nCount = nCount;  // initial statistical frequency
            m_arData.push_back(pItem);
        }
        else  // found: accumulate the search frequency of this keyword
        {
            pItem->nCount += nCount;
        }
        return bFind;
    }  // end of Add
};  // end of VecterKey
KeyItem, the structure that represents a keyword, can be defined as:
struct KeyItem
{
string sKey;
int nCount;
};
In this structure, sKey is the keyword and nCount is its statistical frequency.
Likewise, step S511b follows a process similar to step S511a and adds the keywords obtained by splitting the candidate words to the same vKey. For a candidate word, the frequency used may be a fixed value of 1, or the search frequency of the hot word stored in the server's hot word bank (see Embodiment 1). Note that when the local historical search strings in the local historical search database are added to vKey, their frequency is computed with the time-weighting step TimeWeightCount. Those skilled in the art will understand that this time-weighting step can also be omitted; it is a preferred implementation of the present invention. Time weighting adjusts the local historical search frequency according to the interval between the local historical search time and the current time. A simple scheme uses the following weight coefficients:
- interval of more than one month: 1;
- interval between two weeks and one month: 2;
- interval between one week and half a month: 3;
- interval of less than one week: 5.
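The weighting scheme above can be sketched as a version of `TimeWeightCount`. Three points are assumptions:
- the coefficient multiplies the frequency;
- half a month is taken as 15 days, because the text's bands overlap between two weeks and half a month;
- the current time is passed in as an extra `tNow` parameter, so that the function is deterministic.

```cpp
#include <cassert>
#include <ctime>

// Sketch of the time-weighting step. Unlike the two-argument call in the text,
// it takes the current time explicitly. Band edges (7, 15, 30 days) are assumptions.
int TimeWeightCount(std::time_t tRecent, int nCount, std::time_t tNow) {
    assert(tNow >= tRecent);
    const double days = std::difftime(tNow, tRecent) / 86400.0;
    int weight;
    if (days < 7) {
        weight = 5;  // searched within the last week
    } else if (days < 15) {
        weight = 3;  // between one week and half a month
    } else if (days <= 30) {
        weight = 2;  // between two weeks and one month
    } else {
        weight = 1;  // more than one month ago
    }
    return weight * nCount;
}
```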
After steps S511a and S511b have been performed, the keyword list vKey is obtained. Extracting the statistical frequencies of all keywords in vKey gives the keyword space vector of step S512, Ks_Vector = $\{v_1, v_2, v_3, \ldots, v_m\}$. Here m is the number of keywords in vKey, so the keyword space vector has m dimensions. Each component $v_i$ is the statistical frequency of the corresponding keyword.
Splitting a given candidate word in the candidate word information list with the segmenter yields several keywords, denoted HintKeys. For each keyword in vKey, the corresponding component is set to that keyword's statistical frequency if the keyword occurs in HintKeys, and to 0 otherwise. This yields the m-dimensional candidate word space vector Hs_Vector = $\{w_1, w_2, w_3, \ldots, w_m\}$. In Hs_Vector, a component $w_i$ of 0 means that the corresponding keyword does not appear in HintKeys, the keyword list obtained from the candidate word. A nonzero component means that the keyword does appear in HintKeys. This process of obtaining Hs_Vector is step S513.
Applying the vector cosine formula to two vectors gives the cosine value λ. The vectors are the m-dimensional keyword space vector Ks_Vector from step S512 and the m-dimensional candidate word space vector Hs_Vector from step S513:
$$\lambda = \frac{\sum_{i=1}^{m} v_i w_i}{\sqrt{\sum_{i=1}^{m} v_i^2}\,\sqrt{\sum_{i=1}^{m} w_i^2}}$$
Calculating λ with this formula is step S514. The cosine value λ can be used as the local relevance of the candidate word. In practice, the cosine values of the candidate words can also be normalized and used as their local relevance. Let the cosine values of the candidate words be $\{\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_K\}$, where K is the number of candidate words. The local relevance of candidate word i is then
$$R_i = \frac{\lambda_i}{\sum_{j=1}^{K} \lambda_j}.$$
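Steps S511 to S514 can be sketched end to end as follows. The segmenter is not specified, so `SplitKeywords` is a hypothetical whitespace splitter. A `std::map<std::string, int>` replaces the document's `VecterKey`. The vector construction follows the text: the candidate vector reuses the statistical frequency from vKey for each keyword the candidate contains, and has 0 elsewhere. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the segmenter: splits on whitespace only.
std::vector<std::string> SplitKeywords(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> keys;
    for (std::string w; in >> w;) keys.push_back(w);
    return keys;
}

// S511a/S511b: add every keyword of `text` to the statistics with weight `count`
// (a time-weighted frequency for history strings, e.g. 1 for candidate words).
void AddKeywords(std::map<std::string, int>& vKey, const std::string& text, int count) {
    assert(count >= 0);
    for (const std::string& k : SplitKeywords(text)) vKey[k] += count;
}

// S512 to S514: cosine between Ks_Vector (every statistical frequency in vKey)
// and Hs_Vector (same dimensions, 0 where the candidate lacks the keyword).
double LocalRelevance(const std::map<std::string, int>& vKey, const std::string& candidate) {
    const std::vector<std::string> words = SplitKeywords(candidate);
    const std::set<std::string> hintKeys(words.begin(), words.end());
    double dot = 0.0, ksNorm2 = 0.0, hsNorm2 = 0.0;
    for (const auto& entry : vKey) {    // one dimension per keyword
        const double v = entry.second;  // component of Ks_Vector
        const double w = hintKeys.count(entry.first) ? v : 0.0;  // component of Hs_Vector
        dot += v * w;
        ksNorm2 += v * v;
        hsNorm2 += w * w;
    }
    if (ksNorm2 == 0.0 || hsNorm2 == 0.0) return 0.0;  // cosine undefined for a zero vector
    return dot / (std::sqrt(ksNorm2) * std::sqrt(hsNorm2));
}
```

For example, add a history string "a b" and a candidate "a", each with frequency 1. vKey is then {a: 2, b: 1}, and the relevance of "a" is 4 / (sqrt(5) × 2) = 2 / sqrt(5), about 0.894.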
Step S52, which calculates the estimated click values of the candidate words, follows step S51. Its input depends on the local relevance values calculated in step S51. This specification gives two embodiments for computing the estimated click value of a candidate word in step S52:
Embodiment 1: CTR = A × R × C, where CTR is the estimated click value of the candidate word, A is the probability of the candidate word, R is its local relevance, and C is a constant determined by the type of the candidate word.
Embodiment 2: CTR = A × R × C × P, where CTR, A, R and C are as in Embodiment 1 and P is the search frequency of the candidate word.
In both of these embodiments, A is the candidate word probability computed in step S34 of Embodiment 2, so both build on Embodiment 2. The C term in both formulas ("C is a constant determined by the type of the candidate word") refers to the candidate word's match type. This type is obtained in step S33 as described in the aforementioned Embodiment 1. There are generally nine match types, which are described in the aforementioned Embodiment 2 and not repeated here. C is thus a constant determined by the nine match types, and those skilled in the art can set its specific values for the particular application of the invention. In Embodiment 2 of step S52, P (the search frequency of the candidate word) is the search frequency stored in the aforementioned hot word bank. From these two embodiments, those skilled in the art can derive further embodiments, for example:
Embodiment 3: CTR = A × R, where CTR is the estimated click value of the candidate word, A is its probability and R is its local relevance.
Embodiment 4: CTR = A × R × P, where P is the search frequency of the candidate word and the other symbols are as above.
Embodiment 5: CTR = R × P, with the symbols as above.
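All five formulas are products of a subset of A, R, C and P, so one function can cover them. In this sketch, a factor that an embodiment omits is passed as 1.0. The per-match-type table for C is left to the caller, since the text leaves its values to the implementer. The function and type names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// C for each of the nine match types (index order as listed in the text);
// the values are application-specific and supplied by the caller.
using MatchConstants = std::array<double, 9>;

double MatchConstant(const MatchConstants& table, std::size_t matchType) {
    assert(matchType < table.size());
    return table[matchType];
}

// Estimated click value of step S52 in its most general form, CTR = A x R x C x P.
// Passing 1.0 for the factors an embodiment omits gives the others:
//   Embodiment 1: p = 1         Embodiment 3: c = 1, p = 1
//   Embodiment 4: c = 1         Embodiment 5: a = 1, c = 1
double EstimatedClickValue(double a, double r, double c, double p) {
    assert(a >= 0.0 && r >= 0.0 && p >= 0.0);  // probability, relevance, frequency
    return a * r * c * p;
}
```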
Note that the end of the aforementioned Embodiment 2 states that the product of the candidate word's probability and its search frequency may be used as the probability of the candidate word. In that case A stands for A × P. Embodiment 1 here is then equivalent to Embodiment 2, and Embodiment 3 is equivalent to Embodiment 4.
Embodiment 5 does not need the candidate word probability as input, so it need not build on Embodiment 2; Embodiment 1 suffices as its basis. The present embodiment aims at implementing step S5, whereas the aforementioned Embodiments 1 and 2 aim at implementing step S3. Suppose the inputs and outputs of a specific implementation of step S5 do not involve step S3. The present embodiment still cannot form a complete technical solution on its own; it achieves the object of the invention on the basis of Embodiment 1 or Embodiment 2.
A simple implementation of step S53 is as follows. Sort the candidate words in descending order of estimated click value to obtain a candidate word ranking queue. Then take the first 10 or 20 candidate words from the queue as the final candidate word list. The candidate words obtained in step S5 come from the candidate word information list of step S41 above.

By analogy with step S53, Embodiment 1 or 2 may also include a step S39 before step S41. Step S39 sorts the candidate word information in descending order of candidate word probability or search frequency, giving a candidate word information ranking queue. The first 20 or 30 items are then selected from this queue as the final candidate word information list, and step S41 is performed. Whether or not step S39 is present does not affect the technical solutions of Embodiment 1, Embodiment 2 or the present embodiment, nor the scope of protection of the invention.
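A sketch of the simple step S53 follows. The same routine could serve for the optional step S39, fed probabilities or search frequencies instead of estimated click values. The `ScoredCandidate` type and the function name are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ScoredCandidate {
    std::string word;
    double ctr;  // estimated click value from step S52
};

// Step S53 sketch: sort candidates by estimated click value (descending)
// and keep the first `limit` of them (the text suggests 10 or 20).
std::vector<ScoredCandidate> SelectCandidates(std::vector<ScoredCandidate> candidates,
                                              std::size_t limit) {
    std::stable_sort(candidates.begin(), candidates.end(),
                     [](const ScoredCandidate& x, const ScoredCandidate& y) {
                         return x.ctr > y.ctr;  // higher estimated click value first
                     });
    if (candidates.size() > limit) candidates.resize(limit);
    return candidates;
}
```

`std::stable_sort` keeps candidates with equal scores in the order in which the server sent them. The text does not specify this choice.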
Claims (7)
1. An intelligent prompt method for search, involving a client and a server connected through a network, the method comprising the following steps:
S21: the client obtains an initial string;
S22: the client sends the initial string to the server;
S29: the server receives the initial string;
S3: the server searches hot words according to the initial string to obtain a candidate word information list;
S41: the server sends the candidate word information list to the client;
S49: the client receives the candidate word information list;
S5: the client obtains a candidate word list from the candidate word information list;
S91: the client displays the candidate word list;
characterized in that step S3 comprises:
S31: the server splits the initial string with a segmenter to obtain a prefix word and a suffix word;
S32: the server looks up the prefix word and the suffix word in a thesaurus to obtain prefix synonyms and suffix synonyms;
S33: the server traverses a hot word suffix tree to find hot words with a prefix match and/or a suffix match, obtaining the candidate word information list;
S34: the server calculates the probability of each candidate word by analyzing a user historical search behavior database;
wherein:
- the prefix word is the first of the keywords obtained when the segmenter splits the string, and the suffix word is the last of those keywords;
- the thesaurus is a database used by the server to store synonym associations between keywords;
- the hot word suffix tree is built by the server from the high-frequency search hot words in a hot word bank, using a generalized suffix tree data structure;
- the hot word bank is a database used by the server to store hot word information, which comprises the hot word, the hot word sequence number and the hot word search frequency;
- a prefix match means that the prefix of a hot word matches the prefix word or a prefix synonym, and a suffix match means that the suffix of a hot word matches the suffix word or a suffix synonym;
- the user historical search behavior database stores historical behavior information;
and step S5 comprises:
S51: the client calculates the local relevance of each candidate word in the candidate word information list according to a local historical search database;
S52: the client calculates the estimated click value of each candidate word according to its local relevance and its candidate word information;
S53: the client selects the candidate word list from the candidate word information list according to the estimated click values of the candidate words;
wherein the local historical search database is used by the client to store local historical search information, which comprises the local historical search string, the local historical search time and the local historical search frequency; and step S51 comprises:
S511: splitting, with the segmenter, the local historical search strings in the local historical search database and the candidate words in the candidate word information list into a keyword list, and calculating the statistical frequency of each keyword;
S512: building a keyword space vector from the statistical frequencies of the keywords in the keyword list;
S513: building a candidate word space vector from the statistical frequencies, in the keyword list, of the keywords obtained by splitting the candidate word;
S514: calculating the cosine of the keyword space vector and the candidate word space vector to obtain the local relevance of the candidate word.
2. The intelligent prompt method for search according to claim 1, characterized in that step S34 comprises:
S34a1: the server searches the user historical search behavior database for historical behavior information whose original string is identical to the initial string and whose clicked hot word is identical to the candidate word, obtaining the click frequency of the candidate word;
S34a2: the server normalizes the click frequencies of the candidate words to obtain their probabilities;
wherein the historical behavior information comprises the original string, the clicked hot word and the click frequency.
3. The intelligent prompt method for search according to claim 1, characterized in that step S34 comprises:
S34b1: looking up the historical behavior information for the candidate word in the user historical search behavior database;
S34b2: tallying, from this historical behavior information, the click frequencies under the different prefix matching modes and suffix matching modes;
S34b3: taking the natural logarithm of the click frequency under each combination of prefix matching mode and suffix matching mode to obtain the corresponding logit value;
S34b4: fitting a linear regression in two variables to obtain the parameters $\beta_0$, $\beta_1$ and $\beta_2$ in $\operatorname{logit}(P) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, where $x_1$ and $x_2$ are the coded prefix and suffix matching modes;
S34b5: calculating the probability of the candidate word as $P = \frac{e^{z}}{1 + e^{z}}$, where $z = \beta_0 + \beta_1 x_1 + \beta_2 x_2$;
S34b6: normalizing the probabilities of all candidate words;
wherein the historical behavior information comprises the clicked hot word and the click frequencies of nine candidate word match types, namely: no match, prefix match, suffix match, prefix synonym match, suffix synonym match, prefix-and-suffix match, prefix-and-suffix synonym match, prefix match with suffix synonym match, and prefix synonym match with suffix match.
4. The intelligent prompt method for search according to claim 1, characterized in that calculating the statistical frequency of the keywords in step S511 includes a step of computing the frequency with time weighting.
5. The intelligent prompt method for search according to claim 1, characterized in that in step S52:
CTR = A × R × C, where CTR is the estimated click value of the candidate word, A is the probability of the candidate word, R is the local relevance of the candidate word, and C is a constant determined by the type of the candidate word.
6. The intelligent prompt method for search according to claim 1, characterized in that in step S52:
CTR = A × R × C × P, where CTR is the estimated click value of the candidate word, A is the probability of the candidate word, R is the local relevance of the candidate word, C is a constant determined by the type of the candidate word, and P is the search frequency of the candidate word.
7. An intelligent prompt system for search, comprising a client and a server connected through a network, characterized in that:
the server comprises:
a word segmentation module, configured to split an initial string to obtain a prefix word and a suffix word, wherein the prefix word is the first and the suffix word is the last of the keywords obtained when the segmenter splits the string;
a synonym expansion module, configured to look up the prefix word and the suffix word in a thesaurus to obtain prefix synonyms and suffix synonyms;
a suffix tree traversal module, configured to traverse a hot word suffix tree to find hot words with a prefix match and/or a suffix match and obtain a candidate word information list, wherein a prefix match means that the prefix of a hot word matches the prefix word or a prefix synonym, and a suffix match means that the suffix of a hot word matches the suffix word or a suffix synonym;
a hot word bank construction module, configured to manage and maintain the database that stores hot word information;
a suffix tree construction module, configured to manage and maintain the hot word suffix tree, which the server builds from the high-frequency search hot words in the hot word bank using a generalized suffix tree data structure;
a historical behavior analysis module, configured to calculate the probability of each candidate word by analyzing a user historical search behavior database;
a user historical search behavior database module, configured to store historical behavior information;
the client comprises:
a local relevance calculation module, configured to calculate the local relevance of each candidate word in the candidate word information list according to a local historical search database;
an estimated click value calculation module, configured to calculate the estimated click value of each candidate word according to its local relevance and its candidate word information;
a candidate word selection module, configured to select a candidate word list from the candidate word information list according to the estimated click values of the candidate words;
a local historical search database storage module, configured to store local historical search information, which comprises the local historical search string, the local historical search time and the local historical search frequency;
the local relevance calculation module comprises:
a keyword distribution statistics module, configured to split, with the segmenter, the local historical search strings in the local historical search database and the candidate words in the candidate word information list into a keyword list, and to calculate the statistical frequency of each keyword;
a keyword space vector construction module, configured to build a keyword space vector from the statistical frequencies of the keywords in the keyword list;
a candidate word space vector construction module, configured to build a candidate word space vector from the statistical frequencies, in the keyword list, of the keywords obtained by splitting the candidate word;
a vector cosine calculation module, configured to calculate the cosine of the keyword space vector and the candidate word space vector to obtain the local relevance of the candidate word.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310653732.6A CN103631929B (en) | 2013-12-09 | 2013-12-09 | A kind of method of intelligent prompt, module and system for search |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310653732.6A CN103631929B (en) | 2013-12-09 | 2013-12-09 | A kind of method of intelligent prompt, module and system for search |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103631929A | 2014-03-12 |
CN103631929B | 2016-08-31 |
Family ID: 50212970
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310653732.6A (CN103631929B, active) | 2013-12-09 | 2013-12-09 | A kind of method of intelligent prompt, module and system for search |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103631929B |
Families Citing this family (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103914569B * | 2014-04-24 | 2018-09-07 | 百度在线网络技术(北京)有限公司 | Input creation method, the device of reminding method, device and dictionary tree-model |
CN105224554A * | 2014-06-11 | 2016-01-06 | 阿里巴巴集团控股有限公司 | Search word is recommended to carry out method, system, server and the intelligent terminal searched for |
CN104750873A * | 2015-04-22 | 2015-07-01 | 百度在线网络技术(北京)有限公司 | Popular search term push method and device |
CN105488121A * | 2015-11-24 | 2016-04-13 | 魏强 | Accurate retrieval system |
CN106126500B * | 2016-06-22 | 2019-02-22 | 广东亿迅科技有限公司 | A kind of statistical method being associated with hot word |
CN107665217A * | 2016-07-29 | 2018-02-06 | 苏宁云商集团股份有限公司 | A kind of vocabulary processing method and system for searching service |
CN108319603A * | 2017-01-17 | 2018-07-24 | 腾讯科技(深圳)有限公司 | Object recommendation method and apparatus |
CN108227954A * | 2017-12-29 | 2018-06-29 | 北京奇虎科技有限公司 | A kind of method, apparatus and electronic equipment that search input associational word is provided |
CN108319376B * | 2017-12-29 | 2021-11-26 | 北京奇虎科技有限公司 | Input association recommendation method and device for optimizing commercial word promotion |
CN108241740A * | 2017-12-29 | 2018-07-03 | 北京奇虎科技有限公司 | The generation method and device of a kind of search input associational word of timeliness |
CN110286775A * | 2018-03-19 | 2019-09-27 | 北京搜狗科技发展有限公司 | A kind of dictionary management method and device |
CN108536763B * | 2018-03-21 | 2021-02-05 | 创新先进技术有限公司 | Pull-down prompting method and device |
CN108846016B * | 2018-05-05 | 2021-08-20 | 复旦大学 | Chinese word segmentation oriented search algorithm |
CN109739367A * | 2018-12-28 | 2019-05-10 | 北京金山安全软件有限公司 | Candidate word list generation method and device |
CN109933217B * | 2019-03-12 | 2020-05-01 | 北京字节跳动网络技术有限公司 | Method and device for pushing sentences |
CN113032819A * | 2019-12-09 | 2021-06-25 | 阿里巴巴集团控股有限公司 | Method and system for determining search prompt words and information processing method |
CN111488426B * | 2020-04-17 | 2024-02-02 | 支付宝(杭州)信息技术有限公司 | Query intention determining method, device and processing equipment |
CN111782947B * | 2020-06-29 | 2022-04-22 | 北京达佳互联信息技术有限公司 | Search content display method and device, electronic equipment and storage medium |
CN112925900B * | 2021-02-26 | 2023-10-03 | 北京百度网讯科技有限公司 | Search information processing method, device, equipment and storage medium |
CN114817690A * | 2022-06-28 | 2022-07-29 | 江西医之健科技有限公司 | Data searching method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102930022A * | 2012-10-31 | 2013-02-13 | 中国运载火箭技术研究院 | User-oriented information search engine system and method |
CN103258023A * | 2013-05-07 | 2013-08-21 | 百度在线网络技术(北京)有限公司 | Recommendation method and search engine for search candidate words |
CN103425687A * | 2012-05-21 | 2013-12-04 | 阿里巴巴集团控股有限公司 | Retrieval method and system based on queries |
Also Published As
Publication number | Publication date |
---|---|
CN103631929A | 2014-03-12 |
Similar Documents
Publication | Title |
---|---|
CN103631929B | A kind of method of intelligent prompt, module and system for search |
CN105488024B | The abstracting method and device of Web page subject sentence |
Liu et al. | Context-based collaborative filtering for citation recommendation |
CN104615767B | Training method, search processing method and the device of searching order model |
CN104484339B | A kind of related entities recommend method and system |
CN107239512B | A kind of microblogging comment spam recognition methods of combination comment relational network figure |
CN106547864B | A kind of Personalized search based on query expansion |
CN106484764A | User's similarity calculating method based on crowd portrayal technology |
CN104462327B | Calculating, search processing method and the device of statement similarity |
Amami et al. | A graph based approach to scientific paper recommendation |
Kim et al. | A framework for tag-aware recommender systems |
CN102456057B | Search method based on online trade platform, device and server |
CN109597995A | A kind of document representation method based on BM25 weighted combination term vector |
CN104281565A | Semantic dictionary constructing method and device |
CN107832319B | Heuristic query expansion method based on semantic association network |
An et al. | A heuristic approach on metadata recommendation for search engine optimization |
CN103914490B | Webpage operation method and system |
CN101840438B | Retrieval system oriented to meta keywords of source document |
Elfida et al. | Enhancing to method for extracting Social network by the relation existence |
CN106599304B | Modular user retrieval intention modeling method for small and medium-sized websites |
Zhang et al. | Co-ranking multiple entities in a heterogeneous network: Integrating temporal factor and users’ bookmarks |
CN104794200B | A kind of event distribution subscription method of the support fuzzy matching based on body |
CN108932247A | A kind of method and device optimizing text search |
TWI621952B | Comparison table automatic generation method, device and computer program product of the same |
Albathan et al. | Enhanced n-gram extraction using relevance feature discovery |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 211100, No. 100, General Road, Jiangning Economic Development Zone, Nanjing, Jiangsu. Applicant after: JIANGSU WISEDU EDUCATION INFORMATION TECHNOLOGY CO., LTD. Address before: 211100, No. 100, General Road, Jiangning Economic Development Zone, Nanjing, Jiangsu. Applicant before: Jiangsu Wisedu Information Technology Co., Ltd. |
COR | Change of bibliographic data | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |