CN109189990A - Search term generation method, apparatus, and electronic device - Google Patents

Search term generation method, apparatus, and electronic device

Info

Publication number
CN109189990A
CN109189990A (application CN201810826071.5A)
Authority
CN
China
Prior art keywords
search
search term
participle
training
specified
Prior art date
Legal status
Granted
Application number
CN201810826071.5A
Other languages
Chinese (zh)
Other versions
CN109189990B (en)
Inventor
叶澄灿
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201810826071.5A
Publication of CN109189990A
Application granted
Publication of CN109189990B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention provide a search term generation method, apparatus, and electronic device, relating to search technology within the field of computer technology. The method includes: for a specified search term, generating recommendation candidate word sets respectively using a plurality of preset models; merging the generated recommendation candidate word sets and deduplicating the merged set to obtain a recommended search term candidate set; and selecting recommended search terms from the recommendation candidate words in the recommended search term candidate set. Generating search terms with the scheme provided by the embodiments of the present invention solves the technical problem of the prior art that the generated recommended search terms are not comprehensive enough and belong to a single category.

Description

Search term generation method, apparatus, and electronic device
Technical field
The present invention relates to search technology within the field of computer technology, and in particular to a search term generation method, apparatus, and electronic device.
Background technique
With the improvement of online video quality, the growth in the quantity of online videos, and users' increasing reliance on video search engines, video search has become an important way for users to obtain information and entertainment. After a user completes a search, providing good recommended search terms can stimulate the user's search interest, remedy unsatisfactory search results obtained for the current search term, and thereby improve the user's search experience.
One known search term generation technique is a recommended search term generation method based on a collaborative filtering model, which works as follows:
Using the search log, the click relationships between users and search terms are extracted to construct a data set. For any two search terms q_i and q_j, their collaborative filtering correlation is computed as
w_{ij} = |N(i) ∩ N(j)| / sqrt(|N(i)| · |N(j)|),
where N(i) is the set of users who searched for q_i within a given period, N(j) is the set of users who searched for q_j within the same period, and N(i) ∩ N(j) is the set of users who searched for both q_i and q_j within that period. For the current search term, its collaborative filtering correlation with each search term to be selected is computed, and the search terms with the highest collaborative filtering correlation are taken to form the recommended search term candidate set for the current search term.
The video search components of current search engines mainly rely on collaborative filtering model technology for generating recommended search terms. This technique generates a recommended search term candidate set with a collaborative filtering model, then scores the features of each dimension of all candidate words in the candidate set, computes a weighted sum, and preferentially selects the candidate words with high total scores as recommended search terms.
In the course of implementing the present invention, the inventor found that the prior art has at least the following problems:
Compared with the recommended search terms a user may need to know, the recommended search terms obtained with a collaborative filtering model are not comprehensive enough and belong to a single category, and therefore cannot effectively satisfy the user's search needs.
Summary of the invention
The purpose of embodiments of the present invention is to provide a search term generation method, apparatus, and electronic device, so as to solve the technical problem that the generated recommended search terms are not comprehensive enough and belong to a single category. The specific technical solutions are as follows:
An embodiment of the present invention provides a search term generation method, comprising:
for a specified search term, generating recommendation candidate word sets respectively using a plurality of preset models, wherein the plurality of preset models are respectively obtained by training on data of different dimensions in a search log;
merging the generated recommendation candidate word sets, and deduplicating the merged set, to obtain a recommended search term candidate set;
selecting recommended search terms from the recommendation candidate words in the recommended search term candidate set.
Further, the plurality of models include at least two of the following models:
a click correlation model;
an LDA (Latent Dirichlet Allocation) topic model;
a collaborative filtering model.
Further, the plurality of preset models include a click correlation model, and the process of generating a recommendation candidate word set with the click correlation model includes:
for the specified search term, querying a first training result obtained with the click correlation model, to obtain a click correlation expression vector of the specified search term, the click correlation expression vector being a participle vector used to indicate the weight of each participle of the specified search term, wherein the first training result is obtained by training the click correlation model on first sample data extracted from the search log, the first sample data include a plurality of search terms extracted from the search log, as training search terms, and the click relationships between the training search terms and search results in the search log, and the first training result includes the participle vector of each training search term;
separately computing the inner product of the click correlation expression vector of the specified search term with the click correlation expression vector of each search term to be selected, to obtain the click correlation between the specified search term and each search term to be selected;
among the search terms to be selected, preferentially selecting the search terms to be selected with high click correlation, to form the recommendation candidate word set of the specified search term generated with the click correlation model.
Further, the click relationship between a training search term and a search result in the search log is the number of clicks between the training search term and the search result in the search log;
the click correlation model is trained on the first sample data through the following steps to obtain the first training result:
segmenting each training search term in the first sample data into participles, and generating an initial participle vector for the obtained participles, the initial participle vector being used to indicate the initial weight of each participle of the training search term, with the initial weights of the participles of the training search term being equal;
repeating the following step A and step B until a preset iteration stop condition is satisfied:
step A: based on the current iteration expression vectors of the plurality of training search terms, the number of the plurality of training search terms, and the numbers of clicks, separately computing the current iteration expression vector of each of the plurality of search results in the first sample data, wherein at the first iteration the current iteration expression vector of a training search term is its initial participle vector;
step B: based on the current iteration expression vectors of the plurality of search results, the number of the plurality of search results, and the numbers of clicks, separately computing a new iteration expression vector for each of the plurality of training search terms;
when the preset iteration stop condition is satisfied, taking the latest iteration expression vector of each training search term as the participle vector of that training search term, the participle vectors of the training search terms forming the first training result.
Further, the computing, based on the current iteration expression vectors of the plurality of training search terms, the number of the plurality of training search terms, and the numbers of clicks, the current iteration expression vector of each of the plurality of search results in the first sample data comprises:
computing the current iteration expression vector of a search result using the following formula,
where D_j^(n) is the current iteration expression vector of the j-th search result at the n-th iteration, Q_i^(n-1) is the current iteration expression vector of the i-th training search term at the (n-1)-th iteration, C_{i,j} is the number of clicks between the i-th training search term and the j-th search result, and |Query| is the number of the plurality of training search terms;
the computing, based on the current iteration expression vectors of the plurality of search results, the number of the plurality of search results, and the numbers of clicks, the new iteration expression vector of each of the plurality of training search terms comprises:
computing the new iteration expression vector of a training search term using the following formula,
where Q_i^(n) is the new iteration expression vector of the i-th training search term at the n-th iteration, and |Doc| is the number of the plurality of search results.
Further, the plurality of preset models include an LDA topic model, and the process of generating a recommendation candidate word set with the LDA topic model includes:
segmenting the specified search term to obtain the participles of the specified search term;
obtaining the weight of each participle of the specified search term within the specified search term;
for each obtained participle of the specified search term, querying a second training result obtained with the LDA topic model, to obtain the probability distribution of that participle of the specified search term over a plurality of LDA topics, wherein the second training result is obtained by training the LDA topic model on second sample data extracted from the search log, the second sample data include participles extracted from the titles of the search results in the search log, as training participles, and the second training result includes the probability distribution of each training participle over the plurality of LDA topics;
for each LDA topic, computing, using the weight of each participle of the specified search term within the specified search term, the weighted sum of the probability distributions of the participles of the specified search term on that LDA topic, as the weight of the specified search term on that LDA topic;
forming the LDA topic vector of the specified search term from the weights of the specified search term on the plurality of LDA topics, as the LDA expression vector of the specified search term;
separately computing the inner product of the LDA expression vector of the specified search term with the LDA expression vector of each search term to be selected, to obtain the LDA correlation between the specified search term and each search term to be selected;
among the search terms to be selected, preferentially selecting the search terms to be selected with high LDA correlation, to form the recommendation candidate word set of the specified search term generated with the LDA topic model.
Further, the selecting recommended search terms from the recommendation candidate words in the recommended search term candidate set comprises:
obtaining correlation features between the recommendation candidate words in the recommended search term candidate set and the specified search term, as first correlation features;
applying a recommended search term screening model to the first correlation features to score each recommendation candidate word in the recommended search term candidate set, obtaining screening scores, wherein the recommended search term screening model is obtained by training on third sample data with a linear regression or gradient boosting decision tree algorithm, the third sample data include the click relationships between search terms in the search log and the recommended search terms of those search terms, and second correlation features between search terms in the search log and the recommended search terms of those search terms, and the second correlation features are of the same types as the first correlation features;
preferentially selecting the recommendation candidate words with high screening scores as recommended search terms.
Further, the first correlation features include at least one of the following correlations:
Click correlation;
LDA correlation;
Collaborative filtering correlation.
An embodiment of the present invention also provides a search term generation apparatus, comprising:
a set generation module, configured to generate, for a specified search term, recommendation candidate word sets respectively using a plurality of preset models, wherein the plurality of preset models are respectively obtained by training on data of different dimensions in a search log;
a set merging module, configured to merge the generated recommendation candidate word sets and deduplicate the merged set, to obtain a recommended search term candidate set;
a word selection module, configured to select recommended search terms from the recommendation candidate words in the recommended search term candidate set.
Further, the plurality of preset models include at least two of the following models:
a click correlation model;
an LDA topic model;
a collaborative filtering model.
Further, the plurality of preset models include a click correlation model;
the set generation module comprises:
a first query submodule, configured to query, for the specified search term, a first training result obtained with the click correlation model, to obtain a click correlation expression vector of the specified search term, the click correlation expression vector being a participle vector used to indicate the weight of each participle of the specified search term, wherein the first training result is obtained by training the click correlation model on first sample data extracted from the search log, the first sample data include a plurality of search terms extracted from the search log, as training search terms, and the click relationships between the training search terms and search results in the search log, and the first training result includes the participle vector of each training search term;
a first inner product computation submodule, configured to separately compute the inner product of the click correlation expression vector of the specified search term with the click correlation expression vector of each search term to be selected, to obtain the click correlation between the specified search term and each search term to be selected;
a first preferential selection submodule, configured to preferentially select, among the search terms to be selected, the search terms to be selected with high click correlation, to form the recommendation candidate word set of the specified search term generated with the click correlation model.
Further, the click relationship between a training search term and a search result in the search log is the number of clicks between the training search term and the search result in the search log;
the set generation module further comprises the following submodules, configured to train the click correlation model on the first sample data to obtain the first training result:
a first participle submodule, configured to segment each training search term in the first sample data into participles, and generate an initial participle vector for the obtained participles, the initial participle vector being used to indicate the initial weight of each participle of the training search term, with the initial weights of the participles of the training search term being equal;
an iteration submodule, configured to repeat the following step A and step B until a preset iteration stop condition is satisfied:
step A: based on the current iteration expression vectors of the plurality of training search terms, the number of the plurality of training search terms, and the numbers of clicks, separately computing the current iteration expression vector of each of the plurality of search results in the first sample data, wherein at the first iteration the current iteration expression vector of a training search term is its initial participle vector;
step B: based on the current iteration expression vectors of the plurality of search results, the number of the plurality of search results, and the numbers of clicks, separately computing a new iteration expression vector for each of the plurality of training search terms;
when the preset iteration stop condition is satisfied, taking the latest iteration expression vector of each training search term as the participle vector of that training search term, the participle vectors of the training search terms forming the first training result.
Further, the iteration submodule comprises:
a search result iteration unit, configured to compute the current iteration expression vector of a search result using the following formula,
where D_j^(n) is the current iteration expression vector of the j-th search result at the n-th iteration, Q_i^(n-1) is the current iteration expression vector of the i-th training search term at the (n-1)-th iteration, C_{i,j} is the number of clicks between the i-th training search term and the j-th search result, and |Query| is the number of the plurality of training search terms;
a training search term iteration unit, configured to compute the new iteration expression vector of a training search term using the following formula,
where Q_i^(n) is the new iteration expression vector of the i-th training search term at the n-th iteration, and |Doc| is the number of the plurality of search results.
Further, the plurality of preset models include an LDA topic model;
the set generation module comprises:
a second participle submodule, configured to segment the specified search term to obtain the participles of the specified search term;
a weight acquisition submodule, configured to obtain the weight of each participle of the specified search term within the specified search term;
a second query submodule, configured to query, for each obtained participle of the specified search term, a second training result obtained with the LDA topic model, to obtain the probability distribution of that participle of the specified search term over a plurality of LDA topics, wherein the second training result is obtained by training the LDA topic model on second sample data extracted from the search log, the second sample data include participles extracted from the titles of the search results in the search log, as training participles, and the second training result includes the probability distribution of each training participle over the plurality of LDA topics;
a weighted sum computation submodule, configured to compute, for each LDA topic, using the weight of each participle of the specified search term within the specified search term, the weighted sum of the probability distributions of the participles of the specified search term on that LDA topic, as the weight of the specified search term on that LDA topic;
a vector generation submodule, configured to form the LDA topic vector of the specified search term from the weights of the specified search term on the plurality of LDA topics, as the LDA expression vector of the specified search term;
a second inner product computation submodule, configured to separately compute the inner product of the LDA expression vector of the specified search term with the LDA expression vector of each search term to be selected, to obtain the LDA correlation between the specified search term and each search term to be selected;
a second preferential selection submodule, configured to preferentially select, among the search terms to be selected, the search terms to be selected with high LDA correlation, to form the recommendation candidate word set of the specified search term generated with the LDA topic model.
Further, the word selection module comprises:
a feature acquisition submodule, configured to obtain correlation features between the recommendation candidate words in the recommended search term candidate set and the specified search term, as first correlation features;
a scoring submodule, configured to apply a recommended search term screening model to the first correlation features to score each recommendation candidate word in the recommended search term candidate set, obtaining screening scores, wherein the recommended search term screening model is obtained by training on third sample data with a linear regression or gradient boosting decision tree algorithm, the third sample data include the click relationships between search terms in the search log and the recommended search terms of those search terms, and second correlation features between search terms in the search log and the recommended search terms of those search terms, and the second correlation features are of the same types as the first correlation features;
a third preferential selection submodule, configured to preferentially select the recommendation candidate words with high screening scores as recommended search terms.
Further, the first correlation features obtained by the feature acquisition submodule include at least one of the following correlations:
Click correlation;
LDA correlation;
Collaborative filtering correlation.
An embodiment of the present invention also provides an electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to, when executing the program stored in the memory, implement the steps of any of the above search term generation methods.
In another aspect of the implementation of the present invention, an embodiment of the present invention further provides a computer-readable storage medium having instructions stored therein, which, when run on a computer, cause the computer to execute the steps of any of the above search term generation methods.
In yet another aspect of the implementation of the present invention, an embodiment of the present invention further provides a computer program product containing instructions, which, when run on a computer, causes the computer to execute any of the above search term generation methods.
With the search term generation method and apparatus provided by embodiments of the present invention, a recommended search term candidate set is obtained by using a plurality of models trained on data of different dimensions in the search log, which broadens the ways in which recommended search terms are generated and solves the technical problem that the recommended search terms generated by the prior art are not comprehensive enough and belong to a single category. Of course, implementing any product or method of the present invention does not necessarily require achieving all of the advantages described above at the same time.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below.
Fig. 1 is a flowchart of a search term generation method provided by an embodiment of the present invention;
Fig. 2 is another flowchart of a search term generation method provided by an embodiment of the present invention;
Fig. 3 is a flowchart of a method, provided by an embodiment of the present invention, for generating a recommendation candidate word set using a click correlation model;
Fig. 4 is a flowchart of a method, provided by an embodiment of the present invention, for training a click correlation model;
Fig. 5 is a flowchart of a method for generating recommendation candidate words using an LDA topic model;
Fig. 6 is a schematic structural diagram of a search term generation apparatus provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below with reference to the drawings in the embodiments of the present invention.
Embodiments of the present invention provide a search term generation method and apparatus. The concepts involved in the embodiments of the present invention are first explained below.
A recommended search term is a search term that a search engine recommends to a user after the user enters a complete or partial search term; it is intended to provide search terms that better match the user's search needs, or to stimulate the user's search interest.
A click correlation model generates the click correlations between search terms based on click data. For a search term with click data, this model can provide other search terms that have high click correlation with it.
An LDA topic model, i.e., a Latent Dirichlet Allocation topic model, can give the topics of each document in a document set in the form of probability distributions. Training an LDA topic model does not require a labeled training set; only a document set and a specified number of topics are needed.
A collaborative filtering model recommends search terms the user may be interested in by exploiting the preferences of a group of like-minded users for the same content.
The search term generation method provided by embodiments of the present invention is described in detail below through specific embodiments.
Referring to Fig. 1, Fig. 1 is a flowchart of a search term generation method provided by an embodiment of the present invention, comprising the following steps:
Step 101: for a specified search term, generate recommendation candidate word sets respectively using a plurality of preset models.
The plurality of preset models are respectively obtained by training on data of different dimensions in the search log. The data of different dimensions in the search log may include: the click relationships between the search results and the search terms in the search log, the title content of the search results in the search log, and the click relationships between the users and the search terms in the search log. The data of different dimensions in the search log reflect the search history from different aspects. The click relationships between search results and search terms reflect the degree of correlation between search terms and search results, while the click relationships between users and search terms reflect the users' search preferences.
In this embodiment of the present invention, the specified search term may be entered by a user or imported from another program. The plurality of models used to generate recommendation candidate word sets may include models that generate sets of search terms highly correlated with the specified search term, and may also include models that determine which field the searcher is interested in and take the set of hot search terms in that field as the recommendation candidate word set.
Step 102: merge the generated recommendation candidate word sets, and deduplicate the merged set, to obtain a recommended search term candidate set.
Among the multiple recommended search terms generated for the same specified search term by different models, some may be identical; for identical recommended search terms, only one is kept and the duplicates are removed. The recommended search term candidate set consists of multiple recommendation candidate words.
Step 103: select recommended search terms from the recommendation candidate words in the recommended search term candidate set.
In this embodiment of the present invention, the search popularity of each recommendation candidate word in the recommended search term candidate set may be compared, and the recommendation candidate words with high search popularity are preferentially selected as recommended search terms. Alternatively, the features of each dimension of all candidate words in the recommended search term candidate set may be scored and summed with weights, and the candidate words with high total scores are preferentially selected as recommended search terms.
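As an illustration of steps 101 to 103, the following is a minimal Python sketch of the overall flow. It assumes each preset model is wrapped as a function returning a list of candidate words and that a search-popularity lookup built from the search log is available for the selection step; the function and variable names are illustrative and not part of the patented implementation.

```python
from typing import Callable, Dict, List

def generate_recommended_terms(
    specified_term: str,
    models: List[Callable[[str], List[str]]],  # e.g. click correlation, LDA, collaborative filtering
    search_popularity: Dict[str, float],       # assumed popularity lookup derived from the search log
    top_k: int = 10,
) -> List[str]:
    # Step 101: each preset model produces its own recommendation candidate word set.
    candidate_sets = [set(model(specified_term)) for model in models]

    # Step 102: merge the sets and deduplicate (the set union keeps one copy of each word);
    # the specified search term itself is never recommended, so it is dropped.
    merged = set().union(*candidate_sets) - {specified_term}

    # Step 103: preferentially select the candidates with high search popularity.
    ranked = sorted(merged, key=lambda w: search_popularity.get(w, 0.0), reverse=True)
    return ranked[:top_k]
```

The weighted multi-feature scoring described for Fig. 2 can replace the popularity ranking here without changing the merge-and-deduplicate structure.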
With the search term generation method provided by this embodiment of the present invention, a recommended search term candidate set is obtained by using a plurality of models trained on data of different dimensions in the search log, which broadens the ways in which recommended search terms are generated and solves the technical problem that the recommended search terms generated by the prior art are not comprehensive enough and belong to a single category.
In the method shown in Fig. 1, the recommendation candidate words, and the recommended search terms selected from among them, are all different from the specified search term.
The above search term generation method provided by embodiments of the present invention is described in detail below with reference to the drawings.
Fig. 2 shows another flowchart of a search term generation method provided by an embodiment of the present invention, which may specifically include the following steps:
Step 201: obtain a specified search term.
Step 202: for the specified search term, generate recommendation candidate word sets using a click correlation model, an LDA topic model, and a collaborative filtering model.
In this embodiment of the present invention, recommendation candidate words may be generated with the click correlation model only for specified search terms that are contained in the first training result, where the first training result is obtained by training the click correlation model on first sample data extracted from the search log.
The LDA topic model can give the weights of the specified search term on a plurality of LDA topics obtained by training in advance; based on the weights of the specified search term on each LDA topic, other search terms whose topics are similar to those of the specified search term can be given.
The scheme for generating a recommendation candidate word set with the collaborative filtering model is as follows: using the search log, the click relationships between users and search terms are extracted to construct a data set. For any two search terms q_i and q_j, their collaborative filtering correlation is computed as
w_{ij} = |N(i) ∩ N(j)| / sqrt(|N(i)| · |N(j)|),
where N(i) is the set of users who searched for q_i within a given period, N(j) is the set of users who searched for q_j within the same period, and N(i) ∩ N(j) is the set of users who searched for both q_i and q_j within that period. The period may be one day or one week. For the current search term, its collaborative filtering correlation with each search term to be selected is computed, and the search terms to be selected with high collaborative filtering correlation are preferentially selected to form the recommendation candidate word set.
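The collaborative filtering correlation described above can be sketched in Python as follows, assuming the click relationships have already been grouped into a per-term set of user IDs; the names are illustrative only.

```python
import math
from typing import Dict, List, Set

def collaborative_filtering_correlation(users_i: Set[str], users_j: Set[str]) -> float:
    """w_ij = |N(i) ∩ N(j)| / sqrt(|N(i)| * |N(j)|), as described in the text."""
    if not users_i or not users_j:
        return 0.0
    return len(users_i & users_j) / math.sqrt(len(users_i) * len(users_j))

def cf_candidates(current_term: str, term_users: Dict[str, Set[str]], top_k: int = 10) -> List[str]:
    """Rank the other search terms by collaborative filtering correlation with current_term."""
    current_users = term_users.get(current_term, set())
    scored = [
        (term, collaborative_filtering_correlation(current_users, users))
        for term, users in term_users.items()
        if term != current_term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [term for term, score in scored[:top_k] if score > 0]
```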
Step 203: merge the generated recommendation candidate word sets, and deduplicate the merged set, to obtain a recommended search term candidate set.
Step 204: obtain correlation features between the recommendation candidate words in the recommended search term candidate set and the specified search term, as first correlation features.
The first correlation features may be click correlation, LDA correlation, or collaborative filtering correlation.
In this embodiment of the present invention, the click correlation, LDA correlation, or collaborative filtering correlation already computed for a recommendation candidate word during its generation may be extracted directly as the correlation feature of that recommendation candidate word obtained in this step.
Step 205: apply a recommended search term screening model to the first correlation features to score each recommendation candidate word in the recommended search term candidate set, obtaining screening scores.
The recommended search term screening model is obtained by training on third sample data with a linear regression or gradient boosting decision tree algorithm. The third sample data include the click relationships between search terms in the search log and the recommended search terms of those search terms, and second correlation features between search terms in the search log and the recommended search terms of those search terms; the second correlation features are of the same types as the first correlation features obtained in step 204.
In this embodiment of the present invention, the click relationship between a search term in the search log and a recommended search term of that search term may be the number of clicks, or may be the click-through rate.
Step 206: preferentially select the recommendation candidate words with high screening scores as recommended search terms.
Preferentially selecting the recommendation candidate words with high screening scores may include selecting a preset first number of recommendation candidate words in descending order of screening score, or may include selecting all recommendation candidate words whose screening scores exceed a preset screening score threshold.
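A minimal sketch of steps 205 and 206 is given below. It assumes scikit-learn is available and uses its gradient boosting regressor as one of the two algorithm families named above; the feature layout (one row per candidate with columns for click, LDA, and collaborative filtering correlation) and the use of click counts as the training target are assumptions made for illustration.

```python
from typing import List

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def train_screening_model(features: np.ndarray, click_labels: np.ndarray) -> GradientBoostingRegressor:
    """Third sample data: each row is a (search term, recommended search term) pair described by
    its second correlation features; the label is the click relationship between the pair."""
    model = GradientBoostingRegressor()
    model.fit(features, click_labels)
    return model

def select_recommended_terms(
    model: GradientBoostingRegressor,
    candidates: List[str],
    candidate_features: np.ndarray,
    threshold: float,
) -> List[str]:
    """Step 205: score each recommendation candidate word; step 206: keep the high-scoring ones."""
    scores = model.predict(candidate_features)
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [word for word, score in ranked if score >= threshold]
```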
By using a recommended search term screening model obtained by training to score each recommendation candidate word in the recommended search term candidate set, this embodiment of the present invention can, compared with the prior art of scoring recommendation candidate words with manually set parameters, pick out more recommended search terms that satisfy the user's search needs and better stimulate the user's search interest.
Fig. 3 shows a flowchart of a method, provided by an embodiment of the present invention, for generating a recommendation candidate word set using a click correlation model, which may specifically include the following steps:
Step 301: obtain a specified search term.
Step 302: for the specified search term, query the first training result to obtain the click correlation expression vector of the specified search term.
The content of the first training result is the participle vectors of multiple search terms. The first training result is obtained by training the click correlation model on first sample data extracted from the search log. The first sample data include multiple search terms extracted from the search log, as training search terms, and the click relationships between the training search terms and the search results in the search log. The first training result includes the click correlation expression vector of each training search term obtained by training.
In this embodiment of the present invention, the click relationship between a training search term and a search result in the search log may be the number of clicks.
Step 303: compute the click correlation between the specified search term and each search term to be selected.
The search terms to be selected may be all the search terms in the first training result, or may be multiple search terms in the first training result that belong to a field related to the specified search term.
The click correlation between the specified search term and each search term to be selected is the inner product of the click correlation expression vector of the specified search term and the click correlation expression vector of that search term to be selected.
The inner product of two participle vectors is the sum of the products of the weights of identical participles in the two participle vectors, divided by the product of the moduli of the two participle vectors. A click correlation expression vector is a kind of participle vector, so the inner product of two click correlation expression vectors is computed in the same way as the inner product of two participle vectors:
s = (A_1·B_1 + A_2·B_2 + ... + A_n·B_n) / (|A| · |B|),
where A and B are the two click correlation expression vectors, s is the inner product of A and B, i is the index of a participle, n is the total number of distinct participles, A_i is the weight of A on the participle with index i, and B_i is the weight of B on the participle with index i.
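A short sketch of this normalized inner product, with each participle vector represented as a dictionary mapping a participle to its weight (an assumed data layout):

```python
import math
from typing import Dict

def participle_inner_product(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Sum of the products of weights on identical participles, divided by the product of the moduli."""
    dot = sum(weight * b[word] for word, weight in a.items() if word in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```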
Step 304: preferentially select the search terms to be selected with high click correlation, to form the recommendation candidate word set.
Preferentially selecting the search terms to be selected with high click correlation may include selecting a preset second number of search terms to be selected in descending order of click correlation, or may include selecting all search terms to be selected whose click correlation exceeds a preset click correlation threshold.
This embodiment of the present invention generates the recommendation candidate word set of the specified search term by querying the training result of the click correlation model. As one way of generating a recommendation candidate word set, this embodiment can cooperate with other generation ways to jointly solve the technical problem that the recommended search terms generated by the prior art are not comprehensive enough and belong to a single category.
Fig. 4 shows a flowchart of a method, provided by an embodiment of the present invention, for training a click correlation model, which may specifically include the following steps:
Step 401: from the search log, extract multiple search terms as training search terms, extract multiple search results, and extract the numbers of clicks between the training search terms and the search results.
Step 402: segment each training search term into participles, and generate initial participle vectors for the obtained participles.
The initial participle vector of a training search term is formed from the weights of the training search term on each of its participles, and all the weights in the initial participle vector are equal.
In this embodiment of the present invention, for a training search term with m participles, the initial participle vector may be set as a unit vector with m elements of equal coordinates.
Step 403: based on the current iteration expression vectors of the multiple training search terms, separately compute the current iteration expression vector of each of the multiple search results.
Based on the current iteration expression vectors of the multiple training search terms, the number of the multiple training search terms, and the numbers of clicks, the current iteration expression vector of each of the multiple search results in the first sample data is computed separately.
The current iteration expression vector of a search result is computed using the following formula,
where D_j^(n) is the current iteration expression vector of the j-th search result at the n-th iteration, Q_i^(n-1) is the current iteration expression vector of the i-th training search term at the (n-1)-th iteration, C_{i,j} is the number of clicks between the i-th training search term and the j-th search result, and |Query| is the number of the multiple training search terms.
Step 404: based on the current iteration expression vectors of the multiple search results, separately compute the new iteration expression vector of each of the multiple training search terms.
Based on the current iteration expression vectors of the multiple search results, the number of the multiple search results, and the numbers of clicks, the new iteration expression vector of each of the multiple training search terms is computed separately.
The new iteration expression vector of a training search term is computed using the following formula,
where Q_i^(n) is the new iteration expression vector of the i-th training search term at the n-th iteration, and |Doc| is the number of the multiple search results.
Step 405: determine whether the iteration stop condition is satisfied; if it is satisfied, go to step 406; if not, go to step 403.
In this embodiment of the present invention, determining whether the iteration stop condition is satisfied may be determining whether the number of iterations reaches a preset first threshold, or determining whether, for all training search terms, the modulus of the difference between the current iteration expression vector and the iteration expression vector of the previous iteration is less than a preset second threshold.
Step 406: obtain the first training result.
The first training result includes the click correlation expression vector of each training search term obtained by training.
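The overall shape of the training loop in Fig. 4 can be sketched in Python as below. The patent's exact update formulas for steps 403 and 404 are given as figures and are not reproduced in this text, so the sketch assumes a simple click-weighted averaging update; only the alternation of steps 403 and 404 until the stop condition of step 405 follows the description, and all names are illustrative.

```python
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # participle -> weight

def vector_distance(a: Vector, b: Vector) -> float:
    """Modulus of the difference between two participle vectors."""
    words = set(a) | set(b)
    return sum((a.get(w, 0.0) - b.get(w, 0.0)) ** 2 for w in words) ** 0.5

def train_click_correlation(
    train_terms: List[str],
    results: List[str],
    clicks: Dict[Tuple[int, int], int],  # (term index, result index) -> number of clicks (step 401)
    init_vectors: List[Vector],          # equal-weight initial participle vectors (step 402)
    max_iters: int = 20,
    tol: float = 1e-4,
) -> List[Vector]:
    def weighted_mix(pairs: List[Tuple[Vector, float]]) -> Vector:
        # Assumed update: a click-weighted average of the incoming vectors.
        total = sum(w for _, w in pairs) or 1.0
        mixed: Vector = {}
        for vec, w in pairs:
            for word, weight in vec.items():
                mixed[word] = mixed.get(word, 0.0) + weight * w / total
        return mixed

    q_vecs = [dict(v) for v in init_vectors]
    for _ in range(max_iters):  # step 405: stop after a preset number of iterations ...
        # Step 403: result vectors from the current training-term vectors and the click counts.
        d_vecs = [
            weighted_mix([(q_vecs[i], clicks.get((i, j), 0)) for i in range(len(train_terms))])
            for j in range(len(results))
        ]
        # Step 404: new training-term vectors from the result vectors and the click counts.
        new_q = [
            weighted_mix([(d_vecs[j], clicks.get((i, j), 0)) for j in range(len(results))])
            for i in range(len(train_terms))
        ]
        # ... or when every training-term vector has moved by less than a preset threshold.
        diff = max(vector_distance(new_q[i], q_vecs[i]) for i in range(len(train_terms))) if train_terms else 0.0
        q_vecs = new_q
        if diff < tol:
            break
    return q_vecs  # step 406: the first training result
```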
By training on click data with an iterative method, this embodiment of the present invention obtains the click correlation expression vectors of search terms as the training result; the inner product between click correlation expression vectors can adequately reflect the click correlation between the two search terms they express.
Fig. 5 shows a flowchart of a method, provided by an embodiment of the present invention, for generating recommendation candidate words using an LDA topic model, which may specifically include the following steps:
Step 501: obtain a specified search term.
Step 502: obtain the participles of the specified search term and the weight of each participle within the specified search term.
In this embodiment of the present invention, the participles of the specified search term and the weight of each participle within the specified search term may be obtained with a word segmentation method provided by the prior art: the specified search term is input into a word segmenter, and the output of the segmenter gives the participles of the specified search term and the weight of each participle within the specified search term.
Step 503: for each participle of the specified search term, query the second training result to obtain the probability distribution of that participle over the multiple LDA topics.
The content of the second training result is the probability distributions of multiple participles over the multiple LDA topics. The second training result is obtained by training the LDA topic model on second sample data extracted from the search log. The second sample data include participles extracted from the titles of the search results in the search log, as training participles. The second training result includes the probability distribution of each training participle over the multiple LDA topics.
Step 504: compute the weights of the specified search term on the multiple LDA topics.
For each LDA topic, using the weight of each participle of the specified search term within the specified search term, the weighted sum of the probability distributions of the participles of the specified search term on that LDA topic is computed as the weight of the specified search term on that LDA topic. The formula for computing the weight of the specified search term on one LDA topic is
P_j(z|q) = Σ_t P(z|t) · P_j(t|q),
where j is the index of an LDA topic, t ranges over the participles of the specified search term q, P_j(z|q) is the weight of the specified search term on the LDA topic with index j, P(z|t) is the weight of the participle t within the specified search term, and P_j(t|q) is the weight of the participle t on the LDA topic with index j.
Step 505: generate the LDA expression vector of the specified search term.
The weights of the specified search term on the multiple LDA topics form the LDA topic vector of the specified search term, which serves as the LDA expression vector of the specified search term.
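The computation in steps 504 and 505 can be sketched as follows, assuming the second training result is available as a mapping from each participle to its probability distribution over the LDA topics; the names are illustrative.

```python
from typing import Dict, List

def lda_expression_vector(
    participle_weights: Dict[str, float],         # participle -> weight within the specified search term (step 502)
    topic_distributions: Dict[str, List[float]],  # participle -> probabilities over the LDA topics (step 503)
    num_topics: int,
) -> List[float]:
    # Step 504: for each LDA topic, the weight of the search term is the weighted sum of its
    # participles' probabilities on that topic, weighted by each participle's weight in the term.
    vector = [0.0] * num_topics
    for participle, weight in participle_weights.items():
        probs = topic_distributions.get(participle)
        if probs is None:  # participle not covered by the second training result
            continue
        for j in range(num_topics):
            vector[j] += weight * probs[j]
    # Step 505: the per-topic weights form the LDA expression vector of the specified search term.
    return vector
```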
Step 506: compute the LDA correlation between the specified search term and each search term to be selected.
The search terms to be selected may be all the search terms in the search log, or may be multiple search terms in the search log that belong to a field related to the specified search term.
The LDA correlation between the specified search term and each search term to be selected is the inner product of the LDA expression vector of the specified search term and the LDA expression vector of that search term to be selected. The LDA expression vector is a kind of participle vector, and the inner product is computed with the same method as provided in step 303 of the flowchart shown in Fig. 3.
Step 507: preferentially select the search terms to be selected with high LDA correlation, to form the recommendation candidate word set.
Preferentially selecting the search terms to be selected with high LDA correlation may include selecting a preset third number of search terms to be selected in descending order of LDA correlation, or may include selecting all search terms to be selected whose LDA correlation exceeds a preset LDA correlation threshold.
This embodiment of the present invention generates the recommendation candidate word set of the specified search term through the training result of the LDA topic model. As one way of generating a recommendation candidate word set, this embodiment can cooperate with other generation ways to jointly solve the technical problem that the recommended search terms generated by the prior art are not comprehensive enough and belong to a single category.
Based on the same inventive concept, and corresponding to the search term generation method provided by the above embodiments of the present invention, an embodiment of the present invention also provides a search term generation apparatus, whose schematic structural diagram is shown in Fig. 6, specifically comprising:
a set generation module 601, configured to generate, for a specified search term, recommendation candidate word sets respectively using a plurality of preset models, wherein the plurality of preset models are respectively obtained by training on data of different dimensions in a search log;
a set merging module 602, configured to merge the generated recommendation candidate word sets and deduplicate the merged set, to obtain a recommended search term candidate set;
a word selection module 603, configured to select recommended search terms from the recommendation candidate words in the recommended search term candidate set.
With the search term generation apparatus provided by this embodiment of the present invention, a recommended search term candidate set is obtained by using a plurality of models trained on data of different dimensions in the search log, which broadens the ways in which recommended search terms are generated and solves the technical problem that the recommended search terms generated by the prior art are not comprehensive enough and belong to a single category.
Further, the plurality of preset models include at least two of the following models:
a click correlation model;
an LDA topic model;
a collaborative filtering model.
Further, the plurality of preset models include a click correlation model;
the set generation module 601 comprises:
a first query submodule, configured to query, for the specified search term, a first training result obtained with the click correlation model, to obtain a click correlation expression vector of the specified search term, the click correlation expression vector being a participle vector used to indicate the weight of each participle of the specified search term, wherein the first training result is obtained by training the click correlation model on first sample data extracted from the search log, the first sample data include a plurality of search terms extracted from the search log, as training search terms, and the click relationships between the training search terms and search results in the search log, and the first training result includes the participle vector of each training search term;
a first inner product computation submodule, configured to separately compute the inner product of the click correlation expression vector of the specified search term with the click correlation expression vector of each search term to be selected, to obtain the click correlation between the specified search term and each search term to be selected;
a first preferential selection submodule, configured to preferentially select, among the search terms to be selected, the search terms to be selected with high click correlation, to form the recommendation candidate word set of the specified search term generated with the click correlation model.
Further, the click relationship between a training search term and a search result in the search log is the number of clicks between the training search term and the search result in the search log;
the set generation module 601 further comprises the following submodules, configured to train the click correlation model on the first sample data to obtain the first training result:
a first participle submodule, configured to segment each training search term in the first sample data into participles, and generate an initial participle vector for the obtained participles, the initial participle vector being used to indicate the initial weight of each participle of the training search term, with the initial weights of the participles of the training search term being equal;
an iteration submodule, configured to repeat the following step A and step B until a preset iteration stop condition is satisfied:
step A: based on the current iteration expression vectors of the plurality of training search terms, the number of the plurality of training search terms, and the numbers of clicks, separately computing the current iteration expression vector of each of the plurality of search results in the first sample data, wherein at the first iteration the current iteration expression vector of a training search term is its initial participle vector;
step B: based on the current iteration expression vectors of the plurality of search results, the number of the plurality of search results, and the numbers of clicks, separately computing a new iteration expression vector for each of the plurality of training search terms;
when the preset iteration stop condition is satisfied, taking the latest iteration expression vector of each training search term as the participle vector of that training search term, the participle vectors of the training search terms forming the first training result.
Further, the iteration submodule comprises:
a search result iteration unit, configured to compute the current iteration expression vector of a search result using the following formula,
where D_j^(n) is the current iteration expression vector of the j-th search result at the n-th iteration, Q_i^(n-1) is the current iteration expression vector of the i-th training search term at the (n-1)-th iteration, C_{i,j} is the number of clicks between the i-th training search term and the j-th search result, and |Query| is the number of the plurality of training search terms;
a training search term iteration unit, configured to compute the new iteration expression vector of a training search term using the following formula,
where Q_i^(n) is the new iteration expression vector of the i-th training search term at the n-th iteration, and |Doc| is the number of the plurality of search results.
Further, the plurality of preset models include an LDA topic model;
the set generation module 601 comprises:
a second participle submodule, configured to segment the specified search term to obtain the participles of the specified search term;
a weight acquisition submodule, configured to obtain the weight of each participle of the specified search term within the specified search term;
a second query submodule, configured to query, for each obtained participle of the specified search term, a second training result obtained with the LDA topic model, to obtain the probability distribution of that participle of the specified search term over a plurality of LDA topics, wherein the second training result is obtained by training the LDA topic model on second sample data extracted from the search log, the second sample data include participles extracted from the titles of the search results in the search log, as training participles, and the second training result includes the probability distribution of each training participle over the plurality of LDA topics;
a weighted sum computation submodule, configured to compute, for each LDA topic, using the weight of each participle of the specified search term within the specified search term, the weighted sum of the probability distributions of the participles of the specified search term on that LDA topic, as the weight of the specified search term on that LDA topic;
a vector generation submodule, configured to form the LDA topic vector of the specified search term from the weights of the specified search term on the plurality of LDA topics, as the LDA expression vector of the specified search term;
a second inner product computation submodule, configured to separately compute the inner product of the LDA expression vector of the specified search term with the LDA expression vector of each search term to be selected, to obtain the LDA correlation between the specified search term and each search term to be selected;
a second preferential selection submodule, configured to preferentially select, among the search terms to be selected, the search terms to be selected with high LDA correlation, to form the recommendation candidate word set of the specified search term generated with the LDA topic model.
Further, the word selection module 603 comprises:
a feature acquisition submodule, configured to obtain correlation features between the recommendation candidate words in the recommended search term candidate set and the specified search term, as first correlation features;
a scoring submodule, configured to apply a recommended search term screening model to the first correlation features to score each recommendation candidate word in the recommended search term candidate set, obtaining screening scores, wherein the recommended search term screening model is obtained by training on third sample data with a linear regression or gradient boosting decision tree algorithm, the third sample data include the click relationships between search terms in the search log and the recommended search terms of those search terms, and second correlation features between search terms in the search log and the recommended search terms of those search terms, and the second correlation features are of the same types as the first correlation features;
a third preferential selection submodule, configured to preferentially select the recommendation candidate words with high screening scores as recommended search terms.
Further, the first correlation features obtained by the feature acquisition submodule include at least one of the following correlations:
Click correlation;
LDA correlation;
Collaborative filtering correlation.
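A hedged sketch of the scoring and selection submodules, with scikit-learn's GradientBoostingRegressor standing in for the gradient boosting decision tree and the click relationship reduced to a numeric label; these substitutions, and all names below, are assumptions rather than the patent's implementation:

```python
from sklearn.ensemble import GradientBoostingRegressor

# Assumed per-candidate feature layout: [click_correlation, lda_correlation, cf_correlation]
def train_screening_model(second_features, click_labels):
    """Fit the recommendation search term screening model on third sample data."""
    model = GradientBoostingRegressor()
    model.fit(second_features, click_labels)
    return model

def select_recommended_terms(model, candidates, first_features, top_k=5):
    """Score each recommended candidate word and keep those with the highest screening scores."""
    scores = model.predict(first_features)
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in ranked[:top_k]]
```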
Based on the same inventive concept, and corresponding to the search term generation method provided in the above embodiments, an embodiment of the present invention further provides an electronic device, which, as shown in Fig. 7, comprises a processor 701, a communication interface 702, a memory 703 and a communication bus 704, wherein the processor 701, the communication interface 702 and the memory 703 communicate with one another via the communication bus 704;
the memory 703 is configured to store a computer program;
the processor 701 is configured to, when executing the program stored in the memory 703, implement the steps of any of the search term generation methods in the above embodiments.
The communication bus of the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is drawn in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include a random access memory (Random Access Memory, RAM), and may also include a non-volatile memory (Non-Volatile Memory, NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP) and the like; it may also be a digital signal processor (Digital Signal Processing, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
With the electronic device for search term generation provided by the embodiment of the present invention, a recommended search term candidate set is obtained using a plurality of models trained on data of different dimensions in the search log, which broadens the ways in which recommended search terms are generated and solves the technical problem that the recommended search terms generated by the prior art are not comprehensive enough and belong to a single category.
In another embodiment provided by the present invention, a computer-readable storage medium is further provided, in which instructions are stored; when the instructions are run on a computer, they cause the computer to execute the steps of any of the search term generation methods in the above embodiments.
In another embodiment provided by the present invention, a computer program product containing instructions is further provided; when it runs on a computer, it causes the computer to execute any of the search term generation methods in the above embodiments.
In the above embodiments, the implementation may be realized wholly or partly by software, hardware, firmware or any combination thereof. When implemented in software, it may be realized wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wire (such as coaxial cable, optical fiber or digital subscriber line (DSL)) or wirelessly (such as infrared, radio or microwave). The computer-readable storage medium may be any usable medium accessible to the computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (such as a solid state disk, Solid State Disk (SSD)), etc.
It should be noted that, in this document, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner; the same or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the device, electronic device, computer-readable storage medium and computer program product embodiments are described relatively simply because they are substantially similar to the method embodiments; for relevant points, reference may be made to the corresponding description of the method embodiments.
The above descriptions are merely preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (17)

1. A search term generation method, characterized by comprising:
for a specified search term, generating recommended candidate word sets respectively using a plurality of preset models, wherein the plurality of preset models are respectively obtained by training on data of different dimensions in a search log;
merging the generated recommended candidate word sets, and performing deduplication on the merged set to obtain a recommended search term candidate set;
selecting recommended search terms from the recommended candidate words in the recommended search term candidate set.
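A minimal illustration of the merging and deduplication step in claim 1; the example candidate words and model labels are placeholders:

```python
def merge_candidate_sets(candidate_sets):
    """Union the per-model recommended candidate word sets and drop duplicates,
    keeping the first occurrence of each candidate word."""
    seen = set()
    merged = []
    for candidates in candidate_sets:          # one list per preset model
        for word in candidates:
            if word not in seen:
                seen.add(word)
                merged.append(word)
    return merged

# Example: candidate sets from three preset models
candidate_set = merge_candidate_sets([
    ["drama A", "drama B"],        # click correlation model
    ["drama B", "variety C"],      # LDA topic model
    ["variety C", "movie D"],      # collaborative filtering model
])
# -> ["drama A", "drama B", "variety C", "movie D"]
```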
2. The method according to claim 1, characterized in that the plurality of preset models includes at least two of the following models:
Click correlation models;
LDA topic model;
Collaborative filtering model.
3. The method according to claim 1, characterized in that the plurality of preset models includes a click correlation model, and the process of generating a recommended candidate word set by the click correlation model comprises:
for the specified search term, querying a first training result obtained with the click correlation model to obtain the click correlation expression vector of the specified search term, the click correlation expression vector being a participle vector used to represent the weight of each participle of the specified search term, wherein the first training result is obtained by training the click correlation model on first sample data extracted from the search log, the first sample data comprises a plurality of search terms extracted from the search log as training search terms and the click relationships between the training search terms and search results in the search log, and the first training result comprises the participle vector of each training search term;
computing the inner product of the click correlation expression vector of the specified search term with the click correlation expression vector of each search term to be selected, to obtain the click correlation between the specified search term and each search term to be selected;
preferentially selecting, from the search terms to be selected, those with high click correlation, to form the recommended candidate word set of the specified search term generated by the click correlation model.
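A minimal sketch of the inner-product selection in claim 3, assuming the click correlation expression vectors are stored as sparse {participle: weight} dicts; the helper names are illustrative, not from the patent:

```python
def click_correlation(vec_a, vec_b):
    """Inner product of two sparse participle vectors stored as {participle: weight} dicts."""
    if len(vec_b) < len(vec_a):
        vec_a, vec_b = vec_b, vec_a
    return sum(weight * vec_b.get(participle, 0.0) for participle, weight in vec_a.items())

def top_candidates_by_click(query_vec, term_vectors, top_k=10):
    """Rank the search terms to be selected by their click correlation with the specified search term."""
    scored = [(term, click_correlation(query_vec, vec)) for term, vec in term_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```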
4. The method according to claim 3, characterized in that the click relationship between a training search term and a search result in the search log is the click count between the training search term and the search result in the search log;
the click correlation model is trained on the first sample data with the following steps to obtain the first training result:
segmenting each training search term in the first sample data, and generating an initial participle vector from the obtained participles, the initial participle vector being used to represent the initial weight of each participle of the training search term, the initial weights of the participles of a training search term being equal;
repeating the following step A and step B until a preset iteration stop condition is met:
step A: based on the current iteration expression vectors of the plurality of training search terms, the number of the plurality of training search terms and the click counts, respectively computing the current iteration expression vector of each of a plurality of search results in the first sample data, wherein in the first iteration the current iteration expression vector of a training search term is its initial participle vector;
step B: based on the current iteration expression vectors of the plurality of search results, the number of the plurality of search results and the click counts, respectively computing a new iteration expression vector for each of the plurality of training search terms;
when the preset iteration stop condition is met, taking the latest iteration expression vector of each training search term as the participle vector of that training search term, the participle vectors of the training search terms constituting the first training result.
5. The method according to claim 4, characterized in that the computing, based on the current iteration expression vectors of the plurality of training search terms, the number of the plurality of training search terms and the click counts, the current iteration expression vector of each of the plurality of search results in the first sample data comprises:
computing the current iteration expression vector of the search result using the following formula:
wherein D_j^(n) is the current iteration expression vector of the j-th search result at the n-th iteration, Q_i^(n-1) is the iteration expression vector of the i-th training search term at the (n-1)-th iteration, C_{i,j} is the click count between the i-th training search term and the j-th search result, and |Query| is the number of the plurality of training search terms;
the computing, based on the current iteration expression vectors of the plurality of search results, the number of the plurality of search results and the click counts, a new iteration expression vector for each of the plurality of training search terms comprises:
computing the new iteration expression vector of the training search term using the following formula:
wherein Q_i^(n) is the new iteration expression vector of the i-th training search term at the n-th iteration, and |Doc| is the number of the plurality of search results.
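The formula images of claim 5 do not survive in this text, so the update rule below is an assumption: step A is taken to sum the click-weighted training-search-term vectors and divide by |Query|, and step B to sum the click-weighted search-result vectors and divide by |Doc|, which uses exactly the symbols the claim defines. A minimal NumPy sketch under that assumption:

```python
import numpy as np

def propagate_vectors(C, Q0, num_iters=10, tol=1e-6):
    """Alternate step A and step B of claims 4-5 on a query-result click matrix.

    C  : (|Query|, |Doc|) array of click counts C[i, j]
    Q0 : (|Query|, V) array of initial participle vectors (equal weight per participle)
    """
    Q = Q0.astype(float)
    num_query, num_doc = C.shape
    for _ in range(num_iters):
        D = C.T @ Q / num_query        # step A (assumed normalization by |Query|)
        Q_new = C @ D / num_doc        # step B (assumed normalization by |Doc|)
        if np.max(np.abs(Q_new - Q)) < tol:   # simple stand-in for the preset iteration stop condition
            Q = Q_new
            break
        Q = Q_new
    return Q   # participle vectors of the training search terms, i.e. the first training result
```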
6. The method according to claim 1, characterized in that the plurality of preset models includes an LDA topic model, and the process of generating a recommended candidate word set by the LDA topic model comprises:
segmenting the specified search term to obtain the participles of the specified search term;
obtaining the weight of each participle of the specified search term within the specified search term;
querying, for each participle of the specified search term, a second training result obtained with the LDA topic model, to obtain the probability distribution of each participle of the specified search term over a plurality of LDA topics, wherein the second training result is obtained by training the LDA topic model on second sample data extracted from the search log, the second sample data comprises participles extracted from the titles of search results in the search log as training participles, and the second training result comprises the probability distribution of each training participle over the plurality of LDA topics;
for each LDA topic, computing the weighted sum of the probability distributions of the participles of the specified search term on that LDA topic, using the weight of each participle within the specified search term, as the weight of the specified search term on that LDA topic;
forming the LDA topic vector of the specified search term from its weights on the plurality of LDA topics, as the LDA expression vector of the specified search term;
computing the inner product of the LDA expression vector of the specified search term with the LDA expression vector of each search term to be selected, to obtain the LDA correlation between the specified search term and each search term to be selected;
preferentially selecting, from the search terms to be selected, those with high LDA correlation, to form the recommended candidate word set of the specified search term generated by the LDA topic model.
7. The method according to claim 1, characterized in that the selecting recommended search terms from the recommended candidate words in the recommended search term candidate set comprises:
obtaining the correlation features between the recommended candidate words in the recommended search term candidate set and the specified search term, as first correlation features;
scoring each recommended candidate word in the recommended search term candidate set on its first correlation features using a recommendation search term screening model, to obtain a screening score, wherein the recommendation search term screening model is obtained by training third sample data with linear regression or a gradient boosting decision tree algorithm, and the third sample data comprises the click relationships between search terms in the search log and their recommended search terms, together with second correlation features between those search terms and their recommended search terms, the second correlation features being of the same type as the first correlation features;
preferentially selecting the recommended candidate words with high screening scores as recommended search terms.
8. The method according to claim 7, characterized in that the first correlation features include at least one of the following correlations:
Click correlation;
LDA correlation;
Collaborative filtering correlation.
9. A search term generation device, characterized by comprising:
a set generation module, configured to generate, for a specified search term, recommended candidate word sets respectively using a plurality of preset models, wherein the plurality of preset models are respectively obtained by training on data of different dimensions in a search log;
a set merging module, configured to merge the generated recommended candidate word sets and perform deduplication on the merged set to obtain a recommended search term candidate set;
a word selection module, configured to select recommended search terms from the recommended candidate words in the recommended search term candidate set.
10. The device according to claim 9, characterized in that the plurality of preset models includes at least two of the following models:
Click correlation models;
LDA topic model;
Collaborative filtering model.
11. The device according to claim 9, characterized in that the plurality of preset models includes a click correlation model;
the set generation module comprises:
a first query submodule, configured to query, for the specified search term, a first training result obtained with the click correlation model, to obtain the click correlation expression vector of the specified search term, the click correlation expression vector being a participle vector used to represent the weight of each participle of the specified search term, wherein the first training result is obtained by training the click correlation model on first sample data extracted from the search log, the first sample data comprises a plurality of search terms extracted from the search log as training search terms and the click relationships between the training search terms and search results in the search log, and the first training result comprises the participle vector of each training search term;
a first inner product computation submodule, configured to compute the inner product of the click correlation expression vector of the specified search term with the click correlation expression vector of each search term to be selected, obtaining the click correlation between the specified search term and each search term to be selected;
a first preferential selection submodule, configured to preferentially select, from the search terms to be selected, those with high click correlation, to form the recommended candidate word set of the specified search term generated by the click correlation model.
12. The device according to claim 11, characterized in that the click relationship between a training search term and a search result in the search log is the click count between the training search term and the search result in the search log;
the set generation module further comprises the following submodules, configured to train the click correlation model on the first sample data to obtain the first training result:
a first word segmentation submodule, configured to segment each training search term in the first sample data and generate an initial participle vector from the obtained participles, the initial participle vector being used to represent the initial weight of each participle of the training search term, the initial weights of the participles of a training search term being equal;
an iteration submodule, configured to repeat the following step A and step B until a preset iteration stop condition is met:
step A: based on the current iteration expression vectors of the plurality of training search terms, the number of the plurality of training search terms and the click counts, respectively computing the current iteration expression vector of each of a plurality of search results in the first sample data, wherein in the first iteration the current iteration expression vector of a training search term is its initial participle vector;
step B: based on the current iteration expression vectors of the plurality of search results, the number of the plurality of search results and the click counts, respectively computing a new iteration expression vector for each of the plurality of training search terms;
when the preset iteration stop condition is met, taking the latest iteration expression vector of each training search term as the participle vector of that training search term, the participle vectors of the training search terms constituting the first training result.
13. The device according to claim 12, characterized in that the iteration submodule comprises:
a search result iteration unit, configured to compute the current iteration expression vector of the search result using the following formula:
wherein D_j^(n) is the current iteration expression vector of the j-th search result at the n-th iteration, Q_i^(n-1) is the iteration expression vector of the i-th training search term at the (n-1)-th iteration, C_{i,j} is the click count between the i-th training search term and the j-th search result, and |Query| is the number of the plurality of training search terms;
a training search term iteration unit, configured to compute the new iteration expression vector of the training search term using the following formula:
wherein Q_i^(n) is the new iteration expression vector of the i-th training search term at the n-th iteration, and |Doc| is the number of the plurality of search results.
14. The device according to claim 9, characterized in that the plurality of preset models includes an LDA topic model;
the set generation module comprises:
a second word segmentation submodule, configured to segment the specified search term to obtain the participles of the specified search term;
a weight acquisition submodule, configured to obtain the weight of each participle of the specified search term within the specified search term;
a second query submodule, configured to query, for each participle of the specified search term, a second training result obtained with the LDA topic model, to obtain the probability distribution of each participle of the specified search term over a plurality of LDA topics, wherein the second training result is obtained by training the LDA topic model on second sample data extracted from the search log, the second sample data comprises participles extracted from the titles of search results in the search log as training participles, and the second training result comprises the probability distribution of each training participle over the plurality of LDA topics;
a weighted-sum computation submodule, configured to compute, for each LDA topic, the weighted sum of the probability distributions of the participles of the specified search term on that LDA topic, using the weight of each participle within the specified search term, as the weight of the specified search term on that LDA topic;
a vector generation submodule, configured to form the LDA topic vector of the specified search term from its weights on the plurality of LDA topics, as the LDA expression vector of the specified search term;
a second inner product computation submodule, configured to compute the inner product of the LDA expression vector of the specified search term with the LDA expression vector of each search term to be selected, obtaining the LDA correlation between the specified search term and each search term to be selected;
a second preferential selection submodule, configured to preferentially select, from the search terms to be selected, those with high LDA correlation, to form the recommended candidate word set of the specified search term generated by the LDA topic model.
15. The device according to claim 9, characterized in that the word selection module comprises:
a feature acquisition submodule, configured to obtain the correlation features between the recommended candidate words in the recommended search term candidate set and the specified search term, as first correlation features;
a scoring submodule, configured to score each recommended candidate word in the recommended search term candidate set on its first correlation features using a recommendation search term screening model, to obtain a screening score, wherein the recommendation search term screening model is obtained by training third sample data with linear regression or a gradient boosting decision tree algorithm, and the third sample data comprises the click relationships between search terms in the search log and their recommended search terms, together with second correlation features between those search terms and their recommended search terms, the second correlation features being of the same type as the first correlation features;
a third preferential selection submodule, configured to preferentially select the recommended candidate words with high screening scores as recommended search terms.
16. The device according to claim 15, characterized in that the first correlation features obtained by the feature acquisition submodule include at least one of the following correlations:
Click correlation;
LDA correlation;
Collaborative filtering correlation.
17. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another via the communication bus;
the memory is configured to store a computer program;
the processor is configured to, when executing the program stored in the memory, implement the method steps of any one of claims 1-8.
CN201810826071.5A 2018-07-25 2018-07-25 Search word generation method and device and electronic equipment Active CN109189990B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810826071.5A CN109189990B (en) 2018-07-25 2018-07-25 Search word generation method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810826071.5A CN109189990B (en) 2018-07-25 2018-07-25 Search word generation method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN109189990A true CN109189990A (en) 2019-01-11
CN109189990B CN109189990B (en) 2021-03-26

Family

ID=64937297

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810826071.5A Active CN109189990B (en) 2018-07-25 2018-07-25 Search word generation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN109189990B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095210A (en) * 2014-04-22 2015-11-25 阿里巴巴集团控股有限公司 Method and apparatus for screening promotional keywords
US20160224663A1 (en) * 2014-11-07 2016-08-04 International Business Machines Corporation Context based passage retreival and scoring in a question answering system
CN104462289A (en) * 2014-11-27 2015-03-25 百度在线网络技术(北京)有限公司 Direct number keyword recommending method and device
CN105677769A (en) * 2015-12-29 2016-06-15 广州神马移动信息科技有限公司 Keyword recommending method and system based on latent Dirichlet allocation (LDA) model
CN105956149A (en) * 2016-05-12 2016-09-21 北京奇艺世纪科技有限公司 Default search word recommendation method and apparatus
CN106777217A (en) * 2016-12-23 2017-05-31 北京奇虎科技有限公司 A kind of search word recommends method and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HEERYON CHO et al.: "Feature word selection by iterative top-K aggregation for classifying recommended shops", IEEE *
Xuan Ming: "Implementation and Optimization of Core Technologies of an Enterprise-level Massive-Data Search Engine", China Master's Theses Full-text Database, Information Science and Technology *
Cen Rongwei: "Research on Search Engine Evaluation Based on User Behavior Analysis", China Doctoral Dissertations Full-text Database *
Wang Long: "Research on Recommendation Technology Based on Multi-dimensional User Preferences", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110347911A (en) * 2019-05-28 2019-10-18 成都美美臣科技有限公司 The method of one e-commerce website commercial articles searching automatic push
CN110276009B (en) * 2019-06-20 2021-09-24 北京百度网讯科技有限公司 Association word recommendation method and device, electronic equipment and storage medium
CN110276009A (en) * 2019-06-20 2019-09-24 北京百度网讯科技有限公司 A kind of recommended method of associational word, device, electronic equipment and storage medium
CN110390052A (en) * 2019-07-25 2019-10-29 腾讯科技(深圳)有限公司 Search for recommended method, the training method of CTR prediction model, device and equipment
CN110390052B (en) * 2019-07-25 2022-10-28 腾讯科技(深圳)有限公司 Search recommendation method, training method, device and equipment of CTR (China train redundancy report) estimation model
CN110795612A (en) * 2019-10-28 2020-02-14 北京字节跳动网络技术有限公司 Search word recommendation method and device, electronic equipment and computer-readable storage medium
CN112765966B (en) * 2021-04-06 2021-07-23 腾讯科技(深圳)有限公司 Method and device for removing duplicate of associated word, computer readable storage medium and electronic equipment
CN112765966A (en) * 2021-04-06 2021-05-07 腾讯科技(深圳)有限公司 Method and device for removing duplicate of associated word, computer readable storage medium and electronic equipment
CN113282832A (en) * 2021-06-10 2021-08-20 北京爱奇艺科技有限公司 Search information recommendation method and device, electronic equipment and storage medium
CN113282831A (en) * 2021-06-10 2021-08-20 北京爱奇艺科技有限公司 Search information recommendation method and device, electronic equipment and storage medium
CN113515940A (en) * 2021-07-14 2021-10-19 上海芯翌智能科技有限公司 Method and equipment for text search
CN113515940B (en) * 2021-07-14 2022-12-13 上海芯翌智能科技有限公司 Method and equipment for text search
CN113312523A (en) * 2021-07-30 2021-08-27 北京达佳互联信息技术有限公司 Dictionary generation and search keyword recommendation method and device and server

Also Published As

Publication number Publication date
CN109189990B (en) 2021-03-26

Similar Documents

Publication Publication Date Title
CN109189990A (en) A kind of generation method of search term, device and electronic equipment
CN105335391B (en) The treating method and apparatus of searching request based on search engine
CN109062919B (en) Content recommendation method and device based on deep reinforcement learning
CN108280155B (en) Short video-based problem retrieval feedback method, device and equipment
CN108345702A (en) Entity recommends method and apparatus
JP5423030B2 (en) Determining words related to a word set
CN103377232B (en) Headline keyword recommendation method and system
CN105446973B (en) The foundation of user's recommended models and application method and device in social networks
CN107862022B (en) Culture resource recommendation system
US20190347281A1 (en) Apparatus and method for semantic search
CN104199833B (en) The clustering method and clustering apparatus of a kind of network search words
US8949227B2 (en) System and method for matching entities and synonym group organizer used therein
CN106339502A (en) Modeling recommendation method based on user behavior data fragmentation cluster
CN109241412A (en) A kind of recommended method, system and electronic equipment based on network representation study
CN108536784B (en) Comment information sentiment analysis method and device, computer storage medium and server
CN105512180B (en) A kind of search recommended method and device
WO2014056408A1 (en) Information recommending method, device and server
CN103399862B (en) Determine the method and apparatus of search index information corresponding to target query sequence
CN103744887B (en) It is a kind of for the method for people search, device and computer equipment
CN104636407B (en) Parameter value training and searching request treating method and apparatus
CN106649871B (en) Detection method, device and the calculating equipment of article multiplicity
Layton Learning data mining with python
CN110222260A (en) A kind of searching method, device and storage medium
CN107391509A (en) Label recommendation method and device
CN108153792A (en) A kind of data processing method and relevant apparatus

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant