CN103425767A - Method and system for determining prompt data - Google Patents

Method and system for determining prompt data

Info

Publication number
CN103425767A
Authority
CN
China
Prior art keywords
data
reminder
website
vertical website
employment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013103421046A
Other languages
Chinese (zh)
Other versions
CN103425767B (en)
Inventor
柴思远
王灿辉
张阔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Beijing Sogou Information Service Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Beijing Sogou Information Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd, Beijing Sogou Information Service Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN201310342104.6A priority Critical patent/CN103425767B/en
Publication of CN103425767A publication Critical patent/CN103425767A/en
Application granted granted Critical
Publication of CN103425767B publication Critical patent/CN103425767B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and system for determining prompt data, which address the problem that search suggestions on small websites offer little prompt information. The method comprises: analyzing search logs recorded on a server and capturing the association between each vertical website on the web and the prompt data corresponding to that website; classifying the vertical websites according to established industry categories and clustering them by industry category; obtaining, from the association between each vertical website and its prompt data, the prompt data associated with each industry category; and, when input is performed on a selected vertical website, loading, according to the input characters, the prompt data associated with the industry category to which the selected website belongs. Prompt data can thus be shared within an industry category, so that accurate prompt data are obtained and information query efficiency is improved.

Description

Method and system for determining prompt data
Technical field
The present invention relates to web search technology, and in particular to a method and system for determining prompt data.
Background
When a user searches within a vertical website, a search suggestion (prompt) service is usually provided to make searching easier: once the user has typed part of a query, suggestions are offered. For example, when a user types the character for "love" (pronounced "ai") on a shopping website, suggestions such as "Ai Mashi" (Hermès) and "Aima electric vehicle" may be shown. When providing a prompt service for on-site search, candidate suggestion terms can be mined as short texts from the site's content, or they can be mined from users' search logs; either source can then back the prompt service.
At present, the owners of some small vertical websites build their prompt service by mining candidate suggestion terms from the site's content. Because a site has no large-scale traffic when it first goes live, key short texts are mined from the site's text content and used as the data of the prompt service. For example, the APK download site Liqu (http://os-android.liqucn.com/) uses the names of the APKs it lists as the data of its prompt service, and the casual-game vertical website 4399 (http://www.4399.com/) uses the names of the games it hosts, thereby providing a prompt service.
For websites with a certain amount of traffic, the prompt service is usually built using user search logs as the corpus. Within a relatively short period, the search logs of users across the web cover most search needs, which ensures the completeness of the data behind the prompt service. Most search engines currently build their prompt services in this way.
For a prompt service built from short texts mined from a site's descriptive text, the way users phrase their queries is usually hard to match against the descriptive text of the items being searched. For instance, a site may list an APK named "Sogou Mobile Input Method", while users tend to type "Sogou Input Method" or "sogou input method", in which case the prompt service does not respond. In addition, such a service can rank only by relevance and cannot rank by popularity when relevance is equal. Furthermore, the mined short texts generally need manual annotation before they can serve as the corpus of the prompt service, so maintenance costs are high and update cycles are long, which reduces the efficiency of information search.
A prompt service built from user search logs requires large-scale traffic, so it is generally used only by large websites such as Baidu and Taobao. It also involves technical difficulties such as relevance matching, ranking, word segmentation, and phonetic annotation, and is costly to develop. It is therefore hard to apply to small websites, which likewise reduces the efficiency of information search on those sites.
Summary of the invention
The embodiments of the present invention provide a method and system for determining prompt data, to solve the problem that existing small websites have little search suggestion information.
To solve the above problem, an embodiment of the present invention discloses a method for determining prompt data, comprising:
analyzing the search logs recorded on a server, and capturing the association between each vertical website on the web and the prompt data corresponding to that vertical website;
classifying each vertical website on the web according to established industry categories, clustering the vertical websites by industry category, and obtaining, from the association between each vertical website and its corresponding prompt data, the prompt data associated with each industry category;
when input is performed on a selected vertical website, loading, according to the input characters, the prompt data associated with the industry category to which the selected vertical website belongs.
Preferably, analyzing the search logs recorded on the server and capturing the association between each vertical website and its corresponding prompt data comprises: analyzing the search logs recorded on the server and capturing the historical queries corresponding to each vertical website on the web; and using the historical queries as prompt data to establish the association between each vertical website and its corresponding prompt data.
Preferably, using the historical queries as prompt data to establish the association comprises: screening the historical queries according to users' click data on each vertical website, and using the historical queries that pass the screening as prompt data to establish the association between each vertical website and its corresponding prompt data.
Preferably, classifying each vertical website according to established industry categories and clustering by industry category comprises: determining the classification condition of each established industry category, and classifying each vertical website on the web according to the classification conditions; and determining the industry category to which each vertical website belongs, and clustering the vertical websites by industry category.
Preferably, determining the classification condition of an established industry category comprises: determining the known vertical websites in each industry category by looking up an industry category list, and obtaining the feature corpus of each known vertical website; and performing model training on the feature corpora of the known vertical websites under the same industry category to determine the classification condition corresponding to that industry category.
Preferably, performing model training on the feature corpora to determine the classification condition comprises: performing model training on the feature corpora of the known vertical websites under the same industry category to obtain at least one feature vector for each known vertical website; and using the at least one feature vector of each known vertical website as training data to determine the classification condition of the industry category.
Preferably, loading the prompt data associated with the industry category of the selected vertical website according to the input characters comprises: determining, according to the input characters, the prompt data corresponding to the industry category of the selected vertical website; and weighting those prompt data and loading the weighted prompt data.
Preferably, after the vertical websites are clustered by industry category, the method further comprises: for the vertical websites under the same industry category, calculating the similarity between vertical websites, and determining the similar websites of each vertical website according to the similarity. Loading the prompt data according to the input characters then comprises: determining, according to the input characters, the prompt data corresponding to the selected vertical website and the prompt data corresponding to its similar websites; and weighting the prompt data of the similar websites according to the similarity, then ranking and loading them together with the prompt data of the selected vertical website.
Preferably, the prompt data associated with the industry category of the selected vertical website are loaded according to the input characters in one of the following ways: when characters are entered through an input method, the input method retrieves the prompt data of that industry category corresponding to the input characters; or, when input is performed on the selected vertical website, the search engine of the selected vertical website retrieves those prompt data through a service interface; or, when the selected vertical website is loaded in a browser, the browser retrieves those prompt data through script code.
Correspondingly, an embodiment of the present invention also provides a system for determining prompt data, comprising:
an analysis module, configured to analyze the search logs recorded on a server and capture the association between each vertical website on the web and the prompt data corresponding to that vertical website;
a classification and clustering module, configured to classify each vertical website on the web according to established industry categories, cluster the vertical websites by industry category, and obtain, from the association between each vertical website and its corresponding prompt data, the prompt data associated with each industry category;
a loading module, configured to load, when input is performed on a selected vertical website and according to the input characters, the prompt data associated with the industry category to which the selected vertical website belongs.
Preferably, the analysis module comprises: a capture submodule, configured to analyze the search logs recorded on the server and capture the historical queries corresponding to each vertical website on the web; and an establishing submodule, configured to use the historical queries as prompt data and establish the association between each vertical website and its corresponding prompt data.
Preferably, the classification and clustering module comprises: a classification submodule, configured to determine the classification condition of each established industry category and classify each vertical website on the web according to the classification conditions; and a clustering submodule, configured to determine the industry category to which each vertical website belongs and cluster the vertical websites by industry category.
Preferably, the loading module comprises: a determination submodule, configured to determine, according to the input characters, the prompt data corresponding to the industry category of the selected vertical website; and a weighting and loading submodule, configured to weight those prompt data and load the weighted prompt data.
Preferably, the system further comprises: a similar website determination module, configured to calculate, for the vertical websites under the same industry category, the similarity between vertical websites, and to determine the similar websites of each vertical website according to the similarity. The loading module then comprises: a prompt data determination submodule, configured to determine, according to the input characters, the prompt data corresponding to the selected vertical website and the prompt data corresponding to its similar websites; and a weighting and loading submodule, configured to weight the prompt data of the similar websites according to the similarity, then rank and load them together with the prompt data of the selected vertical website.
Compared with the prior art, the present invention has the following advantages:
The embodiments of the present invention capture, from search logs, the association between each vertical website on the web and its corresponding prompt data, thereby obtaining prompt data, and then determine the prompt data associated with each industry category by classifying the vertical websites and clustering them by industry category. At search time, the prompt data associated with the industry category of the selected vertical website are retrieved, so that prompt data are shared within an industry category, accurate prompt data are obtained, and information search becomes more efficient.
The embodiments of the present invention determine historical queries and their corresponding vertical websites from sources such as search engine logs, screen the historical queries using users' click data, and then use the historical queries as prompt data. This provides the data basis for search suggestions on vertical websites and ensures the accuracy of the prompt data.
Brief description of the drawings
Fig. 1 is a flowchart of the method for determining prompt data provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of the method for determining prompt data provided by Embodiment 2 of the present invention;
Fig. 3 is a structural diagram of the system for determining prompt data provided by Embodiment 4 of the present invention;
Fig. 4 is a preferred structural diagram of the device for determining prompt data provided by Embodiment 4 of the present invention.
Detailed description
To make the above objects, features, and advantages of the present invention easier to understand, the present invention is described in further detail below with reference to the drawings and specific embodiments.
The embodiments of the present invention provide a method for determining prompt data in which, at search time, associated prompt data are obtained from within the same industry category. Prompt data are thus shared within an industry category, yielding rich, accurate, and varied prompt data that give users fast and accurate search suggestions.
Embodiment 1
Fig. 1 shows a flowchart of the method for determining prompt data provided by Embodiment 1 of the present invention.
Step 101: analyze the search logs recorded on the server, and capture the association between each vertical website on the web and the prompt data corresponding to that vertical website.
In the embodiments of the present invention, historical queries that users have issued and that relate to the function of each website serve as the basic data of the search suggestion service. By analyzing the search logs recorded by a search engine (that is, the servers associated with search), the historical queries users have entered (the basic data) and the vertical websites those queries led to are obtained. The historical queries are used as prompt data, establishing the association between each vertical website on the web and its corresponding prompt data. Analyzing the search logs therefore preserves the accuracy of the prompt data while removing the need to keep each website's prompt data synchronized with the website itself, so the prompt data for each vertical website can be updated automatically and are easy to maintain over time.
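Step 101 can be illustrated with a minimal sketch. The log format (one query paired with the vertical website it led to), the site names, and the function name `build_associations` are assumptions made for illustration; the patent text does not specify them.

```python
from collections import defaultdict

# Hypothetical log format: one (query, visited_site) pair per search event.
search_log = [
    ("hermes bag", "shop-a.example"),
    ("hermes bag", "shop-b.example"),
    ("sogou input method", "apk-site.example"),
    ("hermes bag", "shop-a.example"),
]

def build_associations(log):
    """Map each vertical website to the historical queries that led to it, with counts."""
    site_to_queries = defaultdict(lambda: defaultdict(int))
    for query, site in log:
        site_to_queries[site][query] += 1
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {site: dict(queries) for site, queries in site_to_queries.items()}

associations = build_associations(search_log)
print(associations["shop-a.example"])
```

Because the mapping is rebuilt from the log alone, rerunning it on fresh logs is one way the automatic update described above could be realized.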
Step 102: classify each vertical website on the web according to established industry categories, cluster the vertical websites by industry category, and obtain, from the association between each vertical website and its corresponding prompt data, the prompt data associated with each industry category.
In the embodiments of the present invention, to avoid the shortage of prompt data caused by problems such as low traffic on a vertical website, search suggestion data are shared so that vertical websites receive rich and accurate prompt data. The industry category of each vertical website therefore needs to be determined, so that prompt data can be obtained on the basis of that category. For example, when a search is performed on a smaller vertical website such as Vipshop, the prompt data for that search can be shared from large websites in the same shopping category, such as Taobao and Amazon.
The industry category of some vertical websites can be learned from established navigation websites and similar sources; established industry categories are one objective standard for classifying vertical websites. In this embodiment, each vertical website on the web is classified according to the established industry categories to determine the category it belongs to, and the vertical websites are then clustered by industry category to determine which vertical websites each category contains.
Differences in traffic, investment, and similar factors cause the search suggestion services of different vertical websites to differ, mainly in data coverage and ranking quality. The embodiments of the present invention use information about the websites themselves to give the search suggestion service an accurate data basis and to share information: the vertical websites under the same industry category can draw on each other's search suggestion services and share prompt data, giving users a better search experience. Sharing here means using search suggestion information jointly to provide the prompt service. When a selected vertical website provides search suggestions, the prompt data may come from that website or from other vertical websites in the same industry category.
Therefore, once the vertical websites have been clustered by industry category, all the prompt data associated with the vertical websites under one industry category can be collected, according to the association between each website and its prompt data, as the prompt data associated with that category. In this way the prompt data associated with each industry category are determined.
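The aggregation at the end of Step 102 might be sketched as follows. The per-site counts, the site-to-category mapping, and the function name `prompts_by_category` are illustrative assumptions; summing counts across sites is one simple merging choice that the text does not prescribe.

```python
from collections import Counter, defaultdict

# Hypothetical inputs: per-site prompt counts and a site -> industry category mapping.
site_prompts = {
    "big-shop.example": {"hermes": 120, "hermes bag": 80},
    "small-shop.example": {"hermes scarf": 3},
    "games.example": {"puzzle game": 40},
}
site_category = {
    "big-shop.example": "shopping",
    "small-shop.example": "shopping",
    "games.example": "games",
}

def prompts_by_category(site_prompts, site_category):
    """Merge the prompt data of all sites that share an industry category."""
    merged = defaultdict(Counter)
    for site, prompts in site_prompts.items():
        # Counter.update adds counts, so a query seen on two sites is summed.
        merged[site_category[site]].update(prompts)
    return {category: dict(counter) for category, counter in merged.items()}

category_prompts = prompts_by_category(site_prompts, site_category)
print(category_prompts["shopping"])
```

In this sketch the small shopping site gains access to the prompt data collected by the large one, which is the sharing effect the step aims for.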
Step 103: when input is performed on a selected vertical website, load, according to the input characters, the prompt data associated with the industry category to which the selected vertical website belongs.
When a search is performed on a selected vertical website, the characters the user types on that website are obtained, and the prompt data associated with the industry category of the selected website are determined from those characters. If the selected website has no prompt data of its own, the prompt data of the other vertical websites in its industry category are selected, and the matching prompt data are determined from them according to the input characters. If the selected website does have prompt data, its own prompt data are obtained and, in addition, the prompt data of the other vertical websites in its industry category are determined; the prompt data matching the input characters within that industry category are then retrieved and offered as suggestions for the user's search. For example, when a user types "ai" on a shopping website, popular queries such as "Ai Mashi" (Hermès) and "Ai Mashi bags" can be obtained as prompt data.
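A minimal sketch of the lookup in Step 103, assuming plain prefix matching and ranking by query count. The data, the `limit` parameter, and the function name `load_prompts` are hypothetical; phonetic (pinyin) matching of the kind implied by the "ai" example is not covered here.

```python
def load_prompts(prefix, site, site_category, category_prompts, limit=5):
    """Return the most frequent prompts in the site's industry category that start with prefix."""
    category = site_category[site]
    candidates = category_prompts.get(category, {})
    matches = [(q, n) for q, n in candidates.items() if q.startswith(prefix)]
    # Sort by descending count, then alphabetically so ties have a stable order.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]

# Hypothetical data: the small site has no prompt data of its own,
# but it belongs to the "shopping" category.
site_category = {"small-shop.example": "shopping"}
category_prompts = {
    "shopping": {"hermes": 120, "hermes bag": 80, "hermes scarf": 3, "handbag": 50},
}

suggestions = load_prompts("her", "small-shop.example", site_category, category_prompts)
print(suggestions)
```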
In summary, the embodiments of the present invention capture, from search logs, the association between vertical websites and their corresponding prompt data to obtain the prompt data for each website, and then determine the prompt data associated with each industry category by classifying the vertical websites and clustering them by industry category. At search time, the prompt data associated with the industry category of the selected vertical website can be retrieved, so that the vertical websites in an industry category share prompt data. This yields rich, accurate, and varied prompt data and makes information search more efficient.
Embodiment 2
Fig. 2 shows a flowchart of the method for determining prompt data provided by Embodiment 2 of the present invention.
Step 201: analyze the search logs recorded on the server, and capture the historical queries corresponding to each vertical website on the web.
A search engine can be regarded as the main entrance to the internet; it is a direct and efficient link between most users and most websites. A user usually passes a search need to the search engine in the form of a query, and the search engine computes on the query and presents the user with the top N websites or pages that best meet the need.
In the embodiments of the present invention, the logs of the search engine are analyzed to perform a preliminary screening of the historical queries entered by users across the web (the basic data) and to determine the vertical website corresponding to each historical query, forming the service data for search suggestions. This preserves the accuracy of the prompt data while removing the need to keep each website's prompt data synchronized with the website itself, so the prompt data for each vertical website can be updated automatically and are easy to maintain over time.
Step 202: use the historical queries as prompt data, and establish the association between each vertical website and its corresponding prompt data.
Preferably, Step 202 comprises the following sub-step: screen the historical queries according to users' click data on each vertical website, and use the historical queries that pass the screening as prompt data to establish the association between each vertical website and its corresponding prompt data.
User feedback is then used to obtain users' click data, with which the historical queries are screened precisely. The click data show the different choices that different users make among the search results for the same historical query. Because users differ in what they select, the M most-clicked vertical websites can be determined statistically. Those M most-clicked vertical websites are taken as the vertical websites that match the historical query, which at the same time determines the screened historical queries corresponding to each of those websites. The screened historical queries are used as prompt data to establish the association between each vertical website and its corresponding prompt data.
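The click-based screening can be sketched as below, assuming a click log of (query, clicked site) pairs. The log contents, the parameter name `m`, and the function name `screen_queries` are illustrative assumptions.

```python
from collections import Counter, defaultdict

def screen_queries(click_log, m=2):
    """For each query keep the M most-clicked sites, then invert to site -> queries."""
    clicks_per_query = defaultdict(Counter)
    for query, site in click_log:
        clicks_per_query[query][site] += 1

    site_to_queries = defaultdict(set)
    for query, counts in clicks_per_query.items():
        # most_common(m) returns the m sites with the highest click counts.
        for site, _ in counts.most_common(m):
            site_to_queries[site].add(query)
    return {site: sorted(queries) for site, queries in site_to_queries.items()}

# Hypothetical click log: "hermes" was clicked 3, 2, and 1 times on three sites.
click_log = [
    ("hermes", "shop-a.example"), ("hermes", "shop-a.example"), ("hermes", "shop-a.example"),
    ("hermes", "shop-b.example"), ("hermes", "shop-b.example"),
    ("hermes", "news.example"),
    ("puzzle game", "games.example"),
]

screened = screen_queries(click_log, m=2)
print(screened)
```

With `m=2`, the rarely clicked `news.example` is dropped for the query "hermes", so that query does not become prompt data for a site users seldom chose.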
The embodiments of the present invention use the massive volume of user behavior in a search engine as the basis for judging high-quality prompt data, and establish an accurate many-to-many relationship between prompt data and the vertical websites on the web, so that prompt data can be provided when a vertical website is searched.
Step 203: determine the classification condition of each established industry category, and classify each vertical website on the web according to the classification conditions.
In the embodiments of the present invention, the industry category of some vertical websites can be learned from navigation websites and similar sources; these serve as the vertical websites of known industry category. For example, http://123.sogou.com/newtab/ divides the websites on the web into several established industry categories such as "novels", "web games", "film and TV", "video", and "music", and displays the label of each category in front of the vertical websites belonging to it, so that users can reach each vertical website by industry category. After capturing the pages of the navigation website, the server analyzes them to obtain the industry categories the navigation website contains and the known vertical websites under each category.
In addition, the descriptive information of each vertical website of known industry category, such as the website's feature text and summary information, describes the industry category of that website. It can therefore be used to determine the classification condition of the category, and the resulting classification conditions can then be used to classify the other vertical websites on the web. Because a vertical website's classification is not unique in the present invention, websites whose industry category is already known may also be reclassified in order to determine each website's category accurately; the embodiments of the present invention place no limitation on this.
Preferably, determining the classification condition of an established industry category comprises: determining the known vertical websites in each industry category by looking up an industry category list, and obtaining the feature corpus of each known vertical website; and performing model training on the feature corpora of the known vertical websites under the same industry category to determine the classification condition corresponding to that industry category.
In the embodiments of the present invention, a preset industry category list is used to determine the known vertical websites under each industry category; the industry category of every vertical website in this list is known. The feature corpus of each known vertical website can then be obtained. The feature corpus is data describing the characteristics of the vertical website. Model training is performed on the feature corpora under the same industry category, training a Latent Dirichlet Allocation (LDA) model by statistical and similar methods, and the classification condition corresponding to the industry category is then determined.
In the embodiment of the present invention, the feature corpus comprises at least the following three parts:
1) the text of the vertical website itself in each set industry category;
The search engine already stores the full text of each vertical website, so this text is easy to obtain and can serve as one of the corpora for vertical website clustering. For example, the title of a vertical website is the feature text that most concisely reflects its content; it is the website's own description of its function and can accurately describe the industry category to which the website belongs.
2) the website summary text computed by the search engine;
The summary information of a website is obtained by the search engine by aggregating the text of the website's pages with a complex algorithm. Its content is concise and reflects the core function of the website, so it can serve as one of the corpora for website clustering.
3) the information used when users access the website, and the like.
The information that users click in the search engine to enter a vertical website is also an important text feature of the website: it describes the website's function from the user's point of view. In addition, the click frequency of this information can be extracted to reflect the weight of the text, which benefits the clustering algorithm for vertical websites. This information can therefore also serve as one of the corpora for vertical website clustering.
In the embodiment of the present invention, the feature corpus comprises several types, and the different types can be trained separately. Performing model training on the feature corpora of the known vertical websites under the same industry category to determine the classification condition corresponding to the industry category comprises: performing model training on the feature corpus of each known vertical website under the same industry category to obtain at least one feature vector corresponding to each known vertical website; and using the at least one feature vector corresponding to each known vertical website as training data to determine the classification condition corresponding to the industry category.
Because the above three kinds of feature corpus come from different application scenarios, the LDA model is applied to each kind of corpus separately within each set category, yielding vectors that describe the features of a vertical website of unknown category with respect to each category. The LDA model is a topic model, also known as a three-layer Bayesian probability model, with a three-layer structure of words, topics and documents. Each document's distribution over topics is drawn from a Dirichlet distribution, and each topic generates words according to a multinomial distribution.
The classification model trained for each industry category gives the classification condition of that category, which is used to classify the other vertical websites of the whole network that are not included in navigation websites. With the above technical solution, industry category information is mined or crawled from vertical websites, so that several typical vertical websites under each set industry category can be determined. A Bayesian classification model can then be adopted, with the feature vectors of each industry category and of its representative vertical websites used as training data, to obtain the classification condition corresponding to each industry category and thereby classify the other vertical websites not included in navigation websites.
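The Bayesian classification step can be illustrated with a minimal sketch. It is not the patented implementation: for brevity it substitutes a multinomial naive Bayes classifier over raw word tokens for the LDA feature vectors described above, and the function names, training data and Laplace smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """labeled_docs: (industry_category, tokens) pairs for known vertical websites."""
    doc_counts = Counter()             # number of training sites per category
    word_counts = defaultdict(Counter) # token frequencies per category
    vocab = set()
    for category, tokens in labeled_docs:
        doc_counts[category] += 1
        word_counts[category].update(tokens)
        vocab.update(tokens)
    return doc_counts, word_counts, vocab

def classify(model, tokens):
    """Return the most probable industry category for an unclassified site."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category in doc_counts:
        score = math.log(doc_counts[category] / total_docs)  # log prior
        total_words = sum(word_counts[category].values())
        for tok in tokens:
            # Laplace (add-one) smoothing so unseen words do not zero the score
            score += math.log((word_counts[category][tok] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = category, score
    return best

# Hypothetical training corpus built from sites listed on a navigation page.
model = train_naive_bayes([
    ("shopping", ["price", "quotation", "cash", "delivery"]),
    ("shopping", ["price", "discount"]),
    ("recruitment", ["recruitment", "part-time", "job"]),
])
```

This sketch returns a single best category; since a vertical website may belong to several industry categories, a fuller version would presumably return every category whose score passes some cut-off.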
Further, the above feature corpora can also be used to improve the precision of the classification condition. Data such as the title, summary and clicks of a vertical website each reflect the website's text features from a different context, and feature vectors are computed independently for each type of feature corpus as described above. Each vertical website can therefore be classified against the classification condition of each industry category using each type of feature vector, and the resulting classification results can be combined by weighting to further improve the classification result of each vertical website.
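A minimal sketch of the weighted combination follows, assuming each feature type (title, summary, click) has already produced a per-category score for a site. The weights and scores are hypothetical; the text does not specify how they are chosen.

```python
def combine_scores(per_feature_scores, weights):
    """Weighted sum of per-category scores from several feature types."""
    combined = {}
    for feature, scores in per_feature_scores.items():
        w = weights.get(feature, 0.0)  # feature types without a weight are ignored
        for category, p in scores.items():
            combined[category] = combined.get(category, 0.0) + w * p
    return combined

# Hypothetical classifier outputs for one vertical website.
scores = combine_scores(
    {
        "title": {"shopping": 0.8, "recruitment": 0.2},
        "summary": {"shopping": 0.6, "recruitment": 0.4},
        "click": {"shopping": 0.9, "recruitment": 0.1},
    },
    {"title": 0.5, "summary": 0.2, "click": 0.3},
)
best_category = max(scores, key=scores.get)
```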
Preferably, the method further comprises a step of presetting the industry category list: obtaining network classification data, wherein the network classification data comprises at least one of the following: website navigation pages, yellow pages, text information of vertical websites, and anchor text; and determining the correspondence between industry categories and vertical websites according to the network classification data, and establishing the industry category list.
In the embodiment of the present invention, the industry category is the basis for sharing search prompt information between vertical websites. Depending on the granularity of division, the same vertical website may belong to several industry categories, so only accurate and comprehensive industry categories allow the search prompt service to be fully shared between vertical websites. The steps for establishing the industry category list are discussed in detail below:
1) Website navigation pages and yellow pages
Website navigation pages and yellow pages provide a directory service for Internet sites and are an entrance to the Internet other than the search engine. They usually include hundreds of commonly used websites, presented by industry category. These industry categories and their websites are normally compiled by professional editors and reflect the classification that most Internet users expect. The correspondence between industry categories and vertical websites can therefore be determined from website navigation pages and yellow pages, which provides a basis for the industry category list.
2) Text information of vertical websites
The text information of a vertical website includes, for example, its title. The title is the short text that most concisely reflects the function of the vertical website and usually contains its industry category information. By statistically analyzing the popular words or phrases in the titles of popular vertical websites, combined with manual annotation, the industry categories in the titles can be identified, and the correspondence between each industry category and the vertical website to which the title belongs can thereby be determined.
3) Anchor text
Anchor text is the text description that other websites provide when linking to a given vertical website, and it is more concise than the website's title. By statistically analyzing anchor texts or the phrases in them, combined with manual annotation, the industry category corresponding to an anchor text can be identified, and the correspondence between that industry category and the vertical website associated with the anchor text can thereby be determined.
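The statistical analysis of titles and anchor texts can be sketched as a simple term count whose top results are handed to a human annotator. The whitespace tokenization and the sample anchor texts below are assumptions; Chinese text would need a word segmenter first.

```python
from collections import Counter

def top_terms(texts, n=3):
    """Count terms across titles or anchor texts and return the n most frequent."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())  # naive whitespace tokenization
    return counts.most_common(n)

# Hypothetical anchor texts pointing at one vertical website.
anchors = ["luxury shopping mall", "luxury bags shopping", "luxury outlet"]
candidates = top_terms(anchors, 2)
```

Here the frequent terms "luxury" and "shopping" would be the candidates that an annotator maps to industry categories.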
Step 204: determine the industry category to which each vertical website belongs, and cluster the vertical websites by industry category.
Clustering determines the vertical websites of the whole network that fall under each industry category. Industry categories are not mutually exclusive, so a vertical website may correspond to several different industry categories.
Optionally, after the vertical websites are clustered by industry category in step 204, the method further comprises: for the vertical websites under the same industry category, calculating the similarity between the vertical websites, and determining the similar websites corresponding to each vertical website according to the similarity, so that each industry category is further subdivided.
After the feature vector corresponding to each vertical website has been calculated, the feature vectors can be used both for classification and for calculating the similarity of vertical websites of unknown category. The similarity between the vertical websites under the same industry category can therefore also be calculated, which determines the similarity relationships within the category and provides a basis for subsequently ranking the search prompt information shared among the subdivided vertical websites.
When similar websites are determined, the similarity between the selected vertical website and each other vertical website in the same industry category can be calculated. To reduce wasted resources, the embodiment of the present invention can also configure a similarity threshold, for example 50% or 70%. The vertical websites in the same industry category whose similarity exceeds the threshold are taken as similar websites, that is, as the subdivided websites within the category, and the prompt data jointly corresponding to the subdivided websites is obtained. When input is performed in the selected vertical website, the associated prompt data corresponding to the subdivided websites can be loaded according to the input characters, so that prompt data is shared between the selected vertical website and the related subdivided websites.
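The threshold-based selection of similar websites can be sketched as follows. The text does not name a similarity measure, so cosine similarity over the feature vectors is an assumption, as are the site names and vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similar_sites(selected, vectors, threshold=0.7):
    """Sites in the same industry category whose similarity exceeds the threshold."""
    base = vectors[selected]
    result = {}
    for site, vec in vectors.items():
        if site == selected:
            continue
        similarity = cosine(base, vec)
        if similarity > threshold:
            result[site] = similarity
    return result

# Hypothetical topic vectors for three sites in one industry category.
vectors = {
    "site_a": [0.9, 0.1, 0.0],
    "site_b": [0.8, 0.2, 0.0],
    "site_c": [0.0, 0.1, 0.9],
}
```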
Step 205: according to the association between each vertical website and its prompt data, obtain the associated prompt data corresponding to each industry category.
Step 206: when input is performed in the selected vertical website, load, according to the input characters, the associated prompt data corresponding to the industry category to which the selected vertical website belongs.
In a preferred embodiment of the present invention, step 206 comprises the following sub-steps: determining, according to the input characters, the prompt data corresponding to the industry category to which the selected vertical website belongs; and weighting that prompt data and loading the weighted prompt data.
The industry category to which the selected vertical website belongs is determined first, and the prompt data under that industry category is then determined according to the input characters. This yields both the prompt data of the selected vertical website and the prompt data of the other vertical websites in the same industry category.
When prompt data is shared, richer prompt data is obtained from the industry category to which the selected vertical website belongs, but the area of the search prompt box is limited. The prompt data must therefore be screened further to determine which prompt data is fed back to the user. During screening, the prompt data can be weighted according to preset weights, with a weight configured for each vertical website in the industry category. For example, when the selected vertical website itself has prompt data, the selected vertical website receives the largest weight, and the other vertical websites can be assigned weights according to their scale, traffic and the like.
For example, the weight can be determined by popularity, where popularity is determined by the volume of user searches for the prompt data. For each vertical website, the popularity of the prompt data on that website is counted and denoted u_site, and the popularity of the prompt data across the whole network is counted and denoted u_all. The two parameters u_site and u_all are the main basis for ranking prompt data in the search prompt service: a weighted sum of u_site and u_all gives a combined popularity u_sort that is used for ranking. The order of the prompt data is determined by the combined popularity, and the top N items of the ranking result, for example the top 10 or top 20, are loaded as the prompt data for the input characters, where N is a positive integer.
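The combined popularity can be sketched as u_sort = w_site * u_site + w_all * u_all followed by a top-N cut. The weight values, the normalization of popularity to the range 0 to 1, and the sample prompts are assumptions; the text only states that a weighted sum is used.

```python
def rank_prompts(candidates, w_site=0.7, w_all=0.3, top_n=10):
    """candidates: (prompt_text, u_site, u_all) tuples with popularity in [0, 1]."""
    scored = [(w_site * u_site + w_all * u_all, text)  # u_sort for each prompt
              for text, u_site, u_all in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_n]]

# Hypothetical prompt data matching the input characters.
candidates = [
    ("Ai Mashi belt", 0.2, 0.9),
    ("Ai Mashi perfume", 0.6, 0.7),
    ("Ai Mashi scarf", 0.1, 0.1),
]
top_two = rank_prompts(candidates, top_n=2)
```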
In addition, the similar websites corresponding to the selected vertical website were determined in the above embodiment. When search prompts are provided for the selected vertical website, prompt data can therefore be determined only for the selected vertical website and its similar websites within the industry category. When the selected vertical website itself has prompt data, prompt data is additionally determined only for the websites similar to it, which makes the determination of prompt data more targeted.
Therefore, in another preferred embodiment of the present invention, step 206 comprises the following sub-steps: determining, according to the input characters, the prompt data corresponding to the selected vertical website and the prompt data corresponding to the similar websites of the selected vertical website; and weighting the prompt data corresponding to the similar websites according to the similarity, ranking it together with the prompt data corresponding to the selected vertical website, and loading the result.
From the association between vertical websites and their prompt data, the prompt data of the selected vertical website and the prompt data of its similar websites can be obtained, together with the similarity of each similar website. The prompt data of the similar websites is weighted according to the similarity and ranked together with the prompt data of the selected vertical website to determine the prompt data loaded for the input characters.
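A minimal sketch of similarity-weighted merging is shown below. Summing the weighted popularity of a prompt that appears on several sites is an assumption, as are the data structures and values.

```python
def merge_prompts(own, similar, top_n=10):
    """own: {prompt: popularity} of the selected site.
    similar: list of (similarity, {prompt: popularity}) for its similar sites."""
    scores = dict(own)  # the selected site's own prompts keep full weight
    for similarity, prompts in similar:
        for text, popularity in prompts.items():
            scores[text] = scores.get(text, 0.0) + similarity * popularity
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [text for text, _ in ranked[:top_n]]

# Hypothetical prompt data for a selected site and two similar sites.
own = {"Ai Mashi belt": 10}
similar = [
    (0.9, {"Ai Mashi perfume": 20, "Ai Mashi belt": 5}),
    (0.5, {"Ai Mashi scarf": 8}),
]
merged = merge_prompts(own, similar, top_n=2)
```

When `own` is empty, the ranking falls back entirely to the similar websites, which corresponds to the case of a small site with no prompt data of its own.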
An industry category contains vertical websites with similar topics, and finer-grained similarity relationships exist between the vertical websites within the category. The embodiment of the present invention uses the LDA feature vectors of the websites to calculate the similarity between vertical websites, and uses it as the weighting basis for sharing the search prompt service within the industry category. When a vertical website has insufficient prompt data of its own, the prompt data of the most similar website within the industry category is preferentially selected for sharing, which improves the accuracy of the prompt data.
Step 207: load the prompt data.
The order of the operation steps is not strictly limited in this embodiment; for example, steps 203 and 204 may be performed first, followed by steps 201 and 202. The above order merely illustrates the method for determining prompt data and should not be construed as limiting the present invention.
In a specific implementation, the search prompt service can be provided to each vertical website through channels such as an input method or a browser, with prompt data shared between the websites within an industry category. When a user uses the search service on a vertical website, an accurate and comprehensive search prompt service can be provided according to the industry category to which the vertical website belongs.
In a preferred embodiment of the present invention, the method further comprises: identifying the search entrance of each vertical website, and obtaining the user's input characters through the search entrance.
In the embodiment of the present invention, identifying a search entrance mainly comprises the following sub-steps:
Step S301: identify the form structure of the input boxes contained in the web page.
The search entrances of most vertical websites exist in the page in the form of an input box.
Step S302: judge the purpose of the input box from the context around it.
For example, keywords attached to links such as "log in", "home page", "content" and "publish" indicate a non-search intent, whereas "search", "query" and the like clearly indicate a search-entrance intent, meaning that the corresponding input box is a search box.
Step S303: after a valid query has been entered in the input box, judge whether the resulting URL (Uniform Resource Locator) resembles the URL of a search site.
The judgment can be made, for example, by checking whether the URL contains the input word, or whether it uses "search" or similar information as a prefix.
Step S304: further verify the validity of the entrance using the structure of the results returned after a valid query is entered, the frequency with which the query word appears in them, and similar information.
The search entrance of each vertical website can be identified through the above steps.
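Steps S302 and S303 can be sketched with two small heuristics. The keyword lists, the function names and the exact URL checks are assumptions chosen for illustration; only Python's standard `urllib.parse` functions are used.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical keyword lists; a real system would use a larger, localized set.
SEARCH_HINTS = ("search", "query")
NON_SEARCH_HINTS = ("login", "log in", "home", "publish")

def looks_like_search_box(context_text):
    """S302: judge the purpose of an input box from the text around it."""
    text = context_text.lower()
    if any(hint in text for hint in NON_SEARCH_HINTS):
        return False
    return any(hint in text for hint in SEARCH_HINTS)

def url_reflects_query(result_url, query):
    """S303: does the URL produced by submitting `query` look like a search URL?"""
    parsed = urlparse(result_url)
    values = [v for vs in parse_qs(parsed.query).values() for v in vs]
    in_params = query in values  # the input word appears as a parameter value
    search_prefix = (parsed.path.strip("/").lower().startswith("search")
                     or parsed.netloc.lower().startswith("search."))
    return in_params or search_prefix
```

For instance, `url_reflects_query("http://example.com/search?q=belt", "belt")` is true, while a login URL that neither contains the query nor starts with "search" is rejected.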
In other embodiments of the present invention, the associated prompt data corresponding to the industry category to which the selected vertical website belongs can be loaded in the following ways:
(1) When the selected vertical website is loaded in a browser, the browser calls, through script code, the prompt data of the industry category of the selected vertical website corresponding to the input characters.
The browser loads the corresponding script code and judges from the loaded URL whether the page is a vertical website; if so, it obtains the search entrance of the vertical website and provides the search prompt service to the user. When input is performed in the selected vertical website and the query box receives each input character from the user, the associated prompt data corresponding to the industry category of the selected vertical website can be determined, so that search prompts based on the current input are provided to the user.
(2) When input is performed in the selected vertical website, the search engine of the selected vertical website calls, through a service interface, the prompt data of the industry category of the selected vertical website corresponding to the input characters.
The service interface of the search prompt service is embedded directly in the search entrance of each vertical website. When input is performed in the selected vertical website, the prompt data of the industry category of the selected vertical website corresponding to the input characters is called through the service interface.
(3) When characters are entered through an input method, the input method calls the prompt data of the industry category of the selected vertical website corresponding to the input characters.
When the search prompt service is provided through an input method, the input method can call code embedded in the search box of each vertical website, so that when characters are entered through the input method, the input method can call the prompt data of the industry category of the selected vertical website corresponding to the input characters.
In other embodiments, a mobile phone application with a search box can also load the associated prompt data corresponding to the industry category of the selected vertical website; that is, the method can also be applied on mobile devices.
Moreover, because the input method, browser and search engine can store a user's historical behavior data under the user's account, the determined prompt data can be screened further using the stored behavior data, and the screened prompt data is then fed back through the search entrance. For example, if the user has recently been interested in "perfume" products, then after "ai" is entered and the candidates "Ai Mashi" (Hermès), "Ai Mashi perfume" and "Ai Mashi belt" are to be prompted, "Ai Mashi perfume" can be shown first.
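The behavior-based screening can be sketched as a stable re-ranking that moves prompts matching the user's recent interests to the front. Representing interests as a set of terms and matching by substring are assumptions.

```python
def personalize(prompts, recent_interests):
    """Re-rank prompts so that those matching recent interests come first.
    The sort is stable, so the original order is kept among equal matches."""
    def hits(text):
        return sum(1 for term in recent_interests if term in text)
    return sorted(prompts, key=hits, reverse=True)

# Hypothetical candidates for the input "ai" and a user interested in perfume.
reranked = personalize(
    ["Ai Mashi", "Ai Mashi belt", "Ai Mashi perfume"],
    {"perfume"},
)
```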
The embodiment of the present invention shares the prompt data of the vertical websites under the same category, which improves input efficiency and saves small vertical websites the cost of building their own prompt data.
In summary, the embodiment of the present invention determines historical query words and the vertical website corresponding to each historical query word from the search engine's logs and similar sources, screens the historical query words using users' click data, and uses the historical query words as prompt data. This provides the data basis for the prompt data of each vertical website and ensures its accuracy.
Secondly, the feature corpus of the known vertical websites in each industry category is obtained, and the feature vector corresponding to each industry category is determined from the feature corpus. The classification condition of each industry category is thereby determined, each vertical website in the whole network is classified, and similar websites are determined, which provides a basis for the subsequent sharing of prompt data and ensures the accuracy of the data.
Furthermore, the prompt data corresponding to the industry category of the selected vertical website can be screened by weighted ranking, yielding more accurate, higher-quality prompt data. The prompt data corresponding to the industry category can also be weighted by similarity, providing the user with rich prompt data.
Embodiment 3
The method for determining prompt data is discussed below by way of example.
1. Establish the association between vertical websites and their prompt data.
Take the shopping websites Zhenpin (rendered literally as "treasure net") and Youzhong as examples. Every day the search engine logs record a large number of users querying luxury goods such as "Ai Mashi belt" (Hermès belt) and "Chanel bag" and choosing one of these websites from the search results to browse. By mining the search engine logs, the correspondence between these historical query words and vertical websites can be found, and by counting the occurrences of each query word, it is also possible to tell which queries are of high quality, that is, which coincide with the needs of most other users. From this data, the association between each of the two vertical websites and its prompt data can be established, so that an accurate search prompt service is provided for both websites.
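The log-mining step can be sketched as counting (query, clicked site) pairs and keeping only queries that recur often enough to count as high quality. The log format, the minimum count and the host names are hypothetical.

```python
from collections import Counter, defaultdict

def build_site_prompts(log_records, min_count=2):
    """log_records: (query, clicked_site) pairs from the search log.
    Returns {site: {query: count}} keeping only queries seen at least min_count times."""
    counts = defaultdict(Counter)
    for query, clicked_site in log_records:
        counts[clicked_site][query] += 1
    return {
        site: {q: c for q, c in query_counts.items() if c >= min_count}
        for site, query_counts in counts.items()
    }

# Hypothetical log entries; the third one is a rare misspelling that gets filtered out.
log = [
    ("Ai Mashi belt", "zhenpin.example"),
    ("Ai Mashi belt", "zhenpin.example"),
    ("Ai Mashi blet", "zhenpin.example"),
    ("Chanel bag", "youzhong.example"),
]
site_prompts = build_site_prompts(log)
```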
2. Classify and cluster the vertical websites.
A large amount of industry category information is easy to find on set navigation pages, and each industry category contains several typical websites that serve as samples. In this embodiment, the industry categories and the representative vertical websites they contain are used as the reference for classification, and the vertical websites of all sizes across the whole network are classified.
Take the industry category "shopping" as an example. The category "shopping" and the typical shopping websites it contains, such as Amazon and No. 1 Shop (Yihaodian), are easy to find on a navigation page. Using the industry category, its typical websites and the feature vector of each website calculated with the LDA model, feature training is performed to obtain a classification model. The other, unclassified vertical websites are then classified according to the classification model, giving all the websites contained in the industry category "shopping".
The text features of a website, such as its title, summary and query words (query), are the main basis for website classification. Before the classification model is trained, this scheme uses the LDA method to calculate a feature vector for each website from each of its text features (title, summary, query and so on). For example, the incoming queries of shopping websites usually contain keywords such as "so-and-so price", "so-and-so quotation" and "so-and-so cash on delivery", while the queries of recruitment websites usually contain keywords such as "so-and-so part-time", "so-and-so recruitment" and "so-and-so job hunting". By counting parameters such as the co-occurrence and transfer of these words, the LDA model normalizes semantically related features such as "price", "quotation" and "cash on delivery" into a one-dimensional numerical feature fx; in the same way, features such as "recruitment", "job hunting" and "part-time" are normalized into fy. A numerical feature vector feature_query (f1, f2, ..., fx, fy, ..., fn) can thus be obtained for any website to describe its topic distribution. The title feature vector feature_title and the summary feature vector feature_summary are calculated in the same way.
Then, according to the association between each vertical website and its prompt data, the associated prompt data corresponding to each industry category can be determined.
3. Provide prompt data for vertical websites.
1) Share search prompts through a browser or search engine.
Each vertical website usually belongs to one or several specific industry categories, and the search prompt service can be shared within an industry category. Take the luxury shopping category as an example. Zhenpin is a small-to-medium website with little traffic and does not itself provide a search prompt service; the Tmall luxury zone belongs to a large e-commerce website and itself provides prompt data for the luxury category.
Take search words such as "Ai Mashi belt" and "Chanel bag" as examples. Both are luxury goods queries that a large number of users submit to the search engine every day. In this embodiment, the query popularity in the search engine (that is, the whole-network popularity) is denoted u_search, the daily traffic that the search engine brings to the Tmall luxury zone is denoted u_tmall, and the traffic brought to Zhenpin is denoted u_zhenpin. For objective reasons such as website scale, Zhenpin receives far less luxury goods demand than the Tmall luxury zone; a query such as "Ai Mashi belt" may have very low or even zero popularity on Zhenpin. As a result, the search prompt data available for Zhenpin is sparse and covers little of the demand. In this embodiment, the search prompt data of the Tmall luxury zone is shared, as prompt data of the luxury category, with Zhenpin in the same industry category, which improves the coverage of Zhenpin's search prompt service and the efficiency of information search.
When the user visits Zhenpin in a browser and searches for "Ai Mashi belt", entering it character by character, the browser detects that the URL of Zhenpin is a URL for which prompt data is to be shared and obtains the user's input. Alternatively, JavaScript code for the search prompt service can be actively embedded in the search box of the vertical search engine in the Zhenpin page to capture the user's input and then request prompt data from the server.
For example, when the user enters "Ai" (the character for "love", the first character of "Ai Mashi"), the search prompt service is queried with the user's current input string and the URL of the current page. From the URL, the industry categories of the current site are judged to be "luxury shopping", "shopping" and so on. The prompt data suggestion1 of the website to which the current URL belongs is obtained, together with the prompt data suggestion_i1 of the other vertical websites under the same industry categories "luxury shopping" and "shopping". suggestion1 and suggestion_i1 are combined by weighting to generate the prompt service data suggestion_r1 for the industry category, which is shown to the user in the drop-down list of the search box. If the website of the current page is small and has no prompt data, suggestion1 is an empty set, and the prompt data for the input characters is shown in the drop-down list of the search box directly from suggestion_i1.
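The lookup described here can be sketched end to end: find the site's industry categories, collect prefix-matching prompts from the site itself (suggestion1) and from other sites sharing a category (suggestion_i1), and merge them by weight (suggestion_r1). The weights, host names and data are hypothetical, and prefix matching on strings stands in for matching on input characters.

```python
def suggest(prefix, site, site_prompts, site_categories,
            w_own=1.0, w_shared=0.5, top_n=5):
    """Return up to top_n prompts for `prefix` typed on `site`."""
    categories = site_categories.get(site, set())
    scores = {}
    for other, prompts in site_prompts.items():
        if other == site:
            weight = w_own      # the site's own prompt data
        elif categories & site_categories.get(other, set()):
            weight = w_shared   # sites sharing at least one industry category
        else:
            continue            # unrelated industry category, not shared
        for text, popularity in prompts.items():
            if text.startswith(prefix):
                scores[text] = scores.get(text, 0.0) + weight * popularity
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]

# Hypothetical data: a small site with no prompts of its own and a large one.
site_prompts = {
    "small.example": {},
    "big.example": {"Ai Mashi belt": 30, "Ai Mashi perfume": 50, "Chanel bag": 40},
    "jobs.example": {"Ai Mashi careers": 99},
}
site_categories = {
    "small.example": {"luxury shopping", "shopping"},
    "big.example": {"luxury shopping"},
    "jobs.example": {"recruitment"},
}
shown = suggest("Ai", "small.example", site_prompts, site_categories)
```

Because the small site's own prompt data is empty, everything shown comes from the large site in the same category, and the recruitment site contributes nothing.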
2) Share search prompts through an input method.
When the user visits Zhenpin through a browser client or another mobile browser and searches for "Ai Mashi belt", entering the Chinese characters one by one, the input method records the user's current input string and the string already committed to the screen as input data and sends them to the search prompt server to request prompt data.
For example, when the user enters "Ai" on Zhenpin, the input method sends the user's current input string and the URL of the current page to the server for the query. From the URL, the industry categories of the current site are judged to be "luxury shopping", "shopping" and so on. The prompt data suggestion2 of the website to which the current URL belongs is obtained, together with the prompt data suggestion_i2 of the similar websites under the same industry categories. suggestion2 and suggestion_i2 are combined by weighting to generate the combined prompt service data suggestion_r2, which is shown to the user in the drop-down list of the search box on Zhenpin.
In other embodiments, a mobile phone application with a search box has the functions of browsing information, searching or input, and is equivalent to a mobile browser, mobile search engine or mobile input method. Such an application can therefore also load, in the manner of a browser, search engine or input method, the associated prompt data corresponding to the industry category of the selected vertical website (the mobile phone application itself).
In addition, because the input method, browser and search engine can record each user's input habits and behavior data under the user's account, a more accurate search prompt service can be provided. For example, if the user has a frequently used word "Emma", then when the user enters "ai", "Ai Mashi" may be ranked lower or not shown at all. As another example, if mining the user's input habits shows that the user has been interested in "perfume" products for some time, then when the user enters "Ai Mashi", the server should show "Ai Mashi perfume" ahead of "Ai Mashi belt" to make the user's query easier.
In summary, the embodiment of the present invention captures, from search logs, the association between each vertical website in the whole network and its prompt data, thereby obtaining prompt data, and then determines the associated prompt data of each industry category according to the classification of vertical websites and the clustering by industry category. At search time, the associated prompt data of the industry category of the selected vertical website is obtained, so that prompt data is shared within the industry category; accurate prompt data is obtained and information search becomes more efficient.
The embodiment of the present invention determines historical query words and their corresponding vertical websites from the search engine's logs and similar sources, screens the historical query words using users' click data, and then uses the historical query words as prompt data. This provides the data basis for search prompts in vertical websites and ensures the accuracy of the prompt data.
Embodiment 4
Referring to Fig. 3, a structural diagram of the system for determining prompt data provided by Embodiment 4 of the present invention is shown.
Accordingly, the embodiment of the present invention also provides a system for determining prompt data, comprising: an analysis module 31, a division and clustering module 32, and a loading module 33.
The analysis module 31 is configured to analyze the search log recorded by the server and capture, for each vertical website on the entire web, the association between the vertical website and its corresponding prompt data;
The division and clustering module 32 is configured to divide the vertical websites on the entire web according to set industry categories, cluster the vertical websites by industry category, and, according to the association between each vertical website and its corresponding prompt data, obtain the prompt data associated with each industry category;
The loading module 33 is configured to, when input is performed in a selected vertical website, load, according to the input characters, the prompt data associated with the industry category to which the selected vertical website belongs.
In summary, the embodiment of the present invention can capture, from the search log, the association between each vertical website and its corresponding prompt data to obtain the prompt data, and then determine the prompt data associated with each industry category according to the division of the vertical websites and their clustering by industry category. At search time, the prompt data associated with the industry category of the selected vertical website can be obtained, so that the vertical websites within an industry category share prompt data. This yields rich, accurate, and diverse prompt data and improves the efficiency of information search.
Referring to Fig. 4, a preferred structural diagram of the device for determining prompt data provided by Embodiment 4 of the present invention is shown.
Preferably, the analysis module comprises: a capture submodule 311, configured to analyze the search log recorded by the server and capture the historical query words corresponding to each vertical website on the entire web; and an establishing submodule 312, configured to use the historical query words as prompt data and establish the association between each vertical website and its corresponding prompt data.
Preferably, the establishing submodule 312 is configured to screen the historical query words according to user click data in each vertical website, use the screened historical query words as prompt data, and establish the association between each vertical website and its corresponding prompt data.
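The capture and screening steps performed by submodules 311 and 312 can be illustrated with a minimal sketch. The record layout `(site, query, clicks)` and the `min_clicks` threshold are assumptions made for illustration; the description does not specify the log format or the screening rule.

```python
from collections import defaultdict


def build_site_prompts(log_records, min_clicks=1):
    """Map each vertical website to the historical query words users clicked through on.

    log_records: iterable of (site, query, clicks) tuples (assumed layout).
    min_clicks: hypothetical screening threshold on accumulated clicks.
    """
    clicks = defaultdict(lambda: defaultdict(int))
    for site, query, count in log_records:
        clicks[site][query] += count  # accumulate clicks per (site, query)

    # Keep only the query words that pass the click-based screening.
    return {
        site: {query: total for query, total in queries.items() if total >= min_clicks}
        for site, queries in clicks.items()
    }


records = [
    ("shopA", "perfume", 3),
    ("shopA", "asdf", 0),      # never clicked, so it is screened out
    ("shopB", "belt", 2),
    ("shopA", "perfume", 1),
]
site_prompts = build_site_prompts(records)
```

The accumulated click count is kept as a weight so that later stages can rank the prompt data.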
Preferably, the division and clustering module 32 comprises: a classification submodule 321, configured to determine the classification condition of each set industry category and divide the vertical websites on the entire web according to the classification conditions; and a clustering submodule 322, configured to determine the industry category to which each vertical website belongs and cluster the vertical websites by industry category.
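Once each vertical website has an industry category, clustering and pooling the prompt data per category can be sketched as follows. The dictionary shapes and the choice to sum weights when the same query word appears on several websites are assumptions for illustration only.

```python
from collections import defaultdict


def prompts_by_category(site_category, site_prompts):
    """Cluster websites by industry category and pool their prompt data.

    site_category: {site: industry category} (output of the classification step).
    site_prompts: {site: {prompt: weight}} (association built from the search log).
    Returns (clusters, pooled) where pooled is {category: {prompt: summed weight}}.
    """
    clusters = defaultdict(list)
    pooled = defaultdict(lambda: defaultdict(int))
    for site, category in site_category.items():
        clusters[category].append(site)
        for prompt, weight in site_prompts.get(site, {}).items():
            pooled[category][prompt] += weight  # shared across the category
    return dict(clusters), {cat: dict(prompts) for cat, prompts in pooled.items()}


clusters, pooled = prompts_by_category(
    {"shopA": "e-commerce", "shopB": "e-commerce", "flyC": "travel"},
    {"shopA": {"perfume": 4}, "shopB": {"perfume": 1, "belt": 2}},
)
```

A website with little traffic of its own, such as `shopB` here, then benefits from the query words recorded for the other websites in its category.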
Preferably, the classification submodule 321 comprises: a feature corpus acquiring unit, configured to determine the known vertical websites of each industry category by looking up an industry category list, and obtain the feature corpus of each known vertical website; and a classification condition determining unit, configured to perform model training on the feature corpora of the known vertical websites under the same industry category and determine the classification condition corresponding to that industry category.
Preferably, the classification condition determining unit comprises: a feature vector acquiring subunit, configured to perform model training on the feature corpora of the known vertical websites under the same industry category and obtain at least one feature vector corresponding to each known vertical website; and a classification condition determining subunit, configured to use the at least one feature vector corresponding to each known vertical website as training data and determine the classification condition of the industry category.
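The description does not name a specific model, so the following is only one possible reading of "feature vector" and "classification condition": bag-of-words vectors built from each known website's feature corpus, a summed term-count centroid per industry category, and cosine similarity as the decision rule. All function names are hypothetical.

```python
import math
from collections import Counter


def to_vector(text):
    """Turn a website's feature corpus into a bag-of-words feature vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(known_sites):
    """known_sites: {category: [feature corpus of each known website]}.

    Returns {category: centroid}, standing in for the classification condition.
    """
    centroids = {}
    for category, corpora in known_sites.items():
        total = Counter()
        for text in corpora:
            total.update(to_vector(text))
        centroids[category] = total
    return centroids


def classify(centroids, text):
    """Assign an unseen website to the category with the most similar centroid."""
    vector = to_vector(text)
    return max(centroids, key=lambda category: cosine(centroids[category], vector))


centroids = train({
    "travel": ["flight hotel ticket", "hotel booking"],
    "e-commerce": ["cart price discount", "buy price shipping"],
})
category = classify(centroids, "cheap hotel and flight deals")
```

Any supervised text classifier could take the place of the centroid rule; the point of the sketch is the flow from known websites to a per-category condition that is then applied to every other vertical website.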
Preferably, the loading module 33 comprises: a determining submodule 331, configured to determine, according to the input characters, the prompt data corresponding to the industry category to which the selected vertical website belongs; and a weighting and loading submodule 332, configured to weight the prompt data corresponding to the industry category to which the selected vertical website belongs and load the weighted prompt data.
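The determine-then-weight behavior of submodules 331 and 332 can be sketched as a prefix lookup followed by a weighted sort. Prefix matching, the optional `boosts` multipliers, and the `limit` parameter are assumptions; the description only states that the prompt data is weighted before loading.

```python
def load_prompts(category_prompts, prefix, boosts=None, limit=5):
    """Return up to `limit` prompts that start with `prefix`, highest weight first.

    category_prompts: {prompt: base weight} for the website's industry category.
    boosts: optional {prompt: multiplier}, a hypothetical weighting scheme.
    """
    boosts = boosts or {}
    scored = [
        (weight * boosts.get(prompt, 1.0), prompt)
        for prompt, weight in category_prompts.items()
        if prompt.startswith(prefix)
    ]
    # Sort by descending weight, then alphabetically for a stable result.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [prompt for _, prompt in scored[:limit]]


shown = load_prompts({"perfume": 5, "perfume gift set": 2, "belt": 9}, "per")
```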
Preferably, the system further comprises: a similar website determining module, configured to calculate, for the vertical websites under the same industry category, the similarity between the vertical websites, and determine the similar websites of each vertical website according to the similarity. The determining submodule 331 is configured to determine, according to the input characters, the prompt data corresponding to the selected vertical website and the prompt data corresponding to the similar websites of the selected vertical website, respectively. The weighting and loading submodule 332 is configured to weight the prompt data corresponding to the similar websites according to the similarity, and rank and load it together with the prompt data of the selected vertical website.
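A minimal sketch of the similarity-weighted joint ranking follows. Multiplying each similar website's prompt weight by its similarity and adding it to the selected website's own score is an assumed combination rule; the description does not fix the formula, and `merge_with_similar` is a hypothetical name.

```python
def merge_with_similar(own_prompts, similar_sites, prefix, limit=5):
    """Rank the selected website's prompts together with those of similar websites.

    own_prompts: {prompt: weight} of the selected vertical website.
    similar_sites: list of (similarity, {prompt: weight}) pairs, similarity in [0, 1].
    """
    scores = {p: float(w) for p, w in own_prompts.items() if p.startswith(prefix)}
    for similarity, prompts in similar_sites:
        for prompt, weight in prompts.items():
            if prompt.startswith(prefix):
                # Prompts from similar websites are discounted by the similarity.
                scores[prompt] = scores.get(prompt, 0.0) + similarity * weight
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [prompt for prompt, _ in ranked[:limit]]


merged = merge_with_similar(
    {"perfume": 3, "belt": 1},
    [(0.5, {"perfume set": 10, "perfume": 2}), (0.1, {"perfume oil": 10})],
    "perf",
)
```

A highly similar website can therefore push one of its popular query words above the selected website's own, while a weakly similar website contributes only low-ranked candidates.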
Preferably, the loading module 33 is configured so that: when characters are input through an input method, the input method calls the prompt data, corresponding to the input characters, of the industry category to which the selected vertical website belongs; or, when input is performed in the selected vertical website, the search engine of the selected vertical website calls that prompt data through a service interface; or, when the selected vertical website is loaded in a browser, the browser calls that prompt data through script code.
In summary, the embodiment of the present invention analyzes the search log to capture, for each vertical website on the entire web, the association between that vertical website and its corresponding prompt data, thereby obtaining the prompt data. It then divides the vertical websites according to industry category and clusters them by industry category to determine the prompt data associated with each industry category. At search time, the prompt data associated with the industry category of the selected vertical website is obtained, so that prompt data is shared within an industry category. This yields accurate prompt data and improves the efficiency of information search.
The embodiment of the present invention determines historical query words and their corresponding vertical websites from sources such as search engine logs, screens the historical query words using user click data, and then uses the screened historical query words as prompt data. This provides the data basis for search prompts in vertical websites and ensures the accuracy of the prompt data.
Because the system embodiment is substantially similar to the method embodiments, it is described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
The present invention can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present invention can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
Finally, it should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise" and "include", and any other variants thereof, are intended to cover non-exclusive inclusion, so that a process, method, article, or device that comprises a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of additional identical elements in the process, method, article, or device that comprises the element.
The method and system for determining prompt data provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (14)

1. A method for determining prompt data, characterized by comprising:
analyzing a search log recorded by a server, and capturing, for each vertical website on the entire web, the association between the vertical website and its corresponding prompt data;
dividing the vertical websites on the entire web according to set industry categories, clustering the vertical websites by industry category, and, according to the association between each vertical website and its corresponding prompt data, obtaining the prompt data associated with each industry category;
when input is performed in a selected vertical website, loading, according to the input characters, the prompt data associated with the industry category to which the selected vertical website belongs.
2. The method according to claim 1, characterized in that analyzing the search log recorded by the server and capturing, for each vertical website on the entire web, the association between the vertical website and its corresponding prompt data comprises:
analyzing the search log recorded by the server, and capturing the historical query words corresponding to each vertical website on the entire web;
using the historical query words as prompt data, and establishing the association between each vertical website and its corresponding prompt data.
3. The method according to claim 2, characterized in that using the historical query words as prompt data and establishing the association between each vertical website and its corresponding prompt data comprises:
screening the historical query words according to user click data in each vertical website, using the screened historical query words as prompt data, and establishing the association between each vertical website and its corresponding prompt data.
4. The method according to claim 1, characterized in that dividing the vertical websites on the entire web according to set industry categories and clustering the vertical websites by industry category comprises:
determining the classification condition of each set industry category, and dividing the vertical websites on the entire web according to the classification conditions;
determining the industry category to which each vertical website belongs, and clustering the vertical websites by industry category.
5. The method according to claim 4, characterized in that determining the classification condition of each set industry category comprises:
determining the known vertical websites in each industry category by looking up an industry category list, and obtaining the feature corpus of each known vertical website;
performing model training on the feature corpora of the known vertical websites under the same industry category, and determining the classification condition corresponding to that industry category.
6. The method according to claim 5, characterized in that performing model training on the feature corpora of the known vertical websites under the same industry category and determining the classification condition corresponding to that industry category comprises:
performing model training on the feature corpora of the known vertical websites under the same industry category, and obtaining at least one feature vector corresponding to each known vertical website;
using the at least one feature vector corresponding to each known vertical website as training data, and determining the classification condition of the industry category.
7. The method according to claim 1, characterized in that loading, according to the input characters, the prompt data associated with the industry category to which the selected vertical website belongs comprises:
determining, according to the input characters, the prompt data corresponding to the industry category to which the selected vertical website belongs;
weighting the prompt data corresponding to the industry category to which the selected vertical website belongs, and loading the weighted prompt data.
8. The method according to claim 1, characterized in that, after clustering the vertical websites by industry category, the method further comprises:
calculating, for the vertical websites under the same industry category, the similarity between the vertical websites, and determining the similar websites of each vertical website according to the similarity;
in which case loading, according to the input characters, the prompt data associated with the industry category to which the selected vertical website belongs comprises:
determining, according to the input characters, the prompt data corresponding to the selected vertical website and the prompt data corresponding to the similar websites of the selected vertical website, respectively;
weighting the prompt data corresponding to the similar websites according to the similarity, and ranking and loading it together with the prompt data corresponding to the selected vertical website.
9. The method according to claim 1, 7, or 8, characterized in that, when loading, according to the input characters, the prompt data associated with the industry category to which the selected vertical website belongs:
when characters are input through an input method, the input method calls the prompt data, corresponding to the input characters, of the industry category to which the selected vertical website belongs;
or,
when input is performed in the selected vertical website, the search engine of the selected vertical website calls, through a service interface, the prompt data, corresponding to the input characters, of the industry category to which the selected vertical website belongs;
or,
when the selected vertical website is loaded in a browser, the browser calls, through script code, the prompt data, corresponding to the input characters, of the industry category to which the selected vertical website belongs.
10. A system for determining prompt data, characterized by comprising:
an analysis module, configured to analyze a search log recorded by a server and capture, for each vertical website on the entire web, the association between the vertical website and its corresponding prompt data;
a division and clustering module, configured to divide the vertical websites on the entire web according to set industry categories, cluster the vertical websites by industry category, and, according to the association between each vertical website and its corresponding prompt data, obtain the prompt data associated with each industry category;
a loading module, configured to, when input is performed in a selected vertical website, load, according to the input characters, the prompt data associated with the industry category to which the selected vertical website belongs.
11. The system according to claim 10, characterized in that the analysis module comprises:
a capture submodule, configured to analyze the search log recorded by the server and capture the historical query words corresponding to each vertical website on the entire web;
an establishing submodule, configured to use the historical query words as prompt data and establish the association between each vertical website and its corresponding prompt data.
12. The system according to claim 10, characterized in that the division and clustering module comprises:
a classification submodule, configured to determine the classification condition of each set industry category and divide the vertical websites on the entire web according to the classification conditions;
a clustering submodule, configured to determine the industry category to which each vertical website belongs and cluster the vertical websites by industry category.
13. The system according to claim 10, characterized in that the loading module comprises:
a determining submodule, configured to determine, according to the input characters, the prompt data corresponding to the industry category to which the selected vertical website belongs;
a weighting and loading submodule, configured to weight the prompt data corresponding to the industry category to which the selected vertical website belongs and load the weighted prompt data.
14. The system according to claim 10, characterized by further comprising:
a similar website determining module, configured to calculate, for the vertical websites under the same industry category, the similarity between the vertical websites, and determine the similar websites of each vertical website according to the similarity;
the loading module comprising: a prompt data determining submodule, configured to determine, according to the input characters, the prompt data corresponding to the selected vertical website and the prompt data corresponding to the similar websites of the selected vertical website, respectively; and a weighting and loading submodule, configured to weight the prompt data corresponding to the similar websites according to the similarity, and rank and load it together with the prompt data of the selected vertical website.
CN201310342104.6A 2013-08-07 2013-08-07 Method and system for determining prompt data Active CN103425767B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310342104.6A CN103425767B (en) 2013-08-07 2013-08-07 Method and system for determining prompt data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310342104.6A CN103425767B (en) 2013-08-07 2013-08-07 Method and system for determining prompt data

Publications (2)

Publication Number Publication Date
CN103425767A (en) 2013-12-04
CN103425767B CN103425767B (en) 2016-07-27

Family

ID=49650506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310342104.6A Active CN103425767B (en) 2013-08-07 2013-08-07 Method and system for determining prompt data

Country Status (1)

Country Link
CN (1) CN103425767B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101241514A (en) * 2008-03-21 2008-08-13 北京搜狗科技发展有限公司 Method for creating error-correcting database, automatic error correcting method and system
US20080306830A1 (en) * 2007-06-07 2008-12-11 Cliquality, Llc System for rating quality of online visitors
CN101458713A (en) * 2008-12-29 2009-06-17 北京搜狗科技发展有限公司 Website classifying method and system
CN102651022A (en) * 2012-03-31 2012-08-29 奇智软件(北京)有限公司 Searching method and device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104331434A (en) * 2014-10-22 2015-02-04 乐视网信息技术(北京)股份有限公司 Method for generating search prompt word service and device for generating search prompt word service
CN107665220A (en) * 2016-07-29 2018-02-06 苏宁云商集团股份有限公司 A kind of processing method and system for searching service
CN110309253A (en) * 2018-03-01 2019-10-08 北京京东尚科信息技术有限公司 Selection method, apparatus and computer readable storage medium
CN108846037A (en) * 2018-05-29 2018-11-20 天津字节跳动科技有限公司 The method and apparatus of prompting search word
CN108846037B (en) * 2018-05-29 2021-12-10 天津字节跳动科技有限公司 Method and device for prompting search terms
CN113570404A (en) * 2021-06-30 2021-10-29 深圳市东信时代信息技术有限公司 Target user positioning method, device and related equipment
CN113570404B (en) * 2021-06-30 2023-12-05 深圳市东信时代信息技术有限公司 Target user positioning method, device and related equipment

Also Published As

Publication number Publication date
CN103425767B (en) 2016-07-27

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant