WO2015124024A1 - Method and device for promoting exposure rate of information, method and device for determining value of search word - Google Patents


Info

Publication number
WO2015124024A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
presentation
search
information
data
Prior art date
Application number
PCT/CN2014/094298
Other languages
French (fr)
Chinese (zh)
Inventor
王超
邓钦华
许晟
Original Assignee
北京奇虎科技有限公司 (Beijing Qihoo Technology Co., Ltd.)
奇智软件(北京)有限公司 (Qizhi Software (Beijing) Co., Ltd.)
Priority date
Filing date
Publication date
Priority claimed from CN201410063058.0A (CN104866493B)
Priority claimed from CN201410098737.1A (CN104933047B)
Application filed by 北京奇虎科技有限公司 (Beijing Qihoo Technology Co., Ltd.) and 奇智软件(北京)有限公司 (Qizhi Software (Beijing) Co., Ltd.)
Publication of WO2015124024A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising

Definitions

  • The present invention relates to the field of computer technology, and more particularly to a method and apparatus for increasing the exposure rate of information, and a method and apparatus for determining the value of a search term.
  • The present invention proposes a solution for improving the exposure rate of information.
  • In view of the shortcomings of existing solutions, the present invention mainly solves the following problem:
  • dynamically adjusting presentation opportunities, so that the information exposure rate is increased while keeping losses to a minimum.
  • The present invention provides a method of increasing the exposure rate of information, comprising: determining whether to perform exposure rate promotion processing for information related to a received query term; if so, checking whether the query frequency in the historical query data associated with the query term is greater than or equal to a first threshold; if so, determining candidate information based on historical presentation data related to the query term; estimating the presentation quality parameters of all candidate information based on the basic data of all candidate information and the historical presentation data of the information groups to which the candidate information belongs; and recommending the candidate information having the highest estimated presentation quality parameter, as recommendation information, to a candidate presentation queue for overall presentation contention processing, with the highest estimated presentation quality parameter used as the presentation quality parameter of the recommendation information.
  • The present invention also provides an apparatus for increasing the exposure rate of information, comprising: a first determining module, configured to determine whether to perform exposure rate promotion processing for undisplayed information related to a received query term; an inspection module, configured to check whether the query frequency in the historical query data associated with the query term is greater than or equal to a first threshold; a second determining module, configured to determine candidate information based on historical presentation data related to the query term; an estimation module, configured to estimate the presentation quality parameters of all candidate information based on the basic data of all candidate information and the historical presentation data of the information groups to which the candidate information belongs; and a recommendation module, configured to recommend the candidate information having the highest estimated presentation quality parameter, as recommendation information, to the candidate presentation queue for overall presentation contention processing, with the highest estimated presentation quality parameter used as the presentation quality parameter of the recommendation information.
  • A method for determining the value of a search term is also provided, comprising: inputting feature data of a search term to be tested into a value regression model; and acquiring value data of the search term to be tested based on the value regression model;
  • wherein the value regression model is obtained by: clustering existing search terms based on click relationship data and/or presentation relationship data to obtain clustered search term sets; classifying the search term sets into search term sets of different values; and performing model training using the search term sets of different values to obtain the value regression model.
  • An apparatus for determining the value of a search term is also provided, comprising: an input module, configured to input feature data of a search term to be tested into a value regression model; and an acquisition module, configured to acquire value data of the search term to be tested based on the value regression model; wherein the value regression model is obtained by the following modules: a clustering module, configured to cluster existing search terms based on click relationship data and/or presentation relationship data to obtain clustered search term sets; a classification module, configured to classify the search term sets into search term sets of different values; and a model acquisition module, configured to perform model training using the search term sets of different values to obtain the value regression model.
  • A computer program is also provided, comprising computer-readable code which, when run on an electronic device, causes the method for increasing the exposure rate of information and the method for determining the value of a search term to be implemented.
  • A computer-readable medium storing the computer program described above is also provided.
  • The method and apparatus for increasing the exposure rate of information according to the present invention have the following beneficial effects: presentation opportunities are adjusted dynamically according to historical presentation data, the information exposure rate is increased while keeping losses to a minimum, more information is given the opportunity to be presented per unit of time, and the user experience is improved.
  • In addition, the value of a search term can be determined more accurately, and valuable data information (such as advertisements) can be selected based on the search term value data, improving the user experience, the information click rate and the information exposure rate.
  • FIG. 1 is a flow chart of a method of increasing the exposure of information according to an embodiment of the present invention.
  • FIG. 2 is a structural diagram of an apparatus for increasing the exposure of information according to an embodiment of the present invention.
  • FIG. 3 shows a flow chart of a method of obtaining a value regression model according to an embodiment of the present invention.
  • FIG. 4 shows a flow chart of a method of determining the value of a search term according to an embodiment of the present invention.
  • FIG. 5 is a block diagram of an apparatus for determining the value of a search term according to an embodiment of the present invention.
  • FIG. 6 shows a block diagram of an electronic device for performing the method according to the present invention.
  • FIG. 7 shows a schematic diagram of a memory unit for holding or carrying program code implementing the method according to the present invention.
  • FIG. 1 is a flowchart of a method of increasing exposure of information according to an embodiment of the present invention.
  • In step S110, it is determined whether to perform exposure rate promotion processing for the undisplayed information related to the received query term.
  • The method first determines whether to perform exposure rate promotion processing for the undisplayed information related to the received query term. The processing could be performed for every query request, but that is less efficient: it may give too many presentation opportunities to information whose expected results are poor, reducing the efficiency of the whole system. Therefore, a determination can be made for each query request so that the proportion of requests undergoing exposure rate promotion processing stays within a range; for example, the ratio of query requests selected for the processing to all query requests is kept at no more than 5%. This ratio can be adjusted as needed.
  • The server has a historical query database storing historical query data. The historical query data in the database provides the historical request information of each query term, from which the adjustment parameter used to decide whether to perform exposure rate promotion processing for the undisplayed information related to the query term can be obtained.
  • Specifically, step S110 may include: acquiring an adjustment parameter based on the historical request data related to the query term; and determining, based on the adjustment parameter and a random number generated by the system, whether to perform exposure rate promotion processing for the undisplayed information related to the received query term.
  • First, the adjustment parameter is acquired based on the historical query data related to the query term.
  • The baseline of the adjustment parameter can be set to 1.0. Starting from this baseline, it is adjusted according to the historical request data (which may include, but is not limited to, the query frequency and the click rate), for example, but not limited to, using the following formula:
  • Formula 1: $\text{adjustment parameter} = 1.0 + \alpha \cdot \text{click rate} + \beta \cdot \log(\gamma / \text{frequency})$
  • Then, based on the adjustment parameter and the random number generated by the system, it is determined whether exposure rate promotion processing is performed for the undisplayed information related to the received query term.
  • The random number can be generated from a uniform distribution (for example, for a target ratio of 5%, a uniform distribution with parameter 20), so that the final result satisfies the target ratio (5%).
  • Random numbers can also be generated in other ways.
  • $\text{determination threshold} = \text{target ratio} \times \text{adjustment parameter}$
  • When the determination threshold is greater than or equal to 5%, it is determined that exposure rate promotion processing is performed for the undisplayed information related to the received query term; when the determination threshold is less than 5%, it is determined that exposure rate promotion processing is not performed for that information. Other thresholds may be selected as desired; the method is not limited to the specific value above.
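The decision in step S110 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the values of alpha, beta and gamma are hypothetical (the text gives none), the logarithm is taken as natural, and because the text does not spell out exactly where the random number enters the comparison, the sketch assumes one plausible reading: draw u uniformly from [0, 1) and select the query when u < target ratio × adjustment parameter. With an adjustment parameter of 1.0 this selects about 5% of queries, matching the target ratio.

```python
import math
import random

# Hypothetical constants: Formula 1 names alpha, beta and gamma, but the
# text gives no values for them.
ALPHA = 1.0
BETA = 0.1
GAMMA = 10.0
TARGET_RATIO = 0.05  # example target share of queries selected (5%)


def adjustment_parameter(click_rate: float, frequency: float,
                         alpha: float = ALPHA, beta: float = BETA,
                         gamma: float = GAMMA) -> float:
    """Formula 1: start from the 1.0 baseline, then adjust by click rate
    and query frequency (natural log assumed)."""
    if frequency <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 + alpha * click_rate + beta * math.log(gamma / frequency)


def should_promote(adjustment: float, u: float,
                   target_ratio: float = TARGET_RATIO) -> bool:
    """Assumed rule: select the query when the uniform draw u in [0, 1)
    falls below target_ratio * adjustment."""
    return u < target_ratio * adjustment


def decide(click_rate: float, frequency: float, rng: random.Random) -> bool:
    """Step S110 sketch: one adjustment parameter plus one random draw."""
    return should_promote(adjustment_parameter(click_rate, frequency),
                          rng.random())
```

Under these assumptions, a higher click rate or a lower query frequency raises the adjustment parameter and therefore the chance that a query is selected; queries whose frequency exceeds gamma are selected less often than the target ratio.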
  • The information to be presented may be advertising information; that is, it may be determined whether to perform exposure rate promotion processing for the undisplayed advertising information related to the received query term. Specifically, an adjustment parameter is first obtained based on the historical request data related to the query term; then, based on the adjustment parameter and the random number generated by the system, it is determined whether to perform exposure rate promotion processing for that undisplayed advertising information.
  • The information may include at least one of the following: information whose number of presentations is below a predetermined value, information for a predetermined region, and information for a predetermined time period.
  • For example, the information may be advertising information presented fewer than 10 times, advertising information for Beijing, and the like. The information of the present invention may also be other types of information.
  • If it is determined in step S110 that exposure rate promotion processing is to be performed for the undisplayed information associated with the received query term, then in step S120 it is checked whether the query frequency in the historical query data associated with the query term is greater than or equal to the first threshold.
  • The first threshold can be, for example, two queries per hour.
  • That is, the query frequency in the historical query data associated with the query term is checked. If the query frequency is high, for example, greater than or equal to two per hour, the method proceeds to step S130; if it is low, for example, less than two per hour, the method ends.
  • The first threshold is not limited to the above value; any suitable value may be selected as needed.
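Step S120 gates the rest of the process on query frequency. A minimal sketch, assuming the historical query data can be reduced to a query count over an observation window of known length; the window and the function names are hypothetical:

```python
FIRST_THRESHOLD_PER_HOUR = 2.0  # example value of the first threshold


def query_frequency_per_hour(query_count: int, window_hours: float) -> float:
    """Average queries per hour for one query term over an observation
    window taken from the historical query data (window is an assumption)."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return query_count / window_hours


def passes_frequency_check(query_count: int, window_hours: float,
                           threshold: float = FIRST_THRESHOLD_PER_HOUR) -> bool:
    """Step S120: continue to candidate selection only if the query
    frequency is greater than or equal to the first threshold."""
    return query_frequency_per_hour(query_count, window_hours) >= threshold
```

For instance, 48 queries over 24 hours give exactly two per hour and pass the check; 47 queries over the same window do not.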
  • In step S130, candidate information is determined based on the historical presentation data associated with the query term.
  • Specifically, the historical presentation data related to the query term is searched for information whose daily presentation count is less than a second threshold.
  • The second threshold can be 10; that is, historical presentation data associated with the query term showing fewer than 10 presentations per day is searched for, and the information corresponding to the found historical presentation data is determined as candidate information.
  • Alternatively, historical presentation data whose presentation count at another time granularity is below a corresponding threshold may be used, for example, fewer than 70 presentations per week (seven days), or fewer than 25 presentations per 60 hours.
  • For example, if the daily presentation counts of the information IDs "A1234123", "A1231312" and "A1343141" are below the second threshold (for example, 10), these three pieces of information are determined as candidate information.
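Candidate selection in step S130 reduces to a filter over per-information presentation counts. A minimal sketch, assuming the historical presentation data has already been aggregated into a mapping from information ID to daily presentation count for this query term (a hypothetical data layout):

```python
from typing import Dict, List

SECOND_THRESHOLD_PER_DAY = 10  # example value of the second threshold


def select_candidates(presentations_per_day: Dict[str, int],
                      threshold: int = SECOND_THRESHOLD_PER_DAY) -> List[str]:
    """Step S130: keep the information IDs whose presentation count for
    this query term is below the threshold, preserving input order."""
    return [info_id for info_id, count in presentations_per_day.items()
            if count < threshold]
```

The same filter covers the other time granularities mentioned: pass weekly counts with a threshold of 70, or 60-hour counts with a threshold of 25.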
  • The presentation quality parameters of all candidate information are estimated based on the basic data of all candidate information and the historical presentation data of the information groups to which the candidate information belongs.
  • For example, the information IDs "A1234123" and "A1231312" belong to the information group "G111223", and the information ID "A1343141" belongs to another information group "G222121".
  • The basic data of the three candidates is queried, together with the historical presentation data of the other information IDs in the two groups to which they belong; the historical presentation data of each information ID includes its presentation quality parameter. The highest presentation quality parameter in each group is used as the estimated presentation quality parameter of the candidate information in that group.
  • Thus, the highest presentation quality parameter a in the information group "G111223" (the presentation quality parameter of information A) is taken as the estimated presentation quality parameter of the candidates "A1234123" and "A1231312".
  • Likewise, the highest presentation quality parameter b in the information group "G222121" (the presentation quality parameter of information b) is taken as the estimated presentation quality parameter of the candidate "A1343141".
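The group-based estimate can be sketched as follows. This is a minimal sketch, assuming each information ID maps to exactly one group and that historical presentation quality parameters are available as a dictionary; both dictionary layouts are hypothetical:

```python
from typing import Dict, Iterable, Optional


def estimate_quality(candidates: Iterable[str],
                     group_of: Dict[str, str],
                     historical_quality: Dict[str, float]
                     ) -> Dict[str, Optional[float]]:
    """Estimate each candidate's presentation quality parameter as the
    highest historical quality parameter among the information in its group."""
    # Highest historical quality parameter seen in each group.
    best_in_group: Dict[str, float] = {}
    for info_id, quality in historical_quality.items():
        group = group_of.get(info_id)
        if group is None:
            continue  # no known group: cannot contribute to an estimate
        if group not in best_in_group or quality > best_in_group[group]:
            best_in_group[group] = quality
    # Candidates without a group, or whose group has no history, get None.
    estimates: Dict[str, Optional[float]] = {}
    for c in candidates:
        group = group_of.get(c)
        estimates[c] = best_in_group.get(group) if group is not None else None
    return estimates
```

For the example above, the two candidates in group "G111223" both receive that group's highest parameter a, and "A1343141" receives parameter b from group "G222121".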
  • The candidate information having the highest estimated presentation quality parameter is recommended as recommendation information to the candidate presentation queue, to take part in overall presentation competition processing with the highest estimated presentation quality parameter used as the presentation quality parameter of the recommendation information.
  • In the example above, the candidate information having the highest estimated presentation quality parameter (parameter a) is recommended as recommendation information into the candidate presentation queue.
  • That is, the information IDs "A1234123" and "A1231312", whose estimated presentation quality parameter is a, can be recommended into the candidate presentation queue.
  • Alternatively, only one of several such candidates may be recommended to the candidate presentation queue at a time, by polling.
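Recommendation into the candidate presentation queue, including the polling variant, might look like this. A minimal sketch: the queue representation (a list of (ID, quality) pairs) and the class and function names are assumptions for illustration:

```python
from itertools import cycle
from typing import Dict, Iterator, List, Tuple


def top_candidates(estimates: Dict[str, float]) -> Tuple[List[str], float]:
    """Return the candidates sharing the highest estimated quality
    parameter, together with that parameter."""
    best = max(estimates.values())
    return [c for c, q in estimates.items() if q == best], best


class PollingRecommender:
    """Recommend one of the top candidates per call, in round-robin order."""

    def __init__(self, estimates: Dict[str, float]) -> None:
        if not estimates:
            raise ValueError("no candidates to recommend")
        ids, self.quality = top_candidates(estimates)
        self._order: Iterator[str] = cycle(ids)

    def recommend(self, queue: List[Tuple[str, float]]) -> str:
        """Append the next candidate to the candidate presentation queue,
        carrying the estimated highest quality parameter as its own."""
        info_id = next(self._order)
        queue.append((info_id, self.quality))
        return info_id
```

Each call to recommend adds one candidate, so with the example above the two IDs sharing parameter a alternate across successive requests.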
  • The overall presentation competition process selects results from the candidates based on a ranking.
  • Taking search advertising as an example, the advertisement information entering the overall competition processing is first scored and sorted according to a predetermined rule, for example, by score from high to low, where
  • $\text{sort score} = \text{ad creative quality} \times \text{keyword bid price}$
  • The top-ranked advertising information is then presented. Not all ads are shown, however; search ads, for example, generally have 3 slots on the left and 8 on the right.
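The overall competition can be sketched as a sort on the score given above. A minimal sketch, assuming that a recommended item enters the competition with its estimated presentation quality parameter in place of the creative quality, and using the 3 + 8 = 11 slot layout as a default; the type and function names are hypothetical:

```python
from typing import List, NamedTuple


class Ad(NamedTuple):
    ad_id: str
    creative_quality: float  # for recommended info: its estimated quality parameter
    bid_price: float         # keyword bid price


def run_competition(ads: List[Ad], slots: int = 11) -> List[str]:
    """Score each ad as creative quality x keyword bid price, sort from
    high to low, and keep as many ads as there are slots."""
    ranked = sorted(ads, key=lambda ad: ad.creative_quality * ad.bid_price,
                    reverse=True)
    return [ad.ad_id for ad in ranked[:slots]]
```

Because a recommended item competes on the same score as every other ad, the exposure boost only gives it a chance to be ranked; it is shown only if its score places it within the available slots.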
  • FIG. 2 is a structural diagram of an apparatus 200 for increasing the exposure rate of information according to an embodiment of the present invention.
  • The apparatus 200 may include a first determining module 210, a checking module 220, a second determining module 230, an estimation module 240 and a recommendation module 250.
  • The first determining module 210 may be configured to determine whether to perform exposure rate promotion processing for the undisplayed information related to the received query term.
  • The first determining module 210 may further include an obtaining sub-module 211 and a first determining sub-module 212.
  • The obtaining sub-module 211 can be configured to obtain an adjustment parameter based on the historical query data related to the query term, and the first determining sub-module 212 can be configured to determine, based on the adjustment parameter and the random number generated by the system, whether to perform exposure rate promotion processing for the undisplayed information related to the received query term.
  • The checking module 220 can be configured to check whether the query frequency of the historical query data associated with the query term is greater than or equal to a first threshold.
  • The checking module 220 may be further configured to abandon exposure rate promotion processing if the query frequency of the historical query data associated with the query term is less than the first threshold.
  • The first threshold can be, for example, two queries per hour.
  • The second determining module 230 can be configured to determine candidate information based on historical presentation data related to the query term.
  • The second determining module 230 may further include a searching sub-module 231 and a second determining sub-module 232.
  • The searching sub-module 231 can be configured to search for historical presentation data associated with the query term whose daily presentation count is less than a second threshold, and the second determining sub-module 232 can be configured to determine the information corresponding to the found historical presentation data as candidate information.
  • The estimation module 240 can be configured to estimate the presentation quality parameters of all candidate information based on the basic data of all candidate information and the historical presentation data of the information groups to which the candidate information belongs.
  • The recommendation module 250 may be configured to recommend the candidate information having the highest estimated presentation quality parameter, as recommendation information, to the candidate presentation queue for overall presentation contention processing, with the highest estimated presentation quality parameter used as the presentation quality parameter of the recommendation information.
  • The apparatus 200 may further include a presentation quality parameter determination module (not shown), which may be configured to: if the recommendation information is presented as a result of the overall presentation competition processing, determine the presentation quality parameter obtained during that presentation as the initial presentation quality parameter of the recommendation information.
  • In an existing implementation, the value of a search term is determined mainly by the following steps:
  • Step 1: count the number of ad presentations and ad clicks for every search term in the ad presentation log;
  • Step 3: if the click rate of a search term is less than a threshold and its number of ad presentations is greater than a threshold, the search term is of low value; conversely, if its click rate is greater than a threshold and its number of ad presentations is greater than a threshold, the search term is of high value.
  • For example, let the click-rate threshold be 5% and the presentation threshold be 50. The search term "prose of the sunset" has 100 ad presentations and 1 click (a click rate of 1%), so it is of low value.
  • The search term "laptop" has 10,000 ad presentations and 1,000 clicks (a click rate of 10%), so it is of high value.
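The existing threshold rule can be expressed in a few lines, which also makes its limitation visible: it returns only a coarse label, or nothing, rather than a graded value. The thresholds are the example values from the text; the function name is hypothetical:

```python
from typing import Optional

CTR_THRESHOLD = 0.05         # example click-rate threshold (5%)
PRESENTATION_THRESHOLD = 50  # example presentation-count threshold


def baseline_value(presentations: int, clicks: int) -> Optional[str]:
    """Threshold rule of the existing approach: 'low', 'high', or None when
    the rule does not decide (too few presentations, or CTR at threshold)."""
    if presentations <= PRESENTATION_THRESHOLD:
        return None
    ctr = clicks / presentations
    if ctr < CTR_THRESHOLD:
        return "low"
    if ctr > CTR_THRESHOLD:
        return "high"
    return None
```

For the examples above, baseline_value(100, 1) returns "low" and baseline_value(10000, 1000) returns "high"; a term with only 20 presentations gets no label at all.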
  • This implementation has several drawbacks. The click-rate threshold and the presentation threshold must be specified manually, so the result depends heavily on the operator's experience; it can only judge whether a value is high or low and cannot give a specific value, which is not smooth enough in practical applications; and because it relies mainly on statistics, it generalizes poorly, its coverage is relatively low and its accuracy leaves room for improvement, so it cannot fully meet the needs of a search advertising system.
  • FIG. 3 is a flow chart of a method of obtaining a value regression model in accordance with one embodiment of the present invention.
  • First, the existing search terms are clustered based on the click relationship data and/or the presentation relationship data to obtain clustered search term sets.
  • Specifically, the number of common presentations of different search terms can be obtained, and the presentation relationship data calculated from it.
  • For example, suppose a search term Q1 is entered and the data presented by the search engine for it are D1, D2, D3, D4, and another search term Q2 is entered and the data presented for it are D2, D3, D5, D7.
  • Their number of common presentations is then 2 (D2, D3). A correlation can be used to describe the relationship between Q1 and Q2; for example, the correlation can be taken as the number of common presentations / the number of presentations of Q1 (here 2 / 4 = 0.5).
  • The correlation may also be defined as the number of common presentations / the number of presentations of Q2, or as the number of common presentations / (the number of presentations of Q1 + the number of presentations of Q2), and so on.
  • In this way, the presentation relationship data between search terms can be obtained.
  • Similarly, suppose the data presented by the search engine for a search term Q1 and clicked by the user are D1, D2, D3, D4, and for another search term Q2 they are D2, D3, D4, D7.
  • Their number of common clicks is then 3 (D2, D3, D4), and a correlation can be used to describe the relationship between Q1 and Q2.
  • The correlation may be defined as the number of common clicks / the number of clicks of Q2, or as the number of common clicks / (the number of clicks of Q1 + the number of clicks of Q2), and so on.
  • In this way, the click relationship data between search terms can be obtained.
  • Here, the number of common clicks, the number of common presentations, the click relationship data and the presentation relationship data are all quantities defined between two search terms; that is, they are pairwise correlation parameters.
  • The clustering distance may be calculated based on at least one of the click relationship data, the presentation relationship data, the number of common presentations and the number of common clicks.
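The pairwise relationship data can be computed from the sets of presented (or clicked) items per search term. A minimal sketch, assuming each search term's presentation or click data is available as a set of item IDs; the three modes mirror the normalizations in the text (by Q1's count, by Q2's count, or by their sum), and the function names are hypothetical:

```python
from typing import Set


def common_count(a: Set[str], b: Set[str]) -> int:
    """Number of data items presented (or clicked) for both search terms."""
    return len(a & b)


def correlation(a: Set[str], b: Set[str], mode: str = "q1") -> float:
    """Pairwise relationship data: common / |Q1| ('q1'), common / |Q2|
    ('q2'), or common / (|Q1| + |Q2|) ('sum'); 0.0 if the denominator is 0."""
    denominators = {"q1": len(a), "q2": len(b), "sum": len(a) + len(b)}
    if mode not in denominators:
        raise ValueError(f"unknown mode: {mode}")
    denom = denominators[mode]
    return common_count(a, b) / denom if denom else 0.0
```

For the presentation example, common_count is 2 and the Q1-normalized correlation is 0.5; for the click example, common_count is 3 and the Q2-normalized correlation is 0.75.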
  • For example, the presentation data of Q1 is expressed as <D1, D2, D3, D4> and the presentation data of Q2 as <D2, D3, D5, D7>.
  • A clustering algorithm is used to calculate the clustering distance between the search terms Q1 and Q2.
  • By the same method, the clustering distances between all search terms are calculated, thereby clustering the search terms.
  • Specifically, the clustering distance between search terms may be calculated based on at least one of the click relationship data, the presentation relationship data, the number of common clicks and the number of common presentations, using a spectral clustering or k-means clustering algorithm, thereby clustering the search terms and obtaining clustered search term sets.
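The text names spectral clustering or k-means for this step. To stay dependency-free, the sketch below swaps in a simpler technique, single-linkage grouping at a distance cut-off (implemented with union-find), and uses 1 minus the Dice coefficient (2 × common / (|Q1| + |Q2|), i.e. twice the common / (Q1 + Q2) ratio in the text) as an assumed clustering distance. It illustrates turning pairwise relationship data into clusters; it is not the spectral or k-means procedure itself, and the cut-off is a hypothetical tuning value:

```python
from itertools import combinations
from typing import Dict, List, Set


def dice_similarity(a: Set[str], b: Set[str]) -> float:
    """2 * common / (|A| + |B|): 1.0 for identical sets, 0.0 for disjoint."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def cluster_terms(data: Dict[str, Set[str]],
                  max_distance: float) -> List[Set[str]]:
    """Single-linkage grouping: terms whose distance (1 - Dice) is at most
    max_distance end up in the same cluster, transitively."""
    parent = {term: term for term in data}

    def find(term: str) -> str:
        # Union-find root lookup with path halving.
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for a, b in combinations(data, 2):
        if 1.0 - dice_similarity(data[a], data[b]) <= max_distance:
            parent[find(a)] = find(b)

    clusters: Dict[str, Set[str]] = {}
    for term in data:
        clusters.setdefault(find(term), set()).add(term)
    return list(clusters.values())
```

With a spectral clustering implementation, the same pairwise similarities would instead be supplied as a precomputed affinity matrix.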
  • Next, the clustered search term sets are classified into search term sets of different values.
  • All sets can be classified into a predetermined number of search term sets.
  • For example, the sets may be classified into three categories: a high-value search term set, a medium-value search term set and a low-value search term set, wherein the value data of the search terms in the high-value set is greater than that of the search terms in the medium-value set, and the value data of the search terms in the medium-value set is greater than that of the search terms in the low-value set.
  • All search term sets are classified into the predetermined number of sets according to certain rules.
  • The value data of the search terms has been predetermined using log data.
  • The value of a search term can be measured by the value it brings per thousand searches, which reflects the profitability of the search term per unit of search, that is, its value.
  • In this way, the value data of each search term can be obtained, and each search term is assigned to one of, for example, three levels (high, medium, low) according to the distribution of the value data.
  • Based on this, the aggregated value data of each clustered search term set can be obtained, and the clustered search term sets can be assigned to search term sets of different values.
  • Search terms can also be divided into more or fewer levels, and the search term sets can likewise be divided into more or fewer levels.
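Labeling the clustered sets can be sketched as below. A minimal sketch, assuming value is measured as revenue per thousand searches, that a set's value is computed by pooling the revenue and search counts of its terms, and that the two cut-offs separating high, medium and low are hypothetical tuning values (the text specifies none):

```python
from typing import Dict, Iterable, Tuple

# Hypothetical cut-offs on value per thousand searches; the text gives none.
HIGH_CUTOFF = 20.0
LOW_CUTOFF = 5.0


def value_per_thousand(revenue: float, searches: int) -> float:
    """Value a search term (or set of terms) brings per 1,000 searches."""
    if searches <= 0:
        raise ValueError("searches must be positive")
    return 1000.0 * revenue / searches


def set_value(terms: Iterable[str],
              stats: Dict[str, Tuple[float, int]]) -> float:
    """Aggregate a clustered set by pooling the (revenue, searches) of its
    terms; this is one possible aggregation, not one fixed by the text."""
    terms = list(terms)  # iterate twice safely, even if given a generator
    total_revenue = sum(stats[t][0] for t in terms)
    total_searches = sum(stats[t][1] for t in terms)
    return value_per_thousand(total_revenue, total_searches)


def value_label(value: float, high: float = HIGH_CUTOFF,
                low: float = LOW_CUTOFF) -> float:
    """Map aggregated value to the high/medium/low labels 1, 0.5 and 0."""
    if value >= high:
        return 1.0
    if value >= low:
        return 0.5
    return 0.0
```

Pooling weights each term by its search volume, so a single rare but lucrative term does not dominate its cluster's label; a simple mean of per-term values would be an equally valid alternative.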
  • Finally, model training is performed using the search term sets of different values, yielding the value regression model.
  • Each search term in each search term set can be used as a sample whose label is the value data of that set. Taking the above example, each search term in the high-value set is used as a sample labeled 1, each search term in the medium-value set as a sample labeled 0.5, and each search term in the low-value set as a sample labeled 0; a logistic regression algorithm is trained on these samples to form the value regression model.
  • For example, the search terms in cluster 1 are "laptop", "mac air", "thinkpad", etc., with commercial value labeled 1 (high commercial value).
  • The search terms in cluster 2 are "Andy Lau", "Zhang Xueyou", "Andy Lau's album", etc., with commercial value labeled 0 (low commercial value).
  • The search terms in cluster 3 are "How big is a 5-inch mobile phone", "Is the android phone smooth", etc., with commercial value labeled 0.5 (medium commercial value). Training yields the parameters of the value regression model, which is then used to predict the value data of search terms.
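Training with the soft labels 1, 0.5 and 0 can be sketched as plain logistic regression fitted by batch gradient descent on cross-entropy loss, which accepts fractional targets directly. This is a minimal sketch: the learning rate, the epoch count and any feature vectors fed to it (for example, hypothetical binary features such as "looks like a product term" and "phrased as a question") are assumptions, not values from the text:

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted value data in (0, 1) for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_value_model(X: Sequence[Sequence[float]], y: Sequence[float],
                      lr: float = 0.5, epochs: int = 2000
                      ) -> Tuple[List[float], float]:
    """Batch gradient descent on logistic (cross-entropy) loss. Labels may
    be soft (1, 0.5 or 0); each sample's gradient is (p - y) * x."""
    n_features = len(X[0])
    m = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b
```

Trained on the three example clusters encoded with such features ([1, 0] for product terms labeled 1, [0, 0] for celebrity terms labeled 0, [1, 1] for question-style terms labeled 0.5), the model scores product-like terms above 0.5 and celebrity terms below it, with the question-style terms in between.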
  • FIG. 4 is a flow chart of a method of determining the value of a search term in accordance with an embodiment of the present invention.
  • In step S410, the feature data of the search term to be tested is input into the value regression model.
  • The value regression model is the one established by the method shown in FIG. 3; its parameters have already been obtained through the model training shown in FIG. 3, and the feature data of the search term to be tested is input into this model.
  • The feature data of a search term may include, for example but not limited to, the length of the search term, the category of the search term and the word segmentation result of the search term.
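Feature extraction for step S410 might produce a sparse feature map like the one below. Everything here is an assumption for illustration: the category table is hypothetical, and a lowercase whitespace split stands in for real word segmentation (Chinese queries would need a proper segmenter):

```python
from typing import Dict, List

# Hypothetical token-to-category table, for illustration only.
CATEGORY_BY_TOKEN = {"notebook": "electronics", "laptop": "electronics",
                     "album": "music"}


def segment(term: str) -> List[str]:
    """Stand-in for word segmentation: lowercase whitespace split."""
    return term.lower().split()


def extract_features(term: str) -> Dict[str, float]:
    """Sparse features: term length, categories hit, and segmented tokens."""
    features: Dict[str, float] = {"length": float(len(term))}
    for token in segment(term):
        features["token=" + token] = 1.0
        category = CATEGORY_BY_TOKEN.get(token)
        if category is not None:
            features["category=" + category] = 1.0
    return features
```

A sparse map like this would then be turned into a fixed-length vector (for example, through a fixed feature index) before being passed to the value regression model.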
  • For example, continuing the training example above, in which the search terms of cluster 1 ("laptop", "mac air", "thinkpad", etc.) are labeled 1 (high commercial value), those of cluster 2 ("Andy Lau", "Zhang Xueyou", "Andy Lau's album", etc.) are labeled 0 (low commercial value), and those of cluster 3 ("How big is a 5-inch mobile phone", "Is the android phone smooth", etc.) are labeled 0.5 (medium commercial value), the feature data of the search term "Toshiba notebook" to be tested is input into the value regression model.
  • In step S420, the value data of the search term to be tested is obtained based on the value regression model.
  • For example, after the feature data of the search term "Toshiba notebook" is input into the value regression model, the trained model gives "Toshiba notebook" a value of, for example, 0.8 (a number greater than 0.5 and less than or equal to 1).
  • Similarly, the value data obtained for the search term "Li Lianjie" is, for example, 0.1 (a number greater than 0 and less than 0.5).
  • FIG. 5 is a block diagram showing the structure of an apparatus 500 for determining the value of a search term according to an embodiment of the present invention.
  • Apparatus 500 can include an input module 510 and an acquisition module 520.
  • The input module 510 can be used to input the feature data of the search term to be tested into the value regression model.
  • The acquisition module 520 can be configured to obtain the value data of the search term to be tested based on the value regression model.
  • the value regression model can be obtained by the following module:
  • a clustering module (not shown), which can be used to cluster existing search words based on click relationship data and/or presentation relationship data to obtain a clustered search word set;
  • a classification module (not shown), which can be used to classify the search word sets into search word sets of different values;
  • a model acquisition module (not shown) that can be used to model training with a collection of search terms of different values to obtain a value regression model.
  • the search word sets of different values may include a high-value search word set, a medium-value search word set, and a low-value search word set, wherein the value data of the search words in the high-value set is greater than that of the search words in the medium-value set, and the value data of the search words in the medium-value set is greater than that of the search words in the low-value set.
  • the value data of the search word in the high-value search word set is 1.
  • the value data of the search words in the medium-value search word set is 0.5.
  • the value data of the search word in the low-value search word set is 0.
  • the clustering module may further include a relational data acquisition sub-module, a calculation sub-module, and an acquisition sub-module.
  • the relationship data obtaining sub-module may be configured to obtain the common click count of different search words and calculate click relationship data based on the common click count, and/or to obtain the common presentation count of different search words and calculate presentation relationship data based on the common presentation count;
  • the calculating sub-module may be configured to calculate a clustering distance between the existing search words based on at least one of the click relationship data, the presentation relationship data, the common presentation count, and the common click count;
  • the obtaining sub-module may be configured to cluster the existing search words based on the clustering distance to obtain clustered search word sets.
  • the common click count, the common presentation count, the click relationship data, and the presentation relationship data are each defined between a pair of search words.
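The clustering step above can be sketched as follows. This is a minimal illustration under assumptions the text does not fix. Click relationship data is taken to be a Jaccard-style co-click ratio, and the clustering distance is 1 minus that ratio. Only click data is used; presentation data could be handled the same way. Two words are merged by single-linkage whenever their distance is at most a hypothetical `max_distance`. The function names and the sample counts are hypothetical.

```python
from itertools import combinations


def click_relationship(common_clicks, clicks_a, clicks_b):
    """Assumed click relationship: Jaccard-style co-click ratio in [0, 1]."""
    union = clicks_a + clicks_b - common_clicks
    return common_clicks / union if union > 0 else 0.0


def cluster_search_words(clicks, common, max_distance=0.8):
    """Single-linkage clustering of search words by clustering distance.

    clicks: dict mapping each search word to its total click count
    common: dict mapping frozenset({word_a, word_b}) to their common click count
    Two words are linked when 1 - click_relationship(...) <= max_distance.
    """
    parent = {w: w for w in clicks}

    def find(w):
        # Union-find root lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(sorted(clicks), 2):
        shared = common.get(frozenset((a, b)), 0)
        distance = 1.0 - click_relationship(shared, clicks[a], clicks[b])
        if distance <= max_distance:
            parent[find(a)] = find(b)

    clusters = {}
    for w in clicks:
        clusters.setdefault(find(w), set()).add(w)
    return list(clusters.values())
```

With hypothetical counts in which “laptop” shares clicks with “thinkpad” and “mac air”, and “Andy Lau” shares clicks with “Andy Lau's album”, the product words and the celebrity words land in separate clusters. This mirrors clusters 1 and 2 above.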
  • the model acquisition module may be further configured to:
  • each search word in the high-value search word set is used as a sample with label 1, each search word in the medium-value search word set as a sample with label 0.5, and each search word in the low-value search word set as a sample with label 0;
  • these samples are trained using the logistic regression algorithm to form the value regression model.
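Here is a minimal sketch of the training step. It assumes each search word has already been turned into a numeric feature vector. The hypothetical one-hot features in the check stand in for the category, length, and segmentation features mentioned above. The sketch fits logistic regression with the soft labels 1, 0.5 and 0 by batch gradient descent on cross-entropy loss. The learning rate and epoch count are arbitrary choices, not values from the source.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_value_model(samples, epochs=2000, lr=0.5):
    """Fit logistic regression on (features, label) pairs with soft labels.

    Labels follow the text: 1 (high value), 0.5 (medium), 0 (low).
    Minimizes mean cross-entropy by batch gradient descent; returns (w, b).
    """
    n = len(samples[0][0])
    m = len(samples)
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in samples:
            # Gradient of cross-entropy w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * gi / m for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_value(model, x):
    """Value data in (0, 1) for a feature vector x."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

After training, feature vectors resembling the high-value set score close to 1 and those resembling the low-value set score close to 0. Vectors resembling the medium-value set score about 0.5, matching the kind of value data described for “Toshiba notebook” and “Li Lianjie”.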
  • modules in the client in the embodiment can be adaptively changed and placed in one or more clients different from the embodiment.
  • the modules in the embodiments can be combined into one module, and further they can be divided into a plurality of sub-modules or sub-units or sub-components.
  • all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or client so disclosed may be combined in any combination.
  • Each feature disclosed in this specification may be replaced by alternative features that provide the same, equivalent or similar purpose.
  • the various component embodiments of the present invention may be implemented in hardware, or in a software module running on one or more processors, or in a combination thereof.
  • in practice, a microprocessor or digital signal processor (DSP) may be used to implement some or all of the functions of some or all of the components of the device for increasing the exposure of information and the device for determining the value of a search term according to embodiments of the present invention.
  • the invention can also be implemented as an apparatus or device program (e.g., a computer program and a computer program product) for performing some or all of the methods described herein.
  • Such a program implementing the invention may be stored on a computer readable medium or may be in the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
  • FIG. 6 illustrates an electronic device that can implement the method of increasing the exposure of information and the method of determining the value of a search term according to the present invention.
  • the electronic device conventionally includes a processor 610 and a computer program product or computer readable medium in the form of a memory 620.
  • the memory 620 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read Only Memory), an EPROM, a hard disk, or a ROM.
  • Memory 620 has a memory space 630 for program code 631 for performing any of the method steps described above.
  • storage space 630 for program code may include various program code 631 for implementing various steps in the above methods, respectively.
  • the program code can be read from or written to one or more computer program products.
  • These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks.
  • Such a computer program product is typically a portable or fixed storage unit as described with reference to FIG. 7.
  • the storage unit may have storage sections or storage spaces arranged similarly to the memory 620 in the electronic device of FIG. 6.
  • the program code may, for example, be compressed in an appropriate form.
  • the storage unit comprises a program 631' for performing the steps of the method according to the invention, i.e., code readable by a processor such as 610, which, when run by the electronic device, causes the electronic device to perform each step of the method described above.

Abstract

A method and device for promoting the exposure rate of information, and a method and device for determining a value of a search word. The method for promoting the exposure rate of information comprises: determining whether to execute exposure rate promotion processing on information related to a received query word (S110); if so, checking whether the query frequency of historical query data associated with the query word is greater than or equal to a first threshold value (S120); if so, based on historical presentation data related to the query word, determining candidate information (S130); based on basic data of all candidate information and historical presentation data of an information group to which all the pieces of candidate information belong, pre-estimating presentation quality parameters of all the pieces of candidate information (S140); and recommending candidate information with the highest pre-estimated presentation quality parameter as recommended information to a candidate presentation queue, so that the pre-estimated highest presentation quality parameter is used as a presentation quality parameter of the recommended information to conduct overall presentation competition processing (S150). The exposure rate of information is improved, more presentation opportunities are given to different information in unit time, and the user experience is improved.

Description

Method and device for improving the exposure rate of information, and method and device for determining the value of a search word
Technical field
The present invention relates to the field of computer technology, and more particularly to a method and apparatus for increasing the exposure of information, and a method and apparatus for determining the value of a search term.
Background
With the development of Internet business, more and more kinds of services, such as advertising information services, have appeared on the Internet. For information services on the Internet, the exposure or presentation of information serves three roles. It is the basic guarantee that an information owner (for example, an advertiser) achieves the intended effect of its advertising information. It is the main purpose for which search information owners customize creatives and bid prices. It is also the basis on which information owners realize value. In the actual design of an information bidding system, however, efficiency and fairness must be balanced at the same time. Fairness is in essence a recognition of potential, and it will improve efficiency in the future. The practical problem is that the creatives customized by many information owners differ greatly in presentation performance. On the one hand, some newly designed creatives with better expected revenue efficiency are not effectively presented. On the other hand, some information that has been presented before, but whose revenue efficiency has declined over time, keeps being presented by the system. Both effects hurt the maximization of efficiency and revenue.
In view of the above problems of Internet information services, the present invention proposes a solution for improving the exposure rate of information. Addressing the shortcomings of existing solutions, the present invention mainly solves the following problems:
balancing efficiency and fairness, so that more presentation opportunities are rotated among different pieces of information per unit time and no single item dominates;
providing more choices and creating a more diverse user experience through the diversity of information, avoiding the drop in conversion rate caused by repetition fatigue; and
dynamically adjusting presentation opportunities according to historical presentation data, increasing the exposure rate of information while keeping the loss to a minimum.
Summary of the invention
In order to increase the exposure rate of information that has not yet been presented, the main object of the present invention is to provide a method and device for improving the exposure rate of information, a method and device for determining the value of a search word, a computer program, and a computer-readable medium.
The present invention provides a method of improving the exposure rate of information, comprising the following steps:
- determining whether to perform exposure-rate promotion processing on information related to a received query word;
- if so, checking whether the query frequency of the historical query data associated with the query word is greater than or equal to a first threshold;
- if so, determining candidate information based on historical presentation data related to the query word;
- estimating presentation quality parameters of all candidate information based on the basic data of all candidate information and the historical presentation data of the information groups to which the candidate information belongs; and
- recommending the candidate information having the highest estimated presentation quality parameter to a candidate presentation queue as recommended information. Overall presentation competition processing is then performed using that highest estimated presentation quality parameter as the presentation quality parameter of the recommended information.
The present invention also provides a device for improving the exposure rate of information, comprising the following modules:
- a first determining module, configured to determine whether to perform exposure-rate promotion processing on not-yet-presented information related to a received query word;
- a checking module, configured to check whether the query frequency of the historical query data associated with the query word is greater than or equal to a first threshold;
- a second determining module, configured to determine candidate information based on historical presentation data related to the query word;
- an estimation module, configured to estimate presentation quality parameters of all candidate information based on the basic data of all candidate information and the historical presentation data of the information groups to which the candidate information belongs; and
- a recommendation module, configured to recommend the candidate information having the highest estimated presentation quality parameter to a candidate presentation queue as recommended information. Overall presentation competition processing is then performed using that highest estimated presentation quality parameter as the presentation quality parameter of the recommended information.
According to one aspect of the present invention, a method for determining the value of a search word is provided, comprising: inputting feature data of a search word to be tested into a value regression model; and obtaining value data of the search word to be tested based on the value regression model;
wherein the value regression model is obtained as follows: existing search words are clustered based on click relationship data and/or presentation relationship data to obtain clustered search word sets; the search word sets are classified into search word sets of different values; and model training is performed with the search word sets of different values to obtain the value regression model.
According to another aspect of the present invention, a device for determining the value of a search word is provided. It comprises an input module, configured to input feature data of a search word to be tested into a value regression model, and an acquisition module, configured to obtain value data of the search word to be tested based on the value regression model. The value regression model is obtained by the following modules: a clustering module, configured to cluster existing search words based on click relationship data and/or presentation relationship data to obtain clustered search word sets; a classification module, configured to classify the search word sets into search word sets of different values; and a model acquisition module, configured to perform model training with the search word sets of different values to obtain the value regression model.
According to another aspect of the present invention, a computer program is provided, comprising computer-readable code which, when run by an electronic device, causes the above method of improving the exposure rate of information and the above method of determining the value of a search word to be performed.
According to still another aspect of the present invention, a computer-readable medium is provided, in which the computer program described above is stored.
Compared with the prior art, the method and device for improving the exposure rate of information according to the present invention have the following beneficial effects. Presentation opportunities are adjusted dynamically according to historical presentation data. The exposure rate of information is increased while keeping the loss to a minimum. More presentation opportunities are given to different information per unit time, and the user experience is improved.
With the method and device for determining the value of a search word according to the present invention, the value of a search word can be determined more accurately. Valuable information (such as advertisements) can then be selected for presentation based on the search word value data. This improves the user experience, increases the click-through rate of information, and increases the exposure rate of information.
The above description is only an overview of the technical solutions of the present invention. Specific embodiments of the present invention are set forth below, so that its technical means can be understood more clearly and implemented according to the contents of the specification, and so that the above and other objects, features and advantages of the present invention become more apparent.
Brief description of the drawings
… will become clear to those of ordinary skill in the art. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be construed as limiting the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
FIG. 1 is a flowchart of a method of improving the exposure rate of information according to an embodiment of the present invention;
FIG. 2 is a structural diagram of a device for improving the exposure rate of information according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method of obtaining a value regression model according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method of determining the value of a search word according to an embodiment of the present invention;
FIG. 5 is a structural diagram of a device for determining the value of a search word according to an embodiment of the present invention;
FIG. 6 is a block diagram of an electronic device for performing the method of the present invention; and
FIG. 7 is a schematic diagram of a storage unit for holding or carrying program code that implements the method according to the present invention.
Detailed description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope will be fully conveyed to those skilled in the art.
The improved technical solution of the present invention will be described in detail below with reference to the accompanying drawings.
FIG. 1 is a flowchart of a method of improving the exposure rate of information according to an embodiment of the present invention.
At step S110, it is determined whether to perform exposure-rate promotion processing on not-yet-presented information related to the received query word.
Specifically, after receiving a query word, the method of the present invention first determines whether to perform exposure-rate promotion processing on not-yet-presented information related to that query word. Exposure-rate promotion processing could also be performed for every query request, but that monetizes poorly: it may give too many presentation opportunities to results with a large expected regret, which lowers the efficiency of the whole system. Therefore, a decision is made for each query request so that the proportion of requests undergoing exposure-rate promotion processing stays within a certain range. For example, the ratio of query requests selected for exposure-rate promotion processing to all query requests is kept at no more than 5%. It should be understood that this ratio can be adjusted as needed.
Specifically, the server side has a historical query database that stores historical query data. This data provides the historical request information of each query word. From it, an adjustment parameter can be obtained for deciding whether to perform exposure-rate promotion processing on not-yet-presented information related to that query word.
Specifically, an adjustment parameter is obtained based on the historical request data related to the query. Then, based on the adjustment parameter and a random number generated by the system, it is determined whether to perform exposure-rate promotion processing on not-yet-presented information related to the received query.
More specifically, the adjustment parameter is first obtained based on the historical query data related to the query. For example, the baseline of the adjustment parameter can be set to 1.0. It is then adjusted on top of this baseline according to the historical request data, which may include, but is not limited to, the query frequency and the click-through rate. The following formula may be used, for example, though the method is not limited to it:
$$\text{adjustment parameter} = 1.0 + \alpha \cdot \text{CTR} + \beta \cdot \log\left(\frac{\gamma}{\text{frequency}}\right) \qquad \text{(Formula 1)}$$
where CTR is the click-through rate of the query and frequency is its query frequency; for example, α = 0.2, β = 0.3 and γ = 1000.
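Formula 1 can be written directly as a function. The source does not state the base of the logarithm or the unit of the query frequency. This sketch assumes the natural logarithm and treats the frequency as a strictly positive count over whatever window the historical data uses.

```python
import math


def adjustment_parameter(ctr, frequency, alpha=0.2, beta=0.3, gamma=1000.0):
    """Formula 1: 1.0 + alpha * CTR + beta * log(gamma / frequency).

    Assumes the natural logarithm and a strictly positive query frequency.
    """
    if frequency <= 0:
        raise ValueError("query frequency must be positive")
    return 1.0 + alpha * ctr + beta * math.log(gamma / frequency)
```

Under these assumptions, a query with CTR 0 and frequency 100 gets 1 + 0.3 · ln 10 ≈ 1.69. Rarer queries therefore receive a larger adjustment parameter than frequent ones.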
Then, based on the adjustment parameter and the random number generated by the system, it is determined whether to perform exposure-rate promotion processing on not-yet-presented information related to the received query.
Specifically, for example, the random number can be generated from a uniform distribution so that the final result meets the target ratio (5%) described above. For a target ratio of 5%, this is a uniform distribution with parameter 20. It should be understood that the random number can also be generated in other ways.
Then, as described above, the random number makes the final result meet the target ratio. Based on the adjustment parameter, a judgment threshold can be obtained for deciding whether to perform exposure-rate promotion processing on not-yet-presented information related to the received query: judgment threshold = target ratio × adjustment parameter. Whether to perform exposure-rate promotion processing on that information is then decided based on this judgment threshold.
For example, when the judgment threshold is greater than or equal to 5%, exposure-rate promotion processing is performed on not-yet-presented information related to the received query. When the judgment threshold is less than 5%, it is not performed. It should be understood that other thresholds can be chosen as needed, and the threshold is not limited to the specific value above.
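The text does not spell out exactly how the random number and the judgment threshold are compared. One plausible reading, sketched below as an assumption, is a Bernoulli draw. A request is promoted when a uniform random number in [0, 1) falls below target ratio × adjustment parameter. With an adjustment parameter of 1.0, about 5% of requests are then selected. The function name `should_promote` is hypothetical.

```python
import random


def should_promote(adjustment, target_ratio=0.05, rng=random):
    """Decide whether this query request gets exposure-rate promotion.

    Assumed rule: judgment threshold = target_ratio * adjustment, and the
    request is promoted when a uniform draw in [0, 1) is below that threshold.
    `rng` may be the random module or a seeded random.Random instance.
    """
    threshold = target_ratio * adjustment
    return rng.random() < threshold
```

With a seeded `random.Random`, about 5,000 of 100,000 calls with adjustment 1.0 return `True`. Queries with a larger adjustment parameter are promoted proportionally more often.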
For example, the information to be presented may be advertising information. In that case, it can be determined whether to perform exposure-rate promotion processing on not-yet-presented advertising information related to the received query word. That is, an adjustment parameter is first obtained based on the historical request data related to the query. Then, based on the adjustment parameter and the random number generated by the system, it is determined whether to perform exposure-rate promotion processing on not-yet-presented advertising information related to the received query.
According to an embodiment of the present invention, the information may include at least one of the following: information whose number of presentations is below a predetermined value, information for a predetermined region, and information for a predetermined time period. For example, the information may be advertising information presented fewer than 10 times, advertising information for Beijing, and so on. It should be understood that the information of the present invention may also be other types of information.
If it is determined at step S110 that exposure-rate promotion processing is to be performed on not-yet-presented information related to the received query word, then step S120 checks whether the query frequency of the historical query data associated with the query word is greater than or equal to a first threshold. The first threshold may be, for example, twice per hour.
Specifically, the query frequency of the historical query data associated with the query word is checked. If the query frequency is high, for example at least twice per hour, the method proceeds to step S130. If the query frequency is low, for example less than twice per hour, the method ends.
It should be understood that the first threshold is not limited to the above value; any suitable value may be chosen as the first threshold as needed.
Next, at step S130, candidate information is determined based on the historical presentation data related to the query word.
Specifically, the historical presentation data related to the query word is looked up in a pre-built database of historical presentation data. The data whose daily presentation count is less than a second threshold is identified. For example, the second threshold may be 10. In that case, historical presentation data associated with the query word that is presented fewer than 10 times per day is found, and the corresponding information is determined as candidate information. It should be understood that presentation counts at other time scales can also be compared against a threshold. Examples are fewer than 70 presentations per week (seven days) or fewer than 25 presentations per 60 hours.
For example, the historical presentation data related to the query word “New Year gift pack” (年货大礼包) includes information IDs “A1234123”, “A1231312” and “A1343141”. Their daily presentation counts are less than the second threshold (for example, 10), so these three pieces of information are determined as candidate information.
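Steps S120 and S130 amount to a frequency gate followed by a filter on daily presentation counts. The sketch below assumes the historical data has already been aggregated into a query frequency (per hour) and a per-ID daily presentation count. The counts in the check are hypothetical. The fourth ID, `A9999999`, is an invented, frequently presented item included only for contrast.

```python
def select_candidates(query_freq_per_hour, daily_presentations,
                      first_threshold=2.0, second_threshold=10):
    """Steps S120-S130: frequency gate, then low-exposure filter.

    query_freq_per_hour: historical query frequency of the query word
    daily_presentations: dict info_id -> daily presentation count for that word
    Returns [] when the query is too infrequent (the method ends there);
    otherwise the sorted IDs presented fewer than second_threshold times a day.
    """
    if query_freq_per_hour < first_threshold:
        return []
    return sorted(info_id for info_id, count in daily_presentations.items()
                  if count < second_threshold)
```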
After the candidate information is determined, step S140 estimates the presentation quality parameters of all candidate information. The estimate is based on the basic data of all candidate information and the historical presentation data of the information groups to which the candidate information belongs.
For example, information IDs “A1234123” and “A1231312” belong to information group “G111223”, and information ID “A1343141” belongs to another information group, “G222121”. The basic data of these three candidates is queried. The historical presentation data of the other information IDs in their two groups is also queried; for each information ID, this data includes its presentation quality parameter. The highest presentation quality parameter in each group is used as the estimated presentation quality parameter of the candidates in that group. In group “G111223”, the highest is, say, parameter a (the presentation quality parameter of information A). It becomes the estimated presentation quality parameter of the candidates with IDs “A1234123” and “A1231312”. In group “G222121”, the highest is, say, parameter b (the presentation quality parameter of information B). It becomes the estimated presentation quality parameter of the candidate with ID “A1343141”.
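Step S140 can be sketched as follows: each candidate inherits the highest historical presentation quality parameter among the other members of its information group. The dictionaries are hypothetical stand-ins for the basic data and historical presentation data. The IDs `A` and `B` play the roles of information A and information B in the example, and their quality values (0.9 and 0.6) are invented. The `default` fallback for groups with no other history is also an assumption.

```python
def estimate_quality(candidates, group_of, history_quality, default=0.0):
    """Step S140: each candidate takes the best historical presentation
    quality parameter among the other members of its information group.

    group_of: dict info_id -> information group ID
    history_quality: dict info_id -> historical presentation quality parameter
    default: value used when a group has no other member with history
    """
    estimates = {}
    for cand in candidates:
        group = group_of[cand]
        peers = [q for info_id, q in history_quality.items()
                 if group_of.get(info_id) == group and info_id != cand]
        estimates[cand] = max(peers, default=default)
    return estimates
```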
然后,在步骤150处,将将具有预估的最高展现质量参数的候选信息作为推荐信息推荐到候选展现队列,以便以该预估的最高展现质量参数作为该推荐信息的展现质量参数进行整体展现竞争处理。Then, at step 150, the candidate information having the predicted highest presentation quality parameter is recommended as recommendation information to the candidate presentation queue to perform overall presentation with the estimated highest presentation quality parameter as the presentation quality parameter of the recommendation information. Competitive processing.
Continuing the example, if parameter a is greater than parameter b, the candidate information with the highest estimated presentation quality parameter (parameter a) is recommended to the candidate presentation queue as recommendation information.
Specifically, both information IDs "A1234123" and "A1231312", whose estimated presentation quality parameter is the highest value a, can be recommended to the candidate presentation queue.
Alternatively, the candidates can be polled so that only one of them is recommended to the candidate presentation queue at a time.
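The two recommendation options above can be sketched as follows. Either all candidates tied at the highest estimate are pushed at once, or they are polled one per request. This sketch assumes the estimates come from a mapping like the one in the example. The sort by ID is added only to make the polling order deterministic and is not specified in the text.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def top_candidates(estimates: Dict[str, float]) -> List[str]:
    """Return all candidates tied at the highest estimated quality, sorted by ID."""
    if not estimates:
        return []
    best = max(estimates.values())
    return sorted(info_id for info_id, q in estimates.items() if q == best)


def round_robin(candidates: List[str]) -> Iterator[str]:
    """Poll the tied candidates, handing one to the queue per request."""
    return cycle(candidates)


estimates = {"A1234123": 0.42, "A1231312": 0.42, "A1343141": 0.30}
tied = top_candidates(estimates)   # option 1: push all tied candidates at once
poller = round_robin(tied)         # option 2: push one tied candidate per request
print(tied, [next(poller) for _ in range(3)])
# ['A1231312', 'A1234123'] ['A1231312', 'A1234123', 'A1231312']
```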
It should be understood that other suitable means may also be used to recommend one of the multiple candidates to the candidate presentation queue, so that it enters the overall presentation competition process with the highest estimated presentation quality parameter as its presentation quality parameter.
The overall presentation competition process selects candidate results on the basis of a ranking. Take search advertisements as an example. First, the ads entering the competition are scored and sorted according to a predetermined rule, for example in descending order of score. The ranking criterion for search ads can be determined by two factors: the quality of the ad creative and the bid price of the keyword. That is, ranking score = ad creative quality × keyword bid price. Next, the push-left and filtering results are computed. This step resembles classification: high-quality ads are pushed to the left-hand positions and low-quality ads are filtered out. Finally, the ads are presented according to business requirements. Not every ad is necessarily presented. For example, search ads are typically limited to 3 results on the left and 8 on the right.
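The ranking step can be sketched in a few lines. The score formula (creative quality × bid price) and the 3-left / 8-right limits come from the text. The push-left and filter thresholds are hypothetical parameters, because the text does not say how those decisions are made. In this sketch, an ad that clears the push-left threshold but finds the left slots full falls through to the right-hand slots, which is a design choice rather than something the source specifies.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Ad:
    ad_id: str
    creative_quality: float
    bid_price: float

    @property
    def score(self) -> float:
        # Ranking score = ad creative quality * keyword bid price.
        return self.creative_quality * self.bid_price


def allocate_slots(
    ads: List[Ad],
    push_left_min: float,  # hypothetical threshold for pushing an ad left
    filter_min: float,     # hypothetical threshold below which ads are dropped
    left_slots: int = 3,
    right_slots: int = 8,
) -> Tuple[List[str], List[str]]:
    ranked = sorted(ads, key=lambda ad: ad.score, reverse=True)
    kept = [ad for ad in ranked if ad.score >= filter_min]  # filter inferior ads
    left = [ad.ad_id for ad in kept if ad.score >= push_left_min][:left_slots]
    right = [ad.ad_id for ad in kept if ad.ad_id not in left][:right_slots]
    return left, right


ads = [Ad("a", 0.9, 2.0), Ad("b", 0.5, 1.0), Ad("c", 0.8, 3.0), Ad("d", 0.1, 1.0)]
print(allocate_slots(ads, push_left_min=1.0, filter_min=0.2))
# (['c', 'a'], ['b'])
```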
Thus, the solution of the present invention for increasing the exposure rate of information increases the chance that information enters the overall presentation competition process and, ultimately, the chance that it is actually presented.
The present invention also provides an apparatus for increasing the exposure rate of information. FIG. 2 is a structural diagram of an apparatus 200 for increasing the exposure rate of information according to an embodiment of the present invention.
The apparatus 200 may include a first determining module 210, a checking module 220, a second determining module 230, an estimation module 240 and a recommendation module 250.
The first determining module 210 may be configured to determine whether to perform exposure-rate promotion processing on not-yet-presented information related to a received query word.
According to an embodiment of the present application, the first determining module 210 may further include an obtaining sub-module 211 and a first determining sub-module 212. The obtaining sub-module 211 may be configured to obtain an adjustment parameter based on the historical query data related to the query word. The first determining sub-module 212 may be configured to determine, based on the adjustment parameter and a random number generated by the system, whether to perform exposure-rate promotion processing on not-yet-presented information related to the received query word.
The checking module 220 may be configured to check whether the query frequency of the historical query data associated with the query word is greater than or equal to a first threshold.
In one embodiment, the checking module 220 may further be configured to abandon the exposure-rate promotion processing if the query frequency of the historical query data associated with the query word is less than the first threshold. The first threshold may be, for example, twice per hour.
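The gating performed by the first determining module 210 and the checking module 220 can be sketched as a single predicate. The twice-per-hour threshold comes from the text. The rest is hypothetical. This passage does not specify how the adjustment parameter is derived from historical query data or how it is compared with the random number. The sketch therefore takes the parameter as an input, treats it as a probability in [0, 1], and boosts when the random draw falls below it. The two checks are combined with a logical AND, so their order does not change the result.

```python
import random
from typing import Optional

FIRST_THRESHOLD_PER_HOUR = 2.0  # example value from the text: twice per hour


def should_boost(query_freq_per_hour: float, adjustment: float,
                 rng: Optional[random.Random] = None) -> bool:
    # Checking module 220: a rarely issued query word abandons the promotion.
    if query_freq_per_hour < FIRST_THRESHOLD_PER_HOUR:
        return False
    rng = rng or random.Random()
    # Hypothetical rule for sub-module 212: treat `adjustment` as a probability
    # and boost when the system's random number falls below it.
    return rng.random() < adjustment


print(should_boost(1.0, 1.0), should_boost(5.0, 1.0))
# False True
```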
The second determining module 230 may be configured to determine candidate information based on historical presentation data related to the query word.
According to an embodiment of the present application, the second determining module 230 may further include a searching sub-module 231 and a second determining sub-module 232. The searching sub-module 231 may be configured to search for historical presentation data associated with the query word whose daily presentation count is less than a second threshold. The second determining sub-module 232 may be configured to determine the information corresponding to the historical presentation data found as candidate information.
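The candidate lookup by the searching sub-module 231 amounts to a filter on daily presentation counts. This sketch assumes those counts are available per information ID for the query word. The IDs and the threshold value 10 are hypothetical, since the text does not give a value for the second threshold. Information that has never been presented simply has a count of 0 and always qualifies.

```python
from typing import Dict, List


def find_candidates(daily_presentations: Dict[str, int], second_threshold: int) -> List[str]:
    """Return IDs of information related to the query word whose daily
    presentation count is below the second threshold."""
    return sorted(info_id for info_id, n in daily_presentations.items()
                  if n < second_threshold)


print(find_candidates({"A1": 0, "A2": 5, "A3": 12}, second_threshold=10))
# ['A1', 'A2']
```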
The estimation module 240 may be configured to estimate the presentation quality parameters of all candidates based on the basic data of all candidates and on the historical presentation data of the information groups to which they belong.
The recommendation module 250 may be configured to recommend the candidate with the highest estimated presentation quality parameter, as recommendation information, to the candidate presentation queue. It then enters the overall presentation competition process with that highest estimated value as its presentation quality parameter.
According to an embodiment of the present application, the apparatus 200 may further include a presentation quality parameter determining module (not shown). If the recommendation information is presented as a result of the overall presentation competition process, this module sets the presentation quality parameter obtained during that presentation as the initial presentation quality parameter of the recommendation information.
The specific implementation of each module of the apparatus of FIG. 2 corresponds to that of the steps of the method of the present invention, and FIG. 1 has already been described in detail. To avoid obscuring the invention, the details of the individual modules are not repeated here.
One implementation of a method for determining the value of a search term mainly includes the following steps:
Step 1: from the ad impression log, count the number of ad impressions and ad clicks for every search term;
Step 2: compute the ad click-through rate of each search term as the number of ad clicks for the term divided by the number of ad impressions for the term;
Step 3: if a term's ad click-through rate is below a threshold and its number of ad impressions is above a threshold, the term is low-value. Conversely, if its ad click-through rate is above the threshold and its number of ad impressions is above the threshold, the term is high-value. For example, suppose the click-through-rate threshold is 5% and the impression threshold is 50. The search term "prose on the glow of the setting sun" has 100 ad impressions and 1 click (a click-through rate of 1%), so it is low-value. The search term "laptop" has 10,000 ad impressions and 1,000 clicks (a click-through rate of 10%), so it is high-value.
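The three steps of this threshold-based implementation can be sketched as one function, using the example thresholds from the text (5% and 50). The text does not say what happens to terms that fail both rules, for example terms with too few impressions. The sketch returns a hypothetical "undetermined" label for them. This gap is one of the shortcomings that the next paragraph points out.

```python
def classify_term(impressions: int, clicks: int,
                  ctr_threshold: float = 0.05,
                  impression_threshold: int = 50) -> str:
    """Threshold-based value of a search term from ad impression-log counts."""
    if impressions <= impression_threshold:
        return "undetermined"  # hypothetical label: too few impressions for a verdict
    ctr = clicks / impressions  # step 2: ad click-through rate
    if ctr < ctr_threshold:
        return "low"
    if ctr > ctr_threshold:
        return "high"
    return "undetermined"


print(classify_term(100, 1), classify_term(10_000, 1_000))
# low high
```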
In this implementation, the click-through-rate threshold and the impression threshold must be specified manually, so the quality of the result depends heavily on the operator's experience. Moreover, the implementation can only judge whether a value is high or low and cannot give a specific numeric value, which is not smooth enough in practice. Furthermore, because it relies mainly on statistics, it generalizes poorly and has relatively low coverage, and its accuracy leaves room for improvement. It therefore cannot fully meet the needs of a search advertising system.
The technical solution of the improved method of the present invention for determining the value of a search term is described in detail below with reference to the accompanying drawings.
To aid understanding of the technical solution of the present invention, the method for obtaining the value regression model is introduced first. FIG. 3 is a flowchart of a method for obtaining a value regression model according to an embodiment of the present invention.
At step S310, existing search terms are clustered based on click relationship data and/or presentation relationship data to obtain clustered search term sets.
Specifically, the first step is to obtain the common click counts of different search terms and compute click relationship data from them, and/or to obtain the common presentation counts of different search terms and compute presentation relationship data from them.
For example, the common presentation counts of different search terms can be obtained and presentation relationship data computed from them.
Suppose one input search term is Q1, for which the search engine presents data D1, D2, D3, D4, and another input search term is Q2, for which the search engine presents data D2, D3, D5, D7. Their common presentation count is then 2 (D2 and D3). A correlation measure can be used to describe the presentation relationship between Q1 and Q2. For example, if the correlation is defined as the common presentation count divided by the number of presentations of Q1, the presentation relationship between Q1 and Q2 is a presentation relevance of 2/4 = 0.5.
It should be understood that any other suitable measure may be used to represent the presentation relationship between two search terms; the above is not limiting. For example, the correlation may also be defined as the common presentation count divided by the number of presentations of Q2, or the common presentation count divided by the sum of the numbers of presentations of Q1 and Q2, and so on.
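The presentation relevance in the example, together with the two alternative normalizations just mentioned, can be computed as below. The sketch assumes each term's presented data are available as a set of item IDs, so the number of presentations is the set size. The `norm` names are illustrative labels, not terms from the text.

```python
from typing import Set


def relevance(a: Set[str], b: Set[str], norm: str = "first") -> float:
    """Relevance of term A to term B from the items presented for each."""
    common = len(a & b)  # common presentation count
    if norm == "first":
        denom = len(a)             # common / presentations of Q1
    elif norm == "second":
        denom = len(b)             # common / presentations of Q2
    elif norm == "sum":
        denom = len(a) + len(b)    # common / (presentations of Q1 + of Q2)
    else:
        raise ValueError(f"unknown normalization: {norm}")
    return common / denom if denom else 0.0


q1 = {"D1", "D2", "D3", "D4"}
q2 = {"D2", "D3", "D5", "D7"}
print(relevance(q1, q2), relevance(q1, q2, "sum"))
# 0.5 0.25
```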
Similarly, presentation relationship data can be obtained for every pair of search terms.
In addition, the common click counts of different search terms can be obtained and click relationship data computed from them.
Suppose one input search term is Q1, for which the data presented by the search engine and clicked by users are D1, D2, D3, D4, and another input search term is Q2, for which the data presented and clicked are D2, D3, D4, D7. Their common click count is then 3 (D2, D3, D4). A correlation measure can be used to describe the click relationship between Q1 and Q2. For example, if the correlation is defined as the common click count divided by the number of clicks of Q1, the click relationship between Q1 and Q2 is a click relevance of 3/4 = 0.75.
Similarly, click relationship data can be obtained for every pair of search terms.
It should be understood that any other suitable measure may be used to represent the click relationship between two search terms; the above is not limiting. For example, the correlation may also be defined as the common click count divided by the number of clicks of Q2, or the common click count divided by the sum of the numbers of clicks of Q1 and Q2, and so on.
It should be understood that the common click count, common presentation count, click relationship data and presentation relationship data each refer to a pair of search terms. In other words, they are pairwise correlation parameters between two search terms.
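Because these parameters are pairwise, and the first definition in the text (common clicks divided by the clicks of the first term) is asymmetric, the click relationship data for all terms can be stored per ordered pair. This is a minimal sketch, assuming each term's clicked items are available as a set. The term "Q3" is hypothetical and is included only to show a pair with no overlap.

```python
from itertools import permutations
from typing import Dict, Set, Tuple


def pairwise_click_relevance(clicked: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Click relevance for every ordered pair (qi, qj):
    common click count / number of clicks of qi."""
    result: Dict[Tuple[str, str], float] = {}
    for qi, qj in permutations(clicked, 2):
        docs_i = clicked[qi]
        result[(qi, qj)] = len(docs_i & clicked[qj]) / len(docs_i) if docs_i else 0.0
    return result


clicked = {
    "Q1": {"D1", "D2", "D3", "D4"},
    "Q2": {"D2", "D3", "D4", "D7"},
    "Q3": {"D9"},  # hypothetical unrelated term
}
rel = pairwise_click_relevance(clicked)
print(rel[("Q1", "Q2")], rel[("Q1", "Q3")])
# 0.75 0.0
```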
After at least one of the click relationship data, presentation relationship data, common click count and common presentation count has been obtained, the clustering distance between existing search terms can be calculated from at least one of them. The existing search terms are then clustered based on the clustering distance to obtain clustered search term sets.
Continuing the example above, the presentation data of Q1 are represented as <D1, D2, D3, D4> and the presentation data of Q2 as <D2, D3, D5, D7>. A clustering algorithm is then used to calculate the clustering distance between Q1 and Q2. The clustering distances between all search terms are calculated in the same way, which allows the search terms to be clustered. For example, spectral clustering or k-means clustering can be applied, with the clustering distance between search terms computed from at least one of the click relationship data, presentation relationship data, common click count and common presentation count. This clusters the search terms and yields the clustered search term sets.
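A dependency-free sketch of this clustering step is shown below. It deliberately swaps in a simpler technique than the spectral or k-means clustering named in the text: single-linkage clustering with a distance cut-off, implemented with union-find. The distance is an assumption: 1 minus the presentation relevance from the example, made symmetric by taking the larger of the two directions. The terms Q3 and Q4 and the cut-off 0.6 are hypothetical.

```python
from typing import Dict, List, Set


def directed_relevance(a: Set[str], b: Set[str]) -> float:
    # Common presentation count / number of presentations of the first term.
    return len(a & b) / len(a) if a else 0.0


def cluster_terms(presented: Dict[str, Set[str]], max_distance: float) -> List[List[str]]:
    """Single-linkage clustering: terms closer than `max_distance` share a cluster."""
    terms = sorted(presented)
    parent = {t: t for t in terms}

    def find(t: str) -> str:
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            # Assumed symmetric distance built from the asymmetric relevance.
            rel = max(directed_relevance(presented[a], presented[b]),
                      directed_relevance(presented[b], presented[a]))
            if 1.0 - rel <= max_distance:
                parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for t in terms:
        clusters.setdefault(find(t), []).append(t)
    return sorted(clusters.values())


presented = {
    "Q1": {"D1", "D2", "D3", "D4"},
    "Q2": {"D2", "D3", "D5", "D7"},
    "Q3": {"D8", "D9"},          # hypothetical
    "Q4": {"D8", "D9", "D10"},   # hypothetical
}
print(cluster_terms(presented, max_distance=0.6))
# [['Q1', 'Q2'], ['Q3', 'Q4']]
```

Spectral clustering or k-means would replace the threshold rule with a fixed number of clusters. The distance computation would stay the same.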
At step S320, the search term sets are classified into search term sets of different values.
Specifically, all sets can be classified into a predetermined number of search term sets. Optionally, for example in a preferred embodiment of the present invention, the sets can be classified into three classes: a high-value search term set, a medium-value search term set and a low-value search term set. The value data of search terms in the high-value set are greater than those in the medium-value set, and the value data of search terms in the medium-value set are greater than those in the low-value set. All search term sets are classified into the predetermined number of search term sets according to certain rules. More specifically, the value data of each search term have been determined in advance from log statistics. For example, the value of a search term can be approximated by the value generated per thousand searches, which reflects the profitability of the term per unit of search, that is, its value. Using log statistics, the value data of each search term can thus be obtained, and each term can be assigned to a tier, for example high, medium or low, according to the distribution of value data. Then, from the value data of the individual search terms, the set value data of each clustered search term set can be obtained. In the same way, the clustered search term sets can be assigned to search term sets of different values.
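The tiering can be sketched as below. Several parts of this sketch are assumptions. It takes "value per thousand searches" to be ad revenue × 1000 / number of searches from the logs. It computes a set's value as the mean of its terms' values, because the text does not state the aggregation. The tier cut-offs and all log figures are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def value_per_mille(revenue: float, searches: int) -> float:
    # Assumed measure: value generated per thousand searches of one term.
    return 1000.0 * revenue / searches if searches else 0.0


def tier(value: float, high_cut: float, low_cut: float) -> str:
    if value >= high_cut:
        return "high"
    if value >= low_cut:
        return "medium"
    return "low"


def classify_sets(clusters: List[List[str]], logs: Dict[str, Tuple[float, int]],
                  high_cut: float, low_cut: float) -> List[str]:
    tiers = []
    for terms in clusters:
        # Hypothetical aggregation: a set's value is the mean of its terms' values.
        set_value = mean(value_per_mille(*logs[t]) for t in terms)
        tiers.append(tier(set_value, high_cut, low_cut))
    return tiers


logs = {  # hypothetical (revenue, searches) per term
    "laptop": (500.0, 1000), "thinkpad": (300.0, 1000),
    "andy lau": (2.0, 1000), "andy lau album": (0.0, 500),
    "5-inch phone": (50.0, 1000),
}
clusters = [["laptop", "thinkpad"], ["andy lau", "andy lau album"], ["5-inch phone"]]
print(classify_sets(clusters, logs, high_cut=100.0, low_cut=20.0))
# ['high', 'low', 'medium']
```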
It should be understood that the rules for dividing search terms and/or search term sets by value are flexible and can be adjusted according to system requirements. For example, search terms may be divided into more or fewer tiers, and search term sets may likewise be divided into more or fewer tiers. All such divisions fall within the scope of protection of the present invention.
At step S330, model training is performed using the search term sets of different values to obtain a value regression model.
After the search terms have been classified, model training is performed using the search term sets of different values, finally yielding the value regression model.
Specifically, each search term in each search term set can be used as a sample carrying the value data of that set. Continuing the example above, each search term in the high-value set is used as one sample labeled 2, each search term in the medium-value set as one sample labeled 1, and each search term in the low-value set as one sample labeled 0. A logistic regression algorithm is then trained on these samples to form the value regression model. For example, suppose the value regression model has labeled data from three clusters. Cluster 1 contains search terms such as "laptop", "mac air" and "thinkpad", with commercial value labeled 1 (high commercial value). Cluster 2 contains "Andy Lau", "Jacky Cheung" and "Andy Lau's albums", with commercial value labeled 0 (low commercial value). Cluster 3 contains "how big is a 5-inch phone" and "does an android phone run smoothly", with commercial value labeled 0.5 (medium commercial value). Training thus yields the parameters of the value regression model, which is then used to predict the value data of search terms under test.
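The training step can be sketched as logistic regression with soft labels, fitted by plain stochastic gradient descent on cross-entropy. This is a minimal illustration, not the patent's implementation. The features are hypothetical (lowercased whitespace tokens standing in for word segmentation), and the example terms are the English renderings above. Logistic-regression targets must lie in [0, 1], so the sketch uses the 1 / 0.5 / 0 values from the cluster example.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Labeled clusters from the example, with value labels 1 / 0.5 / 0.
TRAINING: List[Tuple[str, float]] = [
    ("laptop", 1.0), ("mac air", 1.0), ("thinkpad", 1.0),
    ("andy lau", 0.0), ("jacky cheung", 0.0), ("andy lau album", 0.0),
    ("how big is a 5-inch phone", 0.5), ("is android phone smooth", 0.5),
]


def tokenize(term: str) -> List[str]:
    # Hypothetical features: a stand-in for real word segmentation.
    return term.lower().split()


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_value_model(samples, epochs: int = 1000, lr: float = 0.1):
    """Logistic regression trained by SGD on cross-entropy with soft labels."""
    weights: Dict[str, float] = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for term, label in samples:
            tokens = tokenize(term)
            p = sigmoid(bias + sum(weights[t] for t in tokens))
            grad = p - label  # gradient of cross-entropy w.r.t. the logit
            bias -= lr * grad
            for t in tokens:
                weights[t] -= lr * grad
    return dict(weights), bias


def predict_value(model, term: str) -> float:
    weights, bias = model
    # Tokens never seen in training contribute nothing to the logit.
    return sigmoid(bias + sum(weights.get(t, 0.0) for t in tokenize(term)))


model = train_value_model(TRAINING)
print(round(predict_value(model, "toshiba laptop"), 2))
print(round(predict_value(model, "andy lau concert"), 2))
```

Because "toshiba" and "concert" do not appear in the training data, each prediction depends only on the shared tokens ("laptop", "andy lau"). The two queries therefore land on the high and low sides of 0.5 respectively.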
It should be understood that the search terms in the search term sets of different values may also be turned into samples in any other suitable way; the above is not limiting.
This concludes the description, with reference to FIG. 3, of the method for constructing the value regression model.
Next, the method of the present invention for determining the value of a search term using the resulting value regression model is described with reference to FIG. 4. FIG. 4 is a flowchart of a method for determining the value of a search term according to an embodiment of the present invention.
At step S410, feature data of the search term under test are input into the value regression model. Specifically, to predict the value data of a search term under test with the value regression model built by the method of FIG. 3, the feature data of the term must first be extracted and input into the model. The model parameters have already been obtained through the training shown in FIG. 3, so the feature data of the search term under test are now input into the model. The feature data of a search term may include, but are not limited to, the length of the term, its category and the result of word segmentation of the term.
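The three example features named above (length, category, segmentation result) can be extracted as sketched below. The category dictionary and the whitespace split are hypothetical stand-ins. A production system would use a query classifier and a proper word segmenter, particularly for Chinese queries. Length is counted in characters.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical category dictionary standing in for a real query classifier.
CATEGORY_LEXICON: Dict[str, str] = {
    "laptop": "electronics",
    "phone": "electronics",
    "album": "entertainment",
}


@dataclass
class TermFeatures:
    length: int        # length of the search term, in characters
    category: str      # category of the search term
    tokens: List[str]  # result of (stand-in) word segmentation


def extract_features(term: str) -> TermFeatures:
    tokens = term.lower().split()  # placeholder for real word segmentation
    category = next((CATEGORY_LEXICON[t] for t in tokens if t in CATEGORY_LEXICON), "other")
    return TermFeatures(length=len(term), category=category, tokens=tokens)


print(extract_features("Toshiba laptop"))
# TermFeatures(length=14, category='electronics', tokens=['toshiba', 'laptop'])
```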
For example, suppose the value regression model has labeled data from three clusters. Cluster 1 contains search terms such as "laptop", "mac air" and "thinkpad", with commercial value labeled 1 (high commercial value). Cluster 2 contains "Andy Lau", "Jacky Cheung" and "Andy Lau's albums", with commercial value labeled 0 (low commercial value). Cluster 3 contains "how big is a 5-inch phone" and "does an android phone run smoothly", with commercial value labeled 0.5 (medium commercial value). First, the feature data of the search term under test, "Toshiba laptop", are input into the value regression model.
At step S420, the value data of the search term under test are obtained based on the value regression model.
Continuing the example, when the feature data of the search term under test "Toshiba laptop" are input into the value regression model, the trained model gives it a value of, for example, 0.8 (a number greater than 0.5 and at most 1). As another example, the value obtained from the model for the search term under test "Jet Li" is, for example, 0.1 (a number greater than 0 and less than 0.5).
The present invention also provides an apparatus for determining the value of a search term. FIG. 5 is a structural block diagram of an apparatus 500 for determining the value of a search term according to an embodiment of the present invention.
The apparatus 500 may include an input module 510 and an obtaining module 520. The input module 510 may be configured to input the search term under test into the value regression model. The obtaining module 520 may be configured to obtain the value data of the search term under test based on the value regression model.
According to an embodiment of the present invention, the value regression model may be obtained by the following modules:
a clustering module (not shown), configured to cluster existing search terms based on click relationship data and/or presentation relationship data to obtain clustered search term sets;
a classification module (not shown), configured to classify the search term sets into search term sets of different values;
a model obtaining module (not shown), configured to perform model training using the search term sets of different values to obtain the value regression model.
According to an embodiment of the present invention, the search term sets of different values may include a high-value search term set, a medium-value search term set and a low-value search term set. The value data of search terms in the high-value set are greater than those in the medium-value set, and the value data of search terms in the medium-value set are greater than those in the low-value set.
The value data of search terms in the high-value set are 1, those in the medium-value set are 0.5, and those in the low-value set are 0.
According to an embodiment of the present invention, the clustering module may further include a relationship data obtaining sub-module, a calculating sub-module and an obtaining sub-module.
The relationship data obtaining sub-module may be configured to obtain the common click counts of different search terms and calculate click relationship data from them, and/or to obtain the common presentation counts of different search terms and calculate presentation relationship data from them;
the calculating sub-module may be configured to calculate the clustering distance between existing search terms based on at least one of the click relationship data, presentation relationship data, common presentation count and common click count;
the obtaining sub-module may be configured to cluster existing search terms based on the clustering distance to obtain clustered search term sets.
Here, the common click count, common presentation count, click relationship data and presentation relationship data each refer to a pair of search terms.
According to an embodiment of the present invention, the model obtaining module may further be configured to:
use each search term in the high-value set as one sample labeled 2, each search term in the medium-value set as one sample labeled 1, and each search term in the low-value set as one sample labeled 0, and train with the logistic regression algorithm to form the value regression model.
The functions implemented by the apparatus of this embodiment substantially correspond to the method embodiments of FIGS. 3 and 4. For details not covered in this embodiment, refer to the related description of those embodiments; they are not repeated here.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other device. Various general-purpose systems may also be used with the teachings herein, and the structure required to construct such systems is apparent from the above description. Moreover, the present invention is not directed to any particular programming language. It should be understood that the invention described herein may be implemented in various programming languages, and the above description of a specific language is provided to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, to streamline the disclosure and aid understanding of one or more of the inventive aspects, the above description of exemplary embodiments sometimes groups various features of the invention together in a single embodiment, figure or description thereof. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules of the client in an embodiment can be adaptively changed and arranged in one or more clients different from that embodiment. The modules of an embodiment may be combined into one module, and may also be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or client so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
In addition, those skilled in the art will understand that although some embodiments described herein include certain features that are included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The component embodiments of the invention may be implemented in hardware, as software modules running on one or more processors, or as a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the apparatus for increasing the exposure rate of information and the apparatus for determining the value of a search term according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, FIG. 6 shows an electronic device that can implement the method for increasing the exposure rate of information and the method for determining the value of a search term according to the invention. The electronic device conventionally includes a processor 610 and a computer program product or computer-readable medium in the form of a memory 620. The memory 620 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. The memory 620 has a storage space 630 for program code 631 for performing any of the method steps described above. For example, the storage space 630 for program code may include individual program codes 631, each implementing one of the various steps of the above methods. These program codes may be read from, or written into, one or more computer program products. These computer program products include program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks. Such a computer program product is typically a portable or fixed storage unit as described with reference to FIG. 7. The storage unit may have storage segments or storage spaces arranged similarly to the memory 620 in the electronic device of FIG. 6. The program code may, for example, be compressed in a suitable form. Typically, the storage unit includes a program 631' for performing the method steps according to the invention. This is code readable by a processor such as 610, and when the electronic device runs it, the code causes the electronic device to perform the steps of the methods described above.
References herein to "one embodiment", "an embodiment", or "one or more embodiments" mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Further, note that instances of the phrase "in one embodiment" here do not necessarily all refer to the same embodiment.
The description provided here sets out numerous specific details. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure understanding of this description.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" or "including" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.
In addition, it should be noted that the language used in this specification has been chosen mainly for readability and instructional purposes. It was not chosen to delineate or limit the subject matter of the invention. Therefore, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the invention, the disclosure made herein is illustrative rather than restrictive, and the scope of the invention is defined by the appended claims.
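The exposure-rate method of claims 1 to 6 below can be illustrated with a minimal Python sketch. It is not the applicant's implementation. Several details are hypothetical, because the claims do not fix them:

- the class and function names;
- the threshold values (100 and 10);
- reading the claim 4 adjustment parameter as a boost probability;
- the equal-weight quality estimator.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    info_id: str
    group_id: str
    base_score: float       # hypothetical one-number summary of the item's "basic data"
    daily_impressions: int  # daily presentation count from historical presentation data


@dataclass
class QueryStats:
    query_frequency: int      # taken from historical query data
    boost_probability: float  # hypothetical adjustment parameter in [0, 1]


def should_boost(stats: QueryStats, rng: random.Random) -> bool:
    # One reading of claim 4: compare a system-generated random number
    # with the adjustment parameter derived from historical query data.
    return rng.random() < stats.boost_probability


def estimate_quality(c: Candidate, group_ctr: dict) -> float:
    # Hypothetical estimator: equal blend of the item's own score and the
    # historical click-through rate of the information group it belongs to.
    return 0.5 * c.base_score + 0.5 * group_ctr.get(c.group_id, 0.0)


def boost_exposure(stats, candidates, group_ctr, queue, rng,
                   first_threshold=100, second_threshold=10):
    """Return the (info_id, estimated quality) pushed to the queue, or None."""
    if not should_boost(stats, rng):
        return None
    if stats.query_frequency < first_threshold:
        return None  # claim 2: abandon boosting for infrequent queries
    # Claim 5: candidates are items presented fewer times per day than the second threshold.
    pool = [c for c in candidates if c.daily_impressions < second_threshold]
    if not pool:
        return None
    # Claim 1: the candidate with the highest estimated quality goes to the candidate
    # presentation queue and competes using that estimate as its quality parameter.
    best = max(pool, key=lambda c: estimate_quality(c, group_ctr))
    entry = (best.info_id, estimate_quality(best, group_ctr))
    queue.append(entry)
    return entry
```

In this sketch, the returned estimate is the value the recommended item carries into the overall presentation competition. Claim 3 would then replace it with the quality parameter actually obtained once the item is presented.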

Claims (25)

  1. A method for increasing the exposure rate of information, characterized by comprising:
    determining whether to perform exposure-rate boosting for information related to a received query term;
    if so, checking whether the query frequency in the historical query data associated with the query term is greater than or equal to a first threshold;
    if so, determining candidate information based on historical presentation data related to the query term;
    estimating a presentation quality parameter for each candidate, based on the basic data of the candidate information and on the historical presentation data of the information group to which each candidate belongs; and
    recommending the candidate with the highest estimated presentation quality parameter, as recommended information, to a candidate presentation queue, so that the recommended information enters the overall presentation competition with that highest estimated value as its presentation quality parameter.
  2. The method according to claim 1, characterized by further comprising:
    if the query frequency in the historical query data associated with the query term is less than the first threshold, abandoning the exposure-rate boosting.
  3. The method according to claim 1, characterized by further comprising:
    if the recommended information is presented as a result of the overall presentation competition, determining the presentation quality parameter it obtained during that presentation as its initial presentation quality parameter.
  4. The method according to claim 1, characterized in that determining whether to perform exposure-rate boosting for not-yet-presented information related to the received query term further comprises:
    obtaining an adjustment parameter based on the historical query data related to the query term; and
    determining, based on the adjustment parameter and a random number generated by the system, whether to perform exposure-rate boosting for not-yet-presented information related to the received query term.
  5. The method according to claim 1, characterized in that determining candidate information based on the historical presentation data related to the query term further comprises:
    finding historical presentation data associated with the query term whose daily presentation count is less than a second threshold; and
    determining the information corresponding to the found historical presentation data as the candidate information.
  6. The method according to any one of claims 1 to 5, characterized in that the information comprises at least one of the following: information whose presentation count is below a predetermined value, information for a predetermined region, and information for a predetermined time period.
  7. An apparatus for increasing the exposure rate of information, characterized by comprising:
    a first determining module, configured to determine whether to perform exposure-rate boosting for not-yet-presented information related to a received query term;
    a checking module, configured to check whether the query frequency in the historical query data associated with the query term is greater than or equal to a first threshold;
    a second determining module, configured to determine candidate information based on historical presentation data related to the query term;
    an estimation module, configured to estimate a presentation quality parameter for each candidate, based on the basic data of the candidate information and on the historical presentation data of the information group to which each candidate belongs; and
    a recommendation module, configured to recommend the candidate with the highest estimated presentation quality parameter, as recommended information, to a candidate presentation queue, so that the recommended information enters the overall presentation competition with that highest estimated value as its presentation quality parameter.
  8. The apparatus according to claim 7, characterized in that the checking module is further configured to:
    abandon the exposure-rate boosting if the query frequency in the historical query data associated with the query term is less than the first threshold.
  9. The apparatus according to claim 7, characterized by further comprising:
    a presentation quality parameter determining module, configured to determine the initial presentation quality parameter of the recommended information if it is presented as a result of the overall presentation competition; that initial parameter is the presentation quality parameter it obtained during that presentation.
  10. The apparatus according to claim 7, characterized in that the first determining module further comprises:
    an obtaining submodule, configured to obtain an adjustment parameter based on the historical query data related to the query term; and
    a first determining submodule, configured to determine, based on the adjustment parameter and a random number generated by the system, whether to perform exposure-rate boosting for not-yet-presented information related to the received query term.
  11. The apparatus according to claim 7, characterized in that the second determining module further comprises:
    a search submodule, configured to find historical presentation data associated with the query term whose daily presentation count is less than a second threshold; and
    a second determining submodule, configured to determine the information corresponding to the found historical presentation data as the candidate information.
  12. A method for determining the value of a search term, characterized by comprising:
    inputting feature data of a search term to be evaluated into a value regression model; and
    obtaining value data of the search term to be evaluated based on the value regression model;
    wherein the value regression model is obtained by:
    clustering existing search terms based on click relationship data and/or presentation relationship data to obtain clustered search term sets;
    classifying the search term sets into search term sets of different values; and
    performing model training with the search term sets of different values to obtain the value regression model.
  13. The method according to claim 12, characterized in that the search term sets of different values comprise a high-value search term set, a medium-value search term set, and a low-value search term set. The value data of search terms in the high-value set is greater than that of search terms in the medium-value set, and the value data of search terms in the medium-value set is greater than that of search terms in the low-value set.
  14. The method according to claim 13, characterized in that the value data of search terms in the high-value search term set is 1, that of search terms in the medium-value search term set is 0.5, and that of search terms in the low-value search term set is 0.
  15. The method according to claim 12, characterized in that the step of obtaining clustered search term sets further comprises the following. That step clusters the existing search terms based on the click relationship data and the presentation relationship data between them.
    obtaining the co-click counts of different search terms and calculating click relationship data based on the co-click counts, and/or obtaining the co-presentation counts of different search terms and calculating presentation relationship data based on the co-presentation counts;
    calculating clustering distances between the existing search terms based on at least one of the click relationship data, the presentation relationship data, the co-presentation counts, and the co-click counts; and
    clustering the existing search terms based on the clustering distances to obtain clustered search term sets.
  16. The method according to claim 15, characterized in that the co-click count, the co-presentation count, the click relationship data, and the presentation relationship data are each defined between a pair of search terms.
  17. The method according to claim 13, characterized in that performing model training with the search term sets of different values to obtain the value regression model further comprises: using each search term in each search term set as a sample corresponding to the value data of that set; specifically,
    using each search term in the high-value search term set as one "2" sample, each search term in the medium-value search term set as one "1" sample, and each search term in the low-value search term set as one "0" sample, and training with the logistic regression algorithm to form the value regression model.
  18. An apparatus for determining the value of a search term, characterized by comprising:
    an input module, configured to input feature data of a search term to be evaluated into a value regression model; and
    an obtaining module, configured to obtain value data of the search term to be evaluated based on the value regression model;
    wherein the value regression model is obtained by the following modules:
    a clustering module, configured to cluster existing search terms based on click relationship data and/or presentation relationship data to obtain clustered search term sets;
    a classification module, configured to classify the search term sets into search term sets of different values; and
    a model acquisition module, configured to perform model training with the search term sets of different values to obtain the value regression model.
  19. The apparatus according to claim 18, characterized in that the search term sets of different values comprise a high-value search term set, a medium-value search term set, and a low-value search term set. The value data of search terms in the high-value set is greater than that of search terms in the medium-value set, and the value data of search terms in the medium-value set is greater than that of search terms in the low-value set.
  20. The apparatus according to claim 19, characterized in that the value data of search terms in the high-value search term set is 1, that of search terms in the medium-value search term set is 0.5, and that of search terms in the low-value search term set is 0.
  21. The apparatus according to claim 18, characterized in that the clustering module further comprises:
    a relationship data obtaining submodule, configured to obtain the co-click counts of different search terms and calculate click relationship data based on the co-click counts. It is also, or alternatively, configured to obtain the co-presentation counts of different search terms and calculate presentation relationship data based on the co-presentation counts;
    a calculation submodule, configured to calculate clustering distances between the existing search terms based on at least one of the click relationship data, the presentation relationship data, the co-presentation counts, and the co-click counts; and
    an obtaining submodule, configured to cluster the existing search terms based on the clustering distances to obtain clustered search term sets.
  22. The apparatus according to claim 21, characterized in that the co-click count, the co-presentation count, the click relationship data, and the presentation relationship data are each defined between a pair of search terms.
  23. The apparatus according to claim 19, characterized in that the model acquisition module is further configured to:
    use each search term in each search term set as a sample corresponding to the value data of that set; specifically,
    use each search term in the high-value search term set as one "2" sample, each search term in the medium-value search term set as one "1" sample, and each search term in the low-value search term set as one "0" sample, and train with the logistic regression algorithm to form the value regression model.
  24. A computer program comprising computer-readable code which, when run on an electronic device, causes the following methods to be performed: the method for increasing the exposure rate of information and the method for determining the value of a search term according to any one of claims 1-6 and 12-17.
  25. A computer-readable medium storing the computer program according to claim 24.
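The search-term value method of claims 12 to 17 above can likewise be sketched in Python. This is an illustration under assumptions, not the applicant's implementation. The following choices are hypothetical:

- the Jaccard-style click distance;
- single-linkage clustering with a fixed distance threshold;
- the externally supplied tier function for each cluster;
- plain gradient-descent logistic regression.

Claim 17's "2/1/0 samples" is read here, as an assumption, as the number of positive copies out of two per term. That reading is equivalent to training on the fractional targets 1, 0.5, and 0 of claim 14.

```python
import math
from collections import defaultdict

# Value data per tier, as in claim 14.
TIER_VALUE = {"high": 1.0, "medium": 0.5, "low": 0.0}


def click_distance(co_clicks, clicks_a, clicks_b):
    # Hypothetical click-relationship measure: Jaccard overlap of the two terms'
    # clicks, converted to a distance in [0, 1].
    union = clicks_a + clicks_b - co_clicks
    return 1.0 - (co_clicks / union if union else 0.0)


def cluster_terms(terms, clicks, co_clicks, max_dist=0.5):
    """Single-linkage clustering with union-find: merge pairs closer than max_dist."""
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for (a, b), n in co_clicks.items():
        if click_distance(n, clicks[a], clicks[b]) < max_dist:
            parent[find(a)] = find(b)
    groups = defaultdict(list)
    for t in terms:
        groups[find(t)].append(t)
    return list(groups.values())


def build_samples(clusters, tier_of, features):
    # Every term in a cluster gets its cluster's tier value as the training target.
    return [(features[t], TIER_VALUE[tier_of(c)]) for c in clusters for t in c]


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_value_model(samples, epochs=2000, lr=0.5):
    """Logistic regression with fractional targets, fitted by batch gradient descent.

    The cross-entropy gradient p - y with y in {1, 0.5, 0} is half the summed gradient
    of two copies per term labelled (1, 1), (1, 0) or (0, 0).
    """
    dim, n = len(samples[0][0]), len(samples)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in samples:
            err = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                gw[i] += err * x[i]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_value(model, x):
    # Score in (0, 1) that plays the role of the term's value data (claim 12).
    w, b = model
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

The soft-target formulation was chosen because it keeps a single sample per term while matching the two-copy reading of claim 17 up to a constant factor in the gradient.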

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201410063058.0A CN104866493B (en) 2014-02-24 2014-02-24 Method and apparatus for increasing the exposure rate of information
CN201410063058.0 2014-02-24
CN201410098737.1 2014-03-17
CN201410098737.1A CN104933047B (en) 2014-03-17 2014-03-17 Method and device for determining value of search term

Publications (1)

Publication Number Publication Date
WO2015124024A1 (en) 2015-08-27

Family

ID=53877618

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/094298 WO2015124024A1 (en) 2014-02-24 2014-12-19 Method and device for promoting exposure rate of information, method and device for determining value of search word

Country Status (1)

Country Link
WO (1) WO2015124024A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1877581A (en) * 2006-07-12 2006-12-13 百度在线网络技术(北京)有限公司 Advertisement display system and method used for Internet search engine
CN101331475A (en) * 2005-12-14 2008-12-24 微软公司 Automatic detection of online commercial intention
CN101980211A (en) * 2010-11-12 2011-02-23 百度在线网络技术(北京)有限公司 Machine learning model and establishing method thereof
CN101980210A (en) * 2010-11-12 2011-02-23 百度在线网络技术(北京)有限公司 Marked word classifying and grading method and system
US20110231241A1 (en) * 2010-03-18 2011-09-22 Yahoo! Inc. Real-time personalization of sponsored search based on predicted click propensity
CN102387411A (en) * 2010-09-06 2012-03-21 康佳集团股份有限公司 Set-top box and method for set-top box to play advertisement
CN103164454A (en) * 2011-12-15 2013-06-19 百度在线网络技术(北京)有限公司 Keyword grouping method and keyword grouping system

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170293934A1 (en) * 2015-05-11 2017-10-12 Tencent Technology (Shenzhen) Company Limited Method for determining validity of delivering of promotion information, monitoring server and terminal
US10719847B2 (en) * 2015-05-11 2020-07-21 Tencent Technology (Shenzhen) Company Limited Method for determining validity of delivering of promotion information, monitoring server and terminal
CN105447724A (en) * 2015-12-15 2016-03-30 腾讯科技(深圳)有限公司 Content item recommendation method and apparatus
CN110210882A (en) * 2018-03-21 2019-09-06 腾讯科技(深圳)有限公司 Promote position matching process and device, promotion message methods of exhibiting and device
CN111428125A (en) * 2019-01-10 2020-07-17 北京三快在线科技有限公司 Sorting method and device, electronic equipment and readable storage medium
CN111428125B (en) * 2019-01-10 2023-05-30 北京三快在线科技有限公司 Ordering method, ordering device, electronic equipment and readable storage medium
CN112749333A (en) * 2020-07-24 2021-05-04 腾讯科技(深圳)有限公司 Resource searching method and device, computer equipment and storage medium
CN112749333B (en) * 2020-07-24 2024-01-16 腾讯科技(深圳)有限公司 Resource searching method, device, computer equipment and storage medium
CN112765452A (en) * 2020-12-31 2021-05-07 北京百度网讯科技有限公司 Search recommendation method and device and electronic equipment
CN112765452B (en) * 2020-12-31 2024-02-27 北京百度网讯科技有限公司 Search recommendation method and device and electronic equipment
CN113111085A (en) * 2021-04-08 2021-07-13 达而观信息科技(上海)有限公司 Automatic hierarchical exploration method and device based on streaming data
CN113111085B (en) * 2021-04-08 2024-01-30 达观数据有限公司 Automatic hierarchical exploration method and device based on stream data

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 14883023; country of ref document: EP; kind code of ref document: A1.
NENP Non-entry into the national phase. Ref country code: DE.
122 Ep: PCT application non-entry in European phase. Ref document number: 14883023; country of ref document: EP; kind code of ref document: A1.