WO2015124024A1

WO2015124024A1 - Method and device for promoting exposure rate of information, method and device for determining value of search word

Info

Publication number: WO2015124024A1
Application number: PCT/CN2014/094298
Authority: WO
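Inventors: Wang Chao (王超); Deng Qinhua (邓钦华); Xu Sheng (许晟)
Original assignee: Beijing Qihoo Technology Co., Ltd. (北京奇虎科技有限公司); Qizhi Software (Beijing) Co., Ltd. (奇智软件（北京）有限公司)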
Priority date: 2014-02-24
Filing date: 2014-12-19
Publication date: 2015-08-27

Abstract

A method and device for promoting the exposure rate of information, and a method and device for determining the value of a search word. The method for promoting the exposure rate of information comprises: determining whether to perform exposure rate promotion processing on information related to a received query word (S110); if so, checking whether the query frequency in the historical query data associated with the query word is greater than or equal to a first threshold (S120); if so, determining candidate information based on historical presentation data related to the query word (S130); estimating presentation quality parameters of all pieces of candidate information based on their basic data and on the historical presentation data of the information groups to which they belong (S140); and recommending the candidate information with the highest estimated presentation quality parameter to a candidate presentation queue as recommended information, so that it takes part in the overall presentation competition processing with that estimated parameter as its presentation quality parameter (S150). This increases the exposure rate of information, gives different information more presentation opportunities per unit of time, and improves the user experience.

Description

Method and device for improving exposure of information, method and device for determining value of search word

Technical field

The present invention relates to the field of computer technology, and more particularly to a method and apparatus for increasing the exposure of information, and a method and apparatus for determining the value of a search term.

Background art

With the development of Internet business, more and more types of services appear on the Internet, such as advertising information services. For information services on the Internet, the exposure (presentation) of information is the basic guarantee that the information owner (for example, an advertiser) achieves the intended effect of its advertising information; it is the main purpose for which the owner creates custom creatives and bids on search terms, and the basis of the value the owner obtains. However, the actual design of an information bidding system must balance efficiency and fairness, because fairness is in essence a recognition of potential and will ultimately improve future efficiency. The practical problem is that the creatives customized by information owners vary greatly in performance. On the one hand, some newly designed creatives with good expected revenue efficiency are not displayed effectively. On the other hand, some information that has already been displayed, but whose revenue efficiency has declined over time, continues to be displayed by the system. Both situations hinder the maximization of efficiency and revenue.

In view of the above problems in Internet information services, the present invention proposes a solution for improving the exposure rate of information. Addressing the shortcomings of existing solutions, the present invention mainly solves the following problems:

Balancing efficiency and fairness, giving different information more presentation opportunities per unit of time, and preventing any single piece of information from monopolizing presentations;

Providing more choices and improving the user experience through diversity of information, avoiding the drop in conversion rate caused by repetition fatigue;

Dynamically adjusting presentation opportunities according to historical data, and increasing the information exposure rate while keeping losses to a minimum.

Summary of the invention

In order to increase the exposure of unpresented information, it is a primary object of the present invention to provide a method and apparatus for increasing the exposure of information, a method and apparatus for determining the value of a search term, a computer program, and a computer readable medium.

The present invention provides a method of increasing the exposure of information, comprising: determining whether to perform exposure promotion processing on information related to a received query word; if so, checking whether the query frequency in the historical query data associated with the query word is greater than or equal to a first threshold; if so, determining candidate information based on historical presentation data related to the query word; estimating presentation quality parameters of all candidate information based on the basic data of all candidate information and the historical presentation data of the information groups to which the candidate information belongs; and recommending the candidate information with the highest estimated presentation quality parameter to a candidate presentation queue as recommendation information, so that overall presentation competition processing is performed with the estimated highest presentation quality parameter as the presentation quality parameter of the recommendation information.

The present invention also provides an apparatus for improving the exposure rate of information, comprising: a first determining module, configured to determine whether to perform exposure rate promotion processing on undisplayed information related to a received query word; a checking module, configured to check whether the query frequency in the historical query data associated with the query word is greater than or equal to a first threshold; a second determining module, configured to determine candidate information based on historical presentation data related to the query word; an estimation module, configured to estimate presentation quality parameters of all candidate information based on the basic data of all candidate information and the historical presentation data of the information groups to which the candidate information belongs; and a recommendation module, configured to recommend the candidate information with the highest estimated presentation quality parameter to a candidate presentation queue as recommendation information, so that overall presentation competition processing is performed with the estimated highest presentation quality parameter as the presentation quality parameter of the recommendation information.

According to an aspect of the present invention, a method for determining the value of a search term is provided, comprising: inputting feature data of a search term to be tested into a value regression model; and obtaining value data of the search term to be tested based on the value regression model.

The value regression model is obtained by: clustering existing search words based on click relationship data and/or presentation relationship data to obtain clustered search word sets; classifying the search word sets into search word sets of different values; and performing model training with the search word sets of different values to obtain the value regression model.

According to another aspect of the present invention, an apparatus for determining the value of a search term is provided, comprising: an input module, configured to input feature data of a search term to be tested into a value regression model; and an acquisition module, configured to obtain value data of the search term to be tested based on the value regression model; wherein the value regression model is obtained by the following modules: a clustering module, configured to cluster existing search words based on click relationship data and/or presentation relationship data to obtain clustered search word sets; a classification module, configured to classify the search word sets into search word sets of different values; and a model acquisition module, configured to perform model training with the search word sets of different values to obtain the value regression model.

According to another aspect of the present invention, there is provided a computer program comprising computer readable code which, when run on an electronic device, causes the method for increasing the exposure of information and the method for determining the value of a search term to be implemented.

According to still another aspect of the present invention, a computer readable medium storing a computer program as described above is provided.

Compared with the prior art, the method and apparatus for increasing the exposure of information according to the present invention have the following beneficial effects: presentation opportunities are adjusted dynamically according to historical presentation data, the information exposure rate is increased while losses are kept to a minimum, different information is given more presentation opportunities per unit of time, and the user experience is improved.

With the method and apparatus for determining the value of a search term according to the present invention, the value of a search term can be determined more accurately, and valuable data information (such as advertisements) can be selected based on the search term value data, thereby improving the user experience, increasing the information click rate, and increasing information exposure.

The above description is only an overview of the technical solutions of the present invention. So that the above and other objects, features and advantages of the present invention can be understood more clearly, specific embodiments of the invention are set forth below.

DRAWINGS

Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be construed as limiting. Throughout the drawings, the same reference numerals refer to the same parts. In the drawings:

Figure 1 is a flow chart of a method of increasing the exposure of information according to an embodiment of the present invention;

Figure 2 is a structural diagram of an apparatus for increasing the exposure of information according to an embodiment of the present invention;

Figure 3 shows a flow chart of a method of obtaining a value regression model according to an embodiment of the present invention;

Figure 4 shows a flow chart of a method of determining the value of a search term according to an embodiment of the present invention;

Figure 5 is a block diagram of an apparatus for determining the value of a search term according to an embodiment of the present invention;

Figure 6 shows a block diagram of an electronic device for performing the method of the present invention;

Figure 7 shows a schematic diagram of a memory unit for holding or carrying program code implementing a method in accordance with the present invention.

Detailed description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope will be fully conveyed to those skilled in the art.

The improved technical solution of the present invention will be described in detail below with reference to the accompanying drawings.

FIG. 1 is a flowchart of a method of increasing the exposure of information according to an embodiment of the present invention.

At step S110, it is determined whether to perform exposure rate promotion processing on the unpresented information related to the received query word.

Specifically, after receiving the query word, the method of the present invention first determines whether to perform exposure promotion processing on the unpresented information related to the received query word. The exposure promotion processing could be performed for every query request, but such an implementation is inefficient: it may give too many presentation opportunities to information whose expected results are poor, thereby reducing the efficiency of the entire system. Therefore, a determination is made for each query request so that the proportion of requests in the system that undergo exposure promotion processing is kept within a range; for example, the ratio of query requests selected for exposure promotion processing to all query requests is controlled to be no more than 5%. This ratio can be adjusted as needed.

Specifically, the server has a historical query database storing historical query data. The historical query data in the database provides the historical request information of each query word, from which an adjustment parameter can be obtained for deciding whether to perform exposure promotion processing on the not-yet-presented information related to that query word.

Specifically, an adjustment parameter is acquired based on the historical request data related to the query; then, based on the adjustment parameter and a random number generated by the system, it is determined whether to perform exposure rate promotion processing on the undisplayed information related to the received query.

More specifically, an adjustment parameter is first acquired based on the historical query data related to the query. For example, the baseline of the adjustment parameter can be set to 1.0, and, starting from this baseline, the estimate can be adjusted according to the historical request data (which may include, but is not limited to, the query frequency and the click rate), for example, but not limited to, with the following formula, to obtain the adjustment parameter:

$$\text{adjustment parameter} = 1.0 + \alpha \cdot \text{click rate} + \beta \cdot \log\left(\frac{\gamma}{\text{frequency}}\right) \qquad (1)$$

where, for example, $\alpha = 0.2$, $\beta = 0.3$ and $\gamma = 1000$.
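A minimal Python sketch of Formula (1), using the example constants above as defaults. The text does not state the base of the logarithm or the time window behind "frequency"; a natural logarithm and a raw query count comparable to γ are assumptions here, and the function name `adjustment_parameter` is hypothetical.

```python
import math


def adjustment_parameter(click_rate: float, frequency: float,
                         alpha: float = 0.2, beta: float = 0.3,
                         gamma: float = 1000.0) -> float:
    """Formula (1): 1.0 + alpha * click_rate + beta * log(gamma / frequency).

    Assumption: natural logarithm; the source does not give the base.
    """
    if frequency <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 + alpha * click_rate + beta * math.log(gamma / frequency)


# A query seen exactly gamma times: the log term vanishes.
print(adjustment_parameter(click_rate=0.05, frequency=1000))
# A rarer query gets a larger adjustment than a very frequent one.
print(adjustment_parameter(0.05, 100) > adjustment_parameter(0.05, 10_000))
```

With these constants, a query whose frequency is below γ is pushed above the 1.0 baseline and one whose frequency is above γ falls below it, so, other things equal, infrequent queries receive the larger adjustment.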

Then, based on the adjustment parameter and the random number generated by the system, it is determined whether to perform exposure rate promotion processing on the unpresented information related to the received query.

Specifically, for example, the random number can be generated from a uniform distribution (for a target ratio of 5%, a uniform distribution with parameter 20, since 1/20 = 5%), so that the final result satisfies the target ratio (5%) described above. Of course, random numbers can also be generated in other ways.

Then, as described above, the random number makes the final result satisfy the target ratio, and the adjustment parameter yields a determination threshold for deciding whether to perform exposure rate promotion processing on the undisplayed information related to the received query: determination threshold = target ratio × adjustment parameter. Whether to perform the exposure rate promotion processing on that information is then decided based on the determination threshold.

For example, when the determination threshold is greater than or equal to 5%, exposure rate promotion processing is performed on the undisplayed information related to the received query; when the determination threshold is less than 5%, no exposure rate promotion processing is performed on that information. Other thresholds may be selected as desired; the invention is not limited to the specific value above.
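The text does not spell out exactly how the random number and the determination threshold interact. The sketch below assumes one common reading: draw a uniform number in [0, 1) and perform the promotion when it falls below target ratio × adjustment parameter, so the expected selection rate is the 5% target scaled by the adjustment. The function names are hypothetical.

```python
import random

TARGET_RATIO = 0.05  # example target: about 5% of query requests


def decision_threshold(adjustment: float, target_ratio: float = TARGET_RATIO) -> float:
    """Determination threshold = target ratio * adjustment parameter."""
    return target_ratio * adjustment


def should_promote(adjustment: float, rng: random.Random,
                   target_ratio: float = TARGET_RATIO) -> bool:
    """Assumed reading: promote when a uniform draw in [0, 1) falls below
    the determination threshold, so the expected share of selected
    requests is target_ratio * adjustment."""
    return rng.random() < decision_threshold(adjustment, target_ratio)


rng = random.Random(7)  # seeded for reproducibility
trials = 100_000
share = sum(should_promote(1.0, rng) for _ in range(trials)) / trials
print(f"selected share with adjustment 1.0: {share:.3f}")
```

This differs slightly from the literal example above, which compares the threshold itself with 5%; the draw-based reading is one way to let the random number and the adjustment parameter jointly control how often promotion is triggered.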

For example, the information to be presented may be advertisement information, in which case it is determined whether to perform exposure promotion processing on the not-yet-presented advertisement information related to the received query word. That is, an adjustment parameter is first obtained based on the historical request data related to the query; then, based on the adjustment parameter and a random number generated by the system, it is determined whether to perform exposure promotion processing on the not-yet-presented advertisement information related to the received query.

According to an embodiment of the present invention, the information may include at least one of the following: information whose number of presentations is below a predetermined value, information of a predetermined area, and information of a predetermined time period. For example, the information may be advertisement information presented fewer than 10 times, advertisement information for Beijing, and the like. The information of the present invention may also be of other types.

If it is determined at step S110 that exposure promotion processing is to be performed on the undisplayed information associated with the received query word, then at step S120 it is checked whether the query frequency in the historical query data associated with the query word is greater than or equal to a first threshold. The first threshold can be, for example, twice per hour.

Specifically, the query frequency in the historical query data associated with the query word is checked. If the query frequency is high, for example at least twice per hour, the method proceeds to step S130; if it is low, for example less than twice per hour, the method ends.

It should be understood that the first threshold is not limited to the above values, but any suitable value may be selected as the first threshold as needed.

Next, at step S130, candidate information is determined based on historical presentation data associated with the query word.

Specifically, the pre-established database of historical presentation data is searched for historical presentation data related to the query word, and the historical presentation data associated with the query word whose daily number of presentations is less than a second threshold is identified. For example, the second threshold can be 10. That is, historical presentation data associated with the query word showing fewer than 10 presentations per day is found, and the information corresponding to that data is determined as candidate information. Historical presentation data associated with the query word whose presentation count at another time granularity is below a corresponding threshold may also be used, for example fewer than 70 presentations per week (seven days), or fewer than 25 presentations in 60 hours.

For example, in the historical presentation data related to the query word "Annual Gift Package", the daily numbers of presentations of information IDs "A1234123", "A1231312" and "A1343141" are each less than the second threshold (for example, 10), so these three pieces of information are determined as candidate information.
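Step S130 amounts to a filter over per-day presentation counts. A minimal sketch, assuming the counts have already been aggregated per information ID for the query word; the counts below and the ID "A9999999" are made up for illustration, and `find_candidates` is a hypothetical name.

```python
from typing import Dict, List

SECOND_THRESHOLD = 10  # example: fewer than 10 presentations per day


def find_candidates(daily_presentations: Dict[str, int],
                    threshold: int = SECOND_THRESHOLD) -> List[str]:
    """Return the IDs whose daily presentation count is below the threshold."""
    return sorted(info_id for info_id, count in daily_presentations.items()
                  if count < threshold)


# Hypothetical per-day counts for the query word "Annual Gift Package".
history = {"A1234123": 3, "A1231312": 7, "A1343141": 0, "A9999999": 250}
print(find_candidates(history))  # ['A1231312', 'A1234123', 'A1343141']
```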

After the candidate information is determined, at step S140, the presentation quality parameters of all candidate information are estimated based on the basic data of all candidate information and the historical presentation data of the information groups to which the candidate information belongs.


For example, information IDs "A1234123" and "A1231312" belong to information group "G111223", and information ID "A1343141" belongs to another information group "G222121". The basic data of the three candidates is queried, together with the historical presentation data of the other information IDs in the two groups to which they belong; the historical presentation data of each information ID includes its presentation quality parameter. The highest presentation quality parameter in each group is used as the estimated presentation quality parameter of the candidate information in that group. For example, the highest presentation quality parameter a in group "G111223" (the presentation quality parameter of information A in that group) is taken as the estimated presentation quality parameter a of the candidates "A1234123" and "A1231312"; likewise, the highest presentation quality parameter b in group "G222121" (the presentation quality parameter of information B) is taken as the estimated presentation quality parameter b of the candidate "A1343141".
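A sketch of the group-based estimate in step S140. It covers only the "highest parameter in the group" rule; the candidates' own basic data, which the text mentions without detailing, is not modeled. The IDs "A_INFO" and "B_INFO" stand in for information A and B and, like the quality values 0.8 and 0.6, are hypothetical; falling back to a default for a group without history is also an assumption.

```python
from typing import Dict, List


def estimate_quality(candidates: List[str],
                     group_of: Dict[str, str],
                     history_quality: Dict[str, float],
                     default: float = 0.0) -> Dict[str, float]:
    """Give each candidate the highest historical presentation quality
    parameter observed in its information group (group part of S140)."""
    best_in_group: Dict[str, float] = {}
    for info_id, quality in history_quality.items():
        group = group_of[info_id]
        best_in_group[group] = max(quality, best_in_group.get(group, quality))
    # Assumption: a group with no history falls back to `default`.
    return {c: best_in_group.get(group_of[c], default) for c in candidates}


group_of = {"A1234123": "G111223", "A1231312": "G111223", "A1343141": "G222121",
            "A_INFO": "G111223", "B_INFO": "G222121"}
history_quality = {"A_INFO": 0.8, "B_INFO": 0.6}  # hypothetical a and b
print(estimate_quality(["A1234123", "A1231312", "A1343141"], group_of, history_quality))
# {'A1234123': 0.8, 'A1231312': 0.8, 'A1343141': 0.6}
```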

Then, at step S150, the candidate information with the highest estimated presentation quality parameter is recommended to the candidate presentation queue as recommendation information, so that overall presentation competition processing is performed with the estimated highest presentation quality parameter as the presentation quality parameter of the recommendation information.

In the above example, if parameter a is larger than parameter b, the candidate information with the highest estimated presentation quality parameter (parameter a) is recommended into the candidate presentation queue as recommendation information.

Specifically, the information ID "A1234123" and the information ID "A1231312" of the candidate information having the predicted highest presentation quality parameter a can be recommended into the candidate presentation queue.

Optionally, only one of the plurality of candidate information may be recommended to the candidate presentation queue at a time by polling.
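One way to realize the polling option, as a sketch: keep the candidates that share the highest estimate and hand out one per request in round-robin order. The generator name and the sorted order of polling are assumptions, and the estimates reuse the hypothetical values a = 0.8 and b = 0.6.

```python
import itertools
from typing import Dict, Iterator, Tuple


def recommendation_stream(estimates: Dict[str, float]) -> Iterator[Tuple[str, float]]:
    """S150 with polling: keep only the candidates sharing the highest
    estimated presentation quality parameter and yield one of them per
    request, round-robin, together with that parameter."""
    best = max(estimates.values())
    top = sorted(info_id for info_id, q in estimates.items() if q == best)
    for info_id in itertools.cycle(top):
        yield info_id, best


stream = recommendation_stream({"A1234123": 0.8, "A1231312": 0.8, "A1343141": 0.6})
print([next(stream) for _ in range(3)])
# [('A1231312', 0.8), ('A1234123', 0.8), ('A1231312', 0.8)]
```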

It should be understood that other suitable means may be used to select which of the multiple candidates is recommended to the candidate presentation queue, where it takes part in the overall presentation competition processing with the estimated highest presentation quality parameter as its presentation quality parameter.

The overall presentation competition processing selects candidate results by ranking. Taking search advertisement information as an example, the advertisement information entering the overall competition processing is first scored and sorted according to a predetermined rule, for example in descending order of score. The ranking criterion for search advertisement information can be determined, for example, from two parts: the quality of the advertisement creative and the bid price of the keyword, i.e. sort score = ad creative quality × keyword bid price. Next, the push-left and filtering results are calculated. This step is similar to categorization: premium ads are pushed to the left and inferior ads are filtered out. Finally, the advertisement information is presented according to business needs. Not all ads are shown; for example, search ads are generally shown in 3 positions on the left and 8 on the right.
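A compact sketch of the overall competition for search ads. The cut-off `min_score` and the simple "best scores fill the left slots first" rule stand in for the push-left and filtering step, whose exact criteria the text does not give. "X1" and "X2" are hypothetical competing ads, and "A1234123" enters with the hypothetical estimated quality 0.8 from the earlier example and a made-up bid.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Ad:
    ad_id: str
    creative_quality: float  # presentation quality parameter
    bid_price: float         # keyword bid price


def score(ad: Ad) -> float:
    """Sort score = ad creative quality * keyword bid price."""
    return ad.creative_quality * ad.bid_price


def run_competition(ads: List[Ad], min_score: float = 0.0,
                    left_slots: int = 3, right_slots: int = 8) -> Tuple[List[str], List[str]]:
    """Rank by score (descending), drop ads below min_score (assumed filter),
    then fill the left slots first and the right slots next."""
    ranked = sorted(ads, key=score, reverse=True)
    kept = [ad.ad_id for ad in ranked if score(ad) >= min_score]
    return kept[:left_slots], kept[left_slots:left_slots + right_slots]


ads = [Ad("A1234123", 0.8, 2.0), Ad("X1", 0.5, 5.0), Ad("X2", 0.2, 1.0)]
print(run_competition(ads, min_score=0.5))  # (['X1', 'A1234123'], [])
```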

Thus, by the solution of the present invention for improving the exposure of information, the opportunity for information to enter the overall competition process is enhanced, and the opportunity for the information to be finally presented is finally improved.

The present invention also provides an apparatus for increasing the exposure of information. FIG. 2 is a structural diagram of an apparatus 200 for increasing the exposure rate of information according to an embodiment of the present invention.

The apparatus 200 may include: a first determining module 210, a checking module 220, a second determining module 230, an estimation module 240 and a recommendation module 250.

The first determining module 210 may be configured to determine whether to perform an exposure promotion process for the undisplayed information related to the received query word.

According to an embodiment of the present application, the first determining module 210 may further include an obtaining submodule 211 and a first determining submodule 212. The obtaining submodule 211 can be configured to obtain an adjustment parameter based on the historical query data related to the query word, and the first determining submodule 212 can be configured to determine, based on the adjustment parameter and a random number generated by the system, whether to perform exposure rate promotion processing on the undisplayed information related to the received query word.

The checking module 220 can be configured to check whether the query frequency of the historical query data associated with the query word is greater than or equal to a first threshold.

In an embodiment, the checking module 220 may be further configured to abandon the exposure rate promotion processing if the query frequency in the historical query data associated with the query word is less than the first threshold. The first threshold can be, for example, twice per hour.

The second determining module 230 can be configured to determine candidate information based on historical presentation data related to the query term.

According to an embodiment of the present application, the second determining module 230 may further include a searching submodule 231 and a second determining submodule 232. The searching submodule 231 can be configured to search for historical presentation data associated with the query word whose daily number of presentations is less than a second threshold, and the second determining submodule 232 can be configured to determine the information corresponding to the found historical presentation data as candidate information.

The estimation module 240 can be configured to estimate the presentation quality parameters of all candidate information based on the basic data of all candidate information and the historical presentation data of the information group to which all candidate information belongs.

The recommendation module 250 may be configured to recommend candidate information having the predicted highest presentation quality parameter as recommendation information to the candidate presentation queue to perform overall presentation contention processing with the estimated highest presentation quality parameter as the presentation quality parameter of the recommendation information.

According to an embodiment of the present application, the apparatus 200 may further include a presentation quality parameter determination module (not shown), which may be configured to: if the recommendation information is presented as a result of the overall presentation competition processing, determine the presentation quality parameter obtained during that presentation as the initial presentation quality parameter of the recommendation information.

Since the specific implementation of each module in the apparatus of FIG. 2 corresponds to the specific implementation of the corresponding step of the method, which has been described in detail with reference to FIG. 1, the details of the individual modules are not repeated here so as not to obscure the invention.

In an existing method for determining the value of a search term, the value is determined mainly through the following steps:

Step 1: Count the number of ad impressions and the number of ad clicks for every search term in the ad impression log;

Step 2: Calculate the ad click rate of each search term = number of clicks of the search term / number of ad impressions of the search term;

Step 3: If the click rate of a search term is below a threshold and its number of ad impressions is above a threshold, the search term is of low value; conversely, if its click rate is above a threshold and its number of ad impressions is above a threshold, the search term is of high value. For example, let the click-rate threshold be 5% and the impression threshold be 50. The search term "prose of the sunset" has 100 ad impressions and 1 click (click rate 1%), so it is of low value; the search term "laptop" has 10,000 ad impressions and 1,000 clicks (click rate 10%), so it is of high value.
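The existing threshold rule of Steps 1 to 3 can be sketched as follows with the example thresholds. The log format (one row per ad impression with a clicked flag) and the function names are assumptions for illustration; the rule leaves terms at or below the impression threshold unclassified, which the sketch reports as None.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

CTR_THRESHOLD = 0.05       # example click-rate threshold (5%)
IMPRESSION_THRESHOLD = 50  # example impression threshold


def count_log(log: Iterable[Tuple[str, bool]]) -> Tuple[Counter, Counter]:
    """Step 1: count impressions and clicks per search term.
    Assumed log format: one (search_term, clicked) row per ad impression."""
    impressions, clicks = Counter(), Counter()
    for term, clicked in log:
        impressions[term] += 1
        if clicked:
            clicks[term] += 1
    return impressions, clicks


def classify(impressions: int, clicks: int) -> Optional[str]:
    """Steps 2-3: click rate = clicks / impressions, then the threshold rules.
    Returns None when neither rule applies (too few impressions, or a click
    rate exactly at the threshold)."""
    if impressions <= IMPRESSION_THRESHOLD:
        return None
    ctr = clicks / impressions
    if ctr < CTR_THRESHOLD:
        return "low"
    if ctr > CTR_THRESHOLD:
        return "high"
    return None


print(classify(100, 1))        # low  ("prose of the sunset")
print(classify(10_000, 1000))  # high ("laptop")
```

The two hand-set constants and the three-way outcome make the shortcomings discussed next concrete: the result is only a coarse label, not a graded value.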

This implementation requires the click-rate threshold and the impression threshold to be specified manually, so its effectiveness depends heavily on the operator's experience. It can only judge whether the value is high or low and cannot give a specific value, which is not smooth enough in practical applications. Moreover, because it relies mainly on statistics, it generalizes poorly, its coverage is relatively low and its accuracy leaves room for improvement, so it cannot fully meet the needs of a search advertising system.

The technical solution of the improved method for determining the value of a search term of the present invention will be described in detail below with reference to the accompanying drawings.

To better understand the technical solution of the present invention, the method of obtaining the value regression model is introduced first. FIG. 3 is a flow chart of a method of obtaining a value regression model according to an embodiment of the present invention.

At step S310, the existing search words are clustered based on the click relationship data and/or the presentation relationship data to obtain a clustered search word set.

Specifically, first, it is necessary to acquire common click times of different search words and calculate click relationship data based on the common click times and/or obtain common presentation times of different search words and calculate presentation relationship data based on the common presentation times.

For example, the number of common presentations of different search terms can be obtained and the presentation relationship data can be calculated based on the number of common presentations.

Assume that one search word is Q1, for which the search engine presents data D1, D2, D3 and D4, and another input search word is Q2, for which the search engine presents data D2, D3, D5 and D7. Their number of common presentations is then 2 (D2 and D3). A correlation can be used to describe the relationship between Q1 and Q2; for example, if the correlation is defined as the number of common presentations / the number of presentations of Q1, the presentation relationship of Q1 and Q2 is the presentation correlation 2/4 = 0.5.

It should be understood that any other suitable means may be used to represent the presentation relationship between two search words, without being limited to the above. For example, the correlation may also be defined as the number of common presentations / the number of presentations of Q2, or the number of common presentations / (the number of presentations of Q1 + the number of presentations of Q2), and the like.
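The presentation correlation and its alternative definitions fit in one small function over sets of presented data IDs. Because the click relationship described in the following paragraphs uses the same ratios over clicked data, the same function applies there too. The function name and the `mode` labels are hypothetical, and returning 0.0 for an empty denominator is an added assumption.

```python
from typing import Set


def correlation(q1_items: Set[str], q2_items: Set[str], mode: str = "q1") -> float:
    """Pairwise correlation of two search words from the sets of data
    presented (or clicked) for each:
    'q1': common / |Q1|, 'q2': common / |Q2|, 'sum': common / (|Q1| + |Q2|)."""
    common = len(q1_items & q2_items)
    denominators = {
        "q1": len(q1_items),
        "q2": len(q2_items),
        "sum": len(q1_items) + len(q2_items),
    }
    if mode not in denominators:
        raise ValueError(f"unknown mode: {mode}")
    denom = denominators[mode]
    return common / denom if denom else 0.0


q1_shown = {"D1", "D2", "D3", "D4"}
q2_shown = {"D2", "D3", "D5", "D7"}
print(correlation(q1_shown, q2_shown))         # 0.5
print(correlation(q1_shown, q2_shown, "sum"))  # 0.25
```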

Similarly, the presentation relationship data between the search terms can be obtained.

In addition, it is also possible to obtain the common clicks of different search terms and calculate the click relationship data based on the common clicks.

Assume that one search word is Q1, for which the data presented by the search engine and clicked by the user is D1, D2, D3 and D4, and another input search word is Q2, for which the data presented and clicked is D2, D3, D4 and D7. Their number of common clicks is then 3 (D2, D3 and D4). A correlation can be used to describe the click relationship between Q1 and Q2; for example, if the correlation is defined as the number of common clicks / the number of clicks of Q1, the click relationship of Q1 and Q2 is the click correlation 3/4 = 0.75.

In the same way, click relationship data can be obtained for each pair of search terms.

It should be understood that any other suitable measure may be used to represent the click relationship between two search terms; it is not limited to the above. For example, the correlation may also be defined as the common click count divided by the number of clicks of Q2, or as the common click count divided by the sum of the numbers of clicks of Q1 and Q2.
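The "common clicks / clicks of Q1" definition is asymmetric. One way to hold click relationship data for a whole log is therefore a map keyed by ordered pairs of terms. The sketch below assumes that the clicked documents for each term have already been aggregated into sets. The helper name `click_relationship` and the sample log are hypothetical.

```python
from itertools import permutations


def click_relationship(clicks: dict) -> dict:
    """Directed click correlation for every ordered pair (a, b):
    common clicked documents / number of documents clicked for a."""
    rel = {}
    for a, b in permutations(clicks, 2):
        clicked_a = clicks[a]
        common = len(clicked_a & clicks[b])
        rel[(a, b)] = common / len(clicked_a) if clicked_a else 0.0
    return rel


log = {  # hypothetical click log: term -> clicked document IDs
    "Q1": {"D1", "D2", "D3", "D4"},
    "Q2": {"D2", "D3", "D4", "D7"},
    "Q3": {"D9"},
}
rel = click_relationship(log)
print(rel[("Q1", "Q2")])  # 0.75
print(rel[("Q1", "Q3")])  # 0.0
```

Storing both (a, b) and (b, a) costs twice the space. In exchange, the "/ Q2" variant can be read directly from the reversed key.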

It should be understood that the common click count, the common presentation count, the click relationship data, and the presentation relationship data are all defined between two search terms. That is, each of these parameters describes the correlation between a pair of search terms rather than a property of a single search term.

After at least one of the click relationship data, the presentation relationship data, the common click count, and the common presentation count has been acquired, the clustering distance between the existing search terms can be calculated based on at least one of these quantities. The existing search terms are then clustered based on the clustering distance to obtain clustered search term sets.

Continuing the above example, the presentation data of Q1 is represented as <D1, D2, D3, D4> and the presentation data of Q2 as <D2, D3, D5, D7>. A clustering algorithm is then used to calculate the clustering distance between Q1 and Q2. Calculating the clustering distances between all search terms in a similar way makes it possible to cluster the search terms. For example, a spectral clustering or k-means clustering algorithm may be applied, with the clustering distance between search terms calculated based on at least one of the click relationship data, the presentation relationship data, the common click count, and the common presentation count. This clusters the search terms and yields the clustered search term sets.
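The text names spectral clustering or k-means but does not fix the distance. The following is a minimal sketch under stated assumptions:

- each term is represented by a binary presentation vector over all displayed documents;
- distance is squared Euclidean;
- centroids are initialised deterministically, starting with the first term and then adding the farthest point;
- k is chosen by hand.

This is plain k-means, not necessarily the patent's exact procedure, and the presentation data is invented for illustration.

```python
def presentation_vectors(shown):
    """Binary vector per term: 1.0 where a document was displayed for it."""
    docs = sorted(set().union(*shown.values()))
    terms = list(shown)
    vectors = [[1.0 if d in shown[t] else 0.0 for d in docs] for t in terms]
    return terms, vectors


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, max_iter=100):
    # Deterministic farthest-first initialisation (an assumption, not from the text).
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each term joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_terms(shown, k):
    terms, vectors = presentation_vectors(shown)
    clusters = {}
    for term, lab in zip(terms, kmeans(vectors, k)):
        clusters.setdefault(lab, []).append(term)
    return list(clusters.values())


shown = {  # hypothetical presentation data
    "laptop": {"D1", "D2", "D3"},
    "thinkpad": {"D1", "D2", "D4"},
    "mac air": {"D2", "D3", "D4"},
    "andy lau": {"D7", "D8"},
    "andy lau album": {"D7", "D8", "D9"},
}
print(cluster_terms(shown, k=2))
# [['laptop', 'thinkpad', 'mac air'], ['andy lau', 'andy lau album']]
```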

At step S320, the search term sets are classified into search term sets of different values.

Specifically, all the sets can be classified into a predetermined number of classes of search term sets according to certain rules. For example, in a preferred embodiment of the present invention, the sets are classified into three classes: a high-value search term set, a medium-value search term set, and a low-value search term set. The value data of the search terms in the high-value set is greater than that of the search terms in the medium-value set, and the value data of the search terms in the medium-value set is greater than that of the search terms in the low-value set. More specifically, the value data of each search term has been determined in advance from log data. For example, the value of a search term can be measured by the value generated per thousand searches, which reflects the profitability of the search term per unit of search, that is, its value. Using log statistics in this way, the value data of each search term can be obtained, and each search term can be graded, for example, as high, medium, or low according to the distribution of the value data. Then, based on the value data of the individual search terms, aggregated value data can be obtained for each clustered search term set. In the same way, each clustered search term set can be assigned to one of the search term sets of different values.
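The grading of search term sets can be sketched as follows. The text specifies none of these assumptions:

- value per thousand searches is computed as revenue × 1000 / searches;
- a set's aggregated value is the mean of its terms' values;
- the high and low cut-offs are hand-picked.

All log figures and cut-offs below are hypothetical.

```python
from statistics import mean


def value_per_thousand(revenue, searches):
    """Value generated per 1,000 searches (assumed to be revenue-based)."""
    return 1000.0 * revenue / searches if searches else 0.0


def grade(value, high_cut, low_cut):
    """Map a value to a tier using hand-picked, hypothetical cut-offs."""
    if value >= high_cut:
        return "high"
    if value >= low_cut:
        return "medium"
    return "low"


def grade_sets(clusters, term_value, high_cut=50.0, low_cut=10.0):
    # Aggregated set value = mean of its members' values (an assumption).
    return [(c, grade(mean([term_value[t] for t in c]), high_cut, low_cut)) for c in clusters]


log_stats = {  # term: (revenue, searches), hypothetical
    "laptop": (120.0, 1000),
    "thinkpad": (90.0, 1500),
    "andy lau": (2.0, 4000),
    "andy lau album": (1.0, 500),
    "5-inch phone size": (30.0, 2000),
}
term_value = {t: value_per_thousand(r, n) for t, (r, n) in log_stats.items()}
clusters = [["laptop", "thinkpad"], ["andy lau", "andy lau album"], ["5-inch phone size"]]
for members, tier in grade_sets(clusters, term_value):
    print(tier, members)
```

The mean is only one choice of aggregate. A search-volume-weighted mean would let frequent terms dominate a set's grade.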

It should be understood that the rules for dividing search terms and/or search term sets by value are flexible and can be adjusted according to system requirements. For example, search terms may be divided into more or fewer grades, and so may search term sets. All such divisions fall within the scope of the present invention.

At step S330, model training is performed using a set of search words of different values to obtain a value regression model.

After the search terms have been classified, model training is performed using the search term sets of different values, finally yielding the value regression model.

Specifically, each search term in each search term set can be used as a sample carrying the value data of that set. Continuing the above example, each search term in the high-value search term set is used as a sample labeled 1, each search term in the medium-value search term set as a sample labeled 0.5, and each search term in the low-value search term set as a sample labeled 0. A logistic regression algorithm is trained on these samples to form the value regression model. For example, suppose the annotation data for the value regression model contains three clusters:

- Cluster 1 contains search terms such as "laptop", "mac air", and "thinkpad", with commercial value labeled 1 (high commercial value).
- Cluster 2 contains "Andy Lau", "Zhang Xueyou", "Andy Lau's album", and so on, with commercial value labeled 0 (low commercial value).
- Cluster 3 contains "How big is a 5-inch mobile phone", "Is an Android phone smooth", and so on, with commercial value labeled 0.5 (medium commercial value).

In other words, the parameters of the value regression model are obtained through training, and the model is then used to predict the value data of search terms.
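The following sketches training the value regression model with the 1 / 0.5 / 0 labels. Logistic regression accepts fractional targets because the gradient of its cross-entropy loss, (prediction − target) × feature, is defined for any target in [0, 1]. The sketch assumes that features are lower-cased whitespace tokens, that training is plain batch gradient descent, and that the learning rate and epoch count are arbitrary. The text fixes none of these.

```python
import math


def tokens(term):
    """Hypothetical features: one indicator per lower-cased whitespace token."""
    return {tok: 1.0 for tok in term.lower().split()}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, feats):
    return sigmoid(bias + sum(weights.get(f, 0.0) * x for f, x in feats.items()))


def train_logistic(samples, epochs=2000, lr=0.5):
    """Batch gradient descent on cross-entropy; targets may be 0, 0.5 or 1."""
    weights, bias, n = {}, 0.0, len(samples)
    for _ in range(epochs):
        grad_w, grad_b = {}, 0.0
        for feats, target in samples:
            err = predict(weights, bias, feats) - target
            grad_b += err
            for f, x in feats.items():
                grad_w[f] = grad_w.get(f, 0.0) + err * x
        bias -= lr * grad_b / n
        for f, g in grad_w.items():
            weights[f] = weights.get(f, 0.0) - lr * g / n
    return weights, bias


labelled_sets = {  # value label -> search terms in that set (from the example)
    1.0: ["laptop", "mac air", "thinkpad"],
    0.0: ["andy lau", "zhang xueyou", "andy lau album"],
    0.5: ["how big is a 5 inch phone", "is an android phone smooth"],
}
samples = [(tokens(t), label) for label, terms in labelled_sets.items() for t in terms]
weights, bias = train_logistic(samples)
print(round(predict(weights, bias, tokens("toshiba laptop")), 2))
```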

It should be understood that samples may also be constructed from the search terms in the search term sets of different values in any other suitable manner; the construction is not limited to the above.

The construction of the value regression model has now been described with reference to FIG. 3.

Next, a method of determining the value of a search term according to the present invention will be described with reference to FIG. 4. The method uses the value regression model formed as above. FIG. 4 is a flow chart of a method of determining the value of a search term according to an embodiment of the present invention.

At step S410, the feature data of the search term to be tested is input into the value regression model. Specifically, the value data of the search term to be tested is predicted with the value regression model established by the method shown in FIG. 3. To do this, the feature data of the search term to be tested must first be extracted and input into the model, whose parameters have already been obtained through the model training shown in FIG. 3. The feature data of a search term may include, for example but without limitation, the length of the search term, the category of the search term, and the result of word segmentation of the search term.
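Feature extraction for a search term to be tested might look like the following. The category table and the whitespace segmentation are hypothetical stand-ins for a real query classifier and word segmenter. The feature names (`length`, `tok=…`, `cat=…`) are likewise invented for illustration.

```python
# Hypothetical token-to-category table; a real system would use a query classifier.
CATEGORY_OF_TOKEN = {
    "notebook": "electronics",
    "laptop": "electronics",
    "lau": "celebrity",
    "lianjie": "celebrity",
}


def segment(term: str) -> list:
    """Stand-in for a word segmenter: lower-cased whitespace split."""
    return term.lower().split()


def extract_features(term: str) -> dict:
    """Length, segmentation tokens, and token categories as sparse features."""
    feats = {"length": float(len(term))}
    for tok in segment(term):
        feats[f"tok={tok}"] = 1.0
        category = CATEGORY_OF_TOKEN.get(tok)
        if category:
            feats[f"cat={category}"] = 1.0
    return feats


print(extract_features("Toshiba Notebook"))
# {'length': 16.0, 'tok=toshiba': 1.0, 'tok=notebook': 1.0, 'cat=electronics': 1.0}
```

In practice the raw length would usually be scaled or bucketed before training so that it does not dominate the indicator features.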

For example, suppose again that the annotation data of the value regression model contains three clusters:

- Cluster 1 contains search terms such as "laptop", "mac air", and "thinkpad", labeled 1 (high commercial value).
- Cluster 2 contains "Andy Lau", "Zhang Xueyou", "Andy Lau's album", and so on, labeled 0 (low commercial value).
- Cluster 3 contains "How big is a 5-inch mobile phone", "Is an Android phone smooth", and so on, labeled 0.5 (medium commercial value).

First, the feature data of the search term to be tested, "Toshiba notebook", is input into the value regression model.

At step S420, based on the value regression model, the value data of the search term to be tested is obtained.

Continuing the above example, once the feature data of the search term "Toshiba notebook" is input, the trained value regression model gives it value data of, for example, 0.8 (a number greater than 0.5 and less than or equal to 1). As another example, the value data obtained from the value regression model for the search term "Li Lianjie" is, for example, 0.1 (a number greater than 0 and less than 0.5).
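A trained logistic model outputs a sigmoid of a weighted feature sum. Its value data therefore always falls strictly between 0 and 1 and can be read against the 1 / 0.5 / 0 training labels. The weights below are hypothetical, chosen only to show the mechanics; they do not reproduce the 0.8 and 0.1 figures above. `nearest_tier` is one assumed way to read the score.

```python
import math

# Hypothetical trained parameters, for illustration only.
WEIGHTS = {"tok=notebook": 2.0, "cat=electronics": 1.0, "tok=li": -1.5, "cat=celebrity": -1.0}
BIAS = -0.5
LABELS = {"high": 1.0, "medium": 0.5, "low": 0.0}  # training targets from the text


def value_of(features):
    """Value data = sigmoid(bias + w . x), always strictly between 0 and 1."""
    z = BIAS + sum(WEIGHTS.get(f, 0.0) * x for f, x in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def nearest_tier(value):
    """Assumed reading: the tier whose training target is closest to the score."""
    return min(LABELS, key=lambda name: abs(LABELS[name] - value))


toshiba = {"tok=toshiba": 1.0, "tok=notebook": 1.0, "cat=electronics": 1.0}
li_lianjie = {"tok=li": 1.0, "tok=lianjie": 1.0, "cat=celebrity": 1.0}
for name, feats in [("Toshiba notebook", toshiba), ("Li Lianjie", li_lianjie)]:
    v = value_of(feats)
    print(name, round(v, 3), nearest_tier(v))
# Toshiba notebook 0.924 high
# Li Lianjie 0.047 low
```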

The present invention also provides an apparatus for determining the value of a search term. As shown in FIG. 5, FIG. 5 is a block diagram showing the structure of an apparatus 500 for determining the value of a search term according to an embodiment of the present invention.

Apparatus 500 may include an input module 510 and an obtaining module 520. The input module 510 may be used to input the feature data of the search term to be tested into the value regression model. The obtaining module 520 may be used to obtain the value data of the search term to be tested based on the value regression model.

According to an embodiment of the invention, the value regression model can be obtained by the following module:

a clustering module (not shown), which can be used to cluster existing search words based on click relationship data and/or presentation relationship data to obtain a clustered search word set;

a classification module (not shown) that can be used to classify a collection of search terms into a collection of search terms of different values;

a model acquisition module (not shown), which can be used to perform model training with the search term sets of different values to obtain the value regression model.

According to an embodiment of the present invention, the search term sets of different values may include a high-value search term set, a medium-value search term set, and a low-value search term set. The value data of the search terms in the high-value set is greater than that of the search terms in the medium-value set, and the value data of the search terms in the medium-value set is greater than that of the search terms in the low-value set.

According to an embodiment of the present invention, the value data of the search terms in the high-value search term set is 1, that of the search terms in the medium-value search term set is 0.5, and that of the search terms in the low-value search term set is 0.

According to an embodiment of the present invention, the clustering module may further include a relational data acquisition sub-module, a calculation sub-module, and an acquisition sub-module.

The relationship data obtaining sub-module may be configured to obtain the common click count of different search terms and calculate click relationship data based on it, and/or to obtain the common presentation count of different search terms and calculate presentation relationship data based on it;

The calculation sub-module may be configured to calculate the clustering distance between the existing search terms based on at least one of the click relationship data, the presentation relationship data, the common presentation count, and the common click count;

The acquisition sub-module may be configured to cluster the existing search terms based on the clustering distance to obtain clustered search term sets.

The common click count, the common presentation count, the click relationship data, and the presentation relationship data are each defined between two search terms.

According to an embodiment of the invention, the model acquisition module may be further configured to:

Use each search term in the high-value search term set as a sample labeled 1, each search term in the medium-value search term set as a sample labeled 0.5, and each search term in the low-value search term set as a sample labeled 0, and train with a logistic regression algorithm to form the value regression model.

The functions implemented by the apparatus of this embodiment substantially correspond to the method embodiments shown in FIG. 3 and FIG. 4 described above. They are therefore not described in detail here; for details, refer to the related description of the foregoing embodiments.

The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with programs based on the teachings herein. The structure required to construct such a system is apparent from the above description. Moreover, the invention is not directed to any particular programming language. It should be understood that the invention described herein may be implemented in a variety of programming languages. The above description of a specific language is given to disclose the preferred embodiments of the invention.

In the description provided herein, numerous specific details are set forth. However, it is understood that the embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of the description.

Similarly, in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single embodiment disclosed above. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.

Those skilled in the art will appreciate that the modules of the client in an embodiment can be adaptively changed and placed in one or more clients different from that embodiment. The modules in the embodiments can be combined into one module, and can also be divided into a plurality of sub-modules, sub-units, or sub-components. The features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or client so disclosed, may be combined in any combination. The only exception is where at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.

In addition, those skilled in the art will appreciate that some embodiments described herein include certain features that are included in other embodiments, and not other features. Combinations of features of different embodiments are nonetheless meant to be within the scope of the present invention and to form different embodiments. For example, in the following claims, any one of the claimed embodiments can be used in any combination.

The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in two devices according to embodiments of the present invention: the device for increasing the exposure of information and the device for determining the value of a search term. The invention can also be implemented as a device or apparatus program (e.g., a computer program and a computer program product) for performing some or all of the methods described herein. Such a program implementing the invention may be stored on a computer readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

For example, FIG. 6 illustrates an electronic device that can implement the method of increasing the exposure of information and the method of determining the value of a search term of the present invention. The electronic device conventionally includes a processor 610 and a computer program product or computer readable medium in the form of a memory 620. The memory 620 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read Only Memory), an EPROM, a hard disk, or a ROM. The memory 620 has a storage space 630 for program code 631 for performing any of the method steps described above. For example, the storage space 630 for program code may include individual program codes 631 for implementing the various steps of the above methods. The program code can be read from or written to one or more computer program products. These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards, or floppy disks. Such a computer program product is typically a portable or fixed storage unit as described with reference to FIG. The storage unit may have storage sections or storage spaces arranged similarly to the memory 620 in the electronic device of FIG. 6. The program code can, for example, be compressed in an appropriate form. In general, the storage unit comprises program code 631' for performing the steps of the method according to the invention, i.e., code readable by a processor such as 610. When run by the electronic device, this code causes the electronic device to perform the steps of the methods described above.

"an embodiment," or "an embodiment," or "an embodiment," In addition, it is noted that the phrase "in one embodiment" is not necessarily referring to the same embodiment.

It should be noted that the above embodiments illustrate rather than limit the invention, and that alternative embodiments may be devised without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words may be interpreted as names.

In addition, it should be noted that the language used in this specification has been chosen principally for readability and instructional purposes, not to delineate or limit the subject matter of the invention. Therefore, many modifications and changes will be apparent to those skilled in the art without departing from the scope of the invention. The disclosure of the present invention is intended to be illustrative, not restrictive, and the scope of the invention is defined by the appended claims.

Claims

A method for improving the exposure of information, characterized by comprising:

Determining whether to perform an exposure promotion process for information related to the received query word;

If yes, checking whether the query frequency of the historical query data associated with the query word is greater than or equal to the first threshold;

If yes, determining candidate information based on historical presentation data associated with the query term;

Estimating the presentation quality parameters of all candidate information based on the basic data of all candidate information and the historical presentation data of the information group to which all candidate information belongs;

Recommending the candidate information having the highest estimated presentation quality parameter, as recommendation information, to a candidate presentation queue, so that overall presentation contention processing is performed with the highest estimated presentation quality parameter as the presentation quality parameter of the recommendation information.
The method of claim 1 further comprising:

If the query frequency of the historical query data associated with the query word is less than the first threshold, abandoning the exposure promotion processing.
The method of claim 1 further comprising:

If the recommendation information obtains a presentation in the overall presentation contention processing, determining the presentation quality parameter obtained by the recommendation information during presentation as an initial presentation quality parameter of the recommendation information.
The method of claim 1, wherein determining whether to perform an exposure promotion process for the undisplayed information related to the received query term further comprises:

Obtaining adjustment parameters based on historical query data associated with the query term;

Determining, based on the adjustment parameters and a random number generated by the system, whether to perform the exposure promotion processing on the undisplayed information related to the received query word.
The method according to claim 1, wherein determining candidate information based on historical presentation data related to the query term further comprises:

Finding historical presentation data associated with the query word whose number of days of presentation is less than a second threshold; and,

Information corresponding to the found history presentation data is determined as candidate information.
The method according to any one of claims 1 to 5, wherein the information comprises at least one of: information whose number of presentations is below a predetermined value, information for a predetermined region, and information for a predetermined time period.
A device for improving the exposure of information, comprising:

a first determining module, configured to determine whether to perform exposure promotion processing on undisplayed information related to a received query word;

a checking module, configured to check whether the query frequency of the historical query data associated with the query word is greater than or equal to a first threshold;

a second determining module, configured to determine candidate information based on historical presentation data related to the query term;

An estimation module, configured to estimate a presentation quality parameter of all candidate information based on basic data of all candidate information and historical presentation data of a group of information to which all candidate information belongs;

a recommendation module, configured to recommend the candidate information having the highest estimated presentation quality parameter, as recommendation information, to a candidate presentation queue, so that overall presentation contention processing is performed with the highest estimated presentation quality parameter as the presentation quality parameter of the recommendation information.
The apparatus of claim 7, wherein the checking module is further configured to:

abandon the exposure promotion processing if the query frequency of the historical query data associated with the query word is less than the first threshold.
The device according to claim 7, further comprising:

a presentation quality parameter determining module, configured to determine the presentation quality parameter obtained by the recommendation information during presentation as the initial presentation quality parameter of the recommendation information, if the recommendation information obtains a presentation in the overall presentation contention processing.
The apparatus according to claim 7, wherein the first determining module further comprises:

an obtaining submodule, configured to obtain an adjustment parameter based on historical query data related to the query word;

a first determining submodule, configured to determine, based on the adjustment parameter and a random number generated by the system, whether to perform exposure promotion processing on the undisplayed information related to the received query word.
The apparatus according to claim 7, wherein the second determining module further comprises:

a search submodule, configured to find historical presentation data associated with the query word whose number of days of presentation is less than a second threshold; and,

a second determining submodule, configured to determine information corresponding to the found historical presentation data as candidate information.
A method for determining the value of a search term, comprising:

Inputting characteristic data of the search term to be tested into a value regression model;

Obtaining value data of the search term to be tested based on the value regression model;

Wherein, the value regression model is obtained by:

Clustering existing search words based on click relationship data and/or presentation relationship data to obtain a clustered search word set;

Classifying the search term sets into search term sets of different values;

Performing model training using the search term sets of different values to obtain the value regression model.
The method according to claim 12, wherein the search term sets of different values comprise a high-value search term set, a medium-value search term set, and a low-value search term set, wherein the value data of the search terms in the high-value search term set is greater than the value data of the search terms in the medium-value search term set, and the value data of the search terms in the medium-value search term set is greater than the value data of the search terms in the low-value search term set.
The method according to claim 13, wherein the value data of the search terms in the high-value search term set is 1, the value data of the search terms in the medium-value search term set is 0.5, and the value data of the search terms in the low-value search term set is 0.
The method according to claim 12, wherein clustering the existing search terms based on the click relationship data and/or the presentation relationship data between the existing search terms to obtain clustered search term sets further comprises:

Obtaining the common click count of different search terms and calculating click relationship data based on the common click count, and/or obtaining the common presentation count of different search terms and calculating presentation relationship data based on the common presentation count;

Calculating a clustering distance between the existing search terms based on at least one of the click relationship data, the presentation relationship data, the common presentation count, and the common click count;

The existing search words are clustered based on the cluster distance to obtain a clustered search word set.
The method according to claim 15, wherein the common click count, the common presentation count, the click relationship data, and the presentation relationship data are each defined between two search terms.
The method according to claim 13, wherein performing model training using the search term sets of different values to obtain the value regression model further comprises: using each search term in each search term set as a sample having the value data of the corresponding search term set, specifically:

using each search term in the high-value search term set as a sample labeled 1, each search term in the medium-value search term set as a sample labeled 0.5, and each search term in the low-value search term set as a sample labeled 0, and training with a logistic regression algorithm to form the value regression model.
A device for determining the value of a search term, comprising:

An input module, configured to input feature data of the search term to be tested into a value regression model;

An obtaining module, configured to obtain value data of the search term to be tested based on a value regression model;

Wherein, the value regression model is obtained by the following module:

a clustering module, configured to cluster existing search words based on click relationship data and/or presentation relationship data to obtain a clustered search word set;

a classification module for classifying a collection of search words into a collection of search words of different values;

A model acquisition module is configured to perform model training using a set of search words of different values to obtain a value regression model.
The device according to claim 18, wherein the search term sets of different values comprise a high-value search term set, a medium-value search term set, and a low-value search term set, wherein the value data of the search terms in the high-value search term set is greater than the value data of the search terms in the medium-value search term set, and the value data of the search terms in the medium-value search term set is greater than the value data of the search terms in the low-value search term set.
The apparatus according to claim 19, wherein the value data of the search terms in the high-value search term set is 1, the value data of the search terms in the medium-value search term set is 0.5, and the value data of the search terms in the low-value search term set is 0.
The apparatus according to claim 18, wherein the clustering module further comprises:

a relationship data obtaining sub-module, configured to obtain the common click count of different search terms and calculate click relationship data based on the common click count, and/or to obtain the common presentation count of different search terms and calculate presentation relationship data based on the common presentation count;

a calculation submodule, configured to calculate a clustering distance between the existing search terms based on at least one of the click relationship data, the presentation relationship data, the common presentation count, and the common click count;

an obtaining submodule, configured to cluster the existing search terms based on the clustering distance to obtain clustered search term sets.
The device according to claim 21, wherein the common click count, the common presentation count, the click relationship data, and the presentation relationship data are each defined between two search terms.
The apparatus of claim 19 wherein the model acquisition module is further configured to:

using each search term in each search term set as a sample having the value data of the corresponding search term set, specifically:

using each search term in the high-value search term set as a sample labeled 1, each search term in the medium-value search term set as a sample labeled 0.5, and each search term in the low-value search term set as a sample labeled 0, and training with a logistic regression algorithm to form the value regression model.
A computer program comprising computer readable code which, when run on an electronic device, causes the electronic device to perform the method for improving the exposure of information according to any one of claims 1 to 6 and the method for determining the value of a search term according to any one of claims 12 to 17.
A computer readable medium storing the computer program of claim 24.