CN109933714B - Entry weight calculation method, entry weight search method and related device - Google Patents

Publication number
CN109933714B
Authority
CN
China
Prior art keywords
search
search word
word
term
similar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910203912.1A
Other languages
Chinese (zh)
Other versions
CN109933714A (en)
Inventor
石翔
陈炜鹏
许静芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201910203912.1A
Publication of CN109933714A
Application granted
Publication of CN109933714B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the present application discloses a method for calculating entry weights. After a similar search word set is constructed, the search result click rate corresponding to each search word in the set is calculated. Taking one search word in the similar search word set as a first search word, the degree of correlation between the first search word and each search word in the similar search word set to which the first search word belongs is calculated from the search result click rates of the search words in that set. Then, for each word segmentation entry included in the first search word, its entry weight is calculated from the degree of correlation between the first search word and each search word in the similar search word set to which the first search word belongs. The resulting entry weights can be used to extract core words more accurately from the search words input by a user, or to return a reasonable ordering of search result items for those search words, thereby improving user experience.

Description

Entry weight calculation method, entry weight search method and related device
Technical Field
The present application relates to the field of internet technologies, and in particular to a method for calculating entry weights, a search method, and a related device.
Background
With the continuous development of the internet, the amount of information on the network has grown explosively, and users typically rely on a search engine to find the information they care about among this mass of information. During a search, the user submits a search word for a search target. The weight of each word segmentation entry in the search word measures how important that entry is within the search word. The search engine extracts a core word from the search word according to the entry weights, returns the search result items related to the core word to the user, and sorts the search result items corresponding to the entries according to the entry weights. How accurately the weight of each word segmentation entry in a submitted search word is identified therefore directly determines which search result items the search engine returns and how they are finally ordered.
An existing method for calculating entry weights works as follows. For a search result item, the search words corresponding to that item are collected from the click log and combined into a search word set. Each search word in the set is segmented to obtain its word segmentation entries. For each word segmentation entry of a search word, the weight is determined by how frequently that entry occurs, and entries that occur more frequently are given higher weights.
However, the entry weights calculated by this existing method lack correlation information and cannot accurately measure the importance of an entry. As a result, a wrong core word may be extracted from the search word input by the user, or the search result items returned for that search word may be ordered unreasonably, which degrades the user experience.
Disclosure of Invention
To solve the above technical problem, the present application provides a method for calculating entry weights, a search method, and a related device, which can improve the accuracy of entry weight calculation and, in turn, make the ordering of search result items more reasonable and improve user experience.
The embodiments of the present application disclose the following technical solutions:
In a first aspect, an embodiment of the present application provides a method for calculating entry weights, the method including:
constructing similar search word sets for the search words in user click log data, where the search words in a similar search word set are the search words for which the same search result item was clicked in the click log data;
calculating, for each search word included in a similar search word set, the search result click rate corresponding to that search word, where the search result click rate is the click rate of the search word for the search result item corresponding to the similar search word set to which the search word belongs;
taking a search word in the similar search word set as a first search word, and calculating the degree of correlation between the first search word and each search word in the similar search word set to which the first search word belongs, according to the search result click rates corresponding to the search words included in that set;
performing word segmentation on the first search word to obtain at least one word segmentation entry included in the first search word; and
calculating, for each word segmentation entry included in the first search word, the entry weight of that entry according to the degree of correlation between the first search word and each search word in the similar search word set to which the first search word belongs.
Optionally, calculating the search result click rate corresponding to each search word includes:
acquiring the number of times the search result item corresponding to the similar search word set to which the search word belongs was clicked when the search word was used for searching, recorded as a first number of times;
acquiring the number of times the search word was searched, recorded as a second number of times; and
taking the ratio of the first number of times to the second number of times as the search result click rate corresponding to the search word.
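As an illustrative sketch only (not part of the disclosed embodiments), the ratio described above can be computed as follows. The function name, the argument names, the sample counts, and the choice to return 0.0 for a search word that was never searched are all assumptions made for illustration.

```python
def search_result_click_rate(first_count: int, second_count: int) -> float:
    """Return the ratio of the first number of times to the second number of times.

    first_count:  clicks on the set's search result item when searching with the search word
    second_count: number of times the search word was searched
    """
    if second_count <= 0:
        # Assumption: a search word that was never searched gets a click rate of 0.
        return 0.0
    return first_count / second_count


# Hypothetical counts: 120 searches, 30 of which led to a click on the result item.
rate = search_result_click_rate(30, 120)
print(rate)
```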
Optionally, after calculating the degree of correlation between the first search word and each search word in the similar search word set to which the first search word belongs according to the search result click rates corresponding to the search words included in that set, the method further includes:
normalizing the degree of correlation between the first search word and each search word in the similar search word set to which the first search word belongs;
in which case calculating the entry weight of each word segmentation entry included in the first search word according to the degree of correlation between the first search word and each search word in the similar search word set to which the first search word belongs includes:
calculating the entry weight of each word segmentation entry included in the first search word according to the normalized degree of correlation between the first search word and each search word in the similar search word set to which the first search word belongs.
Optionally, the degree of correlation between the first search word and each search word in the similar search word set to which the first search word belongs is normalized using the following formula:
Figure BDA0001998355450000031
where weight(query, qanchor) represents the degree of correlation between the first search word and any search word in the similar search word set to which the first search word belongs; query represents the first search word, and qanchor represents any search word in the similar search word set; click(query, doc_i) represents the click rate of the first search word for the search result item corresponding to the i-th similar search word set to which the first search word belongs, and n is the number of similar search word sets to which the first search word belongs; click(query) represents the sum of the click rates of the first search word for the search result items corresponding to the similar search word sets to which the first search word belongs; click(qanchor, doc_i) represents the click rate of that search word for the search result item corresponding to the i-th similar search word set; and click(doc_i) represents the sum of the search result click rates corresponding to all search words in the i-th similar search word set to which the first search word belongs.
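The formula itself is referenced here only as an image. One expression that is consistent with the symbol definitions above, offered as an assumed reconstruction and not as the formula of the application, is:

```latex
\mathrm{weight}(query, q_{anchor}) = \sum_{i=1}^{n}
  \frac{\mathrm{click}(query, doc_i)}{\mathrm{click}(query)} \cdot
  \frac{\mathrm{click}(q_{anchor}, doc_i)}{\mathrm{click}(doc_i)}
```

Under this assumed form, each factor is a click rate divided by the corresponding sum defined above, so each factor lies between 0 and 1.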
Optionally, calculating the entry weight of each word segmentation entry included in the first search word according to the normalized degree of correlation between the first search word and each search word in the similar search word set to which the first search word belongs includes:
acquiring all second search words that include the word segmentation entry in the similar search word set to which the first search word belongs; and
acquiring the normalized degree of correlation between the first search word and each second search word, summing these degrees of correlation, and taking the sum as the entry weight of the word segmentation entry included in the first search word.
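The summation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the normalized degrees of correlation are taken as given, whitespace splitting stands in for a real word segmenter, the first search word itself is counted among the second search words when it contains the entry, and the search words and numbers are hypothetical.

```python
def entry_weights(first_word, correlations, segment=str.split):
    """For each entry of first_word, sum the normalized correlations of the
    search words in the set that contain that entry.

    correlations maps each search word in the similar search word set to its
    normalized degree of correlation with first_word.
    """
    weights = {}
    for entry in segment(first_word):
        weights[entry] = sum(
            corr
            for word, corr in correlations.items()
            if entry in segment(word)  # the "second search words" containing the entry
        )
    return weights


# Hypothetical normalized correlations with the first search word "bmw car".
correlations = {"bmw car": 0.5, "bmw price": 0.375, "car website": 0.125}
weights = entry_weights("bmw car", correlations)
print(weights)
```

With these made-up values, "bmw" receives a larger entry weight than "car" because the search words containing "bmw" are more strongly correlated with the first search word.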
Optionally, before constructing the similar search word sets for the search words in the user click log data, the method further includes:
acquiring the search click result set pointed to by the search words in the user click log data; and
for each search result item in the search click result set, forming the search words for which that search result item was clicked into a similar search word set.
Optionally, acquiring the search click result set pointed to by the search words in the user click log data includes:
acquiring all search result items clicked by users after performing a search with the search word; and
forming, from all of these search result items, the search result items whose number of clicks is greater than a preset threshold into the search click result set.
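The two optional steps above can be sketched together as follows. This is an illustration under assumptions: the click log is modelled as a list of hypothetical (search word, search result item) click events, and the threshold is applied to the number of clicks per (search word, result item) pair, which is one possible reading of the text.

```python
from collections import Counter, defaultdict


def build_similar_word_sets(click_events, threshold=0):
    """Group search words by the search result item they clicked.

    click_events: iterable of (search_word, result_item) pairs, one per click.
    A pair is kept only if its click count is greater than threshold.
    """
    pair_clicks = Counter(click_events)  # (search_word, result_item) -> clicks
    word_sets = defaultdict(set)
    for (word, item), clicks in pair_clicks.items():
        if clicks > threshold:
            word_sets[item].add(word)
    return dict(word_sets)


# Hypothetical click log.
events = [
    ("bmw car", "A"), ("bmw car", "A"),
    ("car price", "A"), ("car price", "A"),
    ("car price", "B"),
    ("car website", "B"), ("car website", "B"),
]
word_sets = build_similar_word_sets(events, threshold=1)
print(word_sets)
```

In this example "car price" clicked item B only once, so it is left out of the set for B while remaining in the set for A.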
In a second aspect, an embodiment of the present application provides an apparatus for calculating entry weights, where the apparatus includes a construction unit, a first calculation unit, a second calculation unit, a word segmentation unit, and a third calculation unit:
the construction unit is used for constructing a similar search word set aiming at search words in the user click log data, wherein each search word in the similar search word set is a search word of the same search result item clicked in the click log data;
the first calculating unit is used for calculating and obtaining the click rate of the search result corresponding to each search word aiming at each search word included in the similar search word set; the click rate of the search result is the click rate of the search word aiming at the search result item corresponding to the similar search word set to which the search word belongs;
the second calculating unit is configured to calculate, by using a search word in the similar search word set as a first search word, a degree of correlation between the first search word and each search word in the similar search word set to which the first search word belongs, according to a click rate of a search result corresponding to each search word included in the similar search word set to which the first search word belongs;
the word segmentation unit is used for performing word segmentation on the first search word to obtain at least one word segmentation entry included in the first search word;
and the third calculating unit is configured to calculate, for each participle entry included in the first search word, an entry weight of each participle entry included in the first search word according to a degree of correlation between the first search word and each search word in a similar search word set to which the first search word belongs.
Optionally, the first computing unit is specifically configured to:
acquiring the number of times of clicking of a search result item corresponding to a similar search word set to which the search word belongs when the search word is used for searching, and recording the number of times as a first number of times;
acquiring the number of times of searching the search word, and recording as a second number of times;
and taking the ratio of the first times to the second times as the click rate of the search result corresponding to the search word.
Optionally, the apparatus further includes a processing unit:
the processing unit is used for carrying out normalization processing on the correlation degree between the first search word and each search word in the similar search word set to which the first search word belongs;
the third computing unit is specifically configured to:
and calculating the entry weight of each participle entry included in the first search word according to the correlation degree between the first search word and each search word in the similar search word set to which the first search word belongs after normalization processing.
Optionally, the processing unit is specifically configured to obtain the normalized degree of correlation using the following formula:
Figure BDA0001998355450000051
where weight(query, qanchor) represents the degree of correlation between the first search word and any search word in the similar search word set to which the first search word belongs; query represents the first search word, and qanchor represents any search word in the similar search word set; click(query, doc_i) represents the click rate of the first search word for the search result item corresponding to the i-th similar search word set to which the first search word belongs, and n is the number of similar search word sets to which the first search word belongs; click(query) represents the sum of the click rates of the first search word for the search result items corresponding to the similar search word sets to which the first search word belongs; click(qanchor, doc_i) represents the click rate of that search word for the search result item corresponding to the i-th similar search word set; and click(doc_i) represents the sum of the search result click rates corresponding to all search words in the i-th similar search word set to which the first search word belongs.
Optionally, the third computing unit is specifically configured to:
acquiring all second search terms comprising the word segmentation entries in a similar search term set to which the first search term belongs;
and acquiring the correlation degree between the first search word and each second search word after normalization processing, summing and calculating, and taking the calculation result as the entry weight of the participle entry included in the first search word.
Optionally, the apparatus further includes an obtaining unit and a determining unit:
the acquisition unit is used for acquiring a search click result set pointed by a search word in the user click log data;
the determining unit is used for respectively forming the search words clicked to the same search result item into a similar search word set aiming at each search result item in the search click result set.
Optionally, the obtaining unit is specifically configured to:
acquiring all search result items clicked by a user after executing a search behavior aiming at the search word;
and forming the search result items of which the clicked times are greater than a preset threshold value in all the search result items into a search click result set.
In a third aspect, an embodiment of the present application provides a search method, where the method includes:
receiving a search term to be queried input by a user;
acquiring a first search term matched with the search term to be queried;
respectively determining a first word segmentation entry matched with the word segmentation entries in the first search word aiming at each word segmentation entry in the search word to be queried;
determining the entry weight of the first participle entry as the entry weight of the participle entry; the term weight of the first participle term is determined according to the method of claim 1;
and returning a search result item corresponding to the participle entry according to the entry weight of each participle entry in the search word to be inquired.
Optionally, the returning a search result item corresponding to the participle entry according to the entry weight of each participle entry in the search term to be queried includes:
determining, according to the entry weight of each participle entry in the search word to be queried, a second participle entry with the largest weight; and
returning the search result item corresponding to the second participle entry.
Optionally, the returning a search result item corresponding to the participle entry according to the entry weight of each participle entry in the search term to be queried includes:
and sequencing the search result items corresponding to each participle entry according to the sequence of the entry weights from large to small.
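The two optional ways of returning results described above can be sketched as follows. This is an illustration only: the entry weights, the mapping from entries to search result items, and the function names are all hypothetical.

```python
def core_entry(weights):
    """Return the entry with the largest entry weight."""
    return max(weights, key=weights.get)


def order_results(weights, results_by_entry):
    """List the result items of each entry, entries taken from largest to smallest weight."""
    ordered = []
    for entry in sorted(weights, key=weights.get, reverse=True):
        ordered.extend(results_by_entry.get(entry, []))
    return ordered


# Hypothetical entry weights and per-entry search result items.
weights = {"car": 0.625, "bmw": 0.875}
results_by_entry = {"car": ["car-site"], "bmw": ["bmw-official", "bmw-dealer"]}

best = core_entry(weights)
ranking = order_results(weights, results_by_entry)
print(best, ranking)
```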
In a fourth aspect, an embodiment of the present application provides a search apparatus, where the apparatus includes a receiving unit, an obtaining unit, a first determining unit, a second determining unit, and a returning unit:
the receiving unit is used for receiving search terms to be inquired input by a user;
the acquisition unit is used for acquiring a first search term matched with the search term to be inquired;
the first determining unit is configured to determine, for each word segmentation entry in the search term to be queried, a first word segmentation entry in the first search term that matches the word segmentation entry;
the second determining unit is configured to determine the entry weight of the first participle entry as the entry weight of the participle entry; the term weight of the first participle term is determined according to the method of claim 1;
and the returning unit is used for returning the search result items corresponding to the participle entries according to the entry weight of each participle entry in the search words to be inquired.
Optionally, the returning unit is configured to determine, according to the entry weight of each participle entry in the search word to be queried, a second participle entry with the largest weight;
and to return the search result item corresponding to the second participle entry.
Optionally, the returning unit is configured to sort, according to an order of the entry weights from large to small, the search result items corresponding to each participle entry respectively.
In a fifth aspect, an embodiment of the present application provides a device including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
aiming at search terms in user click log data, constructing a similar search term set, wherein each search term in the similar search term set is a search term of the same search result item clicked in the click log data;
aiming at each search word included in the similar search word set, calculating to obtain a search result click rate corresponding to each search word; the click rate of the search result is the click rate of the search word aiming at the search result item corresponding to the similar search word set to which the search word belongs;
taking a search word in the similar search word set as a first search word, and calculating the correlation degree between the first search word and each search word in the similar search word set to which the first search word belongs according to the click rate of search results corresponding to each search word in the similar search word set to which the first search word belongs;
performing word segmentation on the first search word to obtain at least one word segmentation entry included in the first search word;
aiming at each participle entry included by the first search word, calculating the entry weight of each participle entry included by the first search word according to the correlation degree between the first search word and each search word in the similar search word set to which the first search word belongs;
or, alternatively,
receiving a search term to be queried input by a user;
acquiring a first search term matched with the search term to be queried;
respectively determining a first word segmentation entry matched with the word segmentation entries in the first search word aiming at each word segmentation entry in the search word to be queried;
determining the entry weight of the first participle entry as the entry weight of the participle entry; the term weight of the first participle term is determined according to the method of claim 1;
and returning a search result item corresponding to the participle entry according to the entry weight of each participle entry in the search word to be inquired.
In a sixth aspect, embodiments of the present application provide a machine-readable medium having stored thereon instructions, which, when executed by one or more processors, cause an apparatus to perform a method as described in one or more of the first or third aspects.
According to the above technical solutions, after the similar search word set is constructed, the search result click rate corresponding to each search word in the set is first calculated. Then, taking a search word in the similar search word set as a first search word, the degree of correlation between the first search word and each search word in the similar search word set to which the first search word belongs is calculated according to the search result click rates corresponding to the search words included in that set. Finally, for each word segmentation entry included in the first search word, the entry weight is calculated according to the degree of correlation between the first search word and each search word in the similar search word set to which the first search word belongs.
It can be seen that the entry weight of each word segmentation entry in the first search word is determined from the degree of correlation, and the degree of correlation in turn reflects the search result click rates of the search words. In other words, the rate at which users click a given search result item for a search word is taken into account when calculating the entry weights. Because the search result click rate reflects the degree of semantic association between a search word and a search result item, it helps determine the key content that the search word is meant to express. The calculated entry weights therefore reflect more accurately how important each word segmentation entry is for expressing that key content, and can accurately distinguish the importance of the different word segmentation entries in the first search word. The entry weights can then be used to extract accurate core words from the search words input by a user, or to return a reasonable ordering of search result items for those search words, thereby improving user experience.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is an exemplary diagram of an application scenario of a method for calculating term weights according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a method for calculating term weights according to an embodiment of the present application;
Fig. 3 is an exemplary diagram of a similar search term set and the click rates of the search results corresponding to the search terms provided in an embodiment of the present application;
Fig. 4 is an exemplary diagram of an application scenario of a search method according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of a search method according to an embodiment of the present application;
Fig. 6 is a structural diagram of an apparatus for calculating term weights according to an embodiment of the present application;
Fig. 7 is a structural diagram of a search apparatus according to an embodiment of the present application;
Fig. 8 is a structural diagram of a terminal device according to an embodiment of the present application;
Fig. 9 is a structural diagram of a server according to an embodiment of the present application.
Detailed Description
To help those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art from the embodiments herein without creative effort fall within the protection scope of the present application.
In the traditional method for calculating entry weights, the entry weight of each word segmentation entry of a search word in a search word set is determined by how frequently the entry occurs in the search word set, and entries that occur more frequently are given higher entry weights.
For example, if users who input "car", "website", "quote" and "car" in the search engine all click search result item A in the search results, then "car", "website", "quote" and "car" constitute a search word set. When entry weights are determined for the search word "BMW car", it is segmented into the word segmentation entries "BMW" and "car". "BMW" appears once in the search word set and "car" appears twice, so, with entry weights determined by frequency of occurrence, the entry weight of "BMW" is smaller than that of "car".
In practice, for the search word "BMW car", the entry weight of "BMW" should be greater than that of "car", and "BMW" should be the core word of "BMW car". The traditional method considers only how often a word segmentation entry occurs in the search word set and ignores the rate at which users click a given search result for a search word. The degree of correlation between search words that share a search result is therefore not considered when determining entry weights, so the calculated entry weights lack correlation information and cannot accurately measure the importance of an entry. As a result, wrong core words are extracted from the search words input by the user, or the search result items returned for those search words are ordered unreasonably, which degrades the user experience.
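The shortcoming of the frequency-based approach can be reproduced with a small sketch. The search word set below is hypothetical, whitespace splitting stands in for a real word segmenter, and the raw occurrence count is used directly as the weight; all of these are assumptions made for illustration.

```python
from collections import Counter


def frequency_weights(word_set, target, segment=str.split):
    """Weight each entry of target by how often it occurs across word_set."""
    counts = Counter(entry for word in word_set for entry in segment(word))
    return {entry: counts[entry] for entry in segment(target)}


# Hypothetical search word set sharing one clicked search result item.
word_set = ["bmw car", "car website"]
freq = frequency_weights(word_set, "bmw car")
print(freq)
```

Here "car" outweighs "bmw" purely because it occurs more often, even though "bmw" is the more informative entry of "bmw car".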
Therefore, in the embodiments of the present application, the entry weight of each word segmentation entry in the first search word is calculated according to the degree of correlation, and the degree of correlation reflects the search result click rates of the search words. That is, the rate at which users click a given search result for a search word is taken into account when calculating the entry weights, so the calculated entry weights can accurately distinguish the importance of the different word segmentation entries in the first search word. The entry weights can then be used to extract accurate core words from the search words input by a user, or to return a reasonable ordering of search result items for those search words, thereby improving user experience.
In order to facilitate understanding of the technical solution of the present application, an application scenario of the embodiment of the present application is described below with reference to the accompanying drawings. Referring to fig. 1, the application scenario may include a server 101 and a terminal device 102, where the terminal device 102 may be, for example, an intelligent terminal, a computer, a Personal Digital Assistant (PDA), a tablet computer, or the like.
When a user inputs a search word through the terminal device 102 to perform a search, user click log data may be generated on the terminal device 102. The user click log data records, among other things, the search words input historically, the search result items the user clicked after searching with each search word, and the number of clicks on each search result item for each search word. The server 101 may obtain the user click log data from the terminal device 102.
The server 101 constructs similar search word sets for all search words in the user click log data, where the search words in a similar search word set are the search words for which the same search result item was clicked in the click log data. The server 101 then calculates the search result click rate corresponding to each search word in each similar search word set.
Because the search result click rate reflects the degree of semantic association between a search word and a search result item, it can be used to determine the search intention that the search word expresses and how important each word segmentation entry in the search word is for expressing that intention. The server 101 may therefore calculate, according to the search result click rates, the degree of correlation between the first search word and each search word in the similar search word set to which the first search word belongs. The degree of correlation reflects the degree of semantic association between the first search word and each search word; a high degree of correlation indicates that the two search words express similar search intentions.
The term weight of the participle term is used to measure the importance of each participle term in the search term, and it is necessary to accurately reflect the importance of the participle term to the search term expressing the search intention. Since the importance degree of each participle entry in the search term for expressing the search intention can be determined by the click rate of the search result, and the click rate of the search result can be reflected by the correlation degree, in this embodiment of the present application, the server 101 may calculate the entry weight of each participle entry included in the first search term according to the correlation degree between the first search term and each search term in the similar search term set to which the first search term belongs.
In this way, once the user inputs the search term to be queried on the terminal device 102, the server 101 may determine the term weight of each participle term in the search term to be queried according to the term weight of the participle term obtained through calculation, so as to extract an accurate core term from the search term to be queried, or return a reasonable search result item ordering for the search term to be queried, thereby improving user experience.
A method for calculating term weights provided in an embodiment of the present application is described below with reference to the accompanying drawings, and with reference to fig. 2, the method includes:
S201, constructing a similar search term set for the search terms in the user click log data.
And each search word in the similar search word set is a search word clicked to the same search result item in the click log data.
It should be noted that the search terms included in the same similar search term set are different, and the same search term may exist among a plurality of similar search term sets.
For example, suppose the user click log data records that users who input the search terms "Benchi automobile", "Benchi quotation" and "Benchi" clicked the search result item A after performing the search action. The server then finds that the search terms corresponding to clicks on the search result item A are "Benchi automobile", "Benchi quotation" and "Benchi", and adds them to the similar search term set corresponding to the search result item A, which may be expressed as {Benchi automobile, Benchi quotation, Benchi}. Of course, any other search terms in the click log data that were clicked through to the search result item A may also be added to this set. Correspondingly, the similar search term sets corresponding to the search result item B and the search result item C can be determined; for example, the set corresponding to the search result item B is {Benchi automobile, Benchi website, Benchi quotation}, and the set corresponding to the search result item C is {Benchi automobile, Benchi, Benchi website}.
In one possible implementation, the search click result set pointed to by each search word in the user click log data may be obtained before executing S201. And each search result item in the search click result set is a search result item clicked by a user after the user executes a search action aiming at the search word.
Each search result item may correspond to a unique Uniform Resource Locator (URL), and the corresponding search result item may be obtained according to the URL.
For example, if a search term in the user click log data is "Benchi automobile", and the search result items that the user clicked after performing the search action for "Benchi automobile" include search result item A, search result item B, and search result item C, the search click result set pointed to by the search term may be {search result item A, search result item B, search result item C}.
Then, for each search result item in the search click result set, the search words that were clicked through to the same search result item are formed into a similar search word set.
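The grouping in S201 can be illustrated with a minimal sketch. The record layout (a list of pairs of search term and clicked result item), the sample records, and the function name `build_similar_term_sets` are assumptions made for illustration and are not taken from the source; the term labels stand in for the four example search terms.

```python
from collections import defaultdict

# Hypothetical click-log records: (search term, clicked search result item).
click_log = [
    ("Benchi automobile", "A"), ("Benchi quotation", "A"), ("Benchi", "A"),
    ("Benchi automobile", "B"), ("Benchi website", "B"), ("Benchi quotation", "B"),
    ("Benchi automobile", "C"), ("Benchi", "C"), ("Benchi website", "C"),
]

def build_similar_term_sets(records):
    """Group search terms by the search result item they were clicked through to."""
    sets_by_item = defaultdict(set)
    for term, item in records:
        sets_by_item[item].add(term)  # a set keeps the terms within one group distinct
    return dict(sets_by_item)

similar_sets = build_similar_term_sets(click_log)
for item in sorted(similar_sets):
    print(item, sorted(similar_sets[item]))
```

Using a set per result item mirrors the note above that the terms inside one similar search term set are distinct, while the same term may still appear in several sets.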
S202, aiming at each search word included in the similar search word set, calculating to obtain a click rate of a search result corresponding to each search word.
The search result click rate, also called query-title click rate, is the click rate of the search term for the search result item corresponding to the similar search term set to which the search term belongs.
Wherein the calculating to obtain the query-title click rate corresponding to each search term comprises: acquiring the number of times of clicking of a search result item corresponding to a similar search word set to which the search word belongs when the search word is used for searching, and recording the number of times as a first number of times; acquiring the searched times of the search word, and recording the times as a second time; and taking the ratio of the first times to the second times as the query-title click rate corresponding to the search term.
Referring to fig. 3 and continuing the above example, the similar search term set corresponding to the search result item A is {Benchi automobile, Benchi quotation, Benchi}, the set corresponding to the search result item B is {Benchi automobile, Benchi website, Benchi quotation}, and the set corresponding to the search result item C is {Benchi automobile, Benchi, Benchi website}. The search result click rate is calculated for each similar search term set to obtain the search result click rate corresponding to each search term in that set. For example, for the similar search term set {Benchi automobile, Benchi quotation, Benchi}, the search result click rates of the search terms "Benchi automobile", "Benchi quotation" and "Benchi" calculated in S202 are 0.2, 0.1 and 0.1, respectively; correspondingly, the search result click rates of the search terms in the set {Benchi automobile, Benchi website, Benchi quotation} are 0.3, 0.2 and 0.2, respectively, and those of the search terms in the set {Benchi automobile, Benchi, Benchi website} are 0.5, 0.3 and 0.3, respectively.
It is understood that the search result click rate refers to the click rate of a search term for the search result item corresponding to a similar search term set to which the search term belongs. A search term may be searched many times, but the search result item corresponding to the similar search term set to which it belongs may be clicked in only some of those searches. Therefore, in this embodiment, the click rate of the search result may be determined as follows: the server first obtains the number of times that the search result item corresponding to the similar search term set to which the search term belongs was clicked when the search term was used for searching, recorded as a first number of times, and obtains the number of times that the search term was searched, recorded as a second number of times, where the first number of times and the second number of times are both recorded in the user click log data. The server then takes the ratio of the first number of times to the second number of times as the click rate of the search result corresponding to the search term.
For example, the similar search term set corresponding to the search result item A is {Benchi automobile, Benchi quotation, Benchi}, and the search result click rate corresponding to "Benchi automobile" is to be calculated. If the number of times that users clicked the search result item A after performing the search action for "Benchi automobile" is m, and the total number of times that "Benchi automobile" was searched is n, then the search result click rate corresponding to "Benchi automobile" in the similar search term set {Benchi automobile, Benchi quotation, Benchi} is m/n, where m is the first number of times and n is the second number of times.
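The m/n ratio can be written as a small helper. This is only a sketch: the function name `click_rate` and the sample counts (20 clicks out of 100 searches) are hypothetical values chosen for illustration, not data from the source.

```python
def click_rate(first_count, second_count):
    """Search result click rate: clicks on the set's result item (first number of
    times) divided by the number of searches of the term (second number of times)."""
    if second_count <= 0:
        raise ValueError("the search term must have been searched at least once")
    return first_count / second_count

# Hypothetical counts: result item A clicked m = 20 times in n = 100 searches.
m, n = 20, 100
rate = click_rate(m, n)
print(rate)
```

The guard on the second count is a defensive choice of this sketch; a term that appears in the click log has by construction been searched at least once.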
S203, taking a search word in the similar search word set as a first search word, and calculating the correlation degree between the first search word and each search word in the similar search word set to which the first search word belongs according to the click rate of the search result corresponding to each search word in the similar search word set to which the first search word belongs.
In this embodiment, since one search word may appear in a plurality of similar search word sets, when the degree of correlation between the first search word and each search word in the plurality of similar search word sets is calculated, the search words in question are those contained in the union of the plurality of similar search word sets.
Continuing with the example of the similar search term sets shown in fig. 3, the similar search term sets are {Benchi automobile, Benchi quotation, Benchi}, {Benchi automobile, Benchi website, Benchi quotation} and {Benchi automobile, Benchi, Benchi website}. The same search terms appear in several of these sets, so together the sets actually contain four distinct search terms, namely "Benchi automobile", "Benchi quotation", "Benchi" and "Benchi website". Therefore, it is necessary to calculate the degree of correlation between the first search term and each of "Benchi automobile", "Benchi quotation", "Benchi" and "Benchi website".
It should be noted that, in a possible implementation manner, after the degree of correlation between the first search word and each search word in the similar search word set to which the first search word belongs is obtained through calculation, normalization processing may be performed on the degree of correlation between the first search word and each search word in the similar search word set to which the first search word belongs.
In this embodiment, the correlation degree between the first search term after the normalization processing and each search term in the similar search term set to which the first search term belongs may be calculated according to the following formula:
$$\mathrm{weight}(query, qanchor) = \sum_{i=1}^{n} \frac{\mathrm{click}(query, doc_i)}{\mathrm{click}(query)} \cdot \frac{\mathrm{click}(qanchor, doc_i)}{\mathrm{click}(doc_i)} \qquad (1)$$
wherein weight(query, qanchor) represents the degree of correlation between the first search word and any search word in the similar search word sets to which the first search word belongs; query represents the first search word, and qanchor represents any search word in those similar search word sets; click(query, doc_i) represents the click rate of the first search word for the search result item corresponding to the i-th similar search word set to which the first search word belongs, and n is the number of similar search word sets to which the first search word belongs; click(query) represents the sum of the click rates of the first search word for the search result items corresponding to the similar search word sets to which the first search word belongs; click(qanchor, doc_i) represents the click rate of the search word qanchor for the search result item corresponding to the i-th similar search word set; click(doc_i) represents the sum of the search result click rates corresponding to all the search words in the i-th similar search word set to which the first search word belongs.
Take as an example the similar search term sets and the search result click rates shown in fig. 3, and suppose the degree of correlation between the first search term "Benchi automobile" and "Benchi website" is to be calculated. In this case, weight(query, qanchor) represents the degree of correlation between "Benchi automobile" and "Benchi website", query represents "Benchi automobile", and qanchor represents "Benchi website". The similar search term set corresponding to the search result item A does not contain "Benchi website" and therefore contributes no term, so weight(query, qanchor) = (0.3/(0.2+0.3+0.5)) × (0.2/(0.3+0.2+0.2)) + (0.5/(0.2+0.3+0.5)) × (0.3/(0.5+0.3+0.3)) ≈ 0.22.
Accordingly, as calculated by formula (1), the degree of correlation between the first search term "Benchi automobile" and "Benchi automobile" is 0.45, the degree of correlation between "Benchi automobile" and "Benchi" is 0.19, and the degree of correlation between "Benchi automobile" and "Benchi quotation" is 0.14.
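The worked values above can be checked with a short sketch of formula (1). The nested-dictionary layout, the function name `correlation`, and the term labels are assumptions for illustration; the click rates are those of the fig. 3 example. The sketch also assumes that a search word contributes a click rate of 0 for any set it does not belong to, which is what the worked example implies.

```python
# Click rates from the fig. 3 example: result item -> {search term: click rate}.
click = {
    "A": {"Benchi automobile": 0.2, "Benchi quotation": 0.1, "Benchi": 0.1},
    "B": {"Benchi automobile": 0.3, "Benchi website": 0.2, "Benchi quotation": 0.2},
    "C": {"Benchi automobile": 0.5, "Benchi": 0.3, "Benchi website": 0.3},
}

def correlation(query, anchor, click):
    """Normalized degree of correlation between query and anchor, per formula (1)."""
    # Only the similar search term sets that contain the first search term count.
    docs = [doc for doc, rates in click.items() if query in rates]
    query_total = sum(click[doc][query] for doc in docs)      # click(query)
    score = 0.0
    for doc in docs:
        doc_total = sum(click[doc].values())                   # click(doc_i)
        anchor_rate = click[doc].get(anchor, 0.0)              # 0 if anchor not in set
        score += (click[doc][query] / query_total) * (anchor_rate / doc_total)
    return score

first = "Benchi automobile"
anchors = ["Benchi automobile", "Benchi website", "Benchi", "Benchi quotation"]
weights = {anchor: correlation(first, anchor, click) for anchor in anchors}
```

Because each factor is a share of a total, the four degrees of correlation for one first search term sum to 1, which is the sense in which the result is normalized. The computed values agree with the figures quoted above to within rounding in the second decimal place.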
S204, performing word segmentation on the first search word to obtain at least one word segmentation entry included in the first search word.
S205, aiming at each participle entry included in the first search word, calculating to obtain the entry weight of each participle entry under the first search word.
The term entry corresponding to the first search term may include one or more terms.
For example, if the first search word is "Benchi automobile", word segmentation of "Benchi automobile" yields the participle entries "Benchi" and "automobile", and the entry weights of "Benchi" and "automobile" are calculated respectively.
It can be understood that, when the entry weight of a certain participle entry is calculated, not all search terms in the similar search term sets to which the first search term belongs include the participle entry. It is therefore necessary to determine the second search terms, namely those search terms that include the participle entry, and to determine the entry weight of the participle entry from the degree of correlation between the first search term and each second search term.
Specifically, one possible implementation manner of calculating the entry weight of each participle entry included in the first search word is as follows: acquiring all second search terms comprising the word segmentation entries in a similar search term set to which the first search term belongs; and acquiring the correlation degree between the first search word and each second search word after normalization processing, summing and calculating, and taking the calculation result as the entry weight of the word segmentation entries under the first search word.
And respectively taking each participle entry corresponding to the first search word as a target participle entry, and performing weight calculation aiming at the target participle entry to obtain the weight of each participle entry.
The weight calculation is as follows: the weight of the target participle entry is determined according to the degree of correlation between the first search word and each second search word, where a second search word is a search word that is in the similar search word sets to which the first search word belongs and that includes the target participle entry.
Taking the similar search word sets shown in fig. 3 as an example, the degree of correlation between the first search word "Benchi automobile" and "Benchi automobile" is 0.45, that between "Benchi automobile" and "Benchi website" is 0.22, that between "Benchi automobile" and "Benchi" is 0.19, and that between "Benchi automobile" and "Benchi quotation" is 0.14. If the entry weight is calculated for the participle entry "Benchi", then since the four search terms "Benchi automobile", "Benchi website", "Benchi" and "Benchi quotation" all include the participle entry "Benchi", all four may be used as second search terms, and the entry weight of the participle entry "Benchi" may be 0.45 + 0.22 + 0.19 + 0.14 = 1. If the entry weight is calculated for the participle entry "automobile", then since only the search term "Benchi automobile" includes the participle entry "automobile", "Benchi automobile" may be used as the second search term, and the entry weight of the participle entry "automobile" may be 0.45.
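The summation over second search terms can be sketched as follows. The function names `segment` and `entry_weight` are hypothetical, and the whitespace split is only a stand-in for a real word segmenter, which the source does not specify; the correlation values are the rounded figures from the example.

```python
# Rounded degrees of correlation between the first search term and each search term.
correlations = {
    "Benchi automobile": 0.45,
    "Benchi website": 0.22,
    "Benchi": 0.19,
    "Benchi quotation": 0.14,
}

def segment(term):
    """Stand-in word segmenter (assumption): split the term on whitespace."""
    return term.split()

def entry_weight(entry, correlations):
    """Sum the correlations of all second search terms, i.e. terms containing entry."""
    return sum(w for term, w in correlations.items() if entry in segment(term))

weight_benchi = entry_weight("Benchi", correlations)
weight_automobile = entry_weight("automobile", correlations)
```

Membership is tested against the segmented entries rather than by substring search, so that an entry is matched only when it appears as a whole participle entry of the second search term.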
According to the technical scheme, after the similar search term set is constructed, firstly, aiming at each search term in the similar search term set, the click rate of the search result corresponding to each search term is calculated. Then, taking a search word in the similar search word set as a first search word, and calculating the correlation degree between the first search word and each search word in the similar search word set to which the first search word belongs according to the click rate of search results corresponding to each search word in the similar search word set to which the first search word belongs. And then, aiming at each participle entry included by the first search word, calculating the entry weight of each participle entry included by the first search word according to the correlation degree between the first search word and each search word in the similar search word set to which the first search word belongs.
It can be seen that the entry weight of each participle entry in the first search term is determined according to the degree of correlation, and the degree of correlation reflects the search result click rates corresponding to the search terms. That is, the click rate with which users click a certain search result item for a search term is taken into account when calculating the entry weight. Because the search result click rate reflects the degree of semantic association between the search term and the search result item, it helps determine the key content that the search term is intended to express. The calculated entry weight can therefore more accurately reflect the importance of a participle entry in expressing the key content, and can distinguish the importance of different participle entries in the first search term. The obtained entry weights can then be used to extract accurate core terms from the search terms input by the user, or to return a reasonable ordering of search result items for those search terms, improving the user experience.
Next, a manner of obtaining the search click result set is described. When similar search term sets are constructed according to the search click result set, a user who has performed a search action for a certain search term may click some search result items by mistake or for similar reasons. Such search result items do not truly represent the key content expressed by the search term, and if the search click result set includes them, the calculated entry weights of the participle entries may not be accurate enough. Search result items clicked by mistake generally have a small number of clicks. Therefore, to keep such items out of the search click result set, the server may first acquire all search result items clicked by the user after the search action is executed for the first search word, and then form the search click result set from those search result items whose number of clicks is greater than a preset threshold. This reduces the possibility that the search click result set includes search result items clicked by mistake, and improves the accuracy of the weight calculation.
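The threshold filter can be sketched in a few lines. The function name `build_click_result_set`, the click counts, and the threshold value of 3 are hypothetical; the source only states that a preset threshold is used.

```python
def build_click_result_set(click_counts, threshold):
    """Keep only result items clicked more than `threshold` times for a search term."""
    return {item for item, count in click_counts.items() if count > threshold}

# Hypothetical click counts for one search term; item "D" looks like a misclick.
click_counts = {"A": 20, "B": 30, "C": 50, "D": 1}
result_set = build_click_result_set(click_counts, threshold=3)
print(sorted(result_set))
```

The comparison is strict (greater than), matching the wording above; an item whose count equals the threshold is dropped.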
It should be noted that, in the method provided in this embodiment, the server calculates the entry weight of each participle entry offline by using the user click log data, and stores the entry weight of each participle entry corresponding to the first search word. In this way, when the user inputs a search word to be queried and wants to obtain search result items, the server can determine online the entry weight of each participle entry corresponding to the search word to be queried, search for the search word to be queried according to the entry weights, and return the search result items to the user.
Next, a search method provided in an embodiment of the present application will be described. Referring to fig. 4, fig. 4 shows an exemplary application scenario of a search method, where the application scenario includes a terminal device 401 and a server 402, and the terminal device 401 may be, for example, an intelligent terminal, a computer, a Personal Digital Assistant (PDA), a tablet computer, or the like.
The user may input a search term to be queried at the terminal device 401, and the server 402 may receive the search term to be queried and obtain a first search term matching it. The server 402 performs word segmentation on the search term to be queried to obtain participle entries; for each participle entry in the search term to be queried, it determines a first participle entry in the first search term that matches the participle entry, and takes the entry weight of the first participle entry as the entry weight of the participle entry, thereby obtaining the entry weight of each participle entry in the search term to be queried. The server 402 then returns the search result items corresponding to the participle entries to the terminal device 401 according to these entry weights, and the search result items are displayed on the terminal device 401.
Next, a search method provided in the present embodiment will be described with reference to the drawings. Referring to fig. 5, the method includes:
S501, receiving a search term to be queried that is input by a user.
The user can input the search terms to be queried in the search engine of the terminal device, so that the search terms to be queried are searched through the search engine, and the search result items desired by the user are obtained.
S502, obtaining a first search word matched with the search word to be inquired.
The server records the search terms searched by the user and the weight of the participle entry corresponding to each search term, wherein the search term matched with the search term to be inquired can be used as the first search term.
S503, respectively determining a first participle entry matched with the participle entry in the first search word aiming at each participle entry in the search word to be inquired.
S504, determining the entry weight of the first participle entry as the weight of the participle entry.
Wherein the entry weight of the first participle entry is determined according to the method described in the embodiment corresponding to fig. 2.
And S505, returning a search result item corresponding to the participle entry according to the entry weight of each participle entry in the search term to be inquired.
For example, the search term to be queried is "Benchi automobile". The server stores the entry weights of the participle entries corresponding to each first search term; if the server obtains the first search term "Benchi automobile" matching the search term to be queried "Benchi automobile", the server has stored the entry weights corresponding to "Benchi" and "automobile" in the first search term "Benchi automobile".
The participle entries of the search term to be queried "Benchi automobile" are "Benchi" and "automobile". Assume that the entry weights of "Benchi" and "automobile" stored in the server for the first search term "Benchi automobile" are 1 and 0.45, respectively. Then, for the target participle entry "Benchi", the server determines that "Benchi" in the first search term matches "Benchi" in the search term to be queried, so the entry weight of "Benchi" in the first search term is taken as the entry weight of "Benchi" in the search term to be queried, that is, 1; here "Benchi" in the first search term serves as the first participle entry. Correspondingly, the entry weight of "automobile" in the search term to be queried is 0.45.
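Steps S502 to S504 can be sketched as a lookup against the stored weights. The storage layout, the function names, the whitespace segmenter, and the use of exact string equality as the matching rule between the search term to be queried and the first search term are all assumptions for illustration; the source does not specify how the match is performed.

```python
# Hypothetical store of precomputed entry weights, keyed by first search term.
stored_weights = {
    "Benchi automobile": {"Benchi": 1.0, "automobile": 0.45},
}

def segment(term):
    """Stand-in word segmenter (assumption): split the term on whitespace."""
    return term.split()

def lookup_entry_weights(query, stored):
    """Return {participle entry: weight} for the query, or {} if nothing matches."""
    first_term_weights = stored.get(query)   # S502: exact match (assumption)
    if first_term_weights is None:
        return {}
    # S503/S504: each query entry takes the weight of the matching first entry.
    return {entry: first_term_weights[entry]
            for entry in segment(query) if entry in first_term_weights}

query_weights = lookup_entry_weights("Benchi automobile", stored_weights)
print(query_weights)
```

A query with no matching first search term yields an empty mapping in this sketch; how such queries are handled in practice is not covered by the passage above.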
It can be understood that a user inputs a search term to be queried in the hope of obtaining search result items, and the obtained search result items should embody the key content expressed by the search term to be queried so that they better meet the user's requirements. In the search term to be queried, the greater the entry weight of a participle entry, the better that participle entry reflects the key content to be expressed. Therefore, when the server returns search result items to the terminal device, the search result items corresponding to the participle entry with the greater entry weight better meet the user's requirements.
For this reason, in one implementation manner of S505, the server determines, according to the entry weight of each participle entry in the search term to be queried, the second participle entry, which is the participle entry with the largest entry weight, and returns the search result item corresponding to the second participle entry to the terminal device. In this way, when the user executes the search action for the search term to be queried, search result items meeting the user's requirements can be found, and the user experience is improved.
It can be understood that when a user performs a search action on a search term to be queried, a large number of search result items may be obtained, the association degree of the search result items with the search result item that the user wishes to obtain is different, some search result items are associated with the search result item that the user wishes to obtain to a large extent, and some search result items are obviously deviated from the search result item that the user wishes to obtain. How to rank the search result items to present them to the user will directly impact the user experience.
Because the entry weight of the participle entry can reflect the key content to be expressed by the search word to be inquired, the larger the entry weight of the participle entry is, the more the search result item corresponding to the participle entry conforms to the search result item which the user wants to obtain. For this reason, in one implementation manner, the implementation manner of S505 may be to sort the search result items respectively corresponding to each participle entry according to the order of the entry weights from large to small. Therefore, the search result items which are expected to be obtained by the user can be guaranteed to be preferentially displayed to the user, the user can obtain the required search result items as soon as possible, and the user experience is improved.
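Both implementation manners of S505 can be sketched together: selecting the participle entry with the largest entry weight, and ordering result items by descending entry weight. The per-entry result lists, the document identifiers, and the function name `rank_results` are hypothetical; removing duplicate result items is a choice of this sketch rather than something the source prescribes.

```python
def rank_results(entry_weights, results_by_entry):
    """List result items entry by entry, highest entry weight first, without repeats."""
    ranked, seen = [], set()
    for entry in sorted(entry_weights, key=entry_weights.get, reverse=True):
        for item in results_by_entry.get(entry, []):
            if item not in seen:       # keep the first (highest-weight) occurrence
                seen.add(item)
                ranked.append(item)
    return ranked

entry_weights = {"Benchi": 1.0, "automobile": 0.45}
# Hypothetical index from participle entry to result items.
results_by_entry = {"automobile": ["doc3", "doc1"], "Benchi": ["doc1", "doc2"]}

top_entry = max(entry_weights, key=entry_weights.get)   # the second participle entry
ranked = rank_results(entry_weights, results_by_entry)
print(top_entry, ranked)
```

Returning only `results_by_entry[top_entry]` corresponds to the first implementation manner, while the full `ranked` list corresponds to the sorted presentation described above.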
According to the technical scheme, when a user searches a search term to be queried, the term weight of each participle term in the search term to be queried is calculated according to the term weight determined in the corresponding embodiment of fig. 2, and the term weight determined in the corresponding embodiment of fig. 2 considers the click rate of clicking a certain search result item by the user aiming at the search term, so that the calculated term weight can more accurately reflect the importance degree of the participle term in the aspect of expressing key content. Therefore, the entry weights determined in the embodiment corresponding to fig. 5 can more accurately reflect the importance degrees of the participle entries in terms of expressing the key content, so that the importance degrees of different participle entries in the search term to be queried can be accurately distinguished, an accurate core term is extracted from the search term to be queried by using the obtained entry weights, or a reasonable search result item sequence is returned for the search term to be queried, thereby improving user experience.
Based on the embodiment corresponding to fig. 2, the present embodiment provides an apparatus for calculating term weights. Referring to fig. 6, the apparatus includes a construction unit 601, a first calculation unit 602, a second calculation unit 603, a word segmentation unit 604, and a third calculation unit 605:
the constructing unit 601 is configured to construct a similar search term set for a search term in the user click log data, where each search term in the similar search term set is a search term in the click log data that is clicked to the same search result item;
the first calculating unit 602 is configured to calculate, for each search term included in the similar search term set, a click rate of a search result corresponding to each search term; the click rate of the search result is the click rate of the search word aiming at the search result item corresponding to the similar search word set to which the search word belongs;
the second calculating unit 603 is configured to calculate, by using a search word in the similar search word set as a first search word, a degree of correlation between the first search word and each search word in the similar search word set to which the first search word belongs, according to a click rate of a search result corresponding to each search word included in the similar search word set to which the first search word belongs;
the word segmentation unit 604 is configured to perform word segmentation on the first search word to obtain at least one word segmentation entry included in the first search word;
the third calculating unit 605 is configured to calculate, for each participle entry included in the first search word, a entry weight of each participle entry included in the first search word according to a degree of correlation between the first search word and each search word in the similar search word set to which the first search word belongs.
Optionally, the first computing unit is specifically configured to:
acquiring the number of times of clicking of a search result item corresponding to a similar search word set to which the search word belongs when the search word is used for searching, and recording the number of times as a first number of times;
acquiring the number of times of searching the search word, and recording as a second number of times;
and taking the ratio of the first times to the second times as the click rate of the search result corresponding to the search word.
Optionally, the apparatus further includes a processing unit:
the processing unit is used for carrying out normalization processing on the correlation degree between the first search word and each search word in the similar search word set to which the first search word belongs;
the third computing unit is specifically configured to:
and calculating the entry weight of each participle entry included in the first search word according to the correlation degree between the first search word and each search word in the similar search word set to which the first search word belongs after normalization processing.
Optionally, the processing unit specifically processes the data to obtain the correlation degree after normalization processing by using the following formula:
$$\mathrm{weight}(query, qanchor) = \sum_{i=1}^{n} \frac{\mathrm{click}(query, doc_i)}{\mathrm{click}(query)} \cdot \frac{\mathrm{click}(qanchor, doc_i)}{\mathrm{click}(doc_i)}$$
wherein weight(query, qanchor) represents the degree of correlation between the first search word and any search word in the similar search word sets to which the first search word belongs; query represents the first search word, and qanchor represents any search word in those similar search word sets; click(query, doc_i) represents the click rate of the first search word for the search result item corresponding to the i-th similar search word set to which the first search word belongs, and n is the number of similar search word sets to which the first search word belongs; click(query) represents the sum of the click rates of the first search word for the search result items corresponding to the similar search word sets to which the first search word belongs; click(qanchor, doc_i) represents the click rate of the search word qanchor for the search result item corresponding to the i-th similar search word set; click(doc_i) represents the sum of the search result click rates corresponding to all the search words in the i-th similar search word set to which the first search word belongs.
Optionally, the third computing unit is specifically configured to:
acquiring all second search terms comprising the word segmentation entries in a similar search term set to which the first search term belongs;
and acquiring the correlation degree between the first search word and each second search word after normalization processing, summing and calculating, and taking the calculation result as the entry weight of the participle entry included in the first search word.
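The summation performed by the third computing unit can be sketched as below. The sketch assumes the normalized degrees of correlation have already been computed and are passed in as a dictionary; the whitespace tokenizer is only a stand-in for a real word segmenter, and the names term_weights, relevance, and segment are hypothetical.

```python
def term_weights(first_word, relevance, segment=str.split):
    """Return {participle entry: entry weight} for first_word.

    relevance: {search_word: normalized correlation with first_word}, covering
               the similar search word set(s) to which first_word belongs.
    segment:   word segmentation function (assumption: whitespace split).
    """
    weights = {}
    for entry in segment(first_word):
        # "Second search words" are those whose segmentation contains the entry.
        weights[entry] = sum(
            score for word, score in relevance.items() if entry in segment(word)
        )
    return weights
```

An entry shared by many strongly correlated search words accumulates a larger weight than one that appears only in the first search word itself.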
Optionally, the apparatus further includes an obtaining unit and a determining unit:
the obtaining unit is used for acquiring a search click result set pointed by a search word in the user click log data;
the determining unit is used for respectively forming the search words clicked to the same search result item into a similar search word set aiming at each search result item in the search click result set.
Optionally, the obtaining unit is specifically configured to:
acquiring all search result items clicked by a user after executing a search behavior aiming at the search word;
and forming the search result items of which the clicked times are greater than a preset threshold value in all the search result items into a search click result set.
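The two steps above (filtering clicked items by a threshold, then grouping search words by the item they clicked) can be sketched as follows. The log layout, the function name build_similar_sets, and the default threshold value are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def build_similar_sets(click_log, threshold=1):
    """Group search words by the search result item they clicked.

    click_log: iterable of (search_word, clicked_item) pairs, one per click.
    threshold: preset threshold; an item enters a word's search click result
               set only if that word clicked it more than `threshold` times.
    """
    counts = Counter(click_log)  # (word, item) -> number of clicks
    similar = defaultdict(set)   # item -> similar search word set
    for (word, item), n in counts.items():
        if n > threshold:
            similar[item].add(word)
    return dict(similar)
```

Raising the threshold trades coverage for cleaner sets, since occasional accidental clicks no longer link unrelated search words.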
Based on the embodiment corresponding to fig. 5, an embodiment of the present application provides a search apparatus. Referring to fig. 7, the apparatus includes a receiving unit 701, an obtaining unit 702, a first determining unit 703, a second determining unit 704, and a returning unit 705:
the receiving unit 701 is configured to receive a search term to be queried, which is input by a user;
the obtaining unit 702 is configured to obtain a first search term matched with the search term to be queried;
the first determining unit 703 is configured to determine, for each participle entry in the search term to be queried, a first participle entry in the first search term that matches the participle entry;
the second determining unit 704 is configured to determine the entry weight of the first participle entry as the entry weight of the participle entry, where the entry weight of the first participle entry is determined according to the method of claim 1;
the returning unit 705 is configured to return a search result item corresponding to a participle entry according to the entry weight of each participle entry in the search term to be queried.
Optionally, the returning unit is configured to determine, according to the entry weight of each participle entry in the search term to be queried, a second participle entry with the largest weight;
and returning the search result item corresponding to the second participle entry.
Optionally, the returning unit is configured to sort, according to an order of the entry weights from large to small, the search result items corresponding to each participle entry respectively.
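How the returning unit might use the entry weights can be sketched as below. The sketch assumes the entry weights of the matched first search word are already available as a dictionary; treating unmatched entries as weight 0.0, the whitespace tokenizer, and the names rank_entries and core_entry are all assumptions.

```python
def rank_entries(query, weights, segment=str.split):
    """Return (entry, weight) pairs for the query, largest weight first.

    weights: {participle entry: entry weight} of the matched first search word.
    Entries with no match get weight 0.0 (an assumption of this sketch).
    """
    scored = [(entry, weights.get(entry, 0.0)) for entry in segment(query)]
    # sorted() is stable, so entries with equal weight keep their query order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def core_entry(query, weights, segment=str.split):
    """Return the participle entry with the largest weight, or None."""
    ranked = rank_entries(query, weights, segment)
    return ranked[0][0] if ranked else None
```

The first function corresponds to ordering result items by entry weight, and the second to returning results only for the highest-weight entry.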
Fig. 8 is a block diagram illustrating an apparatus 800 according to an example embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 8, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 806 provides power to the various components of the device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the apparatus 800. The sensor assembly 814 may also detect a change in position of the apparatus 800 or a component of the apparatus 800, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communications component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
Fig. 9 is a schematic structural diagram of a server in an embodiment of the present invention. The server 900 may vary widely in configuration or performance and may include one or more Central Processing Units (CPUs) 922 (e.g., one or more processors) and memory 932, one or more storage media 930 (e.g., one or more mass storage devices) storing applications 942 or data 944. Memory 932 and storage media 930 can be, among other things, transient storage or persistent storage. The program stored on the storage medium 930 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, a central processor 922 may be provided in communication with the storage medium 930 to execute a series of instruction operations in the storage medium 930 on the server 900.
The server 900 may also include one or more power supplies 926, one or more wired or wireless network interfaces 950, one or more input-output interfaces 958, one or more keyboards 956, and/or one or more operating systems 941, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
In an exemplary embodiment, the server 900 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as storage medium 930 including instructions executable by CPU 922 of server 900 to perform the above-described method. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be implemented by hardware executing program instructions. The program may be stored in a computer readable storage medium and, when executed, performs the steps of the method embodiments. The storage medium may be any medium that can store program code, such as a read-only memory (ROM), a RAM, a magnetic disk, or an optical disk.
It should be noted that, in the present specification, all the embodiments are described in a progressive manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus and system embodiments, since they are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described embodiments of the apparatus and system are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only one specific embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (16)

1. A method for calculating entry weights, the method comprising:
aiming at search terms in user click log data, constructing a similar search term set, wherein the search terms in the similar search term set are search terms for which the same search result item was clicked in the click log data;
aiming at each search word included in the similar search word set, calculating to obtain a search result click rate corresponding to each search word; the click rate of the search result is the click rate of the search word aiming at the search result item corresponding to the similar search word set to which the search word belongs;
taking a search word in the similar search word set as a first search word, and calculating the correlation degree between the first search word and each search word in the similar search word set to which the first search word belongs according to the click rate of search results corresponding to each search word in the similar search word set to which the first search word belongs;
performing word segmentation on the first search word to obtain at least one word segmentation entry included in the first search word;
and aiming at each participle entry included by the first search word, calculating the entry weight of each participle entry included by the first search word according to the correlation degree between the first search word and each search word in the similar search word set to which the first search word belongs.
2. The method of claim 1, wherein the calculating a search result click rate corresponding to each search term comprises:
acquiring the number of times of clicking of a search result item corresponding to a similar search word set to which the search word belongs when the search word is used for searching, and recording the number of times as a first number of times;
acquiring the number of times of searching the search word, and recording as a second number of times;
and taking the ratio of the first number of times to the second number of times as the click rate of the search result corresponding to the search word.
3. The method according to claim 1, wherein after calculating the degree of correlation between the first search term and each search term in the similar search term set to which the first search term belongs according to the click rate of the search result corresponding to each search term included in the similar search term set to which the first search term belongs, the method further comprises:
normalizing the correlation degree between the first search word and each search word in the similar search word set to which the first search word belongs;
the obtaining of the entry weight of each participle entry included in the first search word by calculating according to the degree of correlation between the first search word and each search word in the similar search word set to which the first search word belongs includes:
and calculating the entry weight of each participle entry included in the first search word according to the correlation degree between the first search word and each search word in the similar search word set to which the first search word belongs after normalization processing.
4. The method according to claim 3, wherein the normalizing the degree of correlation between the first search term and each search term in the similar search term set to which the first search term belongs comprises:
Figure FDA0002936864110000021
wherein weight(query, qanchor) represents the degree of correlation between the first search word and any search word in the similar search word set to which the first search word belongs; query represents the first search word, and qanchor represents any search word in the similar search word set; click(query, doc_i) represents the click rate of the first search word for the search result item corresponding to the i-th similar search word set to which the first search word belongs, and n is the number of similar search word sets to which the first search word belongs; click(query) represents the sum of the click rates of the first search word for the search result items corresponding to the similar search word sets to which the first search word belongs; click(qanchor, doc_i) represents the click rate of any search word for the search result item corresponding to the similar search word set to which that search word belongs; click(doc_i) represents the sum of the click rates of the search results corresponding to all the search words in the i-th similar search word set to which the first search word belongs.
5. The method according to claim 3, wherein the calculating the entry weight of each participle entry included in the first search word according to the degree of correlation between the first search word and each search word in the similar search word set to which the first search word belongs after the normalization process includes:
acquiring all second search terms comprising the word segmentation entries in a similar search term set to which the first search term belongs;
and acquiring the correlation degree between the first search word and each second search word after normalization processing, summing and calculating, and taking the calculation result as the entry weight of the participle entry included in the first search word.
6. The method of claim 1, wherein before constructing the set of similar search terms for the search terms in the user click log data, the method further comprises:
acquiring a search click result set pointed by a search word in the user click log data;
and aiming at each search result item in the search click result set, respectively forming the search words clicked to the same search result item into a similar search word set.
7. The method of claim 6, wherein obtaining the search click result set pointed to by the search term in the user click log data comprises:
acquiring all search result items clicked by a user after executing a search behavior aiming at the search word;
and forming the search result items of which the clicked times are greater than a preset threshold value in all the search result items into a search click result set.
8. An apparatus for calculating entry weights, comprising a construction unit, a first calculation unit, a second calculation unit, a word segmentation unit and a third calculation unit:
the construction unit is used for constructing a similar search word set aiming at search words in the user click log data, wherein the search words in the similar search word set are search words for which the same search result item was clicked in the click log data;
the first calculating unit is used for calculating and obtaining the click rate of the search result corresponding to each search word aiming at each search word included in the similar search word set; the click rate of the search result is the click rate of the search word aiming at the search result item corresponding to the similar search word set to which the search word belongs;
the second calculating unit is configured to calculate, by using a search word in the similar search word set as a first search word, a degree of correlation between the first search word and each search word in the similar search word set to which the first search word belongs, according to a click rate of a search result corresponding to each search word included in the similar search word set to which the first search word belongs;
the word segmentation unit is used for performing word segmentation on the first search word to obtain at least one word segmentation entry included in the first search word;
and the third calculating unit is configured to calculate, for each participle entry included in the first search word, an entry weight of each participle entry included in the first search word according to a degree of correlation between the first search word and each search word in a similar search word set to which the first search word belongs.
9. The apparatus according to claim 8, wherein the first computing unit is specifically configured to:
acquiring the number of times of clicking of a search result item corresponding to a similar search word set to which the search word belongs when the search word is used for searching, and recording the number of times as a first number of times;
acquiring the number of times of searching the search word, and recording as a second number of times;
and taking the ratio of the first number of times to the second number of times as the click rate of the search result corresponding to the search word.
10. The apparatus of claim 8, further comprising a processing unit:
the processing unit is used for carrying out normalization processing on the correlation degree between the first search word and each search word in the similar search word set to which the first search word belongs;
the third computing unit is specifically configured to:
and calculating the entry weight of each participle entry included in the first search word according to the correlation degree between the first search word and each search word in the similar search word set to which the first search word belongs after normalization processing.
11. The apparatus according to claim 10, wherein the processing unit obtains the normalized correlation degree by processing according to the following formula:
Figure FDA0002936864110000041
wherein weight(query, qanchor) represents the degree of correlation between the first search word and any search word in the similar search word set to which the first search word belongs; query represents the first search word, and qanchor represents any search word in the similar search word set; click(query, doc_i) represents the click rate of the first search word for the search result item corresponding to the i-th similar search word set to which the first search word belongs, and n is the number of similar search word sets to which the first search word belongs; click(query) represents the sum of the click rates of the first search word for the search result items corresponding to the similar search word sets to which the first search word belongs; click(qanchor, doc_i) represents the click rate of any search word for the search result item corresponding to the similar search word set to which that search word belongs; click(doc_i) represents the sum of the click rates of the search results corresponding to all the search words in the i-th similar search word set to which the first search word belongs.
12. The apparatus according to claim 10, wherein the third computing unit is specifically configured to:
acquiring all second search terms comprising the word segmentation entries in a similar search term set to which the first search term belongs;
and acquiring the correlation degree between the first search word and each second search word after normalization processing, summing and calculating, and taking the calculation result as the entry weight of the participle entry included in the first search word.
13. The apparatus according to claim 8, further comprising an obtaining unit and a determining unit:
the obtaining unit is used for acquiring a search click result set pointed by a search word in the user click log data;
the determining unit is used for respectively forming the search words clicked to the same search result item into a similar search word set aiming at each search result item in the search click result set.
14. The apparatus according to claim 13, wherein the obtaining unit is specifically configured to:
acquiring all search result items clicked by a user after executing a search behavior aiming at the search word;
and forming the search result items of which the clicked times are greater than a preset threshold value in all the search result items into a search click result set.
15. An apparatus for term weight computation comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured for execution by one or more processors, the one or more programs comprising instructions for:
aiming at search terms in user click log data, constructing a similar search term set, wherein the search terms in the similar search term set are search terms for which the same search result item was clicked in the click log data;
aiming at each search word included in the similar search word set, calculating to obtain a search result click rate corresponding to each search word; the click rate of the search result is the click rate of the search word aiming at the search result item corresponding to the similar search word set to which the search word belongs;
taking a search word in the similar search word set as a first search word, and calculating the correlation degree between the first search word and each search word in the similar search word set to which the first search word belongs according to the click rate of search results corresponding to each search word in the similar search word set to which the first search word belongs;
performing word segmentation on the first search word to obtain at least one word segmentation entry included in the first search word;
and aiming at each participle entry included by the first search word, calculating the entry weight of each participle entry included by the first search word according to the correlation degree between the first search word and each search word in the similar search word set to which the first search word belongs.
16. A machine-readable medium having stored thereon instructions which, when executed by one or more processors, implement the method of any one of claims 1-7.
CN201910203912.1A 2019-03-18 2019-03-18 Entry weight calculation method, entry weight search method and related device Active CN109933714B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910203912.1A CN109933714B (en) 2019-03-18 2019-03-18 Entry weight calculation method, entry weight search method and related device

Publications (2)

Publication Number Publication Date
CN109933714A CN109933714A (en) 2019-06-25
CN109933714B true CN109933714B (en) 2021-04-20

Family

ID=66987563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910203912.1A Active CN109933714B (en) 2019-03-18 2019-03-18 Entry weight calculation method, entry weight search method and related device

Country Status (1)

Country Link
CN (1) CN109933714B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10867338B2 (en) 2019-01-22 2020-12-15 Capital One Services, Llc Offering automobile recommendations from generic features learned from natural language inputs
US10489474B1 (en) 2019-04-30 2019-11-26 Capital One Services, Llc Techniques to leverage machine learning for search engine optimization
US10565639B1 (en) 2019-05-02 2020-02-18 Capital One Services, Llc Techniques to facilitate online commerce by leveraging user activity
CN110598067B (en) * 2019-09-12 2022-10-21 腾讯音乐娱乐科技(深圳)有限公司 Word weight obtaining method and device and storage medium
US10796355B1 (en) * 2019-12-27 2020-10-06 Capital One Services, Llc Personalized car recommendations based on customer web traffic
CN111737571B (en) * 2020-06-11 2024-01-30 北京字节跳动网络技术有限公司 Searching method and device and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105528430A (en) * 2015-12-10 2016-04-27 北京奇虎科技有限公司 Method and device for determining weights of search terms
CN107885783A (en) * 2017-10-17 2018-04-06 北京京东尚科信息技术有限公司 The method and apparatus for obtaining the high relevant classification of search term

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013120534A (en) * 2011-12-08 2013-06-17 Mitsubishi Electric Corp Related word classification device, computer program, and method for classifying related word
CN105786910B (en) * 2014-12-25 2019-06-07 北京奇虎科技有限公司 Entry weighing computation method and device
CN104615723B (en) * 2015-02-06 2018-08-07 百度在线网络技术(北京)有限公司 The determination method and apparatus of query word weighted value
CN104731361B (en) * 2015-03-04 2018-06-19 百度在线网络技术(北京)有限公司 A kind of method and apparatus of the selectable region of determining candidate entry
GB2537927A (en) * 2015-04-30 2016-11-02 Fujitsu Ltd Term Probabilistic Model For Co-occurrence Scores
CN105975459B (en) * 2016-05-24 2018-09-21 北京奇艺世纪科技有限公司 A kind of the weight mask method and device of lexical item
CN106339404B (en) * 2016-06-30 2019-10-22 北京奇艺世纪科技有限公司 A kind of search word recognition method and device
CN107885717B (en) * 2016-09-30 2020-12-29 腾讯科技(深圳)有限公司 Keyword extraction method and device
CN106919649B (en) * 2017-01-19 2020-06-26 北京奇艺世纪科技有限公司 Entry weight calculation method and device

Also Published As

Publication number Publication date
CN109933714A (en) 2019-06-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant