CN108572971B - Method and device for mining keywords related to search terms - Google Patents


Info

Publication number
CN108572971B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710138638.5A
Other languages
Chinese (zh)
Other versions
CN108572971A (en
Inventor
陈敏
秦首科
韩友
黄飞
袁腾飞
邱学忠
贾银芳
刘国庆
韩聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201710138638.5A priority Critical patent/CN108572971B/en
Publication of CN108572971A publication Critical patent/CN108572971A/en
Application granted granted Critical
Publication of CN108572971B publication Critical patent/CN108572971B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for mining keywords related to search terms, wherein the method comprises the following steps: obtaining a history display result with high correlation with a query word according to search log information in a search engine, wherein the history display result comprises a history search result aiming at the query and/or an auxiliary display result related to the query, which are displayed in a history display page; generating at least one aggregation result corresponding to the query according to the history display result; and extracting keywords relevant to the query from the at least one aggregation result. According to the scheme of the invention, the historical search behavior guidance of the user is introduced, and massive historical search results are used for reference, so that the problem of insufficient information quantity of the search words is solved to a greater extent, and the method is favorable for mining the real keywords capable of reflecting the search intention of the user.

Description

Method and device for mining keywords related to search terms
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for mining keywords related to search terms.
Background
In the prior art, a user mainly expresses the search intention of the user through a search word input by the user during searching, so that whether the intention of the search word is accurately understood by a search engine is very critical. The keyword extraction technology for the search terms is a basic module for understanding the search terms of the search engine.
Two keyword extraction techniques are currently in common use. The first analyzes the weight of each basic entry contained in a search term by means of various natural language processing tools and extracts keywords from the search term. The second aggregates all search terms and extracts keywords from them based on TF-IDF or various topic models (PLSA, LDA, and the like). Both techniques have the following disadvantages. On the one hand, search terms input by users are often casual or even colloquial, and some contain miswritten characters or even pinyin, so starting from the search terms alone cannot reliably yield the real keywords that match the user's search intention. On the other hand, the massive amount of relevant knowledge on existing third-party web pages is not fully drawn upon, which makes it difficult to mine the real keywords that match the user's search intention.
Disclosure of Invention
The invention aims to provide a method and a device for mining keywords related to search terms.
According to one aspect of the invention, a method for mining keywords related to a search term is provided, wherein the method comprises the following steps:
obtaining a history display result with high correlation with a query word according to search log information in a search engine, wherein the history display result comprises a history search result aiming at the query and/or an auxiliary display result related to the query, which are displayed in a history display page;
generating at least one aggregation result corresponding to the query according to the history display result;
and extracting keywords relevant to the query from the at least one aggregation result.
According to another aspect of the present invention, there is also provided an apparatus for mining a keyword related to a search term, wherein the apparatus comprises:
a first obtaining device, configured to obtain a history display result having a high correlation with a search term query according to search log information in a search engine, wherein the history display result comprises a history search result for the query and/or an auxiliary display result related to the query, which are displayed in a history display page;
the generating device is used for generating at least one aggregation result corresponding to the query according to the history display result;
and the first extraction device is used for extracting keywords relevant to the query from the at least one aggregation result.
Compared with the prior art, the invention has the following advantages. At least one aggregation result corresponding to a search term can be obtained by aggregation based on the history display results having a high correlation with the search term, and the keywords related to the search term are extracted from the at least one aggregation result. This scheme for mining keywords related to search terms introduces the guidance of users' historical search behavior and draws on massive historical search results, which largely compensates for the insufficient amount of information in the search term itself and thus helps to mine the real keywords that reflect the user's search intention. In addition, when a user initiates an actual search, the keywords related to the input search term that were obtained by offline mining can be looked up first and the search initiated afterwards, so that a higher-quality search service can be provided to the user. Moreover, if the keywords related to search terms are mined from the recent historical search behavior of a large number of users, the search results obtained in an actual search based on those mined keywords are more likely to meet the user's real-time search needs. Furthermore, when the scheme is applied to advertisement triggering in the actual search process, the advertisement trigger ratio can be greatly increased, and the display efficiency of search traffic is greatly improved.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 is a flowchart illustrating a method for mining keywords associated with a search term according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for mining keywords related to a search term according to another embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an apparatus for mining keywords related to a search term according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus for mining keywords related to a search term according to another embodiment of the present invention.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
The term "computer device" or "computer" in this context refers to an intelligent electronic device that can execute predetermined processes such as numerical calculation and/or logic calculation by running predetermined programs or instructions, and may include a processor and a memory, wherein the predetermined processes are executed by the processor by executing program instructions prestored in the memory, or the predetermined processes are executed by hardware such as ASIC, FPGA, DSP, or a combination thereof.
The computer devices include, for example, user devices and network devices. The user devices include, but are not limited to, a PC, a tablet computer, a smart phone, a PDA, etc. The network devices include, but are not limited to, a single network server, a server group consisting of a plurality of network servers, or a cloud based on cloud computing and consisting of a large number of computers or network servers, where cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The computer device can operate alone to implement the invention, or can access a network and implement the invention through interoperation with other computer devices in the network. The network in which the computer device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.
It should be noted that the user equipment, the network device, the network, etc. are only examples, and other existing or future computer devices may be applicable to the present invention, and are included in the scope of the present invention and are also included by reference.
The methodologies discussed hereinafter, some of which are illustrated by flow diagrams, may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine or computer readable medium such as a storage medium. The processor(s) may perform the necessary tasks.
Specific structural and functional details disclosed herein are merely representative and are provided for purposes of describing example embodiments of the present invention. The present invention may, however, be embodied in many alternate forms and should not be construed as limited to only the embodiments set forth herein.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element may be termed a second element, and, similarly, a second element may be termed a first element, without departing from the scope of example embodiments. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be noted that, in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may in fact be executed substantially concurrently or may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
The present invention is described in further detail below with reference to the attached drawing figures.
Fig. 1 is a flowchart illustrating a method for mining keywords related to a search term according to an embodiment of the present invention. The method of the present embodiment is mainly implemented by a network device.
The method according to the present embodiment comprises step S1, step S2 and step S3.
In step S1, the network device obtains a history display result having a high correlation with the search term query according to the search log information in the search engine.
The search log information includes any log information generated by the search engine for users' historical search operations. For example, for one historical search operation of a user, the log information includes the search term query input by the user, the search time, and the history display page used to present the search results for the query to the user. It should be noted that, in addition to the history search results for the query, the history display page also includes other information, such as auxiliary display results related to the query, a search input box, search classification information, website identification information, and the like.
The history display result comprises any display result item related to the query in the history display page. Preferably, the history display result comprises a history search result for the query and/or an auxiliary display result related to the query, which are displayed in the history display page. The auxiliary display result denotes any query-related display result item in the history display page other than the history search results, such as promotion information located to the right of the search result display area, related search recommendations located below the result display area, and the like.
The network device can obtain a history display result with high relevance to the search term query in various ways according to the search log information in the search engine.
For example, the network device takes the presentation result items accessed by the user in all the history presentation results corresponding to the query as the history presentation results with high relevance to the query.
For another example, the network device takes a predetermined number of recently accessed presentation result items in all history presentation results corresponding to the query as history presentation results having high correlation with the query.
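The two simple selection strategies above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the `ResultItem` record and its `last_access_time` field are hypothetical stand-ins for whatever the search log actually stores.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResultItem:
    # Hypothetical log record for one display result item of a query.
    url: str
    last_access_time: Optional[float]  # None if the user never accessed it


def accessed_items(items: List[ResultItem]) -> List[ResultItem]:
    """Strategy 1: keep every display result item the user accessed."""
    return [it for it in items if it.last_access_time is not None]


def most_recently_accessed(items: List[ResultItem], n: int) -> List[ResultItem]:
    """Strategy 2: keep the n most recently accessed display result items."""
    accessed = accessed_items(items)
    accessed.sort(key=lambda it: it.last_access_time, reverse=True)
    return accessed[:n]
```

The second function reuses the first so that items that were never accessed cannot be selected.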
As a preferable scheme, the network device obtains a history presentation result with high correlation with the query according to the search log information in the search engine and by combining with the predetermined index information.
The predetermined index information comprises any information related to a predetermined index, and the predetermined index is used for judging the correlation between the query and a history display result. Preferably, the predetermined index comprises at least one of:
- history display amount;
- history display position;
- history click amount;
- history click time distribution.
Specifically, the network device obtains all presentation result items corresponding to the query according to search log information in a search engine, and determines a history presentation result having high correlation with the query from all presentation result items in combination with predetermined index information.
As an example, the predetermined index information includes a plurality of predetermined indexes used for determining the correlation and a condition that each predetermined index is to satisfy (e.g., the history display amount exceeds a predetermined display amount, the history display position is the search result display area or the right-side promotion bar, the history click amount exceeds a predetermined click amount, and the history click time distribution indicates that most of the history clicks occurred within the last week). When the values of the plurality of predetermined indexes corresponding to one display result item all satisfy their conditions, that display result item is determined to be a history display result having a high correlation with the query.
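The "all conditions must hold" rule in this example might look like the sketch below. The field names, the position labels, and every threshold value are assumptions chosen for illustration; the patent does not specify them.

```python
from dataclasses import dataclass


@dataclass
class ItemStats:
    # Hypothetical per-item statistics derived from the search log.
    display_amount: int        # how often the item was shown for the query
    position: str              # where on the page it was shown
    click_amount: int          # how often it was clicked
    recent_click_share: float  # share of clicks that fall within the last week


# Assumed labels for the positions named in the text.
ALLOWED_POSITIONS = {"search_results", "right_promotion_bar"}


def meets_all_conditions(stats: ItemStats,
                         min_display: int = 100,
                         min_clicks: int = 10,
                         min_recent_share: float = 0.5) -> bool:
    """Return True only if every predetermined index satisfies its condition."""
    return (stats.display_amount > min_display
            and stats.position in ALLOWED_POSITIONS
            and stats.click_amount > min_clicks
            and stats.recent_click_share > min_recent_share)
```

Here "most clicks within the last week" is interpreted as a share above 0.5, which is one possible reading.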
As another example, the predetermined index information includes a plurality of predetermined indexes used for determining the correlation, and a weight corresponding to each predetermined index, and then a weight of the presentation result is calculated according to values of the plurality of predetermined indexes corresponding to the presentation result item and the weight corresponding to each predetermined index, and when the weight of the presentation result exceeds a predetermined threshold, the presentation result item is determined to be a history presentation result having a high correlation with the query.
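A weighted variant could be sketched as below. It assumes that each index value has already been normalized to the range [0, 1]; the index names, weights and threshold are hypothetical.

```python
from typing import Dict


def relevance_score(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of normalized index values; missing indexes count as 0."""
    return sum(w * values.get(name, 0.0) for name, w in weights.items())


def is_highly_relevant(values: Dict[str, float],
                       weights: Dict[str, float],
                       threshold: float) -> bool:
    """An item qualifies when its weighted score exceeds the threshold."""
    return relevance_score(values, weights) > threshold
```

For instance, with weights `{"display": 0.2, "position": 0.2, "clicks": 0.4, "recency": 0.2}` an item that is shown often and in a good position but clicked only moderately can still qualify, which the strict all-conditions rule would not allow.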
It should be noted that, the foregoing examples are only for better illustrating the technical solutions of the present invention, and are not limiting to the present invention, and those skilled in the art should understand that any implementation manner of obtaining the history presentation result with high correlation with the search term query according to the search log information in the search engine should be included in the scope of the present invention.
In step S2, the network device generates at least one aggregation result corresponding to the query according to the history presentation result.
Wherein the aggregation result can be represented in a plurality of data forms, and preferably, the aggregation result is represented in a document form.
Specifically, the implementation manner of the network device generating at least one aggregation result corresponding to the query according to the history presentation result includes, but is not limited to:
1) The history display result comprises a history search result aiming at the query displayed in a history display page, the network equipment constructs a plurality of < query, url > pairs according to the query and a plurality of url corresponding to the history search result, and then the network equipment aggregates to obtain an aggregation result corresponding to the query according to the plurality of < query, url >.
Wherein < query, url > represents a pair of the search term query and the link url. For example, the terms "caucasian" and url1 may be constructed to obtain < caucasian, url1>, that is, the value of query is "caucasian" and the value of url is "url1". The aggregation result obtained based on the present implementation manner includes presentation information corresponding to each url in the urls, such as a title, an abstract, and the like in a page corresponding to each url.
As an example, the history search results highly related to the search term query1 include 5 search results, which respectively correspond to the following 5 urls: url1, url2, url3, url4, and url5. The network device constructs the following 5 < query, url > pairs according to query1 and the above 5 urls: < query1, url1>, < query1, url2>, < query1, url3>, < query1, url4>, < query1, url5>. Then, according to these 5 < query, url > pairs, the network device aggregates to obtain an aggregation result < query, document > corresponding to query1, in which the document includes the titles respectively corresponding to the 5 urls.
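The pair construction and aggregation in this example can be sketched as follows. The `titles` lookup table is an assumption standing in for whatever store provides the title of the page behind each url, and joining titles with newlines is just one possible document form.

```python
from typing import Dict, List, Tuple


def build_pairs(query: str, urls: List[str]) -> List[Tuple[str, str]]:
    """Construct one <query, url> pair per url."""
    return [(query, url) for url in urls]


def aggregate(pairs: List[Tuple[str, str]],
              titles: Dict[str, str]) -> Dict[str, str]:
    """Group pairs by query and join the titles into one document per query."""
    grouped: Dict[str, List[str]] = {}
    for query, url in pairs:
        title = titles.get(url)  # hypothetical url -> title lookup
        if title:
            grouped.setdefault(query, []).append(title)
    return {query: "\n".join(ts) for query, ts in grouped.items()}
```

Calling `aggregate(build_pairs("query1", ["url1", "url2"]), titles)` yields a single document for query1 that contains the titles of both urls.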
2) The history display result comprises an auxiliary display result related to the query and displayed in the history display page, and the network equipment aggregates all display contents in the auxiliary display result to obtain an aggregation result corresponding to the query.
For example, the history display result corresponding to the search term "caucasian" includes an auxiliary display result located in the right promotion bar, and the auxiliary display result includes the following display contents: "Minzijie", "Mira princess", "Chongxiong", ……, "Korea ending or dead important person", "Korea animation new op going to spit groove", "Korea opera continuing to be strong", "Korea going to the wrong side with more chance of seeing" and "Korea going to the spit groove with net friend". The network device aggregates these display contents into one aggregation result corresponding to "caucasian".
It should be noted that, the foregoing examples are only for better illustrating the technical solutions of the present invention, and are not meant to limit the present invention, and those skilled in the art should understand that any implementation manner for generating at least one aggregation result corresponding to query according to the historical presentation results is included in the scope of the present invention.
It should be noted that, preferably, the network device executes step S2 respectively at different retrieval granularities to generate aggregation results corresponding to the respective retrieval granularities.
In step S3, the network device extracts keywords related to the query from the at least one aggregation result.
Specifically, the network device may extract keywords related to the query from the at least one aggregated result in various ways.
As one implementation, for each word in the query, the network device extracts from the at least one aggregation result the words that contain that word, and takes them as keywords related to the query.
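A minimal sketch of this containment-based extraction is given below. It assumes that the query and the aggregation result have already been segmented into words; the segmentation step itself is not shown.

```python
from typing import List


def keywords_containing_query_words(query_words: List[str],
                                    aggregated_words: List[str]) -> List[str]:
    """Keep each distinct aggregated word that contains some query word."""
    seen = set()
    keywords = []
    for word in aggregated_words:
        if word in seen:
            continue
        if any(q in word for q in query_words):
            seen.add(word)
            keywords.append(word)
    return keywords
```

Substring containment is used here, so the query word "phone" would select "smartphone" and "headphones" as well as "phone" itself.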
As another implementation, the network device extracts a plurality of base terms from the at least one aggregated result; then, the network equipment obtains a plurality of preset characteristics of each basic entry; and then, the network equipment extracts partial basic terms of which the plurality of preset characteristics meet preset conditions from the plurality of basic terms as keywords related to the query.
As yet another implementation, the network device extracts a plurality of basic terms from the at least one aggregated result; then, for each basic entry in the basic entries, calculating the weight of the basic entry according to the characteristics of the basic entry; and then extracting keywords related to the query from the plurality of basic terms according to the weights respectively corresponding to the plurality of basic terms obtained through calculation. This implementation will be described in detail in the following embodiments.
The keywords related to the query may be denoted as < query, keywords >, where keywords represents a keyword or a set of keywords.
It should be noted that the foregoing examples are only for better illustrating the technical solutions of the present invention and do not limit the present invention. Those skilled in the art should understand that any implementation of extracting keywords related to the query from the at least one aggregation result is included in the scope of the present invention.
It should be noted that the network device mines or updates the keywords related to a query by performing step S1, step S2 and step S3 offline. Each time step S3 has been performed, the network device stores the query and its related keywords in a database (which may be a local database, a cloud database, or the like), so that the keywords related to the search term actually input by a user can be looked up when the user actually searches.
As a preferable solution, the method of the present embodiment further includes step S4 and step S5 executed when the user actually searches.
In step S4, the network device searches for a keyword related to a search term input by a user according to the search term input by the user.
Specifically, the network device queries a keyword related to a received search term currently input by the user from a database.
For example, in its most recent offline execution of step S1, step S2 and step S3, the network device mines the following keywords related to query1: key1, key2 and key3, and updates the keywords related to query1 in the database to key1, key2 and key3. When a user inputs the search term query1, the network device searches the database according to the received query1 and obtains the following keywords related to query1: key1, key2, key3.
It should be noted that, preferably, when the network device finds no keyword related to the search term input by the user in the database, it may take the keywords related to the query in the database that best matches the search term input by the user as the keywords related to that search term.
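The lookup of step S4, including the fallback to the best-matching stored query, might be sketched as follows. An in-memory dict stands in for the database, and string similarity from Python's `difflib` is used as the "matching degree"; both choices are assumptions, since the text does not say how matching is measured.

```python
import difflib
from typing import Dict, List


def lookup_keywords(search_term: str, store: Dict[str, List[str]]) -> List[str]:
    """Return the stored keywords for the term, or for its closest stored query."""
    if search_term in store:
        return store[search_term]
    # Fallback: pick the stored query most similar to the input term.
    closest = difflib.get_close_matches(search_term, list(store), n=1, cutoff=0.0)
    return store[closest[0]] if closest else []
```

With `cutoff=0.0` the fallback always returns some stored query when the store is non-empty; a real system would probably require a minimum similarity before reusing another query's keywords.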
In step S5, the network device initiates a search based on the search term input by the user and the keyword related to the search term input by the user, and provides the search result to the user.
For example, in step S4, the network device searches the database according to query1 received from the user and obtains the following keywords related to query1: key1, key2, key3; in step S5, the network device initiates a search based on query1, key1, key2 and key3 to obtain a search result, and provides the search result to the user.
According to the scheme of this embodiment, at least one aggregation result corresponding to a search term can be obtained by aggregation based on the history display results having a high correlation with the search term, and the keywords related to the search term are extracted from the at least one aggregation result. This scheme for mining keywords related to search terms introduces the guidance of users' historical search behavior and draws on massive historical search results, which largely compensates for the insufficient amount of information in the search term itself and thus helps to mine the real keywords that reflect the user's search intention. In addition, when a user initiates an actual search, the keywords related to the input search term that were obtained by offline mining can be looked up first and the search initiated afterwards, so that a higher-quality search service can be provided to the user. Moreover, if the keywords related to search terms are mined from the recent historical search behavior of a large number of users, the search results obtained in an actual search based on those mined keywords are more likely to meet the user's real-time search needs. Furthermore, when the scheme is applied to advertisement triggering in the actual search process, the advertisement trigger ratio of search can be greatly increased, and the display efficiency of search traffic is greatly improved.
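One way to combine the user's search term with its mined keywords before initiating the search in step S5 is sketched below. The function name and the choice to deduplicate while preserving order are assumptions; how the resulting terms are passed to the search backend is left open.

```python
from typing import List


def build_search_terms(search_term: str, keywords: List[str]) -> List[str]:
    """Put the user's term first, then each mined keyword once, in order."""
    terms = [search_term]
    for keyword in keywords:
        if keyword not in terms:
            terms.append(keyword)
    return terms
```

For query1 with keywords key1, key2 and key3 this produces the list query1, key1, key2, key3, which can then be handed to the search engine.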
It should be noted that, in the prior art, the auxiliary display results that need to be displayed for the user are usually determined based on the keywords input by the user in the actual search process of the user, and the reverse use of the auxiliary display results that are historically displayed for the search terms for user search has never been considered.
Fig. 2 is a flowchart illustrating a method for mining keywords related to a search term according to another embodiment of the present invention. The method according to the present embodiment comprises step S1, step S2 and step S3, wherein said step S3 further comprises step S31, step S32 and step S33. The step S1 and the step S2 have already been described in detail in the embodiment shown in fig. 1, and are not described herein again.
In step S31, the network device extracts a plurality of basic terms from the at least one aggregation result.
The network device may extract a plurality of basic terms from the at least one aggregated result in a plurality of ways.
For example, for each aggregation result in the at least one aggregation result, the network device performs word segmentation on the aggregation result to obtain a plurality of basic entries corresponding to the aggregation result.
For another example, for each of the at least one aggregation result, the network device extracts the 3 words with the highest frequency of occurrence in the aggregation result.
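The frequency-based variant can be sketched with Python's `collections.Counter`. Lowercasing and whitespace splitting are used as a stand-in for word segmentation, which is an assumption; Chinese text would need a proper segmenter.

```python
from collections import Counter
from typing import List


def top_terms(document: str, k: int = 3) -> List[str]:
    """Return the k most frequent tokens of one aggregation result."""
    tokens = document.lower().split()  # placeholder for real word segmentation
    return [term for term, _ in Counter(tokens).most_common(k)]
```

A stop-word filter would usually be applied before counting so that function words do not dominate the result.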
It should be noted that the foregoing examples are only for better illustrating the technical solutions of the present invention, and are not meant to limit the present invention, and those skilled in the art should understand that any implementation manner of extracting a plurality of basic terms from the at least one aggregation result is included in the scope of the present invention.
In step S32, for each basic entry in the basic entries, the network device calculates a weight of the basic entry according to the characteristics of the basic entry.
The weight of the basic entry is used for representing the importance of the basic entry.
Preferably, the characteristics of the base entry include at least one of:
1) The part of speech/importance level of the basic entry. Different parts of speech may correspond to different importance levels; for example, verbs have the highest importance level, nouns the second highest, and prepositions and adverbs the lowest.
2) The TF-IDF (Term Frequency-Inverse Document Frequency) feature of the basic entry in the aggregation result. TF is the frequency with which the basic entry occurs in one aggregation result, and IDF is the inverse document frequency, which falls as the basic entry occurs in more of the aggregation results. The higher the TF, the higher the importance of the basic entry; the lower the IDF, the lower its importance.
3) The user behavior features corresponding to the display result item in which the basic entry is located. The user behavior features comprise any feature related to behavior performed by users on the display result item in which the basic entry is located, such as user access time, number of accesses, in-page operations, and the like.
4) The occurrence of the basic entry in the query. There are three cases: the basic entry appears in the query; part of the basic entry appears in the query; neither the basic entry nor any part of it appears in the query. The importance of the basic entry decreases in that order.
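As an illustration of feature 2), the sketch below computes a TF-IDF value using the common textbook formulation (relative term frequency times the logarithm of the inverse document frequency). The text does not state which formula is used, so this particular formulation is an assumption; each aggregation result is represented as a list of basic entries.

```python
import math
from collections import Counter
from typing import List


def tf_idf(entry: str, doc: List[str], all_docs: List[List[str]]) -> float:
    """TF-IDF of one basic entry within one aggregation result."""
    # TF: relative frequency of the entry in this aggregation result.
    tf = Counter(doc)[entry] / len(doc) if doc else 0.0
    # DF: number of aggregation results that contain the entry.
    df = sum(1 for d in all_docs if entry in d)
    # IDF: log of the inverse document frequency; 0 if the entry never occurs.
    idf = math.log(len(all_docs) / df) if df else 0.0
    return tf * idf
```

An entry that occurs in every aggregation result gets an IDF of zero and therefore contributes nothing, which matches the statement that a lower IDF means lower importance.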
It should be noted that the above features of the basic entry are only examples and do not limit the present invention. Those skilled in the art should understand that any feature of the basic entry that can be used to determine its importance should be included in the scope of the features of the basic entry described in the present invention. Examples include the type of the presentation result item in which the basic entry is located (including, but not limited to, a business result, a natural result, a promotion result, etc.) and the business attributes of the basic entry (including a business type, whether it has been purchased, a business grade, etc.).
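As a non-limiting illustration of the TF-IDF feature, the following Python sketch scores each basic entry within each aggregation result. It assumes a common textbook formulation (relative term frequency multiplied by the logarithm of the inverse document frequency), which is not necessarily the formulation used by the network device; the function name `tf_idf` and the token-list input format are hypothetical.

```python
import math
from collections import Counter


def tf_idf(aggregation_results):
    """Score terms per aggregation result.

    aggregation_results: list of token lists, one list per aggregation result.
    Returns a list of {term: score} dicts in the same order.
    """
    n_docs = len(aggregation_results)
    # Number of aggregation results in which each term occurs at least once.
    doc_freq = Counter(term for doc in aggregation_results for term in set(doc))
    scores = []
    for doc in aggregation_results:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            # TF: relative frequency in this result; IDF: log of inverse document frequency.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = [["a", "b", "a"], ["a", "c"], ["b", "d"]]
scores = tf_idf(docs)
```

Under this formulation, a basic entry that occurs in every aggregation result receives a score of zero, which matches the statement that a low IDF indicates low importance.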
Specifically, for each basic entry in the multiple basic entries, the network device may calculate the weight of the basic entry according to the characteristics of the basic entry in multiple ways.
For example, for each basic entry in the multiple basic entries, the network device determines the weights corresponding to the respective features of the basic entry, and then adds these weights together to obtain the weight of the basic entry.
For another example, weighting coefficients may be set for different features, and for each of the plurality of basic terms, the network device may determine weights corresponding to respective features of the basic term, and then calculate the weights of the basic terms based on the weights corresponding to the respective features of the basic term and the weighting coefficients of the respective features.
It should be noted that, the above examples are only for better illustrating the technical solutions of the present invention, and not for limiting the present invention, and those skilled in the art should understand that any implementation manner for calculating the weight of each of the basic terms according to the characteristics of the basic term should be included in the scope of the present invention.
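The two weight calculations described above (a plain sum of per-feature weights, and a sum scaled by per-feature weighting coefficients) can be sketched as follows. The feature names and numeric values are hypothetical and serve only as an illustration; a feature without a configured coefficient is assumed here to use a coefficient of 1.0.

```python
def sum_weight(feature_weights):
    """Weight of a basic entry as the plain sum of its per-feature weights."""
    return sum(feature_weights.values())


def weighted_sum(feature_weights, coefficients):
    """Weight of a basic entry with a weighting coefficient per feature.

    Features missing from `coefficients` default to 1.0 (an assumption).
    """
    return sum(coefficients.get(name, 1.0) * w for name, w in feature_weights.items())


# Hypothetical per-feature weights for one basic entry.
entry_features = {"part_of_speech": 0.5, "tf_idf": 0.25, "in_query": 1.0}
plain = sum_weight(entry_features)
scaled = weighted_sum(entry_features, {"part_of_speech": 2.0, "tf_idf": 4.0})
```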
In step S33, the network device extracts keywords related to the query from the multiple basic terms according to the weights corresponding to the multiple basic terms obtained through calculation.
Specifically, the network device may extract keywords related to the query from the multiple basic terms in multiple ways according to weights corresponding to the multiple basic terms obtained through calculation, respectively.
For example, the network device sorts the multiple basic entries by their corresponding weights from high to low, and takes the top N basic entries as N keywords related to the query.
For another example, the network device extracts, from the multiple basic terms, M basic terms whose corresponding weights are higher than a predetermined weight, as M keywords related to the query.
It should be noted that, the foregoing examples are only for better illustrating the technical solutions of the present invention, and are not limiting to the present invention, and those skilled in the art should understand that any implementation manner for extracting keywords related to query from the multiple basic terms according to the calculated weights corresponding to the multiple basic terms respectively is included in the scope of the present invention.
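The two selection strategies of step S33 (top N by weight, and all entries above a predetermined weight) might look like the following minimal sketch. The function names and sample weights are hypothetical.

```python
def top_n(weights, n):
    """Return the n basic entries with the highest weights, highest first."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:n]]


def above_threshold(weights, threshold):
    """Return every basic entry whose weight is higher than the predetermined weight."""
    return [term for term, w in weights.items() if w > threshold]


# Hypothetical weights computed in step S32.
weights = {"key1": 0.9, "key2": 0.4, "key3": 0.7, "key4": 0.1}
```

With these sample weights, `top_n(weights, 2)` selects `key1` and `key3`, while `above_threshold(weights, 0.3)` selects `key1`, `key2` and `key3`.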
According to the scheme of the embodiment, a plurality of basic entries are extracted from at least one aggregation result obtained through aggregation, the weight of each basic entry is calculated according to the characteristics of the basic entry, and then keywords related to the search word are extracted from the basic entries based on the weight of each basic entry, so that the mined keywords related to the search word are more likely to reflect or influence the real-time search requirements of the user for the search word.
Fig. 3 is a schematic structural diagram of an apparatus for mining keywords related to a search term according to an embodiment of the present invention. The apparatus for mining a keyword related to a search term (hereinafter, simply referred to as "mining apparatus") includes first obtaining means 1, generating means 2, and first extracting means 3.
The first obtaining device 1 of the network device obtains a history display result with high correlation with the search term query according to the search log information in the search engine.
The search log information includes any log information generated by the search engine for the user's historical search operations. For example, for one historical search operation of a user, the log information includes the search term query input by the user, the search time, and the history presentation page used to present the search results for the query to the user. It should be noted that, in addition to the historical search results for the query, the history presentation page also includes other information, such as auxiliary presentation results related to the query, a search input box, search classification information, website identification information, and the like.
The history presentation result comprises any presentation result item related to the query in the history presentation page. Preferably, the history presentation result comprises a history search result for the query and/or an auxiliary presentation result related to the query presented in the history presentation page. The auxiliary presentation result denotes a query-related presentation result item other than the historical search results in the history presentation page, such as promotion information located on the right side of the search result display area, related search recommendation information located below the result display area, and the like.
The first obtaining device 1 may obtain a history display result having a high correlation with the search term query in multiple ways according to the search log information in the search engine.
For example, the first obtaining device 1 takes the presentation result items accessed by the user in all the history presentation results corresponding to the query as the history presentation results having high correlation with the query.
For another example, the first obtaining device 1 takes a predetermined number of recently accessed presentation result items in all history presentation results corresponding to the query as history presentation results having a high correlation with the query.
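The two selection examples above can be sketched as follows, assuming each presentation result item in the log carries the timestamp of its most recent access (or `None` if it was never accessed). The `ResultItem` record and its field names are hypothetical and are not taken from any actual log format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResultItem:
    url: str
    last_access: Optional[float]  # timestamp of most recent access; None if never accessed


def accessed_items(items: List[ResultItem]) -> List[ResultItem]:
    """Keep only the presentation result items the user actually accessed."""
    return [it for it in items if it.last_access is not None]


def recently_accessed(items: List[ResultItem], k: int) -> List[ResultItem]:
    """Keep the k most recently accessed presentation result items."""
    ranked = sorted(accessed_items(items), key=lambda it: it.last_access, reverse=True)
    return ranked[:k]


items = [
    ResultItem("url1", 10.0),
    ResultItem("url2", None),
    ResultItem("url3", 30.0),
    ResultItem("url4", 20.0),
]
```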
Preferably, the first obtaining device 1 further includes a second obtaining device (not shown), and the second obtaining device obtains the history presentation result with high correlation with the query according to the search log information in the search engine and in combination with the predetermined index information.
The preset index information comprises any information related to a preset index, and the preset index is used for judging the correlation between the query and the historical display result. Preferably, the predetermined indicator comprises at least one of:
-historical exposure;
-a history presentation location;
-historical click volume;
-historical click time distribution.
Specifically, the second obtaining device obtains all presentation result items corresponding to the query according to search log information in the search engine, and determines a history presentation result having high correlation with the query from all presentation result items in combination with predetermined index information.
As an example, the predetermined index information includes a plurality of predetermined indexes used for determining the correlation, and a condition that each predetermined index is to satisfy (for example, the history display amount exceeds the predetermined display amount, the history display position is a search result display area or a right promotion bar, the history click amount exceeds the predetermined click amount, the history click time distribution indicates that most of the history click time is within the last week, and the like), when the values of the plurality of predetermined indexes corresponding to one display result item all satisfy the condition, the second obtaining device determines that the display result item is a history display result having a high correlation with the query.
As another example, the predetermined index information includes a plurality of predetermined indexes used for determining the correlation, and a weight corresponding to each predetermined index, a weight of the presentation result is calculated according to values of the plurality of predetermined indexes corresponding to the presentation result item and the weight corresponding to each predetermined index, and when the weight of the presentation result exceeds a predetermined threshold, the second obtaining device determines that the presentation result item is a history presentation result having a high correlation with the query.
It should be noted that, the foregoing examples are only for better illustrating the technical solutions of the present invention, and are not limiting to the present invention, and those skilled in the art should understand that any implementation manner of obtaining the history presentation result with high correlation with the search term query according to the search log information in the search engine should be included in the scope of the present invention.
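The two ways of combining the predetermined indexes described above (every index must satisfy its condition, or a weighted score must exceed a predetermined threshold) can be illustrated with the following sketch. All thresholds, position labels, field names and weights are hypothetical; the source does not give concrete values.

```python
from dataclasses import dataclass


@dataclass
class Indicators:
    impressions: int           # historical display amount
    position: str              # historical display position
    clicks: int                # historical click amount
    recent_click_share: float  # fraction of clicks that fall within the last week


# Hypothetical conditions for the "all conditions satisfied" variant.
MIN_IMPRESSIONS = 1000
MIN_CLICKS = 50
MIN_RECENT_SHARE = 0.5
ALLOWED_POSITIONS = {"search_results", "right_promotion_bar"}


def passes_all_conditions(ind: Indicators) -> bool:
    """True only if every predetermined index satisfies its condition."""
    return (
        ind.impressions > MIN_IMPRESSIONS
        and ind.position in ALLOWED_POSITIONS
        and ind.clicks > MIN_CLICKS
        and ind.recent_click_share > MIN_RECENT_SHARE
    )


def weighted_relevance(values: dict, weights: dict) -> float:
    """Weighted sum of index values (assumed to be normalized to [0, 1])."""
    return sum(weights[name] * values[name] for name in weights)


def passes_threshold(values: dict, weights: dict, threshold: float) -> bool:
    """True if the weighted score exceeds the predetermined threshold."""
    return weighted_relevance(values, weights) > threshold
```

The first variant is strict and easy to audit; the second lets a strong value on one index compensate for a weak value on another.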
The generating device 2 of the network device generates at least one aggregation result corresponding to the query according to the history display result.
Wherein the aggregation result can be represented in a plurality of data forms, and preferably, the aggregation result is represented in a document form.
Specifically, the implementation manner of generating, by the generating device 2, at least one aggregation result corresponding to the query according to the history presentation result includes but is not limited to:
1) The generating device 2 further comprises a construction device (not shown) and a first aggregation device (not shown). The history display result comprises a history search result for the query displayed in a history display page. The construction device constructs a plurality of < query, url > pairs according to the query and a plurality of urls corresponding to the history search result, and the first aggregation device then aggregates according to the plurality of < query, url > pairs to obtain an aggregation result corresponding to the query.
Wherein < query, url > represents a pair of the search term query and the link url. For example, the terms "caucasian" and url1 may be constructed to obtain < caucasian, url1>, that is, the value of query is "caucasian" and the value of url is "url1". The aggregation result obtained based on the present implementation manner includes presentation information corresponding to each url in the plurality of urls, such as a title, an abstract, and the like in a page corresponding to each url.
As an example, the history search results highly related to the search term query1 include 5 search results, which correspond to the following 5 urls respectively: url1, url2, url3, url4, url5. The construction device constructs the following 5 < query, url > pairs based on query1 and the above 5 urls: < query1, url1>, < query1, url2>, < query1, url3>, < query1, url4>, < query1, url5>. The first aggregation device then aggregates the 5 < query, url > pairs to obtain an aggregation result < query, document > corresponding to query1, where the < query, document > includes the titles corresponding to the 5 urls.
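The construction and aggregation steps of this example can be sketched as follows. The sketch assumes that a mapping from each url to its page title is already available (here the hypothetical dictionary `titles`) and that the aggregated document is simply the titles joined by spaces; both are illustrative assumptions.

```python
def build_pairs(query, urls):
    """Construct one (query, url) pair per url of the historical search results."""
    return [(query, url) for url in urls]


def aggregate(pairs, page_titles):
    """Aggregate pairs into {query: document}, where the document joins the page titles.

    page_titles: mapping url -> title; urls without a known title are skipped.
    """
    by_query = {}
    for query, url in pairs:
        title = page_titles.get(url)
        if title:
            by_query.setdefault(query, []).append(title)
    return {query: " ".join(parts) for query, parts in by_query.items()}


# Hypothetical titles for two of the urls.
titles = {"url1": "Title one", "url2": "Title two"}
pairs = build_pairs("query1", ["url1", "url2", "url3"])
documents = aggregate(pairs, titles)
```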
2) The generating device 2 further comprises a second aggregation device (not shown). The history display result comprises an auxiliary display result related to the query and displayed in the history display page, and the second aggregation device aggregates all display contents in the auxiliary display result to obtain an aggregation result corresponding to the query.
For example, the history display result corresponding to the search term "caucasian" includes an auxiliary display result located in the right promotion bar, and the auxiliary display result includes the following display contents: "Minzijie", "Mira princess", "Chongxiong", …, "Konnan ending or death important person", "Konnan animation new op going to go to groove", "Konnan drama continuing to be strong", "Konnan seeing more often will get ill", "Konnan going to net friend go to groove". The second aggregation device aggregates the above display contents into one aggregation result corresponding to "caucasian".
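A minimal sketch of the second aggregation device follows. It assumes the display contents arrive as a list of strings and that empty and duplicate entries can be dropped before joining; the de-duplication and the newline separator are assumptions of this sketch and are not stated in the source.

```python
def aggregate_auxiliary(query, contents):
    """Aggregate all auxiliary display contents into one (query, document) result."""
    cleaned = (c.strip() for c in contents)
    # dict.fromkeys removes duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(c for c in cleaned if c))
    return query, "\n".join(unique)


result = aggregate_auxiliary("query1", ["item A ", "item B", "item A", ""])
```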
It should be noted that, the foregoing examples are only for better illustrating the technical solutions of the present invention, and are not meant to limit the present invention, and those skilled in the art should understand that any implementation manner for generating at least one aggregation result corresponding to query according to the historical presentation results is included in the scope of the present invention.
It should be noted that, preferably, the generating device 2 performs operations respectively at different retrieval granularities to generate aggregation results corresponding to the respective retrieval granularities.
The first extraction means 3 of the network device extracts keywords related to the query from the at least one aggregated result.
In particular, the first extracting device 3 may extract the keywords related to the query from the at least one aggregated result in various ways.
As an implementation manner, for each word in the query, the first extraction device 3 extracts a word containing the word in the at least one aggregation result as a keyword related to the query.
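This first implementation can be illustrated with a simple containment check. The sketch assumes the query and the aggregation result have already been segmented into word lists; the function name and sample data are hypothetical.

```python
def keywords_containing(query_words, aggregated_words):
    """Return aggregated words that contain any query word, without duplicates."""
    found = []
    for query_word in query_words:
        for word in aggregated_words:
            if query_word in word and word not in found:
                found.append(word)
    return found


candidates = keywords_containing(
    ["phone"], ["smartphone", "phone case", "charger", "smartphone"]
)
```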
As another implementation, the first extraction device 3 extracts a plurality of basic terms from the at least one aggregation result; then, obtaining a plurality of preset characteristics of each basic entry; and then, extracting partial basic terms of which the plurality of preset characteristics meet preset conditions from the plurality of basic terms as keywords related to the query.
As still another implementation, the first extraction device 3 extracts a plurality of basic terms from the at least one aggregation result; then, for each basic entry in the basic entries, calculating the weight of the basic entry according to the characteristics of the basic entry; and then extracting keywords related to the query from the plurality of basic terms according to the weights respectively corresponding to the plurality of basic terms obtained through calculation. This implementation will be described in detail in the following embodiments.
The keywords related to the query may be denoted as < query, keywords >, where keywords represents a keyword or a set of keywords.
It should be noted that the foregoing examples are only for better illustrating the technical solutions of the present invention and do not limit the present invention. Those skilled in the art should understand that any implementation manner of extracting keywords related to the query from the at least one aggregation result is included in the scope of the present invention.
It should be noted that the first obtaining device 1, the generating device 2, and the first extracting device 3 perform their operations offline to mine or update the keywords related to the query. After each operation performed by the first extracting device 3, the network device stores the query and the keywords related to the query in a database (which may be a local database, a cloud database, or the like), so that the keywords related to the search term actually input by the user can be looked up when the user actually searches.
As a preferable scheme, the mining apparatus of this embodiment further includes a searching device (not shown) and a search device (not shown), both of which operate when the user actually searches.
The searching device searches the key words related to the search words input by the user according to the search words input by the user.
Specifically, the searching device searches the database for the keyword related to the search term according to the received search term currently input by the user.
For example, after the first obtaining device 1, the generating device 2 and the first extracting device 3 last perform their operations offline, the following keywords related to query1 are mined: key1, key2 and key3, and the keywords related to query1 in the database are updated to key1, key2 and key3. When a user inputs the search term query1, the searching device searches the database according to the received query1 and obtains the following keywords related to query1: key1, key2, key3.
It should be noted that, preferably, when the searching device does not find keywords related to the search term input by the user in the database, the keywords related to the query in the database that has the highest matching degree with the search term input by the user may be used as the keywords related to the search term input by the user.
The search device initiates a search based on the search term input by the user and the keyword related to the search term input by the user, and provides the search result to the user.
For example, according to query1 received from the user, the searching device searches the database and obtains the following keywords related to query1: key1, key2, key3. The search device initiates a search based on query1, key1, key2 and key3 to obtain a search result, and provides the search result to the user.
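The lookup and search-initiation flow at query time can be sketched as follows. The database is modeled as an in-memory dictionary, and the "matching degree" used for the fallback is approximated with the string similarity of Python's standard `difflib` module; both choices are assumptions of this sketch, and the function names are hypothetical.

```python
import difflib


def lookup_keywords(db, query):
    """Return the stored keywords for the query, falling back to the closest stored query."""
    if query in db:
        return db[query]
    # cutoff=0.0 returns the best match regardless of how weak the similarity is.
    closest = difflib.get_close_matches(query, list(db), n=1, cutoff=0.0)
    return db[closest[0]] if closest else []


def expanded_terms(db, query):
    """Terms used to initiate the search: the user's query plus its related keywords."""
    return [query] + lookup_keywords(db, query)


# Keywords mined and stored beforehand for query1.
db = {"query1": ["key1", "key2", "key3"]}
terms = expanded_terms(db, "query1")
```

In practice a minimum similarity would likely be required before accepting the fallback, so that an unrelated stored query does not contribute keywords.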
According to the scheme of this embodiment, at least one aggregation result corresponding to the search term can be obtained by aggregation based on the history display results that have high correlation with the search term, and the keywords related to the search term are extracted from the at least one aggregation result. This scheme introduces the guidance of users' historical search behavior and draws on massive historical search results, which largely makes up for the insufficient amount of information in the search term and helps to mine the real keywords that reflect the user's search intention. In addition, when the user initiates an actual search, the keywords related to the search term input by the user, obtained by offline mining, can be looked up first and the search then initiated, so that a higher-quality search service can be provided to the user. Moreover, if the keywords related to search terms are mined based on the recent historical search behavior of a large number of users, search results obtained in an actual search based on those mined keywords are more likely to meet the user's real-time search requirements. Furthermore, when the scheme is applied to advertisement triggering in the actual search process, the advertisement triggering proportion and the presentation efficiency of search traffic can both be greatly improved.
It should be noted that, in the prior art, the auxiliary display results that need to be displayed for the user are usually determined based on the keywords input by the user in the actual search process of the user, and the reverse use of the auxiliary display results that are historically displayed for the search terms for user search has never been considered.
Fig. 4 is a schematic structural diagram of an apparatus for mining keywords related to a search term according to another embodiment of the present invention. The mining device according to the present embodiment comprises a first obtaining means 1, a generating means 2 and a first extracting means 3, wherein said first extracting means 3 further comprises a second extracting means 31, a calculating means 32 and a third extracting means 33. The first obtaining device 1 and the generating device 2 have been described in detail in the embodiment shown in fig. 3, and are not described herein again.
The second extraction means 31 extracts a plurality of base terms from the at least one aggregated result.
The second extracting device 31 may extract a plurality of basic terms from the at least one aggregation result in a plurality of manners.
For example, for each aggregation result in the at least one aggregation result, the second extraction device 31 performs word segmentation on the aggregation result to obtain a plurality of basic entries corresponding to the aggregation result.
For another example, for each of the at least one aggregation result, the second extraction device 31 extracts the 3 words with the highest frequency of occurrence in the aggregation result.
It should be noted that the foregoing examples are only for better illustrating the technical solutions of the present invention, and are not meant to limit the present invention, and those skilled in the art should understand that any implementation manner of extracting a plurality of basic terms from the at least one aggregation result is included in the scope of the present invention.
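The two extraction examples given for the second extraction device 31 can be sketched as follows. Real word segmentation, especially for Chinese text, would require a dedicated segmenter; the regular-expression split used here is only a stand-in, and the function names are hypothetical.

```python
import re
from collections import Counter


def segment(text):
    """Stand-in word segmentation: lowercase the text and split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def most_frequent(text, k=3):
    """Return the k words that occur most often in the aggregation result."""
    return [word for word, _ in Counter(segment(text)).most_common(k)]


entries = segment("Cheap phone case, phone charger")
top3 = most_frequent("a a a b b c c c c d")
```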
For each of the plurality of basic terms, the calculating means 32 calculates a weight of the basic term according to the characteristic of the basic term.
The weight of the basic entry is used for representing the importance of the basic entry.
Preferably, the characteristics of the base entry include at least one of:
1) The part of speech/importance level of the basic entry. Different parts of speech may correspond to different importance levels; for example, verbs have the highest importance level, nouns the second highest, and prepositions and adverbs the lowest.
2) TF-IDF characteristics of the basic entry in the aggregation result. TF is the frequency with which the basic entry occurs in one aggregation result, and IDF is derived inversely from the frequency with which the basic entry occurs across all the aggregation results. The higher the TF, the higher the importance of the basic entry; the lower the IDF, the lower the importance of the basic entry.
3) The user behavior characteristics corresponding to the presentation result item in which the basic entry is located. The user behavior characteristics comprise any characteristics related to behavior performed by the user on the presentation result item in which the basic entry is located, such as user access time, number of accesses, in-page operations and the like.
4) The occurrence of the basic entry in the query. There are three cases: the basic entry appears in the query; part of the content of the basic entry appears in the query; and neither the basic entry nor any part of its content appears in the query. The importance of the basic entry decreases in that order across the three cases.
It should be noted that the above features of the basic entry are only examples and do not limit the present invention. Those skilled in the art should understand that any feature of the basic entry that can be used to determine its importance should be included in the scope of the features of the basic entry described in the present invention. Examples include the type of the presentation result item in which the basic entry is located (including, but not limited to, a business result, a natural result, a promotion result, etc.) and the business attributes of the basic entry (including a business type, whether it has been purchased, a business grade, etc.).
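Feature 4), the occurrence of the basic entry in the query, can be turned into a numeric level as in the following sketch. It assumes that "part of the content" of a basic entry means one of its space-separated tokens and that the three cases map to the levels 2, 1 and 0; both are illustrative assumptions, since the source only states that importance decreases across the three cases.

```python
def query_occurrence_level(entry, query):
    """Return 2 if the entry appears in the query, 1 if part of it does, else 0."""
    if entry in query:
        return 2
    if any(part in query for part in entry.split()):
        return 1
    return 0


query = "cheap phone case"
levels = {
    entry: query_occurrence_level(entry, query)
    for entry in ["phone case", "phone charger", "laptop"]
}
```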
Specifically, for each of the basic entries, the calculating device 32 may calculate the weight of the basic entry according to the characteristics of the basic entry in various ways.
For example, for each basic entry in the basic entries, the calculating device 32 determines the weights corresponding to the features of the basic entry, and then the calculating device 32 adds the weights corresponding to the features of the basic entry to obtain the weight of the basic entry.
For another example, weighting coefficients may be further set for different features, and for each of the plurality of basic terms, the calculating device 32 may determine weights corresponding to the features of the basic term, and then the calculating device 32 may calculate the weights of the basic terms based on the weights corresponding to the features of the basic term and the weighting coefficients of the features.
It should be noted that, the above examples are only for better illustrating the technical solutions of the present invention, and not for limiting the present invention, and those skilled in the art should understand that any implementation manner for calculating the weight of each of the basic terms according to the characteristics of the basic term should be included in the scope of the present invention.
The third extraction device 33 extracts keywords related to the query from the plurality of basic terms according to the weights corresponding to the plurality of basic terms obtained through calculation.
Specifically, the third extracting device 33 may extract the keywords related to the query from the multiple basic terms in multiple ways according to the weights corresponding to the multiple basic terms obtained through calculation, respectively.
For example, the third extraction device 33 sorts the basic terms according to the corresponding weights from high to low, and intercepts the top N basic terms as N keywords related to the query.
For another example, the third extracting device 33 extracts M basic terms, of which weights are higher than a predetermined weight, from the basic terms as M keywords related to the query.
It should be noted that, the foregoing examples are only for better illustrating the technical solutions of the present invention, and are not limiting to the present invention, and those skilled in the art should understand that any implementation manner for extracting keywords related to query from the multiple basic terms according to the calculated weights corresponding to the multiple basic terms respectively is included in the scope of the present invention.
According to the scheme of the embodiment, a plurality of basic entries are extracted from at least one aggregation result obtained through aggregation, the weight of each basic entry is calculated according to the characteristics of the basic entry, and then keywords related to the search word are extracted from the basic entries based on the weight of each basic entry, so that the mined keywords related to the search word are more likely to reflect or influence the real-time search requirements of the user for the search word.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not to denote any particular order.

Claims (14)

1. A method for mining keywords related to a search term, wherein the method comprises:
obtaining a history display result with high relevance to a search term query according to search log information in a search engine, wherein the history display result comprises a history search result for the query displayed in a history display page and/or an auxiliary display result related to the query, and the auxiliary display result related to the query denotes a display result item in the history display page, other than the history search result, that does not contain the query and is related to the query;
aggregating information in a page corresponding to the historical search result and/or aggregating contents displayed by the auxiliary display result to obtain at least one aggregation result corresponding to the query;
extracting a plurality of base terms from the at least one aggregated result;
for each basic entry in the basic entries, calculating the weight of the basic entry according to the characteristics of the basic entry;
extracting keywords related to query from the plurality of basic entries according to the weights respectively corresponding to the plurality of basic entries obtained through calculation, and storing the keywords related to the query and the query into a database so that a user can search the keywords related to the search word input by the user during searching; if the history display result includes a history search result for the query displayed in a history display page, aggregating information in the page corresponding to the history search result to obtain at least one aggregated result corresponding to the query, further comprising:
constructing a plurality of < query, url > pairs according to the query and a plurality of urls corresponding to the historical search results;
and according to the multiple < query, url > pairs, aggregating the information in the page corresponding to the historical search result.
2. The method of claim 1, wherein the history presentation result includes an auxiliary presentation result related to query presented in a history presentation page, and the aggregating the content presented by the auxiliary presentation result includes:
and aggregating all the display contents in the auxiliary display result.
3. The method of claim 1, wherein the step of extracting a plurality of base terms from the at least one aggregated result comprises:
and for each aggregation result in the at least one aggregation result, performing word segmentation on the aggregation result to obtain a plurality of basic entries corresponding to the aggregation result.
4. The method of claim 1 or 3, wherein the characteristics of the base entry comprise at least one of:
-the part of speech/importance level of the base entry;
-TF-IDF characteristics of the base entry in the aggregated result;
-the user behavior characteristics corresponding to the presentation result items where the basic vocabulary entry is located;
-occurrence of the base entry in the query.
5. The method of claim 1, wherein the step of obtaining the history presentation result with high correlation with the search term query according to the search log information in the search engine comprises:
and obtaining a history display result with high correlation with the query according to the search log information in the search engine and by combining with the preset index information.
6. The method according to claim 5, wherein the predetermined index indicated by the predetermined index information includes at least one of:
-historical exposure volume;
-a history presentation location;
-historical click volume;
-historical click time distribution.
7. The method of claim 1, wherein the method further comprises:
searching keywords related to the search terms input by the user according to the search terms input by the user;
and initiating a search based on the search word input by the user and the keyword related to the search word input by the user, and providing a search result to the user.
8. An apparatus for mining a keyword related to a search term, wherein the apparatus comprises:
a first obtaining device, configured to obtain, according to search log information in a search engine, a history presentation result highly correlated with a search term (query), wherein the history presentation result comprises a history search result for the query and/or an auxiliary presentation result related to the query, presented in a history presentation page;
a generating device, configured to aggregate information in pages corresponding to the history search result and/or aggregate content presented by the auxiliary presentation result, to obtain at least one aggregated result corresponding to the query;
a first extracting device, configured to extract a keyword related to the query from the at least one aggregated result;
wherein the auxiliary presentation result related to the query denotes those presentation result items in the history presentation page, other than the history search result, that do not contain the query but are related to the query; and if the history presentation result comprises a history search result for the query presented in the history presentation page, the apparatus further comprises:
a construction device, configured to construct a plurality of <query, url> pairs according to the query and a plurality of urls corresponding to the historical search results;
a first aggregation device, configured to aggregate, according to the plurality of <query, url> pairs, information in pages corresponding to the historical search results;
wherein the first extracting device comprises:
a second extracting device, configured to extract a plurality of basic entries from the at least one aggregated result;
a calculating device, configured to calculate, for each of the plurality of basic entries, a weight of the basic entry according to characteristics of the basic entry;
a third extracting device, configured to extract keywords related to the query from the plurality of basic entries according to the calculated weights respectively corresponding to the plurality of basic entries, and to store the query and the keywords related to the query in a database, so that keywords related to a search term input by a user can be looked up when the user searches.
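The construction and first aggregation devices described above can be sketched in Python as follows. The de-duplication of urls, the `page_text` mapping standing in for fetched page content, and simple concatenation as the aggregation step are assumptions for this example; the claim does not say how page information is retrieved or merged.

```python
def build_pairs(query, urls):
    """Construct <query, url> pairs, dropping repeated urls but keeping order."""
    return [(query, url) for url in dict.fromkeys(urls)]


def aggregate_pages(pairs, page_text):
    """Concatenate the text of every page referenced by the pairs.

    `page_text` is a hypothetical mapping from url to already-extracted text;
    urls without stored text are skipped.
    """
    parts = [page_text[url] for _, url in pairs if url in page_text]
    return "\n".join(parts)


pairs = build_pairs("q", ["u1", "u2", "u1"])
aggregated = aggregate_pages(pairs, {"u1": "alpha", "u2": "beta"})
```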
9. The apparatus of claim 8, wherein the history presentation result includes an auxiliary presentation result related to the query and presented in a history presentation page, and the generating device comprises:
a second aggregation device, configured to aggregate all of the content presented in the auxiliary presentation result.
10. The apparatus of claim 8, wherein the second extracting device is configured to:
for each aggregated result in the at least one aggregated result, perform word segmentation on the aggregated result to obtain a plurality of basic entries corresponding to that aggregated result.
11. The apparatus of claim 8 or 10, wherein the characteristics of the basic entry comprise at least one of:
- the part of speech/importance level of the basic entry;
- the TF-IDF characteristics of the basic entry in the aggregated result;
- the user behavior characteristics corresponding to the presentation result item in which the basic entry is located;
- the occurrence of the basic entry in the query.
12. The apparatus of claim 8, wherein the first obtaining device comprises:
a second obtaining device, configured to obtain the history presentation result highly correlated with the query according to the search log information in the search engine in combination with predetermined index information.
13. The apparatus according to claim 12, wherein the predetermined index indicated by the predetermined index information includes at least one of:
- historical exposure volume;
- historical presentation position;
- historical click volume;
- historical click time distribution.
14. The apparatus of claim 8, wherein the apparatus further comprises:
a lookup device, configured to look up, according to a search term input by a user, keywords related to the search term input by the user;
a search device, configured to initiate a search based on the search term input by the user and the keywords related to that search term, and to provide a search result to the user.
CN201710138638.5A 2017-03-09 2017-03-09 Method and device for mining keywords related to search terms Active CN108572971B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710138638.5A CN108572971B (en) 2017-03-09 2017-03-09 Method and device for mining keywords related to search terms

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710138638.5A CN108572971B (en) 2017-03-09 2017-03-09 Method and device for mining keywords related to search terms

Publications (2)

Publication Number Publication Date
CN108572971A CN108572971A (en) 2018-09-25
CN108572971B CN108572971B (en) 2022-11-01

Family

ID=63578090

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710138638.5A Active CN108572971B (en) 2017-03-09 2017-03-09 Method and device for mining keywords related to search terms

Country Status (1)

Country Link
CN (1) CN108572971B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299383B (en) * 2018-11-02 2021-11-05 北京字节跳动网络技术有限公司 Method and device for generating recommended word, electronic equipment and storage medium
CN109740075B (en) * 2018-12-13 2020-12-01 北京百度网讯科技有限公司 Event correlation calculation method, device, equipment and storage medium
CN109740161B (en) * 2019-01-08 2023-06-20 北京百度网讯科技有限公司 Data generalization method, device, equipment and medium
CN111222918B (en) * 2020-01-04 2023-06-30 厦门二五八网络科技集团股份有限公司 Keyword mining method and device, electronic equipment and storage medium
WO2021232292A1 (en) * 2020-05-20 2021-11-25 深圳市欢太科技有限公司 Log data processing method and related product
CN111782962B (en) * 2020-09-04 2021-01-12 浙江口碑网络技术有限公司 Pattern matching method and device and electronic equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544186B (en) * 2012-07-16 2017-03-01 富士通株式会社 The method and apparatus excavating the subject key words in picture
CN103942279B (en) * 2014-04-01 2018-07-10 百度(中国)有限公司 Search result shows method and apparatus
CN104063454A (en) * 2014-06-24 2014-09-24 北京奇虎科技有限公司 Search push method and device for mining user demands
CN104933100B (en) * 2015-05-28 2018-05-04 北京奇艺世纪科技有限公司 keyword recommendation method and device

Also Published As

Publication number Publication date
CN108572971A (en) 2018-09-25

Similar Documents

Publication Publication Date Title
CN108572971B (en) Method and device for mining keywords related to search terms
CN109815308B (en) Method and device for determining intention recognition model and method and device for searching intention recognition
US9171078B2 (en) Automatic recommendation of vertical search engines
JP5721818B2 (en) Use of model information group in search
US10997184B2 (en) System and method for ranking search results
US9934293B2 (en) Generating search results
CN105045901A (en) Search keyword push method and device
CN103678576A (en) Full-text retrieval system based on dynamic semantic analysis
CN103136228A (en) Image search method and image search device
CN105653562A (en) Calculation method and apparatus for correlation between text content and query request
CN111639255B (en) Recommendation method and device for search keywords, storage medium and electronic equipment
JP7451747B2 (en) Methods, devices, equipment and computer readable storage media for searching content
CN104268175A (en) Data search device and method thereof
WO2020132623A1 (en) Ranking image search results using machine learning models
CN103309869A (en) Method and system for recommending display keyword of data object
CN106407316B (en) Software question and answer recommendation method and device based on topic model
CN105468649A (en) Method and apparatus for determining matching of to-be-displayed object
CN105653553B (en) Word weight generation method and device
US9064014B2 (en) Information provisioning device, information provisioning method, program, and information recording medium
JP2013054606A (en) Document retrieval device, method and program
KR20120038418A (en) Searching methods and devices
CN116383340A (en) Information searching method, device, electronic equipment and storage medium
CN111144122A (en) Evaluation processing method, evaluation processing device, computer system, and medium
CN113392329A (en) Content recommendation method and device, electronic equipment and storage medium
JP6589036B1 (en) Failure sign detection system and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant