CN116975405A - Search word processing method, apparatus, device, storage medium and program product - Google Patents

Info

Publication number
CN116975405A
Authority
CN
China
Prior art keywords
search
words
quality
word
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310412688.3A
Other languages
Chinese (zh)
Inventor
陈裕通
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202310412688.3A
Publication of CN116975405A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Abstract

The application relates to a search word processing method, apparatus, computer device, storage medium and program product. The method relates to natural language processing technology in artificial intelligence and can be applied to the fields of search and recommendation. The method comprises: determining a plurality of candidate search words corresponding to a search text; querying the quality score of each candidate search word, where the quality score of a search word is positively correlated with the click rate and the richness of that search word's search result page; determining low-quality search words from the plurality of candidate search words based on the quality scores; querying high-quality search words that are semantically related to the low-quality search words; and replacing the low-quality search words among the plurality of candidate search words with the corresponding high-quality search words to obtain a plurality of recommended search words corresponding to the search text. This method increases the probability that a user clicks a recommended search word and finds the required information in the corresponding search result page, thereby improving the user's search experience.

Description

Search word processing method, apparatus, device, storage medium and program product
Technical Field
The present application relates to the field of computer technology, and in particular, to a search term processing method, apparatus, computer device, storage medium, and computer program product.
Background
With the development of computer technology and internet technology, information searching by means of various platforms becomes an indispensable information collection channel for daily work and life of people. For example, people may search for daily information using a general search system, search for merchandise information at an e-commerce platform, search for popular videos at a video platform, and so forth.
In order to improve the searching efficiency and the searching experience of the user, a searching system generally recommends some search words based on the search text input by the user, and the search words may be related to the searching intention of the user, so that a searching prompt effect can be played for the user, and the user can conveniently and quickly find out the wanted information.
However, in the related art, even after the user clicks a search word recommended by the search system, the user may still fail to find the required information in the search result page, which results in a poor search experience.
Disclosure of Invention
Based on the foregoing, it is necessary to provide a method, an apparatus, a computer device, a computer readable storage medium and a computer program product for processing search words, where the determined recommended search words improve the probability that a user clicks and finds required information in a corresponding search result page, so as to improve the search experience of the user.
In a first aspect, the present application provides a method for processing search terms. The method comprises the following steps:
acquiring a search text input in a search interface;
acquiring a plurality of candidate search words corresponding to the search text;
querying a quality score of each candidate search term; the quality score of the search word is positively correlated with the click rate of the search result page and the richness of the search result page of the search word;
screening low-quality search words from the plurality of candidate search words according to the quality scores;
querying high-quality search words with semantic correlation with the low-quality search words;
replacing low-quality search words in the plurality of candidate search words with corresponding high-quality search words to obtain a plurality of recommended search words corresponding to the search text;
presenting the plurality of recommended search terms in the search interface.
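The screening and replacement steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `recommend_search_words`, the dictionary of precomputed quality scores, the `find_replacement` callback, and the choice to keep a candidate unchanged when it has no stored score or no replacement are all hypothetical.

```python
from typing import Callable, Dict, List, Optional


def recommend_search_words(
    candidates: List[str],
    quality_score: Dict[str, float],
    find_replacement: Callable[[str], Optional[str]],
    low_quality_threshold: float,
) -> List[str]:
    """Replace low-quality candidates with semantically related high-quality words."""
    recommended: List[str] = []
    for word in candidates:
        score = quality_score.get(word)
        # Assumption: a candidate without a stored score is kept unchanged.
        is_low_quality = score is not None and score < low_quality_threshold
        replacement = find_replacement(word) if is_low_quality else None
        # Assumption: if no related high-quality word exists, keep the original.
        recommended.append(replacement if replacement is not None else word)
    return recommended
```

In this sketch the replacement lookup is passed in as a callback, so the way related high-quality words are found (for example, through clusters of semantically related search words) stays independent of the screening logic.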
In a second aspect, the application further provides a search word processing device. The device comprises:
the candidate search word determining module is used for acquiring search text input in a search interface; acquiring a plurality of candidate search words corresponding to the search text;
the query module is used for querying the quality score of each candidate search term; the quality score of the search word is positively correlated with the click rate of the search result page and the richness of the search result page of the search word;
the low-quality search word screening module is used for screening low-quality search words from the candidate search words according to the quality scores;
the query module is further used for querying high-quality search words with semantic correlation with the low-quality search words;
and the search word replacement module is used for replacing low-quality search words in the plurality of candidate search words with corresponding high-quality search words to obtain a plurality of recommended search words corresponding to the search text, and presenting the plurality of recommended search words in the search interface.
In one embodiment, the search term processing apparatus further includes:
the quality score determining module is used for determining the click rate of the search result page and the richness of the search result page of each search word in the search word bank; and determining the quality score of each search word according to the click rate of the search result page and the richness of the search result page.
In one embodiment, the quality score determination module further comprises:
the search result page click rate statistics unit is used for counting the search times of the search words and the click times of the search result pages of the search words; and determining the click rate of the search result page of the search word according to the ratio of the click times to the search times.
In one embodiment, the quality score determination module further comprises:
the search result page richness statistics unit is used for counting the quantity of various types of contents in the search result page of the search word; and determining the richness of the search result page of the search word according to the quantity of the various types of content.
In one embodiment, the search result page richness statistics unit is further configured to obtain weights of various types of content; according to the weights of the various types of contents, carrying out weighted summation on the quantity of the various types of contents in the first page of the search result page of the search word to obtain weighted scores; and determining the richness of the search result page of the search word according to the weighted score.
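The click rate and richness computations described in the embodiments above can be sketched as follows. The click rate as a ratio of clicks to searches and the richness as a weighted sum of per-type content counts follow the text; the function names, the content-type labels, treating the weighted score directly as the richness, and the linear combination in `quality_score` (with the hypothetical parameter `alpha`) are assumptions, since the text only requires the quality score to be positively correlated with both quantities.

```python
from typing import Dict


def result_page_click_rate(search_count: int, click_count: int) -> float:
    """Ratio of result-page clicks to searches; 0.0 if the word was never searched."""
    return click_count / search_count if search_count > 0 else 0.0


def result_page_richness(type_counts: Dict[str, int], type_weights: Dict[str, float]) -> float:
    """Weighted sum of the number of items of each content type on the first result page."""
    # Assumption: content types without a configured weight contribute nothing.
    return sum(type_weights.get(t, 0.0) * n for t, n in type_counts.items())


def quality_score(click_rate: float, richness: float, alpha: float = 0.5) -> float:
    """Hypothetical combination that increases with both inputs for 0 < alpha < 1."""
    return alpha * click_rate + (1.0 - alpha) * richness
```

A production system would probably normalize the richness before combining it with the click rate, because the two quantities are on different scales; that step is omitted here.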
In one embodiment, the low quality search word screening module is further configured to screen candidate search words with corresponding quality scores lower than a first threshold value from the plurality of candidate search words as low quality search words according to the quality scores.
In one embodiment, the search term processing apparatus further includes:
the clustering module is used for extracting a semantic vector representation of each search word in the search word bank; clustering the search words in the search word bank according to the semantic vector representations to obtain a plurality of clusters; wherein each cluster has a cluster center, and the similarity between the semantic vector representation of a search word and the center of the cluster it belongs to is greater than its similarity to the centers of the other clusters; search words in the same cluster are semantically related to each other; and storing the cluster identifier of each cluster in correspondence with the search words included in that cluster.
In one embodiment, the query module is further configured to query a cluster identifier corresponding to a target cluster in which the low-quality search term is located; determining search words in a target cluster corresponding to the cluster identifier; inquiring the quality scores of the search words in the target cluster; and taking the search word with the highest corresponding quality score in the target cluster as a high-quality search word with semantic correlation with the low-quality search word.
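The lookup described in this embodiment can be sketched as follows, assuming the cluster assignments and quality scores have already been computed and stored. The in-memory dictionaries and the function name `best_word_in_cluster` are hypothetical stand-ins for whatever storage the search system actually uses.

```python
from typing import Dict, List, Optional


def best_word_in_cluster(
    low_quality_word: str,
    cluster_of: Dict[str, int],
    members: Dict[int, List[str]],
    quality_score: Dict[str, float],
) -> Optional[str]:
    """Return the highest-scoring search word in the cluster of the given word."""
    cluster_id = cluster_of.get(low_quality_word)
    if cluster_id is None:
        return None  # assumption: a word without a cluster has no replacement
    candidates = members.get(cluster_id, [])
    if not candidates:
        return None
    # Assumption: words without a stored score rank below all scored words.
    return max(candidates, key=lambda w: quality_score.get(w, float("-inf")))
```

Note that the highest-scoring word may be the low-quality word itself when it is alone in its cluster, in which case the replacement leaves the candidate unchanged.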
In one embodiment, the clustering module is further configured to take the semantic vector representation of the first search word in the search word bank as the first cluster center; traverse the search words in the search word bank, and calculate the similarity between the semantic vector representation of the traversed search word and each existing cluster center; if the maximum of these similarities is greater than or equal to a second threshold, add the traversed search word to the cluster whose center corresponds to the maximum similarity; if the maximum of these similarities is smaller than the second threshold, take the semantic vector representation of the traversed search word as a new cluster center; and obtain a plurality of clusters once all search words in the search word bank have been traversed.
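The single-pass clustering procedure described in this embodiment can be sketched as follows. The traversal and threshold logic follow the text; the use of cosine similarity is an assumption (the text only says "similarity"), and the names `cosine` and `single_pass_cluster` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def single_pass_cluster(
    words: Sequence[str],
    vectors: Sequence[Sequence[float]],
    threshold: float,
) -> Dict[int, List[str]]:
    """Assign each word to the most similar existing center, or open a new cluster."""
    centers: List[Sequence[float]] = []
    clusters: Dict[int, List[str]] = {}
    for word, vec in zip(words, vectors):
        sims = [cosine(vec, center) for center in centers]
        best = max(range(len(sims)), key=sims.__getitem__) if sims else -1
        if best >= 0 and sims[best] >= threshold:
            clusters[best].append(word)
        else:
            # The first word, and any word far from all centers, becomes a new center.
            centers.append(vec)
            clusters[len(centers) - 1] = [word]
    return clusters
```

As in the text, a cluster center is fixed to the vector of the word that opened the cluster and is not updated as members are added, so the result depends on the traversal order.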
In one embodiment, the clustering module is further configured to perform semantic vector extraction on each search word in the search word bank through a preset semantic vector representation model, so as to obtain semantic vector representation of each search word in the search word bank.
In one embodiment, the candidate search term determining module is further configured to determine a plurality of search terms prefixed by the search text in a search term library; calculating semantic similarity between each search word prefixed by the search text and the search text; acquiring the heat of each search word prefixed by the search text; and screening a plurality of candidate search words from the plurality of search words according to the semantic similarity and the heat.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring a search text input in a search interface;
acquiring a plurality of candidate search words corresponding to the search text;
querying a quality score of each candidate search term; the quality score of the search word is positively correlated with the click rate of the search result page and the richness of the search result page of the search word;
screening low-quality search words from the plurality of candidate search words according to the quality scores;
querying high-quality search words with semantic correlation with the low-quality search words;
replacing low-quality search words in the plurality of candidate search words with corresponding high-quality search words to obtain a plurality of recommended search words corresponding to the search text;
presenting the plurality of recommended search terms in the search interface.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring a search text input in a search interface;
acquiring a plurality of candidate search words corresponding to the search text;
querying a quality score of each candidate search term; the quality score of the search word is positively correlated with the click rate of the search result page and the richness of the search result page of the search word;
screening low-quality search words from the plurality of candidate search words according to the quality scores;
querying high-quality search words with semantic correlation with the low-quality search words;
replacing low-quality search words in the plurality of candidate search words with corresponding high-quality search words to obtain a plurality of recommended search words corresponding to the search text;
presenting the plurality of recommended search terms in the search interface.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
acquiring a search text input in a search interface;
acquiring a plurality of candidate search words corresponding to the search text;
querying a quality score of each candidate search term; the quality score of the search word is positively correlated with the click rate of the search result page and the richness of the search result page of the search word;
screening low-quality search words from the plurality of candidate search words according to the quality scores;
querying high-quality search words with semantic correlation with the low-quality search words;
replacing low-quality search words in the plurality of candidate search words with corresponding high-quality search words to obtain a plurality of recommended search words corresponding to the search text;
presenting the plurality of recommended search terms in the search interface.
According to the above search word processing method, apparatus, computer device, storage medium and computer program product, after the search text input in the search interface is acquired and a plurality of candidate search words corresponding to the search text are acquired, the quality score of each candidate search word is queried, and low-quality search words are screened out from the plurality of candidate search words according to the quality scores. Because the quality score of a search word is positively correlated with the click rate and the richness of its search result page, replacing low-quality search words with higher-quality search words prevents the user from clicking a low-quality search word, entering a search result page with a poor conversion effect, and failing to find the required information. This improves the user's search experience and also improves related metrics of the search system, such as the search result page click rate. Moreover, the high-quality search words used for replacement are semantically related to the low-quality search words they replace, so the candidate search words are not replaced with search words irrelevant to the user's search intention and no semantic deviation is introduced, which further improves the user's search experience.
Drawings
FIG. 1 is a diagram of an application environment for a search term processing method in one embodiment;
FIG. 2 is a flow diagram of a method of search term processing in one embodiment;
FIG. 3 is a schematic diagram of recommended search terms in one embodiment;
FIG. 4 is a flowchart illustrating steps for clustering search terms in one embodiment;
FIG. 5 is a flow diagram of a server processing search terms in a search term library in one embodiment;
FIG. 6 is a flow diagram of a search response in one embodiment;
FIG. 7 is a flow chart of a method of processing search terms in one embodiment;
FIG. 8 is a block diagram of a search term processing apparatus in one embodiment;
FIG. 9 is a block diagram of a search term processing apparatus in another embodiment;
FIG. 10A is an internal block diagram of a computer device in one embodiment;
FIG. 10B is an internal structural diagram of a computer device in another embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The application provides a search word processing method, which relates to an artificial intelligence (Artificial Intelligence, AI) technology, wherein the artificial intelligence is a theory, a method, a technology and an application system which simulate, extend and expand human intelligence by using a digital computer or a machine controlled by the digital computer, sense environment, acquire knowledge and acquire an optimal result by using the knowledge. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, automatic driving, intelligent traffic and other directions.
The search word processing method provided by the embodiments of the application relates to natural language processing (Natural Language Processing, NLP) technology. Natural language processing is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between people and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
Embodiments of the present application relate to the following concepts:
Clustering: partitioning a data set into different classes or clusters so that the similarity of data objects within the same cluster is as large as possible, while the difference between data objects in different clusters is as large as possible. In the application, clustering is used to group search words with similar semantics into the same cluster and to separate search words with dissimilar semantics into different clusters.
Search result page: the page that is navigated to after a search word is submitted.
Conversion rate: the probability that a user click event occurs in the search result page entered by clicking a recommended search word, that is, the click rate of the search result page.
Correlation: the semantic relatedness between two search words.
The search word processing method provided by the embodiment of the application can be applied to an application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process, such as search terms, clustering results of search terms, quality scoring results of search terms, and so forth. The data storage system may be integrated on the server 104 or may be located on the cloud or other servers. In one embodiment, the server 104 may obtain a search text input in a search interface, obtain a plurality of candidate search words corresponding to the search text, and then query quality scores of the candidate search words, where the quality scores of the search words are positively correlated with a click rate of a search result page of the search word and a richness of the search result page, and then the server 104 screens out low-quality search words from the plurality of candidate search words according to the quality scores, queries high-quality search words having semantic correlation with the low-quality search words, and finally replaces the low-quality search words in the plurality of candidate search words with corresponding high-quality search words to obtain a plurality of recommended search words corresponding to the search text, and presents the plurality of recommended search words in the search interface.
In a specific application scenario, the terminal 102 obtains, in real time, the search text that the user inputs in a search box and sends it to the server 104 (a server providing a search service). For example, when the user inputs "Shenzhen", the terminal obtains "Shenzhen" and sends it to the server; when the user continues typing to "Shenzhen fit", the terminal obtains "Shenzhen fit" and sends it to the server; when the user continues typing to "Shenzhen fit to see sea", the terminal obtains "Shenzhen fit to see sea" and sends it to the server. Each time the server receives a search text from the terminal, it processes the search text according to the search word processing steps above and feeds back a plurality of recommended search words corresponding to the search text to the terminal. The terminal presents the recommended search words to the user in the search interface. The user can select one of the recommended search words to submit it, and the terminal then presents the search result page corresponding to that recommended search word.
Of course, the above-mentioned search word processing method may also be executed by the terminal 102 alone, for example, the terminal 102 obtains a search text input in a search interface, and obtains a plurality of candidate search words corresponding to the search text; querying quality scores of the candidate search terms; the quality score of the search word is positively correlated with the click rate of the search result page and the richness of the search result page of the search word; screening low-quality search words from the candidate search words according to the quality scores; querying high-quality search words with semantic correlation relations with the low-quality search words; and replacing low-quality search words in the plurality of candidate search words with corresponding high-quality search words to obtain a plurality of recommended search words corresponding to the search text, and presenting the plurality of recommended search words in a search interface.
The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, where the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.
In the related art, after candidate search words related to a search text are recalled, they are ranked by popularity (which can be measured by data such as search frequency and click frequency), recommended search words are determined from the ranking result and returned to the terminal, and the terminal displays the list of recommended search words. However, this approach does not consider the conversion effect of the search result page (for example, the search result page click rate). A recommended search word may therefore have a high click rate, that is, high popularity, while its search result page has a low click rate. In other words, the search service recommends a search word that matches what the user intends to search for, but after clicking it the user cannot find the required information in its search result page, which degrades the overall search experience.
In the embodiments of the application, the quality scores of the recalled candidate search words are queried, and low-quality search words are screened out from the candidate search words according to the quality scores. Because the quality score of a search word is positively correlated with the click rate and the richness of its search result page, replacing low-quality search words with higher-quality search words prevents the user from clicking a low-quality search word, entering a search result page with a poor conversion effect, and failing to find the required information. This improves the user's search experience and also improves related metrics of the search system, such as the search result page click rate. Moreover, the high-quality search words used for replacement are semantically related to the low-quality search words they replace, so the candidate search words are not replaced with search words irrelevant to the user's search intention and no semantic deviation is introduced, which further improves the user's search experience.
In one embodiment, as shown in fig. 2, a search term processing method is provided, which is illustrated by using the method applied to the computer device (such as the terminal 102 or the server 104) in fig. 1 as an example, and includes the following steps:
Step 202, obtaining a search text input in a search interface, and obtaining a plurality of candidate search words corresponding to the search text.
The search text is text used to search for information; for example, it may be text for searching for videos, for commodity information, or for general information. Candidate search words are the search words that the computer device recalls after understanding the search text. Candidate search words may be selected from historical search words stored by the search system, or may be generated automatically by the search system. Candidate search words may be recalled by prefix recall, pinyin prefix recall, simple spelling recall, word granularity recall, deep-learning-based semantic recall, and the like; any recall mode may be used.
In one embodiment, step 202 may include: determining a plurality of search words taking a search text as a prefix in a search word library; calculating semantic similarity between each search word prefixed by the search text and the search text; acquiring the heat of each search word prefixed by a search text; and screening a plurality of candidate search words from the plurality of search words according to the semantic similarity and the heat.
Specifically, the computer device, after obtaining the search text input in the search interface, retrieves historical search words prefixed by the search text from the search word stock, and calculates semantic similarity between the historical search words and the search text. For example, the computer device may calculate semantic similarity of the search text to each of the retrieved historical search terms using a neural network model based on natural language processing. In addition, the computer device also obtains the heat degree of each searched historical search word, and the heat degree can be counted according to the historical search frequency, for example, the search frequency in the past period of time. The computer device then screens a plurality of candidate search terms from the retrieved historical search terms based on the two indicators of semantic similarity and heat.
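The prefix recall and screening described above can be sketched as follows. The text states only that candidates are screened using semantic similarity and heat; ranking by a weighted sum of the two, normalizing heat by its maximum, and the names `select_candidates`, `sim_weight` and `top_k` are assumptions made for illustration. The `similarity` callback stands in for the neural network model mentioned in the text.

```python
from typing import Callable, Dict, List, Sequence


def select_candidates(
    search_text: str,
    word_bank: Sequence[str],
    similarity: Callable[[str, str], float],
    heat: Dict[str, int],
    top_k: int = 10,
    sim_weight: float = 0.5,
) -> List[str]:
    """Recall words prefixed by the search text, then rank by similarity and heat."""
    prefixed = [w for w in word_bank if w.startswith(search_text)]
    max_heat = max((heat.get(w, 0) for w in prefixed), default=0)

    def combined(word: str) -> float:
        # Assumption: heat is a search count, scaled to [0, 1] by the maximum.
        norm_heat = heat.get(word, 0) / max_heat if max_heat > 0 else 0.0
        return sim_weight * similarity(search_text, word) + (1.0 - sim_weight) * norm_heat

    return sorted(prefixed, key=combined, reverse=True)[:top_k]
```

Scanning the whole word bank for prefix matches is linear in its size; a real search system would more likely use a trie or an index for the prefix lookup.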
Step 204, inquiring the quality score of each candidate search term; the quality score of the search word is positively correlated with the click rate of the search result page and the richness of the search result page of the search word.
The quality score evaluates how good or bad a candidate search term is along multiple dimensions, including the search result page click rate and the search result page richness of the search term. Other dimensions may of course be added according to actual needs.
For example, the evaluation dimensions of the quality score may also include the search heat of the search term. In some cases the evaluation dimensions also depend on the application scenario of the search service, and different scenarios may require different dimensions. For example, in a search system that includes advertisement search, a higher quality score may be given to search terms that yield advertising revenue; in a search system that includes commodity information search, a higher quality score may be given to search terms whose commodities sell in large volume. In a search system that includes medical information search, the quality score may also be determined according to search intent, i.e., search terms with different intents may use different quality score calculations: if the intent of the search text is to find a hospital, hospital-related search terms may score higher than non-hospital ones, and if the intent is to find a doctor, doctor-related search terms may score higher than non-doctor ones.
Optionally, the computer device may also consider whether the search term contains a misspelled word, the length of the search term, whether the search term contains a sensitive word, and so on. It will be appreciated that a search term containing a misspelled word is usually not clicked by users, so its quality score should be relatively low; the longer a search term is, the harder it generally is to find the information the user wants, so its quality score should be relatively low; and a search term containing sensitive words should likewise receive a relatively low quality score.
The search result page click rate of a search word reflects the quality of the search word to a certain extent. It may be represented by the ratio, over a period of time, of the number of searches of the search word that led to a click on its search result page to the total number of searches (or by a value positively correlated with that ratio); the higher the click rate, the higher the quality score of the search word. Specifically, the terminal submits the search word to the computer device, the computer device returns the search result page of the search word, and the terminal displays it. The user may click on the page to view information because the page contains the needed information, or may trigger no click event because it does not.
In one embodiment, the step of determining a search result page click rate for the search term comprises: counting the searching times of the search words and the clicking times of the search result pages of the search words; and determining the click rate of the search result page of the search word according to the ratio of the click times to the search times.
Specifically, the computer device may count the number of times each search term was searched over a period of time (e.g., one week, half a month, one month, or three months) and, among those searches, the number of times a click event was triggered on the search result page that was jumped to, and determine the search result page click rate of the search term from the ratio of the latter to the former. For example, if the search term "Shenzhen suitable for seeing sea" was selected 2000 times in the past month and in 1500 of those cases the user went on to click in the corresponding search result page, the search result page click rate of that search term is 75%.
Optionally, if the candidate search term does not have a corresponding search record in the past period of time, the computer device may set the corresponding search result page click rate to a preset value, such as 0.5.
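The click-rate calculation, including the preset fallback for search words with no recent search record, can be sketched as below. The function and parameter names are illustrative; only the ratio and the example preset value 0.5 come from the text.

```python
DEFAULT_CLICK_RATE = 0.5  # preset value used when there is no search record

def result_page_click_rate(
    search_count: int,
    clicked_search_count: int,
    default: float = DEFAULT_CLICK_RATE,
) -> float:
    """Ratio of searches that led to a click on the result page to all searches."""
    if search_count <= 0:
        # No search record in the statistics window: fall back to the preset value.
        return default
    return clicked_search_count / search_count
```

With the figures from the example above, `result_page_click_rate(2000, 1500)` gives 0.75.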
The richness of the search result page reflects the quality of the search words to a certain extent, the richness of the search result page can be measured by the quantity of various types of contents in the search result page, and the higher the richness of the search result page is, the higher the quality score of the search words is. The types of content contained in the search results page may include text, video, questions and answers, pictures, and so forth.
In one embodiment, the step of determining the richness of the search results page of the search term includes: counting the quantity of various types of contents in a search result page of the search word; and determining the richness of the search result page of the search word according to the quantity of the various types of content.
Specifically, the computer device may invoke the search interface to obtain the content data of the search result page of each search term, extract from the content data the quantity of each type of content in the search result page, and determine the search result page richness of the search term from those quantities.
In one embodiment, determining the search result page richness of the search term based on the number of types of content includes: acquiring weights of various types of content; according to the weights of the various types of contents, carrying out weighted summation on the quantity of the various types of contents in the first page of the search result page of the search word to obtain weighted scores; and determining the richness of the search result page of the search word according to the weighted score.
Optionally, if the quantity of a certain type of content exceeds a threshold, the threshold is taken as the quantity of that type. Optionally, the computer device may count only the quantity of each type of content on the first page of the search results and use the richness of the first page as the search result page richness. Optionally, the weight of each type may be set according to actual requirements; it will be understood that content types more likely to be clicked by users should be given higher weights and those less likely to be clicked relatively lower weights. For example, the weight of image-text content is 0.2, the weight of video is 0.3, the weight of question-and-answer content is 0.3, and the weight of pictures is 0.2.
Optionally, the computer device may calculate the search result page richness $S_{result}$ according to the following formula:
subject to $\sum_{i \in C} w_i = 1$;
where $C$ is the set of content types in the search result page (such as image-text content, video, question and answer, and picture), $|C|$ is the number of types, $n_i$ is the quantity of the $i$-th type of content in the content data of the search result page, $w_i$ is the weight of the $i$-th type, and $K$ is a hyperparameter that limits the maximum score of each type and may be set to 10.
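As an illustration of the weighted, capped sum described above, the following sketch assumes the form $\sum_{i \in C} w_i \cdot \min(n_i, K) / K$. The division by $K$, which keeps the result in [0, 1], is an assumption of this example and is not taken from the text; the type names are likewise illustrative.

```python
def result_page_richness(
    counts: dict[str, int],      # content type -> quantity on the result page
    weights: dict[str, float],   # content type -> weight, expected to sum to 1
    k: int = 10,                 # cap on the quantity counted for each type
) -> float:
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Assumed form: cap each quantity at k, scale by 1/k, then take the weighted sum.
    return sum(w * min(counts.get(t, 0), k) / k for t, w in weights.items())
```

For example, with weights 0.2, 0.3, 0.3, 0.2 for image-text, video, question-and-answer, and picture content, a page with 10 image-text items, 5 videos, no question-and-answer items, and 20 pictures scores 0.2 + 0.15 + 0 + 0.2 = 0.55 under this assumed form.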
In one embodiment, a computer device may determine a search result page click rate and a search result page richness for each search term in a search thesaurus; and determining the quality score of each search word according to the click rate of the search result page and the richness of the search result page.
For example, the computer device may obtain weights for the search result page click rate and the search result page richness, which may be empirically preset values such as 0.6 and 0.4 respectively, and compute a weighted sum of the two according to these weights to obtain the quality score of the search word.
As mentioned above, the evaluation dimensions of the quality score may also include other dimensions, and accordingly, the computer device may weight and sum the evaluation metrics for each dimension according to the weights of the respective evaluation dimensions to obtain the quality score of the search term.
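The weighted summation over evaluation dimensions can be sketched as follows. The dimension names are illustrative; the weights 0.6 and 0.4 are the example values mentioned above.

```python
def quality_score(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of the evaluation metrics, one weight per evaluation dimension."""
    return sum(weight * metrics[dim] for dim, weight in weights.items())

# Example weights from the text: click rate 0.6, richness 0.4.
EXAMPLE_WEIGHTS = {"click_rate": 0.6, "richness": 0.4}
```

Adding a further dimension (for example search heat) only requires another entry in both dictionaries.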
Optionally, after deriving a quality score for each search term, the computer device may store the search term in correspondence with the respective quality score. For example, the storage may be performed by the following data structure:
< word, score >, i.e., search term-quality score.
Thus, the quality score of each search word may be stored in a storage system or database of the computer device, and the computer device may look up the quality score of each candidate search word there. The computer device therefore obtains the quality scores by query alone, without heavy computation, which gives a fast response.
Step 206, screening low-quality search words from the candidate search words according to the quality scores.
The computer device may compare the quality score of each candidate search term with a set threshold and treat candidate search terms whose quality score is below the threshold as low-quality search terms. The set threshold is a value preset empirically. Because the quality score of a search word is positively correlated with its search result page click rate and search result page richness, candidate search words with a poor click rate and poor richness can be identified among the candidates for the search text and subsequently replaced with higher-quality search words. This avoids the situation in which a user clicks a low-quality search word, lands on a search result page with a poor conversion effect, and cannot find the required information.
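The threshold comparison against stored quality scores can be sketched as below. A plain dictionary stands in for the storage system; the `default_score` used for search words with no stored score is an assumption, since the text does not say how such words are handled.

```python
def find_low_quality(
    candidates: list[str],
    score_store: dict[str, float],   # stands in for the <word, score> storage
    threshold: float,
    default_score: float = 0.0,      # assumption: unknown words count as low quality
) -> list[str]:
    """Return the candidates whose stored quality score is below the threshold."""
    return [w for w in candidates if score_store.get(w, default_score) < threshold]
```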
Step 208, query high quality search terms having semantic relevance to low quality search terms.
Two search words having a semantic correlation means that they are semantically similar. Search words having a semantic correlation are stored as one data object in a storage system or database of the computer device, so that the computer device can conveniently find, from a given search word, the search words semantically correlated with it. For each low-quality search word among the candidate search words, the computer device queries the search words semantically correlated with it, queries their quality scores, and selects the one with the highest quality score (or those ranked at the top); if the quality score of the selected search word is greater than a set threshold, that search word is the high-quality search word. Optionally, if the quality score of the selected search word is lower than the threshold, no high-quality search word semantically correlated with the low-quality search word exists, and the computer device may filter out the low-quality search word so that it is not used as a recommended search word. The threshold for screening low-quality search words and the threshold for screening high-quality search words may be the same or different, which is not limited in the embodiments of the present application.
Step 210, replacing the low-quality search word in the plurality of candidate search words with the corresponding high-quality search word to obtain a plurality of recommended search words corresponding to the search text, and presenting the plurality of recommended search words in the search interface.
Specifically, the computer device replaces each low-quality search word with the corresponding high-quality search word, which completes the further screening and processing of the candidate search words. The resulting search words are the recommended search words, which the computer device presents to the user as prompts so that the user can find the required information conveniently and quickly. Because the high-quality search word used for replacement is semantically correlated with the low-quality search word, a candidate search word is not replaced by a search word that drifts semantically away from the user's search intent, which further improves the user's search experience.
In one embodiment, the plurality of candidate search words corresponding to the search text may include several low-quality search words with similar semantics, which may all correspond to the same high-quality search word. After replacement, the number of recommended search words for the search text may then fall below the required target number. In this case the computer device may supplement a certain number of candidate search words from the search words recalled for the search text and continue screening and processing them according to the steps described above until the number of recommended search words reaches the target number, after which the computer device returns the recommended search words to the terminal for presentation to the user.
Referring to fig. 3 (a), in the related art, when a user inputs "finger", the recommended search terms include "finger straighten is curved", whose search result page has poor content quality and a low click rate. Referring to fig. 3 (b), with the search term processing method provided by the embodiment of the present application, the candidate search term "finger is straightened and bent" is replaced by "how the finger is straightened and how to get back" before the recommended search term list is presented to the user. Because the search result page of "how to get back when the finger is straightened" has relatively good content quality and a relatively high click rate, the quality of the search result pages of the terms in the recommended list improves, users are more likely to click on the search result page and find the required information sooner, and the user experience improves.
According to the search word processing method, after the search text entered in the search interface is acquired, a plurality of candidate search words corresponding to the search text are acquired, the quality score of each candidate search word is queried, and low-quality search words are screened from the candidates according to the quality scores. Because the quality score is positively correlated with the search result page click rate and the search result page richness, candidates with a poor click rate and poor richness can be identified and replaced with higher-quality search words. Moreover, the high-quality search word used for replacement is semantically correlated with the low-quality search word, so a candidate search word is not replaced by a search word that drifts semantically away from the user's search intent, which further improves the user's search experience.
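The replace, de-duplicate, and top-up behaviour described above can be sketched as follows. The callback `resolve` is a hypothetical helper introduced for this example: it returns the word itself if it is acceptable, its high-quality replacement if it is low quality, or `None` if the word should be dropped.

```python
from typing import Callable, Optional

def build_recommendations(
    candidates: list[str],
    backups: list[str],                         # further recalled words used for top-up
    resolve: Callable[[str], Optional[str]],    # hypothetical: keep, replace, or drop
    target: int,
) -> list[str]:
    recommended: list[str] = []
    seen: set[str] = set()
    # Process the initial candidates first, then the backups, until the target is met.
    for word in list(candidates) + list(backups):
        if len(recommended) >= target:
            break
        final = resolve(word)
        if final is None or final in seen:
            # Dropped word, or a duplicate produced by two words sharing a replacement.
            continue
        seen.add(final)
        recommended.append(final)
    return recommended
```

If the backups run out before the target is reached, the sketch simply returns fewer recommendations.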
In one embodiment, as shown in fig. 4, the method further includes a step of clustering the search terms, specifically including steps 402 to 406:
Step 402, extracting a semantic vector representation of each search term in the search term library.
The search words in the search word library may be search words stored by the computer device according to the historical search record, or may be search words generated by the computer device, for example, may be abstract phrases extracted by the computer device according to network popular events or popular information. The computer device may store the search terms into a search term library, and for each search term in the search term library, the computer device may extract a corresponding semantic vector representation.
The computer device may employ a word vector computing method to compute the semantic vector representation of each search word. Optionally, the computer device may obtain the semantic vector representation of each search word by a word2vec weighted average. Word2vec represents words as vectors whose distances reflect the semantic relevance of the words, i.e., it maps the words of a given vocabulary into a low-dimensional dense space in which semantically similar words are closer together and semantically different words are farther apart. Optionally, the computer device may also obtain the semantic vector representation of a search word from a model trained with SimCSE (Simple Contrastive Learning of Sentence Embeddings, a simple contrastive learning framework for sentence vector representation). Optionally, the computer device may also obtain the semantic vector representation of a search word through a self-trained sentence vector representation model (which may be based on Sentence-BERT or the like). The embodiment of the application does not limit the way in which the semantic vector representation of a search word is obtained.
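The weighted-average option can be sketched as below, assuming the search word has already been tokenized and that per-token word vectors are available. The token weights (for example IDF values) and the handling of out-of-vocabulary tokens are assumptions of this example; the text does not specify them.

```python
def weighted_average_vector(
    tokens: list[str],
    word_vectors: dict[str, list[float]],   # token -> pretrained word vector
    token_weights: dict[str, float],        # token -> weight (assumed, e.g. IDF)
) -> list[float]:
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for token in tokens:
        if token not in word_vectors:
            continue  # assumption: out-of-vocabulary tokens are skipped
        weight = token_weights.get(token, 1.0)
        for j, value in enumerate(word_vectors[token]):
            total[j] += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # no known tokens: zero vector
    return [value / weight_sum for value in total]
```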
In one embodiment, extracting a semantic vector representation of each search term in a search thesaurus includes:
extracting, through a preset semantic vector representation model, the semantic vector of each search word in the search word library, to obtain the semantic vector representation of each search word.
Step 404, clustering the search words in the search word library according to the semantic vector representations to obtain a plurality of clusters; each cluster has a cluster center, and the similarity between the semantic vector representation of a search word in a cluster and the cluster center of that cluster is greater than its similarity to the cluster centers of the other clusters; the search words in the same cluster are semantically correlated with one another.
The purpose of clustering is to aggregate semantically related search words into one data object (i.e., a cluster) and to place unrelated search words in different clusters, so that the search words semantically correlated with a given search word can be found conveniently. Specifically, the computer device may use any clustering method to cluster the search words according to the semantic vector representation of each search word in the search word library, thereby obtaining a plurality of clusters. The clustering method may be single-pass clustering, k-means clustering, or the like.
Step 406, storing the cluster identifier of each cluster in correspondence with the search words included in the cluster.
After obtaining the plurality of clusters, the computer device may store the cluster identifier of each cluster and the search words included in the cluster in a storage system or database, for example in a key-value database such as redis or memcache. Optionally, the computer device may store the following data structures:
< word, cluster_id >, search word-cluster identification of the cluster in which the word is located;
< cluster_id, word_list >, cluster identification of cluster-search term list;
That is, the computer device may find the cluster identifier of the cluster in which a search word is located according to the search word, and may query all the search words included in a cluster according to the cluster identifier of the cluster.
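The two mappings can be sketched with plain dictionaries standing in for the key-value database. The function name and the in-memory representation are illustrative assumptions; a real deployment would write these entries to the chosen store.

```python
def build_cluster_index(
    clusters: dict[str, list[str]],   # cluster_id -> search words in that cluster
) -> tuple[dict[str, str], dict[str, list[str]]]:
    # <word, cluster_id>: look up the cluster a search word belongs to.
    word_to_cluster = {
        word: cluster_id
        for cluster_id, words in clusters.items()
        for word in words
    }
    # <cluster_id, word_list>: look up all search words of a cluster.
    cluster_to_words = {cluster_id: list(words) for cluster_id, words in clusters.items()}
    return word_to_cluster, cluster_to_words
```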
In this embodiment, the search words are clustered in advance and the clustering result is stored, so that replacing a low-quality search word with a high-quality search word only requires querying a storage system or database and does not consume a large amount of computation. The high-quality search word can therefore be substituted quickly, which ensures response efficiency.
In one embodiment, step 208, querying high quality search terms that have semantic relevance to low quality search terms includes: inquiring cluster identifiers corresponding to target clusters where the low-quality search words are located; determining search words in a target cluster corresponding to the cluster identifier; inquiring the quality scores of the search words in the target cluster; and taking the search word with the highest corresponding quality score in the target cluster as a high-quality search word with semantic correlation with the low-quality search word.
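The cluster-based lookup can be sketched as below, with dictionaries standing in for the stored mappings. The `min_score` check follows the thresholding described earlier for high-quality search words; treating words without a stored score as 0.0 is an assumption of this example.

```python
from typing import Optional

def find_high_quality(
    low_quality_word: str,
    word_to_cluster: dict[str, str],
    cluster_to_words: dict[str, list[str]],
    scores: dict[str, float],
    min_score: float,
) -> Optional[str]:
    cluster_id = word_to_cluster.get(low_quality_word)
    if cluster_id is None:
        return None
    members = cluster_to_words.get(cluster_id, [])
    if not members:
        return None
    # Pick the member of the target cluster with the highest quality score.
    best = max(members, key=lambda w: scores.get(w, 0.0))
    # Only accept it if it clears the high-quality threshold.
    return best if scores.get(best, 0.0) > min_score else None
```

Returning `None` corresponds to the case in which no high-quality search word exists for the low-quality search word.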
In one embodiment, step 404, clustering the search words in the search word library according to the semantic vector representations to obtain a plurality of clusters, includes: taking the semantic vector representation of the first search word in the search word library as the first cluster center; traversing the search words in the search word library and calculating the similarity between the semantic vector representation of the traversed search word and each cluster center; if the maximum of the similarities is greater than or equal to a second threshold, adding the traversed search word to the cluster whose cluster center corresponds to the maximum similarity; if the maximum of the similarities is smaller than the second threshold, taking the semantic vector representation of the traversed search word as a newly added cluster center; and returning to the step of traversing the search words in the search word library until all search words in the search word library have been traversed, thereby obtaining a plurality of clusters.
Specifically, the computer device may take the semantic vector representation of the first search word in the search word library as the first cluster center, creating a cluster. The computer device then traverses the search words in the search word library, calculates the similarity between the semantic vector representation X of the next traversed search word and the cluster centers of all existing clusters (the similarity may be measured by cosine similarity), and finds the existing cluster with the maximum similarity to X. If that similarity is greater than or equal to a threshold θ, the semantic vector representation X of the search word is added to the cluster with the maximum similarity; if it is smaller than θ, X does not belong to any existing cluster, so a new cluster is created, X is assigned to the new cluster, and the cluster center of the new cluster is set to X. The traversal continues in this way until all search words have been traversed, at which point the clustering ends. In this clustering mode each search word needs to be traversed only once, so the calculation is very fast.
Based on the foregoing, before the computer device provides the search service for the terminal, the search words need to be scored and clustered, that is: a quality score is evaluated for every search word, all search words are clustered, and the clustering result and the quality scores are stored. FIG. 5 is a flow diagram that illustrates processing of the search words in a search word library by a computer device, in one embodiment. Referring to fig. 5, in order to improve the search experience of the user, the computer device needs to evaluate the quality of each search word in the search word library and also to cluster the search words by semantic relevance, forming clusters each composed of a plurality of semantically related search words. FIG. 6 is a flow diagram of a search response in one embodiment. After the preparation is completed, the computer device may provide a search service for a client with a search function on the terminal, specifically as follows: after a user inputs a search text, a plurality of candidate search words corresponding to the search text are recalled from the search word library; the quality scores of the candidate search words are queried; low-quality search words with low quality scores are detected among the candidates; the cluster in which each low-quality search word is located is queried, and a search word in that cluster whose quality score is higher than a set value is taken as the high-quality search word; each low-quality search word is replaced with the corresponding high-quality search word; and the recommended search word list is output and fed back to the terminal, which displays it to the user.
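The single-pass clustering described above can be sketched as follows. The cluster center stays fixed at the vector of the search word that created the cluster, as in the description; the function names and the dictionary input are illustrative.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def single_pass_cluster(vectors: dict[str, list[float]], theta: float) -> list[list[str]]:
    centers: list[list[float]] = []   # one fixed center per cluster
    clusters: list[list[str]] = []    # search words of each cluster
    for word, vec in vectors.items():
        if centers:
            sims = [cosine(vec, center) for center in centers]
            best = max(range(len(sims)), key=sims.__getitem__)
            if sims[best] >= theta:
                # Similar enough to an existing cluster: join it.
                clusters[best].append(word)
                continue
        # First word, or no cluster is similar enough: open a new cluster.
        centers.append(vec)
        clusters.append([word])
    return clusters
```

Because the first word finds no existing center, it opens the first cluster, which matches taking the first search word as the first cluster center. Note that the result depends on the traversal order and on the choice of θ.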
FIG. 7 is a flow chart of a search term processing method in one embodiment. The method comprises the following steps:
Step 702, for each search word in the search word library, counting the number of searches of the search word and the number of clicks on the search result page of the search word.
The computer device may count the number of times each search term was searched over a period of time (e.g., one week, half a month, one month, or three months) and, among those searches, the number of times a click event was triggered on the search result page that was jumped to, and determine the search result page click rate of the search term from the ratio of the latter to the former. Optionally, if a search term has no search record in that period, the computer device may set its search result page click rate to a preset value.
Step 704, determining the click rate of the search result page of the search word according to the ratio of the click times to the search times.
Step 706, counting the number of each type of content in the search result page of the search term.
Specifically, the computer device may invoke the search interface to obtain the content data of the search result page of each search term, extract from the content data the quantity of each type of content in the search result page, and determine the search result page richness of the search term from those quantities.
Step 708, obtaining the weight of each type of content, and according to the weight of each type of content, carrying out weighted summation on the quantity of each type of content in the first page of the search result page of the search word, so as to obtain a weighted score.
Alternatively, the weights of the various types of content may be set according to actual requirements, and it will be understood that those types of content that are more likely to be clicked by the user should be given higher weights, and those types of content that are less likely to be clicked by the user should be given relatively lower weights.
Step 710, determining the richness of the search result page of the search term according to the weighted score.
Optionally, the computer device may calculate the search result page richness $S_{result}$ according to the following formula:
subject to $\sum_{i \in C} w_i = 1$;
where $C$ is the set of content types in the search result page (such as image-text content, video, question and answer, and picture), $|C|$ is the number of types, $n_i$ is the quantity of the $i$-th type of content in the content data of the search result page, $w_i$ is the weight of the $i$-th type, and $K$ is a hyperparameter that limits the maximum score of each type and may be set to 10.
Step 712, determining quality scores of the search terms according to the click rate of the search result page and the richness of the search result page.
Step 714, the search term is stored in correspondence with the quality score.
Optionally, after deriving a quality score for each search term, the computer device may store the search term in correspondence with the respective quality score. For example, the storage may be performed by the following data structure:
< word, score >, i.e., search term-quality score.
Step 716, extracting a semantic vector representation of each search term in the search term library.
The search words in the search word library may be search words stored by the computer device according to the historical search record, may also be search words generated by itself, and may be abstract phrases extracted by the computer device according to network popular events or popular information, for example. The computer device may store the search terms into a search term library, and for each search term in the search term library, the computer device may extract a corresponding semantic vector representation.
The computer device may employ a word vector computing method to compute the semantic vector representation of each search word. Optionally, the computer device may obtain the semantic vector representation of each search word by a word2vec weighted average. Word2vec represents words as vectors whose distances reflect the semantic relevance of the words, i.e., it maps the words of a given vocabulary into a low-dimensional dense space in which semantically similar words are closer together and semantically different words are farther apart. Optionally, the computer device may also obtain the semantic vector representation of a search word from a model trained with SimCSE (Simple Contrastive Learning of Sentence Embeddings, a simple contrastive learning framework for sentence vector representation). Optionally, the computer device may also obtain the semantic vector representation of a search word through a self-trained sentence vector representation model (which may be based on Sentence-BERT or the like). The embodiment of the application does not limit the way in which the semantic vector representation of a search word is obtained.
Step 718, taking the semantic vector representation of the first search term in the search word library as the first cluster center.
Step 720, traversing the search words in the search word library, and calculating the similarity between the semantic vector representation of the traversed search word and each cluster center; if the maximum of the similarities is greater than or equal to a second threshold, adding the traversed search word to the cluster whose cluster center corresponds to the maximum similarity; if the maximum of the similarities is smaller than the second threshold, taking the semantic vector representation of the traversed search word as a newly added cluster center; a plurality of clusters are obtained once all search words in the search word library have been traversed.
The purpose of clustering is to aggregate semantically related search words into one data object (i.e., a cluster) and to place unrelated search words in different clusters, so that the search words semantically correlated with a given search word can be found conveniently.
Step 722, storing the cluster identifier of each cluster in correspondence with the search words included in the cluster.
After obtaining the plurality of clusters, the computer device may store the cluster identifier of each cluster and the search words included in the cluster in a storage system or database, for example in a key-value database such as redis or memcache. Optionally, the computer device may store the following data structures:
< word, cluster_id >, search word-cluster identification of the cluster in which the word is located;
< cluster_id, word_list >, cluster identification of cluster-search term list;
that is, the computer device may find the cluster identifier of the cluster in which the search word is located according to the search word, and the computer device may query all the search words included in the cluster according to the cluster identifier of the cluster.
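The two mappings can be illustrated with in-memory dictionaries standing in for the key-value database. The function name `build_cluster_index` and the identifier format `c0`, `c1`, ... are hypothetical choices for this sketch, not something the application prescribes.

```python
def build_cluster_index(clusters):
    """Build the <word, cluster_id> and <cluster_id, word_list> mappings."""
    word_to_cluster = {}
    cluster_to_words = {}
    for index, words in enumerate(clusters):
        cluster_id = f"c{index}"  # hypothetical identifier format
        cluster_to_words[cluster_id] = list(words)
        for word in words:
            word_to_cluster[word] = cluster_id
    return word_to_cluster, cluster_to_words

word_to_cluster, cluster_to_words = build_cluster_index([
    ["cheap flights", "low cost flights"],
    ["hotel deals", "hotel discounts"],
])

# Two lookups: word -> cluster identifier, then cluster identifier -> word list.
related = cluster_to_words[word_to_cluster["hotel deals"]]
```

With a real key-value store, each dictionary entry would become one stored key, so finding the words related to a search word costs two lookups and no vector computation.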
Step 724, obtaining the search text and determining a plurality of search words prefixed by the search text in the search word bank.
Step 726, calculating semantic similarity between each search word prefixed by the search text and the search text;
for example, the computer device may calculate semantic similarity of the search text to each of the retrieved search terms using a neural network model based on natural language processing.
Step 728, obtaining the heat of each search word prefixed by the search text.
In addition, the computer device obtains the heat of each retrieved historical search word; the heat may be computed from the historical search frequency, for example the number of searches over a past period of time.
Step 730, selecting a plurality of candidate search words from the plurality of search words according to the semantic similarity and the heat.
The computer device then screens a plurality of candidate search terms from the retrieved historical search terms based on the two indicators of semantic similarity and heat.
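Steps 724 to 730 can be sketched as follows. How semantic similarity and heat are combined is not specified above, so the linear combination with weight `alpha`, the normalization of heat by the maximum heat, and the `top_k` cutoff are all assumptions of this example. The `length_ratio_similarity` function is only a toy stand-in for the neural semantic similarity model.

```python
def select_candidates(search_text, word_heat, similarity, top_k=3, alpha=0.5):
    """Pick candidate search words that start with search_text.

    word_heat: dict mapping search word -> historical search count (heat).
    similarity: callable (search_text, word) -> semantic similarity in [0, 1].
    """
    prefixed = [w for w in word_heat if w.startswith(search_text)]
    if not prefixed:
        return []
    max_heat = max(word_heat[w] for w in prefixed) or 1  # avoid division by zero

    def combined(word):
        # Assumed scoring rule: weighted sum of similarity and normalized heat.
        return (alpha * similarity(search_text, word)
                + (1 - alpha) * word_heat[word] / max_heat)

    return sorted(prefixed, key=combined, reverse=True)[:top_k]

def length_ratio_similarity(text, word):
    """Toy stand-in for a semantic similarity model (not a real model)."""
    return len(text) / len(word)

heat = {
    "hotel": 900,
    "hotel deals": 500,
    "hotel deals near me tonight": 20,
    "flights": 1000,
}
candidates = select_candidates("hotel", heat, length_ratio_similarity, top_k=2)
```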
Step 732, querying the quality score of each candidate search word.
Step 734, selecting candidate search words with corresponding quality scores lower than a first threshold value from the plurality of candidate search words according to the quality scores as low-quality search words.
The computer device may compare the quality score of the candidate search term to a set threshold and treat the candidate search term having a quality score below the set threshold as a low quality search term. Wherein the set threshold is a value preset empirically.
Step 736, querying a cluster identifier corresponding to the target cluster in which the low-quality search term is located.
Step 738, determining the search term in the target cluster corresponding to the cluster identifier.
Step 740, querying respective quality scores of the search words in the target cluster.
Step 742, using the search word with the highest corresponding quality score in the target cluster as the high-quality search word with the semantic correlation with the low-quality search word.
Search words that are semantically related are stored as one data object in a storage system or database of the computer device, so that the computer device can conveniently look up the search words related to a given search word. For each low-quality search word among the candidate search words, the computer device queries the search words semantically related to it, queries their quality scores, and selects the search word with the highest quality score (or the top-ranked search words). If the quality score of the selected search word is greater than a set threshold, that search word is the high-quality search word. Optionally, if the quality score of the selected search word is lower than the set threshold, no high-quality search word semantically related to the low-quality search word exists, and the computer device may filter out the low-quality search word so that it is not used as a recommended search word. The threshold for screening low-quality search words and the threshold for screening high-quality search words may be different or the same, which is not limited in the embodiments of the application.
Step 744, replacing the low-quality search words among the plurality of candidate search words with the corresponding high-quality search words to obtain a plurality of recommended search words corresponding to the search text, and responding to the search text with the plurality of recommended search words.
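Steps 732 to 744 can be sketched end to end as below. The function name, the treatment of a missing quality score as 0.0, and the de-duplication of the output list are assumptions of this sketch; the threshold values and sample data are hypothetical.

```python
def replace_low_quality(candidates, quality, word_to_cluster, cluster_to_words,
                        low_threshold, high_threshold):
    """Replace low-quality candidates with the best word of their cluster."""
    recommended = []

    def add(word):
        if word not in recommended:  # de-duplication is an assumption
            recommended.append(word)

    for word in candidates:
        if quality.get(word, 0.0) >= low_threshold:
            add(word)  # not low quality: keep as is
            continue
        # Low quality: look up the target cluster and its member words.
        members = cluster_to_words.get(word_to_cluster.get(word), [])
        best = max(members, key=lambda w: quality.get(w, 0.0), default=None)
        if best is not None and quality.get(best, 0.0) > high_threshold:
            add(best)  # replace with the high-quality search word
        # Otherwise no high-quality replacement exists and the word is dropped.
    return recommended

quality = {"hotel deels": 0.1, "hotel deals": 0.9,
           "hotel booking": 0.7, "hotel xyz": 0.05}
word_to_cluster = {"hotel deels": "c0", "hotel deals": "c0",
                   "hotel booking": "c1", "hotel xyz": "c2"}
cluster_to_words = {"c0": ["hotel deels", "hotel deals"],
                    "c1": ["hotel booking"],
                    "c2": ["hotel xyz"]}

recommended = replace_low_quality(
    ["hotel deels", "hotel booking", "hotel xyz"],
    quality, word_to_cluster, cluster_to_words,
    low_threshold=0.5, high_threshold=0.5,
)
```

Here "hotel deels" is replaced by "hotel deals", "hotel booking" is kept, and "hotel xyz" is dropped because its cluster contains no word above the high-quality threshold.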
According to the above search word processing method, after the plurality of candidate search words corresponding to the search text are obtained, the quality score of each candidate search word is queried, and low-quality search words are determined from the candidates according to the quality scores. Because the quality score of a search word is positively correlated with its search result page click rate and search result page richness, low-quality search words can be replaced with higher-quality search words. This avoids the situation in which a user clicks a low-quality search word, lands on a search result page with a poor conversion effect, and cannot find the required information; it improves the user's search experience and also improves related metrics of the search system, such as the search result page click rate. Moreover, the high-quality search words used for replacement are semantically related to the low-quality search words they replace, so candidate search words are not replaced with search words irrelevant to the user's search intent and no semantic drift is introduced, which further improves the user's search experience.
Because the quality of the search words is scored in advance and the quality score of each search word is stored, and because the search words in the search word library are clustered in advance and the clustering result is stored, replacing a low-quality search word with a high-quality search word only requires querying the storage system or database and does not require a large amount of computation.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; nor are these sub-steps or stages necessarily performed sequentially, as they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiments of the application also provide a search word processing apparatus for implementing the search word processing method described above. The apparatus solves the problem in a manner similar to that described for the method, so for the specific limitations in the apparatus embodiments provided below, reference may be made to the limitations of the search word processing method above, which are not repeated here.
In one embodiment, as shown in fig. 8, there is provided a search term processing apparatus 800 comprising: a candidate search term determination module 802, a query module 804, a low quality search term screening module 806, and a search term replacement module 808, wherein:
a candidate search word determining module 802, configured to obtain a search text input in a search interface, and obtain a plurality of candidate search words corresponding to the search text;
a query module 804, configured to query quality scores of candidate search terms; the quality score of the search word is positively correlated with the click rate of the search result page and the richness of the search result page of the search word;
a low quality search term screening module 806 for screening low quality search terms from the plurality of candidate search terms according to the quality score;
The query module 804 is further configured to query high-quality search terms having semantic relevance to low-quality search terms;
the search word replacement module 808 is configured to replace a low quality search word in the plurality of candidate search words with a corresponding high quality search word, obtain a plurality of recommended search words corresponding to the search text, and present the plurality of recommended search words in the search interface.
In one embodiment, as shown in fig. 9, the search term processing apparatus 800 further includes:
a quality score determining module 810, configured to determine, for each search word in the search word library, the search result page click rate and the search result page richness of the search word; and determine the quality score of each search word according to the search result page click rate and the search result page richness.
In one embodiment, as shown in fig. 9, the quality score determination module 810 further includes:
the search result page click rate statistics unit is used for counting the search times of the search words and the click times of the search result pages of the search words; and determining the click rate of the search result page of the search word according to the ratio of the click times to the search times.
In one embodiment, as shown in fig. 9, the quality score determination module 810 further includes:
the search result page richness statistics unit is used for counting the quantity of various types of contents in the search result page of the search word; and determining the richness of the search result page of the search word according to the quantity of the various types of content.
In one embodiment, the search result page richness statistics unit is further configured to obtain weights of various types of content; according to the weights of the various types of contents, carrying out weighted summation on the quantity of the various types of contents in the first page of the search result page of the search word to obtain weighted scores; and determining the richness of the search result page of the search word according to the weighted score.
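The two statistics and one possible way of combining them can be sketched as follows. The click rate (ratio of clicks to searches) and the weighted sum over first-page content counts follow the units described above; the content type names, their weights, the normalization by `max_rich`, and the linear combination in `quality_score` are assumptions made only for this example, since the text states only that the quality score is positively correlated with both quantities.

```python
def click_rate(click_count, search_count):
    """Search result page click rate: ratio of clicks to searches."""
    return click_count / search_count if search_count > 0 else 0.0

def richness(first_page_counts, type_weights):
    """Weighted sum of the number of items of each content type on the first page."""
    return sum(type_weights.get(content_type, 0.0) * count
               for content_type, count in first_page_counts.items())

def quality_score(ctr, rich, max_rich, ctr_weight=0.5):
    """Assumed combination: weighted sum of click rate and normalized richness."""
    normalized = min(rich / max_rich, 1.0) if max_rich > 0 else 0.0
    return ctr_weight * ctr + (1 - ctr_weight) * normalized

ctr = click_rate(click_count=30, search_count=100)
rich = richness(
    {"article": 6, "video": 2, "applet": 1},        # hypothetical content types
    {"article": 1.0, "video": 2.0, "applet": 3.0},  # hypothetical weights
)
score = quality_score(ctr, rich, max_rich=20.0)
```

Any combination that increases with both inputs would satisfy the stated positive correlation; the linear form is used here only because it is the simplest such choice.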
In one embodiment, the low quality search term screening module 806 is further configured to screen candidate search terms with a corresponding quality score lower than a first threshold value from the plurality of candidate search terms as low quality search terms according to the quality score.
In one embodiment, as shown in fig. 9, the search term processing apparatus 800 further includes:
a clustering module 812, configured to extract a semantic vector representation of each search word in the search word library; cluster the search words in the search word library according to the semantic vector representations to obtain a plurality of clusters, wherein each cluster has a cluster center, the similarity between the semantic vector representation of a search word in a cluster and the center of the cluster in which it is located is greater than its similarity to the centers of the other clusters, and the search words in the same cluster are semantically related; and store the cluster identifier of each cluster in correspondence with the search words included in the cluster.
In one embodiment, the query module 804 is further configured to query a cluster identifier corresponding to a target cluster in which the low-quality search term is located; determining search words in a target cluster corresponding to the cluster identifier; inquiring the quality scores of the search words in the target cluster; and taking the search word with the highest corresponding quality score in the target cluster as a high-quality search word with semantic correlation with the low-quality search word.
In one embodiment, the clustering module 812 is further configured to take the semantic vector representation of the first search word in the search word library as the first cluster center; traverse the search words in the search word library and calculate the similarity between the semantic vector representation of each traversed search word and every existing cluster center; if the maximum of these similarities is greater than or equal to a second threshold, add the traversed search word to the cluster whose center yields the maximum similarity; if the maximum similarity is smaller than the second threshold, take the semantic vector representation of the traversed search word as a newly added cluster center; and obtain a plurality of clusters once all search words in the search word library have been traversed.
In one embodiment, the clustering module 812 is further configured to perform semantic vector extraction on each search word in the search word bank through a preset semantic vector representation model, so as to obtain a semantic vector representation of each search word in the search word bank.
In one embodiment, the candidate search term determination module 802 is further configured to determine a plurality of search terms prefixed by a search text in a search term library; calculating semantic similarity between each search word prefixed by the search text and the search text; acquiring the heat of each search word prefixed by a search text; and screening a plurality of candidate search words from the plurality of search words according to the semantic similarity and the heat.
After obtaining the search text input in the search interface and the plurality of candidate search words corresponding to the search text, the search word processing apparatus 800 queries the quality score of each candidate search word and determines low-quality search words from the candidates according to the quality scores. Because the quality score of a search word is positively correlated with its search result page click rate and search result page richness, low-quality search words can be replaced with higher-quality search words. This avoids the situation in which a user clicks a low-quality search word, lands on a search result page with a poor conversion effect, and cannot find the required information; it improves the user's search experience and also improves related metrics of the search system, such as the search result page click rate. Moreover, the high-quality search words used for replacement are semantically related to the low-quality search words they replace, so candidate search words are not replaced with search words irrelevant to the user's search intent and no semantic drift is introduced, which further improves the user's search experience.
The modules in the search word processing apparatus 800 described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be the server 104 shown in FIG. 1, and the internal structure of which may be as shown in FIG. 10A. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used to store search terms, quality scores for the search terms, clustering results for the search terms, and so forth. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of search word processing.
In one embodiment, a computer device is provided, which may be the terminal 102 shown in fig. 1, and the internal structure diagram of which may be as shown in fig. 10B. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input means. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a method of search word processing. The display unit of the computer equipment is used for forming a visual picture, and can be a display screen, a projection device or a virtual reality imaging device, wherein the display screen can be a liquid crystal display screen or an electronic ink display screen, the input device of the computer equipment can be a touch layer covered on the display screen, can also be a key, a track ball or a touch pad arranged on a shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structures shown in fig. 10A and 10B are merely block diagrams of portions of structures associated with aspects of the application and are not intended to limit the computer apparatus to which aspects of the application may be applied, and that a particular computer apparatus may include more or less components than those shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the search word processing method provided by the embodiment of the present application when the computer program is executed.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the steps of the search word processing method provided by the embodiment of the present application.
In one embodiment, a computer program product is provided, comprising a computer program that, when executed by a processor, implements the steps of the search term processing method provided by the embodiments of the present application.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the computer program may carry out the steps of the method embodiments described above. Any reference to memory, a database, or another medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database. The processor referred to in the embodiments provided herein may be, but is not limited to, a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this description.
The above embodiments illustrate only several implementations of the application, and although they are described specifically and in detail, they should not be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and improvements without departing from the concept of the application, all of which fall within the scope of protection of the application. Accordingly, the scope of protection of the application shall be determined by the appended claims.

Claims (15)

1. A method of search term processing, the method comprising:
acquiring a search text input in a search interface;
acquiring a plurality of candidate search words corresponding to the search text;
querying a quality score of each candidate search term; the quality score of the search word is positively correlated with the click rate of the search result page and the richness of the search result page of the search word;
screening low-quality search words from the plurality of candidate search words according to the quality scores;
querying high-quality search words with semantic correlation with the low-quality search words;
replacing low-quality search words in the plurality of candidate search words with corresponding high-quality search words to obtain a plurality of recommended search words corresponding to the search text;
presenting the plurality of recommended search terms in the search interface.
2. The method according to claim 1, wherein the method further comprises:
extracting semantic vector representations of each search word in the search word bank;
clustering search words in the search word bank according to the semantic vector representation to obtain a plurality of clusters; wherein each cluster has a cluster center, and the similarity between the semantic vector representation of a search word in a cluster and the cluster center of the cluster in which the search word is located is greater than the similarity between that semantic vector representation and the cluster centers of other clusters; semantic correlation exists among search words in the same cluster;
and storing the cluster identification of each cluster corresponding to the search word included in the cluster.
3. The method of claim 2, wherein the querying high quality search terms that have semantic relevance to the low quality search terms comprises:
querying a cluster identifier corresponding to a target cluster where the low-quality search word is located;
determining search words in a target cluster corresponding to the cluster identifier;
inquiring the quality scores of the search words in the target cluster;
and taking the search word with the highest corresponding quality score in the target cluster as a high-quality search word with semantic correlation with the low-quality search word.
4. The method according to claim 2, wherein clustering the search words in the search word bank according to the semantic vector representation to obtain a plurality of clusters comprises:
taking the semantic vector representation of the first search word in the search word bank as the first cluster center;
traversing the search words in the search word bank, and calculating the similarity between the semantic vector representation of the traversed search word and each cluster center;
if the maximum similarity among the similarities is greater than or equal to a second threshold, adding the traversed search word to the cluster in which the cluster center corresponding to the maximum similarity is located;
if the maximum similarity among the similarities is smaller than the second threshold, taking the semantic vector representation of the traversed search word as a newly added cluster center;
and returning to the step of traversing the search words in the search word bank until all search words in the search word bank have been traversed, so as to obtain a plurality of clusters.
5. The method of claim 2, wherein extracting the semantic vector representation of each search term in the search term library comprises:
and respectively extracting semantic vectors of each search word in the search word bank through a preset semantic vector representation model to obtain semantic vector representations of each search word in the search word bank.
6. The method according to claim 1, wherein the method further comprises:
for each search word in the search word bank, acquiring the click rate of a search result page and the richness of the search result page corresponding to the search word;
and calculating the quality score of each search word according to the click rate of the search result page and the richness of the search result page.
7. The method of claim 6, wherein acquiring the search result page click rate of the search word comprises:
counting the searching times of the search words and the clicking times of the search result pages of the search words;
and calculating the click rate of the search result page of the search word according to the ratio of the click times to the search times.
8. The method of claim 6, wherein acquiring the search result page richness of the search word comprises:
counting the quantity of various types of contents in a search result page of the search word;
and calculating the richness of the search result page of the search word according to the quantity of the various types of content.
9. The method of claim 8, wherein calculating the search result page richness of the search term according to the number of the types of content comprises:
acquiring weights of various types of content;
according to the weights of the various types of contents, carrying out weighted summation on the quantity of the various types of contents in the first page of the search result page of the search word to obtain weighted scores;
and determining the richness of the search result page of the search word according to the weighted score.
10. The method of claim 6, wherein said screening low quality search terms from said plurality of candidate search terms based on said quality score comprises:
and screening candidate search words with the corresponding quality scores lower than a first threshold value from the plurality of candidate search words according to the quality scores, and taking the candidate search words as low-quality search words.
11. The method according to any one of claims 1 to 10, wherein the obtaining a plurality of candidate search terms corresponding to the search text includes:
acquiring a plurality of search words prefixed by the search text from a search word library;
calculating semantic similarity between each search word prefixed by the search text and the search text;
acquiring the heat of each search word prefixed by the search text;
and screening a plurality of candidate search words from the plurality of search words according to the semantic similarity and the heat.
12. A search term processing apparatus, the apparatus comprising:
the candidate search word determining module is used for acquiring search text input in a search interface; acquiring a plurality of candidate search words corresponding to the search text;
the query module is used for querying the quality score of each candidate search term; the quality score of the search word is positively correlated with the click rate of the search result page and the richness of the search result page of the search word;
the low-quality search word screening module is used for screening low-quality search words from the candidate search words according to the quality scores;
the query module is further used for querying high-quality search words with semantic correlation with the low-quality search words;
and the search word replacement module is used for replacing low-quality search words in the plurality of candidate search words with corresponding high-quality search words to obtain a plurality of recommended search words corresponding to the search text, and presenting the plurality of recommended search words in the search interface.
13. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 11 when the computer program is executed.
14. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 11.
15. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 11.
CN202310412688.3A 2023-04-10 2023-04-10 Search word processing method, apparatus, device, storage medium and program product Pending CN116975405A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310412688.3A CN116975405A (en) 2023-04-10 2023-04-10 Search word processing method, apparatus, device, storage medium and program product

Publications (1)

Publication Number Publication Date
CN116975405A (en) 2023-10-31

Family

ID=88482111

Family Applications (1)

Application Number Legal Status Publication Number Priority Date Filing Date Title
CN202310412688.3A Pending CN116975405A (en) 2023-04-10 2023-04-10 Search word processing method, apparatus, device, storage medium and program product

Country Status (1)

Country Link
CN (1) CN116975405A (en)

Similar Documents

Publication Publication Date Title
WO2021139325A1 (en) Media information recommendation method and apparatus, electronic device, and storage medium
US20200081906A1 (en) Visual Interactive Search
WO2020108608A1 (en) Search result processing method, device, terminal, electronic device, and storage medium
WO2020006835A1 (en) Customer service method, apparatus, and device for engaging in multiple rounds of question and answer, and storage medium
US10606883B2 (en) Selection of initial document collection for visual interactive search
US20220284327A1 (en) Resource pushing method and apparatus, device, and storage medium
CN111652378B (en) Learning to select vocabulary for category features
JP2022160464A (en) Method for sharing knowledge between dialogue systems, dialogue method, knowledge sharing apparatus, dialogue device, electronic device, and storage medium
CN110390052B (en) Search recommendation method, training method, device and equipment of CTR (click-through rate) estimation model
US9218366B1 (en) Query image model
WO2021155691A1 (en) User portrait generating method and apparatus, storage medium, and device
WO2020123689A1 (en) Suggesting text in an electronic document
CN111159431A (en) Knowledge graph-based information visualization method, device, equipment and storage medium
Wei et al. Online education recommendation model based on user behavior data analysis
US11436489B2 (en) Combining statistical methods with a knowledge graph
CN110008396B (en) Object information pushing method, device, equipment and computer readable storage medium
JP7213890B2 (en) Accelerated large-scale similarity computation
CN116975359A (en) Resource processing method, resource recommending method, device and computer equipment
CN114996490A (en) Movie recommendation method, system, storage medium and device
CN116975405A (en) Search word processing method, apparatus, device, storage medium and program product
CN111708952B (en) Label recommending method and system
CN113420139B (en) Text matching method and device, electronic equipment and storage medium
US9600529B2 (en) Attribute-based document searching
US20220237246A1 (en) Techniques for presenting content to a user based on the user's preferences
CN117648427A (en) Data query method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication