CN114625986A

CN114625986A - Method, device and equipment for sorting search results and storage medium

Info

Publication number: CN114625986A
Application number: CN202210128961.5A
Authority: CN
Inventors: 许翔泓; 欧阳凯; 王刘鄞; 路彦雄; 郑海涛
Original assignee: Tencent Technology Shenzhen Co Ltd; Shenzhen International Graduate School of Tsinghua University
Current assignee: Tencent Technology Shenzhen Co Ltd; Shenzhen International Graduate School of Tsinghua University
Priority date: 2022-02-11
Filing date: 2022-02-11
Publication date: 2022-06-14

Abstract

The application discloses a method, an apparatus, a device and a storage medium for ranking search results, and belongs to the technical field of artificial intelligence. The method comprises: obtaining a plurality of search results to be ranked corresponding to a target search term; determining a relevance feature of each search result with respect to the target search term; determining a diversity feature corresponding to each search result based on a diversity feature extraction model, wherein the diversity feature indicates how many of the meanings of the target search term the corresponding search result covers; determining a ranking indication value corresponding to each search result based on the relevance feature and the diversity feature corresponding to that search result; and ranking the plurality of search results based on the ranking indication value corresponding to each search result. The method and apparatus can improve the efficiency with which a user searches for information using search terms.

Description

Method, device and equipment for sorting search results and storage medium

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a storage medium for ranking search results.

Background

With the development of internet technology and big data, it is currently very common for users to search for relevant information through search terms.

In the related art, after receiving a search request sent by a user terminal, a search platform matches corresponding search results in a database according to the search terms carried in the search request. The platform then sorts the matched search results according to their degree of match with the search terms, and finally returns the sorted search results to the user terminal.

The same search word may have different meanings in different fields, and when a user searches information through the search word, the search platform cannot clearly know the specific meaning of the search word input by the user. For example, the names of some animals and fruits are registered as their own brands by enterprises. Thus, when the user uses the name of the animal or fruit registered as the brand as the search word, the search platform cannot know whether the user wants to search the related brand information or the related animal or fruit information. Therefore, if the search platform sorts the search results according to the matching between the search results and the search terms, the search results ranked in the front may not be the information that the user wants to search, and the search efficiency may be low.

Disclosure of Invention

The embodiments of the application provide a method, an apparatus, a device and a storage medium for ranking search results, which can solve the problem of low search efficiency. The technical solution is as follows:

in a first aspect, a method for ranking search results is provided, the method comprising:

obtaining a plurality of search results to be ordered corresponding to the target search word;

determining a relevance feature of each search result to the target search term;

determining a diversity feature corresponding to each search result based on a diversity feature extraction model, wherein the diversity feature indicates the number of meanings of the target search term that the corresponding search result covers;

determining a ranking indication value corresponding to each search result based on the correlation characteristic and the diversity characteristic corresponding to each search result;

and ranking the plurality of search results to be ranked based on the ranking indication value corresponding to each search result.

Optionally, the determining the diversity feature corresponding to each search result based on the diversity feature extraction model includes:

determining the data characteristics corresponding to each search result;

establishing a first full-connection graph corresponding to the plurality of search results to be ordered, wherein a plurality of nodes in the first full-connection graph correspond to data characteristics of the plurality of search results to be ordered one by one;

inputting data characteristics corresponding to nodes connected with each edge in the first full-connected graph into a weight calculation unit in the diversity characteristic extraction model, determining a weight value corresponding to each edge, and obtaining a second full-connected graph after the weight value is determined;

and inputting the second full-connection graph into a diversity feature extraction unit in the diversity feature extraction model to obtain diversity features corresponding to each search result.

Optionally, the determining, based on the correlation feature and the diversity feature corresponding to each search result, a ranking indication value corresponding to each search result includes:

determining the data characteristics corresponding to each search result;

for each search result, inputting the relevance characteristics corresponding to the search result into a relevance score calculation module to obtain relevance scores corresponding to the relevance characteristics; inputting data characteristics, diversity characteristics and the target search words corresponding to the search results into a diversity score calculation module to obtain diversity scores corresponding to the diversity characteristics;

and respectively carrying out weighted summation on the relevance scores and the diversity scores of each search result to obtain a sequencing indication value corresponding to each search result.

Optionally, before determining the diversity feature corresponding to each search result based on the diversity feature extraction model, the method further includes:

acquiring a reference search result sequence corresponding to the sample search word, wherein the reference search result sequence comprises a plurality of search results matched with the sample search word;

generating a positive sample sequence and a negative sample sequence based on the reference search result sequence, wherein the sequence of each first search result in the positive sample sequence is the same as the sequence of each first search result in the reference search result sequence, and the sequence of each second search result in the negative sample sequence is different from the sequence of each second search result in the reference search result sequence;

determining a first diversity score corresponding to the search results included in the positive sample sequence and a second diversity score corresponding to the search results included in the negative sample sequence based on the diversity extraction model; determining a first diversity reference value corresponding to the search result included in the positive sample sequence, and determining a second diversity reference value corresponding to the search result included in the negative sample sequence;

training the diversity extraction model based on the first diversity score, the second diversity score, the first diversity reference value and the second diversity reference value.
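The construction of the positive and negative sample sequences can be illustrated with a minimal sketch. It assumes, purely for illustration, that the positive sequence is an order-preserving subsequence of the reference sequence and that the negative sequence is the same set of results in a shuffled order; the function name `make_training_sequences`, the `length` parameter and the seeded random generator are hypothetical and do not come from the application.

```python
import random

def make_training_sequences(reference, length, rng):
    """Return (positive, negative) sequences drawn from a reference ranking."""
    if length < 2 or length > len(reference):
        raise ValueError("length must be between 2 and len(reference)")
    # Sample positions and sort them so the positive sequence keeps the reference order.
    positions = sorted(rng.sample(range(len(reference)), length))
    positive = [reference[p] for p in positions]
    if len(set(positive)) < 2:
        raise ValueError("need at least two distinct results to build a negative sequence")
    # Assumption: the negative sequence reuses the same results in a different order.
    negative = positive[:]
    while negative == positive:
        rng.shuffle(negative)
    return positive, negative

reference = ["d0", "d1", "d2", "d3", "d4"]
positive, negative = make_training_sequences(reference, 3, random.Random(0))
```

The two sequences could then be scored by the diversity extraction model and compared against their diversity reference values during training.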

Optionally, the sorting the search results to be sorted based on the sorting indication value corresponding to each search result includes:

and sorting the plurality of search results to be sorted according to the order of the sorting indication values from high to low.

In a second aspect, an apparatus for ranking search results is provided, the apparatus comprising:

the acquisition module is used for acquiring a plurality of search results to be ordered corresponding to the target search terms;

the determining module is used for determining a relevance feature of each search result with respect to the target search term; determining a diversity feature corresponding to each search result based on a diversity feature extraction model, wherein the diversity feature indicates the number of meanings of the target search term that the corresponding search result covers; and determining a ranking indication value corresponding to each search result based on the relevance feature and the diversity feature corresponding to that search result;

and the sorting module is used for sorting the plurality of search results to be sorted based on the sorting indicating value corresponding to each search result.

Optionally, the determining module is configured to:

determining the data characteristics corresponding to each search result;

and inputting the second full-connection graph into a diversity feature extraction unit in the diversity feature extraction model to obtain the diversity feature corresponding to each search result.

Optionally, the determining module is configured to:

determining the data characteristics corresponding to each search result;

for each search result, inputting the relevance characteristics corresponding to the search result into a relevance score calculation module to obtain relevance scores corresponding to the relevance characteristics; inputting the data characteristics, the diversity characteristics and the target search terms corresponding to the search results into a diversity score calculation module to obtain diversity scores corresponding to the diversity characteristics;

Optionally, the apparatus further comprises a training module, configured to:

Optionally, the sorting module is configured to:

In a third aspect, a computer device is provided, and the computer device includes a processor and a memory, where at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to implement the operations performed by the method for ranking search results according to the first aspect.

In a fourth aspect, a computer-readable storage medium is provided, in which at least one instruction is stored, and the at least one instruction is loaded and executed by a processor to implement the operations performed by the method for ranking search results according to the first aspect.

In a fifth aspect, a computer program product is provided, where the computer program product includes at least one instruction, and the at least one instruction is loaded and executed by a processor to implement the operations performed by the method for ranking search results according to the first aspect.

The technical scheme provided by the embodiment of the application has the following beneficial effects:

In the embodiments of the application, the relevance feature corresponding to each search result to be ranked and the diversity feature indicating which meanings of the target search term that result covers are determined, and the corresponding ranking indication value is then determined from the relevance feature and the diversity feature of each search result. Therefore, when the search results to be ranked are ranked according to their ranking indication values, both the relevance between each search result and the search term and the number of meanings of the search term covered by each search result are taken into account. As a result, the top-ranked search results can cover more meanings of the search term, which increases the probability of hitting the user's search intention and thereby improves search efficiency.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application;

FIG. 2 is a flowchart of a method for ranking search results according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a method for ranking search results according to an embodiment of the present application;

FIG. 4 is a schematic diagram illustrating a method for ranking search results according to an embodiment of the present application;

FIG. 5 is a flowchart of a method for training a diversity feature extraction model according to an embodiment of the present disclosure;

FIG. 6 is a schematic structural diagram of an apparatus for ranking search results according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

The method for ordering search results provided by the application relates to an Artificial Intelligence (AI) technology, wherein AI is a theory, method, technology and application system for simulating, extending and expanding human Intelligence, sensing environment, acquiring knowledge and using knowledge to obtain optimal results by using a digital computer or a machine controlled by the digital computer. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, machine learning/deep learning, and the like.

Computer Vision (CV) is the science of how to make machines "see": it uses cameras and computers in place of human eyes to identify, track and measure targets, and further processes the resulting images so that they are better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies theories and techniques for building artificial intelligence systems that can extract information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, image semantic understanding, image retrieval, Optical Character Recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, as well as common biometric technologies such as face recognition and fingerprint recognition.

The key technologies of Speech Technology are Automatic Speech Recognition (ASR), speech synthesis (Text To Speech, TTS) and voiceprint recognition. Enabling computers to listen, see, speak and feel is the future direction of human-computer interaction, and speech is expected to become one of the most promising interaction modes.

Natural Language Processing (NLP) is an important direction in computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science and mathematics. Research in this field concerns natural language, that is, the language people use every day, so it is closely related to linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.

Machine Learning (ML) is a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It studies how computers simulate or implement human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence, is the fundamental way to make computers intelligent, and is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration. For example, in the present application, the machine learning model involved in the method for ranking search results needs to be trained through machine learning, so that the trained model implements the method for ranking search results. The machine learning model to which the present application relates is described in the following embodiments.

The automatic driving technology generally comprises technologies such as high-precision maps, environment perception, behavior decision, path planning, motion control and the like, and has wide application prospects.

With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.

The scheme provided by the embodiment of the application relates to technologies such as artificial intelligence natural language processing and machine learning, and is specifically explained by the following embodiments:

FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application. Referring to FIG. 1, the method for ranking search results provided by the present application may be implemented by a server. The server may be a background server corresponding to the search platform. The server has access to the internet and is capable of communicating with the user terminal. The user terminal can send the search terms input by the user to the server, and after receiving them the server can perform query processing on the search terms to obtain search results matching the search terms. The search results may then be ranked and the ranked search results sent to the user terminal.

Because a search term may have different meanings in different fields, the specific meaning intended by the user cannot be determined from the search term sent by the user terminal alone. For example, the names of some animals and fruits are registered by enterprises as their own brands. Thus, when the user uses such a name as the search term, the search platform cannot know whether the user wants to search for information about the brand or about the animal or fruit. With the method for ranking search results provided by the present application, after the search results corresponding to the search term are ranked, the top-ranked search results can cover more meanings of the search term. That is, the top-ranked search results contain information about the search term in each of a plurality of fields.

FIG. 2 is a flowchart of a method for ranking search results according to an embodiment of the present application. Referring to FIG. 2, the method includes:

Step 201, obtaining a plurality of search results to be ranked corresponding to the target search term.

In implementation, after receiving any search term (which may be referred to as a target search term) sent by the user terminal, the server may perform query processing on the target search term to obtain search results matching it. The search results may include documents, audio, images, and the like that match the target search term. For example, after the search term "apple" is queried, the obtained search results may include the website of the Apple company, the nutritional value of apples, a method for planting apples, and the like.

In one case, all of the resulting search results that match the search term may be determined to be a plurality of search results to be ranked. In another case, since thousands of search results may be matched for one search term, a preset number of search results with the highest matching degree with the target search term among the search results matched with the target search term may be determined as a plurality of search results to be ranked corresponding to the target search term.
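The two cases above can be sketched as follows. This is only an illustration: the function name `select_candidates`, the `(result_id, match_score)` tuples and the example scores are hypothetical, and the matching degree is assumed to be available as a single number per result.

```python
import heapq

def select_candidates(matches, k=None):
    """Pick the search results to be ranked.

    matches: list of (result_id, match_score) pairs.
    k: preset number of results to keep; None keeps every matched result.
    """
    if k is None:
        return [result_id for result_id, _ in matches]
    # Keep the k results with the highest matching degree.
    top = heapq.nlargest(k, matches, key=lambda m: m[1])
    return [result_id for result_id, _ in top]

matches = [
    ("apple-company-site", 0.91),
    ("apple-nutrition", 0.84),
    ("apple-planting", 0.77),
    ("apple-pie-recipe", 0.40),
]
candidates = select_candidates(matches, k=3)
```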

Step 202, determining the relevance characteristics of each search result and the target search terms.

The relevance feature represents the relevance between a search result and the target search term. The relevance features of each search result with respect to the target search term can be computed by applying a relevance calculation algorithm to the search result. For example, the full content, the Uniform Resource Locator (URL), the anchor text, the title, and the like of a search result may each be processed by a relevance calculation algorithm to obtain a plurality of relevance features of that search result with respect to the target search term. The relevance calculation algorithm may be term frequency-inverse document frequency (TF-IDF), a weighting technique commonly used in information retrieval and data mining, BM25, a language model for information retrieval (LMIR), the web page ranking algorithm PageRank, or the like. In addition, the number of in-links and the number of out-links of a search result can also be used as relevance features of the search result with respect to the target search term.
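As one illustration of such a relevance calculation, the following sketch computes a BM25 score for one field (for example, the title) of each search result. The idf variant with 1 added inside the logarithm and the parameter values `k1=1.5` and `b=0.75` are common choices assumed here, not values taken from the application; the tokenized example documents are likewise hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with BM25."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

titles = [
    ["apple", "iphone", "store"],
    ["apple", "fruit", "nutrition", "apple"],
    ["banana", "fruit"],
]
title_bm25 = bm25_scores(["apple"], titles)
```

Running the same function over other fields, such as the body text or the anchor text, would yield further relevance features for each search result.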

Step 203, determining the diversity feature corresponding to each search result based on the diversity feature extraction model.

The diversity feature indicates the number of meanings of the target search term that the corresponding search result covers.

For the plurality of search results to be ranked, the data feature corresponding to each search result can be extracted by a preset data feature extraction model. For example, after the search term "apple" is queried, the obtained search results may include the website of the Apple company, the nutritional value of apples, a method for planting apples, and the like. Suppose the Apple company website includes display images of its products, the nutritional value of apples is introduced in a document, and the planting method is introduced in a video. The product display images from the Apple company website can then be input into a pre-trained image feature extraction model to obtain corresponding image features; the document introducing the nutritional value of apples can be input into a pre-trained document feature extraction model to obtain corresponding document features; and the video introducing the apple planting method can be input into a pre-trained video feature extraction model to obtain corresponding video features.

After the data feature corresponding to each search result is obtained, the target search term and the data features corresponding to the search results can be input into a pre-trained diversity feature extraction model, which outputs the diversity feature corresponding to each search result. The diversity feature extraction model may be a Graph Attention Network (GAT). The training process of the diversity feature extraction model is not described at this point.

For determining the diversity characteristic corresponding to each search result, the further processing may include:

determining the data characteristics corresponding to each search result; establishing a first full-connection graph corresponding to a plurality of search results to be ordered; inputting data characteristics corresponding to nodes connected with each edge in the first full-connected graph into a weight calculation unit in the diversity characteristic extraction model, determining a weight value corresponding to each edge, and obtaining a second full-connected graph after the weight value is determined; and inputting the second full-connection graph into a diversity feature extraction unit in the diversity feature extraction model to obtain diversity features corresponding to each search result.

The number of nodes in the first fully connected graph equals the number of search results to be ranked; the nodes correspond one-to-one to the search results, and each node is the data feature of the corresponding search result. In the first fully connected graph, there is an edge between every two nodes, and the weight value of each edge represents the degree of association between the two nodes it connects, that is, between the search results corresponding to those nodes. After the first fully connected graph is established, the weight value of each edge may be initialized, for example, set to 1. FIG. 3 is a schematic diagram of a first fully connected graph provided by an embodiment of the present application. In FIG. 3, there are five search results to be ranked, corresponding to nodes d₀-d₄ respectively, and the weight value of the edge between every two nodes is 1.
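A minimal sketch of this construction for the five-node example follows. The function name `build_first_graph` and the two-dimensional feature values are hypothetical; the graph is stored as an adjacency matrix in which every entry, including the diagonal, is initialized to 1, which is an assumption made for simplicity.

```python
def build_first_graph(data_features):
    """Build the first fully connected graph from per-result data features.

    Returns the node list (one data feature per search result) and an
    adjacency matrix whose edge weights are all initialized to 1.
    """
    n = len(data_features)
    nodes = list(data_features)
    adjacency = [[1.0] * n for _ in range(n)]
    return nodes, adjacency

# Five search results d0..d4, each with a toy two-dimensional data feature.
data_features = [[0.1, 0.3], [0.2, 0.1], [0.9, 0.4], [0.5, 0.5], [0.0, 0.7]]
nodes, adjacency = build_first_graph(data_features)
```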

After the first full-connection graph is obtained, the first full-connection graph can be input into the graph attention network, and diversity characteristics corresponding to each search result are output by the graph attention network. The graph attention network comprises a weight calculation unit and a diversity feature extraction unit, wherein the weight calculation unit can be a Multilayer Perceptron (MLP), the diversity feature extraction unit can be composed of at least one layer of graph neural network, the Multilayer Perceptron included in the graph attention network can be subsequently called as a first Multilayer Perceptron, and the number of layers of the graph neural network can be preset by technicians.

After the first fully connected graph is input into the graph attention network, the weight value of each edge can be updated by the first multilayer perceptron in the graph attention network. For each edge, the data features of the search results corresponding to the two nodes connected by that edge may be input into the first multilayer perceptron, which outputs the updated weight value of the edge. Once the updated weight value of every edge has been determined, the second fully connected graph is obtained; that is, the second fully connected graph is the first fully connected graph with updated edge weights. If the weight values of the edges between one node and the other nodes are high, the search result corresponding to that node may cover more meanings of the search term. After the second fully connected graph is obtained, it can be input into the diversity feature extraction unit, which outputs the diversity feature corresponding to each search result.
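The edge-weight update and the subsequent feature extraction can be sketched as below. This is not the trained model of the application: the multilayer perceptron uses random, untrained weights, and three details are assumptions commonly found in graph attention networks rather than statements from the text, namely concatenating the two node features as the perceptron input, normalizing each node's edge weights with a softmax, and using a single weighted-sum aggregation layer as the diversity feature extraction unit. All function names are hypothetical.

```python
import math
import random

def linear(weights, bias, x):
    """Apply one dense layer: weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def make_mlp(in_dim, hidden_dim, rng):
    """Two-layer perceptron with ReLU and a scalar output (untrained)."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)]
    b1 = [0.0] * hidden_dim
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]]
    b2 = [0.0]
    def mlp(x):
        hidden = [max(0.0, h) for h in linear(w1, b1, x)]
        return linear(w2, b2, hidden)[0]
    return mlp

def update_edge_weights(features, mlp):
    """Second fully connected graph: one learned weight per edge."""
    n = len(features)
    # Assumption: the perceptron input is the concatenation of both node features.
    raw = [[mlp(features[i] + features[j]) for j in range(n)] for i in range(n)]
    weights = []
    for row in raw:
        peak = max(row)
        exps = [math.exp(r - peak) for r in row]
        total = sum(exps)
        weights.append([e / total for e in exps])  # softmax over each node's edges
    return weights

def aggregate(features, weights):
    """One aggregation layer: each node becomes a weighted sum of all nodes."""
    n, dim = len(features), len(features[0])
    return [[sum(weights[i][j] * features[j][k] for j in range(n)) for k in range(dim)]
            for i in range(n)]

features = [[0.1, 0.3], [0.2, 0.1], [0.9, 0.4]]
edge_mlp = make_mlp(in_dim=4, hidden_dim=8, rng=random.Random(0))
edge_weights = update_edge_weights(features, edge_mlp)
diversity_features = aggregate(features, edge_weights)
```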

Step 204, determining the ranking indication value corresponding to each search result based on the relevance feature and the diversity feature corresponding to each search result.

After obtaining the correlation characteristic and diversity characteristic corresponding to each search result, the rank indication value of each search result may be calculated according to the correlation characteristic and diversity characteristic corresponding to each search result.

For each search result, inputting the relevance characteristics corresponding to the search result into a relevance score calculation module to obtain relevance scores corresponding to the relevance characteristics; determining data characteristics corresponding to each search result, inputting the data characteristics, diversity characteristics and target search words corresponding to the search results into a diversity score calculation module to obtain diversity scores corresponding to the diversity characteristics; and carrying out weighted summation on the relevance scores and the diversity scores to obtain a sorting indication value corresponding to the search results.
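The weighted summation and the final ordering can be sketched as follows. The relevance and diversity scores are assumed to have been produced already by the two score calculation modules; the weight `alpha`, the function names and the example values are hypothetical.

```python
def ranking_values(relevance_scores, diversity_scores, alpha=0.5):
    """Weighted sum of the relevance score and the diversity score per result."""
    return [alpha * r + (1 - alpha) * d
            for r, d in zip(relevance_scores, diversity_scores)]

def rank_results(result_ids, values):
    """Order the results from the highest ranking indication value to the lowest."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return [result_ids[i] for i in order]

result_ids = ["a", "b", "c"]
relevance_scores = [0.9, 0.6, 0.8]
diversity_scores = [0.1, 0.9, 0.5]
values = ranking_values(relevance_scores, diversity_scores, alpha=0.5)
ranked = rank_results(result_ids, values)
```

In this toy example the result with the highest relevance alone ("a") is ranked last, because its diversity score is low.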

The relevance score calculation module may be a multilayer perceptron, hereinafter referred to as the second multilayer perceptron. The diversity score calculation module may also be a multilayer perceptron, hereinafter referred to as the third multilayer perceptron. It should be noted that the first, second and third multilayer perceptrons are different multilayer perceptrons.

As shown in FIG. 4, for example, 18 relevance features r₁-r₁₈ may be obtained for each search result. The 18 relevance features r₁-r₁₈ corresponding to each search result are input into the second multilayer perceptron, which outputs the relevance score corresponding to that search result. After the first fully connected graph corresponding to the search results to be ranked is obtained (not shown in FIG. 4), it may be input into the graph attention network, which outputs the diversity feature corresponding to each search result. The data feature, the diversity feature and the target search term corresponding to each search result are then input into the third multilayer perceptron, which outputs the diversity score corresponding to that search result.

In FIG. 4, q is the target search term, d is a search result, r is a relevance feature, L is the number of layers of the attention network, K is the number of attention heads, and MLP is a multi-layer perceptron. The corresponding calculation proceeds as follows:

An all-ones adjacency matrix A ∈ ℝ^(n×n) is first constructed to represent the first fully-connected graph, where n is the number of search results to be ranked and ℝ denotes the real number field. A learnable transformation can then be used to change the dimension of the data features of the search results, for example transforming the embedded representation of a search result from F (100) dimensions to F' (256) dimensions, to enhance the expressive power of the model. The weight of the edge between any two nodes is then calculated from an attention score.

Here, the superscript denotes the layer index of the graph neural network. d_i denotes the i-th node and d_j the j-th node in the fully-connected graph input at layer 0. E⁽⁰⁾(d_i) and E⁽⁰⁾(d_j) denote the features corresponding to the i-th node and the j-th node in the fully-connected graph input at layer 0; E^(l)(d_i) and E^(l)(d_j) denote the features corresponding to the i-th node and the j-th node in the fully-connected graph input at layer l.

The weight of each edge is calculated by the first multi-layer perceptron (MLP), whose activation function may be the rectified linear unit (ReLU), from the i-th node and the j-th node in the fully-connected graph input at layer l.
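The graph construction, feature projection and edge scoring described above can be sketched as follows. Concatenating the two node feature vectors before the perceptron, the hidden width of 16 and all helper names are assumptions made for illustration; the application specifies only an all-ones adjacency matrix, a projection from F = 100 to F' = 256 dimensions, and a perceptron with ReLU that scores each edge.

```python
import random


def all_ones_adjacency(n):
    """Adjacency matrix of a fully-connected graph over n search results."""
    return [[1] * n for _ in range(n)]


def linear(x, weight):
    """Multiply a weight matrix (one row per output dimension) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def edge_score(h_i, h_j, w_hidden, w_out):
    """Assumed edge scorer: ReLU perceptron over the concatenated end-node features."""
    hidden = [max(0.0, v) for v in linear(h_i + h_j, w_hidden)]
    return linear(hidden, w_out)[0]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n, f_in, f_out, hidden_dim = 3, 100, 256, 16
features = [[rng.random() for _ in range(f_in)] for _ in range(n)]
projection = random_matrix(f_out, f_in, rng)          # learnable F -> F' transformation
w_hidden = random_matrix(hidden_dim, 2 * f_out, rng)  # perceptron hidden layer
w_out = random_matrix(1, hidden_dim, rng)             # perceptron output layer

adjacency = all_ones_adjacency(n)
projected = [linear(x, projection) for x in features]
edge_weights = [
    [edge_score(projected[i], projected[j], w_hidden, w_out) if adjacency[i][j] else 0.0
     for j in range(n)]
    for i in range(n)
]
```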

After the weight of each edge is obtained, the weights may be normalized. The normalized weight of the edge between the i-th node and the j-th node is computed from the pre-normalization weight of the edge between the j-th node and the i-th node and the pre-normalization weights of the edges between the j-th node and the k-th node, where k runs up to the number of nodes in the first fully-connected graph.
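One common way to normalize edge weights in attention networks is a softmax over the edges of each node. The sketch below assumes that form; the application states only that the weights are normalized over the nodes of the graph, so the exponential and the row-wise direction are assumptions, as is the function name.

```python
import math


def normalize_edge_weights(raw):
    """Row-wise softmax (assumed form): each node's edge weights become positive and sum to 1."""
    normalized = []
    for row in raw:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        normalized.append([e / total for e in exps])
    return normalized


raw_weights = [[0.2, 1.0, -0.5], [0.0, 0.0, 0.0], [2.0, 1.0, 0.0]]
alpha = normalize_edge_weights(raw_weights)
```

Equal raw weights, as in the second row, produce a uniform distribution over the node's edges.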

From the above formulas, the update of the input of each layer is finally obtained, where K is the number of heads in the multi-head attention mechanism, taken here as 4, and the i-th node is the i-th node in the fully-connected graph input at layer l. The output of the last layer is the diversity feature corresponding to the i-th node in the fully-connected graph.
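A layer update with K = 4 attention heads can be sketched as below. Averaging the heads is an assumption made for illustration (concatenating them is an equally common choice), and the function names are hypothetical; the application states only that a multi-head attention mechanism with K heads updates the node features at each layer.

```python
def aggregate(alpha, features):
    """Attention-weighted sum of node features for every node in the graph."""
    n, dim = len(features), len(features[0])
    return [
        [sum(alpha[i][j] * features[j][d] for j in range(n)) for d in range(dim)]
        for i in range(n)
    ]


def multi_head_update(alphas, head_features):
    """Average the aggregated features of all heads (assumed combination rule)."""
    per_head = [aggregate(a, f) for a, f in zip(alphas, head_features)]
    k, n, dim = len(per_head), len(per_head[0]), len(per_head[0][0])
    return [
        [sum(per_head[h][i][d] for h in range(k)) / k for d in range(dim)]
        for i in range(n)
    ]


features = [[1.0, 0.0], [0.0, 1.0]]
uniform = [[0.5, 0.5], [0.5, 0.5]]  # normalized edge weights of a 2-node graph
num_heads = 4
updated = multi_head_update([uniform] * num_heads, [features] * num_heads)
```

With uniform attention every node receives the mean of all node features, which makes the behavior easy to check by hand.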

For each search result, after the relevance score and the diversity score corresponding to the search result are obtained, the ranking indication value corresponding to each search result can be calculated according to the relevance score and the diversity score corresponding to the search result. For example, a technician may preset a weight coefficient corresponding to each of the relevance score and the diversity score, and after obtaining the relevance score and the diversity score corresponding to the search result, may perform weighted summation on the relevance score and the diversity score according to the preset weight coefficient, thereby obtaining a ranking indication value corresponding to the search result. With continued reference to fig. 4, in one possible scenario, the calculation of the rank indication value may be implemented by the following equation:

S(d_i) = λ × S^rel(d_i) + (1 − λ) × S^div(d_i)

where λ is a preset weight coefficient, S^rel(d_i) is the relevance score corresponding to the search result output by the second multi-layer perceptron, S^div(d_i) is the diversity score corresponding to the search result output by the third multi-layer perceptron, and S(d_i) is the calculated ranking indication value.
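The weighted summation can be checked with a small worked example. The function name and the sample values are illustrative assumptions; only the formula comes from the text.

```python
def ranking_indication_value(relevance_score, diversity_score, lam=0.5):
    """Weighted sum of relevance and diversity scores with preset coefficient lam."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * relevance_score + (1.0 - lam) * diversity_score


# With lam = 0.75: 0.75 * 0.8 + 0.25 * 0.4 = 0.6 + 0.1 = 0.7
value = ranking_indication_value(0.8, 0.4, lam=0.75)
```

A larger λ favors relevance, and a smaller λ favors coverage of more meanings of the search term.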

Step 205, ranking the plurality of search results based on the ranking indication value corresponding to each search result.

After the ranking indication value corresponding to each search result is obtained, the plurality of search results are sorted according to their ranking indication values, for example in descending order of ranking indication value. The sorted search results are then sent to the user terminal.

In this way, after receiving the sorted search results, the user terminal can display them in the corresponding order. In the embodiment of the application, the relevance feature corresponding to each of the plurality of search results to be ranked, and the diversity feature indicating how many of the meanings of the target search term the search result covers, are determined, and the corresponding ranking indication value is then determined from the relevance feature and the diversity feature of each search result. Therefore, when the search results are ranked by their ranking indication values, both the relevance between each search result and the target search term and the number of meanings of the target search term covered by the search result are taken into account. As a result, the top-ranked search results cover more meanings of the target search term, which raises the probability of hitting the user's search intention and thereby improves search efficiency.
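The sorting in step 205 can be sketched as follows, assuming the ranking indication values have already been computed. The function name and the sample data are illustrative.

```python
def rank_results(results, indication_values):
    """Return results ordered from the highest to the lowest ranking indication value."""
    order = sorted(range(len(results)), key=lambda i: indication_values[i], reverse=True)
    return [results[i] for i in order]


ranked = rank_results(["doc_a", "doc_b", "doc_c"], [0.2, 0.9, 0.5])
```

Python's sort is stable, so results with equal ranking indication values keep their original relative order.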

Fig. 5 is a method for training a diversity feature extraction model according to an embodiment of the present application, and referring to fig. 5, the method includes:

step 501, a reference search result sequence corresponding to the sample search term is obtained, and the reference search result sequence includes a plurality of search results matched with the sample search term.

Before the diversity feature extraction model is trained, a sample search term and a reference search result sequence corresponding to the sample search term can be preset. The reference search result sequence comprises a plurality of search results matching the sample search term; search results ranked nearer the front of the reference search result sequence cover more meanings of the sample search term, and search results ranked nearer the back cover fewer.

Step 502, generating a positive sample sequence and a negative sample sequence based on the reference search result sequence.

The order of the first search results in the positive sample sequence is the same as their order in the reference search result sequence, and the order of the second search results in the negative sample sequence is different from their order in the reference search result sequence.

For a sample search term, one positive sample sequence and a plurality of negative sample sequences may be generated from the reference search result sequence corresponding to the sample search term. The search results included in the positive sample sequence are search results in the reference search result sequence, and their order in the positive sample sequence is the same as their order in the reference search result sequence. The search results included in a negative sample sequence are also search results in the reference search result sequence, but their order in the negative sample sequence is different from their order in the reference search result sequence.
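One possible way to build the sample sequences is sketched below. Shuffling the whole reference sequence to obtain negatives, and the function name, are assumptions made for illustration; the application requires only that a negative sequence contain reference results in a different order.

```python
import random


def make_sample_sequences(reference, num_negatives, rng):
    """Return one positive sequence (reference order) and shuffled negative sequences."""
    positive = list(reference)
    negatives = []
    if len(set(reference)) < 2:
        return positive, negatives  # no different ordering exists
    while len(negatives) < num_negatives:
        candidate = list(reference)
        rng.shuffle(candidate)
        if candidate != positive:  # keep only orderings that differ from the reference
            negatives.append(candidate)
    return positive, negatives


reference_sequence = ["d1", "d2", "d3", "d4"]
positive_seq, negative_seqs = make_sample_sequences(reference_sequence, 3, random.Random(0))
```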

Step 503, determining a first diversity score corresponding to the search result included in the positive sample sequence and determining a second diversity score corresponding to the search result included in the negative sample sequence based on the diversity extraction model.

For the positive sample sequence corresponding to the sample search term, the data feature corresponding to each first search result included in the positive sample sequence can be determined, and a first fully-connected graph corresponding to the first search results is determined. The first fully-connected graph is then input into the diversity extraction model to be trained, which outputs the diversity feature corresponding to each first search result. After the diversity feature corresponding to each first search result is obtained, the sample search term, the diversity feature and the data feature corresponding to the first search result may be input into the third multi-layer perceptron, which outputs the first diversity score corresponding to each first search result.

For a negative sample sequence corresponding to the sample search term, the data feature corresponding to each second search result included in the negative sample sequence can be determined, and a first fully-connected graph corresponding to the second search results is determined. The first fully-connected graph is then input into the diversity extraction model to be trained, which outputs the diversity feature corresponding to each second search result. After the diversity feature corresponding to each second search result is obtained, the sample search term, the diversity feature and the data feature corresponding to the second search result may be input into the third multi-layer perceptron, which outputs the second diversity score corresponding to each second search result.

Step 504, determining a first diversity reference value corresponding to the search result included in the positive sample sequence, and determining a second diversity reference value corresponding to the search result included in the negative sample sequence.

The first diversity reference value is a diversity indication value, calculated by a diversity evaluation algorithm, corresponding to the search results included in the positive sample sequence. The second diversity reference value is a diversity indication value, calculated by the diversity evaluation algorithm, corresponding to the search results included in the negative sample sequence. The diversity evaluation algorithm may be α-normalized discounted cumulative gain (α-NDCG).
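For orientation, the commonly used definition of α-NDCG can be sketched as below. This follows the standard metric rather than anything specific to the application: it assumes that, for the evaluation metric, each result is described by the set of subtopics (meanings) it covers, and it approximates the ideal ordering greedily, as is usual for this metric. All names are illustrative.

```python
import math


def alpha_dcg(ranking, alpha=0.5):
    """ranking: list of sets of subtopic ids, one set per result, in rank order."""
    seen = {}
    total = 0.0
    for rank, subtopics in enumerate(ranking, start=1):
        # A subtopic already covered c times contributes (1 - alpha) ** c.
        gain = sum((1.0 - alpha) ** seen.get(t, 0) for t in subtopics)
        total += gain / math.log2(rank + 1)
        for t in subtopics:
            seen[t] = seen.get(t, 0) + 1
    return total


def greedy_ideal(ranking, alpha=0.5):
    """Greedy approximation of the ordering that maximizes alpha-DCG."""
    remaining, seen, ideal = list(ranking), {}, []
    while remaining:
        best = max(remaining, key=lambda s: sum((1.0 - alpha) ** seen.get(t, 0) for t in s))
        remaining.remove(best)
        ideal.append(best)
        for t in best:
            seen[t] = seen.get(t, 0) + 1
    return ideal


def alpha_ndcg(ranking, alpha=0.5):
    ideal_value = alpha_dcg(greedy_ideal(ranking, alpha), alpha)
    return alpha_dcg(ranking, alpha) / ideal_value if ideal_value > 0 else 0.0


diverse = [{1}, {2}, {1}]    # the second result adds a new meaning early
redundant = [{1}, {1}, {2}]  # the second result repeats the first meaning
```

An ordering that introduces new meanings earlier receives a higher value than one that repeats a meaning already covered.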

Step 505, training the diversity extraction model based on the first diversity score, the second diversity score, the first diversity reference value and the second diversity reference value.

After the first diversity score, the second diversity score, the first diversity reference value and the second diversity reference value are obtained, a loss value can be determined from them, and the diversity extraction model is then trained according to the loss value. The symbols of the corresponding loss function are as follows:

The loss is taken over the set of all sample search terms; q represents a sample search term; S_q is the training sample set of the sample search term q; s is a training sample; r₁ is a positive sequence sample and r₂ is a negative sequence sample. ΔM = M(r₁) − M(r₂) represents the weight of the training sample, where M(r₁) is the calculated first diversity reference value and M(r₂) is the calculated second diversity reference value; y_s = 1 when ΔM > 0, and y_s = 0 otherwise. The loss function further uses the first diversity score and the second diversity score calculated by the model, together with the quantities 1 − y_s and 1 − P(r₁, r₂).
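One form of loss consistent with the symbols defined above is a pairwise cross-entropy weighted by |ΔM|, with P(r₁, r₂) taken as the logistic function of the score difference. Both of these choices, and the function name, are assumptions made for illustration; the exact formula is not reproduced in this text.

```python
import math


def pairwise_loss(score_pos, score_neg, metric_pos, metric_neg):
    """Assumed loss for one training sample s = (r1, r2).

    score_pos, score_neg: diversity scores the model assigns to r1 and r2.
    metric_pos, metric_neg: diversity reference values M(r1) and M(r2).
    """
    delta_m = metric_pos - metric_neg
    y = 1.0 if delta_m > 0 else 0.0
    p = 1.0 / (1.0 + math.exp(-(score_pos - score_neg)))  # assumed P(r1, r2)
    eps = 1e-12  # guards against log(0)
    return -abs(delta_m) * (y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps))


agreeing = pairwise_loss(2.0, 0.0, 0.9, 0.5)     # model prefers the better sequence
disagreeing = pairwise_loss(0.0, 2.0, 0.9, 0.5)  # model prefers the worse sequence
```

Under this form, a pair whose two sequences have the same reference value contributes nothing, and the loss is smaller when the model's preference agrees with the reference values.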

It should be noted that, during training, the diversity extraction model and the multilayer perceptron can be trained together through a gradient descent algorithm, and when the overall model meets the training completion condition, the training of the diversity extraction model can be determined to be completed.

With the above method for training the diversity feature extraction model, the meanings covered by each search result do not need to be used as training labels. Therefore, when training samples are constructed, the meanings covered by each search result do not need to be determined, which improves the efficiency of constructing training samples and, in turn, the efficiency of training the diversity feature extraction model.

In the embodiment of the application, the relevance feature corresponding to each of the plurality of search results to be ranked, and the diversity feature indicating how many of the meanings of the target search term the search result covers, are determined, and the corresponding ranking indication value is then determined from the relevance feature and the diversity feature of each search result. Therefore, when the search results are ranked by their ranking indication values, both the relevance between each search result and the target search term and the number of meanings of the target search term covered by the search result are taken into account. As a result, the top-ranked search results cover more meanings of the target search term, which raises the probability of hitting the user's search intention and thereby improves search efficiency.

All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.

Fig. 6 is an apparatus for ranking search results according to an embodiment of the present application, where the apparatus may be a server in the foregoing embodiment, and the apparatus includes:

an obtaining module 610, configured to obtain multiple search results to be ranked corresponding to a target search term;

a determining module 620, configured to determine a relevance feature of each search result to the target search term; determining diversity characteristics corresponding to each search result based on a diversity characteristic extraction model, wherein the diversity characteristics are used for indicating the number of corresponding meanings of the corresponding search results including the target search words; determining a ranking indication value corresponding to each search result based on the correlation characteristic and the diversity characteristic corresponding to each search result;

a sorting module 630, configured to sort the plurality of search results to be sorted based on the sorting indication value corresponding to each search result.

Optionally, the determining module 620 is configured to:

determining the data characteristics corresponding to each search result;

Optionally, the determining module 620 is configured to:

determining the data characteristics corresponding to each search result;

Optionally, the apparatus further comprises a training module, configured to:

Optionally, the sorting module 630 is configured to:

and sorting the plurality of search results according to the order of the sorting indication values from high to low.

It should be noted that: in the apparatus for sorting search results provided in the foregoing embodiment, when sorting search results, only the division of each function module is illustrated, and in practical applications, the function distribution may be completed by different function modules according to needs, that is, the internal structure of the device is divided into different function modules, so as to complete all or part of the functions described above. In addition, the apparatus for ranking search results and the method for ranking search results provided in the above embodiments belong to the same concept, and specific implementation processes thereof are described in detail in the method embodiments and are not described herein again.

FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application. The computer device may be the server in the foregoing embodiments. The computer device 700 may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPUs) 701 and one or more memories 702, where the memory 702 stores at least one instruction that is loaded and executed by the processor 701 to implement the methods provided by the foregoing method embodiments. The computer device 700 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may further include other components for implementing device functions, which are not described here again.

In an exemplary embodiment, a computer-readable storage medium, such as a memory, including instructions executable by a processor in a terminal to perform the method of ranking search results in the above-described embodiments is also provided. The computer readable storage medium may be non-transitory. For example, the computer-readable storage medium may be a read-only memory (ROM), a Random Access Memory (RAM), a magnetic tape, a floppy disk, an optical data storage device, and the like.

In an exemplary embodiment, a computer program product is also provided, which includes at least one instruction that is loaded and executed by a processor to implement the method for ranking search results in the above embodiments.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

It should be noted that information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, displayed data, etc.), and signals (including but not limited to signals transmitted between a user terminal and other equipment, etc.) referred to in the present application are authorized by a user or are sufficiently authorized by various parties, and the collection, use, and processing of the relevant data need to comply with relevant laws and regulations and standards in relevant countries and regions. For example, data referred to in this application (e.g., targeted search terms, search results, etc.) is obtained with sufficient authorization.

The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

1. A method of ranking search results, the method comprising:

and sequencing the plurality of search results to be sequenced based on the sequencing indication value corresponding to each search result.

2. The method of claim 1, wherein determining diversity features corresponding to each search result based on the diversity feature extraction model comprises:

determining the data characteristics corresponding to each search result;

3. The method of claim 1, wherein the determining the ranking indication value corresponding to each search result based on the relevance feature and the diversity feature corresponding to each search result comprises:

determining the data characteristics corresponding to each search result;

4. The method of claim 1, wherein before determining the diversity feature corresponding to each search result based on the diversity feature extraction model, the method further comprises:

and training the diversity extraction model based on the first diversity score, the second diversity score, the first diversity reference value and the second diversity reference value.

5. The method according to claim 1, wherein the sorting the plurality of search results to be sorted based on the sorting indication value corresponding to each search result comprises:

6. An apparatus for ranking search results, the apparatus comprising:

7. The apparatus of claim 6, wherein the determining module is configured to:

determining the data characteristics corresponding to each search result;

inputting data characteristics corresponding to nodes connected with each edge in the first full-connection graph into a weight calculation unit in the diversity characteristic extraction model, determining a weight value corresponding to each edge, and obtaining a second full-connection graph after the weight value is determined;

8. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction that is loaded and executed by the processor to perform operations performed by the method of ranking search results of any of claims 1 to 5.

9. A computer-readable storage medium having stored therein at least one instruction which is loaded and executed by a processor to perform operations performed by a method of ranking search results according to any of claims 1 to 5.

10. A computer program product comprising at least one instruction loaded and executed by a processor to perform operations performed by a method of ranking search results according to any of claims 1 to 5.