CN112182155B - Search result diversification method based on a generative adversarial network - Google Patents

Search result diversification method based on a generative adversarial network

Info

Publication number
CN112182155B
CN112182155B (application CN202011024084.4A)
Authority
CN
China
Prior art keywords
document
scoring
generator
diversified
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011024084.4A
Other languages
Chinese (zh)
Other versions
CN112182155A (en)
Inventor
窦志成 (Zhicheng Dou)
刘炯楠 (Jiongnan Liu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Renmin University of China
Original Assignee
Renmin University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Renmin University of China
Priority claimed from application CN202011024084.4A
Publication of CN112182155A
Application granted
Publication of CN112182155B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3346 - Query execution using probabilistic model
    • G06F16/3344 - Query execution using natural language analysis
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G06F16/953 - Querying, e.g. by the use of web search engines
    • G06F16/9538 - Presentation of query results
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G06N3/049 - Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application realizes a training method for search result diversification based on a generative adversarial network (GAN), using techniques from the field of artificial intelligence. After a query word is given, a corresponding candidate document set is defined; a sampler, a generator and a discriminator are arranged in sequence along a logical path, and diversified scoring functions are placed in the discriminator and the generator, so that training proceeds through a positive-feedback process. By introducing a generative adversarial network and combining an explicit model and an implicit model through it, the final generator can produce better diversified document sequences.

Description

Search result diversification method based on a generative adversarial network
Technical Field
The application relates to the field of artificial intelligence, and in particular to a search result diversification method based on a generative adversarial network (GAN).
Background
Search result diversification is an effective way of handling ambiguous user queries. Most of the currently mainstream diversification algorithms are supervised methods, and a high-quality data set is required to train a search result diversification model. The primary goal of search result diversification is to have the ranked list returned by the search engine cover as many sub-topics of the user's query as possible. Researchers have proposed a series of search result diversification algorithms. The main flow of these algorithms is: when a user issues a query word, repeatedly select, according to a diversified scoring function, the best diversified document given the currently selected document sequence and append it to that sequence, until the document sequence is long enough. The models can be broadly divided into two types: implicit models and explicit models. The implicit model focuses on how novel a document is; dissimilarity between the document being scored and the already selected documents is considered in its diversified scoring function. The explicit model, by contrast, explicitly models the sub-topics of the query and scores documents by how well they cover them.
Existing research results show that supervised learning methods outperform unsupervised methods, but supervised learning requires high-quality data samples for training. Since there are a large number of documents in the training set while few documents are related to each sub-topic, high-quality data samples are difficult to obtain. Existing supervised methods work around this problem through hand-written rules, but with shortcomings. In [2], the first 20 documents of the ideal ranking are used; while of high quality, they are few in number, so the model may be under-fitted. In [3], positive and negative samples are selected according to the evaluation index adopted, which introduces a hyper-parameter that depends on the range of that index. The scarcity of high-quality training samples can lead to insufficient or biased training, affecting the final effect. Meanwhile, existing models can essentially be divided into explicit models and implicit models; the two ideas have not been combined, so not all available information is exploited. If the explicit model and the implicit model were combined in some way, the diversification effect of search results could potentially be improved.
Disclosure of Invention
Therefore, the application provides a training method for search result diversification based on a generative adversarial network, which comprises the following steps:
In the training process, after a query word in a training library is given, a corresponding candidate document set is defined. A sampler, a generator and a discriminator unit are arranged in sequence along a logical path; the discriminator's diversified scoring function is placed in the discriminator and the generator's diversified scoring function is placed in the generator, so that training proceeds through a positive-feedback process. A generative adversarial network is introduced around the diversified scoring functions, combining an explicit model and an implicit model. Finally, during use, after the user issues a query word, the generator performs diversified re-ranking of the search results and returns the diversified search results;
Specifically, for a query word q in training, its sub-topics {i_1, i_2, …, i_k} are determined, and the corresponding candidate document set is D = {d_1, d_2, …, d_n}. The sampler first selects documents from the document set D and rearranges them; the rearranged sequence S is input into the generator as prefix data. The generator takes S as the selected document set and sends the several documents D′ with the highest scores under its diversified scoring function to the discriminator as negative samples; the positive sample is the document d selected by the maximized diversified scoring criterion. After receiving the negative document set D′ and the positive document d, the discriminator classifies them and gives feedback to the generator;
This process is formulated as the minimax objective

  min_θ max_φ E_{d∼p_true(·|q,S)}[log D_φ(d|q,S)] + E_{d′∼p_θ(·|q,S)}[log(1 - D_φ(d′|q,S))]

where G is the generator, D is the discriminator, θ is the generator parameter, φ is the discriminator parameter, D_φ is given by a sigmoid function, D_φ(d|q,S) = 1/(1 + exp(-f_φ(d|q,S))), and the generated sample distribution p_θ is given by a softmax function over the generator scores.

Here f_φ is the diversified scoring function in the discriminator and f_θ is the diversified scoring function in the generator. The formulas for optimizing the generator and the discriminator follow from the objective: the discriminator minimizes the logistic loss log(1 + exp(-f_φ(d|q,S))) + log(1 + exp(f_φ(d′|q,S))) over positive documents d and negative documents d′, while the generator maximizes the expected reward E_{d′∼p_θ}[log(1 + exp(f_φ(d′|q,S)))], where log(1 + exp(f_φ(d|q,S))) is the feedback of the discriminator to the generator.
The discriminator's diversified scoring function is implemented as follows: define the scoring document sent by the generator as d_t, the query as q, the sub-topics as I_q = {i_1, i_2, …, i_K}, and the selected document sequence as S = {d_1, d_2, …, d_{t-1}}. A traditional retrieval model is used to retrieve the query word and each sub-topic, and the top-ranked documents are concatenated into a pseudo-document. The document, the pseudo-document corresponding to the query and the pseudo-documents corresponding to the sub-topics are embedded with a doc2vec model, generating a vector e_d for the scoring document, a vector e_q for the query and a vector e_i for each sub-topic. Relevance vectors x_{d,q} between the scoring document and the query and x_{d,i} between the scoring document and the sub-topics are further modelled. After feature extraction, the discriminator's diversified scoring function is obtained:

  f_φ(d_t|q,S) = λ·S_rel(d_t, q) + (1 - λ)·Σ_k A(i_k|S)·S_sub(d_t, i_k)

where S_rel and S_sub score the relevance of the document to the query and to each sub-topic, and A(i|S) is the sub-topic distribution under the selected documents.
the calculation process of the sub-topic distribution condition A (i|S) under the scoring document is that: firstly, using a recurrent neural network to synthesize the selected scoring document:
LSTM is a neuron function of long-term and short-term memory network, and after passing through a layer of recurrent neural network, a distributed representation h of the selected document is obtained t-1 If the whole information of the past document is contained, the method for calculating the sub-topic distribution is as follows:
for a pair ofFurther considering information about the relevance between the scoring document and the sub-topics:
and finally obtaining a complete diversified scoring function calculation method of the determiner.
The traditional retrieval model is the BM25 model, and the extracted features comprise the TF-IDF model, the BM25 model, the LMIR model, the PageRank score, the web page in-degree and the web page out-degree.
The generator's diversified scoring function is the diversified scoring function of the implicit model R-LTR, implemented as follows: in addition to the relevance vector between the scoring document and the query extracted from the scoring documents and query features, a relation vector R_ij between scoring documents is modelled using four dimensions: sub-topic diversity, text diversity, title diversity and anchor-text diversity;

the diversified scoring function of the implicit model R-LTR is specifically

  f_θ(d_i|q,S) = ω_r · x_{d_i,q} + ω_d · h_S(R_i),  where [h_S(R_i)]_k = max_{d_j∈S} R_{ijk}

in which R_ijk is the dissimilarity between documents considered in dimension k; for d_i, the maximum dissimilarity with the previously selected documents represents its novelty. ω_r and ω_d are trainable parameters.
The sampler is implemented as follows: random sampling is designed to simulate the generator when it generates the scoring document d_t. The random sampling process is: directly select k = 10 documents from the scoring document set and rearrange them so as to maximize the diversified scoring index α-NDCG; the resulting sequence S is used as the input of the generator.
The specific method by which the generator performs diversified re-ranking of the search results is: first, initialize the selected scoring document sequence S to be empty; then select the document d with the highest diversified scoring function f_θ score; if S is long enough, exit the process and return the diversified search results; otherwise, add d to S and return to the previous step.
The application has the technical effects that:
(1) By using a generative adversarial network, the problem that high-quality data samples are difficult to obtain is alleviated to some extent. (2) The explicit model and the implicit model are combined, and information from different dimensions is exploited to improve the coverage of the search results. (3) To provide data to the generator in the generative adversarial network, a sampling algorithm that balances scale and quality is designed.
Drawings
FIG. 1: Diversified scoring function model of the discriminator
Detailed Description
The following is a preferred embodiment of the present application, and the technical solution of the present application is further described with reference to the accompanying drawings; however, the present application is not limited to this embodiment.
To achieve the above object, the present application provides a method for diversifying search results based on a generative adversarial network.
Considering the prior art: search result diversification is an effective way of handling ambiguous user queries. Most of the currently mainstream diversification algorithms are supervised methods, and current diversification algorithms can be divided into explicit models and implicit models according to the information they exploit and the targets they optimize. The main flow of these algorithms is: when a user issues a query word, select the best diversified document under the selected document sequence according to the diversified scoring function, add it to the document sequence, and keep expanding the sequence until it is long enough. The diversified scoring function in a supervised method requires training on a high-quality data set, but selecting such a data set is a challenge for current diversification algorithms, because the data volume is large while few documents are related to the sub-topics. The present application therefore introduces a generative adversarial network into the training process of search result diversification, generating data to replace hand-written rules. The explicit model and the implicit model are combined through the generative adversarial network, improving the diversification effect of the search results.
Search result diversification framework based on a generative adversarial network
Most existing models for search result diversification are based on supervised methods, which require high-quality data for training. The model of the application introduces a generative adversarial network into search result diversification: the generator uses an implicit model that directly compares dissimilarities among documents, while the discriminator uses an explicit model that directly optimizes sub-topic coverage on the negative samples and can therefore provide finer-grained information for the generator to optimize against. In addition, since it is difficult for the generator itself to produce negative samples from scratch, the application requires a sampler to generate prefix data for the generator.
Assume that for a query q, its sub-topics are {i_1, i_2, …, i_k} and the corresponding candidate document set D is {d_1, d_2, …, d_n}. The sampler first selects documents from the document set D and rearranges them; the rearranged sequence S is input into the generator as prefix data. The generator takes S as the selected document set and sends the several documents D′ with the highest scores under its diversified scoring function to the discriminator as negative samples. The positive sample is the document d selected with the maximized diversified scoring criterion. After receiving the negative document set D′ and the positive document d, the discriminator classifies them and gives feedback to the generator, and training proceeds through this positive-feedback process.
The entire training process can be formulated as the minimax objective

  min_θ max_φ E_{d∼p_true(·|q,S)}[log D_φ(d|q,S)] + E_{d′∼p_θ(·|q,S)}[log(1 - D_φ(d′|q,S))]

where G is the generator, D is the discriminator, θ is the generator parameter, φ is the discriminator parameter, D_φ is given by a sigmoid function, and the generated sample distribution p_θ is given by a softmax function.

Here f_φ is the diversified scoring function in the discriminator and f_θ is the diversified scoring function in the generator. With the above formula, the formulas for optimizing the generator and the discriminator follow directly: the discriminator minimizes the logistic loss log(1 + exp(-f_φ(d|q,S))) + log(1 + exp(f_φ(d′|q,S))), and the generator maximizes the expected discriminator feedback over its sample distribution p_θ.
Since it is difficult to compute gradients of a generative adversarial network in a discrete domain, the application adopts the policy-gradient approach from reinforcement learning, sampling from the negative sample set; here log(1 + exp(f_φ(d|q,S))) can be regarded as the feedback from the discriminator to the generator. This feedback includes information not considered by the generator and helps improve the diversification of the generator's search results.
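To make the policy-gradient step concrete, the following is a minimal sketch of how a generator update signal could be estimated under the assumptions above (a softmax sample distribution p_θ over candidate scores, and discriminator feedback log(1 + exp(f_φ))). The scoring functions here are toy stand-ins, not the patented models.

```python
import math
import random


def softmax(scores):
    """Softmax with the usual max-shift for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def generator_gradient_sample(f_theta_scores, f_phi, n_samples=4, rng=random):
    """REINFORCE-style estimate: sample d' ~ p_theta and accumulate the
    discriminator feedback log(1 + exp(f_phi(d'))) per candidate.
    Candidates with more reward mass would have p_theta pushed up."""
    p = softmax(f_theta_scores)
    rewards = [0.0] * len(p)
    for _ in range(n_samples):
        i = rng.choices(range(len(p)), weights=p)[0]
        rewards[i] += math.log(1 + math.exp(f_phi(i))) / n_samples
    return rewards


random.seed(0)
# Three candidates with toy generator scores and toy discriminator scores.
print(generator_gradient_sample([1.0, 2.0, 0.5], lambda i: [0.2, -1.0, 0.1][i]))
```

In a full implementation the reward would weight the gradient of log p_θ(d′|q,S); this sketch only shows how the sampled feedback is attributed to candidates.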
Diversified scoring function used by the discriminator
As an important component of the generative adversarial network, the diversified scoring function in the discriminator directly determines whether the discriminator can effectively separate positive documents from negative documents. In the model of the application, the diversified scoring function of the explicit model DSSA is selected. The reason is that the explicit model uses external information and can therefore provide finer-grained information to the generator than the implicit method, forming positive feedback.
The present application assumes that the current scoring document is d_t, the query is q, the sub-topics are I_q = {i_1, i_2, …, i_K}, and the selected document sequence is S = {d_1, d_2, …, d_{t-1}}.
Regarding feature extraction, the application first needs to embed the documents, the query and the sub-topics, producing vectors e_d, e_q and e_i respectively. Considering that the query and the sub-topics are actually short, consisting of only a few words, the application uses a traditional retrieval model such as BM25 to retrieve the query word and the sub-topics, concatenates the top-ranked documents into a pseudo-document, and then embeds the document, the pseudo-document corresponding to the query and the pseudo-documents corresponding to the sub-topics through the doc2vec model to obtain e_d, e_q and e_i. Since using only the embedded representations may not be accurate enough, the application also models direct relevance vectors between the document and the query and between the document and the sub-topics, x_{d,q} and x_{d,i} respectively. The features employed are shown in the following table:
Name         Description            Length
TF-IDF       TF-IDF model           5
BM25         BM25 model             5
LMIR         LMIR model             5
PAGERANK     PageRank score         1
In-degree    Web page in-degree     1
Out-degree   Web page out-degree    1
After feature extraction, the form of the diversified scoring function is given:

  f_φ(d_t|q,S) = λ·S_rel(d_t, q) + (1 - λ)·Σ_k A(i_k|S)·S_sub(d_t, i_k)

It can be seen that S_rel(d_t, q) and S_sub(d_t, i_k) score the relevance of the document to the query and to the sub-topics respectively. The most critical part of the whole model is A(i|S), the distribution of sub-topics under the currently selected documents. Since the selected document sequence S contains sequence information, the application first uses a recurrent neural network to summarize the selected documents, with the specific formula:

  h_{t-1} = LSTM(e_{d_1}, e_{d_2}, …, e_{d_{t-1}})

LSTM, the neuron function of the long short-term memory network, performs well among recurrent neural networks; after one layer of the recurrent network, a representation h_{t-1} of the selected documents is obtained. The application then has a relatively simple method of calculating the sub-topic distribution:

  A(i_k|S) = exp(h_{t-1}·e_{i_k}) / Σ_j exp(h_{t-1}·e_{i_j})

This can be used as part of the features for computing the sub-topic distribution, but since this part of the features mainly relies on the embedded vectors, it may not be accurate enough; the application therefore also considers the relevance information between the document and the sub-topics, i.e. the relevance vectors x_{d,i}. It can be seen that the final model combines the embedded representation of the document itself with the relevance vector between the document and the query and the relevance vector between the document and the sub-topics.
Diversified scoring function used by the generator
As an important component of the generative adversarial network, the diversified scoring function in the generator directly determines the quality of the documents the generator produces, and thus the behaviour of the model. In the model of the application, the diversified scoring function of the implicit model R-LTR is selected for the generator. The advantages of the implicit model are that it has relatively few parameters and is easy to train; at the same time, it extracts features directly from the documents and needs no external sub-topic information.
The present application assumes that the current scoring document is d_t, the query is q, and the selected document sequence is S = {d_1, d_2, …, d_{t-1}}.
Regarding feature extraction, similar to the explicit method, the application also considers the relevance vector between the document and the query. At the same time, the application models the relation vector R_ij between documents. Considering that when humans compare documents they often extract several pieces of information for comparison, such as the topics or the first sentence of each paragraph, the application adopts four dimensions to model the relation vector, as shown in the following table:
Name                    Description
Sub-topic diversity     Euclidean distance under the SVD model
Text diversity          Cosine similarity of text vectors
Title diversity         Cosine similarity of title vectors
Anchor-text diversity   Cosine similarity of anchor-text vectors
After feature extraction, the application gives the scoring function of the implicit model R-LTR:

  f_θ(d_i|q,S) = ω_r · x_{d_i,q} + ω_d · h_S(R_i),  where [h_S(R_i)]_k = max_{d_j∈S} R_{ijk}
in the model of R-LTR, the degree of novelty of the scored document relative to the selected document is ultimately obtained by comparison of the document to a plurality of features between the documents. Because of the novelty of directly considering the document, the generator can give some information which cannot be considered by the model considering the coverage of the sub-topics, and the information can help the generator to further optimize the negative example document generated by the generator, give feedback to the determiner, and continuously give positive feedback to the determiner. Thereby promoting diversification of search results.
Sampler
The sampler is the component that provides the selected document sequence S to the generator. The quality of S affects the quality of the positive and negative documents, and the selection of S is part of the overall sampling, so the application designs an algorithm that balances quality and quantity.
Considering that the generator is not ideal, the application designs random sampling to simulate the generator when it produces d_t before an ideal ordering has been generated. The idea of random sampling is relatively simple: directly select k documents from the document set, rearrange them to maximize the diversified scoring index, and use the result as S, the input of the generator. This sampling method also has some problems: because of the direct random selection, the quality of S cannot be guaranteed.
Tests of the application show that combining these methods produces a better sampling effect, and the final model therefore adopts this method for sampling.
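The random sampling step can be sketched as follows, assuming for illustration that each document is represented by its set of relevant sub-topic ids and that the alpha-NDCG-style gain of a document is the sum over its sub-topics t of (1 - alpha)^c_t, where c_t counts how often t is already covered; both representations are assumptions of this sketch, not the patented feature set.

```python
import random


def alpha_gain(d, topic_counts, alpha=0.5):
    """alpha-NDCG-style gain: sub-topics seen c times are discounted by (1-alpha)^c."""
    return sum((1 - alpha) ** topic_counts.get(t, 0) for t in d)


def sample_prefix(doc_set, k, rng=random):
    """Pick k documents at random, then greedily rearrange them by gain."""
    chosen = rng.sample(doc_set, min(k, len(doc_set)))
    S, counts = [], {}
    while chosen:
        best = max(chosen, key=lambda d: alpha_gain(d, counts))
        S.append(best)
        chosen.remove(best)
        for t in best:
            counts[t] = counts.get(t, 0) + 1
    return S


docs = [{1}, {2}, {1, 2}, {3}, {2, 3}]
print(sample_prefix(docs, k=3, rng=random.Random(42)))
```

The patent uses k = 10; a smaller k is used here only to keep the example readable.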
Diversified re-ranking of search results
First, the application introduces the training process of the search result diversification algorithm based on the generative adversarial network. Considering that a direct cold start with the generative adversarial network could bias training, the two diversified scoring functions of the generator and the discriminator are pre-trained before adversarial training. The pre-training is simple: following the optimization method of R-LTR, a sequence composed of the first 20 documents of the ideal ranking is selected and, taking this sequence as input, the generator and the discriminator are each optimized by maximum likelihood. After this pre-training, the warm-started training of the proposed generative adversarial network begins.
In both the preceding MLE pre-training and the generative adversarial network, the application gradually optimizes the model with the Adam optimizer, and selects the final generator as the search result diversification model.
The way the model is used to perform diversified re-ranking of search results is as follows:
First, initialize the selected document sequence S to be empty;
Second, select the document d with the highest diversified scoring function f_θ score;
Finally, if S is long enough, exit the process; otherwise, add d to S and return to the previous step.
In this way, the application can return a diversified search result for a user's query.
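The greedy re-ranking loop above can be sketched as follows; `score` stands in for the trained generator's diversified scoring function f_θ, and the toy scoring used in the example, which simply rewards covering unseen sub-topics, is an illustration rather than the patented model.

```python
def rerank(candidates, score, k):
    """Greedily build a diversified ranking of length k:
    repeatedly pick the document with the highest score given the
    already selected sequence S, then append it to S."""
    S = []                      # selected document sequence, initially empty
    remaining = list(candidates)
    while len(S) < k and remaining:
        best = max(remaining, key=lambda d: score(d, S))
        S.append(best)
        remaining.remove(best)
    return S


def toy_score(d, S):
    """Toy stand-in for f_theta: documents are sets of sub-topic ids,
    and the score is the number of still-uncovered sub-topics."""
    covered = set().union(*S) if S else set()
    return len(d - covered)


docs = [{1, 2}, {1}, {3}, {2, 3}]
print(rerank(docs, toy_score, 3))  # -> [{1, 2}, {3}, {1}]
```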

Claims (6)

1. A search result diversification method based on a generative adversarial network, characterized in that: in the training process, after a query word in a training library is given, a corresponding candidate document set is defined; a sampler, a generator and a discriminator unit are arranged in sequence along a logical path, the discriminator's diversified scoring function is placed in the discriminator and the generator's diversified scoring function is placed in the generator, and training is carried out through a positive-feedback process; in addition, a generative adversarial network is introduced around the diversified scoring functions, combining an explicit model and an implicit model; finally, during use, after a user issues a query word, the generator performs diversified re-ranking of the search results and returns the diversified search results;
specifically, for a query word q in training, its sub-topics {i_1, i_2, …, i_k} are determined and the corresponding candidate document set is D = {d_1, d_2, …, d_n}; the sampler first selects documents from the document set D and rearranges them, and the rearranged sequence S is input into the generator as prefix data; the generator takes S as the selected document set and sends the several documents D′ with the highest scores under its diversified scoring function to the discriminator as negative samples, while the positive sample is the document d selected by the maximized diversified scoring criterion; after receiving the negative document set D′ and the positive document d, the discriminator classifies them and gives feedback to the generator;
this process is formulated as the minimax objective

  min_θ max_φ E_{d∼p_true(·|q,S)}[log D_φ(d|q,S)] + E_{d′∼p_θ(·|q,S)}[log(1 - D_φ(d′|q,S))]

wherein G is the generator, D is the discriminator, θ is the generator parameter, φ is the discriminator parameter, D_φ is given by a sigmoid function, and the generated sample distribution p_θ is given by a softmax function;

wherein f_φ is the diversified scoring function in the discriminator and f_θ is the diversified scoring function in the generator; the formulas for optimizing the generator and the discriminator are: the discriminator minimizes the logistic loss log(1 + exp(-f_φ(d|q,S))) + log(1 + exp(f_φ(d′|q,S))), and the generator maximizes the expected reward E_{d′∼p_θ}[log(1 + exp(f_φ(d′|q,S)))], where log(1 + exp(f_φ(d|q,S))) is the feedback of the discriminator to the generator.
2. The method for diversifying search results based on a generative adversarial network of claim 1, wherein the diversified scoring function of the discriminator is implemented as follows: the candidate document sent by the generator for scoring is denoted d_t, the query is q, the subtopics of the query are I_q = {i_1, i_2, …, i_K}, and the sequence of already selected documents is S = {d_1, d_2, …, d_{t-1}}. The query terms and the subtopics are retrieved with a traditional retrieval model, and the top-ranked documents are concatenated into a pseudo-document. The candidate document, the pseudo-document corresponding to the query, and the pseudo-documents corresponding to the subtopics are embedded with a doc2vec model, yielding the embedding e_d of the candidate document, the embedding e_q of the query, and the embeddings e_i of the subtopics. From these, the relevance vector x_{d,q} between the candidate document and the query and the relevance vectors x_{d,i} between the candidate document and the subtopics are modeled; after feature extraction, the diversified scoring function of the discriminator is obtained.

The subtopic distribution A(i|S) given the selected documents is computed as follows: first, a recurrent neural network is used to aggregate the already selected documents,

h_j = LSTM(e_{d_j}, h_{j-1}),  j = 1, …, t−1,

where LSTM is the cell function of a long short-term memory network. After one layer of the recurrent network, the distributed representation h_{t-1} of the selected documents is obtained, which contains the overall information of the past documents; the subtopic distribution A(i|S) is then computed from h_{t-1} and the subtopic embeddings e_i. On this basis, the relevance information between the candidate document and the subtopics is further taken into account, and finally the complete diversified scoring function of the discriminator is obtained.
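A toy sketch of the relevance-vector step is given below; the doc2vec embeddings are replaced here by fixed vectors, and a single cosine score stands in for each relevance vector, so this is an illustration rather than the patented feature extraction:

```python
def cosine(a, b):
    # cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def relevance_vectors(e_d, e_q, subtopic_embs):
    # x_{d,q}: relevance of the candidate document to the query;
    # x_{d,i}: relevance of the candidate document to each subtopic i
    x_dq = cosine(e_d, e_q)
    x_di = [cosine(e_d, e_i) for e_i in subtopic_embs]
    return x_dq, x_di
```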
3. A method of diversifying search results based on a generative adversarial network as recited in claim 2, wherein the traditional retrieval model is the BM25 model, and the extracted features include the TF-IDF model score, the BM25 model score, the LMIR model score, the PageRank score, and the in-degree and out-degree of the web page.
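For reference, a standard per-term BM25 score of the kind named above can be sketched as follows (the usual Robertson formulation with parameters k1 and b; the patent does not specify its exact variant):

```python
import math

def bm25_score(tf, df, doc_len, avg_len, n_docs, k1=1.2, b=0.75):
    # idf: rarer terms contribute more
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    # tf saturation with length normalization controlled by b
    denom = tf + k1 * (1.0 - b + b * doc_len / avg_len)
    return idf * tf * (k1 + 1.0) / denom
```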
4. A method of diversifying search results based on a generative adversarial network as claimed in claim 3, wherein the diversified scoring function of the generator is the diversified scoring function of the implicit model R-LTR, implemented as follows: based on features extracted from the candidate document and the query, a relevance vector x_i between the candidate document and the query is considered, together with a relation vector R_ij between documents; the relation vector is modeled along four dimensions: subtopic diversity, text diversity, title diversity, and anchor-text diversity.

The diversified scoring function of the implicit model R-LTR is:

f_θ(d_i|q,S) = ω_r^T x_i + Σ_k ω_{d,k} · max_{d_j∈S} R_{ijk},

where R_{ijk} is the dissimilarity between documents d_i and d_j considered from dimension k; for d_i, the maximum dissimilarity to the previously selected documents represents its novelty, and ω_r and ω_{d,k} are trainable parameters.
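A minimal sketch of an R-LTR-style score of this shape (the exact parameterization in the patent may differ); `x` is the relevance feature vector, and `R_rows[j][k]` is the dissimilarity to already selected document j in dimension k:

```python
def rltr_score(x, R_rows, w_r, w_d):
    # relevance part: w_r . x
    rel = sum(wr * xi for wr, xi in zip(w_r, x))
    if not R_rows:                  # no documents selected yet: relevance only
        return rel
    # novelty part: per dimension k, the maximum dissimilarity
    # to any previously selected document, weighted by w_d[k]
    nov = sum(w_d[k] * max(row[k] for row in R_rows) for k in range(len(w_d)))
    return rel + nov
```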
5. The method for diversifying search results based on a generative adversarial network of claim 4, wherein the sampler is implemented as follows: random sampling is designed to simulate the generator when generating the candidate document d_t. The random sampling process is: k = 10 documents are selected directly from the candidate document set and reranked so as to maximize the diversity metric α-NDCG; the resulting sequence S is used as the input of the generator.
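α-NDCG normalizes the novelty-biased gain (α-DCG) of a ranking by that of an ideal ranking; the sketch below computes only the α-DCG part and is an illustration, not the patented sampler:

```python
import math

def alpha_dcg(ranking, doc_subtopics, alpha=0.5):
    # novelty-biased DCG: a subtopic's gain is discounted by
    # (1 - alpha) for each earlier document that already covered it
    seen = {}                       # subtopic -> times covered so far
    score = 0.0
    for r, doc in enumerate(ranking, start=1):
        topics = doc_subtopics.get(doc, ())
        gain = sum((1.0 - alpha) ** seen.get(t, 0) for t in topics)
        for t in topics:
            seen[t] = seen.get(t, 0) + 1
        score += gain / math.log2(r + 1)
    return score
```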
6. The method for diversifying search results based on a generative adversarial network of claim 5, wherein the specific method by which the generator performs diversified reranking of the search results is: first, the sequence S of selected documents is initialized to empty; then the document d with the highest score under the generator's diversified scoring function f_θ is selected; if S is long enough, the process exits and the diversified search results are returned; otherwise, d is appended to S and the previous step is repeated.
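The greedy loop above can be sketched as follows; `score_fn(d, S)` stands in for the generator's diversified scoring function f_θ, whose internals are described in the preceding claims:

```python
def greedy_rerank(candidates, score_fn, length):
    # greedy diversified reranking: repeatedly pick the document with
    # the highest diversified score given the current sequence S
    S = []                                # initialize S to empty
    pool = list(candidates)
    while pool and len(S) < length:
        best = max(pool, key=lambda d: score_fn(d, S))
        S.append(best)                    # add d to S
        pool.remove(best)
    return S                              # diversified result list
```

Because the score of each candidate is recomputed against the current S at every step, documents redundant with already selected ones are naturally pushed down.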
CN202011024084.4A 2020-09-25 2020-09-25 Search result diversification method based on generated type countermeasure network Active CN112182155B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011024084.4A CN112182155B (en) 2020-09-25 2020-09-25 Search result diversification method based on generated type countermeasure network


Publications (2)

Publication Number Publication Date
CN112182155A CN112182155A (en) 2021-01-05
CN112182155B true CN112182155B (en) 2023-08-18

Family

ID=73945377

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011024084.4A Active CN112182155B (en) 2020-09-25 2020-09-25 Search result diversification method based on generated type countermeasure network

Country Status (1)

Country Link
CN (1) CN112182155B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114003696B (en) * 2021-11-05 2024-03-26 中国人民大学 Search result diversification method and system combining explicit features and implicit features
CN116010609B (en) * 2023-03-23 2023-06-09 山东中翰软件有限公司 Material data classifying method and device, electronic equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN105488195A (en) * 2015-12-07 2016-04-13 中国人民大学 Search result diversification ordering method based on hierarchical structure subtopic
CN108171266A (en) * 2017-12-25 2018-06-15 中国矿业大学 A kind of learning method of multiple target depth convolution production confrontation network model
CN111159454A (en) * 2019-12-30 2020-05-15 浙江大学 Picture description generation method and system based on Actor-Critic generation type countermeasure network
CN111295669A (en) * 2017-06-16 2020-06-16 马克波尔公司 Image processing system

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US11205103B2 (en) * 2016-12-09 2021-12-21 The Research Foundation for the State University Semisupervised autoencoder for sentiment analysis


Non-Patent Citations (1)

Title
Dou Zhicheng et al. A Survey on Search Result Diversification. Chinese Journal of Computers (《计算机学报》), 2019, pp. 2591-2613. *


Similar Documents

Publication Publication Date Title
CN111611361B (en) Intelligent reading, understanding, question answering system of extraction type machine
CN109829104B (en) Semantic similarity based pseudo-correlation feedback model information retrieval method and system
CN109635083B (en) Document retrieval method for searching topic type query in TED (tele) lecture
CN105393265A (en) Active featuring in computer-human interactive learning
CN116134432A (en) System and method for providing answers to queries
CN109697289A (en) It is a kind of improved for naming the Active Learning Method of Entity recognition
CN112182155B (en) Search result diversification method based on generated type countermeasure network
CN110083696A (en) Global quotation recommended method, recommender system based on meta structure technology
CN112307182B (en) Question-answering system-based pseudo-correlation feedback extended query method
CN112527993B (en) Cross-media hierarchical deep video question-answer reasoning framework
CN110888991A (en) Sectional semantic annotation method in weak annotation environment
Cheng et al. Semantic pre-alignment and ranking learning with unified framework for cross-modal retrieval
CN113934835B (en) Retrieval type reply dialogue method and system combining keywords and semantic understanding representation
Li et al. Coltr: Semi-supervised learning to rank with co-training and over-parameterization for web search
CN111581365A (en) Predicate extraction method
Perdana et al. Instance-based deep transfer learning on cross-domain image captioning
CN116089592A (en) Method, device and storage medium for realizing open-domain multi-answer question and answer
CN112507097B (en) Method for improving generalization capability of question-answering system
CN111723179B (en) Feedback model information retrieval method, system and medium based on conceptual diagram
Wang et al. Comparison between calculation methods for semantic text similarity based on siamese networks
CN114238661A (en) Text discrimination sample detection generation system and method based on interpretable model
Du et al. Hierarchical multi-layer transfer learning model for biomedical question answering
Zhang et al. Microblog Text Classification System Based on TextCNN and LSA Model
Wei et al. Coached active learning for interactive video search
Bleiweiss A hierarchical book representation of word embeddings for effective semantic clustering and search

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant