CN115526338A - Reinforcement learning model construction method for information retrieval - Google Patents
- Publication number
- CN115526338A CN115526338A CN202211287916.0A CN202211287916A CN115526338A CN 115526338 A CN115526338 A CN 115526338A CN 202211287916 A CN202211287916 A CN 202211287916A CN 115526338 A CN115526338 A CN 115526338A
- Authority
- CN
- China
- Prior art keywords
- candidate
- word
- candidate document
- action
- documents
- Prior art date
- Legal status: Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3346—Query execution using probabilistic model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3347—Query execution using vector based model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/338—Presentation of query results
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application relates to the field of information retrieval, and in particular to a reinforcement learning model construction method for information retrieval, comprising the following steps: S100, acquiring the feature code q of query information Q and the feature code of each candidate document in a candidate document set; S200, constructing an MDP model, wherein the initial state of the MDP model is s_0 = [0, q], and the agent of the MDP model selects an action a_0 in the initial state with probability distribution π(a_0|s_0; w); and S300, training the MDP model according to the long-term reward. The invention improves the accuracy of document ranking in information retrieval.
Description
Technical Field
The invention relates to the field of information retrieval, in particular to a reinforcement learning model construction method for information retrieval.
Background
With the rapid development of the internet, learning to rank (L2R), one of the common tasks of machine learning, is receiving increasing attention. In information retrieval, given a query target, the results that best meet the need must be computed and returned. The prior art discloses the use of a Markov decision process (MDP) to generate document rankings, which alleviates the ranking-complexity problem to some extent. However, prior-art reinforcement learning models based on an MDP are mostly built on a first-order Markov decision process, so the position of each document depends only on the immediately preceding document rather than on all previously ranked documents, which affects the accuracy of document ranking in information retrieval. How to improve the accuracy of document ranking during information retrieval is an urgent problem to be solved.
Disclosure of Invention
The invention aims to provide a reinforcement learning model construction method for information retrieval, so as to improve the accuracy of document ranking during information retrieval.
According to the invention, a reinforcement learning model construction method for information retrieval is provided, which comprises the following steps:
S100, acquiring the feature code q of the query information Q and the feature codes of the candidate documents in the candidate document set.
S200, constructing an MDP model, wherein: the initial state of the MDP model is s_0 = [0, q]; the agent of the MDP model selects an action a_0 in the initial state with probability distribution π(a_0|s_0; w); the action a_0 selects a candidate document d_{m(a_0)} from the candidate document set; w is a preset initialized trainable parameter; x_{m(a)} is the feature code of the candidate document d_{m(a)} selected from the candidate document set by action a; A(s_0) is the set of actions selectable in the initial state s_0; and ( )^H denotes the conjugate transpose. The initial reward function of the MDP model returns the preset relevance label of the selected candidate document d_{m(a_0)}. At step t, the agent of the MDP model selects an action a_t in the corresponding state s_t with probability distribution π(a_t|s_t; w), where x_{m(a_t)} is the feature code of the candidate document d_{m(a_t)} selected from the candidate document set by action a_t, and A(s_t) is the set of actions selectable in the state s_t corresponding to step t; ρ_t is a quantum probability distribution operator built from the agent's previous n-1 selected candidate documents, n being a preset value. The decision reward function of the MDP model returns the preset relevance label of the candidate document selected from the candidate document set by the action.
S300, performing model training on the MDP model according to the long-term reward; wherein the long-term reward is L = E[Σ_{k=1}^{M} λ^{k-1} · r_k], λ is a predetermined discount factor, r_k is the reward corresponding to the feature code of the k-th candidate document returned by the MDP model, k ranges from 1 to M, M is the number of candidate documents included in the candidate document set, and E denotes expectation.
Compared with the prior art, the reinforcement learning model construction method for information retrieval provided by the invention has obvious beneficial effects, achieves considerable technical progress and practicability, has wide industrial utilization value, and provides at least the following benefits:
The method considers the ranking dependency among multiple candidate documents and extends the first-order Markov decision process to an n-order Markov decision process.
Furthermore, the method constructs the features of the query and the candidate documents through a quantum language model, calculates the probability of the agent's possible actions through quantum probability theory, introduces longer candidate-document sequence information into the ranking process, and improves the accuracy of document ranking without increasing ranking complexity.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a reinforcement learning model construction method for information retrieval according to an embodiment of the present invention;
fig. 2 is a schematic diagram of an interaction process between an agent and an environment according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
According to the invention, a reinforcement learning model construction method for information retrieval is provided, which comprises the following steps:
S100, acquiring the feature code q of the query information Q and the feature codes of the candidate documents in the candidate document set.
According to the invention, the number of independent latent semantics included in the candidate document set is denoted N, and each word is modeled as a quantum concept defined in an N-dimensional Hilbert space H^N, where the latent semantics form a set of basis vectors {ψ_1, ψ_2, …, ψ_N} of the space. Each word can then be represented as a superposition state of the basis vectors of H^N, i.e., a linear combination of the basis vectors with complex-valued weights. Accordingly, the feature codes of the candidate documents in the candidate document set are obtained as follows:
S110, performing word segmentation on the e-th candidate document doc_e in the candidate document set to obtain m words.
S120, obtaining the complex word vector of the l-th word, t_l = Σ_{j=1}^{N} w_j e^{iθ_j} ψ_j, where each coefficient w_j e^{iθ_j} is a complex number, {w_j}_{j=1}^{N} are non-negative real numbers satisfying Σ_{j=1}^{N} w_j^2 = 1, θ_j is the complex phase corresponding to the real number w_j and satisfies θ_j ∈ [-π, π], ψ_j is the j-th basis vector of the Hilbert space H^N, N is the number of independent latent semantics included in the candidate document set, and i is the imaginary unit.
According to the invention, each coefficient can also be rewritten according to Euler's formula as w_j e^{iθ_j} = w_j (cos θ_j + i sin θ_j); both w_j and θ_j are trainable parameters.
S130, obtaining the feature code of candidate document doc_e: x_m = Σ_{l=1}^{m} (u_l · t_l · (t_l)^H), where u_l is the importance of the l-th word in doc_e and Σ_{l=1}^{m} u_l = 1.
According to the invention, the candidate document doc_e can be represented as a sequence of m complex word vectors [t_1, t_2, …, t_m]. If the word features of a candidate document are used to form the ground states of a state space, the feature code x_m of the candidate document can be represented by a quantum language model.
Optionally, u_l is obtained according to the term frequency (tf) of the l-th word in doc_e, or according to the tf-idf of the l-th word.
According to the method of S110-S130, the feature codes of the candidate documents in the candidate document set can be obtained.
According to the invention, the method for obtaining q comprises the following steps:
S111, performing word segmentation on the query information Q to obtain c words.
S121, obtaining the complex word vector of the b-th word, t_b = Σ_{j=1}^{N} w_j e^{iθ_j} ψ_j, where each coefficient w_j e^{iθ_j} is a complex number, {w_j}_{j=1}^{N} are non-negative real numbers satisfying Σ_{j=1}^{N} w_j^2 = 1, θ_j is the complex phase corresponding to the real number w_j and satisfies θ_j ∈ [-π, π], ψ_j is the j-th basis vector of the Hilbert space H^N, N is the number of independent latent semantics included in the candidate document set, and i is the imaginary unit.
S131, obtaining the feature code of Q: q = Σ_{b=1}^{c} (u_b · t_b · (t_b)^H), where u_b is the importance of the b-th word in Q and Σ_{b=1}^{c} u_b = 1.
According to the invention, the query information can be represented as a sequence of c complex word vectors [t_1, t_2, …, t_c]. If the word features of the query are used to form the ground states of a state space, the feature code q of the query can be represented by a quantum language model.
Optionally, u_b = 1/c.
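Both the document code x_m and the query code q above are weighted sums of rank-one projectors u · t · (t)^H, i.e., density-matrix-style encodings. The following is a minimal numpy sketch under stated assumptions: the basis {ψ_j} is taken as the standard basis of C^N, and the amplitudes, phases, and importance weights are randomly generated purely for illustration.

```python
import numpy as np

def word_vector(w, theta):
    # t = sum_j w_j * exp(i*theta_j) * psi_j, with psi_j the standard basis of C^N
    return w * np.exp(1j * theta)

def encode(word_vectors, u):
    # density-matrix-style code: sum_l u_l * t_l * t_l^H (an N x N Hermitian matrix)
    return sum(ul * np.outer(t, np.conj(t)) for ul, t in zip(u, word_vectors))

rng = np.random.default_rng(0)

def random_word(n=3):
    w = np.abs(rng.normal(size=n))
    w /= np.linalg.norm(w)                      # enforce sum_j w_j^2 = 1
    theta = rng.uniform(-np.pi, np.pi, size=n)  # phases in [-pi, pi]
    return word_vector(w, theta)

# toy "document" of m = 2 words over N = 3 latent semantics, uniform importance u_l
x = encode([random_word(), random_word()], u=[0.5, 0.5])
assert np.allclose(x, x.conj().T)               # Hermitian, as (t)(t)^H requires
assert np.isclose(np.trace(x).real, 1.0)        # unit trace, since sum_l u_l = 1
```

The same `encode` function produces the query code q by using the query's c words with u_b = 1/c.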
S200, constructing an MDP model, wherein: the initial state of the MDP model is s_0 = [0, q]; the agent of the MDP model selects an action a_0 in the initial state with probability distribution π(a_0|s_0; w); the action a_0 selects a candidate document d_{m(a_0)} from the candidate document set; w is a preset initialized trainable parameter; x_{m(a)} is the feature code of the candidate document d_{m(a)} selected from the candidate document set by action a; A(s_0) is the set of actions selectable in the initial state s_0; and ( )^H denotes the conjugate transpose. The initial reward function of the MDP model returns the preset relevance label of the selected candidate document d_{m(a_0)}. At step t, the agent of the MDP model selects an action a_t in the corresponding state s_t with probability distribution π(a_t|s_t; w), where x_{m(a_t)} is the feature code of the candidate document d_{m(a_t)} selected from the candidate document set by action a_t, and A(s_t) is the set of actions selectable in the state s_t corresponding to step t; ρ_t is a quantum probability distribution operator built from the agent's previous n-1 selected candidate documents, n being a preset value. The decision reward function of the MDP model returns the preset relevance label of the candidate document selected from the candidate document set by the action.
In accordance with the present invention, the process of ranking candidate documents can be formulated as an MDP, in which constructing the candidate-document ranking is viewed as a sequential decision: each time step corresponds to a ranking position, and each action selects the candidate document for that position. The ranking method for candidate documents can be described by the tuple <S, A, T, R, π>.
S is the state set, representing the environment. During the ranking process, the agent should know the current ranking position and the remaining selectable candidate documents. At step t, the state can be defined as s_t = [t, X_t], where X_t is the set of feature codes of the remaining candidate documents.
A is the set of actions selectable by the agent. The selectable action set A(s_t) depends on the state s_t. At step t, the action a_t ∈ A(s_t) places the feature code x_{m(a_t)} of the selected candidate document at position t+1, where m(a_t) denotes the index of the candidate document selected by action a_t.
T (S, A): t is S × A → S, representing the pass through action a t Will state s t Transition to a new state s t+1 。
R (S, A) is an instant prize. This reward may be considered the quality of the selected candidate document during the ranking process.
π(a|s): A × S → [0, 1] represents the behavior of the agent, i.e., the probability distribution over selectable actions a_t; in the present invention this distribution is calculated by quantum probability.
According to the present invention, the initial-state setting of the MDP model includes: given the feature code q corresponding to query information Q, the feature-code set X_0 of the candidate document set (of length M), and the corresponding relevance-label set Y (of length M), the initial state is s_0 = [0, q]. The candidate document d_{m(a_t)} selected by action a_t has feature code x_{m(a_t)} and relevance label y_{m(a_t)}. In the invention, the relevance label of each candidate document is preset; optionally, it is the relevance between the query information and the corresponding candidate document as annotated by a user, expressed as a numeric label where 0 denotes irrelevant, 1 denotes weak relevance, and 2 denotes strong relevance.
According to the invention, in the initial state s_0 = [0, q] the agent selects an action a_0, with selection probability distribution π(a_0|s_0; w), where w is a preset initialized trainable parameter. Optionally, w is input by a user or read from a configuration file; those skilled in the art will appreciate that any prior-art means of obtaining w falls within the scope of the present invention.
At this point, the m(a_0)-th candidate document selected by action a_0 is obtained. An evaluation index under the information retrieval scenario is defined as the reward function.
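The text does not preserve the concrete evaluation index used as the reward; a common choice in MDP-based ranking work (e.g., MDPRank-style methods) is the per-position DCG gain, used below purely as an illustrative stand-in, with the 0/1/2 relevance labels defined above.

```python
import math

def dcg_reward(label, t):
    """Illustrative reward: DCG gain of placing a document with the given
    0/1/2 relevance label at ranking position t+1 (t is the MDP step, from 0).
    This specific formula is an assumption, not taken from the patent text."""
    return (2 ** label - 1) / math.log2(t + 2)

# a strongly relevant document (label 2) at the top position yields reward 3.0
assert dcg_reward(2, 0) == 3.0
# an irrelevant document (label 0) contributes nothing at any position
assert dcg_reward(0, 5) == 0.0
```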
accordingly, the state of the agent will go to s 1 =T([0,q],a 0 )=[1,X 1 ]WhereinIs to encode the characteristics of the selected documentRemove candidate set X 0 And obtaining the latest feature code set of the candidate document.
As a specific embodiment, taking a 3rd-order Markov decision process as an example, at step t the agent selects an action a_t in the state s_t = [t, X_t], with selection probability distribution π(a_t|s_t; w) computed via the quantum probability operator ρ_t.
where ρ is t A quantum probability distribution operator for the first 2 selected candidate documents containing the agent:
thus, an action a in consideration of candidate document information selected by the first two agents under 3-order Markov decision can be obtained t Selected m (a) t ) A candidate document.
The reward function likewise takes the 3rd-order Markov decision into account.
thus, the state of the agent will go to s t+1 =T([t,X t ],a t )=[t+1,X t+1 ]WhereinIs to encode the characteristics of the selected documentRemoving candidate set X t Resulting in the latest candidate set.
The above process is repeated until M candidate documents are ranked, and the process of generating an ordered document set by this ranking algorithm is shown in fig. 1 and 2.
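The repeated select-and-remove loop above can be sketched end to end. Two assumptions fill in formulas the text only names: the selection probability is taken to follow the quantum Born rule, p(a) ∝ tr(ρ_t x_a), and the operator ρ_t is taken as a uniform mixture of the query code and the codes of the last n-1 selected documents. Both are illustrative choices, not the patent's exact definitions.

```python
import numpy as np

def rank(q, X, n=3, rng=np.random.default_rng(0)):
    """Hedged sketch of the n-order ranking loop.
    q: N x N query density matrix; X: list of N x N candidate density matrices.
    Assumed: p(a) proportional to tr(rho_t @ x_a) (Born rule), with rho_t a
    uniform mixture of q and the last n-1 selected document codes."""
    remaining = dict(enumerate(X))        # candidate index -> feature code
    history = []                          # codes of previously selected documents
    order = []
    while remaining:
        mix = [q] + history[-(n - 1):]
        rho = sum(mix) / len(mix)         # quantum probability distribution operator
        idx = list(remaining)
        scores = np.array([np.trace(rho @ remaining[i]).real for i in idx])
        p = scores / scores.sum()         # Born-rule probabilities tr(rho x_a) >= 0
        a = rng.choice(idx, p=p)          # sample an action = pick a document
        order.append(a)
        history.append(remaining.pop(a))  # remove x_{m(a_t)} from the candidate set
    return order
```

Sampling a_t from p rather than taking the argmax keeps the policy stochastic, which is what the policy-gradient training in S300 requires.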
S300, performing model training on the MDP model according to the long-term reward; wherein the long-term reward is L = E[Σ_{k=1}^{M} λ^{k-1} · r_k], λ is a predetermined discount factor, r_k is the reward corresponding to the feature code of the k-th candidate document returned by the MDP model, k ranges from 1 to M, M is the number of candidate documents included in the candidate document set, and E denotes expectation.
It should be understood that the trained MDP model can be used for information retrieval. Because the feature codes and the candidate documents are in one-to-one correspondence, the ordered candidate document set can be obtained from the ordered feature-code set returned by the MDP model.
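The training signal of S300 can be sketched as follows: a Monte-Carlo estimate of the long-term reward L = E[Σ_{k=1}^{M} λ^{k-1} r_k] from one sampled ranking episode, plus REINFORCE-style returns-to-go as a plausible per-step weight for updating the trainable parameters (w, and the w_j, θ_j of the word vectors). The patent does not name its policy-gradient algorithm, so REINFORCE is an assumption here.

```python
def long_term_reward(rewards, lam=0.9):
    """Monte-Carlo estimate of L = E[sum_{k=1..M} lam^(k-1) * r_k]
    from the rewards r_1..r_M of one sampled ranking episode."""
    return sum(lam ** k * r for k, r in enumerate(rewards))  # enumerate starts at k-1 = 0

def returns_to_go(rewards, lam=0.9):
    """Discounted return G_t from each step onward, the usual REINFORCE weight
    multiplying grad log pi(a_t|s_t) in the policy-gradient update."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + lam * G
        out.append(G)
    return out[::-1]

assert long_term_reward([1.0, 1.0], lam=0.5) == 1.5       # 1 + 0.5*1
assert returns_to_go([1.0, 1.0], lam=0.5) == [1.5, 1.0]
```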
Although some specific embodiments of the present invention have been described in detail by way of example, it should be understood by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the invention. It will also be appreciated by those skilled in the art that various modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.
Claims (8)
1. A reinforcement learning model construction method for information retrieval, characterized by comprising the following steps:
S100, acquiring a feature code q of query information Q and a feature code of each candidate document in a candidate document set;
S200, constructing an MDP model, wherein: the initial state of the MDP model is s_0 = [0, q]; the agent of the MDP model selects an action a_0 in the initial state with probability distribution π(a_0|s_0; w); the action a_0 selects a candidate document d_{m(a_0)} from the candidate document set; w is a preset initialized trainable parameter; x_{m(a)} is the feature code of the candidate document d_{m(a)} selected from the candidate document set by action a; A(s_0) is the set of actions selectable in the initial state s_0; ( )^H denotes the conjugate transpose; the initial reward function of the MDP model returns the preset relevance label of the selected candidate document d_{m(a_0)}; the agent of the MDP model selects an action a_t in the state s_t corresponding to step t with probability distribution π(a_t|s_t; w), x_{m(a_t)} being the feature code of the candidate document d_{m(a_t)} selected from the candidate document set by action a_t, and A(s_t) being the set of actions selectable in the state s_t corresponding to step t; ρ_t is a quantum probability distribution operator built from the agent's previous n-1 selected candidate documents, n being a preset value with n ≥ 2; the decision reward function of the MDP model returns the preset relevance label of the candidate document selected from the candidate document set by the action;
S300, performing model training on the MDP model according to the long-term reward; wherein the long-term reward is L = E[Σ_{k=1}^{M} λ^{k-1} · r_k], λ is a predetermined discount factor, r_k is the reward corresponding to the feature code of the k-th candidate document returned by the MDP model, k ranges from 1 to M, M is the number of candidate documents included in the candidate document set, and E denotes expectation.
2. The method according to claim 1, wherein in S100, the method for obtaining the feature code of each candidate document in the candidate document set comprises:
S110, performing word segmentation on the e-th candidate document doc_e in the candidate document set to obtain m words;
S120, obtaining the complex word vector of the l-th word, t_l = Σ_{j=1}^{N} w_j e^{iθ_j} ψ_j, where each coefficient w_j e^{iθ_j} is a complex number, {w_j}_{j=1}^{N} are non-negative real numbers satisfying Σ_{j=1}^{N} w_j^2 = 1, θ_j is the complex phase corresponding to the real number w_j and satisfies θ_j ∈ [-π, π], ψ_j is the j-th basis vector of the Hilbert space H^N, N is the number of independent latent semantics included in the candidate document set, and i is the imaginary unit;
S130, obtaining the feature code of candidate document doc_e: x_m = Σ_{l=1}^{m} (u_l · t_l · (t_l)^H), where u_l is the importance of the l-th word in doc_e and Σ_{l=1}^{m} u_l = 1.
3. The method of claim 2, wherein in S130, u_l is obtained according to the term frequency of the l-th word in doc_e.
4. The method of claim 2, wherein in S130, u_l is obtained according to the inverse document frequency index (tf-idf) of the l-th word.
5. The method of claim 1, wherein in S100, the method for obtaining q comprises:
S111, performing word segmentation on the query information Q to obtain c words;
S121, obtaining the complex word vector of the b-th word, t_b = Σ_{j=1}^{N} w_j e^{iθ_j} ψ_j, where each coefficient w_j e^{iθ_j} is a complex number, {w_j}_{j=1}^{N} are non-negative real numbers satisfying Σ_{j=1}^{N} w_j^2 = 1, θ_j is the complex phase corresponding to the real number w_j and satisfies θ_j ∈ [-π, π], ψ_j is the j-th basis vector of the Hilbert space H^N, N is the number of independent latent semantics included in the candidate document set, and i is the imaginary unit;
S131, obtaining the feature code of Q: q = Σ_{b=1}^{c} (u_b · t_b · (t_b)^H), where u_b is the importance of the b-th word in Q and Σ_{b=1}^{c} u_b = 1.
6. The method of claim 5, wherein u is b =1/c。
7. The method of claim 1, wherein n =3.
8. The method of claim 1, wherein the relevance label is 0, 1, or 2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211287916.0A CN115526338B (en) | 2022-10-20 | 2022-10-20 | Reinforced learning model construction method for information retrieval |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115526338A true CN115526338A (en) | 2022-12-27 |
CN115526338B CN115526338B (en) | 2023-06-23 |
Family
ID=84706705
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211287916.0A Active CN115526338B (en) | 2022-10-20 | 2022-10-20 | Reinforced learning model construction method for information retrieval |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115526338B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109783709A (en) * | 2018-12-21 | 2019-05-21 | 昆明理工大学 | A kind of sort method based on Markovian decision process and k- arest neighbors intensified learning |
CN111241407A (en) * | 2020-01-21 | 2020-06-05 | 中国人民大学 | Personalized search method based on reinforcement learning |
US20210089868A1 (en) * | 2019-09-23 | 2021-03-25 | Adobe Inc. | Reinforcement learning with a stochastic action set |
CN114860893A (en) * | 2022-07-06 | 2022-08-05 | 中国人民解放军国防科技大学 | Intelligent decision-making method and device based on multi-mode data fusion and reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN115526338B (en) | 2023-06-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wu et al. | Session-based recommendation with graph neural networks | |
CN111027327B (en) | Machine reading understanding method, device, storage medium and device | |
US20210182680A1 (en) | Processing sequential interaction data | |
Huang et al. | A novel two-step procedure for tourism demand forecasting | |
CN109446414B (en) | Software information site rapid label recommendation method based on neural network classification | |
CN113268609A (en) | Dialog content recommendation method, device, equipment and medium based on knowledge graph | |
CN110781409A (en) | Article recommendation method based on collaborative filtering | |
CN110263245B (en) | Method and device for pushing object to user based on reinforcement learning model | |
CN113011529B (en) | Training method, training device, training equipment and training equipment for text classification model and readable storage medium | |
CN114860915A (en) | Model prompt learning method and device, electronic equipment and storage medium | |
US20220067055A1 (en) | Methods and apparatuses for showing target object sequence to target user | |
CN109086463B (en) | Question-answering community label recommendation method based on regional convolutional neural network | |
CN113723115B (en) | Open domain question-answer prediction method based on pre-training model and related equipment | |
CN112000788A (en) | Data processing method and device and computer readable storage medium | |
CN111626827A (en) | Method, device, equipment and medium for recommending articles based on sequence recommendation model | |
CN114358023A (en) | Intelligent question-answer recall method and device, computer equipment and storage medium | |
CN105045827A (en) | Familiarity based information recommendation method and apparatus | |
Robles et al. | Learning to reinforcement learn for neural architecture search | |
CN115526338A (en) | Reinforced learning model construction method for information retrieval | |
US11983633B2 (en) | Machine learning predictions by generating condition data and determining correct answers | |
Gupta et al. | Forecasting through motifs discovered by genetic algorithms | |
CN115310449A (en) | Named entity identification method and device based on small sample and related medium | |
CN117938951B (en) | Information pushing method, device, computer equipment and storage medium | |
CN116684480B (en) | Method and device for determining information push model and method and device for information push | |
CN117875424B (en) | Knowledge graph completion method and system based on entity description and symmetry relation |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |