CN115526338A - Reinforced learning model construction method for information retrieval - Google Patents

Reinforced learning model construction method for information retrieval

Info

Publication number
CN115526338A
CN115526338A (application CN202211287916.0A)
Authority
CN
China
Prior art keywords
candidate
word
candidate document
action
documents
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211287916.0A
Other languages
Chinese (zh)
Other versions
CN115526338B (en)
Inventor
蒋永余
方省
曹家
王璋盛
罗引
王磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongke Wenge Technology Co ltd
Original Assignee
Beijing Zhongke Wenge Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Wenge Technology Co ltd filed Critical Beijing Zhongke Wenge Technology Co ltd
Priority to CN202211287916.0A priority Critical patent/CN115526338B/en
Publication of CN115526338A publication Critical patent/CN115526338A/en
Application granted granted Critical
Publication of CN115526338B publication Critical patent/CN115526338B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3347 Query execution using vector based model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the field of information retrieval, in particular to a reinforcement learning model construction method for information retrieval, which comprises the following steps: S100, acquiring a feature code q of query information Q and a feature code of each candidate document in a candidate document set; S200, constructing an MDP model, wherein the initial state of the MDP model is s_0 = [0, q], and the agent of the MDP model selects action a_0 in the initial state according to the probability distribution π(a_0 | s_0; w); and S300, performing model training on the MDP model according to the long-term reward. The invention improves the accuracy of document ranking during information retrieval.

Description

Reinforced learning model construction method for information retrieval
Technical Field
The invention relates to the field of information retrieval, in particular to a reinforcement learning model construction method for information retrieval.
Background
With the rapid development of the internet, learning to rank (L2R), one of the common tasks of machine learning, is receiving more and more attention. In information retrieval, given a query target, the results that best meet the query need to be computed and returned. The prior art discloses using a Markov decision process (MDP) to generate document rankings, which alleviates the problem of ranking complexity to some extent. However, the reinforcement learning models based on MDPs in the prior art are mostly built on a first-order Markov decision process, so the position of each document depends only on the immediately preceding document and not on the earlier documents, which affects the accuracy of document ranking in information retrieval. How to improve the accuracy of document ranking during information retrieval is therefore an urgent problem to be solved.
Disclosure of Invention
The invention aims to provide a reinforcement learning model construction method for information retrieval, so as to improve the accuracy of document ranking during information retrieval.
According to the invention, a reinforcement learning model construction method for information retrieval is provided, which comprises the following steps:
S100, acquiring the feature code q of the query information Q and the feature codes of the candidate documents in the candidate document set.
S200, constructing an MDP model, wherein (the formulas of the probability distributions and reward functions below appear only as images in the original publication and are therefore described in words): the initial state of the MDP model is s_0 = [0, q]; the agent of the MDP model selects an action a_0 in the initial state according to the probability distribution π(a_0 | s_0; w); the action a_0 selects a candidate document d_{m(a_0)} from the candidate document set as the candidate document placed at the first ranking position; w is a preset, initialized trainable parameter; x_{m(a)} is the feature code of the candidate document d_{m(a)} selected from the candidate document set by action a; A(s_0) is the set of actions selectable in the initial state s_0; (·)^H denotes the conjugate transpose. The initial reward function of the MDP model is r(s_0, a_0), where y_{m(a_0)} is the preset relevance label of the candidate document d_{m(a_0)}. At step t, the agent of the MDP model selects an action a_t in the corresponding state s_t according to the probability distribution π(a_t | s_t; w), where x_{m(a_t)} is the feature code of the candidate document d_{m(a_t)} selected from the candidate document set by action a_t, A(s_t) is the set of actions selectable in the state s_t, ρ_t is a quantum probability distribution operator containing the agent's first n-1 selected candidate documents, and n is a preset value. The decision reward function of the MDP model is r(s_t, a_t), where y_{m(a_f)} is the relevance label of the candidate document d_{m(a_f)} selected from the candidate document set by a predetermined action a_f.
S300, performing model training on the MDP model according to the long-term reward; wherein the long-term reward is L = E[Σ_{k=1}^{M} (λ^{k-1}·r_k)], λ is a preset discount factor, r_k is the reward corresponding to the feature code of the k-th candidate document returned by the MDP model, k ranges from 1 to M, M is the number of candidate documents included in the candidate document set, and E denotes the expectation.
Compared with the prior art, the reinforcement learning model construction method for information retrieval provided by the invention has obvious beneficial effects, achieves considerable technical progress and practicability, has wide industrial utilization value, and at least has the following beneficial effects:
the method considers the sorting dependency relationship among a plurality of candidate documents and expands the first-order Markov decision process to the n-order Markov decision process.
Furthermore, the method constructs the features of the query and the candidate documents through a quantum language model, calculates the probabilities of the agent's possible actions through quantum probability theory, introduces longer candidate document sequence information into the ranking process, and improves the accuracy of document ranking without increasing the complexity of ranking.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a flowchart of a reinforcement learning model construction method for information retrieval according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an interaction process between an agent and an environment according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
According to the invention, a reinforcement learning model construction method for information retrieval is provided, which comprises the following steps:
s100, acquiring the feature code Q of the query information Q and the feature codes of the candidate documents in the candidate document set.
According to the invention, the number of independent latent semantics included in the candidate document set is denoted N, and each word is then modeled as a quantum concept defined in an N-dimensional Hilbert space H_N, where the latent semantics form a set of basis vectors {ψ_1, ψ_2, …, ψ_N} of the space. Each word can be expressed as a superposition state of the basis vectors of the Hilbert space H_N, i.e., a linear combination of the basis vectors with complex-valued weights. Accordingly, the method for acquiring the feature codes of the candidate documents in the candidate document set comprises the following steps:
s110, for the e-th candidate document doc in the candidate document set e And performing word segmentation to obtain m words.
S120, obtaining the complex word vector of the l-th word t_l = Σ_{j=1}^{N} w_j·e^{iθ_j}·ψ_j, where w_j·e^{iθ_j} is a complex number, {w_j}_{j=1}^{N} are non-negative real numbers satisfying Σ_{j=1}^{N} w_j^2 = 1, θ_j is the complex phase corresponding to the real number w_j and satisfies θ_j ∈ [-π, π], ψ_j is the j-th basis vector of the Hilbert space H_N, N is the number of independent latent semantics included in the candidate document set, and i is the imaginary unit.
According to the invention, the complex weight can also be rewritten according to Euler's formula as w_j·e^{iθ_j} = w_j·(cos θ_j + i·sin θ_j), where θ_j are trainable parameters.
S130, obtaining the feature code of the candidate document doc_e: x_e = Σ_{l=1}^{m} (u_l·t_l·(t_l)^H), where u_l is the importance of the l-th word in doc_e and Σ_{l=1}^{m} u_l = 1.
According to the invention, the candidate document doc_e can be represented as a sequence of m complex word vectors [t_1, t_2, …, t_m]; if the word features of a candidate document are used to form the ground states of a state space, the feature code x_e of the candidate document can be represented by a quantum language model.
Optionally, u_l is obtained according to the term frequency (tf) of the l-th word in doc_e; or u_l is obtained according to the inverse text frequency index (tf-idf) of the l-th word.
According to the method of S110-S130, the feature codes of the candidate documents in the candidate document set can be obtained.
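As a purely illustrative sketch (not part of the patent), the density-matrix style encoding of steps S110-S130 can be written in Python/NumPy as follows. The helper names complex_word_vector and encode_text, the toy dimensions, and the random initialisation of the amplitudes w_j and phases θ_j are assumptions for demonstration only; in the invention the weights and phases are trainable parameters.

```python
import numpy as np

def complex_word_vector(w, theta):
    """Complex word vector t = sum_j w_j * exp(i*theta_j) * psi_j.

    Taking the standard basis as {psi_j}, this reduces to the element-wise
    product w_j * exp(i*theta_j); w must be non-negative with sum(w_j^2) = 1.
    """
    return w * np.exp(1j * theta)

def encode_text(word_vectors, importance):
    """Feature code x = sum_l u_l * t_l * (t_l)^H, an N x N Hermitian matrix."""
    u = np.asarray(importance, dtype=float)
    u = u / u.sum()                                  # enforce sum_l u_l = 1
    N = word_vectors[0].shape[0]
    x = np.zeros((N, N), dtype=complex)
    for u_l, t_l in zip(u, word_vectors):
        x += u_l * np.outer(t_l, t_l.conj())         # u_l * t_l (t_l)^H
    return x

# Toy example: a 4-dimensional latent-semantic space and a 3-word document.
rng = np.random.default_rng(0)
N, m = 4, 3
doc_vectors = []
for _ in range(m):
    amp = np.abs(rng.normal(size=N))
    amp /= np.linalg.norm(amp)                       # sum_j w_j^2 = 1
    theta = rng.uniform(-np.pi, np.pi, size=N)       # theta_j in [-pi, pi]
    doc_vectors.append(complex_word_vector(amp, theta))
u_doc = rng.random(m)                                # e.g. tf-based importance
x_doc = encode_text(doc_vectors, u_doc)
print(np.allclose(x_doc, x_doc.conj().T))            # Hermitian feature code
print(np.isclose(np.trace(x_doc).real, 1.0))         # unit trace
```

The printed checks confirm that each feature code is Hermitian with unit trace, which is what representing a document as a weighted mixture of the rank-one projectors t_l·(t_l)^H implies.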
According to the invention, the method for obtaining q comprises the following steps:
S111, performing word segmentation on the query information Q to obtain c words.
S121, obtaining the complex word vector of the b-th word t_b = Σ_{j=1}^{N} w_j·e^{iθ_j}·ψ_j, where w_j·e^{iθ_j} is a complex number, {w_j}_{j=1}^{N} are non-negative real numbers satisfying Σ_{j=1}^{N} w_j^2 = 1, θ_j is the complex phase corresponding to the real number w_j and satisfies θ_j ∈ [-π, π], ψ_j is the j-th basis vector of the Hilbert space H_N, N is the number of independent latent semantics included in the candidate document set, and i is the imaginary unit.
S131, obtaining the feature code of Q: q = Σ_{b=1}^{c} (u_b·t_b·(t_b)^H), where u_b is the importance of the b-th word in Q and Σ_{b=1}^{c} u_b = 1.
According to the invention, the query information can be represented as a sequence of c complex word vectors [t_1, t_2, …, t_c]; if the word features of the query information are used to form the ground states of a state space, the feature code q of the query information can be represented by a quantum language model.
Optionally, u_b = 1/c.
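Continuing the same illustrative sketch, the query code q of steps S111-S131 reuses encode_text with uniform importance u_b = 1/c (as in claim 6):

```python
# Query with c = 2 words and uniform importance u_b = 1/c.
c = 2
query_vectors = []
for _ in range(c):
    amp = np.abs(rng.normal(size=N))
    amp /= np.linalg.norm(amp)
    theta = rng.uniform(-np.pi, np.pi, size=N)
    query_vectors.append(complex_word_vector(amp, theta))
q = encode_text(query_vectors, np.full(c, 1.0 / c))   # q = sum_b u_b * t_b (t_b)^H
```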
S200, constructing an MDP model, wherein (the formulas of the probability distributions and reward functions below appear only as images in the original publication and are therefore described in words): the initial state of the MDP model is s_0 = [0, q]; the agent of the MDP model selects an action a_0 in the initial state according to the probability distribution π(a_0 | s_0; w); the action a_0 selects a candidate document d_{m(a_0)} from the candidate document set as the candidate document placed at the first ranking position; w is a preset, initialized trainable parameter; x_{m(a)} is the feature code of the candidate document d_{m(a)} selected from the candidate document set by action a; A(s_0) is the set of actions selectable in the initial state s_0; (·)^H denotes the conjugate transpose. The initial reward function of the MDP model is r(s_0, a_0), where y_{m(a_0)} is the preset relevance label of the candidate document d_{m(a_0)}. At step t, the agent of the MDP model selects an action a_t in the corresponding state s_t according to the probability distribution π(a_t | s_t; w), where x_{m(a_t)} is the feature code of the candidate document d_{m(a_t)} selected from the candidate document set by action a_t, A(s_t) is the set of actions selectable in the state s_t, ρ_t is a quantum probability distribution operator containing the agent's first n-1 selected candidate documents, and n is a preset value. The decision reward function of the MDP model is r(s_t, a_t), where y_{m(a_f)} is the relevance label of the candidate document d_{m(a_f)} selected from the candidate document set by a predetermined action a_f.
In accordance with the present invention, the process of ranking candidate documents can be formulated as an MDP, where the construction of the candidate document ranking is viewed as a sequential decision: each time step corresponds to a ranking position, and each action selects the candidate document for that position. The ranking method for candidate documents can be described by the tuple <S, A, T, R, π>.
S is the state set representing the environment. During the ranking process, the agent needs to know the current ranking position and the selectable candidate documents. At step t, the state can be defined as s_t = [t, X_t], where X_t is the feature code set of the remaining candidate documents from which the selection is made.
A represents the set of actions selectable by the agent. The selectable action set A(s_t) depends on the state s_t. At step t, the action a_t ∈ A(s_t) places the feature code x_{m(a_t)} of the selected candidate document at position t+1, where m(a_t) denotes the index of the candidate document selected by action a_t.
T(S, A): S × A → S is the state transition function, representing that state s_t transitions to a new state s_{t+1} through action a_t.
R(S, A) is the immediate reward. This reward can be regarded as the quality of the candidate document selected during the ranking process.
π(a|s): A × S → [0, 1] represents the behavior of the agent, i.e., the probability distribution over the selectable actions a_t; in the present invention this probability distribution is calculated by quantum probability.
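A minimal environment skeleton for the <S, A, T, R, π> formulation above might look like the sketch below (an illustration, not the patent's implementation). It only tracks the ranking position t, the remaining candidates X_t and the ranking built so far; the immediate reward is injected as a callback because the patent's reward formulas are given only as images. A DCG-style choice such as lambda t, y: (2**y - 1) / np.log2(t + 2) is common in MDP ranking work, but that is an assumption, not the claimed formula.

```python
from dataclasses import dataclass, field

@dataclass
class RankingMDP:
    """State s_t = [t, X_t]: ranking position t plus the remaining candidate codes."""
    codes: list                    # feature codes x_1..x_M of all candidate documents
    labels: list                   # preset relevance labels y_1..y_M (0, 1 or 2)
    t: int = 0
    remaining: list = field(default_factory=list)   # indices of still-selectable documents
    ranking: list = field(default_factory=list)     # indices already placed, in order

    def reset(self):
        self.t, self.remaining, self.ranking = 0, list(range(len(self.codes))), []
        return self.t, list(self.remaining)

    def actions(self):
        """A(s_t): one action per remaining candidate document."""
        return list(self.remaining)

    def step(self, doc_index, reward_fn):
        """T(s_t, a_t): place the chosen document at position t+1 and collect
        the immediate reward R(s_t, a_t) from the supplied reward_fn."""
        reward = reward_fn(self.t, self.labels[doc_index])
        self.remaining.remove(doc_index)
        self.ranking.append(doc_index)
        self.t += 1
        done = not self.remaining
        return (self.t, list(self.remaining)), reward, done
```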
According to the present invention, the initial state setting of the MDP model includes: given the feature code q corresponding to the query information Q, the feature code set X_0 of the candidate document set (of length M) and the corresponding relevance label set Y (of length M), q is used to form the initial state s_0 = [0, q]. The feature code of the candidate document d_{m(a_t)} selected by action a_t is x_{m(a_t)}, and y_{m(a_t)} is the relevance label of the candidate document d_{m(a_t)}. In the invention, the relevance label of each candidate document is preset; optionally, the relevance label of each candidate document is the relevance between the query information and the corresponding candidate document as annotated by a user, given as a numeric label in which 0 denotes irrelevant, 1 denotes weakly relevant, and 2 denotes strongly relevant.
According to the invention, the agent selects an action a_0 in the initial state s_0 = [0, q]; the probability distribution for selecting action a_0 is π(a_0 | s_0; w) (its expression is given as an image in the original), where w is a preset, initialized trainable parameter. Optionally, w is input by a user or read from a configuration file; those skilled in the art will appreciate that any means of obtaining w in the prior art falls within the scope of the present invention.
At this time, the m(a_0)-th candidate document selected by action a_0 is obtained. The evaluation index under the information retrieval scenario is defined as the reward function r(s_0, a_0) (its expression is given as an image in the original). Accordingly, the state of the agent transitions to s_1 = T([0, q], a_0) = [1, X_1], where X_1 is the latest candidate document feature code set obtained by removing the feature code x_{m(a_0)} of the selected document from the candidate set X_0.
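The patent presents π(a_0 | s_0; w) only as an image. A common quantum-probability construction scores each candidate by a trace involving a density operator, a trainable operator and the candidate's feature code, normalised over the selectable actions; the sketch below assumes that form (with the trainable parameter written W) purely for illustration, and should not be read as the claimed expression.

```python
def action_distribution(rho, W, candidate_codes):
    """Assumed trace-based score p(a) proportional to Re tr(rho @ W @ x_{m(a)}),
    normalised over A(s_t). Illustrative stand-in for the image-only formula."""
    scores = np.array([np.real(np.trace(rho @ W @ x)) for x in candidate_codes])
    scores = np.clip(scores, 1e-12, None)       # keep the distribution valid
    return scores / scores.sum()

def random_doc_code(n_words=3):
    """Toy candidate document feature code built with the earlier helpers."""
    vecs = []
    for _ in range(n_words):
        amp = np.abs(rng.normal(size=N))
        amp /= np.linalg.norm(amp)
        vecs.append(complex_word_vector(amp, rng.uniform(-np.pi, np.pi, size=N)))
    return encode_text(vecs, np.ones(n_words))

codes = [random_doc_code() for _ in range(4)]   # toy candidate pool X_0
W = np.eye(N, dtype=complex)                    # assumed initialisation of the trainable parameter
p0 = action_distribution(q, W, codes)           # at step 0 the operator is taken as q
a0 = int(rng.choice(len(codes), p=p0))          # sampled action a_0
```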
As a specific embodiment, taking a 3-order Markov decision process as an example, at step t the agent selects an action a_t in the state s_t = [t, X_t]; the probability distribution for selecting action a_t is π(a_t | s_t; w) (given as an image in the original), where ρ_t is a quantum probability distribution operator containing the agent's first 2 selected candidate documents (its expression is likewise given as an image). In this way, the m(a_t)-th candidate document selected by action a_t under the 3-order Markov decision, which takes into account the candidate document information selected by the agent in the previous two steps, is obtained. The reward function that takes the 3-order Markov decision into account is r(s_t, a_t) (given as an image).
thus, the state of the agent will go to s t+1 =T([t,X t ],a t )=[t+1,X t+1 ]Wherein
Figure BDA0003900127620000063
Is to encode the characteristics of the selected document
Figure BDA0003900127620000064
Removing candidate set X t Resulting in the latest candidate set.
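The operator ρ_t that carries the agent's previous n-1 selections is likewise shown only as an image. One simple hedged reading, used in the continuation of the sketch below, is an equal-weight mixture of the query code and the codes of the last n-1 selected documents, renormalised to unit trace; this is an illustrative guess, not the patent's formula.

```python
def history_operator(q_code, selected_codes, n=3):
    """Assumed form of rho_t: equal-weight mixture of the query code and the codes
    of the last n-1 selected documents, renormalised to unit trace (illustrative)."""
    parts = [q_code] + list(selected_codes)[-(n - 1):]
    rho = sum(parts) / len(parts)
    return rho / np.real(np.trace(rho))

# After the first selection a0, the next step's distribution under n = 3:
rho_1 = history_operator(q, [codes[a0]], n=3)
remaining_codes = [x for i, x in enumerate(codes) if i != a0]
p1 = action_distribution(rho_1, W, remaining_codes)
```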
The above process is repeated until all M candidate documents are ranked; the process by which this ranking algorithm generates an ordered document set is shown in Fig. 1 and Fig. 2.
S300, performing model training on the MDP model according to the long-term reward; wherein the long-term reward is L = E[Σ_{k=1}^{M} (λ^{k-1}·r_k)], λ is a preset discount factor, r_k is the reward corresponding to the feature code of the k-th candidate document returned by the MDP model, k ranges from 1 to M, M is the number of candidate documents included in the candidate document set, and E denotes the expectation.
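Under the assumptions above, one episode of S300 can be simulated and its long-term reward L = E[Σ_{k=1}^{M} λ^{k-1}·r_k] estimated by Monte-Carlo sampling, as sketched below. The patent does not spell out the optimiser; a REINFORCE-style policy-gradient step on the trainable parameter is one natural choice, and is only indicated in a comment.

```python
def long_term_reward(rewards, lam):
    """L for one sampled ranking episode: sum_{k=1..M} lam^(k-1) * r_k."""
    return sum((lam ** (k - 1)) * r for k, r in enumerate(rewards, start=1))

def run_episode(env, W, q_code, reward_fn, n=3, lam=0.9):
    """Sample one complete ranking of the M candidates and return its discounted
    reward, using the assumed policy and history operator sketched above."""
    env.reset()
    rewards, selected = [], []
    while env.remaining:
        avail = env.actions()
        rho = history_operator(q_code, [env.codes[i] for i in selected], n=n)
        p = action_distribution(rho, W, [env.codes[i] for i in avail])
        a = avail[int(rng.choice(len(avail), p=p))]
        _, r, _ = env.step(a, reward_fn)
        rewards.append(r)
        selected.append(a)
    return long_term_reward(rewards, lam)

# Example episode with a DCG-style stand-in reward (an assumption, see above);
# a policy-gradient trainer would repeatedly sample episodes and nudge W to
# increase the average returned value (REINFORCE), which is not shown here.
env = RankingMDP(codes=codes, labels=[2, 0, 1, 1])
L_sample = run_episode(env, W, q, reward_fn=lambda t, y: (2 ** y - 1) / np.log2(t + 2))
print(L_sample)
```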
It should be understood that the trained MDP model can be used for information retrieval. Because the feature codes and the candidate documents are in one-to-one correspondence, the ordered candidate document set can be obtained from the feature code set of the ordered candidate documents returned by the MDP model.
Although some specific embodiments of the present invention have been described in detail by way of example, it should be understood by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the invention. It will also be appreciated by those skilled in the art that various modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims (8)

1. A reinforcement learning model construction method for information retrieval, characterized by comprising the following steps:
s100, acquiring a feature code q of query information Q and a feature code of each candidate document in a candidate document set;
s200, constructing an MDP model, wherein (the formulas of the probability distributions and reward functions appear only as images in the original publication and are therefore described in words): the initial state of the MDP model is s_0 = [0, q]; the agent of the MDP model selects an action a_0 in the initial state according to the probability distribution π(a_0 | s_0; w); the action a_0 selects a candidate document d_{m(a_0)} from the candidate document set as the candidate document placed at the first ranking position; w is a preset, initialized trainable parameter; x_{m(a)} is the feature code of the candidate document d_{m(a)} selected from the candidate document set by action a; A(s_0) is the set of actions selectable in the initial state s_0; (·)^H denotes the conjugate transpose; the initial reward function of the MDP model is r(s_0, a_0), where y_{m(a_0)} is the preset relevance label of the candidate document d_{m(a_0)}; at step t, the agent of the MDP model selects an action a_t in the corresponding state s_t according to the probability distribution π(a_t | s_t; w), where x_{m(a_t)} is the feature code of the candidate document d_{m(a_t)} selected from the candidate document set by action a_t, A(s_t) is the set of actions selectable in the state s_t, ρ_t is a quantum probability distribution operator containing the agent's first n-1 selected candidate documents, n is a preset value, and n ≥ 2; the decision reward function of the MDP model is r(s_t, a_t), where y_{m(a_f)} is the relevance label of the candidate document d_{m(a_f)} selected from the candidate document set by a predetermined action a_f;
s300, performing model training on the MDP model according to the long-term reward; wherein the long-term reward is L = E[Σ_{k=1}^{M} (λ^{k-1}·r_k)], λ is a preset discount factor, r_k is the reward corresponding to the feature code of the k-th candidate document returned by the MDP model, k ranges from 1 to M, M is the number of candidate documents included in the candidate document set, and E denotes the expectation.
2. The method according to claim 1, wherein in S100, the method for acquiring the feature code of each candidate document in the candidate document set comprises:
s110, performing word segmentation on the e-th candidate document doc_e in the candidate document set to obtain m words;
s120, obtaining the complex word vector of the l-th word t_l = Σ_{j=1}^{N} w_j·e^{iθ_j}·ψ_j, where w_j·e^{iθ_j} is a complex number, {w_j}_{j=1}^{N} are non-negative real numbers satisfying Σ_{j=1}^{N} w_j^2 = 1, θ_j is the complex phase corresponding to the real number w_j and satisfies θ_j ∈ [-π, π], ψ_j is the j-th basis vector of the Hilbert space H_N, N is the number of independent latent semantics included in the candidate document set, and i is the imaginary unit;
s130, obtaining the feature code of the candidate document doc_e: x_e = Σ_{l=1}^{m} (u_l·t_l·(t_l)^H), where u_l is the importance of the l-th word in doc_e and Σ_{l=1}^{m} u_l = 1.
3. The method of claim 2, wherein in S130, u_l is obtained according to the term frequency of the l-th word in doc_e.
4. The method of claim 2, wherein in S130, u_l is obtained according to the inverse text frequency index (tf-idf) of the l-th word.
5. The method of claim 1, wherein in S100, the method for acquiring q comprises:
s111, performing word segmentation on the query information Q to obtain c words;
s121, obtaining the complex word vector of the b-th word t_b = Σ_{j=1}^{N} w_j·e^{iθ_j}·ψ_j, where w_j·e^{iθ_j} is a complex number, {w_j}_{j=1}^{N} are non-negative real numbers satisfying Σ_{j=1}^{N} w_j^2 = 1, θ_j is the complex phase corresponding to the real number w_j and satisfies θ_j ∈ [-π, π], ψ_j is the j-th basis vector of the Hilbert space H_N, N is the number of independent latent semantics included in the candidate document set, and i is the imaginary unit;
s131, obtaining the feature code of Q: q = Σ_{b=1}^{c} (u_b·t_b·(t_b)^H), where u_b is the importance of the b-th word in Q and Σ_{b=1}^{c} u_b = 1.
6. The method of claim 5, wherein u_b = 1/c.
7. The method of claim 1, wherein n =3.
8. The method of claim 1, wherein the relevance label is 0, 1, or 2.
CN202211287916.0A 2022-10-20 2022-10-20 Reinforced learning model construction method for information retrieval Active CN115526338B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211287916.0A CN115526338B (en) 2022-10-20 2022-10-20 Reinforced learning model construction method for information retrieval

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211287916.0A CN115526338B (en) 2022-10-20 2022-10-20 Reinforced learning model construction method for information retrieval

Publications (2)

Publication Number Publication Date
CN115526338A true CN115526338A (en) 2022-12-27
CN115526338B CN115526338B (en) 2023-06-23

Family

ID=84706705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211287916.0A Active CN115526338B (en) 2022-10-20 2022-10-20 Reinforced learning model construction method for information retrieval

Country Status (1)

Country Link
CN (1) CN115526338B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109783709A (en) * 2018-12-21 2019-05-21 昆明理工大学 A kind of sort method based on Markovian decision process and k- arest neighbors intensified learning
CN111241407A (en) * 2020-01-21 2020-06-05 中国人民大学 Personalized search method based on reinforcement learning
US20210089868A1 (en) * 2019-09-23 2021-03-25 Adobe Inc. Reinforcement learning with a stochastic action set
CN114860893A (en) * 2022-07-06 2022-08-05 中国人民解放军国防科技大学 Intelligent decision-making method and device based on multi-mode data fusion and reinforcement learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109783709A (en) * 2018-12-21 2019-05-21 昆明理工大学 A kind of sort method based on Markovian decision process and k- arest neighbors intensified learning
US20210089868A1 (en) * 2019-09-23 2021-03-25 Adobe Inc. Reinforcement learning with a stochastic action set
CN111241407A (en) * 2020-01-21 2020-06-05 中国人民大学 Personalized search method based on reinforcement learning
CN114860893A (en) * 2022-07-06 2022-08-05 中国人民解放军国防科技大学 Intelligent decision-making method and device based on multi-mode data fusion and reinforcement learning

Also Published As

Publication number Publication date
CN115526338B (en) 2023-06-23

Similar Documents

Publication Publication Date Title
Wu et al. Session-based recommendation with graph neural networks
CN111027327B (en) Machine reading understanding method, device, storage medium and device
US20210182680A1 (en) Processing sequential interaction data
Huang et al. A novel two-step procedure for tourism demand forecasting
CN109446414B (en) Software information site rapid label recommendation method based on neural network classification
CN113268609A (en) Dialog content recommendation method, device, equipment and medium based on knowledge graph
CN110781409A (en) Article recommendation method based on collaborative filtering
CN110263245B (en) Method and device for pushing object to user based on reinforcement learning model
CN113011529B (en) Training method, training device, training equipment and training equipment for text classification model and readable storage medium
CN114860915A (en) Model prompt learning method and device, electronic equipment and storage medium
US20220067055A1 (en) Methods and apparatuses for showing target object sequence to target user
CN109086463B (en) Question-answering community label recommendation method based on regional convolutional neural network
CN113723115B (en) Open domain question-answer prediction method based on pre-training model and related equipment
CN112000788A (en) Data processing method and device and computer readable storage medium
CN111626827A (en) Method, device, equipment and medium for recommending articles based on sequence recommendation model
CN114358023A (en) Intelligent question-answer recall method and device, computer equipment and storage medium
CN105045827A (en) Familiarity based information recommendation method and apparatus
Robles et al. Learning to reinforcement learn for neural architecture search
CN115526338A (en) Reinforced learning model construction method for information retrieval
US11983633B2 (en) Machine learning predictions by generating condition data and determining correct answers
Gupta et al. Forecasting through motifs discovered by genetic algorithms
CN115310449A (en) Named entity identification method and device based on small sample and related medium
CN117938951B (en) Information pushing method, device, computer equipment and storage medium
CN116684480B (en) Method and device for determining information push model and method and device for information push
CN117875424B (en) Knowledge graph completion method and system based on entity description and symmetry relation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant