WO2020010955A1 - Keyword extraction method based on reinforcement learning, and computer device and storage medium - Google Patents


Info

Publication number
WO2020010955A1
Authority
WO
WIPO (PCT)
Prior art keywords
keyword
memory slot
vector
word
keyword memory
Prior art date
Application number
PCT/CN2019/089217
Other languages
French (fr)
Chinese (zh)
Inventor
徐易楠
刘云峰
吴悦
胡晓
汶林丁
Original Assignee
深圳追一科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳追一科技有限公司 filed Critical 深圳追一科技有限公司
Publication of WO2020010955A1 publication Critical patent/WO2020010955A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking

Definitions

  • the present application relates to the technical field of natural language processing, and in particular to a keyword extraction method, computer equipment, and storage medium based on reinforcement learning.
  • the above-mentioned intelligent customer service robot mainly performs single-round question answering, that is, the user asks a question, the intelligent customer service robot returns an answer to the user, and the exchange ends.
  • because the intelligent customer service robot cannot accurately grasp the context of the preceding dialogue, it often gives irrelevant answers, which greatly reduces user satisfaction.
  • in the related art, additional context is provided for later turns of the dialogue by means of an encoding-decoding approach, that is, the entire preceding sentence is encoded, then decoded and concatenated into the following question as additional input.
  • a keyword extraction method based on reinforcement learning, a computer device, and a storage medium are provided.
  • a keyword extraction method based on reinforcement learning including:
  • Preprocess a corpus composed of multiple sets of dialog data
  • a computer device includes a memory and one or more processors.
  • Computer-readable instructions are stored in the memory.
  • the one or more processors execute the following steps:
  • Preprocess a corpus composed of multiple sets of dialog data
  • the keyword memory slot G_L is updated for multiple rounds by using a reinforcement learning model to obtain the keyword memory slot G'_L.
  • the keyword memory slot G'_L includes word vectors of a plurality of keywords extracted from the n-th group of dialogues.
  • a storage medium stores a computer program.
  • when the computer program is executed by a processor, the following operations are implemented:
  • FIG. 1 is an application environment diagram of a keyword extraction method based on reinforcement learning provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of a keyword extraction method based on reinforcement learning provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a keyword extraction method based on reinforcement learning provided by another embodiment of the present application.
  • FIG. 4 is an internal structural diagram of a computer device in an embodiment of the present application.
  • the keyword extraction method based on reinforcement learning can be applied to the application environment shown in FIG. 1.
  • the computer device 11 preprocesses a corpus composed of multiple sets of dialogue data; establishes a keyword memory slot G_n for the n-th group of dialogues in the corpus, where the keyword memory slot G_n is used to record word vectors of multiple historical keywords of the n-th group of dialogues; initializes the keyword memory slot G_n to obtain a keyword memory slot G_L; and performs multiple rounds of updates on the keyword memory slot G_L by using a reinforcement learning model to obtain a keyword memory slot G'_L.
  • the keyword memory slot G'_L includes word vectors of a plurality of keywords extracted from the n-th group of dialogues.
  • the computer device 11 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, portable wearable devices, independent servers, or server clusters composed of multiple servers.
  • FIG. 2 is a flowchart of a keyword extraction method based on reinforcement learning provided by an embodiment of the present application.
  • the method in this embodiment includes:
  • S21 Preprocess a corpus composed of multiple sets of dialog data.
  • the corpus consists of multiple sets of high-frequency standard question-and-answer FAQ dialogue data.
  • the corpus is used as an interactive environment for reinforcement learning.
  • preprocessing a corpus composed of multiple sets of dialogue data includes: establishing a correspondence table between word vectors and keyword words, and performing vector conversion on the questions and answers of all dialogues in the corpus according to the correspondence table, where the i-th question in the n-th group of dialogues is converted into a vector S_i and the standard answer corresponding to the i-th question is converted into a vector Y_i.
  • S22: Establish a keyword memory slot G_n for the n-th group of dialogues in the corpus.
  • the keyword memory slot G_n is used to record word vectors of multiple historical keywords of the n-th group of dialogues.
  • the keyword memory slot G'_L includes word vectors of multiple keywords extracted from the n-th group of dialogues.
  • the number of reinforcement learning training rounds is set to M, that is, the keyword memory slot G_L is updated for M rounds by using a reinforcement learning model to obtain the keyword memory slot G'_L, and the keyword memory slot G'_L includes the output values of the action a.
  • judging whether the current scanned word is a keyword according to the value of the action a includes: when the action a is 0, the current scanned word C_i is not a keyword; when the action a is not 0, the current scanned word C_i is regarded as a keyword and the keyword memory slot G_L is updated.
  • regarding the current scanned word C_i as a keyword and updating the keyword memory slot G_L includes:
  • storing the current scanned word C_i at the k-th position of the keyword memory slot G_L, where k is the value output by the action a.
  • an inverse preprocessing operation is performed on the word vectors in the updated keyword memory slot G'_L to obtain keyword words.
  • the inverse preprocessing operation includes: extracting the keyword words corresponding to the word vectors according to the correspondence table between word vectors and keyword words; performing the inverse preprocessing operation on the word vectors in the updated memory slot G'_L to obtain keyword words makes it convenient for technicians to view the extracted keywords intuitively, and technicians can verify and improve the reinforcement learning model according to the keyword words.
  • the keyword word vectors in the keyword memory slot G'_L are concatenated into the next question of the n-th group of dialogues to supplement the keyword information missing from the next question.
  • the memory slot G'_L stores the keywords of the n-th group of dialogues in the corpus; after the user asks a new question, the method appends the keywords in the memory slot G'_L to the new question and feeds them into the neural network model together, so that an accurate answer to the new question is output.
  • question 1 is "I want to book a hotel, how do I do it?"
  • question 2 is "How is it charged?"
  • after preprocessing, question 1 yields the vector S_1 = [1, 2, 3, 4, 5, 6, 7] according to the word-vector/keyword correspondence table shown in Table 1.
  • the reinforcement learning model is used to update the keyword memory slot G_L to the keyword memory slot G'_L, where the keyword memory slot G'_L contains the word vectors of the keywords extracted from the n-th group of dialogues.
  • the state s is taken as input to the reinforcement learning model to obtain an output action a.
  • the action a is an integer with a value in the range [0, 5].
  • when a = 0, the current scanned word "我" (I) is not a keyword; when a ≠ 0, the current scanned word "我" is a keyword and is stored at the k-th position of the keyword memory slot G_L.
  • the above keyword extraction method based on reinforcement learning imposes no strict restrictions on the usage scenario or the specific dialogue content, nor on the training process or parameter ranges of the reinforcement learning model, and the method for computing the predicted answer includes but is not limited to a neural network model.
  • a keyword memory slot G_n is established for the n-th group of dialogues in the corpus, the keyword memory slot G_L is obtained after initializing the keyword memory slot G_n, and the keyword memory slot G_L is updated for multiple rounds by using the reinforcement learning model to obtain the keyword memory slot G'_L.
  • the keyword memory slot G'_L includes the keyword word vectors extracted from the n-th group of dialogues, which effectively improves the accuracy of subsequent standard question-answer replies and ensures that multiple rounds of dialogue remain continuously effective; moreover, explicitly extracting the preceding keywords and concatenating them into the following content allows technicians to see the keyword content intuitively, making it easy to adjust the algorithm and the model so as to output the most accurate keywords.
  • FIG. 3 is a flowchart of a keyword extraction method based on reinforcement learning provided by another embodiment of the present application.
  • the method for calculating the reward function R (s, a) in this embodiment includes:
  • S31: judge whether the current scanned word C_i is the sentence-final word; if it is not the sentence-final word, proceed to operation S32; if it is the sentence-final word, proceed to S33.
  • the vector [C_i, G_L] is input into the neural network model, and the predicted answer vector P_i is output by the neural network model.
  • the neural network model is a conventional technique, for example, the convolutional neural network model disclosed in application publication No. CN107562792A, "A question-answer matching method based on deep learning".
  • in the next round of training, the action a tends to be output in the direction of a larger reward function R(s, a) value; through the effect of the reward function R(s, a), the reinforcement learning model can filter out keywords that meet the requirements of the context, thereby improving the reply accuracy of the customer service robot.
  • the most accurate keywords are continuously sought and combined with the following content to obtain the most accurate answer, thereby improving the intelligence of the customer service robot.
  • a computer device is provided.
  • the computer device may be a terminal, and its internal structure diagram may be as shown in FIG. 4.
  • the computer equipment includes a processor, a memory, a network interface, a display screen, and an input device connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and a computer program.
  • the internal memory provides an environment for running the operating system and computer programs in a non-volatile storage medium.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by a processor to implement a keyword extraction method based on reinforcement learning.
  • the display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device of the computer device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, or mouse.
  • FIG. 4 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • a computer device includes a memory and one or more processors.
  • Computer-readable instructions are stored in the memory.
  • the one or more processors execute the following steps:
  • Preprocess a corpus composed of multiple sets of dialog data
  • the keyword memory slot G_L is updated for multiple rounds by using a reinforcement learning model to obtain the keyword memory slot G'_L.
  • the keyword memory slot G'_L includes word vectors of a plurality of keywords extracted from the n-th group of dialogues.
  • preprocessing the corpus composed of multiple sets of dialogue data includes: establishing a correspondence table between word vectors and keyword words, and performing vector conversion on the questions and answers of all dialogues in the corpus according to the correspondence table, where the i-th question in the n-th group of dialogues is converted into a vector S_i and the standard answer corresponding to the i-th question is converted into a vector Y_i.
  • performing vector conversion on the questions and answers of all dialogues in the corpus includes: using the Word2Vec tool to convert the questions of all dialogues in the corpus and the standard answers corresponding to the questions into vector form.
  • the number of reinforcement learning training rounds is set to M, that is, the keyword memory slot G_L is updated for M rounds by using a reinforcement learning model to obtain the keyword memory slot G'_L, and the keyword memory slot G'_L includes the output values of the action a.
  • judging whether the current scanned word is a keyword according to the value of the action a includes: when the action a is 0, the current scanned word C_i is not a keyword; when the action a is not 0, the current scanned word C_i is regarded as a keyword and the keyword memory slot G_L is updated.
  • regarding the current scanned word C_i as a keyword and updating the keyword memory slot G_L includes:
  • storing the current scanned word C_i at the k-th position of the keyword memory slot G_L, where k is the value output by the action a.
  • calculating the reward function R(s, a) includes:
  • outputting the predicted answer vector P_i according to the vector [S_i, G_L] includes:
  • inputting the vector [C_i, G_L] into the neural network model, and outputting the predicted answer vector P_i from the neural network model.
  • the inverse preprocessing operation includes: extracting the keyword words corresponding to the word vectors according to the correspondence table between word vectors and keyword words;
  • the keyword word vectors in the keyword memory slot G'_L are concatenated into the next question of the n-th group of dialogues to supplement the keyword information missing from the next question.
  • a storage medium is characterized in that the storage medium stores a computer program, and when the computer program is executed by a processor, the following operations are implemented:
  • Preprocess a corpus composed of multiple sets of dialog data
  • the keyword memory slot G_L is updated for multiple rounds by using a reinforcement learning model to obtain the keyword memory slot G'_L.
  • the keyword memory slot G'_L includes word vectors of a plurality of keywords extracted from the n-th group of dialogues.
  • preprocessing the corpus composed of multiple sets of dialogue data includes: establishing a correspondence table between word vectors and keyword words, and performing vector conversion on the questions and answers of all dialogues in the corpus according to the correspondence table, where the i-th question in the n-th group of dialogues is converted into a vector S_i and the standard answer corresponding to the i-th question is converted into a vector Y_i.
  • performing vector conversion on the questions and answers of all dialogues in the corpus includes: using the Word2Vec tool to convert the questions of all dialogues in the corpus and the standard answers corresponding to the questions into vector form.
  • the number of reinforcement learning training rounds is set to M, that is, the keyword memory slot G_L is updated for M rounds by using a reinforcement learning model to obtain the keyword memory slot G'_L, and the keyword memory slot G'_L includes the output values of the action a.
  • judging whether the current scanned word is a keyword according to the value of the action a includes: when the action a is 0, the current scanned word C_i is not a keyword; when the action a is not 0, the current scanned word C_i is regarded as a keyword and the keyword memory slot G_L is updated.
  • regarding the current scanned word C_i as a keyword and updating the keyword memory slot G_L includes:
  • storing the current scanned word C_i at the k-th position of the keyword memory slot G_L, where k is the value output by the action a.
  • calculating the reward function R(s, a) includes:
  • outputting the predicted answer vector P_i according to the vector [S_i, G_L] includes:
  • inputting the vector [C_i, G_L] into the neural network model, and outputting the predicted answer vector P_i from the neural network model.
  • performing an inverse preprocessing operation on the word vectors in the updated keyword memory slot G'_L to obtain keyword words includes: extracting the keyword words corresponding to the word vectors according to the correspondence table between word vectors and keyword words;
  • the keyword word vectors in the keyword memory slot G'_L are concatenated into the next question of the n-th group of dialogues to supplement the keyword information missing from the next question.
  • any process or method description in a flowchart or otherwise described herein can be understood as representing a module, fragment, or portion of code that includes one or more executable instructions for implementing the operation of a particular logical function or process
  • the scope of the preferred embodiments of the present application includes additional implementations in which the functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application pertain. It should be understood that each part of the present application may be implemented by hardware, software, firmware, or a combination thereof.
  • a plurality of operations or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system.
  • discrete logic circuits, application-specific integrated circuits with suitable combinational logic gate circuits, programmable gate arrays (PGA), field programmable gate arrays (FPGA), and the like.
  • each functional unit in each embodiment of the present application may be integrated into one processing module, or each unit may exist separately physically, or two or more units may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or software functional modules. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • the aforementioned storage medium may be a read-only memory, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

A keyword extraction method based on reinforcement learning, the method comprising: establishing a keyword memory slot G_n for an n-th group of dialogues in a corpus; initializing the keyword memory slot G_n to obtain a keyword memory slot G_L; and updating the keyword memory slot G_L in multiple rounds by using a reinforcement learning model to obtain a keyword memory slot G'_L, wherein the keyword memory slot G'_L comprises a word vector of a keyword extracted from the n-th group of dialogues.

Description

Keyword extraction method based on reinforcement learning, computer device, and storage medium
Cross-reference to related applications
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on July 13, 2018 with application number 201810774634.0 and entitled "Keyword extraction method based on reinforcement learning", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the technical field of natural language processing, and in particular to a keyword extraction method based on reinforcement learning, a computer device, and a storage medium.
Background
As the number of users of Internet companies grows, human customer service agents are often too busy to answer user questions in time, which lowers users' impression of the company, and intelligent customer service robots have therefore emerged. In the related art, an intelligent robot works as follows: first, hot questions that users ask frequently and with clear intent are analyzed and abstracted into several classes of standard questions (Frequently Asked Questions, FAQ), and a standard answer is labeled for each FAQ by professional business staff; then, for a future user question, technical means are used to determine whether the question can be matched to any existing FAQ, and when the match succeeds, the pre-labeled answer is returned to the user, so that the user's question is resolved efficiently. However, such an intelligent customer service robot mainly performs single-round question answering: the user asks a question, the robot returns an answer, and the exchange ends. When the user continues to ask questions based on the context of the previous question and answer, the robot cannot accurately grasp the dialogue context and therefore often gives irrelevant answers, which greatly reduces user satisfaction. In the related art, to let the intelligent customer service robot take the context into account, additional context is provided for later turns of the dialogue by means of an encoding-decoding approach: the entire preceding sentence is encoded, then decoded and concatenated into the following question as additional input. However, this approach cannot explicitly preserve the information of the preceding dialogue, and directly splicing the encoded preceding content into the following turn not only fails to extract keyword information effectively but also causes data redundancy, which hinders explicit coreference resolution in the following dialogue content and provides little help for answering the following question. Therefore, a new technical solution that can keep multi-round dialogue continuously effective is urgently needed.
Summary of the invention
According to various embodiments of the present application, a keyword extraction method based on reinforcement learning, a computer device, and a storage medium are provided.
A keyword extraction method based on reinforcement learning includes:
preprocessing a corpus composed of multiple groups of dialogue data;
establishing a keyword memory slot G_n for the n-th group of dialogues in the corpus, where the keyword memory slot G_n is used to record word vectors of multiple historical keywords of the n-th group of dialogues;
initializing the keyword memory slot G_n to obtain a keyword memory slot G_L; and
performing multiple rounds of updates on the keyword memory slot G_L by using a reinforcement learning model to obtain a keyword memory slot G'_L, where the keyword memory slot G'_L includes word vectors of multiple keywords extracted from the n-th group of dialogues.
A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
preprocessing a corpus composed of multiple groups of dialogue data;
establishing a keyword memory slot G_n for the n-th group of dialogues in the corpus, where the keyword memory slot G_n is used to record word vectors of multiple historical keywords of the n-th group of dialogues;
initializing the keyword memory slot G_n to obtain a keyword memory slot G_L; and
performing multiple rounds of updates on the keyword memory slot G_L by using a reinforcement learning model to obtain a keyword memory slot G'_L, where the keyword memory slot G'_L includes word vectors of multiple keywords extracted from the n-th group of dialogues.
A storage medium stores a computer program which, when executed by a processor, implements the following operations:
preprocessing a corpus composed of multiple groups of dialogue data;
establishing a keyword memory slot G_n for the n-th group of dialogues in the corpus, where the keyword memory slot G_n is used to record word vectors of multiple historical keywords of the n-th group of dialogues;
initializing the keyword memory slot G_n to obtain a keyword memory slot G_L; and
performing multiple rounds of updates on the keyword memory slot G_L by using a reinforcement learning model to obtain a keyword memory slot G'_L, where the keyword memory slot G'_L includes word vectors of multiple keywords extracted from the n-th group of dialogues.
Details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will become apparent from the description, the drawings, and the claims.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present application.
Brief description of the drawings
The drawings herein are incorporated into and constitute a part of the specification, illustrate embodiments consistent with the present application, and together with the description serve to explain the principles of the application.
In order to describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is an application environment diagram of a keyword extraction method based on reinforcement learning provided by an embodiment of the present application.
FIG. 2 is a flowchart of a keyword extraction method based on reinforcement learning provided by an embodiment of the present application.
FIG. 3 is a flowchart of a keyword extraction method based on reinforcement learning provided by another embodiment of the present application.
FIG. 4 is an internal structure diagram of a computer device in an embodiment of the present application.
Detailed description
To make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the application and are not intended to limit it.
The keyword extraction method based on reinforcement learning provided in the embodiments of the present application can be applied to the application environment shown in FIG. 1. The computer device 11 preprocesses a corpus composed of multiple groups of dialogue data; establishes a keyword memory slot G_n for the n-th group of dialogues in the corpus, where the keyword memory slot G_n is used to record word vectors of multiple historical keywords of the n-th group of dialogues; initializes the keyword memory slot G_n to obtain a keyword memory slot G_L; and performs multiple rounds of updates on the keyword memory slot G_L by using a reinforcement learning model to obtain a keyword memory slot G'_L, where the keyword memory slot G'_L includes word vectors of multiple keywords extracted from the n-th group of dialogues. The computer device 11 may be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer, a portable wearable device, an independent server, or a server cluster composed of multiple servers.
FIG. 2 is a flowchart of a keyword extraction method based on reinforcement learning provided by an embodiment of the present application.
As shown in FIG. 2, the method in this embodiment includes:
S21: Preprocess a corpus composed of multiple groups of dialogue data.
The corpus is composed of multiple groups of high-frequency standard FAQ dialogue data, and the corpus is used as the interaction environment for reinforcement learning.
Preprocessing the corpus composed of multiple groups of dialogue data includes: establishing a correspondence table between word vectors and keyword words, and performing vector conversion on the questions and answers of all dialogues in the corpus according to the correspondence table, where the i-th question in the n-th group of dialogues is converted into a vector S_i, and the standard answer corresponding to the i-th question is converted into a vector Y_i.
Performing vector conversion on the questions and answers of all dialogues in the corpus includes: using the Word2Vec tool to convert the questions of all dialogues in the corpus and the standard answers corresponding to the questions into vector form. Word2Vec is an open-source tool from Google for computing word vectors.
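Purely as an illustration of this preprocessing step (not part of the original disclosure), the Python sketch below builds a word-vector/keyword correspondence table and converts sentences into vectors. The integer IDs mirror the worked example and Table 1 below rather than dense Word2Vec embeddings, and the function names, tokenization, and toy answer are assumptions.

```python
# Illustrative sketch only: a Table-1 style correspondence table in which each
# word is mapped to an integer ID; dense Word2Vec vectors could be substituted.

def build_correspondence_table(dialogue_groups):
    """Assign an ID to every word seen in the corpus (word -> word-vector ID)."""
    table = {}
    for group in dialogue_groups:              # each group: list of (question, answer) pairs
        for question, answer in group:         # sentences are assumed pre-segmented into words
            for word in question + answer:
                table.setdefault(word, len(table) + 1)
    return table

def to_vector(tokens, table):
    """Convert a segmented sentence into its vector form (S_i or Y_i)."""
    return [table[word] for word in tokens]

# toy corpus: one group of dialogues, already segmented into words (assumed data)
corpus = [[(["我", "想", "预定", "酒店", "该", "如何", "操作"],
            ["请", "打开", "预订", "页面"])]]
table = build_correspondence_table(corpus)
S_1 = to_vector(corpus[0][0][0], table)        # -> [1, 2, 3, 4, 5, 6, 7]
```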
S22: Establish a keyword memory slot G_n for the n-th group of dialogues in the corpus, where the keyword memory slot G_n is used to record word vectors of multiple historical keywords of the n-th group of dialogues.
S23: Initialize the keyword memory slot G_n to obtain a keyword memory slot G_L.
Initializing the keyword memory slot G_n to obtain the keyword memory slot G_L includes: performing length initialization and vector initialization on the keyword memory slot G_n. Length initialization includes setting the length of the keyword memory slot G_n to L, and vector initialization includes setting the vectors in the keyword memory slot G_n to 0, yielding the keyword memory slot G_L = [0, 0, ..., 0]; for example, if L = 5, then G_L = [0, 0, 0, 0, 0].
S24: Use the reinforcement learning model to perform multiple rounds of updates on the keyword memory slot G_L to obtain a keyword memory slot G'_L, where the keyword memory slot G'_L includes word vectors of multiple keywords extracted from the n-th group of dialogues.
Using the reinforcement learning model to perform multiple rounds of updates on the keyword memory slot G_L to obtain the keyword memory slot G'_L includes:
scanning each word in the current question S_i of the n-th group of dialogues in order from the beginning to the end of the sentence, and taking the concatenated vector of the current scanned word C_i and the current keyword memory slot G_L of the n-th group of dialogues as the state s, i.e., s = [C_i, G_L];
taking the state s as input to the reinforcement learning model to obtain an output action a, where the action a is an integer with a value in the range [0, L];
setting the state transition probability P(s'|s, a) to 1, so that every time the state s executes an action a, a state transition occurs and a new state s' is obtained;
judging whether the current scanned word is a keyword according to the value of the action a;
calculating the reward function R(s, a);
determining the output value of the action a in the next round of training according to the value of the reward function R(s, a); and
setting the number of reinforcement learning training rounds to M, that is, using the reinforcement learning model to perform M rounds of updates on the keyword memory slot G_L to obtain the keyword memory slot G'_L, where the keyword memory slot G'_L includes the output values of the action a.
Judging whether the current scanned word is a keyword according to the value of the action a includes: when the action a is 0, the current scanned word C_i is not a keyword; when the action a is not 0, the current scanned word C_i is regarded as a keyword and the keyword memory slot G_L is updated.
Regarding the current scanned word C_i as a keyword and updating the keyword memory slot G_L includes:
storing the current scanned word C_i at the k-th position of the keyword memory slot G_L, where k is the value output by the action a.
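As a minimal sketch of one such scanning pass (the patent does not specify the form of the reinforcement learning model, so the policy below is a random placeholder, and the names scan_question and policy are assumptions, not part of the disclosure):

```python
import random

L = 5  # memory slot length, as in the worked example below

def policy(state):
    """Placeholder for the reinforcement learning model that maps state s to
    action a in [0, L]; a trained model would replace this random stub."""
    return random.randint(0, L)

def scan_question(question_vector, G_L):
    """One scanning pass over the current question S_i, updating the keyword
    memory slot G_L word by word."""
    for C_i in question_vector:          # scan from sentence start to sentence end
        s = [C_i] + list(G_L)            # state s = [C_i, G_L] (concatenated vector)
        a = policy(s)                    # output action a
        if a != 0:                       # a == 0: the current word is not a keyword
            G_L[a - 1] = C_i             # store C_i at the k-th position, k = a
        # transition probability P(s'|s, a) = 1: the scan simply moves to the next word
    return G_L

G_L = scan_question([1, 2, 3, 4, 5, 6, 7], [0] * L)
```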
The word vectors in the updated keyword memory slot G'_L are subjected to an inverse preprocessing operation to obtain keyword words. The inverse preprocessing operation includes: extracting the keyword words corresponding to the word vectors according to the correspondence table between word vectors and keyword words. Performing the inverse preprocessing operation on the word vectors in the updated memory slot G'_L to obtain keyword words makes it convenient for technicians to view the extracted keywords intuitively, and technicians can verify and improve the reinforcement learning model according to the keyword words.
Alternatively, the keyword word vectors in the keyword memory slot G'_L are concatenated into the next question of the n-th group of dialogues to supplement the keyword information missing from the next question.
The memory slot G'_L stores the keywords of the n-th group of dialogues in the corpus. After the user asks a new question, the method appends the keywords in the memory slot G'_L to the new question and feeds them into the neural network model together, so that an accurate answer to the new question is output.
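A brief sketch of these two uses of the memory slot is shown below, assuming the Table-1 style correspondence table introduced in the example that follows; the dictionary contents and function names are illustrative assumptions rather than part of the disclosure.

```python
# Illustrative sketch: recover keyword words from the memory slot ("inverse
# preprocessing") and splice the stored keywords into the next question.

table = {"我": 1, "想": 2, "预定": 3, "酒店": 4, "该": 5, "如何": 6, "操作": 7}
inverse_table = {v: k for k, v in table.items()}       # word-vector ID -> keyword word

def inverse_preprocess(G_L_final):
    """Map the non-zero entries of G'_L back to keyword words for inspection."""
    return [inverse_table[v] for v in G_L_final if v != 0]

def augment_next_question(next_question_vector, G_L_final):
    """Append the stored keyword vectors to the next question before it is fed
    to the answer-selection model."""
    return next_question_vector + [v for v in G_L_final if v != 0]

print(inverse_preprocess([6, 4, 1, 2, 3]))              # ['如何', '酒店', '我', '想', '预定']
print(augment_next_question([6, 8], [6, 4, 1, 2, 3]))   # [6, 8, 6, 4, 1, 2, 3]
```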
For example, user question 1 is "我想预定酒店,该如何操作?" ("I want to book a hotel, how do I do it?") and question 2 is "如何收费?" ("How is it charged?"). The method first preprocesses each question; for example, question 1 "我想预定酒店,该如何操作?" is preprocessed to obtain the vector S_1 = [1, 2, 3, 4, 5, 6, 7]. The correspondence table between word vectors and keyword words is shown in Table 1.
Table 1

Question keyword word | Question keyword vector
我 (I) | 1
想 (want) | 2
预定 (book) | 3
酒店 (hotel) | 4
该 (should) | 5
如何 (how) | 6
操作 (operate) | 7
Question 2, "如何收费?" ("How is it charged?"), is converted into the vector S_2 = [6, 8]. The reinforcement learning training set provides the standard answer Y_1 for S_1 and the standard answer Y_2 for S_2; the specific contents of Y_1 and Y_2 are not described here.
A keyword memory slot G_n is established for the group of dialogues consisting of question 1 and question 2, and the keyword memory slot G_n is used to record the keywords of question 1.
The keyword memory slot G_n is initialized to obtain G_L; with L set to 5, G_L is initialized to [0, 0, 0, 0, 0].
The reinforcement learning model is used to update the keyword memory slot G_L to the keyword memory slot G'_L, where the keyword memory slot G'_L contains the word vectors of the keywords extracted from the n-th group of dialogues.
Each word in the current question S_1 of the dialogue is scanned in order from the beginning to the end of the sentence. Taking the current scanned word "我" (I) as an example, it is converted into the word vector [1], and the concatenation of this word vector with the current keyword memory slot G_L of the dialogue is taken as the state s, i.e., s = [1, 0, 0, 0, 0, 0].
The state s is taken as input to the reinforcement learning model to obtain the output action a, where the action a is an integer with a value in the range [0, 5]. When a = 0, the current scanned word "我" is not a keyword; when a ≠ 0, the current scanned word "我" is a keyword and is stored at the k-th position of the keyword memory slot G_L, where k is the value output by the action a. For example, if k = 5, the memory slot is updated to G'_L = [0, 0, 0, 0, 1]. Because the current scanned word "我" is not the sentence-final word, the reward function R(s, a) is 0, and the next word "想" (want) is scanned. Because the state transition probability P(s'|s, a) is 1, the new state s' = [2, 0, 0, 0, 0, 1] is obtained. A new action a' is obtained from the new state s'; when a' = 3, the current keyword memory slot is updated to [0, 0, 2, 0, 1]. All the words of question 1 are scanned in turn until the current scanned word is the sentence-final word "操作" (operate); the reward function R(s, a) is then calculated, and the output of the action a is continuously corrected according to the reward function R(s, a).
After the above process is repeated M times, with M set to 100, the finally output keyword memory slot is G'_L = [6, 4, 1, 2, 3]. As a result, the keyword "酒店" (hotel) is carried into question 2, and the predicted answer output by the neural network model has the smallest error with respect to the labeled answer Y_2 in the training set, thereby ensuring that multiple rounds of dialogue proceed continuously and effectively.
It can be understood that the above keyword extraction method based on reinforcement learning imposes no strict restrictions on the usage scenario or the specific dialogue content, nor on the training process or parameter ranges of the reinforcement learning model, and the method for computing the predicted answer includes but is not limited to a neural network model.
In this embodiment, a keyword memory slot G_n is established for the n-th group of dialogues in the corpus, the keyword memory slot G_L is obtained after initializing the keyword memory slot G_n, and the keyword memory slot G_L is updated for multiple rounds by using the reinforcement learning model to obtain the keyword memory slot G'_L, which includes the keyword word vectors extracted from the n-th group of dialogues. This effectively improves the accuracy of subsequent standard question-answer replies and ensures that multiple rounds of dialogue remain continuously effective. Moreover, explicitly extracting the preceding keywords and concatenating them into the following content allows technicians to see the keyword content intuitively, making it easy to adjust the algorithm and the model so as to output the most accurate keywords.
FIG. 3 is a flowchart of a keyword extraction method based on reinforcement learning provided by another embodiment of the present application.
As shown in FIG. 3, on the basis of the previous embodiment, the method for calculating the reward function R(s, a) in this embodiment includes:
S31: Judge whether the current scanned word C_i is the sentence-final word; if it is not the sentence-final word, proceed to operation S32; if it is the sentence-final word, proceed to S33.
S32: When the current scanned word C_i is not the sentence-final word, the reward function R(s, a) is 0.
S33: When the current scanned word C_i is the sentence-final word, the current question S_i and the current keyword memory slot G_L of the n-th group of dialogues are concatenated to obtain the vector [C_i, G_L].
S34: The predicted answer vector P_i is output according to the vector [C_i, G_L].
The vector [C_i, G_L] is input into the neural network model, and the predicted answer vector P_i is output by the neural network model. The neural network model is a conventional technique, for example, the convolutional neural network model disclosed in application publication No. CN107562792A, "A question-answer matching method based on deep learning".
S35: The negative of the squared error between the predicted answer vector P_i and the standard answer Y_i is computed as the reward function R(s, a), i.e., R(s, a) = -(P_i - Y_i)^2.
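As a small illustration of S35 only, the sketch below computes this reward; the answer-prediction network is stubbed out, and treating the squared error of the vectors as a summed elementwise error is an assumption, since the patent does not spell out the exact vector norm.

```python
import numpy as np

def predict_answer(concatenated_vector):
    """Stub for the neural network model that maps [C_i, G_L] to a predicted
    answer vector P_i (e.g. a CNN matching model); replaced here by zeros."""
    return np.zeros(4)

def reward(is_sentence_final, concatenated_vector, Y_i):
    """R(s, a) = 0 for non-final words, otherwise -(P_i - Y_i)^2."""
    if not is_sentence_final:
        return 0.0
    P_i = predict_answer(concatenated_vector)
    return -float(np.sum((P_i - np.asarray(Y_i)) ** 2))  # negative squared error

print(reward(True, [7, 6, 4, 1, 2, 3], [0.0, 1.0, 0.0, 0.0]))  # -1.0
```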
The larger the value of the reward function R(s, a), the better the output action satisfies the requirements of the state, that is, the more accurate the output keyword vector. In the next round of training, the action a tends to be output in the direction of a larger reward function R(s, a) value. Through the effect of the reward function R(s, a), the reinforcement learning model can filter out keywords that meet the requirements of the context, thereby improving the reply accuracy of the customer service robot.
In this embodiment, by selecting and adjusting the parameters of the reinforcement learning model, the most accurate keywords are continuously sought and combined with the following content to obtain the most accurate answer, thereby improving the intelligence of the customer service robot.
In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure diagram may be as shown in FIG. 4. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected through a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the running of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a keyword extraction method based on reinforcement learning. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device of the computer device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art can understand that the structure shown in FIG. 4 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
preprocessing a corpus composed of multiple groups of dialogue data;
establishing a keyword memory slot G_n for the n-th group of dialogues in the corpus, where the keyword memory slot G_n is used to record word vectors of multiple historical keywords of the n-th group of dialogues;
initializing the keyword memory slot G_n to obtain a keyword memory slot G_L; and
performing multiple rounds of updates on the keyword memory slot G_L by using a reinforcement learning model to obtain a keyword memory slot G'_L, where the keyword memory slot G'_L includes word vectors of multiple keywords extracted from the n-th group of dialogues.
Further, preprocessing the corpus composed of multiple groups of dialogue data includes: establishing a correspondence table between word vectors and keyword words, and performing vector conversion on the questions and answers of all dialogues in the corpus according to the correspondence table, where the i-th question in the n-th group of dialogues is converted into a vector S_i, and the standard answer corresponding to the i-th question is converted into a vector Y_i.
Further, performing vector conversion on the questions and answers of all dialogues in the corpus includes: using the Word2Vec tool to convert the questions of all dialogues in the corpus and the standard answers corresponding to the questions into vector form.
Further, initializing the keyword memory slot G_n to obtain the keyword memory slot G_L includes: performing length initialization and vector initialization on the keyword memory slot G_n, where length initialization includes setting the length of the keyword memory slot G_n to L, and vector initialization includes setting the vectors in the keyword memory slot G_n to 0, yielding the keyword memory slot G_L = [0, 0, ..., 0].
Further, using the reinforcement learning model to perform multiple rounds of updates on the keyword memory slot G_L to obtain the keyword memory slot G'_L includes:
scanning each word in the current question S_i of the n-th group of dialogues in order from the beginning to the end of the sentence, and taking the concatenated vector of the current scanned word C_i and the current keyword memory slot G_L of the n-th group of dialogues as the state s, i.e., s = [C_i, G_L];
taking the state s as input to the reinforcement learning model to obtain an output action a, where the action a is an integer with a value in the range [0, L];
setting the state transition probability P(s'|s, a) to 1, so that every time the state s executes an action a, a state transition occurs and a new state s' is obtained;
judging whether the current scanned word is a keyword according to the value of the action a;
calculating the reward function R(s, a);
determining the output value of the action a in the next round of training according to the value of the reward function R(s, a); and
setting the number of reinforcement learning training rounds to M, that is, using the reinforcement learning model to perform M rounds of updates on the keyword memory slot G_L to obtain the keyword memory slot G'_L, where the keyword memory slot G'_L includes the output values of the action a.
Further, judging whether the current scanned word is a keyword according to the value of the action a includes: when the action a is 0, the current scanned word C_i is not a keyword; when the action a is not 0, the current scanned word C_i is regarded as a keyword and the keyword memory slot G_L is updated.
Further, regarding the current scanned word C_i as a keyword and updating the keyword memory slot G_L includes:
storing the current scanned word C_i at the k-th position of the keyword memory slot G_L, where k is the value output by the action a.
Further, calculating the reward function R(s, a) includes:
when the current scanned word C_i is the sentence-final word, concatenating the current question S_i with the current keyword memory slot G_L of the n-th group of dialogues to obtain [C_i, G_L];
outputting the predicted answer vector P_i according to the vector [C_i, G_L];
computing the negative of the squared error between the predicted answer vector P_i and the standard answer Y_i as the reward function R(s, a), i.e., R(s, a) = -(P_i - Y_i)^2; and
when the current scanned word C_i is not the sentence-final word, the reward function R(s, a) is 0.
Further, outputting the predicted answer vector P_i according to the vector [S_i, G_L] includes:
inputting the vector [C_i, G_L] into the neural network model, and outputting the predicted answer vector P_i from the neural network model.
Further, the method also includes:
performing an inverse preprocessing operation on the word vectors in the updated keyword memory slot G'_L to obtain keyword words, where the inverse preprocessing operation includes: extracting the keyword words corresponding to the word vectors according to the correspondence table between word vectors and keyword words;
or, concatenating the keyword word vectors in the keyword memory slot G'_L into the next question of the n-th group of dialogues to supplement the keyword information missing from the next question.
A storage medium is provided, wherein the storage medium stores a computer program which, when executed by a processor, implements the following operations:
preprocessing a corpus composed of multiple groups of dialogue data;
establishing a keyword memory slot G_n for the n-th group of dialogues in the corpus, where the keyword memory slot G_n is used to record the word vectors of multiple historical keywords of the n-th group of dialogues;
initializing the keyword memory slot G_n to obtain a keyword memory slot G_L; and
updating the keyword memory slot G_L for multiple rounds using a reinforcement learning model to obtain a keyword memory slot G'_L, where the keyword memory slot G'_L includes the word vectors of multiple keywords extracted from the n-th group of dialogues.
Further, preprocessing the corpus composed of multiple groups of dialogue data includes: establishing a correspondence table between word vectors and keyword words, and vectorizing the questions and answers of all dialogues in the corpus according to the correspondence table; the i-th question in the n-th group of dialogues is vectorized to obtain S_i, and the standard answer corresponding to the i-th question is vectorized to obtain Y_i.
Further, vectorizing the questions and answers of all dialogues in the corpus includes: using the Word2Vec tool to convert the questions of all dialogues in the corpus and the standard answers corresponding to the questions into vector form.
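A minimal sketch of this vectorization step is given below, assuming gensim 4.x (where the parameter is `vector_size`); the toy corpus, whitespace tokenization and the hyperparameters are illustrative only.

```python
from gensim.models import Word2Vec

corpus = [
    ("how do i get a refund", "please provide your order number"),
    ("where is my delivery", "your parcel is on the way"),
]
tokenized = [q.split() for q, a in corpus] + [a.split() for q, a in corpus]

# Train a small Word2Vec model on the questions and standard answers.
model = Word2Vec(sentences=tokenized, vector_size=50, window=3, min_count=1, sg=1)

def to_vectors(sentence):
    """Question or answer -> sequence of word vectors (S_i or Y_i)."""
    return [model.wv[token] for token in sentence.split()]

s_1 = to_vectors(corpus[0][0])    # word vectors of the first question
y_1 = to_vectors(corpus[0][1])    # word vectors of its standard answer
print(len(s_1), s_1[0].shape)     # 6 (50,)
```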
Further, initializing the keyword memory slot G_n to obtain the keyword memory slot G_L includes: performing length initialization and vector initialization on the keyword memory slot G_n, where the length initialization includes setting the length of the keyword memory slot G_n to L, and the vector initialization includes setting the vectors in the keyword memory slot G_n to 0, obtaining the keyword memory slot G_L = [0, 0, ..., 0].
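If each slot position holds one word vector, the zero-initialized slot G_L can be represented as an L-by-D array of zeros; reading the notation [0, 0, ..., 0] as L zero vectors, and the dimension D itself, are assumptions.

```python
import numpy as np

L, D = 4, 50                  # slot length and word-vector dimension (assumed)
G_L = np.zeros((L, D))        # keyword memory slot after length and vector initialization
print(G_L.shape)              # (4, 50)
```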
Further, updating the keyword memory slot G_L for multiple rounds using the reinforcement learning model to obtain the keyword memory slot G'_L includes:
scanning each word of the current question S_i in the n-th group of dialogues in order from the beginning to the end of the sentence, and taking the concatenation vector of the current scanned word C_i and the current keyword memory slot G_L of the n-th group of dialogues as the state s, that is, s = [C_i, G_L];
feeding the state s into the reinforcement learning model as input to obtain an output action a, where action a is an integer in the range [0, L];
setting the state transition probability P(s'|s,a) to 1, so that the state s transitions to a new state s' after each execution of action a;
judging whether the current scanned word is a keyword according to the value of action a;
calculating the reward function R(s,a);
determining the output value of action a for the next training round according to the value of the reward function R(s,a); and
setting the number of reinforcement-learning training rounds to M, that is, updating the keyword memory slot G_L for M rounds using the reinforcement learning model to obtain the keyword memory slot G'_L, where the keyword memory slot G'_L includes the output values of action a.
Further, judging whether the current scanned word is a keyword according to the value of action a includes: when action a is 0, the current scanned word C_i is not a keyword; when action a is not 0, the current scanned word C_i is regarded as a keyword and the keyword memory slot G_L is updated.
Further, regarding the current scanned word C_i as a keyword and updating the keyword memory slot G_L includes:
storing the current scanned word C_i at the k-th position of the keyword memory slot G_L, where k is the value output by action a.
Further, calculating the reward function R(s,a) includes:
when the current scanned word C_i is the last word of the sentence, concatenating the current question S_i with the current keyword memory slot G_L of the n-th group of dialogues to obtain the vector [S_i, G_L];
outputting a predicted answer vector P_i according to the vector [S_i, G_L];
taking the negative squared error between the predicted answer vector P_i and the standard answer Y_i as the reward function R(s,a), that is, R(s,a) = -(P_i - Y_i)^2; and
when the current scanned word C_i is not the last word of the sentence, the reward function R(s,a) is 0.
Further, outputting the predicted answer vector P_i according to the vector [S_i, G_L] includes:
inputting the vector [S_i, G_L] into a neural network model, and outputting the predicted answer vector P_i according to the neural network model.
Further, the operations also include:
performing an inverse preprocessing operation on the word vectors in the updated keyword memory slot G'_L to obtain the keyword words, where the inverse preprocessing operation includes: extracting the keyword word corresponding to each word vector according to the correspondence table between word vectors and keyword words;
or, concatenating the keyword word vectors in the keyword memory slot G'_L into the next question of the n-th group of dialogues to supplement the keyword information missing from the next question.
It can be understood that the same or similar parts of the above embodiments may be referred to each other; for content not described in detail in some embodiments, refer to the same or similar content in other embodiments.
It should be noted that, in the description of the present application, the terms "first", "second" and the like are used for descriptive purposes only and shall not be understood as indicating or implying relative importance. In addition, in the description of the present application, unless otherwise stated, "a plurality of" means at least two.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing the operations of a specific logical function or process, and the scope of the preferred embodiments of the present application includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved; this should be understood by those skilled in the art to which the embodiments of the present application belong. It should be understood that each part of the present application may be implemented by hardware, software, firmware or a combination thereof. In the above embodiments, multiple operations or methods may be implemented by software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one of the following techniques known in the art, or a combination thereof, may be used: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
A person of ordinary skill in the art can understand that all or part of the operations carried by the methods of the above embodiments may be completed by a program instructing related hardware; the program may be stored in a computer-readable storage medium, and when executed, the program performs one of the operations of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware, or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although the embodiments of the present application have been shown and described above, it can be understood that the above embodiments are exemplary and shall not be construed as limiting the present application; a person of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the present application.
It should be noted that the present invention is not limited to the preferred embodiment described above; those skilled in the art may derive products in various other forms in light of the present invention, but regardless of any change in shape or structure, any technical solution that is the same as or similar to that of the present application falls within the protection scope of the present invention.

Claims (30)

  1. A keyword extraction method based on reinforcement learning, comprising:
    preprocessing a corpus composed of multiple groups of dialogue data;
    establishing a keyword memory slot G_n for the n-th group of dialogues in the corpus, wherein the keyword memory slot G_n is used to record word vectors of multiple historical keywords of the n-th group of dialogues;
    initializing the keyword memory slot G_n to obtain a keyword memory slot G_L; and
    updating the keyword memory slot G_L for multiple rounds using a reinforcement learning model to obtain a keyword memory slot G'_L, wherein the keyword memory slot G'_L comprises word vectors of multiple keywords extracted from the n-th group of dialogues.
  2. The method according to claim 1, wherein preprocessing the corpus composed of multiple groups of dialogue data comprises: establishing a correspondence table between word vectors and keyword words, and vectorizing the questions and answers of all dialogues in the corpus according to the correspondence table between word vectors and keyword words, wherein the i-th question in the n-th group of dialogues is vectorized to obtain S_i, and the standard answer corresponding to the i-th question is vectorized to obtain Y_i.
  3. The method according to claim 2, wherein vectorizing the questions and answers of all dialogues in the corpus comprises: using the Word2Vec tool to convert the questions of all dialogues in the corpus and the standard answers corresponding to the questions into vector form.
  4. The method according to claim 1, wherein initializing the keyword memory slot G_n to obtain the keyword memory slot G_L comprises: performing length initialization and vector initialization on the keyword memory slot G_n, wherein the length initialization comprises setting the length of the keyword memory slot G_n to L, and the vector initialization comprises setting the vectors in the keyword memory slot G_n to 0, obtaining the keyword memory slot G_L = [0, 0, ..., 0].
  5. The method according to claim 1, wherein updating the keyword memory slot G_L for multiple rounds using the reinforcement learning model to obtain the keyword memory slot G'_L comprises:
    scanning each word of the current question S_i in the n-th group of dialogues in order from the beginning to the end of the sentence, and taking the concatenation vector of the current scanned word C_i and the current keyword memory slot G_L of the n-th group of dialogues as a state s, that is, s = [C_i, G_L];
    feeding the state s into the reinforcement learning model as input to obtain an output action a, wherein the action a is an integer in the range [0, L];
    setting a state transition probability P(s'|s,a) to 1, so that the state s transitions to a new state s' after each execution of the action a;
    judging whether the current scanned word is a keyword according to the value of the action a;
    calculating a reward function R(s,a);
    determining the output value of the action a for the next training round according to the value of the reward function R(s,a); and
    setting the number of reinforcement-learning training rounds to M, that is, updating the keyword memory slot G_L for M rounds using the reinforcement learning model to obtain the keyword memory slot G'_L, wherein the keyword memory slot G'_L comprises the output values of the action a.
  6. The method according to claim 5, wherein judging whether the current scanned word is a keyword according to the value of the action a comprises: when the action a is 0, the current scanned word C_i is not a keyword; and when the action a is not 0, regarding the current scanned word C_i as a keyword and updating the keyword memory slot G_L.
  7. The method according to claim 6, wherein regarding the current scanned word C_i as a keyword and updating the keyword memory slot G_L comprises:
    storing the current scanned word C_i at the k-th position of the keyword memory slot G_L, wherein k is the value output by the action a.
  8. The method according to claim 5, wherein calculating the reward function R(s,a) comprises:
    when the current scanned word C_i is the last word of the sentence, concatenating the current question S_i with the current keyword memory slot G_L of the n-th group of dialogues to obtain a vector [S_i, G_L];
    outputting a predicted answer vector P_i according to the vector [S_i, G_L];
    taking the negative squared error between the predicted answer vector P_i and the standard answer Y_i as the reward function R(s,a), that is, R(s,a) = -(P_i - Y_i)^2; and
    when the current scanned word C_i is not the last word of the sentence, the reward function R(s,a) is 0.
  9. The method according to claim 8, wherein outputting the predicted answer vector P_i according to the vector [S_i, G_L] comprises:
    inputting the vector [S_i, G_L] into a neural network model, and outputting the predicted answer vector P_i according to the neural network model.
  10. The method according to claim 1 or 2, further comprising:
    performing an inverse preprocessing operation on the word vectors in the updated keyword memory slot G'_L to obtain keyword words, wherein the inverse preprocessing operation comprises: extracting the keyword word corresponding to each word vector according to the correspondence table between word vectors and keyword words;
    or, concatenating the keyword word vectors in the keyword memory slot G'_L into the next question of the n-th group of dialogues to supplement the keyword information missing from the next question.
  11. A computer device, comprising a memory and one or more processors, wherein computer-readable instructions are stored in the memory, and the computer-readable instructions, when executed by the processor, cause the one or more processors to perform the following steps:
    preprocessing a corpus composed of multiple groups of dialogue data;
    establishing a keyword memory slot G_n for the n-th group of dialogues in the corpus, wherein the keyword memory slot G_n is used to record word vectors of multiple historical keywords of the n-th group of dialogues;
    initializing the keyword memory slot G_n to obtain a keyword memory slot G_L; and
    updating the keyword memory slot G_L for multiple rounds using a reinforcement learning model to obtain a keyword memory slot G'_L, wherein the keyword memory slot G'_L comprises word vectors of multiple keywords extracted from the n-th group of dialogues.
  12. The computer device according to claim 11, wherein preprocessing the corpus composed of multiple groups of dialogue data comprises: establishing a correspondence table between word vectors and keyword words, and vectorizing the questions and answers of all dialogues in the corpus according to the correspondence table between word vectors and keyword words, wherein the i-th question in the n-th group of dialogues is vectorized to obtain S_i, and the standard answer corresponding to the i-th question is vectorized to obtain Y_i.
  13. The computer device according to claim 12, wherein vectorizing the questions and answers of all dialogues in the corpus comprises: using the Word2Vec tool to convert the questions of all dialogues in the corpus and the standard answers corresponding to the questions into vector form.
  14. The computer device according to claim 11, wherein initializing the keyword memory slot G_n to obtain the keyword memory slot G_L comprises: performing length initialization and vector initialization on the keyword memory slot G_n, wherein the length initialization comprises setting the length of the keyword memory slot G_n to L, and the vector initialization comprises setting the vectors in the keyword memory slot G_n to 0, obtaining the keyword memory slot G_L = [0, 0, ..., 0].
  15. The computer device according to claim 11, wherein updating the keyword memory slot G_L for multiple rounds using the reinforcement learning model to obtain the keyword memory slot G'_L comprises:
    scanning each word of the current question S_i in the n-th group of dialogues in order from the beginning to the end of the sentence, and taking the concatenation vector of the current scanned word C_i and the current keyword memory slot G_L of the n-th group of dialogues as a state s, that is, s = [C_i, G_L];
    feeding the state s into the reinforcement learning model as input to obtain an output action a, wherein the action a is an integer in the range [0, L];
    setting a state transition probability P(s'|s,a) to 1, so that the state s transitions to a new state s' after each execution of the action a;
    judging whether the current scanned word is a keyword according to the value of the action a;
    calculating a reward function R(s,a);
    determining the output value of the action a for the next training round according to the value of the reward function R(s,a); and
    setting the number of reinforcement-learning training rounds to M, that is, updating the keyword memory slot G_L for M rounds using the reinforcement learning model to obtain the keyword memory slot G'_L, wherein the keyword memory slot G'_L comprises the output values of the action a.
  16. The computer device according to claim 15, wherein judging whether the current scanned word is a keyword according to the value of the action a comprises: when the action a is 0, the current scanned word C_i is not a keyword; and when the action a is not 0, regarding the current scanned word C_i as a keyword and updating the keyword memory slot G_L.
  17. The computer device according to claim 16, wherein regarding the current scanned word C_i as a keyword and updating the keyword memory slot G_L comprises:
    storing the current scanned word C_i at the k-th position of the keyword memory slot G_L, wherein k is the value output by the action a.
  18. The computer device according to claim 15, wherein calculating the reward function R(s,a) comprises:
    when the current scanned word C_i is the last word of the sentence, concatenating the current question S_i with the current keyword memory slot G_L of the n-th group of dialogues to obtain a vector [S_i, G_L];
    outputting a predicted answer vector P_i according to the vector [S_i, G_L];
    taking the negative squared error between the predicted answer vector P_i and the standard answer Y_i as the reward function R(s,a), that is, R(s,a) = -(P_i - Y_i)^2; and
    when the current scanned word C_i is not the last word of the sentence, the reward function R(s,a) is 0.
  19. The computer device according to claim 18, wherein outputting the predicted answer vector P_i according to the vector [S_i, G_L] comprises:
    inputting the vector [S_i, G_L] into a neural network model, and outputting the predicted answer vector P_i according to the neural network model.
  20. The computer device according to claim 15 or 16, wherein the steps further comprise:
    performing an inverse preprocessing operation on the word vectors in the updated keyword memory slot G'_L to obtain keyword words, wherein the inverse preprocessing operation comprises: extracting the keyword word corresponding to each word vector according to the correspondence table between word vectors and keyword words;
    or, concatenating the keyword word vectors in the keyword memory slot G'_L into the next question of the n-th group of dialogues to supplement the keyword information missing from the next question.
  21. A storage medium, wherein the storage medium stores a computer program, and the computer program, when executed by a processor, implements the following operations:
    preprocessing a corpus composed of multiple groups of dialogue data;
    establishing a keyword memory slot G_n for the n-th group of dialogues in the corpus, wherein the keyword memory slot G_n is used to record word vectors of multiple historical keywords of the n-th group of dialogues;
    initializing the keyword memory slot G_n to obtain a keyword memory slot G_L; and
    updating the keyword memory slot G_L for multiple rounds using a reinforcement learning model to obtain a keyword memory slot G'_L, wherein the keyword memory slot G'_L comprises word vectors of multiple keywords extracted from the n-th group of dialogues.
  22. The storage medium according to claim 21, wherein preprocessing the corpus composed of multiple groups of dialogue data comprises: establishing a correspondence table between word vectors and keyword words, and vectorizing the questions and answers of all dialogues in the corpus according to the correspondence table between word vectors and keyword words, wherein the i-th question in the n-th group of dialogues is vectorized to obtain S_i, and the standard answer corresponding to the i-th question is vectorized to obtain Y_i.
  23. The storage medium according to claim 22, wherein vectorizing the questions and answers of all dialogues in the corpus comprises: using the Word2Vec tool to convert the questions of all dialogues in the corpus and the standard answers corresponding to the questions into vector form.
  24. The storage medium according to claim 21, wherein initializing the keyword memory slot G_n to obtain the keyword memory slot G_L comprises: performing length initialization and vector initialization on the keyword memory slot G_n, wherein the length initialization comprises setting the length of the keyword memory slot G_n to L, and the vector initialization comprises setting the vectors in the keyword memory slot G_n to 0, obtaining the keyword memory slot G_L = [0, 0, ..., 0].
  25. The storage medium according to claim 21, wherein updating the keyword memory slot G_L for multiple rounds using the reinforcement learning model to obtain the keyword memory slot G'_L comprises:
    scanning each word of the current question S_i in the n-th group of dialogues in order from the beginning to the end of the sentence, and taking the concatenation vector of the current scanned word C_i and the current keyword memory slot G_L of the n-th group of dialogues as a state s, that is, s = [C_i, G_L];
    feeding the state s into the reinforcement learning model as input to obtain an output action a, wherein the action a is an integer in the range [0, L];
    setting a state transition probability P(s'|s,a) to 1, so that the state s transitions to a new state s' after each execution of the action a;
    judging whether the current scanned word is a keyword according to the value of the action a;
    calculating a reward function R(s,a);
    determining the output value of the action a for the next training round according to the value of the reward function R(s,a); and
    setting the number of reinforcement-learning training rounds to M, that is, updating the keyword memory slot G_L for M rounds using the reinforcement learning model to obtain the keyword memory slot G'_L, wherein the keyword memory slot G'_L comprises the output values of the action a.
  26. The storage medium according to claim 25, wherein judging whether the current scanned word is a keyword according to the value of the action a comprises: when the action a is 0, the current scanned word C_i is not a keyword; and when the action a is not 0, regarding the current scanned word C_i as a keyword and updating the keyword memory slot G_L.
  27. The storage medium according to claim 26, wherein regarding the current scanned word C_i as a keyword and updating the keyword memory slot G_L comprises:
    storing the current scanned word C_i at the k-th position of the keyword memory slot G_L, wherein k is the value output by the action a.
  28. The storage medium according to claim 25, wherein calculating the reward function R(s,a) comprises:
    when the current scanned word C_i is the last word of the sentence, concatenating the current question S_i with the current keyword memory slot G_L of the n-th group of dialogues to obtain a vector [S_i, G_L];
    outputting a predicted answer vector P_i according to the vector [S_i, G_L];
    taking the negative squared error between the predicted answer vector P_i and the standard answer Y_i as the reward function R(s,a), that is, R(s,a) = -(P_i - Y_i)^2; and
    when the current scanned word C_i is not the last word of the sentence, the reward function R(s,a) is 0.
  29. The storage medium according to claim 28, wherein outputting the predicted answer vector P_i according to the vector [S_i, G_L] comprises:
    inputting the vector [S_i, G_L] into a neural network model, and outputting the predicted answer vector P_i according to the neural network model.
  30. The storage medium according to claim 25 or 26, wherein the operations further comprise:
    performing an inverse preprocessing operation on the word vectors in the updated keyword memory slot G'_L to obtain keyword words, wherein the inverse preprocessing operation comprises: extracting the keyword word corresponding to each word vector according to the correspondence table between word vectors and keyword words;
    or, concatenating the keyword word vectors in the keyword memory slot G'_L into the next question of the n-th group of dialogues to supplement the keyword information missing from the next question.
PCT/CN2019/089217 2018-07-13 2019-05-30 Keyword extraction method based on reinforcement learning, and computer device and storage medium WO2020010955A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810774634.0 2018-07-13
CN201810774634.0A CN108897896B (en) 2018-07-13 2018-07-13 Keyword extraction method based on reinforcement learning

Publications (1)

Publication Number Publication Date
WO2020010955A1 true WO2020010955A1 (en) 2020-01-16

Family

ID=64349353

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/089217 WO2020010955A1 (en) 2018-07-13 2019-05-30 Keyword extraction method based on reinforcement learning, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN108897896B (en)
WO (1) WO2020010955A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108897896B (en) * 2018-07-13 2020-06-02 深圳追一科技有限公司 Keyword extraction method based on reinforcement learning
CN110008332B (en) * 2019-02-13 2020-11-10 创新先进技术有限公司 Method and device for extracting main words through reinforcement learning
CN110377713B (en) * 2019-07-16 2023-09-15 广州探域科技有限公司 Method for improving context of question-answering system based on probability transition
CN110427625B (en) * 2019-07-31 2022-12-27 腾讯科技(深圳)有限公司 Sentence completion method, apparatus, medium, and dialogue processing system
CN110569344B (en) * 2019-08-22 2023-06-02 创新先进技术有限公司 Method and device for determining standard question corresponding to dialogue text
CN111881267A (en) * 2020-05-25 2020-11-03 重庆兆光科技股份有限公司 Method, system, equipment and medium for extracting key sentences in dialogue corpus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080260253A1 (en) * 2005-07-26 2008-10-23 Mitsuhiro Miyazaki Information Processing Apparatus, Feature Extraction Method, Recording Media, and Program
CN105930318A (en) * 2016-04-11 2016-09-07 深圳大学 Word vector training method and system
CN106095749A (en) * 2016-06-03 2016-11-09 杭州量知数据科技有限公司 A kind of text key word extracting method based on degree of depth study
CN106294322A (en) * 2016-08-04 2017-01-04 哈尔滨工业大学 A kind of Chinese based on LSTM zero reference resolution method
CN108897896A (en) 2018-07-13 2018-11-27 深圳追一科技有限公司 Keyword abstraction method based on intensified learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108090049B (en) * 2018-01-17 2021-02-05 山东工商学院 Multi-document abstract automatic extraction method and system based on sentence vectors


Also Published As

Publication number Publication date
CN108897896A (en) 2018-11-27
CN108897896B (en) 2020-06-02

Similar Documents

Publication Publication Date Title
WO2020010955A1 (en) Keyword extraction method based on reinforcement learning, and computer device and storage medium
US20220180202A1 (en) Text processing model training method, and text processing method and apparatus
EP3859735A2 (en) Voice conversion method, voice conversion apparatus, electronic device, and storage medium
US20210256390A1 (en) Computationally efficient neural network architecture search
US10395641B2 (en) Modifying a language conversation model
WO2021169745A1 (en) User intention recognition method and apparatus based on statement context relationship prediction
WO2020063148A1 (en) Deep learning-based entity extraction method, computer device and storage medium
WO2020073530A1 (en) Customer service robot session text classification method and apparatus, and electronic device and computer-readable storage medium
KR102541053B1 (en) Method, device, equipment and storage medium for acquiring word vector based on language model
CN111428516A (en) Information processing method and device
CN112214591B (en) Dialog prediction method and device
US20220293092A1 (en) Method and apparatus of training natural language processing model, and method and apparatus of processing natural language
WO2019154411A1 (en) Word vector retrofitting method and device
WO2020151310A1 (en) Text generation method and device, computer apparatus, and medium
US11321534B2 (en) Conversation space artifact generation using natural language processing, machine learning, and ontology-based techniques
EP4006909A1 (en) Method, apparatus and device for quality control and storage medium
Ling et al. Context-controlled topic-aware neural response generation for open-domain dialog systems
US20210248498A1 (en) Method and apparatus for training pre-trained knowledge model, and electronic device
WO2022253061A1 (en) Voice processing method and related device
WO2020124674A1 (en) Method and device for vectorizing translator's translation personality characteristics
KR20210106398A (en) Conversation-based recommending method, conversation-based recommending apparatus, and device
US20230023789A1 (en) Method for identifying noise samples, electronic device, and storage medium
US11080073B2 (en) Computerized task guidance across devices and applications
US20220238098A1 (en) Voice recognition method and device
CN116821457B (en) Intelligent consultation and public opinion processing system based on multi-mode large model

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04.06.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19833407

Country of ref document: EP

Kind code of ref document: A1