CN116501859B

CN116501859B - Paragraph retrieval method, equipment and medium based on refrigerator field

Info

Publication number: CN116501859B
Application number: CN202310752492.9A
Authority: CN
Inventors: 刘昊; 夏祎敏; 马坚; 魏志强; 孔令磊; 曾谁飞; 李桂玺; 张景瑞
Original assignee: Ocean University of China; Qingdao Haier Refrigerator Co Ltd
Current assignee: Ocean University of China; Qingdao Haier Refrigerator Co Ltd
Priority date: 2023-06-26
Filing date: 2023-06-26
Publication date: 2023-09-01
Anticipated expiration: 2043-06-26
Also published as: CN116501859A

Abstract

The invention relates to a paragraph retrieval method, equipment and medium based on the field of refrigerator, belonging to the field of natural language processing paragraph retrieval, wherein the method comprises the steps of using a cross training method for a dual task in model training of transfer learning; a fluency rewarding mechanism and a data filtering method based on target performance rewards are introduced on a problem generating model, and the method uses a cross joint training problem generating and paragraph retrieval model to reduce the overfitting of the model in the source field. According to the method, the problem generation based on the fluency rewarding mechanism is introduced, the quality of the generated problem is truly improved from the practical point of view, and meanwhile, the data filtering method based on the target performance rewarding is introduced, so that the adaptability of the QG and IR model in the field of refrigerators is further improved.

Description

Paragraph retrieval method, equipment and medium based on refrigerator field

Technical Field

The invention belongs to the field of natural language processing paragraph retrieval, and particularly relates to a paragraph retrieval method, device, equipment and medium based on the field of refrigerators.

Background

In modern households, refrigerators are one of indispensable home appliances. It provides people with important functions of food storage and fresh keeping, and ensures that food is kept fresh and safe. However, the use of refrigerators is not known to the best and involves various complex aspects such as correct storage temperature, classification of food materials, fresh keeping skills, etc. For many people, they often face some questions and problems when using the refrigerator. Solving these problems, providing accurate and useful guidance is critical to ensuring food safety and improving use experience. In this case, the paragraph retrieval task in the refrigerator field becomes particularly important. Paragraph retrieval is an information retrieval technique that helps users quickly obtain the required knowledge and guidance by locating and extracting relevant paragraphs from a large volume of text. In the field of refrigerators, accurate and authoritative guidance can be provided through paragraph search, and various questions about refrigerator use by a user can be answered. This technique is not merely for convenience and practicality, but more importantly to ensure food safety and health. The necessity of paragraph retrieval task in the field of refrigerator stands out an important role in satisfying the user's needs and solving the actual problem. Through accurate search and relevant text provision, the user can obtain authoritative guidance about food storage, temperature control, food preservation skills and the like. This not only helps the user to properly use the refrigerator, but also avoids food waste and food safety problems. Therefore, the paragraph search task in the refrigerator field has important necessity for improving user experience, guaranteeing food safety and promoting development of refrigerator technology.

The difficulty of paragraph search task (IR) in the refrigerator field compared to other fields is that the refrigerator field involves various aspects such as storage of food materials, temperature control, food preservation skills, etc. Knowledge and information in this area is often very extensive and complex, requiring specific guidance and advice for different situations and needs. In addition, the data in the refrigerator field may be relatively small, especially disclosing a large-scale data set that is available. This can lead to challenges with data starvation in building paragraph retrieval models. The training of the paragraph retrieval model is completed by generating rich and accurate domain questions through the question generation task. Question Generation (QG) is a task of automatically generating questions from various inputs such as original text, databases, or semantic representations. People have the ability to raise rich, creative and heuristic problems; for example, the problem: how can the longan be stored in a refrigerator keep fresh and sweet? Paragraph: the longan is placed in a transparent sealed container to ensure that the container is well sealed to prevent ingress of oxygen and moisture. The container is then placed in a refrigerated compartment, maintaining a temperature of about 4 degrees celsius. The fresh-keeping period of the longan can be prolonged by refrigerating, and the deterioration of the longan caused by the too high temperature is avoided. It is recommended to eat longan in 3 to 5 days to maintain its fresh taste. How to give a problem creation model the ability to put forth a problem under various input conditions and to grammar the correct problem is a challenging task.

In the refrigerator field, collecting the tag data for the question generation and paragraph retrieval tasks requires a field expert, and thus the cost of constructing the supervision model is high. By utilizing models trained in other fields where the marking data is easy to acquire, the transfer learning avoids the limitation of insufficient marking data in the refrigerator field. Conventional QG and IR tasks employ a self-training approach in transfer learning in which a pre-training model is given that can perform the task of interest in the source domain and label-free data from the refrigerator domain, the pre-training model being used to predict labels for the refrigerator domain data. The pre-trained model is further trained on the composite data to adapt to the new domain (this step is also referred to as domain-adaptive tuning). Although self-training improves performance in the refrigerator field, the trimmed self-training model may be overfitted due to validation bias.

Disclosure of Invention

Aiming at the technical problems, the invention provides a paragraph retrieval method, a device, equipment and a medium based on the field of refrigerators. The method considers the dual of QG and IR tasks and proposes cross joint training of two models to reduce the overfitting of the models in the source field. The input data for QG and IR are paragraphs, questions, respectively (the input data does not need to be aligned). The high-quality synthetic data pairs are generated through QG and IR, the high-quality data pairs are manually selected for training the data value estimator model for generating and estimating the fluency rewarding mechanism and the problem, and the technical problem that the training of the fluency rewarding model and the problem generating data value estimator model cannot be completed due to the fact that a label data set is lacking in the field of refrigerators is solved. The invention introduces the problem generation based on the fluency rewarding mechanism, and truly improves the quality of the generated problem from the practical point of view. In addition, the invention introduces a data filtering method based on target performance rewards, and further improves the adaptability of the QG and IR models in the field of refrigerators.

The invention is realized by the following technical scheme:

a paragraph retrieval method based on the field of refrigerators, the method comprising:

step one, a cross training method is used for model training of transfer learning aiming at a dual task; according to the cross training method, a problem generation model (QG) and a paragraph retrieval model (IR) are used, paragraphs and related problems of knowledge in the field of refrigerators are collected to serve as training data of the problem generation model (QG) and the paragraph retrieval model (IR), the training data collected manually do not need to be aligned, and finally synthetic data are obtained, wherein the synthetic data are higher in data quality compared with the traditional self-training method;

step two, introducing a fluency rewarding mechanism on the problem generating model; the problem generating model generates a problem by using a basic model, and manually selecting high-quality data from the relatively-quality synthetic data obtained in the first step as training data of a fluency rewarding mechanism, wherein the fluency rewarding mechanism is used for evaluating the fluency of the basic model generating problem; then, fine tuning is carried out on the basic model under the reinforcement learning framework by optimizing a fluency rewarding mechanism;

step three, a data filtering method based on target performance rewards; manually selecting high-quality data from the relatively quality synthesized data obtained in the step one as training data of the data value estimator model, recording whether the answer is correct or not and the information including the time used when the paragraph retrieval model answers the questions, and transmitting the information as feedback to the data value estimator model; the parameters of the data value estimator model are adjusted by directly feeding back information of the paragraph retrieval model, so that the value of data is estimated better, the performance of the problem generation and paragraph retrieval model is improved, and the adaptability of the migration model in the field of refrigerators is further improved.

Further, the first step: QG and IR have dual properties, QG uses BART model; IR uses a pre-trained dense channel retriever DPR that uses BERT double encoders to separately address problemsSum paragraphCoding and training to maximize codingAnddot product between, while minimizing the similarity of other closely related but negative paragraphs; for QG, there is an unlabeled problem in the refrigerator domain, its dual task IR retrieves their corresponding input paragraphs from the refrigerator domain, the resulting problem paragraph pairs are added to the synthetic data of the QG for fine tuning the QG; for IR, there are unlabeled paragraphs in the refrigerator field, the QG generates their input questions, and the generated question-paragraph pairs are added to the IR's synthetic data for fine-tuning the IR.

Further, the basic model is a BART model.

Further, the second step is to pretrain a language model firstThen, the problem generated in the first step is processedFluency rewards of (2)Is defined as being composed ofNegative confusion of the evaluation, expressed as:

(2)；

to optimize the fluency rewarding mechanism in training, a defined loss functionThe following are provided:

(3)；

wherein the method comprises the steps ofIs a problem of predictionIs the t-th mark in the question generator, which is the vocabulary distribution specified from the decoder of the question generatorSampling in the above; t is a problem of predictionTogether T markers;is a predefined negative confusion that is used as a baseline reward in reinforcement learning algorithms for stabilizing the training process.

Further, the value of the data is estimated in the third step: connecting the questions, corresponding answer paragraphs and context as inputs to a data value evaluator model, and encoding the sequence using BERT;

(4)；

wherein, the liquid crystal display device comprises a liquid crystal display device,c represents answer paragraphs, questions and context, respectively;representing slave'"hidden representation of tag derived input sequence;is a special marker used as a separator.

The invention relates to a paragraph retrieval device based on the field of refrigerators, which comprises a model training module based on transfer learning of a cross training method, a problem generation model module based on a fluency rewarding mechanism and a data filtering module based on target performance rewarding; the model training module based on the transfer learning of the cross training method runs the method of the step one; the problem generating model module based on fluency rewards operates the method of the step two, and the data filtering module based on target performance rewards operates the method of the step three.

The present invention also provides a computer readable storage medium storing a computer program adapted to be loaded by a processor and to perform the paragraph retrieving method based on the refrigerator field.

Compared with the prior art, the invention has the beneficial effects that: (1) Because the current refrigerator field lacks a public problem data set for training a paragraph retrieval model, the invention firstly introduces a problem generation model generation problem. Second, currently, the QG and IR are typically fine-tuned in a self-training manner under the migration learning, which may be affected by the confirmation bias, resulting in the over-fitting problem. The invention provides a method for generating a cross training problem and paragraph retrieval model based on transfer learning, which combines QG and IR training, thereby improving the problem caused by a self-training method;

(2) Because the refrigerator field lacks public marking data pairs, the field expert marking data needs to consume a great deal of cost. The cross training method based on transfer learning not only improves the problem of over fitting, but also enables the model training to generate data pairs with higher quality only by arranging misaligned paragraphs and problems, thereby greatly saving the expense of data marking. In addition, according to the synthetic data generated by the models, high-quality data are manually selected again for training of the fluency rewarding model and the data value evaluator, so that the problem that the fluency rewarding model and the data value evaluator model lack training data in the field of refrigerators is solved, and the performance of the two models in the field of refrigerators is better;

(3) Because of the evaluation index of the conventional question generation task, the evaluation is limited to evaluating the similarity and the coincidence of the generated text and the answer. Therefore, the invention adopts the correctness of the grammar and logic of the question sentence frequently cited in the question quality evaluation by human beings to train the related model aiming at the index. Aiming at the lack of optimization of sentence grammar correctness of the traditional problem generation model, the invention introduces a problem generation model based on a fluency rewarding mechanism;

(4) While the prior art consistency evaluator filtered out some low quality data, the prior art evaluator confidence threshold setting is not adaptable to the refrigerator field. The invention provides a data filtering method based on target performance rewards, so that the adaptability of QG and IR models in the field of refrigerators is further improved.

Drawings

FIG. 1 is a flow chart of step one of the present invention;

FIG. 2 is a flow chart of the second step of the invention;

fig. 3 is a flow chart of the third invention step.

Detailed Description

The present invention will be further described with reference to specific embodiments thereof, wherein it is apparent that the embodiments described are only some, but not all, of the embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Example 1: a paragraph retrieval method based on the field of refrigerators comprises the following specific steps:

step one, a cross training method is used for model training of transfer learning aiming at a dual task; according to the cross training method, a problem generation model (QG) and a paragraph retrieval model (IR) are used, paragraphs and related problems of knowledge in the field of refrigerators are collected to serve as training data of the problem generation model (QG) and the paragraph retrieval model (IR), the training data collected manually do not need to be aligned, and finally high-quality synthetic data are obtained;

step two, introducing a fluency rewarding mechanism on a problem generating model, wherein the problem generating model generates a problem by using a basic model, and manually selecting high-quality data from the composite data with higher quality obtained in the step one as training data of the fluency rewarding mechanism, wherein the fluency rewarding mechanism is used for evaluating the fluency of the basic model generating problem; then, fine tuning is carried out on the basic model under the reinforcement learning framework by optimizing a fluency rewarding mechanism;

step three, a data filtering method based on target performance rewards; manually selecting high-quality data from the relatively quality synthetic data obtained in the step one as training data of the data value estimator model, recording whether the answer is correct or not and the information including the time used when the paragraph retrieval model answers the questions, and transmitting the information as feedback to the data value estimator model; the parameters of the data value estimator model are adjusted by directly feeding back information of the paragraph retrieval model, so that the value of data is estimated better, the performance of the problem generation and paragraph retrieval model is improved, and the adaptability of the migration model in the field of refrigerators is further improved.

The method specifically comprises the following steps:

step one, QG causesUsing the BART model. IR uses a pre-trained dense channel retriever DPR that uses BERT double encoders to separately address problemsSum paragraphCoding and training to maximize codingAnddot product between them while minimizing the similarity of other closely related but negative going paragraphs. For QGs, there are unlabeled questions in the target domain, whose dual task IR can retrieve their corresponding input paragraphs from the target domain, and the resulting question paragraph pairs are added to the synthetic data of the QG for fine tuning the QG. For IR, there are unlabeled paragraphs in the target field, the QG can generate their input questions, the generated question paragraph pairs being added to the IR's synthetic data for fine-tuning the IR. Notably, during the self-training process, the QG model learns the composite data of the QG model itself, and the IR model is the same. In the self-training data, input data is distributed sampling from a target domain, and output data is a prediction result of the noise label. Therefore, the two models learn the output data of the noisy label, thereby causing the problem that the models are over-fitted in the refrigerator field. In the cross training process, the training data of the QG model is the synthetic data of the IR model, and the training data of the IR model is the synthetic data of the QG model. For training data of the QG model, its input data is output data of the IR model noisy labels, and its output data is input data of the IR model distributed samples from the target domain. Therefore, in cross training, the QG model learns that there is output data of the correct label, and the IR model is the same. This is in contrast to the training data labels of the self-training method. Therefore, the cross training method can reduce the learning of the model on the noisy output label and improve the problem of the over fitting of the QG and IR models.

Step two, as shown in FIG. 2, the question generator uses the BART model to give a paragraphAs input, the goal is to generate a related questionThe question may be defined by paragraphsThe answer. This can be expressed as maximizing conditional probability:

(1)，

Wherein the method comprises the steps ofIs a problem of generationT-th tag of (c), and q<t represents a previously decoded flag, i.e,…,. An overall framework for a problem creation model based on a fluency rewards mechanism is shown in fig. 2.

The present embodiment designs a fluency rewarding mechanism aimed at evaluating the fluency of the problem generated by the basic model. The BART model is then fine-tuned under the reinforcement learning framework by optimizing the fluency rewards mechanism. Next, the present invention describes in detail the design of the fluency bonus mechanism.

Under training a good Language Model (LM), the confusion of a sentence is generally regarded as a good indicator of its fluency. Thus, the present embodiment introduces language model-based rewards to enhance the generation of questionsIs a fluid property of (3). First pre-training a language modelThen, the generated problemFluency rewards of (2)Is defined as being composed ofNegative confusion of the evaluation, expressed as:

(2)，

to optimize fluency rewards in training, defined loss functionsThe following are provided:

(3)，

wherein the method comprises the steps ofIs a problem of predictionIs the t-th mark in the question generator, which is the vocabulary distribution specified from the decoder of the question generatorIs sampled.Is a predefined negative confusion degree, which is used as a benchmark reward in the reinforcement learning algorithm for stabilizing the training process, the language model used in the present embodimentIs the BART model.

First, the problem generating model is pre-trained by minimum cross entropy loss and copy error loss, and the losses are combined into；

，

By linear combinationAnd a reinforcement learning based loss functionFine-tuning use of combined loss functions of (a)A basic QG model is trained to maximize the fluency rewards for QGs previously defined. The method is specifically as follows:

where L represents the combined loss function.

Step three, as shown in fig. 3, a data value estimator model is designed on the basis of the step two and is recorded asIt receives a synthetic question-answer example；（c _u For refrigerator field context, p _u Is a paragraph of the field of refrigerators,to be a generated problem) and outputs a score representing its "value", i.e. This "value" may represent "potential for improving the retrieval performance of the refrigerator domain paragraph when used as a training sample". With this score, the most useful synthetic example in the refrigerator domain paragraph retrieval training can be selected. The present embodiment uses the BERT model as the basis for the problem value evaluator. Specifically, the questions, corresponding answer paragraphs and context are connected as inputs to a question value evaluator, and the sequences are encoded using BERT;

(4)，

wherein, the liquid crystal display device comprises a liquid crystal display device,and c represent answer paragraphs, questions and contexts, respectively.Representing slave'"hidden representation of the tag derived input sequence".Is a special marker used as a separator.

Probability of answer paragraphs (start index and end index) given by a pre-trained paragraph retrieval modelAs an additional feature is added to the hidden representation h, speeding up the training convergence of the data value estimator model and improving performance. Thus, these two features are combinedLinear transformations added to the original hidden representation, then constructing a linear classifier to output the values of the problem;

(5)，

(6)，

(7)，

wherein H represents the use of BERT model from "<CLS>"dimension of hidden representation of token-derived input sequence". H1, H2, H3, H4 represent dimensions of the intermediate hidden representation in QVE (question value estimator), W1, W2, W3, W4: trainable weight parameters representing a linear layer;，，， the method comprises the steps of carrying out a first treatment on the surface of the b1, b2, b3, b4: trainable bias parameters representing the linear layer,，，，is a trainable parameter of the linear layer.

The reward for the data value evaluator model is based on the performance improvement that the selected sample brings in training the IR model in the refrigerator domain. For this purpose, IR models are modeled on selected batches of samples based on cross entropy lossFine tuning of (c).

RewardsIs defined as the IR model on a selected batch of samples Pt in the refrigerator field before trimming) And after fine tuning%) Performance gain of (2);

(8)。

whereas the problem selection process is discrete and non-differentiable, reinforcement learning is used to update the data value estimator model. Mathematically, the goal is to minimize the following expression:

(9) ，

representing a loss function, the goal being to minimize the loss; e: representing the expected value, representing the expected operation on the following expression. S: representing selected questions, representing slave strategiesA problem of selection in the above.The strategy representing the problem selection, γ as a parameter, D as input data.

After the problem value estimator model is trained, it can be used to calculate the problem values for all synthetic problems in the refrigerator domain. Then, the embodiment selects the first K% of the synthesized data pair as the training corpus to train the refrigerator domain IR model. The particular value of the highest K% depends on the particular setting and requirements. This value is adjusted according to the actual situation and the requirements of the refrigerator field. In general, the highest K% of the synthesis problem-paragraph pairs are chosen as the training corpus in order to screen out relatively high quality samples, to avoid introducing low quality synthesis problems into IR model training in the refrigerator field. How much K% is specifically chosen depends on the number and quality of the balance training samples required, and can generally be adjusted and selected based on the performance of the experimental and validation set. In the field of refrigerators, the K% value selected by the invention is 30% for guaranteeing user experience due to the field specificity.

According to the problem value evaluation model, by considering the optimization target of the downstream IR model performance, the problem value evaluation model can select more useful problems, so that the IR model in the refrigerator field is improved.

The problem value assessment model is typically improved more when there are more annotated (problem-paragraph) pairs of data. This is because the problem value assessment model training (using reinforcement learning) relies on IR feedback based on the available annotation pairs. With more annotation pairs, the feedback may be more accurate, resulting in a better problem value assessment model to select a more useful synthetic problem. The invention is based on the step one, which completes the generation of a large number of high quality synthetic data pairs. By this part of the synthetic data, high quality available annotation pairs are manually picked up, thus making the IR model feedback more accurate and the problem selection more valuable.

After three steps, the invention constructs a problem generation model based on a fluency rewarding mechanism, a filter based on target performance rewarding optimization and a paragraph retrieval model. First we generate synthetic data pairs using a problem generation model based on a fluency rewards mechanism. Second, we filter out high scoring data using a filter optimized based on the target performance rewards. The partial data is then used as training data for the paragraph retrieval model. Finally, the synthesized data of the paragraph retrieval model is used as training data of the model generated based on the problem of the fluency rewarding mechanism, and the iterative loop training is carried out, so that the common improvement of the performance of the model is finally realized.

Claims

1. A paragraph retrieval method based on the field of refrigerators is characterized by comprising the following steps:

step one, a cross training method is used for model training of transfer learning aiming at a dual task; the cross training method uses a problem generation model and a paragraph retrieval model, wherein the problem generation model is abbreviated as QG, the paragraph retrieval model is abbreviated as IR, paragraphs of knowledge in the field of refrigerators and related problems are collected to serve as training data of the problem generation model and the paragraph retrieval model, the training data do not need to be manually aligned, and finally synthetic data are obtained;

QG and IR have dual properties, QG uses BART model; IR uses a pre-trained dense channel searcher DPR, QG uses a BERT double encoder to encode the question q and the paragraph p, respectively, and the BERT double encoder is trained to maximize the code E _P (p) and E _Q (q) similarity between closely related but negative going paragraphs while minimizing similarity of other closely related but negative going paragraphs; for QG, there is an unlabeled problem in the refrigerator field, its dual task IR retrieves the input paragraph corresponding to the unlabeled problem from the refrigerator field, the generated question paragraph pair is added to the synthetic data of QG for fine tuning the QG; for IR, there are unlabeled paragraphs in the refrigerator field, the QG generates their input questions, the generated question paragraph pairs are added to the IR's synthetic data for fine-tuning the IR;

step two, introducing a fluency rewarding mechanism on the problem generating model; the problem generating model generates a problem by using a basic model, and in the step one, high-quality data is manually selected as training data of a fluency rewarding mechanism, wherein the fluency rewarding mechanism is used for evaluating the fluency and grammar correctness of the basic model generating problem; then, fine tuning is carried out on the basic model under the reinforcement learning framework by optimizing a fluency rewarding mechanism;

first, a language model p is pre-trained _LM Then, the fluency rewards R of the questions q generated in the step one are awarded _flu Is defined as p _LM Negative confusion of the evaluation, expressed as:

to optimize the fluency rewarding mechanism in training, a defined loss function L _flu The following are provided:

wherein q _t Is the t-th marker of the generated question q,is a predictive question->Is the t-th mark in the question generator, which is the vocabulary distribution P specified from the decoder of the question generator _QG (q _t |p,q _<t ) Sampling in the above; t indicates problem of prediction->Together with T tags, alpha _flu Is a predefined negative confusion, which is used as a benchmark reward in reinforcement learning algorithms for stabilizing the training process;

step three, a data filtering method based on target performance rewards; manually selecting high-quality data from the high-quality synthesized data obtained in the step one as training data of the data value estimator model, recording information including whether the answer is correct or not and the time used when the paragraph retrieval model answers the questions, and transmitting the information as feedback to the data value estimator model; using the directly fed back information of the paragraph retrieval model to adjust the parameters of the data value evaluator model;

the method for estimating the value of the data by the data value estimator model is to connect the questions, corresponding answer paragraphs and context as the input of the data value estimator model and encode the sequence by using BERT;

h＝BERT[<CLS>q<ANS>p<sep>c] (4)；

wherein p, q, c represent answer paragraphs, questions, and contexts, respectively; h E R ^H Representing slave'<CLS>"hidden representation of tag derived input sequence;<ANS>、<sep>is a special marker used as a separator.

2. The method for retrieving paragraphs based on the refrigerator field of claim 1, wherein the basic model is a BART model.

3. The device is characterized by comprising a model training module based on transfer learning of a cross training method, a problem generating model module based on a fluency rewarding mechanism and a data filtering module based on target performance rewarding; the model training module based on the transfer learning of the cross training method runs the method of the step one in the paragraph retrieval method based on the refrigerator field according to any one of claims 1-2; the problem generating model module based on fluency rewards operates the method of the step two in the paragraph retrieving method based on the refrigerator field according to any one of claims 1-2, and the data filtering module based on target performance rewards operates the method of the step three in the paragraph retrieving method based on the refrigerator field according to any one of claims 1-2.

4. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program adapted to be loaded by a processor and to perform the refrigerator domain based paragraph retrieval method according to any of the claims 1-2.