CN114372478A - Knowledge distillation-based question and answer method, terminal equipment and storage medium - Google Patents

Knowledge distillation-based question and answer method, terminal equipment and storage medium

Info

Publication number
CN114372478A
CN114372478A (application CN202111487499.XA)
Authority
CN
China
Prior art keywords
model
question
network model
training
distillation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111487499.XA
Other languages
Chinese (zh)
Inventor
洪万福
李晓昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Yuanting Information Technology Co ltd
Original Assignee
Xiamen Yuanting Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Yuanting Information Technology Co ltd
Priority to CN202111487499.XA
Publication of CN114372478A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G06F40/35: Discourse or dialogue representation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3329: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/194: Calculation of difference between files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a question-answering method based on knowledge distillation, a terminal device and a storage medium. The method comprises the following steps: S1: collecting question-answer data of the required field to form a training set; S2: extracting the features of each piece of training data in the training set; S3: constructing a fine ranking model with higher network complexity, training it on the training set, and generating a teacher network model; S4: constructing a fine ranking model with lower network complexity, training it on the training set based on knowledge distillation and the teacher network model, and generating a fine ranking student network model; S5: constructing a rough ranking model with lower network complexity, training it on the training set based on knowledge distillation and the teacher network model, and generating a rough ranking student network model; S6: after the features of the question to be answered are extracted, the corresponding answer is retrieved from a question-answer library through the rough ranking student network model and then the fine ranking student network model. The invention improves the efficiency of the question-answering system.

Description

Knowledge distillation-based question and answer method, terminal equipment and storage medium
Technical Field
The invention relates to the field of automatic question answering, in particular to a question answering method based on knowledge distillation, terminal equipment and a storage medium.
Background
With the rapid development of information technology, the ever-growing volume of network information makes it difficult for users to quickly find the desired content in the large amount of information returned by search engines. At the same time, people's demand for fast and accurate information acquisition keeps increasing, and automatic question-answering systems have emerged in response. An automatic question-answering system offers people a natural-language question-and-answer mode of communication and returns the required answer to the user directly instead of a list of related web pages; it is convenient, fast and efficient, and is therefore well liked by users.
Two essential steps in building an FAQ question-answering system for a vertical domain are key-information analysis of the question (element identification), which includes keyword extraction and deep semantic analysis, and similarity calculation. A large-scale FAQ question-answering system in a vertical domain usually has a sufficiently large library of question-answer pairs, i.e. a corpus. Small models of low complexity and simple structure have low accuracy and give poor automatic-answering results, while large, high-complexity models can meet the accuracy requirement but have huge numbers of parameters, require long inference and computation time, and predict slowly, which hurts the efficiency of the question-answering system and fails to meet production-deployment requirements.
Disclosure of Invention
In order to solve the above problems, the present invention provides a question answering method based on knowledge distillation, a terminal device and a storage medium.
The specific scheme is as follows:
A question-answering method based on knowledge distillation, comprising the following steps:
S1: collecting question-answer data of the required field to form a training set;
S2: extracting the features of each piece of training data in the training set, wherein the extracted features comprise keyword features and semantic information features;
S3: constructing a fine ranking model with higher network complexity, training it on the training set, and taking the trained model as the teacher network model;
the fine ranking model is used for extracting the similarity between the semantic information features of two question sentences;
S4: constructing a fine ranking model with lower network complexity, training it on the training set based on knowledge distillation and the teacher network model, and taking the trained model as the fine ranking student network model;
S5: constructing a rough ranking model with lower network complexity, training it on the training set based on knowledge distillation and the teacher network model, and taking the trained model as the rough ranking student network model;
the rough ranking model is used for extracting the similarity between the keyword features of two question sentences;
S6: after the features of the question to be answered are extracted, the corresponding answer is retrieved from a question-answer library through the rough ranking student network model and then the fine ranking student network model.
Further, the training data in the training set is composed of standard questions and similar questions similar to the standard questions.
Further, step S1 includes preprocessing the collected question and answer data, where the preprocessing includes: extracting standard questions, obtaining answers, cleaning abnormal data and cleaning punctuation marks.
Further, the semantic information features are extracted with BERT (Bidirectional Encoder Representations from Transformers), a Transformer-based bidirectional encoder representation model.
Further, the loss function L_Student used when training a student network model based on knowledge distillation and the teacher network model is:
L_Student = H(y, f(x)) + λ * ||z_t - z_s||^2
where H denotes the cross-entropy loss function, y denotes the real answer (ground-truth label), f(x) denotes the prediction output of the student network model, λ is a parameter adjusting the influence of the distillation loss, z_t denotes the logits output by the teacher network model, and z_s denotes the logits output by the student network model.
A knowledge-distillation-based question answering terminal device comprises a processor, a memory and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps of the method of the embodiment of the invention.
A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the method as described above for an embodiment of the invention.
With this technical scheme, knowledge distillation enables the low-complexity student networks to learn the ranking ability and the semantic understanding ability of the teacher network; the teacher network guides the training of the student networks, reducing the complexity of the student networks and finally yielding student network models that are accurate, small and fast. This has important application value for deploying large-scale FAQ question-answering systems on edge devices and for improving the efficiency of the question-answering system through faster attribute identification and inference.
Drawings
Fig. 1 is a flowchart illustrating a first embodiment of the present invention.
Detailed Description
To further illustrate the various embodiments, the invention provides the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. Those skilled in the art will appreciate still other possible embodiments and advantages of the present invention with reference to these figures.
The invention will now be further described with reference to the accompanying drawings and detailed description.
The first embodiment is as follows:
the embodiment of the invention provides a question-answering method based on knowledge distillation, as shown in figure 1, the method comprises the following steps:
s1: and collecting question and answer data of the demand field to form a training set.
The question-answer data can be collected by crawling question-answer corpora from real business scenarios and then generating question-answer pairs manually or semi-manually. In this embodiment, 150W (1.5 million; W denotes 万, i.e. ten thousand) question-answer pairs are collected, i.e. 150W standard questions with their corresponding standard answers.
Because 150W question-answer pairs are not enough to support intelligent question answering in a given field, in order to augment the data automatically, in this embodiment 750W similar questions are preferably generated from the 150W standard questions by a similar-data generation module, and these similar questions together with the 150W standard questions form the training set. The similar-data generation module uses existing technology and is not described in detail here.
This embodiment also includes preprocessing the collected question-answer data. The preprocessing includes: extracting standard questions, obtaining answers, cleaning abnormal data, cleaning punctuation marks, and the like. After the 900W pieces of training data are preprocessed, 850W pieces of training data remain.
S2: Extracting the features of each piece of training data in the training set, wherein the extracted features comprise keyword features and semantic information features.
The keyword features and the semantic information features can each be extracted with existing models. In this embodiment, the semantic information features are preferably extracted with BERT (Bidirectional Encoder Representations from Transformers), a Transformer-based bidirectional encoder representation model, which is used here to encode the semantic information of natural-language questions.
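As a minimal illustration only (not part of the patent text), the sketch below shows one way such BERT semantic features could be obtained with the open-source HuggingFace transformers library; the checkpoint name bert-base-chinese and the mean-pooling step are assumptions, not choices specified by the invention.

```python
# Minimal sketch (assumption): extract a semantic feature vector for one question
# with BERT via the HuggingFace transformers library. "bert-base-chinese" and the
# mean pooling over tokens are illustrative choices, not specified by the patent.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")
bert.eval()

def semantic_features(question: str) -> torch.Tensor:
    """Return a fixed-size semantic feature vector for one question."""
    inputs = tokenizer(question, return_tensors="pt", truncation=True, max_length=64)
    with torch.no_grad():
        outputs = bert(**inputs)
    # Mean-pool the last-layer token embeddings into a single vector.
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)
```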
It should be noted that, in other embodiments, other features may be used in place of the keyword features; this is not limited herein.
S3: Constructing a fine ranking model with higher network complexity, training it on the training set, and taking the trained model as the teacher network model.
The fine ranking model is used to extract the similarity between the semantic information features of two question sentences: the semantic information features of the two questions are its input, and the similarity between them is its output. The training data for the fine ranking model therefore consist of the semantic information features of two question sentences together with the corresponding semantic similarity label. In this embodiment, when the two input questions are a standard question and one of its similar questions, the semantic similarity is 1; when the two input questions are two different standard questions, or a similar question of one standard question paired with another standard question, the semantic similarity is 0.
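Purely as an illustration (the data layout and negative-sampling ratio below are assumptions, not defined by the patent), labelled training pairs following the 1/0 similarity convention just described could be constructed as follows:

```python
# Minimal sketch (assumption): build labelled question pairs following the 1/0
# similarity convention above. qa_groups maps each standard question to the list
# of its similar questions; the layout and negative sampling are illustrative.
import random

def build_pairs(qa_groups, num_negatives=1):
    pairs = []
    standards = list(qa_groups)
    for std, similars in qa_groups.items():
        for sim in similars:
            pairs.append((std, sim, 1))       # standard question + its similar question -> 1
        for _ in range(num_negatives):        # pair with a different standard question -> 0
            other = random.choice([s for s in standards if s != std])
            pairs.append((std, random.choice(qa_groups[other] + [other]), 0))
    return pairs
```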
A deeper DNN ranking model can be selected as the fine ranking model with higher network complexity.
S4: Constructing a fine ranking model with lower network complexity, training it on the training set based on knowledge distillation and the teacher network model, and taking the trained model as the fine ranking student network model.
the fine model in step S4 has the same function as the fine model in step S3, but because the complexity of the network is low, a non-depth model such as a DNN ranking model or LR/FM model with a shallow depth may be selected.
S5: Constructing a rough ranking model with lower network complexity, training it on the training set based on knowledge distillation and the teacher network model, and taking the trained model as the rough ranking student network model.
The rough ranking model is used to extract the similarity between the keyword features of two question sentences: the keyword features of the two questions are its input, and the similarity between them is its output. The training data for the rough ranking model therefore consist of the keyword features of two question sentences together with the corresponding keyword-feature similarity label. In this embodiment, when the two input questions are a standard question and one of its similar questions, the keyword-feature similarity is 1; when the two input questions are two different standard questions, or a similar question of one standard question paired with another standard question, the keyword-feature similarity is 0.
The rough ranking model serves as a stage preceding the fine ranking model and must strike a balance between speed and accuracy: it does not pursue accuracy on the similarity problem, and its effect may be worse than that of the fine ranking model, but the larger number of candidates it returns makes up the difference. A small model size and high speed are therefore the key goals of the rough ranking model.
For knowledge distillation of the rough ranking model, the teacher model is not limited to a large rough ranking model; instead, the fine ranking model is adopted as the teacher model and the rough ranking model, for example an FM (factorization machine) or a two-tower DNN (deep neural network) model, is adopted as the student model. The student rough ranking model imitates the ranking results of the fine ranking stage, and this guides the optimization of the rough ranking model.
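As one possible, purely illustrative realisation of such a two-tower rough ranking student (the tower sizes, the shared-tower choice and the cosine-similarity scoring are assumptions, not requirements of the patent):

```python
# Minimal sketch (assumption): a small two-tower DNN as the rough ranking student.
# Each tower embeds the keyword features of one question; the cosine similarity of
# the two embeddings is the coarse score. All dimensions are illustrative.
import torch.nn as nn
import torch.nn.functional as F

class TwoTowerRanker(nn.Module):
    def __init__(self, kw_dim=128, emb_dim=64):
        super().__init__()
        # A single shared tower; two separate towers would work just as well.
        self.tower = nn.Sequential(nn.Linear(kw_dim, emb_dim), nn.ReLU(),
                                   nn.Linear(emb_dim, emb_dim))

    def forward(self, kw_a, kw_b):
        return F.cosine_similarity(self.tower(kw_a), self.tower(kw_b), dim=-1)
```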
S6: After the features of the question to be answered are extracted, the corresponding answer is retrieved from the question-answer library through the rough ranking student network model and then the fine ranking student network model.
After the features of the question to be answered are extracted, the rough ranking student network model first retrieves from the question-answer library, using the keyword features, all similar questions with high keyword-feature similarity; the fine ranking student network model then selects, from these candidates, the similar questions with high semantic similarity, and the corresponding answers are obtained from the retrieved questions with the highest semantic similarity.
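The following sketch illustrates this coarse-then-fine retrieval flow; the callables coarse_score and fine_score stand in for the rough and fine ranking student models, and the library layout and candidate sizes are assumptions made only for the example.

```python
# Minimal sketch (assumption): coarse-then-fine retrieval over a question-answer
# library. coarse_score/fine_score stand for the rough and fine ranking student
# models; the (question, answer) layout and top-k sizes are illustrative.
def answer(query, qa_library, coarse_score, fine_score, coarse_k=100, fine_k=1):
    # Stage 1: rough ranking on keyword-feature similarity, keep the top coarse_k.
    candidates = sorted(qa_library, key=lambda qa: coarse_score(query, qa[0]),
                        reverse=True)[:coarse_k]
    # Stage 2: fine ranking on semantic similarity over the surviving candidates.
    best = sorted(candidates, key=lambda qa: fine_score(query, qa[0]),
                  reverse=True)[:fine_k]
    return [ans for _, ans in best]
```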
In the above steps S4 and S5, the process of training a student network model based on knowledge distillation and the teacher network model includes:
S401: obtaining the real answer (ground-truth label) in the training data fed to the model; this label is obtained through manual annotation and the keywords in the question sentences;
S402: calculating the cross-entropy loss H(y, f(x)) between the real label and the prediction of the student network model on the training data;
S403: calculating, on the training data, the logits output by the teacher network model and the logits output by the student network model; the student logits are fitted to the teacher logits so that the generalization ability of the teacher network model is transferred to the student network model, giving the distillation loss ||z_t - z_s||^2;
S404: calculating the total loss function L_Student:
L_Student = H(y, f(x)) + λ * ||z_t - z_s||^2
where H denotes the cross-entropy loss function, y denotes the real answer (ground-truth label), f(x) denotes the prediction output of the student network model, λ is a parameter adjusting the influence of the distillation loss, z_t denotes the logits (the output of the last layer of the network) of the teacher network model, and z_s denotes the logits of the student network model.
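A minimal PyTorch sketch of this total loss is given below; treating the 0/1 similarity labels with binary cross-entropy as the term H, and the default value of λ, are assumptions made for illustration rather than settings prescribed by the invention.

```python
# Minimal sketch (assumption): L_Student = H(y, f(x)) + lambda * ||z_t - z_s||^2.
# The 0/1 similarity labels are handled with binary cross-entropy as H; lambda_
# weighs the distillation term. Both choices are illustrative.
import torch
import torch.nn.functional as F

def student_loss(student_logits, teacher_logits, labels, lambda_=0.5):
    hard_loss = F.binary_cross_entropy_with_logits(student_logits, labels.float())
    soft_loss = torch.sum((teacher_logits.detach() - student_logits) ** 2)  # ||z_t - z_s||^2
    return hard_loss + lambda_ * soft_loss
```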
The embodiment of the invention provides a question-answering method based on knowledge distillation that slims down the whole pipeline of the question-answering system. Training a small model directly on the conventional Ground Truth alone gives unsatisfactory results, so for each module of the question-answering system a small, lower-complexity model is made to learn from a large model, yielding a model that is both small and fast. When the small model is trained, besides using the conventional Ground Truth, the large model assists the training of the small model and transfers part of the knowledge it has learned to the small model.
Example two:
the invention also provides knowledge distillation-based question answering terminal equipment, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps in the above method embodiment of the first embodiment of the invention.
Further, as an executable scheme, the question answering terminal device based on knowledge distillation may be a computing device such as a desktop computer, a notebook, a palm computer, and a cloud server. The knowledge distillation based question answering terminal device can comprise, but is not limited to, a processor and a memory. It is understood by those skilled in the art that the above-mentioned structure of the knowledge-distillation-based question-answering terminal device is only an example of the knowledge-distillation-based question-answering terminal device, and does not constitute a limitation of the knowledge-distillation-based question-answering terminal device, and may include more or less components than the above, or combine some components, or different components, for example, the knowledge-distillation-based question-answering terminal device may further include an input-output device, a network access device, a bus, etc., which is not limited in this embodiment of the present invention.
Further, as an executable solution, the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, a discrete hardware component, and the like. The general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like, the processor is a control center of the knowledge-distillation-based question answering terminal device, and various interfaces and lines are used to connect various parts of the entire knowledge-distillation-based question answering terminal device.
The memory may be used to store the computer program and/or module, and the processor may implement various functions of the knowledge distillation-based question answering terminal device by operating or executing the computer program and/or module stored in the memory and calling data stored in the memory. The memory can mainly comprise a program storage area and a data storage area, wherein the program storage area can store an operating system and an application program required by at least one function; the storage data area may store data created according to the use of the mobile phone, and the like. In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.
The invention also provides a computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the above-mentioned method of an embodiment of the invention.
The knowledge distillation-based question answering terminal device integrated module/unit can be stored in a computer-readable storage medium if it is implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), software distribution medium, and the like.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A question-answering method based on knowledge distillation, characterized by comprising the following steps:
S1: collecting question-answer data of the required field to form a training set;
S2: extracting the features of each piece of training data in the training set, wherein the extracted features comprise keyword features and semantic information features;
S3: constructing a fine ranking model with higher network complexity, training it on the training set, and taking the trained model as the teacher network model;
the fine ranking model is used for extracting the similarity between the semantic information features of two question sentences;
S4: constructing a fine ranking model with lower network complexity, training it on the training set based on knowledge distillation and the teacher network model, and taking the trained model as the fine ranking student network model;
S5: constructing a rough ranking model with lower network complexity, training it on the training set based on knowledge distillation and the teacher network model, and taking the trained model as the rough ranking student network model;
the rough ranking model is used for extracting the similarity between the keyword features of two question sentences;
S6: after the features of the question to be answered are extracted, the corresponding answer is retrieved from a question-answer library through the rough ranking student network model and then the fine ranking student network model.
2. The knowledge-distillation-based question-answering method according to claim 1, characterized in that: the training data in the training set consist of standard questions and similar questions resembling the standard questions.
3. The knowledge-distillation-based question-answering method according to claim 1, characterized in that: step S1 further includes preprocessing the collected question-answer data, the preprocessing including: extracting standard questions, obtaining answers, cleaning abnormal data and cleaning punctuation marks.
4. The knowledge-distillation-based question-answering method according to claim 1, characterized in that: the semantic information features are extracted with BERT (Bidirectional Encoder Representations from Transformers), a Transformer-based bidirectional encoder representation model.
5. The knowledge-distillation-based question-answering method according to claim 1, characterized in that: the loss function L_Student used when training a student network model based on knowledge distillation and the teacher network model is:
L_Student = H(y, f(x)) + λ * ||z_t - z_s||^2
where H denotes the cross-entropy loss function, y denotes the real answer, f(x) denotes the prediction output of the student network model, λ is a parameter adjusting the influence of the distillation loss, z_t denotes the logits output by the teacher network model, and z_s denotes the logits output by the student network model.
6. A question-answering terminal device based on knowledge distillation, which is characterized in that: comprising a processor, a memory and a computer program stored in the memory and running on the processor, the processor implementing the steps of the method according to any of claims 1 to 5 when executing the computer program.
7. A computer-readable storage medium storing a computer program, characterized in that: the computer program when executed by a processor implementing the steps of the method as claimed in any one of claims 1 to 5.
CN202111487499.XA 2021-12-07 2021-12-07 Knowledge distillation-based question and answer method, terminal equipment and storage medium Pending CN114372478A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111487499.XA CN114372478A (en) 2021-12-07 2021-12-07 Knowledge distillation-based question and answer method, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111487499.XA CN114372478A (en) 2021-12-07 2021-12-07 Knowledge distillation-based question and answer method, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114372478A (en) 2022-04-19

Family

ID=81140031

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111487499.XA Pending CN114372478A (en) 2021-12-07 2021-12-07 Knowledge distillation-based question and answer method, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114372478A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117743557A (en) * 2024-02-08 2024-03-22 卓智网络科技有限公司 Intelligent customer service method, intelligent customer service device, server and storage medium
CN117743557B (en) * 2024-02-08 2024-06-04 卓智网络科技有限公司 Intelligent customer service method, intelligent customer service device, server and storage medium

Similar Documents

Publication Publication Date Title
CN110795543B (en) Unstructured data extraction method, device and storage medium based on deep learning
CN102262634B (en) Automatic questioning and answering method and system
CN110297893B (en) Natural language question-answering method, device, computer device and storage medium
CN108846138B (en) Question classification model construction method, device and medium fusing answer information
CN111767385A (en) Intelligent question and answer method and device
CN116127095A (en) Question-answering method combining sequence model and knowledge graph
CN113239169A (en) Artificial intelligence-based answer generation method, device, equipment and storage medium
CN113761153A (en) Question and answer processing method and device based on picture, readable medium and electronic equipment
CN116561538A (en) Question-answer scoring method, question-answer scoring device, electronic equipment and storage medium
CN113342958B (en) Question-answer matching method, text matching model training method and related equipment
CN111310463A (en) Test question difficulty estimation method and device, electronic equipment and storage medium
CN112685550B (en) Intelligent question-answering method, intelligent question-answering device, intelligent question-answering server and computer readable storage medium
CN111552773A (en) Method and system for searching key sentence of question or not in reading and understanding task
CN110795544B (en) Content searching method, device, equipment and storage medium
CN115905487A (en) Document question and answer method, system, electronic equipment and storage medium
CN114416929A (en) Sample generation method, device, equipment and storage medium of entity recall model
CN114372478A (en) Knowledge distillation-based question and answer method, terminal equipment and storage medium
CN117194730A (en) Intention recognition and question answering method and device, electronic equipment and storage medium
CN110347807B (en) Problem information processing method and device
CN116991976A (en) Model training method, device, electronic equipment and readable storage medium
CN117218482A (en) Model training method, video processing device and electronic equipment
CN116401344A (en) Method and device for searching table according to question
CN115270746A (en) Question sample generation method and device, electronic equipment and storage medium
CN113569018A (en) Question and answer pair mining method and device
CN117235237B (en) Text generation method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination