CN114372478A - Knowledge distillation-based question and answer method, terminal equipment and storage medium - Google Patents

Knowledge distillation-based question and answer method, terminal equipment and storage medium

Info

Publication number
CN114372478A
CN114372478A (application CN202111487499.XA)
Authority
CN
China
Prior art keywords
model
question
network model
training
distillation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111487499.XA
Other languages
Chinese (zh)
Inventor
洪万福
李晓昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Yuanting Information Technology Co ltd
Original Assignee
Xiamen Yuanting Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Yuanting Information Technology Co ltd
Priority to CN202111487499.XA
Publication of CN114372478A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G06F40/35: Discourse or dialogue representation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3329: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/194: Calculation of difference between files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a question-answering method based on knowledge distillation, a terminal device and a storage medium. The method comprises the following steps: S1: collecting question-answer data of the required field to form a training set; S2: extracting the features of each piece of training data in the training set; S3: constructing a fine ranking model with higher network complexity, training it on the training set, and generating a teacher network model; S4: constructing a fine ranking model with lower network complexity, training it on the training set based on knowledge distillation and the teacher network model, and generating a fine ranking student network model; S5: constructing a rough ranking model with lower network complexity, training it on the training set based on knowledge distillation and the teacher network model, and generating a rough ranking student network model; S6: after the features of the question to be answered are extracted, the corresponding answer is retrieved from a question-answer library through the rough ranking student network model and then the fine ranking student network model. The invention improves the efficiency of the question-answering system.

Description

Knowledge distillation-based question and answer method, terminal equipment and storage medium
Technical Field
The invention relates to the field of automatic question answering, in particular to a question answering method based on knowledge distillation, terminal equipment and a storage medium.
Background
With the rapid development of information technology, the ever-growing volume of network information makes it difficult for users to quickly find the desired content in the large amount of information returned by search engines. At the same time, people's demand for fast and accurate information acquisition keeps increasing, and automatic question-answering systems have emerged in response. An automatic question-answering system offers people a natural-language question-and-answer mode of communication and returns the required answer to the user directly instead of a list of related web pages; it is convenient, fast and efficient, and is therefore well liked by users.
Two essential steps in building an FAQ question-answering system for a vertical domain are key-information analysis of the question (element identification), which includes keyword extraction and deep semantic analysis, and similarity calculation. A large-scale FAQ question-answering system in a vertical domain usually has a sufficiently large library of question-answer pairs, i.e. a corpus. Small models of low complexity and simple structure have low accuracy and give poor automatic-answering results, while large, high-complexity models can meet the accuracy requirement but have huge numbers of parameters, require long inference and computation time, and predict slowly, which hurts the efficiency of the question-answering system and fails to meet production-deployment requirements.
Disclosure of Invention
In order to solve the above problems, the present invention provides a question answering method based on knowledge distillation, a terminal device and a storage medium.
The specific scheme is as follows:
A question-answering method based on knowledge distillation, comprising the following steps:
S1: collecting question-answer data of the required field to form a training set;
S2: extracting the features of each piece of training data in the training set, wherein the extracted features comprise keyword features and semantic information features;
S3: constructing a fine ranking model with higher network complexity, training it on the training set, and taking the trained model as the teacher network model;
the fine ranking model is used for extracting the similarity between the semantic information features of two question sentences;
S4: constructing a fine ranking model with lower network complexity, training it on the training set based on knowledge distillation and the teacher network model, and taking the trained model as the fine ranking student network model;
S5: constructing a rough ranking model with lower network complexity, training it on the training set based on knowledge distillation and the teacher network model, and taking the trained model as the rough ranking student network model;
the rough ranking model is used for extracting the similarity between the keyword features of two question sentences;
S6: after the features of the question to be answered are extracted, the corresponding answer is retrieved from a question-answer library through the rough ranking student network model and then the fine ranking student network model.
Further, the training data in the training set is composed of standard questions and similar questions similar to the standard questions.
Further, step S1 includes preprocessing the collected question and answer data, where the preprocessing includes: extracting standard questions, obtaining answers, cleaning abnormal data and cleaning punctuation marks.
Further, the semantic information features are extracted with BERT (Bidirectional Encoder Representations from Transformers), a Transformer-based bidirectional encoder representation model.
Further, the loss function L_Student used when training a student network model based on knowledge distillation and the teacher network model is:
L_Student = H(y, f(x)) + λ * ||z_t - z_s||^2
where H denotes the cross-entropy loss function, y denotes the real answer (ground-truth label), f(x) denotes the prediction output of the student network model, λ is a parameter adjusting the influence of the distillation loss, z_t denotes the logits output by the teacher network model, and z_s denotes the logits output by the student network model.
A knowledge-distillation-based question answering terminal device comprises a processor, a memory and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps of the method of the embodiment of the invention.
A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the method as described above for an embodiment of the invention.
With this technical scheme, knowledge distillation enables the low-complexity student networks to learn the ranking ability and the semantic understanding ability of the teacher network; the teacher network guides the training of the student networks, reducing the complexity of the student networks and finally yielding student network models that are accurate, small and fast. This has important application value for deploying large-scale FAQ question-answering systems on edge devices and for improving the efficiency of the question-answering system through faster attribute identification and inference.
Drawings
Fig. 1 is a flowchart illustrating a first embodiment of the present invention.
Detailed Description
To further illustrate the various embodiments, the invention provides the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. Those skilled in the art will appreciate still other possible embodiments and advantages of the present invention with reference to these figures.
The invention will now be further described with reference to the accompanying drawings and detailed description.
The first embodiment is as follows:
the embodiment of the invention provides a question-answering method based on knowledge distillation, as shown in figure 1, the method comprises the following steps:
s1: and collecting question and answer data of the demand field to form a training set.
The question-answer data can be collected by crawling question-answer corpora from real business scenarios and then generating question-answer pairs manually or semi-manually. In this embodiment, 150W (1.5 million; W denotes 万, i.e. ten thousand) question-answer pairs are collected, i.e. 150W standard questions with their corresponding standard answers.
Because 150W question-answer pairs are not enough to support intelligent question answering in a given field, in order to augment the data automatically, in this embodiment 750W similar questions are preferably generated from the 150W standard questions by a similar-data generation module, and these similar questions together with the 150W standard questions form the training set. The similar-data generation module uses existing technology and is not described in detail here.
This embodiment also includes preprocessing the collected question-answer data. The preprocessing includes: extracting standard questions, obtaining answers, cleaning abnormal data, cleaning punctuation marks, and the like. After the 900W pieces of training data are preprocessed, 850W pieces of training data remain.
S2: Extracting the features of each piece of training data in the training set, wherein the extracted features comprise keyword features and semantic information features.
The keyword features and the semantic information features can each be extracted with existing models. In this embodiment, the semantic information features are preferably extracted with BERT (Bidirectional Encoder Representations from Transformers), a Transformer-based bidirectional encoder representation model, which is used here to encode the semantic information of natural-language questions.
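As a minimal illustration only (not part of the patent text), the sketch below shows one way such BERT semantic features could be obtained with the open-source HuggingFace transformers library; the checkpoint name bert-base-chinese and the mean-pooling step are assumptions, not choices specified by the invention.

```python
# Minimal sketch (assumption): extract a semantic feature vector for one question
# with BERT via the HuggingFace transformers library. "bert-base-chinese" and the
# mean pooling over tokens are illustrative choices, not specified by the patent.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")
bert.eval()

def semantic_features(question: str) -> torch.Tensor:
    """Return a fixed-size semantic feature vector for one question."""
    inputs = tokenizer(question, return_tensors="pt", truncation=True, max_length=64)
    with torch.no_grad():
        outputs = bert(**inputs)
    # Mean-pool the last-layer token embeddings into a single vector.
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)
```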
It should be noted that, in other embodiments, other features may be used in place of the keyword features; this is not limited herein.
S3: Constructing a fine ranking model with higher network complexity, training it on the training set, and taking the trained model as the teacher network model.
The fine ranking model is used to extract the similarity between the semantic information features of two question sentences: the semantic information features of the two questions are its input, and the similarity between them is its output. The training data for the fine ranking model therefore consist of the semantic information features of two question sentences together with the corresponding semantic similarity label. In this embodiment, when the two input questions are a standard question and one of its similar questions, the semantic similarity is 1; when the two input questions are two different standard questions, or a similar question of one standard question paired with another standard question, the semantic similarity is 0.
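Purely as an illustration (the data layout and negative-sampling ratio below are assumptions, not defined by the patent), labelled training pairs following the 1/0 similarity convention just described could be constructed as follows:

```python
# Minimal sketch (assumption): build labelled question pairs following the 1/0
# similarity convention above. qa_groups maps each standard question to the list
# of its similar questions; the layout and negative sampling are illustrative.
import random

def build_pairs(qa_groups, num_negatives=1):
    pairs = []
    standards = list(qa_groups)
    for std, similars in qa_groups.items():
        for sim in similars:
            pairs.append((std, sim, 1))       # standard question + its similar question -> 1
        for _ in range(num_negatives):        # pair with a different standard question -> 0
            other = random.choice([s for s in standards if s != std])
            pairs.append((std, random.choice(qa_groups[other] + [other]), 0))
    return pairs
```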
A deeper DNN ranking model can be selected as the fine ranking model with higher network complexity.
S4: Constructing a fine ranking model with lower network complexity, training it on the training set based on knowledge distillation and the teacher network model, and taking the trained model as the fine ranking student network model.
the fine model in step S4 has the same function as the fine model in step S3, but because the complexity of the network is low, a non-depth model such as a DNN ranking model or LR/FM model with a shallow depth may be selected.
S5: Constructing a rough ranking model with lower network complexity, training it on the training set based on knowledge distillation and the teacher network model, and taking the trained model as the rough ranking student network model.
The rough ranking model is used to extract the similarity between the keyword features of two question sentences: the keyword features of the two questions are its input, and the similarity between them is its output. The training data for the rough ranking model therefore consist of the keyword features of two question sentences together with the corresponding keyword-feature similarity label. In this embodiment, when the two input questions are a standard question and one of its similar questions, the keyword-feature similarity is 1; when the two input questions are two different standard questions, or a similar question of one standard question paired with another standard question, the keyword-feature similarity is 0.
The rough ranking model serves as a stage preceding the fine ranking model and must strike a balance between speed and accuracy: it does not pursue accuracy on the similarity problem, and its effect may be worse than that of the fine ranking model, but the larger number of candidates it returns makes up the difference. A small model size and high speed are therefore the key goals of the rough ranking model.
For knowledge distillation of the rough ranking model, the teacher model is not limited to a large rough ranking model; instead, the fine ranking model is adopted as the teacher model and the rough ranking model, for example an FM (factorization machine) or a two-tower DNN (deep neural network) model, is adopted as the student model. The student rough ranking model imitates the ranking results of the fine ranking stage, and this guides the optimization of the rough ranking model.
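As one possible, purely illustrative realisation of such a two-tower rough ranking student (the tower sizes, the shared-tower choice and the cosine-similarity scoring are assumptions, not requirements of the patent):

```python
# Minimal sketch (assumption): a small two-tower DNN as the rough ranking student.
# Each tower embeds the keyword features of one question; the cosine similarity of
# the two embeddings is the coarse score. All dimensions are illustrative.
import torch.nn as nn
import torch.nn.functional as F

class TwoTowerRanker(nn.Module):
    def __init__(self, kw_dim=128, emb_dim=64):
        super().__init__()
        # A single shared tower; two separate towers would work just as well.
        self.tower = nn.Sequential(nn.Linear(kw_dim, emb_dim), nn.ReLU(),
                                   nn.Linear(emb_dim, emb_dim))

    def forward(self, kw_a, kw_b):
        return F.cosine_similarity(self.tower(kw_a), self.tower(kw_b), dim=-1)
```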
S6: After the features of the question to be answered are extracted, the corresponding answer is retrieved from the question-answer library through the rough ranking student network model and then the fine ranking student network model.
After the features of the question to be answered are extracted, the rough ranking student network model first retrieves from the question-answer library, using the keyword features, all similar questions with high keyword-feature similarity; the fine ranking student network model then selects, from these candidates, the similar questions with high semantic similarity, and the corresponding answers are obtained from the retrieved questions with the highest semantic similarity.
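The following sketch illustrates this coarse-then-fine retrieval flow; the callables coarse_score and fine_score stand in for the rough and fine ranking student models, and the library layout and candidate sizes are assumptions made only for the example.

```python
# Minimal sketch (assumption): coarse-then-fine retrieval over a question-answer
# library. coarse_score/fine_score stand for the rough and fine ranking student
# models; the (question, answer) layout and top-k sizes are illustrative.
def answer(query, qa_library, coarse_score, fine_score, coarse_k=100, fine_k=1):
    # Stage 1: rough ranking on keyword-feature similarity, keep the top coarse_k.
    candidates = sorted(qa_library, key=lambda qa: coarse_score(query, qa[0]),
                        reverse=True)[:coarse_k]
    # Stage 2: fine ranking on semantic similarity over the surviving candidates.
    best = sorted(candidates, key=lambda qa: fine_score(query, qa[0]),
                  reverse=True)[:fine_k]
    return [ans for _, ans in best]
```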
In the above steps S4 and S5, the process of training a student network model based on knowledge distillation and the teacher network model includes:
S401: obtaining the real answer (ground-truth label) in the training data fed to the model; this label is obtained through manual annotation and the keywords in the question sentences;
S402: calculating the cross-entropy loss H(y, f(x)) between the real label and the prediction of the student network model on the training data;
S403: calculating, on the training data, the logits output by the teacher network model and the logits output by the student network model; the student logits are fitted to the teacher logits so that the generalization ability of the teacher network model is transferred to the student network model, giving the distillation loss ||z_t - z_s||^2;
S404: calculating the total loss function L_Student:
L_Student = H(y, f(x)) + λ * ||z_t - z_s||^2
where H denotes the cross-entropy loss function, y denotes the real answer (ground-truth label), f(x) denotes the prediction output of the student network model, λ is a parameter adjusting the influence of the distillation loss, z_t denotes the logits (the output of the last layer of the network) of the teacher network model, and z_s denotes the logits of the student network model.
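A minimal PyTorch sketch of this total loss is given below; treating the 0/1 similarity labels with binary cross-entropy as the term H, and the default value of λ, are assumptions made for illustration rather than settings prescribed by the invention.

```python
# Minimal sketch (assumption): L_Student = H(y, f(x)) + lambda * ||z_t - z_s||^2.
# The 0/1 similarity labels are handled with binary cross-entropy as H; lambda_
# weighs the distillation term. Both choices are illustrative.
import torch
import torch.nn.functional as F

def student_loss(student_logits, teacher_logits, labels, lambda_=0.5):
    hard_loss = F.binary_cross_entropy_with_logits(student_logits, labels.float())
    soft_loss = torch.sum((teacher_logits.detach() - student_logits) ** 2)  # ||z_t - z_s||^2
    return hard_loss + lambda_ * soft_loss
```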
The embodiment of the invention provides a question-answering method based on knowledge distillation that slims down the whole pipeline of the question-answering system. Training a small model directly on the conventional Ground Truth alone gives unsatisfactory results, so for each module of the question-answering system a small, lower-complexity model is made to learn from a large model, yielding a model that is both small and fast. When the small model is trained, besides using the conventional Ground Truth, the large model assists the training of the small model and transfers part of the knowledge it has learned to the small model.
Example two:
the invention also provides knowledge distillation-based question answering terminal equipment, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps in the above method embodiment of the first embodiment of the invention.
Further, as an executable scheme, the question answering terminal device based on knowledge distillation may be a computing device such as a desktop computer, a notebook, a palm computer, and a cloud server. The knowledge distillation based question answering terminal device can comprise, but is not limited to, a processor and a memory. It is understood by those skilled in the art that the above-mentioned structure of the knowledge-distillation-based question-answering terminal device is only an example of the knowledge-distillation-based question-answering terminal device, and does not constitute a limitation of the knowledge-distillation-based question-answering terminal device, and may include more or less components than the above, or combine some components, or different components, for example, the knowledge-distillation-based question-answering terminal device may further include an input-output device, a network access device, a bus, etc., which is not limited in this embodiment of the present invention.
Further, as an executable solution, the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, a discrete hardware component, and the like. The general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like, the processor is a control center of the knowledge-distillation-based question answering terminal device, and various interfaces and lines are used to connect various parts of the entire knowledge-distillation-based question answering terminal device.
The memory may be used to store the computer program and/or module, and the processor may implement various functions of the knowledge distillation-based question answering terminal device by operating or executing the computer program and/or module stored in the memory and calling data stored in the memory. The memory can mainly comprise a program storage area and a data storage area, wherein the program storage area can store an operating system and an application program required by at least one function; the storage data area may store data created according to the use of the mobile phone, and the like. In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.
The invention also provides a computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the above-mentioned method of an embodiment of the invention.
The knowledge distillation-based question answering terminal device integrated module/unit can be stored in a computer-readable storage medium if it is implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), software distribution medium, and the like.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A question-answering method based on knowledge distillation, characterized by comprising the following steps:
S1: collecting question-answer data of the required field to form a training set;
S2: extracting the features of each piece of training data in the training set, wherein the extracted features comprise keyword features and semantic information features;
S3: constructing a fine ranking model with higher network complexity, training it on the training set, and taking the trained model as the teacher network model;
the fine ranking model is used for extracting the similarity between the semantic information features of two question sentences;
S4: constructing a fine ranking model with lower network complexity, training it on the training set based on knowledge distillation and the teacher network model, and taking the trained model as the fine ranking student network model;
S5: constructing a rough ranking model with lower network complexity, training it on the training set based on knowledge distillation and the teacher network model, and taking the trained model as the rough ranking student network model;
the rough ranking model is used for extracting the similarity between the keyword features of two question sentences;
S6: after the features of the question to be answered are extracted, the corresponding answer is retrieved from a question-answer library through the rough ranking student network model and then the fine ranking student network model.
2. The knowledge-distillation-based question-answering method according to claim 1, characterized in that: the training data in the training set consist of standard questions and similar questions resembling the standard questions.
3. The knowledge-distillation-based question-answering method according to claim 1, characterized in that: step S1 further includes preprocessing the collected question-answer data, the preprocessing including: extracting standard questions, obtaining answers, cleaning abnormal data and cleaning punctuation marks.
4. The knowledge-distillation-based question-answering method according to claim 1, characterized in that: the semantic information features are extracted with BERT (Bidirectional Encoder Representations from Transformers), a Transformer-based bidirectional encoder representation model.
5. The knowledge-distillation-based question-answering method according to claim 1, characterized in that: the loss function L_Student used when training a student network model based on knowledge distillation and the teacher network model is:
L_Student = H(y, f(x)) + λ * ||z_t - z_s||^2
where H denotes the cross-entropy loss function, y denotes the real answer, f(x) denotes the prediction output of the student network model, λ is a parameter adjusting the influence of the distillation loss, z_t denotes the logits output by the teacher network model, and z_s denotes the logits output by the student network model.
6. A question-answering terminal device based on knowledge distillation, which is characterized in that: comprising a processor, a memory and a computer program stored in the memory and running on the processor, the processor implementing the steps of the method according to any of claims 1 to 5 when executing the computer program.
7. A computer-readable storage medium storing a computer program, characterized in that: the computer program when executed by a processor implementing the steps of the method as claimed in any one of claims 1 to 5.
CN202111487499.XA 2021-12-07 2021-12-07 Knowledge distillation-based question and answer method, terminal equipment and storage medium Pending CN114372478A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111487499.XA CN114372478A (en) 2021-12-07 2021-12-07 Knowledge distillation-based question and answer method, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111487499.XA CN114372478A (en) 2021-12-07 2021-12-07 Knowledge distillation-based question and answer method, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114372478A (en) 2022-04-19

Family

ID=81140031

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111487499.XA Pending CN114372478A (en) 2021-12-07 2021-12-07 Knowledge distillation-based question and answer method, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114372478A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117743557A (en) * 2024-02-08 2024-03-22 卓智网络科技有限公司 Intelligent customer service method, intelligent customer service device, server and storage medium
CN117743557B (en) * 2024-02-08 2024-06-04 卓智网络科技有限公司 Intelligent customer service method, intelligent customer service device, server and storage medium

Similar Documents

Publication Publication Date Title
CN110795543B (en) Unstructured data extraction method, device and storage medium based on deep learning
CN102262634B (en) Automatic questioning and answering method and system
CN110297893B (en) Natural language question-answering method, device, computer device and storage medium
CN108846138B (en) Question classification model construction method, device and medium fusing answer information
CN111767385A (en) Intelligent question and answer method and device
CN116127095A (en) Question-answering method combining sequence model and knowledge graph
CN113239169A (en) Artificial intelligence-based answer generation method, device, equipment and storage medium
CN113761153A (en) Question and answer processing method and device based on picture, readable medium and electronic equipment
CN116561538A (en) Question-answer scoring method, question-answer scoring device, electronic equipment and storage medium
CN113342958B (en) Question-answer matching method, text matching model training method and related equipment
CN111310463A (en) Test question difficulty estimation method and device, electronic equipment and storage medium
CN112685550B (en) Intelligent question-answering method, intelligent question-answering device, intelligent question-answering server and computer readable storage medium
CN111552773A (en) Method and system for searching key sentence of question or not in reading and understanding task
CN110795544B (en) Content searching method, device, equipment and storage medium
CN115905487A (en) Document question and answer method, system, electronic equipment and storage medium
CN114416929A (en) Sample generation method, device, equipment and storage medium of entity recall model
CN114372478A (en) Knowledge distillation-based question and answer method, terminal equipment and storage medium
CN117194730A (en) Intention recognition and question answering method and device, electronic equipment and storage medium
CN110347807B (en) Problem information processing method and device
CN116991976A (en) Model training method, device, electronic equipment and readable storage medium
CN117218482A (en) Model training method, video processing device and electronic equipment
CN116401344A (en) Method and device for searching table according to question
CN115270746A (en) Question sample generation method and device, electronic equipment and storage medium
CN113569018A (en) Question and answer pair mining method and device
CN117235237B (en) Text generation method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination