CN113297364A - Natural language understanding method and device for dialog system

Natural language understanding method and device for dialog system

Info

Publication number
CN113297364A
Authority
CN
China
Prior art keywords
model
layer
training
representation
field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110632046.5A
Other languages
Chinese (zh)
Other versions
CN113297364B (en)
Inventor
刘露
王乃钰
包铁
张雪松
彭涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin University
Original Assignee
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin University filed Critical Jilin University
Priority to CN202110632046.5A priority Critical patent/CN113297364B/en
Publication of CN113297364A publication Critical patent/CN113297364A/en
Application granted granted Critical
Publication of CN113297364B publication Critical patent/CN113297364B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/35 Clustering; Classification
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G06F40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The invention belongs to the technical field of intelligent dialogue, and in particular relates to a natural language understanding method and device for a dialogue system, comprising a word embedding layer, an encoding representation layer and a joint learning layer. On a collected domain-specific data set, comparative experiments were carried out on four models: 1) the original BERT-WWM model, 2) the original ERNIE model, 3) the pre-trained joint learning model, and 4) a 3-layer BERT-WWM model obtained by knowledge distillation. On the domain-specific data set, model 3) outperforms models 1) and 2) on both performance indicators, intent classification accuracy and slot recognition F1; after knowledge distillation, model 4) has a greatly reduced parameter scale and effectively reduced inference latency with only a small performance loss.

Description

Natural language understanding method and device for dialog system
Technical Field
The invention relates to the technical field of intelligent dialogue, and in particular to a natural language understanding method and device for a dialogue system.
Background
The rapid development and widespread adoption of the internet has made the 21st century an era of data explosion. People's demand for information has grown sharply and broadened in scope. When users face large-scale, complex information, effectively searching for and acquiring it becomes the key to using it, which places higher requirements on information retrieval. Traditional retrieval has two limitations: (1) it only matches keywords and does not consider the user's needs at the semantic level; (2) search results typically return a large number of texts and web pages, requiring further selection by the user. Domain-specific Dialogue Systems and Question Answering Systems are research topics aimed at improving traditional retrieval. Compared with traditional retrieval, a dialogue question-answering system understands the question at the semantic level rather than by simple keyword matching. It can also filter the content of web pages and documents on the user's behalf, returning more precise results: the answer to the question, rather than a web page or document.
Dialogue question-answering systems can be divided into four types according to application scenario: (1) Frequently Asked Questions (FAQ) type: such a system is given questions and corresponding answers in advance; it analyzes and processes the user input with models and algorithms, finds the most similar question in the question bank using a similarity metric, and returns the corresponding answer. (2) Task-oriented type: such a system is designed to assist the user in completing a task; it analyzes the user input, identifies the user's intent, and takes a series of actions under the guidance of a dialogue policy model to fulfill the user's needs. (3) Common-sense type: a knowledge graph generally serves as the system's knowledge base; its triples contain real-world common knowledge stored in natural language form, and answers are retrieved from the knowledge graph according to the user input. (4) Chit-chat type: such a system aims to hold many rounds of open-domain dialogue with the user, which places high demands on the system's intelligence and semantic consistency.
In intelligent dialogue, applications of dialogue systems face problems such as user privacy, low user acceptance, and mediocre user experience, so acquiring large numbers of public, high-quality dialogue data sets and unsupervised corpora is very difficult; this lack of data sets limits the development of dialogue systems and poses a challenge. On the other hand, user input in a dialogue system is often spoken language, with high semantic ambiguity and grammatical randomness, unfixed sentence-length distribution, and divergent content. All of these features make the intent classification task much harder. In addition, user input may contain multiple intents with certain correlations among them, and identifying whether multiple intents exist and classifying them accurately is another challenge facing the intent classification task.
A dialogue system comprises four key modules: semantic understanding, dialogue state tracking, dialogue management, and dialogue generation. The natural language understanding task generally includes three subtasks: domain classification, intent classification, and slot recognition. Domain classification uses a model or algorithm to assign the user input to a domain category, and intent classification identifies the intent of the user input. Slot recognition is typically addressed as a sequence labeling task that identifies and tags entities in the user input. The present patent proposes a domain-specific natural language understanding method in which domain recognition and intent recognition are modeled as a single subtask within the natural language understanding component: the user input is first divided into two parts, one unrelated to the education domain and one related to it, and the domain-related input is then classified at a finer granularity.
Currently, dialogue systems in various domains attract more and more attention, and continuous progress in deep learning has greatly promoted their development. For dialogue systems, deep learning techniques can exploit large amounts of data to learn feature representations and to classify and identify user intents, requiring only a small amount of manual work.
Disclosure of Invention
This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.
The present invention has been made in view of the problems occurring in the existing intelligent dialog systems.
Therefore, an object of the present invention is to provide a natural language understanding method and device for a dialogue system that improves the accuracy of intent classification and slot recognition in an intelligent dialogue system. It improves the semantic representation capability of the natural language understanding module by introducing a pre-trained model and a new pre-training task, and improves the module's domain-specific performance by introducing domain-adaptive and task-adaptive pre-training. In addition, knowledge distillation is applied to the model, speeding up inference and alleviating the latency of the dialogue system.
To solve the above technical problem, according to an aspect of the present invention, the present invention provides the following technical solutions:
A natural language understanding method and device for a dialogue system, comprising a word embedding layer, an encoding representation layer and a joint learning layer;
wherein:
(1) Word Embedding Layer
In FIG. 1, X_1, ..., X_5 denote the words of the input sequence, and e(X_1), ..., e(X_5) denote their embedded representations. The embeddings are generated by pre-training; this layer mainly completes the mapping from text to vectors;
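As an illustration of this layer, a minimal sketch using the HuggingFace transformers library; the checkpoint name hfl/chinese-bert-wwm is an assumed stand-in for the BERT-WWM model named later in this document, not a value fixed by the patent:

```python
# Minimal sketch of the word embedding layer, assuming the HuggingFace
# "transformers" library; "hfl/chinese-bert-wwm" is an assumed stand-in
# checkpoint, not one specified by the patent.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("hfl/chinese-bert-wwm")
model = BertModel.from_pretrained("hfl/chinese-bert-wwm")

inputs = tokenizer("明天六年一班的课程信息是什么?", return_tensors="pt")
# The embedding layer maps token ids X_1..X_n to vectors e(X_1)..e(X_n).
embeddings = model.embeddings(inputs["input_ids"])  # (1, seq_len, hidden)
```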
(2) Encoding Representation Layer
The embedded word vectors are input into a pre-trained language model formed by stacking multiple Transformer layers, which performs high-level feature encoding and extraction;
the input sequence is recorded as X1,...,X5In a multi-head Attention layer in an Encoder part of a pre-training model based on a Transformer, a Scaled Dot Product Attention mechanism (Scaled Dot-Product Attention) is adopted, a multi-head mechanism is introduced to participate in calculation by using a plurality of different Attention moment arrays, and then a feedforward network is input to complete nonlinear transformation, wherein the formula of the process is as follows:
e(X)×WQ=Q (1)
e(X)×WK=K (2)
e(X)×WV=V (3)
Figure RE-GDA0003181355870000041
MultiHead(Q,K,V)=Concat(head1,...,headh)Wo (5)
FFN(x)=max(0,xW1+b1)W2+b2 (6)
in the formula (d)kDenotes the dimension of K, and FFN denotes the fully connected sublayer in BERT Encoder.
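A compact sketch of equations (1)-(6) in PyTorch; this mirrors the standard Transformer computation rather than any patent-specific variant, and the projection matrices are assumed to be supplied by the caller:

```python
# Sketch of equations (1)-(6): scaled dot-product attention, the multi-head
# combination, and the position-wise feed-forward sublayer.
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # QK^T / sqrt(d_k)
    return F.softmax(scores, dim=-1) @ V               # eq. (4)

def multi_head(e_X, W_Q, W_K, W_V, W_O, h):
    Q, K, V = e_X @ W_Q, e_X @ W_K, e_X @ W_V          # eqs. (1)-(3)
    heads = [scaled_dot_product_attention(q, k, v)
             for q, k, v in zip(Q.chunk(h, -1), K.chunk(h, -1), V.chunk(h, -1))]
    return torch.cat(heads, dim=-1) @ W_O              # eq. (5)

def ffn(x, W1, b1, W2, b2):
    return torch.relu(x @ W1 + b1) @ W2 + b2           # eq. (6)
```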
The representation encoded by the pre-trained model introduces the rich syntactic and semantic knowledge the model learned without supervision during pre-training, improving its classification and sequence labeling capability;
(3) Joint Learning Layer
In the field of image processing, the CNN module is an indispensable building block, and it has been shown to work well in natural language processing too. To capture features from lower to higher layers, researchers proposed the VDCNN model, which stacks up to tens of convolutional blocks and introduces residual connections to alleviate vanishing gradients. Among recurrent networks, both LSTM and GRU mitigate vanishing and exploding gradients, and adding bidirectional modeling and max pooling on top of an RNN strengthens the model's handling of long-distance dependencies while retaining important semantic information. For slot recognition, the bidirectional long short-term memory network has clear advantages for sequence encoding and shows excellent representation learning in sequence tasks, while the conditional random field can exploit information between labels at the output stage; therefore, the slot recognition task is modeled jointly with a bidirectional LSTM and a conditional random field (CRF);
therefore, a mixed network formed by three models of CNN, RCNN and VDCNN is adopted for prediction during intention classification, and a BilSTM + conditional random field mode is adopted for sequence marking during slot position identification.
As a preferred solution of the natural language understanding method and apparatus in the dialog system according to the present invention, wherein: the method comprises the following steps:
the method comprises the following steps: firstly, realizing vectorization representation of a user input text by using an embedded layer of a pre-trained language model which is pre-trained on a general field corpus;
step two: then, performing high-level feature extraction and semantic representation on the quantitative representation by using a multilayer Transformer of a pre-training language model, and integrating context information to complete input coding representation;
in the joint learning layer, the intention classification is realized by using a mixed network model containing TextCNN, RCNN and VDCNN, and the specific calculation process is as follows:
(1) CNN module
The CNN module adopts the TextCNN model. In natural language processing, the CNN performs feature extraction in one-dimensional form: after the text is vectorized, convolution and pooling are applied along the text sequence direction, and each convolution selects a fixed-size window to learn interactions in sliding-window fashion. The model is formulated as:

e(w_{1:n}) = e(w_1) \oplus e(w_2) \oplus \cdots \oplus e(w_n) \quad (7)

C_i = f(w \cdot e(w_{i:i+h-1}) + b) \quad (8)

c = [C_1, C_2, \ldots, C_{n-h+1}] \quad (9)

R_1 = \max\{c\} \quad (10)

where w_1, ..., w_n is the input text sequence, equation (7) is the splicing (concatenation) of the previous layer's encoded representation, and w in equation (8) is the convolution kernel matrix. Equations (9) and (10) are the concatenation of the convolved context representations and the max pooling operation;
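A minimal PyTorch sketch of this TextCNN branch; filter counts and kernel sizes are illustrative assumptions, not values given by the patent:

```python
# Sketch of the TextCNN branch (eqs. (7)-(10)): 1-D convolution over the
# token dimension followed by max-over-time pooling. Sizes are illustrative.
import torch
import torch.nn as nn

class TextCNNBranch(nn.Module):
    def __init__(self, hidden=768, n_filters=128, kernel_sizes=(2, 3, 4)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden, n_filters, k) for k in kernel_sizes)

    def forward(self, x):                 # x: (batch, seq_len, hidden)
        x = x.transpose(1, 2)             # Conv1d expects (batch, hidden, seq)
        # eq. (8): C_i = f(w . e(w_{i:i+h-1}) + b); eq. (10): max pooling
        pooled = [torch.relu(c(x)).max(dim=-1).values for c in self.convs]
        return torch.cat(pooled, dim=-1)  # R_1
```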
(2) RCNN module
The RCNN model introduces the idea of modeling both contexts simultaneously: encoding and feature extraction consider each word's left and right context, and RNNs serve as the feature extractor to capture long-range dependencies in the word sequence. The implementation uses bidirectional LSTM units. Max pooling over the LSTM-encoded output effectively learns the semantic importance of words within the sentence. RCNN is formulated as:

c_l(w_i) = f(W^{(l)} c_l(w_{i-1}) + W^{(sl)} e(w_{i-1})) \quad (11)

c_r(w_i) = f(W^{(r)} c_r(w_{i+1}) + W^{(sr)} e(w_{i+1})) \quad (12)

h_i = [c_l(w_i); e(w_i); c_r(w_i)] \quad (13)

y_i = \tanh(W^{(2)} h_i + b^{(2)}) \quad (14)

R_2 = \max_i \, y_i \quad (15)

where c_l(w_i) denotes the left context of word w_i and, correspondingly, c_r(w_i) denotes its right context, computed by equations (11) and (12) respectively; W^{(l)} is the transformation matrix from the previous hidden state to the current one, and W^{(sl)} fuses the semantics of the current word with the left context passed to the next word. W^{(r)} and W^{(sr)} are analogous. Here c_l(w_i), e(w_i) and c_r(w_i) are spliced to obtain the final encoded representation, a nonlinear transformation is applied according to equation (14), and equation (15) is the max pooling operation;
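A hedged PyTorch sketch of the RCNN branch, approximating the left/right context recurrences of equations (11)-(12) with one bidirectional LSTM; all dimensions are illustrative:

```python
# Sketch of the RCNN branch (eqs. (11)-(15)): a bidirectional LSTM supplies
# left/right context c_l, c_r, which are spliced with e(w_i), transformed
# nonlinearly, and max-pooled over the sequence.
import torch
import torch.nn as nn

class RCNNBranch(nn.Module):
    def __init__(self, hidden=768, ctx=256, out=128):
        super().__init__()
        self.bilstm = nn.LSTM(hidden, ctx, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(hidden + 2 * ctx, out)

    def forward(self, x):                      # x: (batch, seq_len, hidden)
        h_ctx, _ = self.bilstm(x)               # forward/backward states ~ c_l, c_r
        d = h_ctx.size(-1) // 2
        h = torch.cat([h_ctx[..., :d], x, h_ctx[..., d:]], dim=-1)  # eq. (13)
        y = torch.tanh(self.proj(h))            # eq. (14)
        return y.max(dim=1).values              # eq. (15): max pooling -> R_2
```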
(3) VDCNN module
VDCNN (Very Deep Convolutional Networks) was originally proposed for image recognition in computer vision. Its main idea is to use small convolution kernels (3 × 3) in all convolutional layers of the model and stack them to a very large depth, up to 19 layers; the output of this module is denoted R_3.
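A sketch of one VDCNN-style convolutional block with the residual connection mentioned above; channel counts and the block layout are illustrative assumptions:

```python
# Sketch of one VDCNN convolutional block with a residual connection;
# channel counts are illustrative.
import torch.nn as nn

class ConvBlock(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels))
        self.relu = nn.ReLU()

    def forward(self, x):                      # x: (batch, channels, seq_len)
        return self.relu(self.block(x) + x)    # residual connection
```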
Step three: finally, splice the encoded representations R_1, R_2, R_3 obtained from the three models, as in equation (16); after one linear transformation, compute the probability of each category at the softmax layer and use cross entropy as the loss function:

R = [R_1; R_2; R_3] \quad (16)

S = W_s R + b \quad (17)

p = \mathrm{softmax}(S) \quad (18)

L_{intent} = -\sum_i y_i \log p_i \quad (19)
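A minimal sketch of equations (16)-(19), with illustrative shapes and random placeholder tensors standing in for the three branch outputs:

```python
# Sketch of eqs. (16)-(19): splice R_1, R_2, R_3, apply one linear layer,
# and compute softmax cross entropy. Shapes and sizes are illustrative.
import torch
import torch.nn as nn

num_intents = 10                                       # illustrative
R1, R2, R3 = (torch.randn(4, 128) for _ in range(3))   # branch outputs, batch=4
y = torch.randint(0, num_intents, (4,))                # gold intent labels

R = torch.cat([R1, R2, R3], dim=-1)                    # eq. (16)
S = nn.Linear(R.size(-1), num_intents)(R)              # eq. (17): S = W_s R + b
loss_intent = nn.CrossEntropyLoss()(S, y)              # eqs. (18)-(19)
```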
the Bi-LSTM network designed for slot identification mainly comprises two sublayers of Bi-LSTM and CRF:
(1) Bi-LSTM layer
The vector representation encoded by the pre-trained model is fed into a bidirectional LSTM network; the vectorized text sequence is processed in the forward (left-to-right) and backward (right-to-left) directions, formally as in equations (11) and (12), finally obtaining a forward encoded representation c_l(w_i) and a backward encoded representation c_r(w_i), which are spliced according to equation (13);
(2) CRF layer
The CRF layer learns dependencies between adjacent labels to constrain label combinations and decodes the best path among all possible label paths. Let H be the encoded representation from the Bi-LSTM, n the number of characters in the sentence and m the number of slot label types; then H_{i,j} is the score of the j-th label for the i-th character. In the CRF layer, the current output is determined both by the transition from the previous output and by the state corresponding to the current input, giving a transition score and a state score respectively; H serves as the state matrix of the CRF layer, while the score of transitioning from state i to state j is given by a transition matrix T_{i,j}. The computation proceeds as follows:

s(X, y) = \sum_{i=0}^{n} T_{y_i, y_{i+1}} + \sum_{i=1}^{n} H_{i, y_i} \quad (20)

p(y \mid X) = \frac{e^{s(X, y)}}{\sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})}} \quad (21)

L_{slot} = -\log p(y \mid X) \quad (22)

y^{*} = \arg\max_{\tilde{y} \in Y_X} s(X, \tilde{y}) \quad (23)

where y in equation (21) denotes the true label sequence; equations (22) and (23) give the model objective function and the Viterbi decoding of the optimal label sequence;
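A hedged sketch of this Bi-LSTM + CRF head, assuming the third-party pytorch-crf package (the patent does not name a library); tag counts and dimensions are illustrative:

```python
# Sketch of the slot-filling head (eqs. (20)-(23)) using the third-party
# pytorch-crf package; the library choice is an assumption.
import torch
import torch.nn as nn
from torchcrf import CRF

class SlotTagger(nn.Module):
    def __init__(self, hidden=768, ctx=256, num_tags=12):
        super().__init__()
        self.bilstm = nn.LSTM(hidden, ctx, batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * ctx, num_tags)    # H: per-token tag scores
        self.crf = CRF(num_tags, batch_first=True)  # learns transition matrix T

    def loss(self, x, tags, mask):
        H = self.emit(self.bilstm(x)[0])
        return -self.crf(H, tags, mask=mask)        # eq. (22): -log p(y|X)

    def decode(self, x, mask):
        H = self.emit(self.bilstm(x)[0])
        return self.crf.decode(H, mask=mask)        # eq. (23): Viterbi
```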
in the joint learning model, the overall loss function of the model is the weighted sum of losses of the intention classification model and the slot position identification model, namely:
L=αLintent+(1-α)Lslot (24)
the model adopts an Adam optimizer minimized objective function with linear preheating and weight attenuation to update parameters.
As a preferred solution of the natural language understanding method and apparatus in the dialog system according to the present invention, wherein: the method aims at Chinese intention recognition and slot position recognition, so that when a proper coding presentation layer pre-training model is selected, two pre-training language models which are more suitable for a Chinese scene and based on a Transformer are selected. The two models are respectively an ERNIE model proposed by Baidu and BERT-WWM proposed by the union of Harvard and fly.
As a preferred solution of the natural language understanding method and apparatus in the dialog system according to the present invention, wherein: in order to enhance the intention recognition and slot position recognition performance of the model facing the education field, the invention provides a pre-training target task integrated with field dictionary information by constructing a field dictionary, and continuously pre-trains the pre-training language model used by the code expression layer.
As a preferred solution of the natural language understanding method and apparatus in the dialog system according to the present invention, wherein: in order to further improve the representation learning capacity of the pre-training model in the face of two subtasks in the education field, the invention provides that the pre-training model based on the pre-training model is further trained in two other stages, namely the model comprises four stages of general field pre-training, field adaptation pre-training, task adaptation pre-training and fine adjustment.
As a preferred solution of the natural language understanding method and apparatus in the dialog system according to the present invention, wherein: and (3) carrying out knowledge distillation on the joint learning model based on pre-training, and distilling to three layers of BERT-WWM respectively.
Compared with the prior art, the invention has the following beneficial effects. On the collected domain-specific data set, comparative experiments were carried out on four models: 1) the original BERT-WWM model, 2) the original ERNIE model, 3) the pre-trained joint learning model, and 4) a 3-layer BERT-WWM model obtained by knowledge distillation. On the domain-specific data set, model 3) outperforms models 1) and 2) on both performance indicators, intent classification accuracy and slot recognition F1; after knowledge distillation, model 4) has a greatly reduced parameter scale and effectively reduced inference latency with only a small performance loss.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the invention is described in detail below with reference to the accompanying drawings and specific embodiments. The drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without inventive effort. Wherein:
FIG. 1 is a pre-training-based joint learning model according to the present invention.
FIG. 2 is a process of dictionary construction in the field of the present invention.
FIG. 3 is a multi-stage pre-training process of the present invention.
FIG. 4 is a 3-layer BERT-WWM distillation framework of the present invention.
FIG. 5 is a flowchart showing the steps of embodiment 1 of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
In the following description, numerous specific details are set forth to provide a thorough understanding of the invention. However, the invention may be practiced in ways other than those described here, and those of ordinary skill in the art can make similar generalizations without departing from its spirit; the invention is therefore not limited to the specific embodiments disclosed below.
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Example 1
The invention can be used in single-turn, question-answer-pair, retrieval-based intelligent dialogue systems for the education domain, for intent classification and slot recognition of user input.
1. First, the user input text is embedded and semantically encoded. For example, the user inputs: "What is the course information for Class 1 of Grade 6 tomorrow?" After the system receives the input, the text is fed into the pre-training-based joint learning model, which forms an encoded input representation as a two-dimensional matrix of 64-bit binary numbers, carrying the semantic and syntactic information captured by the pre-trained language model.
2. User intent and slots are predicted. The two-dimensional matrix representation is input into the joint learning model, where the hybrid network model predicts the intent of the input and BiLSTM + CRF identifies the slots. The intent of "What is the course information for Class 1 of Grade 6 tomorrow?" is recognized as "query course information", with two semantic slots: "date: tomorrow" and "class: Class 1 of Grade 6".
3. Having obtained the user intent and slot information, the dialog state tracking module is used to collect user input, historical dialog, context, and user intent and slot values, forming a current dialog state that can be learned by the dialog state tracking module.
4. The dialogue policy module selects the action to execute according to the current dialogue state. The dialogue policy is the core function of the dialogue system, equivalent to its brain: it determines which specific action to execute according to the current user's feedback, and how to update the dialogue state information. The available actions are preset in the program and realized by programming; the policy itself can be given as hand-written rules or trained as a policy model through machine learning and deep learning.
5. The dialogue response module matches question-answer pairs stored in the knowledge base according to the selected dialogue action. The knowledge base stores structured course information as well as knowledge in the form of unstructured question-answer pairs. Structured information is output directly after retrieval; for unstructured information, cosine similarity is used as the measure of matching degree. If the match meets the threshold requirement, the corresponding answer is output; otherwise a follow-up question is asked or a generic answer is returned, as in the sketch below.
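A minimal sketch of this cosine-similarity matching step, assuming the stored questions and the user query have already been encoded into vectors; the threshold value is an illustrative assumption:

```python
# Sketch of the unstructured QA matching step: cosine similarity between the
# encoded user question and stored question vectors, with a threshold gate.
import torch
import torch.nn.functional as F

def match_answer(query_vec, question_vecs, answers, threshold=0.8):
    sims = F.cosine_similarity(query_vec.unsqueeze(0), question_vecs, dim=-1)
    best = int(sims.argmax())
    if sims[best] >= threshold:
        return answers[best]
    return None   # fall back to a follow-up question or a generic reply
```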
While the invention has been described above with reference to an embodiment, various modifications may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In particular, the various features of the disclosed embodiments of the invention may be used in any combination, provided that no structural conflict exists, and the combinations are not exhaustively described in this specification merely for the sake of brevity and resource conservation. Therefore, it is intended that the invention not be limited to the particular embodiments disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (6)

1. A natural language understanding device for a dialogue system, characterized in that it comprises a word embedding layer, an encoding representation layer and a joint learning layer;
wherein:
(1) Word Embedding Layer
In FIG. 1, X_1, ..., X_5 denote the words of the input sequence, and e(X_1), ..., e(X_5) denote their embedded representations. The embeddings are generated by pre-training; this layer mainly completes the mapping from text to vectors;
(2) Encoding Representation Layer
The embedded word vectors are input into a pre-trained language model formed by stacking multiple Transformer layers, which performs high-level feature encoding and extraction;
The representation encoded by the pre-trained model introduces the rich syntactic and semantic knowledge the model learned without supervision during pre-training, improving its classification and sequence labeling capability;
(3) Joint Learning Layer
In the field of image processing, the CNN module is an indispensable building block, and it has been shown to work well in natural language processing too. To capture features from lower to higher layers, researchers proposed the VDCNN model, which stacks up to tens of convolutional blocks and introduces residual connections to alleviate vanishing gradients. Among recurrent networks, both LSTM and GRU mitigate vanishing and exploding gradients, and adding bidirectional modeling and max pooling on top of an RNN strengthens the model's handling of long-distance dependencies while retaining important semantic information. For slot recognition, the bidirectional long short-term memory network has clear advantages for sequence encoding and shows excellent representation learning in sequence tasks, while the conditional random field can exploit information between labels at the output stage; therefore, the slot recognition task is modeled jointly with a bidirectional LSTM and a conditional random field (CRF);
therefore, a mixed network formed by three models of CNN, RCNN and VDCNN is adopted for prediction during intention classification, and a BilSTM + conditional random field mode is adopted for sequence marking during slot position identification.
2. A method of using the natural language understanding device of claim 1, characterized in that it comprises the following steps:
the method comprises the following steps: firstly, realizing vectorization representation of a user input text by using an embedded layer of a pre-trained language model which is pre-trained on a general field corpus;
step two: then, performing high-level feature extraction and semantic representation on the quantitative representation by using a multilayer Transformer of a pre-training language model, and integrating context information to complete input coding representation;
in the joint learning layer, the intention classification is realized by using a mixed network model containing TextCNN, RCNN and VDCNN, and the specific calculation process is as follows:
(1) CNN module
The CNN module adopts the TextCNN model. In natural language processing, the CNN performs feature extraction in one-dimensional form: after the text is vectorized, convolution and pooling are applied along the text sequence direction, and each convolution selects a fixed-size window to learn interactions in sliding-window fashion;
(2) RCNN module
The RCNN model introduces the idea of modeling both contexts simultaneously: encoding and feature extraction consider each word's left and right context, and RNNs serve as the feature extractor to capture long-range dependencies in the word sequence. The implementation uses bidirectional LSTM units. Max pooling over the LSTM-encoded output effectively learns the semantic importance of words within the sentence;
(3) VDCNN module
VDCNN (Very Deep Convolutional Networks) was originally proposed for image recognition in computer vision. Its main idea is to use small convolution kernels (3 × 3) in all convolutional layers of the model and stack them to a very large depth, up to 19 layers; the output of this module is denoted R_3;
Step three: finally, splice the encoded representations R_1, R_2, R_3 obtained from the three models.
3. A method of using the natural language understanding device for a dialogue system according to any one of claims 1-2, characterized in that: the method targets Chinese intent recognition and slot recognition, so when selecting a pre-trained model for the encoding representation layer, two Transformer-based pre-trained language models better suited to Chinese scenarios are chosen: the ERNIE model proposed by Baidu, and BERT-WWM proposed by the joint laboratory of Harbin Institute of Technology and iFLYTEK.
4. A method of using the natural language understanding device for a dialogue system according to any one of claims 1-3, characterized in that: in order to enhance the model's intent recognition and slot recognition performance in the education domain, the invention constructs a domain dictionary, proposes a pre-training target task that integrates domain dictionary information, and continues to pre-train the pre-trained language model used by the encoding representation layer.
5. The method of using the natural language understanding device for a dialogue system according to any one of claims 1-4, characterized in that: in order to further improve the representation learning capability of the pre-trained model on the two education-domain subtasks, the invention trains it in two additional stages, so that the model passes through four stages in total: general-domain pre-training, domain-adaptive pre-training, task-adaptive pre-training, and fine-tuning.
6. The method of using the natural language understanding device for a dialogue system according to any one of claims 1-4, characterized in that: knowledge distillation is performed on the pre-training-based joint learning model, distilling it into a 3-layer BERT-WWM.
CN202110632046.5A 2021-06-07 2021-06-07 Natural language understanding method and device in dialogue-oriented system Active CN113297364B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110632046.5A CN113297364B (en) 2021-06-07 2021-06-07 Natural language understanding method and device in dialogue-oriented system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110632046.5A CN113297364B (en) 2021-06-07 2021-06-07 Natural language understanding method and device in dialogue-oriented system

Publications (2)

Publication Number Publication Date
CN113297364A true CN113297364A (en) 2021-08-24
CN113297364B CN113297364B (en) 2023-06-09

Family

ID=77327526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110632046.5A Active CN113297364B (en) 2021-06-07 2021-06-07 Natural language understanding method and device in dialogue-oriented system

Country Status (1)

Country Link
CN (1) CN113297364B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113449528A (en) * 2021-08-30 2021-09-28 企查查科技有限公司 Address element extraction method and device, computer equipment and storage medium
CN114490968A (en) * 2021-12-29 2022-05-13 北京百度网讯科技有限公司 Dialog state tracking method, model training method and device and electronic equipment
CN115168593A (en) * 2022-09-05 2022-10-11 深圳爱莫科技有限公司 Intelligent dialogue management system, method and processing equipment capable of self-learning
CN115599918A (en) * 2022-11-02 2023-01-13 吉林大学(Cn) Mutual learning text classification method and system based on graph enhancement
CN115964471A (en) * 2023-03-16 2023-04-14 成都安哲斯生物医药科技有限公司 Approximate query method for medical data
CN116821287A (en) * 2023-08-28 2023-09-29 湖南创星科技股份有限公司 Knowledge graph and large language model-based user psychological portrait system and method

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180137404A1 (en) * 2016-11-15 2018-05-17 International Business Machines Corporation Joint learning of local and global features for entity linking via neural networks
US20180157638A1 (en) * 2016-12-02 2018-06-07 Microsoft Technology Licensing, Llc Joint language understanding and dialogue management
CN108763542A (en) * 2018-05-31 2018-11-06 中国华戎科技集团有限公司 A kind of Text Intelligence sorting technique, device and computer equipment based on combination learning
CN109299253A (en) * 2018-09-03 2019-02-01 华南理工大学 A kind of social text Emotion identification model construction method of Chinese based on depth integration neural network
CN109472024A (en) * 2018-10-25 2019-03-15 安徽工业大学 A kind of file classification method based on bidirectional circulating attention neural network
CN109684511A (en) * 2018-12-10 2019-04-26 上海七牛信息技术有限公司 A kind of video clipping method, video aggregation method, apparatus and system
CN109840279A (en) * 2019-01-10 2019-06-04 山东亿云信息技术有限公司 File classification method based on convolution loop neural network
CN110222173A (en) * 2019-05-16 2019-09-10 吉林大学 Short text sensibility classification method and device neural network based
CN110674639A (en) * 2019-09-24 2020-01-10 拾音智能科技有限公司 Natural language understanding method based on pre-training model
CN110807332A (en) * 2019-10-30 2020-02-18 腾讯科技(深圳)有限公司 Training method of semantic understanding model, semantic processing method, semantic processing device and storage medium
CN111078833A (en) * 2019-12-03 2020-04-28 哈尔滨工程大学 Text classification method based on neural network
CN111309915A (en) * 2020-03-03 2020-06-19 爱驰汽车有限公司 Method, system, device and storage medium for training natural language of joint learning
CN111625641A (en) * 2020-07-30 2020-09-04 浙江大学 Dialog intention recognition method and system based on multi-dimensional semantic interaction representation model
CN111753081A (en) * 2019-03-28 2020-10-09 百度(美国)有限责任公司 Text classification system and method based on deep SKIP-GRAM network

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180137404A1 (en) * 2016-11-15 2018-05-17 International Business Machines Corporation Joint learning of local and global features for entity linking via neural networks
US20180157638A1 (en) * 2016-12-02 2018-06-07 Microsoft Technology Licensing, Llc Joint language understanding and dialogue management
CN108763542A (en) * 2018-05-31 2018-11-06 中国华戎科技集团有限公司 A kind of Text Intelligence sorting technique, device and computer equipment based on combination learning
CN109299253A (en) * 2018-09-03 2019-02-01 华南理工大学 A kind of social text Emotion identification model construction method of Chinese based on depth integration neural network
CN109472024A (en) * 2018-10-25 2019-03-15 安徽工业大学 A kind of file classification method based on bidirectional circulating attention neural network
CN109684511A (en) * 2018-12-10 2019-04-26 上海七牛信息技术有限公司 A kind of video clipping method, video aggregation method, apparatus and system
CN109840279A (en) * 2019-01-10 2019-06-04 山东亿云信息技术有限公司 File classification method based on convolution loop neural network
CN111753081A (en) * 2019-03-28 2020-10-09 百度(美国)有限责任公司 Text classification system and method based on deep SKIP-GRAM network
CN110222173A (en) * 2019-05-16 2019-09-10 吉林大学 Short text sensibility classification method and device neural network based
CN110674639A (en) * 2019-09-24 2020-01-10 拾音智能科技有限公司 Natural language understanding method based on pre-training model
CN110807332A (en) * 2019-10-30 2020-02-18 腾讯科技(深圳)有限公司 Training method of semantic understanding model, semantic processing method, semantic processing device and storage medium
CN111078833A (en) * 2019-12-03 2020-04-28 哈尔滨工程大学 Text classification method based on neural network
CN111309915A (en) * 2020-03-03 2020-06-19 爱驰汽车有限公司 Method, system, device and storage medium for training natural language of joint learning
CN111625641A (en) * 2020-07-30 2020-09-04 浙江大学 Dialog intention recognition method and system based on multi-dimensional semantic interaction representation model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
战保行: "Research on Joint Algorithms for Intent Recognition and Slot Filling in Task-Oriented Dialogue", China Masters' Theses Full-text Database, Information Science and Technology Series *
王乃钰 et al.: "Research Progress on Language Models Based on Deep Learning", Journal of Software *
王堃 et al.: "A Survey of Joint Intent and Semantic Slot Recognition in End-to-End Dialogue Systems", Computer Engineering and Applications *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113449528A (en) * 2021-08-30 2021-09-28 企查查科技有限公司 Address element extraction method and device, computer equipment and storage medium
CN113449528B (en) * 2021-08-30 2021-11-30 企查查科技有限公司 Address element extraction method and device, computer equipment and storage medium
CN114490968A (en) * 2021-12-29 2022-05-13 北京百度网讯科技有限公司 Dialog state tracking method, model training method and device and electronic equipment
CN115168593A (en) * 2022-09-05 2022-10-11 深圳爱莫科技有限公司 Intelligent dialogue management system, method and processing equipment capable of self-learning
CN115599918A (en) * 2022-11-02 2023-01-13 吉林大学(Cn) Mutual learning text classification method and system based on graph enhancement
CN115964471A (en) * 2023-03-16 2023-04-14 成都安哲斯生物医药科技有限公司 Approximate query method for medical data
CN116821287A (en) * 2023-08-28 2023-09-29 湖南创星科技股份有限公司 Knowledge graph and large language model-based user psychological portrait system and method
CN116821287B (en) * 2023-08-28 2023-11-17 湖南创星科技股份有限公司 Knowledge graph and large language model-based user psychological portrait system and method

Also Published As

Publication number Publication date
CN113297364B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
CN113297364B (en) Natural language understanding method and device in dialogue-oriented system
CN110781680B (en) Semantic similarity matching method based on twin network and multi-head attention mechanism
CN111259127B (en) Long text answer selection method based on transfer learning sentence vector
CN110134946B (en) Machine reading understanding method for complex data
CN112487820B (en) Chinese medical named entity recognition method
CN110390397A (en) A kind of text contains recognition methods and device
CN111831789A (en) Question-answer text matching method based on multilayer semantic feature extraction structure
CN112559706B (en) Training method of dialogue generating model, dialogue method, device and storage medium
CN110597968A (en) Reply selection method and device
CN113254604A (en) Reference specification-based professional text generation method and device
CN115080715B (en) Span extraction reading understanding method based on residual structure and bidirectional fusion attention
CN116028604A (en) Answer selection method and system based on knowledge enhancement graph convolution network
CN112925918A (en) Question-answer matching system based on disease field knowledge graph
CN115964459B (en) Multi-hop reasoning question-answering method and system based on food safety cognition spectrum
CN116010553A (en) Viewpoint retrieval system based on two-way coding and accurate matching signals
CN116662502A (en) Method, equipment and storage medium for generating financial question-answer text based on retrieval enhancement
CN117648429B (en) Question-answering method and system based on multi-mode self-adaptive search type enhanced large model
CN114328866A (en) Strong anthropomorphic intelligent dialogue robot with smooth and accurate response
CN113011196B (en) Concept-enhanced representation and one-way attention-containing subjective question automatic scoring neural network model
CN114020900A (en) Chart English abstract generation method based on fusion space position attention mechanism
CN117648469A (en) Cross double-tower structure answer selection method based on contrast learning
CN113641809A (en) XLNET-BiGRU-CRF-based intelligent question answering method
CN112579739A (en) Reading understanding method based on ELMo embedding and gating self-attention mechanism
CN113807079A (en) End-to-end entity and relation combined extraction method based on sequence-to-sequence
CN116681078A (en) Keyword generation method based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant