CN111160010B - Training method and system for abbreviated sentence understanding model

Training method and system for abbreviated sentence understanding model

Info

Publication number
CN111160010B
CN111160010B (application CN201911407761.8A)
Authority
CN
China
Prior art keywords
round
sentences
sentence
abbreviated
complete
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911407761.8A
Other languages
Chinese (zh)
Other versions
CN111160010A (en)
Inventor
朱钦佩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
Sipic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sipic Technology Co Ltd filed Critical Sipic Technology Co Ltd
Priority to CN201911407761.8A priority Critical patent/CN111160010B/en
Publication of CN111160010A publication Critical patent/CN111160010A/en
Application granted granted Critical
Publication of CN111160010B publication Critical patent/CN111160010B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

An embodiment of the invention provides a method for training an abbreviated sentence understanding model. The method comprises the following steps: receiving a dialog training data set; constructing an abbreviated sentence understanding model based on an encoder-decoder framework; taking the first-round complete sentence and the second-round abbreviated sentence as input of the encoder-encoding layer, and determining a second sentence feature vector of the second-round abbreviated sentence through a self-attention mechanism; determining, based on the first word feature vectors and the second sentence feature vector, a relation feature vector between the second-round abbreviated sentence and the first-round complete sentence; and generating, at the decoder-decoding layer, a simulated complete sentence for the second-round abbreviated sentence based on the relation feature vector, and training the abbreviated sentence understanding model on the simulated complete sentence and the target complete sentence. An embodiment of the invention also provides a training system for the abbreviated sentence understanding model. Embodiments of the invention use a generative neural-network model to restore elliptical sentences to complete sentences, improve the model's understanding of abbreviated sentences, and effectively raise users' satisfaction with the dialog system's replies.

Description

Training method and system for abbreviated sentence understanding model
Technical Field
The invention relates to the field of natural language processing, and in particular to a method and system for training an abbreviated sentence understanding model.
Background
Natural language human-computer interaction is a popular area of current artificial-intelligence development and is widely applied in daily life, for example in companion robots, in-vehicle voice navigation, and smart home appliances. During human-computer interaction, users habitually produce a large number of abbreviated sentences such as 'change to another one', 'what do you think', and 'is that OK'.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the related art:
Most current dialog systems are single-turn systems and can hardly understand the user's real intention correctly. At present there are two main approaches to the user-abbreviation problem: 1. rule-based parsing; 2. multi-turn statistical analysis.
Rule-based parsing continuously collects, within a predefined rule set, the pieces of information the system requires from what the user says, and automatically completes the user's task once the accumulated information meets a certain condition. For example, given 'How is the weather in Suzhou today' -> 'What about Nanjing', a rule-based dialog system switches to the closed domain 'weather' according to the first sentence (and stays in 'weather' mode as long as no other strong intention appears). It parses 'place = Suzhou, time = today' from the first sentence and replies to the user's query based on these two slots. When the user says 'What about Nanjing', the system updates the parse to 'place = Nanjing, time = today'. Rule-based dialog systems have two limitations: first, they work well in closed domains but can hardly be generalized to open domains; second, even in a closed domain a large amount of rule logic must be written by hand, and the rules never cover every case.
Multi-turn statistical analysis uses a neural-network algorithm to output the system reply from multiple turns of input, i.e., a reply produced after integrating the full context. However, end-to-end multi-turn training is not yet mature and has not reached application level. Moreover, besides the abbreviation problem, multi-turn statistical algorithms face the even harder problems of topic switching, persona continuity, and attitude consistency.
Disclosure of Invention
Embodiments of the invention at least solve the problems that abbreviated sentence understanding in the prior art is confined to closed domains, is thereby limited, and understands abbreviated sentences poorly.
In a first aspect, an embodiment of the present invention provides a method for training an abbreviated sentence understanding model, including:
receiving a dialog training data set, the dialog training data set comprising: a first-round complete sentence and a second-round abbreviated sentence consecutively requested by a user, and a target complete sentence used for representing the second-round abbreviated sentence;
constructing an abbreviated sentence understanding model based on an encoder-decoder framework, wherein the abbreviated sentence understanding model comprises an encoder-encoding layer and a decoder-decoding layer and is used for restoring an elliptical sentence to a complete sentence;
taking the first-round complete sentence and the second-round abbreviated sentence as input of the encoder-encoding layer, and determining a second sentence feature vector of the second-round abbreviated sentence through a self-attention mechanism;
determining a first word feature vector for each word in the first-round complete sentence, and determining, based on the first word feature vectors and the second sentence feature vector, a relation feature vector between the second-round abbreviated sentence and the first-round complete sentence as the output of the encoder-encoding layer;
and generating, by the decoder-decoding layer, a simulated complete sentence for the second-round abbreviated sentence based on the relation feature vector, and training the abbreviated sentence understanding model on the simulated complete sentence and the target complete sentence so that the simulated complete sentence approaches the target complete sentence.
In a second aspect, an embodiment of the present invention provides a system for training an abbreviated sentence understanding model, including:
a data receiving program module, configured to receive a dialog training data set, the dialog training data set comprising: a first-round complete sentence and a second-round abbreviated sentence consecutively requested by a user, and a target complete sentence used for representing the second-round abbreviated sentence;
a model building program module, configured to construct an abbreviated sentence understanding model based on an encoder-decoder framework, wherein the abbreviated sentence understanding model comprises an encoder-encoding layer and a decoder-decoding layer and is used for restoring an elliptical sentence to a complete sentence;
a sentence feature determining program module, configured to take the first-round complete sentence and the second-round abbreviated sentence as input of the encoder-encoding layer and determine a second sentence feature vector of the second-round abbreviated sentence through a self-attention mechanism;
a relation feature determining program module, configured to determine a first word feature vector for each word in the first-round complete sentence, and determine, based on the first word feature vectors and the second sentence feature vector, a relation feature vector between the second-round abbreviated sentence and the first-round complete sentence as the output of the encoder-encoding layer;
and a training program module, configured to generate, by the decoder-decoding layer, a simulated complete sentence for the second-round abbreviated sentence based on the relation feature vector, and to train the abbreviated sentence understanding model on the simulated complete sentence and the target complete sentence so that the simulated complete sentence approaches the target complete sentence.
In a third aspect, an embodiment of the present invention provides an electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method for training an abbreviated sentence understanding model of any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the method for training an abbreviated sentence understanding model of any embodiment of the present invention.
The embodiments of the invention have the following beneficial effect: by using a generative neural-network model that captures both the internal relations between the words of a sentence and the relations between sentences, an elliptical sentence is restored to a complete sentence according to the input context, the understanding effect of the abbreviated sentence understanding model is improved, and users' satisfaction with the dialog system's replies is effectively raised.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of a method for training an abbreviated sentence understanding model according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a training system for an abbreviated sentence understanding model according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a method for training an abbreviated sentence understanding model according to an embodiment of the present invention, including the following steps:
S11: receiving a dialog training data set, the dialog training data set comprising: a first-round complete sentence and a second-round abbreviated sentence consecutively requested by a user, and a target complete sentence used for representing the second-round abbreviated sentence;
S12: constructing an abbreviated sentence understanding model based on an encoder-decoder framework, wherein the abbreviated sentence understanding model comprises an encoder-encoding layer and a decoder-decoding layer and is used for restoring an elliptical sentence to a complete sentence;
S13: taking the first-round complete sentence and the second-round abbreviated sentence as input of the encoder-encoding layer, and determining a second sentence feature vector of the second-round abbreviated sentence through a self-attention mechanism;
S14: determining a first word feature vector for each word in the first-round complete sentence, and determining, based on the first word feature vectors and the second sentence feature vector, a relation feature vector between the second-round abbreviated sentence and the first-round complete sentence as the output of the encoder-encoding layer;
S15: generating, by the decoder-decoding layer, a simulated complete sentence for the second-round abbreviated sentence based on the relation feature vector, and training the abbreviated sentence understanding model on the simulated complete sentence and the target complete sentence so that the simulated complete sentence approaches the target complete sentence.
In this embodiment, potential user abbreviations are restored to fully expressed sentences by taking the user's dialog history into account; accordingly, when dialog training data are collected, multi-turn dialogs input by users are selected.
For step S11, training data must first be constructed. The input of the abbreviated sentence understanding model is the data of a user's multi-turn dialog request, and the output is either the converted complete sentence or the user's last turn unchanged; the training set should contain roughly equal amounts of data of the first form (second turn abbreviated) and the second form (second turn already complete), as illustrated by the two groups below. As an embodiment, the dialog training data set further comprises: separator tokens used to mark off the first-round complete sentence from the second-round abbreviated sentence.
In this embodiment, two groups of two-turn dialog data serve as examples:
First group:
Q: <BOS> How is the weather in Suzhou <SEG> What about Nanjing <EOS>
A: <BOS> How is the weather in Nanjing <EOS>
The first-round complete sentence is 'How is the weather in Suzhou', the second-round abbreviated sentence is 'What about Nanjing', and the target complete sentence for the second round is 'How is the weather in Nanjing'.
Second group:
Q: <BOS> My schoolbag has wheels haha <SEG> Everyone is busy <EOS>
A: <BOS> Everyone is busy <EOS>
The first-round complete sentence is 'My schoolbag has wheels haha', and the second-round sentence 'Everyone is busy' is not abbreviated, so its target complete sentence is likewise 'Everyone is busy'.
In the above examples, <BOS> marks the beginning of the sequence, <SEG> is the separator between the two sentences, and <EOS> marks the end of the sequence. Q is the user input of the two-turn dialog, and A is the complete sentence to be output.
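For illustration only (this is not part of the patented disclosure; the function and variable names are assumptions), a minimal Python sketch of how such Q/A training pairs could be assembled:

```python
# Minimal sketch of assembling the Q/A training pairs described above.
# The special tokens follow the example; all names here are illustrative.
BOS, SEG, EOS = "<BOS>", "<SEG>", "<EOS>"

def build_pair(first_turn: str, second_turn: str, target: str):
    """Join two consecutive user turns into the encoder input Q and
    pair it with the target complete sentence A."""
    q = f"{BOS} {first_turn} {SEG} {second_turn} {EOS}"
    a = f"{BOS} {target} {EOS}"
    return q, a

# First group: the second turn is abbreviated and must be expanded.
print(build_pair("How is the weather in Suzhou", "What about Nanjing",
                 "How is the weather in Nanjing"))
# Second group: the second turn is already complete and passes through unchanged.
print(build_pair("My schoolbag has wheels haha", "Everyone is busy",
                 "Everyone is busy"))
```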
For step S12, the model to be constructed must be able to judge the relationship between the two sentences so as to decide whether the output sentence should stay unchanged or the two sentences need to be fused.
With an encode-decode framework, algorithms such as seq2seq-with-attention and the Transformer can effectively learn features of the input text and skillfully use those features to organize the output text. For the abbreviation problem, the generative model must follow two principles: (1) expanding an abbreviation must not change the original semantics; and (2) a non-abbreviated sentence must be output in its entirety. The model therefore needs both the ability to expand abbreviations and the ability to recognize them.
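As a non-authoritative sketch of such an encode-decode framework, the following PyTorch skeleton wires a standard Transformer encoder-decoder over the concatenated input; the class name, dimensions, and hyperparameters are illustrative assumptions, and the patented model additionally builds the sentence-level attention described below:

```python
import torch
import torch.nn as nn

class AbbrevRestorer(nn.Module):
    """Illustrative sketch: encode "<BOS> turn1 <SEG> turn2 <EOS>" and
    decode the restored complete sentence (not the patented implementation)."""
    def __init__(self, vocab_size: int, d_model: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(d_model=d_model, nhead=4,
                                          batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids: torch.Tensor, tgt_ids: torch.Tensor):
        src = self.embed(src_ids)    # encoder input tokens
        tgt = self.embed(tgt_ids)    # shifted target tokens (teacher forcing)
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.out(hidden)      # per-token vocabulary logits
```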
For step S13, in the encoder stage the Transformer's self-attention mechanism expresses the relations between the words within a sentence. In addition, a sentence-level attention must be constructed to represent the relationship between the two input sentences.
As an embodiment, determining the sentence feature vector of the second-round abbreviated sentence through the self-attention mechanism comprises:
outputting, through the self-attention mechanism, a feature vector for each word in the first-round complete sentence and the second-round abbreviated sentence, the feature vector of a word containing information about its relations to the other words;
and determining the sentence feature vector of the second-round abbreviated sentence based on the feature vectors of all the words in the second-round abbreviated sentence.
The sentence feature vector of the second-round abbreviated sentence is determined by averaging the feature vectors of all the words in the second-round abbreviated sentence.
In this embodiment, the encoder-encoding layer takes '<BOS> How is the weather in Suzhou <SEG> What about Nanjing <EOS>' as input and averages the embeddings of the three tokens of the second-round sentence ('What about Nanjing') as its sentence feature S.
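A minimal sketch of this mean pooling, assuming a 13-token encoded input with 256-dimensional states; the exact token positions of the second sentence depend on the tokenizer and are assumptions here:

```python
import torch

# Per-token outputs of the self-attention encoder for the 13-token input
# "<BOS> ... <SEG> What about Nanjing <EOS>" (d_model = 256 assumed).
hidden = torch.randn(13, 256)

# Token positions of the second-round sentence after <SEG> (assumed indices).
second_sentence = hidden[9:12]

# Sentence feature S: the average of the second sentence's token vectors.
S = second_sentence.mean(dim=0)   # shape: (256,)
```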
For step S14, the sentence feature of the second-round abbreviated sentence determined in step S13 is used to determine the relation feature between the second-round abbreviated sentence and the first-round complete sentence.
As an implementation, attention is computed between the second sentence feature vector and the first word feature vectors to obtain a plurality of sub-relation feature vectors between the second-round abbreviated sentence and the first-round complete sentence;
the plurality of sub-relation feature vectors are then concatenated to obtain the relation feature vector.
In this embodiment, an attention computation is performed between the sentence feature S determined in step S13 and the embedding of each word in 'How is the weather in Suzhou', outputting a hidden vector Sa that represents the relationship between the first-round complete sentence and the second-round abbreviated sentence.
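One plausible reading of this attention step, sketched below with scaled dot-product attention; the scoring function and the token indices are assumptions, since the text does not fix them:

```python
import torch
import torch.nn.functional as F

d = 256
hidden = torch.randn(13, d)             # per-token encoder states, as above
S = torch.randn(d)                      # sentence feature of the second sentence
first_sentence = hidden[1:8]            # first-round tokens (indices assumed)

# Sentence-to-word attention: S attends over every first-round token
# vector; the weighted sum is the hidden relation vector Sa.
scores = first_sentence @ S / d ** 0.5  # scaled dot-product scores, shape (7,)
weights = F.softmax(scores, dim=0)
Sa = weights @ first_sentence           # shape: (256,)
```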
If the encoder-encoding layer has multiple layers, each layer performs the same computation, and the encoder-encoding layer finally outputs a hidden-layer vector H_i (i = 1, ..., 13) for each token together with the sentence-relation representation ESa. H_i and ESa are concatenated as the final output Enc of the encoder-encoding layer.
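Sketched below is one way this concatenation could look, appending ESa as an extra position of the encoder output; whether ESa is appended as a position or concatenated onto each token is not fixed by the text, so this is an assumption:

```python
import torch

seq_len, d = 13, 256
H = torch.randn(seq_len, d)    # final per-token states H_i, i = 1..13
ESa = torch.randn(d)           # final sentence-relation representation

# Concatenate the token states with the relation vector as the encoder
# output Enc that is handed to the decoder.
Enc = torch.cat([H, ESa.unsqueeze(0)], dim=0)   # shape: (14, 256)
```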
For step S15: after the relation feature vector has been obtained at the encoder-encoding layer and the relationship between the first-round complete sentence and the second-round abbreviated sentence has been determined, the decoder-decoding layer generates the simulated complete sentence of the second-round abbreviated sentence based on the determined hidden vector. During the training phase the simulated complete sentence may still differ somewhat from the target complete sentence. The target complete sentence is therefore fed to the decoder-decoding layer, the error between the target complete sentence and the simulated complete sentence vectors is determined, and the abbreviated sentence understanding model is trained on this error so that the simulated complete sentence approaches the target complete sentence.
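A hedged sketch of one such training step, reusing the illustrative AbbrevRestorer class from above with teacher forcing and token-level cross-entropy; the batch shapes, vocabulary size, and learning rate are assumptions:

```python
import torch
import torch.nn as nn

model = AbbrevRestorer(vocab_size=8000)    # illustrative class from above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

src = torch.randint(0, 8000, (4, 13))      # batch of encoder inputs Q
tgt = torch.randint(0, 8000, (4, 10))      # target complete sentences A

logits = model(src, tgt[:, :-1])           # teacher forcing: feed the target
loss = loss_fn(logits.reshape(-1, 8000),   # error between simulated and
               tgt[:, 1:].reshape(-1))     # target complete sentence
loss.backward()                            # train so the simulated sentence
optimizer.step()                           # approaches the target
```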
As this embodiment shows, by using a generative neural-network model that captures both the relations between words and the relations between sentences, an elliptical sentence is restored to a complete sentence according to the input context, improving the understanding effect of the abbreviated sentence understanding model and effectively raising users' satisfaction with the dialog system's replies.
Fig. 2 is a schematic structural diagram of a training system for an abbreviated sentence understanding model according to an embodiment of the present invention; the system can execute the training method of any of the embodiments above and is configured in a terminal.
The training system for the abbreviated sentence understanding model provided by this embodiment comprises: a data receiving program module 11, a model building program module 12, a sentence feature determining program module 13, a relation feature determining program module 14, and a training program module 15.
The data receiving program module 11 receives a dialog training data set comprising: a first-round complete sentence and a second-round abbreviated sentence consecutively requested by a user, and a target complete sentence used for representing the second-round abbreviated sentence. The model building program module 12 is configured to construct an abbreviated sentence understanding model based on an encoder-decoder framework, the model comprising an encoder-encoding layer and a decoder-decoding layer and being used for restoring an elliptical sentence to a complete sentence. The sentence feature determining program module 13 is configured to take the first-round complete sentence and the second-round abbreviated sentence as input of the encoder-encoding layer and determine a second sentence feature vector of the second-round abbreviated sentence through a self-attention mechanism. The relation feature determining program module 14 is configured to determine a first word feature vector for each word in the first-round complete sentence and to determine, based on the first word feature vectors and the second sentence feature vector, a relation feature vector between the second-round abbreviated sentence and the first-round complete sentence as the output of the encoder-encoding layer. The training program module 15 is configured to generate, by the decoder-decoding layer, a simulated complete sentence for the second-round abbreviated sentence based on the relation feature vector, and to train the abbreviated sentence understanding model on the simulated complete sentence and the target complete sentence so that the simulated complete sentence approaches the target complete sentence.
Further, the sentence feature determining program module is configured to:
output, through the self-attention mechanism, a feature vector for each word in the first-round complete sentence and the second-round abbreviated sentence, the feature vector of a word containing information about its relations to the other words;
and determine the sentence feature vector of the second-round abbreviated sentence based on the feature vectors of all the words in the second-round abbreviated sentence.
Further, the sentence feature determining program module is further configured to:
determine the sentence feature vector of the second-round abbreviated sentence by averaging the feature vectors of all the words in the second-round abbreviated sentence.
Further, the relation feature determining program module is configured to:
compute attention between the second sentence feature vector and the first word feature vectors to obtain a plurality of sub-relation feature vectors between the second-round abbreviated sentence and the first-round complete sentence;
and concatenate the plurality of sub-relation feature vectors to obtain the relation feature vector.
An embodiment of the present invention further provides a non-volatile computer storage medium storing computer-executable instructions that can execute the method for training an abbreviated sentence understanding model of any of the method embodiments above.
as one embodiment, the non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
receiving a dialog training data set, the dialog training data set comprising: a first-round complete sentence and a second-round abbreviated sentence consecutively requested by a user, and a target complete sentence used for representing the second-round abbreviated sentence;
constructing an abbreviated sentence understanding model based on an encoder-decoder framework, wherein the abbreviated sentence understanding model comprises an encoder-encoding layer and a decoder-decoding layer and is used for restoring an elliptical sentence to a complete sentence;
taking the first-round complete sentence and the second-round abbreviated sentence as input of the encoder-encoding layer, and determining a second sentence feature vector of the second-round abbreviated sentence through a self-attention mechanism;
determining a first word feature vector for each word in the first-round complete sentence, and determining, based on the first word feature vectors and the second sentence feature vector, a relation feature vector between the second-round abbreviated sentence and the first-round complete sentence as the output of the encoder-encoding layer;
and generating, by the decoder-decoding layer, a simulated complete sentence for the second-round abbreviated sentence based on the relation feature vector, and training the abbreviated sentence understanding model on the simulated complete sentence and the target complete sentence so that the simulated complete sentence approaches the target complete sentence.
As a non-transitory computer-readable storage medium, it may store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in the non-transitory computer-readable storage medium and, when executed by a processor, perform the method for training an abbreviated sentence understanding model of any of the method embodiments described above.
The non-volatile computer readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the device, and the like. Further, the non-volatile computer-readable storage medium may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the non-transitory computer readable storage medium optionally includes memory located remotely from the processor, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides an electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method for training an abbreviated sentence understanding model of any embodiment of the present invention.
The client of the embodiment of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices: these devices have mobile communication capability and take providing voice and data communication as their primary goal. Such terminals include smart phones, multimedia phones, feature phones, low-end phones, and the like.
(2) Ultra-mobile personal computer devices: these belong to the category of personal computers, have computing and processing functions, and generally also support mobile internet access. Such terminals include PDA, MID, and UMPC devices, for example tablet computers.
(3) Portable entertainment devices: these devices can display and play multimedia content, and include audio and video players, handheld game consoles, e-book readers, smart toys, and portable vehicle-mounted navigation devices.
(4) Other electronic devices with data processing capabilities.
In this document, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Furthermore, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A method for training an abbreviated sentence understanding model, comprising:
receiving a dialog training data set, the dialog training data set comprising: a first-round complete sentence and a second-round abbreviated sentence consecutively requested by a user, and a target complete sentence used for representing the second-round abbreviated sentence;
constructing an abbreviated sentence understanding model based on an encoder-decoder framework, wherein the abbreviated sentence understanding model comprises an encoder-encoding layer and a decoder-decoding layer and is used for restoring an elliptical sentence to a complete sentence;
taking the first-round complete sentence and the second-round abbreviated sentence as input of the encoder-encoding layer, and determining a second sentence feature vector of the second-round abbreviated sentence through a self-attention mechanism;
determining a first word feature vector for each word in the first-round complete sentence, and determining, based on the first word feature vectors and the second sentence feature vector, a relation feature vector between the second-round abbreviated sentence and the first-round complete sentence as the output of the encoder-encoding layer;
generating, by the decoder-decoding layer, a simulated complete sentence for the second-round abbreviated sentence based on the relation feature vector, and training the abbreviated sentence understanding model on the simulated complete sentence and the target complete sentence so that the simulated complete sentence approaches the target complete sentence;
wherein the determining the sentence feature vector of the second-round abbreviated sentence through the self-attention mechanism comprises:
outputting, through the self-attention mechanism, a feature vector for each word in the first-round complete sentence and the second-round abbreviated sentence, the feature vector of a word containing information about its relations to the other words;
determining the sentence feature vector of the second-round abbreviated sentence based on the feature vectors of all the words in the second-round abbreviated sentence.
2. The method of claim 1, wherein the determining the sentence feature vector of the second-round abbreviated sentence based on the feature vectors of all the words in the second-round abbreviated sentence comprises:
determining the sentence feature vector of the second-round abbreviated sentence by averaging the feature vectors of all the words in the second-round abbreviated sentence.
3. The method of claim 1, wherein the determining a first word feature vector for each word in the first-round complete sentence, and determining, based on the first word feature vectors and the second sentence feature vector, a relation feature vector between the second-round abbreviated sentence and the first-round complete sentence comprises:
performing attention computation between the second sentence feature vector and the first word feature vectors to obtain a plurality of sub-relation feature vectors between the second-round abbreviated sentence and the first-round complete sentence;
and concatenating the plurality of sub-relation feature vectors to obtain the relation feature vector.
4. A system for training an abbreviated sentence understanding model, comprising:
a data receiving program module, configured to receive a dialog training data set, the dialog training data set comprising: a first-round complete sentence and a second-round abbreviated sentence consecutively requested by a user, and a target complete sentence used for representing the second-round abbreviated sentence;
a model building program module, configured to construct an abbreviated sentence understanding model based on an encoder-decoder framework, wherein the abbreviated sentence understanding model comprises an encoder-encoding layer and a decoder-decoding layer and is used for restoring an elliptical sentence to a complete sentence;
a sentence feature determining program module, configured to take the first-round complete sentence and the second-round abbreviated sentence as input of the encoder-encoding layer and determine a second sentence feature vector of the second-round abbreviated sentence through a self-attention mechanism;
a relation feature determining program module, configured to determine a first word feature vector for each word in the first-round complete sentence, and determine, based on the first word feature vectors and the second sentence feature vector, a relation feature vector between the second-round abbreviated sentence and the first-round complete sentence as the output of the encoder-encoding layer;
a training program module, configured to generate, by the decoder-decoding layer, a simulated complete sentence for the second-round abbreviated sentence based on the relation feature vector, and train the abbreviated sentence understanding model on the simulated complete sentence and the target complete sentence so that the simulated complete sentence approaches the target complete sentence;
wherein the determining the sentence feature vector of the second-round abbreviated sentence through the self-attention mechanism comprises:
outputting, through the self-attention mechanism, a feature vector for each word in the first-round complete sentence and the second-round abbreviated sentence, the feature vector of a word containing information about its relations to the other words;
determining the sentence feature vector of the second-round abbreviated sentence based on the feature vectors of all the words in the second-round abbreviated sentence.
5. The system of claim 4, wherein the sentence feature determining program module is further configured to:
determine the sentence feature vector of the second-round abbreviated sentence by averaging the feature vectors of all the words in the second-round abbreviated sentence.
6. The system of claim 4, wherein the relation feature determining program module is configured to:
perform attention computation between the second sentence feature vector and the first word feature vectors to obtain a plurality of sub-relation feature vectors between the second-round abbreviated sentence and the first-round complete sentence;
and concatenate the plurality of sub-relation feature vectors to obtain the relation feature vector.
7. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any of claims 1-3.
8. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 3.
CN201911407761.8A 2019-12-31 2019-12-31 Training method and system for abbreviated sentence understanding model Active CN111160010B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911407761.8A CN111160010B (en) 2019-12-31 2019-12-31 Training method and system for abbreviated sentence understanding model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911407761.8A CN111160010B (en) 2019-12-31 2019-12-31 Training method and system for abbreviated sentence understanding model

Publications (2)

Publication Number Publication Date
CN111160010A CN111160010A (en) 2020-05-15
CN111160010B true CN111160010B (en) 2023-04-18

Family

ID=70559668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911407761.8A Active CN111160010B (en) 2019-12-31 2019-12-31 Training method and system for abbreviated sentence understanding model

Country Status (1)

Country Link
CN (1) CN111160010B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783429A (en) * 2020-07-31 2020-10-16 腾讯科技(深圳)有限公司 Information processing method, information processing apparatus, electronic device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372107A (en) * 2016-08-19 2017-02-01 中兴通讯股份有限公司 Generation method and device of natural language sentence library
CN107885756A (en) * 2016-09-30 2018-04-06 华为技术有限公司 Dialogue method, device and equipment based on deep learning
CN109977212A * 2019-03-28 2019-07-05 清华大学深圳研究生院 Reply content generation method for a dialog robot, and terminal device
CN110413729A * 2019-06-25 2019-11-05 江南大学 Multi-turn dialog generation method based on a last-sentence and context dual attention model

Also Published As

Publication number Publication date
CN111160010A (en) 2020-05-15

Similar Documents

Publication Publication Date Title
CN110222152B (en) Question answer obtaining method and system based on machine reading understanding
CN109460463B (en) Model training method, device, terminal and storage medium based on data processing
CN110222164B (en) Question-answer model training method, question and sentence processing device and storage medium
US20200167528A1 (en) Dialog generation method, apparatus, and electronic device
EP3623957A1 (en) Generation of point of interest copy
JP6677419B2 (en) Voice interaction method and apparatus
CN107766506A Multi-turn dialog model construction method based on a hierarchical attention mechanism
CN109284502B (en) Text similarity calculation method and device, electronic equipment and storage medium
CN108962221B (en) Optimization method and system of online dialog state tracking model
CN110765270B (en) Training method and system of text classification model for spoken language interaction
CN109033285A (en) Information-pushing method and device
CN115309877A (en) Dialog generation method, dialog model training method and device
CN111160010B (en) Training method and system for abbreviated sentence understanding model
CN113761156A (en) Data processing method, device and medium for man-machine interaction conversation and electronic equipment
CN112905755A (en) Reply text prediction method, device, equipment and storage medium
CN110909179B (en) Method and system for optimizing text generation model
CN110891201B (en) Text generation method, device, server and storage medium
CN111783429A (en) Information processing method, information processing apparatus, electronic device, and storage medium
CN111046674A (en) Semantic understanding method and device, electronic equipment and storage medium
CN114398875A (en) Training method of error correction model, search word error correction method, device and medium
CN111723185B (en) Question generation method
CN111222328A (en) Label extraction method and device and electronic equipment
CN114443824A (en) Data processing method and device, electronic equipment and computer storage medium
CN111026834B (en) Question and answer corpus generation method and system
CN116414951A (en) Intelligent dialogue method, model training method, device, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Applicant after: Sipic Technology Co.,Ltd.

Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Applicant before: AI SPEECH Co.,Ltd.

GR01 Patent grant