CN112632962B - Method and device for realizing natural language understanding in man-machine interaction system


Info

Publication number: CN112632962B
Application number: CN202011565278.5A
Authority: CN (China)
Prior art keywords: target, sentence, training, sequence, information
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN112632962A (en)
Inventors: 王宝军, 张钊, 徐坤, 张宇洋, 尚利峰, 李林琳
Current Assignee: Huawei Technologies Co Ltd
Original Assignee: Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd; application granted; publication of CN112632962A and CN112632962B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/35 Clustering; Classification

Abstract

The application provides a method and a related device for implementing natural language understanding in the field of artificial intelligence. According to the technical solution provided by the application, the information of terms contained in a sentence input by a user can be queried from a dictionary, the term information is disambiguated, and the intention and related key information of the sentence are then understood according to the term information obtained through disambiguation. The technical solution improves the performance of natural language understanding without additional data annotation, and the system implementing natural language understanding is simple to maintain, so user experience can be improved.

Description

Method and device for realizing natural language understanding in man-machine interaction system
The present application claims priority to Chinese Patent Application No. 202010429245.1, filed with the China National Intellectual Property Administration on May 20, 2020 and entitled "Method and apparatus for achieving natural language understanding in human-computer interaction system", which is incorporated herein by reference in its entirety.
Technical Field
The application relates to the field of artificial intelligence, in particular to a method and a device for realizing natural language understanding in a man-machine interaction system.
Background
Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, acquire knowledge and use that knowledge to obtain optimal results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
With the continuous development of artificial intelligence technology, natural language man-machine interaction systems, which enable man-machine interaction through natural language, are becoming more and more important. For man-machine interaction to be performed through natural language, the system is required to recognize the specific meaning of human natural language. Recognizing the specific meaning of human natural language is called natural language understanding (NLU). NLU generally refers to identifying the user's intention and extracting the key information in the user's natural language.
NLU is an indispensable part of the interaction between a user and a smart device such as a smart speaker, a smart television, a smart car or a smartphone; in other words, it is a key module of a man-machine interaction system.
For example, after a mobile phone user speaks "buy a ticket to Beijing" to a mobile phone assistant, an NLU module in the mobile phone assistant needs to identify that the user's intention is to buy a ticket and extract the key information "destination: Beijing", so that the mobile phone assistant can further open a ticket-booking application for the user and complete the booking service.
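As a rough illustration only (the patent does not define a concrete data structure or API), the structured result described above could be represented as in the following Python sketch; the class and field names are assumptions made for this example.

```python
# Illustrative only: one possible shape for an NLU result such as the
# "buy a ticket to Beijing" example above. Field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class NLUResult:
    intent: str                                 # the recognized user intention
    slots: dict = field(default_factory=dict)   # key information of the intention

result = NLUResult(intent="buy ticket", slots={"destination": "Beijing"})
print(result)  # NLUResult(intent='buy ticket', slots={'destination': 'Beijing'})
```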
Therefore, how to implement natural language understanding is a technical problem to be solved.
Natural language understanding tasks often need to introduce external dictionary knowledge. Take the common entity recognition task as an example: the entities in natural language are diverse. Taking song names as an example, any word or character string may be a song name, and a song name may have no definite meaning or may be very long, so it is difficult to accurately frame the boundary of such an entity in a sentence by machine learning alone. Further, when performing intention recognition and slot extraction, if the type of the entity cannot be determined, it is also difficult to determine the intention of the user. External dictionary knowledge is therefore needed as an aid. Current methods of introducing dictionary knowledge into natural language understanding tasks mainly have two problems. On the one hand, methods that rely only on word matching produce heavy noise, and the dictionary knowledge is drowned out. On the other hand, methods that perform path matching require pre-segmentation, so the performance of the system is limited by the accuracy of the word segmentation method; moreover, the model needs to be pre-trained on the dictionary, so the dictionary cannot be updated dynamically.
Disclosure of Invention
The application provides a method and a related device for implementing natural language understanding in a man-machine interaction system. The method can improve the performance of natural language understanding and thereby improve user experience.
In a first aspect, the present application provides a method for implementing natural language understanding in a human-computer interaction system. The method comprises the following steps: acquiring target entry information, wherein the target entry information is used for representing entries contained in target sentences, and the target sentences are sentences input by a user to the man-machine interaction system; acquiring target indication information based on the target sentence and the target entry information by using a term disambiguation model, wherein the target indication information is used for indicating whether the term indicated by the target entry information accords with the semantic meaning of the target sentence; and acquiring an understanding result based on the target sentence, the target entry information and the target indication information by using a natural language understanding model, wherein the understanding result comprises target intention of the target sentence input by the user and key information of the target intention.
In this method, after the information of the terms contained in the target sentence is acquired, the term information is not directly used to assist the natural language understanding model in understanding the target sentence. Instead, a term disambiguation model is first used to judge whether each term conforms to the semantics of the target sentence, that is, to check whether the term can serve as a real term of the target sentence, and the judgment result is then used to assist the natural language understanding model in acquiring the intention and key information of the target sentence. This can improve the performance of natural language understanding and thereby the user experience. A sketch of this flow follows.
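The following Python sketch shows the three-step flow under assumed interfaces; query_terms, disambiguation_model and nlu_model are hypothetical placeholders for the dictionary lookup, the term disambiguation model and the natural language understanding model, not names from the patent.

```python
# A minimal sketch of the first-aspect flow, under assumed interfaces.
def understand(target_sentence, query_terms, disambiguation_model, nlu_model):
    # Step 1: acquire target term information (terms contained in the sentence).
    term_info = query_terms(target_sentence)
    # Step 2: acquire target indication information: for each candidate term,
    # whether it conforms to the semantics of the target sentence.
    indications = [disambiguation_model(target_sentence, t) for t in term_info]
    # Step 3: acquire the understanding result (intention and key information)
    # from the sentence, the term information and the indication information.
    return nlu_model(target_sentence, term_info, indications)
```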
Typically, each model requires its training data to be annotated before training. However, since the indication information output by the term disambiguation model in the present application can be derived from the intention and key information in the training data of the natural language understanding model, the training data of both the term disambiguation model and the natural language understanding model can be obtained by labeling only the intention and key information of sentences, that is, only the training data of the natural language understanding model needs to be labeled. This saves manual labeling cost and improves the efficiency of acquiring training data, so the training efficiency of the two models can be improved. Especially in scenarios where the natural language understanding function of the man-machine interaction system needs to be updated, the update efficiency of the two models can be improved, the performance of the man-machine interaction system can be improved in time, and user experience is ultimately improved.
In some possible implementations, the obtaining the target entry information includes: and inquiring the vocabulary entries contained in the target sentence from a target dictionary to obtain target vocabulary entry information, wherein the target dictionary comprises at least one vocabulary entry.
In this implementation, the target term information is obtained by querying the target dictionary. Therefore, when the terms contained in sentences input by users change, only the terms in the target dictionary need to be updated for the terms in input sentences to be recognized. The recognition rate of terms in user-input sentences can thus be improved conveniently and quickly, which improves the performance of natural language understanding and of the man-machine interaction system, and finally improves user experience.
In some possible implementations, the method is performed by an end-side device, wherein the target dictionary comprises a dictionary on the end-side device.
That is, target entry information of the target sentence can be obtained on the end-side device from the dictionary query. Compared with the method for acquiring the target entry information of the target sentence through the cloud side equipment, the method can save transmission time, so that the natural language understanding efficiency can be improved, the man-machine interaction efficiency is further improved, and finally the user experience is improved.
In addition, since the end-side device can acquire the target term information of the target sentence based on its own dictionary, the target term information can be acquired, and natural language understanding realized, even when the cloud-side device is unavailable or not connected. This expands the application scenarios of natural language understanding, that is, of the man-machine interaction system, and improves user experience.
In addition, the target entry information of the target sentence can be queried on the terminal side device according to the dictionary, so that the dictionary related to the privacy of the user can be configured on the terminal side device, the privacy of the user can be protected, and the experience of the user can be improved.
Moreover, as the target entry information of the target sentence can be queried according to the dictionary on the terminal side device, the dictionary with higher user query frequency or common use can be configured on the terminal side device, and thus, compared with the process of acquiring the target entry information from the cloud side device, the terminal side device can quickly query the target entry information of the target sentence, thereby improving the efficiency of natural language understanding, further improving the efficiency of man-machine interaction and finally improving the user experience.
In some possible implementations, the target dictionary includes a dictionary on a cloud-side device, wherein the obtaining the target entry information according to the target dictionary and the target sentence includes: sending the target sentence to the cloud side equipment; and receiving the target entry information from the cloud side equipment.
That is, the end side device may acquire target entry information of the target sentence from the cloud side device. Therefore, the storage space and the computing resources of the terminal equipment can be saved, namely, the capability requirement of the terminal equipment for realizing natural language understanding is reduced, for example, the terminal equipment with lower performance can also realize natural language understanding with higher efficiency, so that the application scene of the man-machine interaction system can be enlarged, and the user experience is finally improved.
In some possible implementations, the method further includes: obtaining a candidate intention of the target sentence by using an intention recognition model. The sending the target sentence to the cloud-side device includes: sending the target sentence to the cloud-side device when it is determined, according to a preset correspondence, that the dictionary corresponding to the candidate intention is located on the cloud-side device, wherein the correspondence is used for indicating whether the dictionary corresponding to an intention is located on the cloud-side device.
That is, in a case where it is determined that a dictionary to be used for querying target term information of a target sentence is located at a cloud-side device according to an intention of the target sentence, the cloud-side device is requested to query the target term information. The method can flexibly control the times of requesting the cloud side equipment to query the dictionary, avoid invalid query and improve the understanding efficiency of natural language, thereby improving the efficiency of man-machine interaction.
In some examples, it may be determined whether a dictionary to be used for querying target term information of a target sentence is located at a cloud-side device or an end-side device according to an intention of the target sentence, and candidate terms of the target sentence are acquired from the cloud-side device in a case where it is determined that the target dictionary is located at the cloud-side device. Thus, invalid queries can be avoided, thereby improving user experience.
In some possible implementations, the term disambiguation model is a two-class model.
In a second aspect, the present application provides a model training method, comprising: acquiring first training data, wherein the first training data comprises training sentences, intentions of the training sentences and key information of the training sentences; acquiring entry information, wherein the entry information is used for representing entries contained in the training sentences; acquiring indication information according to the first training data and the entry information, wherein the indication information is used for indicating whether the entry represented by the entry information accords with the intention and the semantic represented by the key information; obtaining second training data according to the training sentences, the entry information and the indication information, wherein the second training data comprises the training sentences, the entry information and the indication information, and the second training data is used for training an entry disambiguation model, and the entry disambiguation model is used for judging whether the entries represented by the entry information to be processed conform to the semantics of the sentences to be processed or not based on the sentences to be processed and the entry information to be processed.
In this method, in the process of acquiring the training data of the term disambiguation model and the natural language understanding model, the second training data of the term disambiguation model can be obtained by automatically labeling the first training data, so that only the first training data needs to be manually labeled. The efficiency of acquiring the training data of the two models can thus be improved, so the training efficiency of the two models can be improved, the performance of the man-machine interaction system can be improved in time, and user experience is ultimately improved.
In some possible implementations, the method may further include: and acquiring third training data according to the first training data, the entry information and the indication information, wherein the third training data comprises the first training data, the entry information and the indication information, and is used for training a natural language understanding model in the first aspect or any implementation manner.
In some possible implementations, the obtaining the entry information includes: and inquiring the vocabulary entries contained in the sentence from a dictionary to obtain the vocabulary entry information, wherein the dictionary comprises at least one vocabulary entry.
In this implementation, since the term information is acquired from the dictionary query, the recognition rate of the term information of the sentence can be updated by updating the dictionary. And because the dictionary is updated more conveniently and rapidly, the recognition rate of the entry information of the sentence can be conveniently and rapidly improved, the accuracy of the natural language understanding of the sentence can be conveniently and rapidly improved, the accuracy of a man-machine interaction system can be further improved, and the experience of a user is finally improved.
In a third aspect, the present application provides a model training method. The method comprises the following steps: acquiring second training data, wherein the second training data comprises training sentences, entry information and indication information, the entry information is used for representing entries contained in the training sentences, and the indication information is used for indicating whether the entries represented by the entry information conform to the intention of the training sentences and the semantics represented by the key information of the intention; and training an entry disambiguation model according to the second training data, wherein the entry disambiguation model is used for judging whether the entry represented by the entry information to be processed accords with the semantics of the statement to be processed or not based on the statement to be processed and the entry information to be processed.
In some implementations, the second training data is second training data obtained using the second aspect or any one of the implementations.
In some implementations, the term disambiguation model of the first aspect or any one of the implementations is trained.
Because the training data obtained in the second aspect or any one of the modes is used, the training efficiency of the term disambiguation model can be improved, so that the performance of the man-machine interaction system can be improved efficiently, and the experience of a user can be improved.
In a fourth aspect, the present application provides a model training method. The method comprises the following steps: acquiring third training data, wherein the third training data comprises the training sentences, the intentions, the key information, the entry information and the indication information; and training a natural language understanding model according to the third training data, wherein the natural language understanding model is used for acquiring the intention of a user input to the sentence to be understood and the key information of the intention based on the sentence to be understood, first auxiliary information and second auxiliary information, the first auxiliary information is used for representing the entry contained in the sentence to be understood, and the second auxiliary information is used for indicating whether the entry represented by the first auxiliary information accords with the semantic meaning of the sentence to be understood.
In some implementations, the third training data is third training data obtained using the second aspect or any one of the implementations.
In some implementations, the training results in a natural language understanding model in the first aspect or in any one of its implementations.
Because the training data obtained in the second aspect or any one of the modes is used, the training efficiency of the natural language understanding model can be improved, so that the performance of the man-machine interaction system can be improved efficiently, and the experience of a user is improved.
In a fifth aspect, an apparatus for implementing natural language understanding in a human-computer interaction system is provided, where the apparatus includes a module for performing the method in the first aspect or any implementation manner of the first aspect.
In a sixth aspect, a model training apparatus is provided, the apparatus comprising means for performing the method of the second aspect or any one of the implementations described above.
In a seventh aspect, a model training apparatus is provided, the apparatus comprising means for performing the method of the third aspect or any one of the implementations described above.
In an eighth aspect, a model training apparatus is provided, the apparatus comprising means for performing the method of the fourth aspect or any one of the implementations described above.
In a ninth aspect, there is provided an apparatus for implementing natural language understanding in a human-computer interaction system, the apparatus comprising: a memory for storing a program; a processor for executing the program stored in the memory, the processor being configured to perform the method of the first aspect or any implementation manner thereof when the program stored in the memory is executed.
In a tenth aspect, there is provided an apparatus for acquiring training data, the apparatus comprising: a memory for storing a program; a processor for executing the program stored in the memory, the processor being for executing the method of the second aspect or any one of the implementation manners when the program stored in the memory is executed.
In an eleventh aspect, there is provided an apparatus for training a model, the apparatus comprising: a memory for storing a program; a processor for executing the program stored in the memory, the processor being for executing the method of the third aspect or any one of the implementation forms thereof, when the program stored in the memory is executed.
In a twelfth aspect, there is provided an apparatus for training a model, the apparatus comprising: a memory for storing a program; a processor for executing the program stored in the memory, the processor for executing the method of the fourth aspect or any one of the implementation manners when the program stored in the memory is executed.
In a thirteenth aspect, there is provided a computer readable medium storing program code for execution by a device for performing the method of the first, second, third, fourth or any one of the implementations.
In a fourteenth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first, second, third, fourth or any one of the implementations described above.
In a fifteenth aspect, there is provided a chip comprising a processor and a data interface, the processor reading, via the data interface, instructions stored on a memory to perform the method of the first aspect, the second aspect, the third aspect, the fourth aspect or any implementation of the foregoing.
Optionally, as an implementation manner, the chip may further include a memory, where the memory stores instructions, and the processor is configured to execute the instructions stored on the memory, where the processor is configured to perform the method in the first aspect, the second aspect, the third aspect, or the fourth aspect, or any implementation manner of the first aspect, the second aspect, the third aspect, or the fourth aspect, when the instructions are executed.
In a sixteenth aspect, there is provided a computing device comprising: a memory for storing a program; a processor for executing the program stored in the memory, the processor being configured to perform the method of the first aspect, the second aspect, the third aspect, the fourth aspect or any one of the implementation manners when the program stored in the memory is executed.
In a seventeenth aspect, there is provided a method for implementing natural language understanding in a human-computer interaction system, the method comprising: acquiring a plurality of target entries matched with target sentences in one or more dictionaries, wherein the target sentences are sentences input by a user to a human-computer interaction system; acquiring a plurality of sequences of target sentences according to the target vocabulary entries, wherein each sequence in the plurality of sequences corresponds to one target vocabulary entry; acquiring a plurality of first sequence representations corresponding to the sequences according to the sequences and the target sentences; determining whether each target term of the plurality of target terms meets the semantics of the target sentence according to the plurality of first sequence representations; and carrying out natural language understanding processing on the target sentence according to one or more first sequence representations corresponding to the target vocabulary entry conforming to the semantics of the target sentence so as to obtain a processing result.
The method for implementing natural language understanding in the man-machine interaction system of the embodiment of the application embeds dictionary knowledge and can better implement natural language understanding. The method matches natural language against a dictionary, converts the matches into sequences, and obtains sequence representations of the sequences. Because the types of sequences are fixed and finite in number, the various terms matched from the dictionary can all be converted into a finite number of sequences, and the corresponding sequence representations are also finite in number. Therefore, after the dictionary is updated or expanded, the neural network model does not need to be updated, which improves the generalization capability of the model.
With reference to the seventeenth aspect, in certain implementations of the seventeenth aspect, each of the plurality of sequences includes type information of a target term and location information of the target term in the target sentence.
The type information is used for indicating the type of the matching entity, and the position information is used for indicating the position of the matching entity in the sequence, so that the subsequent natural language understanding processing is facilitated.
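As one plausible encoding (the patent does not fix the exact sequence format), a matched term could be turned into a character-level sequence carrying its type and position as in the following sketch; the tagging scheme and function name are assumptions.

```python
# A sketch, under assumptions, of converting one matched term into a sequence
# carrying the term's type and its position in the target sentence: one tag
# per character, the dictionary type inside the matched span, "O" outside it.
def term_to_sequence(sentence: str, start: int, end: int, term_type: str):
    # the span [start, end) marks where the matched term sits in the sentence
    return [term_type if start <= i < end else "O" for i in range(len(sentence))]

sentence = "I want to hear It Will Rain Tomorrow"
seq = term_to_sequence(sentence, 15, 36, "SONG")
```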
With reference to the seventeenth aspect, in certain implementations of the seventeenth aspect, obtaining a plurality of first sequence representations corresponding to the plurality of sequences according to the plurality of sequences and the target sentence includes: obtaining a low-dimensional representation of each of a plurality of sequences; acquiring a context representation of the target sentence; the low-dimensional representation of each of the plurality of sequences and the contextual representation of the target sentence are fused to obtain a plurality of first sequence representations corresponding to the plurality of sequences.
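A minimal sketch of this step follows, assuming concatenation followed by a linear projection as the fusion operator; the patent does not fix the fusion form, and the weights here are random stand-ins for learned parameters.

```python
# Illustrative fusion of each sequence's low-dimensional representation with
# the sentence's context representation: concatenate, then project back down.
import numpy as np

rng = np.random.default_rng(0)
d = 64
low_dim_seqs = [rng.normal(size=d) for _ in range(3)]  # low-dimensional sequence representations
context = rng.normal(size=d)                           # context representation of the target sentence
W = rng.normal(size=(d, 2 * d))                        # a learned projection in a real model

first_seq_reprs = [W @ np.concatenate([s, context]) for s in low_dim_seqs]
```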
With reference to the seventeenth aspect, in certain implementations of the seventeenth aspect, determining whether each target term of the plurality of target terms conforms to the semantics of the target sentence according to the plurality of first sequence representations includes: acquiring attention of the second sequence representation to other first sequence representations except the second sequence representation in the plurality of first sequence representations, wherein the second sequence representation is one of the plurality of first sequence representations; and determining whether the target entry corresponding to the second sequence representation accords with the semantics of the target sentence according to the second sequence representation and the attention.
Since the grammatical structure of the target sentence input by the user is limited, most of the plurality of target terms obtained above are matching results that do not conform to the semantics of the target sentence, and accordingly most of the plurality of first sequence representations do not conform to the semantics of the target sentence either, so disambiguation must be performed on the plurality of first sequence representations. During disambiguation, both the relationship between a matched entity and its context within one sequence and the relationships between different sequences are considered, which makes the disambiguation result more accurate.
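The following sketch illustrates one plausible form of this disambiguation step, using unprojected dot-product attention and a linear scoring head; both are assumptions, since the aspect above only requires attention between sequence representations followed by a conformity decision.

```python
# A second sequence representation attends over the other first sequence
# representations; the attended summary and the representation itself are
# scored to decide whether its target term conforms to the sentence semantics.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def conforms_logit(seq_reprs, i, w):
    q = seq_reprs[i]                                 # the second sequence representation
    others = np.stack([r for j, r in enumerate(seq_reprs) if j != i])
    attn = softmax(others @ q / np.sqrt(q.size))     # attention over the other representations
    summary = attn @ others                          # attention-weighted mixture
    return float(np.concatenate([q, summary]) @ w)   # > 0 read as "conforms"
```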
With reference to the seventeenth aspect, in some implementations of the seventeenth aspect, performing natural language understanding processing on the target sentence according to one or more first sequence representations corresponding to target terms that conform to semantics of the target sentence, including: acquiring a context representation of the target sentence; fusing the first sequence representation corresponding to the target entry of the semantic meaning of the target sentence with the context representation of the target sentence to obtain one or more third sequence representations corresponding to the target entry of the semantic meaning of the target sentence; and carrying out natural language understanding processing on the target sentence according to the one or more third sequence representations.
In an eighteenth aspect, there is provided a training method of a neural network model including a first sub-network model, a second sub-network model, and a third sub-network model, the method comprising: acquiring first training data, wherein the first training data comprises training sentences and a plurality of first sequence representations matched with a target dictionary by the training sentences; training the first sub-network model according to the first training data to obtain a trained first sub-network model; acquiring second training data, wherein the second training data comprises an output result of a trained first sub-network model and a first sequence representation meeting preset requirements in a plurality of first sequence representations; training the second sub-network model according to the second training data to obtain a trained second sub-network model; acquiring third training data, wherein the third training data comprises an output result of the trained second sub-network model and a processing result of natural language understanding processing of training sentences; and training the third sub-network model according to the third training data to obtain a trained third sub-network model.
The neural network model training method is an end-to-end training method, and is simple in flow and high in training speed.
With reference to the eighteenth aspect, in some implementations of the eighteenth aspect, the first subnetwork model is an entry matching model, configured to obtain, according to a target sentence and a target dictionary input by a user, a plurality of sequence representations matched with the target dictionary, where each sequence representation in the plurality of sequence representations corresponds to a target entry, and the target entry is an entry matched with the target sentence; the second sub-network model is an entry disambiguation model and is used for determining whether the corresponding target entry of each sequence representation in the sequence representations accords with the semantics of the target sentence according to the target sentence and the sequence representations; the third sub-network model is a natural language understanding model and is used for carrying out natural language understanding processing on the target sentence according to the sequence representation corresponding to the target entry conforming to the semantic meaning of the target sentence.
With reference to the eighteenth aspect, in some implementations of the eighteenth aspect, each first sequence representation in the plurality of first sequence representations corresponds to a first target term, the first target term is a term matching the training sentence with the target dictionary, and the first sequence representation meeting the preset requirement is a sequence representation corresponding to the first target term conforming to the meaning of the training sentence.
In a nineteenth aspect, an apparatus for implementing natural language understanding in a human-computer interaction system is provided, including: the acquisition module is used for acquiring a plurality of target entries matched with target sentences in one or more dictionaries, wherein the target sentences are sentences input by a user to the human-computer interaction system; the processing module is used for acquiring a plurality of sequences of target sentences according to the target vocabulary entries, and each sequence in the plurality of sequences corresponds to one target vocabulary entry; the processing module is further used for acquiring a plurality of first sequence representations corresponding to the sequences according to the sequences and the target sentences; the processing module is further used for determining whether each target entry in the plurality of target entries accords with the target sentence semantics according to the plurality of first sequence representations; and the processing module is also used for carrying out natural language understanding processing on the target sentence according to the first sequence representation corresponding to one or more target vocabulary entries conforming to the semantic meaning of the target sentence so as to obtain a processing result.
With reference to the nineteenth aspect, in certain implementations of the nineteenth aspect, each of the plurality of sequences includes type information of a target term and location information of the target term in the target sentence.
With reference to the nineteenth aspect, in some implementations of the nineteenth aspect, the processing module obtains a plurality of first sequence representations corresponding to the plurality of sequences according to the plurality of sequences and the target sentence, including: obtaining a low-dimensional representation of each of a plurality of sequences; acquiring a context representation of the target sentence; the low-dimensional representation of each of the plurality of sequences and the contextual representation of the target sentence are fused to obtain a plurality of first sequence representations corresponding to the plurality of sequences.
With reference to the nineteenth aspect, in certain implementations of the nineteenth aspect, the processing module determining, from the plurality of first sequence representations, whether each target term of the plurality of target terms conforms to a target sentence semantic includes: acquiring attention of the second sequence representation to other first sequence representations except the second sequence representation in the plurality of first sequence representations, wherein the second sequence representation is one of the plurality of first sequence representations; and determining whether the target entry corresponding to the second sequence representation accords with the target sentence semantic according to the second sequence representation and the attention.
With reference to the nineteenth aspect, in some implementation manners of the nineteenth aspect, the processing module performs natural language understanding processing on the target sentence according to the first sequence representations corresponding to one or more target terms that conform to the semantics of the target sentence, including: acquiring a context representation of the target sentence; fusing each first sequence representation corresponding to a target term that conforms to the semantics of the target sentence with the context representation of the target sentence, to obtain one or more third sequence representations corresponding to the one or more target terms that conform to the semantics of the target sentence; and performing natural language understanding processing on the target sentence according to the one or more third sequence representations.
In a twentieth aspect, there is provided a training apparatus for a neural network model, the neural network model including a first sub-network model, a second sub-network model, and a third sub-network model, the apparatus comprising: the acquisition module is used for acquiring first training data, wherein the first training data comprises training sentences and a plurality of first sequence representations of which the training sentences are matched with the target dictionary; the training module is used for training the first sub-network model according to the first training data so as to obtain a trained first sub-network model; the acquisition module is further used for acquiring second training data, wherein the second training data comprises an output result of the trained first sub-network model and a first sequence representation meeting preset requirements in the plurality of first sequence representations; the training module is further used for training the second sub-network model according to the second training data so as to obtain a trained second sub-network model; the acquisition module is also used for acquiring third training data, wherein the third training data comprises an output result of the trained second sub-network model and a processing result of natural language understanding processing of training sentences; the training module is further configured to train the third sub-network model according to the third training data, so as to obtain a trained third sub-network model.
With reference to the twentieth aspect, in some implementations of the twentieth aspect, the first subnetwork model is an entry matching model, configured to obtain, according to a target sentence and a target dictionary input by a user, a plurality of sequence representations matched with the target dictionary, each sequence representation in the plurality of sequence representations corresponding to a target entry, the target entry being an entry matched with the target sentence; the second sub-network model is an entry disambiguation model and is used for determining whether the corresponding target entry of each sequence representation in the sequence representations accords with the semantics of the target sentence according to the target sentence and the sequence representations; the third sub-network model is a natural language understanding model and is used for carrying out natural language understanding processing on the target sentence according to the sequence representation corresponding to the target entry conforming to the semantic meaning of the target sentence.
With reference to the twentieth aspect, in some implementations of the twentieth aspect, each first sequence representation in the plurality of first sequence representations corresponds to a first target term, the first target term is a term matching the training sentence with the target dictionary, and the first sequence representation meeting the preset requirement is a sequence representation corresponding to the first target term that conforms to the meaning of the training sentence.
In a twenty-first aspect, an apparatus for implementing natural language understanding in a man-machine interaction system is provided, where the apparatus includes: a processor coupled to the memory; the memory is used for storing instructions; the processor is configured to execute instructions stored in the memory to cause the apparatus to implement a method of any one of the implementations of the seventeenth aspect described above.
In a twenty-second aspect, there is provided a model training apparatus, comprising: a processor coupled to the memory; the memory is used for storing instructions; the processor is configured to execute instructions stored in the memory to cause the apparatus to implement the method of the eighteenth aspect.
In a twenty-third aspect, a computer-readable medium is provided, which stores program code for execution by a device for performing the method of any one of the above seventeenth and eighteenth aspects.
In a twenty-fourth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the methods of the seventeenth and eighteenth aspects described above.
In a twenty-fifth aspect, a chip is provided, the chip comprising a processor and a data interface, the processor reading instructions stored on a memory via the data interface, the method of the seventeenth and eighteenth aspects being performed.
Optionally, as an implementation manner, the chip may further include a memory, where the memory stores instructions, and the processor is configured to execute the instructions stored on the memory, and when the instructions are executed, the processor is configured to perform the methods in the seventeenth aspect and the eighteenth aspect.
In a twenty-sixth aspect, there is provided a computing device comprising: a memory for storing a program; a processor for executing a memory-stored program, the processor being for executing the methods of the seventeenth and eighteenth aspects described above when the memory-stored program is executed.
Drawings
FIG. 1 is a schematic flow chart of a method of acquiring training data in accordance with one embodiment of the application;
FIG. 2 is a schematic flow chart of a method of training a model in accordance with one embodiment of the application;
FIG. 3 is a schematic diagram of the structure of an entry disambiguation model according to one embodiment of the application;
FIG. 4 is a schematic flow chart of a method of training a model in accordance with another embodiment of the application;
FIG. 5 is a schematic flow chart diagram of a method of implementing natural language understanding in accordance with one embodiment of the present application;
FIG. 6 is a schematic flow chart diagram of a method of implementing natural language understanding in accordance with another embodiment of the present application;
FIG. 7 is a schematic diagram of a system architecture of one embodiment of the present application;
FIG. 8 is a deployment diagram of an apparatus for training a model provided in one embodiment of the present application;
FIG. 9 is an exemplary block diagram of a computing device according to one embodiment of the application;
FIG. 10 is a schematic diagram of a system architecture of another embodiment of the present application;
FIG. 11 is a dictionary deployment diagram of one embodiment of the present application;
FIG. 12 is a schematic block diagram of an apparatus for acquiring training data according to an embodiment of the present application;
FIG. 13 is a schematic block diagram of an apparatus for training a model in accordance with one embodiment of the present application;
FIG. 14 is a schematic block diagram of an apparatus for implementing natural language understanding in accordance with one embodiment of the present application;
FIG. 15 is a schematic block diagram of an apparatus of one embodiment of the present application;
FIG. 16 is a schematic block diagram of a computer program product of one embodiment of the present application;
FIG. 17 is a schematic diagram of a system architecture involved in implementing a natural language understanding method in a human-computer interaction system of the present application;
FIG. 18 is a schematic diagram of modules involved in implementing a natural language understanding method in the human-computer interaction system of the present application;
FIG. 19 is a schematic flow chart of a method of implementing natural language understanding in a human-machine interaction system of the present application;
FIG. 20 is a schematic flow chart of a training method of the neural network model of the present application;
FIG. 21 is a schematic diagram of the fusion of a low-dimensional representation with a contextual representation of the present application;
FIG. 22 is a schematic diagram of the sequence disambiguation of the present application;
FIG. 23 is a schematic illustration of natural language processing of disambiguation results in accordance with the present application;
FIG. 24 is a schematic block diagram of an apparatus for implementing natural language understanding in the human-machine interaction system of the present application;
FIG. 25 is a schematic block diagram of a training apparatus of the neural network model of the present application;
FIG. 26 is a schematic diagram of the hardware configuration of an apparatus of the present application.
Detailed Description
In order to facilitate understanding of embodiments of the present application, some terms or concepts used in the embodiments of the present application are described below.
(1) Term disambiguation model
The input of the term disambiguation model comprises a sentence and term information, where the term information is used for representing a term contained in the sentence, and the term disambiguation model is used for judging whether the term represented by the term information conforms to the semantics of the sentence.
The term information may represent one or more terms contained in the sentence, and each term may be a reasonable or an unreasonable term of the sentence. A reasonable term is a term that conforms to the semantics of the sentence; otherwise, the term is unreasonable.
For example, for the sentence "I want to hear It Will Rain Tomorrow", the candidate terms may include "tomorrow" and "It Will Rain Tomorrow". "It Will Rain Tomorrow" (here a song name) conforms to the semantics of the sentence and is therefore a reasonable term, while "tomorrow" does not conform to the semantics and is an unreasonable term.
The term information may be the term itself, or the position of the term in the sentence. For example, for the sentence "I want to hear It Will Rain Tomorrow", if the position index of the characters in the sentence (counted on the original Chinese text) starts from 0, the term information may be "3-8", which means that the content from index 3 to index 8 of the sentence is a term.
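A minimal sketch of reading such position-based term information, assuming the "start-end" notation is inclusive at both ends (the text does not state this) and using a hypothetical helper name:

```python
# Recover a term from position-based term information such as "3-8".
def term_from_span(sentence: str, span: str) -> str:
    start, end = (int(x) for x in span.split("-"))
    return sentence[start:end + 1]   # inclusive end, by assumption
```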
(2) Natural language understanding model
The input of the natural language understanding model comprises a sentence, first auxiliary information and second auxiliary information, wherein the first auxiliary information is used for expressing an entry contained in the sentence, the second auxiliary information is used for indicating whether the entry expressed by the first auxiliary information accords with the semantic meaning of the sentence, and the natural language understanding model is used for acquiring the intention of a user for inputting the sentence and key information of the intention based on the sentence, the first auxiliary information and the second auxiliary information.
The first auxiliary information may be understood as term information, and the second auxiliary information may be understood as information output by the term disambiguation model based on the sentence and the term information.
The intention is the purpose that the user wants to express through the input sentence. For example, if the user inputs the sentence "I want to hear It Will Rain Tomorrow", the user's intention is to listen to a song. In the embodiments of the application, the intention of a user-input sentence is also called the intention of the sentence.
The key information may also be understood as a slot corresponding to the intention. For example, when the intention of the user is to listen to a song, the name of the song is a slot.
There is an essential difference between a term and a slot in the embodiments of the application. A slot is determined according to the corresponding intention. For example, the term "Beijing" may serve as the slot "destination: Beijing" in some ticket-booking intentions, and as the slot "starting point: Beijing" in other ticket-booking intentions.
(3) Dictionary
A dictionary refers to a collection of words. In the embodiments of the present application, it is a collection of words with common attributes that are collected or arranged for a particular purpose, such as a place-name dictionary, a person-name dictionary, or a song dictionary. In some cases, it may also be extended to any set of words, such as a word segmentation dictionary.
In the embodiments of the present application, a dictionary may also be called a lexicon or a word list. The dictionary of the embodiments of the present application comprises one or more terms.
A term in the present application, also called an entry or item, may be a single character or a word, or may be a phrase.
A term in the present application may be a concrete thing, a well-known person, an abstract concept, a literary work, a trending event, a Chinese word, a combination on a specific topic, or the like. A concrete thing may also be understood as an entity.
(4) End side device
The end-side device in the embodiments of the present application may be a mobile phone with computing capability, a tablet personal computer (TPC), a media player, a smart home device, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a video camera, a smart watch, a wearable device (WD), a vehicle, or the like. It will be appreciated that the specific form of the end-side device is not limited in the embodiments of the present application.
(5) Sequence labeling
One of the common natural language processing tasks: given a sequence, each element in the sequence is marked or labeled. Word segmentation and entity recognition are examples of sequence labeling tasks.
(6) Sequence modeling
Representing a sequence as one or more low-dimensional dense vector representations.
(7) Slot
A slot is the key information corresponding to an intention in the user's sentence. For example, if the intention is to book a flight ticket, the corresponding slots may be the take-off time, the landing time, and so on.
(8) Language model
A language model is a model of the probability distribution of sentences; it can judge whether a sentence is a natural-language sequence.
(9) Pre-trained language model
A pre-trained language model can be directly fine-tuned for downstream tasks, which greatly shortens the training time of those tasks.
(10) Word representation (embedding)
A word representation in natural language processing is a low-dimensional vector representation of a character or word. Relationships between words can be calculated through these vectors; the essential property is that words whose vectors are close to each other have similar meanings.
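A toy sketch of this property, with fabricated three-dimensional vectors rather than trained embeddings:

```python
# Words whose embedding vectors are close have similar meanings; closeness is
# commonly measured with cosine similarity. The vectors below are made up.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

king  = np.array([0.90, 0.80, 0.10])
queen = np.array([0.85, 0.82, 0.12])
apple = np.array([0.10, 0.20, 0.90])
assert cosine(king, queen) > cosine(king, apple)
```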
(11) Self-attention mechanism (self-attention)
The attention mechanism mimics the internal process of biological observation behavior: it aligns internal experience with external sensation to increase the observation precision of parts of a region. Attention mechanisms can quickly extract important features from sparse data and are therefore widely used in natural language processing tasks, particularly machine translation. The self-attention mechanism is an improvement of the attention mechanism that reduces reliance on external information and is better at capturing the internal dependencies of data or features.
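For illustration, a compact sketch of single-head self-attention without the learned query/key/value projections a full model would include:

```python
# Every position of the input attends to every position of the same input,
# capturing internal dependencies without external information.
import numpy as np

def self_attention(X):                          # X: (sequence_length, dim)
    scores = X @ X.T / np.sqrt(X.shape[1])      # pairwise similarity of positions
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)           # row-wise softmax
    return w @ X                                # each row: weighted mixture of all positions
```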
The following describes the technical solutions in the embodiments of the present application with reference to the accompanying drawings. It is apparent that the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
Fig. 1 is a schematic flow chart of a method of acquiring training data according to an embodiment of the present application. As shown in fig. 1, the method may include S110 to S150.
S110, acquiring first training data, wherein the first training data comprises training sentences, the intention of the training sentences and key information of the intention.
The intent and key information may be tag data of the training sentence, i.e. data that plays a supervisory role in the training process. The intent and key information may be manually annotated according to the semantics of the training sentence.
For example, for the training sentence "buy a ticket to Beijing", the intent "buy ticket" and the key information (or slot) "destination: Beijing" may be manually labeled.
S120, entry information is acquired, wherein the entry information is used for representing entries contained in the training sentences.
For example, when the training sentence is "I want to hear Tomorrow Rains.", one piece of entry information may contain "1-2", another may contain "6-8", and yet another may contain "3-8", where each pair of numbers denotes the positions of the first and last characters of a matched entry in the sentence.
In some examples, the entry information of the training sentence may be obtained through a dictionary query. In this case, the method may include: acquiring a dictionary, and acquiring the entry information of the training sentence according to the dictionary and the training sentence. Generally, the entry information covers the terms that appear in both the dictionary and the training sentence.
For example, training sentences input by the user through voice or text can be matched against the dictionary with a fast query method to obtain matching vectors. The matching vectors may contain multiple matching results; longest-match processing can be applied to mutually overlapping results, so that the final matching results do not contain one another. One example of a fast query method is the dictionary tree (trie).
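As an illustration, a minimal sketch of trie-based dictionary matching is given below; the class and function names (Trie, match_terms) and the sample dictionary content are assumptions for illustration, not taken from the embodiments.

```python
# Minimal sketch of trie-based dictionary matching over a sentence.

class Trie:
    def __init__(self):
        self.children = {}
        self.entry_type = None  # e.g. "song" or "film" when a term ends here

    def insert(self, term, entry_type):
        node = self
        for ch in term:
            node = node.children.setdefault(ch, Trie())
        node.entry_type = entry_type

def match_terms(sentence, trie):
    """Scan left to right; return (start, end, type) for every dictionary hit."""
    matches = []
    for start in range(len(sentence)):
        node = trie
        for end in range(start, len(sentence)):
            node = node.children.get(sentence[end])
            if node is None:
                break
            if node.entry_type is not None:
                # 1-based inclusive indices, matching the "3-8"-style spans above
                matches.append((start + 1, end + 1, node.entry_type))
    return matches

music = Trie()
music.insert("tomorrow rains", "song")   # hypothetical dictionary content
print(match_terms("i want to hear tomorrow rains.", music))
```

Longest-match filtering of mutually overlapping spans, as described above, could then be applied to the returned list.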
It will be appreciated that obtaining the entry information of the training sentence through a dictionary query is merely an example; this embodiment does not limit how the entry information is obtained. For example, a term recognition model obtained through machine learning may also be used to acquire the entry information in a training sentence.
Compared with obtaining the entry information of the training sentence through preset rules or a trained model, obtaining it through a dictionary query makes it convenient and fast to update the set of recognizable entries: when the entries need to be updated, only the dictionary has to be changed, whereas the other approaches require rewriting the rules or retraining the model. Dictionary query therefore improves the accuracy of the entry information more efficiently and conveniently, which in turn improves the accuracy of the training data and the performance of the trained model, and ultimately improves the performance of the human-computer interaction system applying the model, that is, the user experience.
S130, acquiring indication information according to the first training data and the entry information, wherein the indication information is used for indicating whether the entry represented by the entry information accords with the intention and the semantic represented by the key information.
In one example, if the term indicated by the term information is the same as the term corresponding to the key information in the first training data, the term may be considered to conform to the semantics represented by the intent and the key information; otherwise, it may be considered not to conform.
For example, when the training sentence included in the first training data is "I want to hear Tomorrow Rains.", the intent is "listen to songs", and the key information is "song name: Tomorrow Rains", then if the term represented by the term information is the song name "Tomorrow Rains", the corresponding indication information indicates that the term conforms to the semantics represented by the intent and the key information, that is, conforms to the semantics of the training sentence.
As another example, with the same training sentence, intent, and key information, if the term represented by the term information is the movie name "Tomorrow Rains", the corresponding indication information indicates that the term does not conform to the semantics represented by the intent and the key information, that is, does not conform to the semantics of the training sentence.
S140, obtaining second training data according to the training sentences, the vocabulary entry information and the indication information, wherein the second training data comprises the training sentences, the vocabulary entry information and the indication information, and the second training data is used for training a vocabulary entry disambiguation model.
One implementation of obtaining the second training data according to the training sentence, the entry information, and the indication information is to combine them into the second training data, with the indication information serving as the label data for the training sentence and the entry information.
In this embodiment, one piece of second training data may be referred to as a training corpus.
For example, when the training sentence is "I want to hear Tomorrow Rains.", the dictionary lookup results shown in Table 1 below can be obtained.
Table 1 Dictionary lookup results

Training sentence | Query result
I want to hear Tomorrow Rains. | Song: 1-2
I want to hear Tomorrow Rains. | Song: 6-8
I want to hear Tomorrow Rains. | Song: 3-8
I want to hear Tomorrow Rains. | Film: 3-8
According to the first training data (training sentence "I want to hear Tomorrow Rains.", intent "listen to songs", key information "song name: Tomorrow Rains") and the contents of Table 1 above, the second training data can be obtained. An example of the content contained in the second training data is shown in Table 2. In Table 2, "0" indicates that the query result does not conform to the semantics of the training sentence, and "1" indicates that it does.
Table 2 Training corpus

Training sentence | Query result | Indication information
I want to hear Tomorrow Rains. | Song: 1-2 | 0
I want to hear Tomorrow Rains. | Song: 6-8 | 0
I want to hear Tomorrow Rains. | Song: 3-8 | 1
I want to hear Tomorrow Rains. | Film: 3-8 | 0
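As an illustration of S130 and S140, the sketch below derives the 0/1 indication labels by comparing each dictionary match with the annotated slot from the first training data; the function and field names are assumptions for illustration.

```python
# Hedged sketch of S130-S140: derive indication labels by checking whether each
# dictionary match coincides with the annotated slot value.

def build_corpus(sentence, slot_span, slot_type, matches):
    """slot_span/slot_type come from the manually labeled first training data;
    matches are (start, end, type) tuples from the dictionary query."""
    corpus = []
    for start, end, entry_type in matches:
        conforms = 1 if (start, end) == slot_span and entry_type == slot_type else 0
        corpus.append({"sentence": sentence,
                       "query_result": f"{entry_type}: {start}-{end}",
                       "indication": conforms})
    return corpus

rows = build_corpus("I want to hear Tomorrow Rains.", (3, 8), "song",
                    [(1, 2, "song"), (6, 8, "song"), (3, 8, "song"), (3, 8, "film")])
for row in rows:
    print(row)  # reproduces the 0/0/1/0 pattern of Table 2
```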
S150, obtaining third training data according to the first training data, the entry information and the indication information, wherein the third training data comprises the first training data, the entry information and the indication information, and the third training data is used for training a natural language understanding model.
One implementation of obtaining the third training data according to the first training data, the entry information, and the indication information is to combine them into the third training data, with the intent and the key information serving as the label data for the first training data, the entry information, and the indication information.
In some examples, the method shown in Fig. 1 may be performed by a cloud-side device, although it may also be performed by an end-side device. Because the storage and computing resources of a cloud-side device are more abundant than those of an end-side device, performing the method on the cloud-side device is more efficient, that is, the training data can be acquired more efficiently.
FIG. 2 is a schematic flow chart of a method of training a model in accordance with one embodiment of the application. The method may include S210 and S220.
S210, acquiring second training data, wherein the second training data is acquired by using the method shown in FIG. 1.
One implementation of acquiring the second training data is to receive, from another device, the second training data that the other device acquired using the method shown in Fig. 1.
Another implementation is for the training device itself to acquire the second training data using the method shown in Fig. 1.
S220, training the preset first model according to the second training data to obtain an entry disambiguation model.
The first model may include an ELMo model, a multi-layer perceptron (multilayer perceptron, MLP), a conditional random field (conditional random field, CRF) model, or a BERT (bidirectional encoder representations from transformers) model. The first model may be a classification model, for example a binary classification model; of course, the first model may also be a multi-classification model, which is not limited in this embodiment. One example of a BERT model is TinyBERT.
An exemplary structure of the term disambiguation model is shown in Fig. 3. The term disambiguation model shown in Fig. 3 is composed of an MLP and any one of an ELMo model, a BERT model, or a CRF model. The sentence and the term information are input into the ELMo, BERT, or CRF model, whose output is input into the MLP. The MLP outputs 0 or 1, where 1 indicates that the term indicated by the term information conforms to the semantics of the sentence and 0 indicates that it does not.
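A minimal PyTorch sketch of the Fig. 3 structure follows, with a small stand-in Transformer encoder where a pretrained BERT or ELMo would be used in practice; the span-pooling strategy and all dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TermDisambiguator(nn.Module):
    """Sketch of the Fig. 3 structure: an encoder (stand-in for BERT/ELMo)
    followed by an MLP that scores whether the candidate term fits the sentence."""
    def __init__(self, vocab_size=30000, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
            num_layers=2)
        self.mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, token_ids, span_mask):
        # span_mask marks the candidate term's positions (the "3-8"-style span)
        h = self.encoder(self.embed(token_ids))
        # mean-pool the encoder states over the candidate term's span
        term_repr = (h * span_mask.unsqueeze(-1)).sum(1) / span_mask.sum(1, keepdim=True)
        return torch.sigmoid(self.mlp(term_repr)).squeeze(-1)  # near 1: conforms
```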
In some examples, the method shown in Fig. 2 may be performed by a cloud-side device, although it may also be performed by an end-side device. Because the storage and computing resources of a cloud-side device are more abundant than those of an end-side device, performing the method on the cloud-side device is more efficient, that is, the term disambiguation model can be trained more efficiently.
When the term disambiguation model is trained according to the second training data, the indication information in the second training data can be used as label data to supervise the training; for a specific implementation, refer to existing supervised training methods for neural network models.
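A minimal supervised training step for the sketch above might look as follows, with one synthetic batch standing in for corpus rows like those of Table 2; the batch contents, learning rate, and loss choice are assumptions.

```python
import torch

model = TermDisambiguator()                       # sketch class from above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = torch.nn.BCELoss()

# Synthetic batch of 4 corpus rows: (token_ids, span_mask, indication) triples.
token_ids = torch.randint(0, 30000, (4, 12))
span_mask = torch.zeros(4, 12)
span_mask[:, 2:8] = 1.0                           # the candidate term's span
indication = torch.tensor([0.0, 0.0, 1.0, 0.0])   # the 0/1 indication labels

prob = model(token_ids, span_mask)                # P(term conforms to semantics)
loss = loss_fn(prob, indication)                  # indication supervises training
optimizer.zero_grad()
loss.backward()
optimizer.step()
```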
Fig. 4 is a schematic flow chart of a method of training a model according to another embodiment of the present application. The method may include S410 and S420.
S410, acquiring third training data, wherein the third training data is acquired by using the method shown in FIG. 1.
One implementation of acquiring the third training data is to receive, from another device, the third training data that the other device acquired using the method shown in Fig. 1.
Another implementation is for the training device itself to acquire the third training data using the method shown in Fig. 1.
S420, training a preset second model according to the third training data to obtain a natural language understanding model.
In some examples, the natural language understanding model may be composed of an MLP together with either a BERT model or a long short-term memory (long short-term memory, LSTM) network.
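For illustration, a sketch of such a natural language understanding model with an LSTM encoder, an MLP intent head, and a per-token slot head is given below; injecting the term and indication information as summed per-token tag embeddings is one possible design, assumed here rather than prescribed by the embodiments.

```python
import torch
import torch.nn as nn

class NLUModel(nn.Module):
    """Sketch: shared encoder, MLP intent-classification head, token-level slot head."""
    def __init__(self, vocab_size=30000, n_tags=5, hidden=256,
                 n_intents=10, n_slots=20):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, hidden)
        self.tag_embed = nn.Embedding(n_tags, hidden)   # disambiguated term tags
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
        self.intent_head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                         nn.Linear(hidden, n_intents))
        self.slot_head = nn.Linear(2 * hidden, n_slots)

    def forward(self, token_ids, term_tags):
        x = self.word_embed(token_ids) + self.tag_embed(term_tags)
        h, _ = self.encoder(x)                           # (B, L, 2*hidden)
        intent_logits = self.intent_head(h.mean(dim=1))  # sentence-level intent
        slot_logits = self.slot_head(h)                  # token-level slot labels
        return intent_logits, slot_logits

model = NLUModel()
intent_logits, slot_logits = model(torch.randint(0, 30000, (1, 12)),
                                   torch.randint(0, 5, (1, 12)))
```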
In some examples, the method shown in Fig. 4 may be performed by a cloud-side device, although it may also be performed by an end-side device. Because the storage and computing resources of a cloud-side device are more abundant than those of an end-side device, performing the method on the cloud-side device is more efficient, that is, the natural language understanding model can be trained more efficiently.
When the natural language understanding model is trained according to the third training data, the intent and key information in the third training data can be used as label data to supervise the training; for a specific implementation, refer to existing supervised training methods for neural network models.
FIG. 5 is a schematic flow chart diagram of a method of implementing natural language understanding in accordance with one embodiment of the present application. The method may include S510 to S550.
S510, obtaining target entry information, wherein the target entry information is used for representing entries contained in target sentences, and the target sentences are sentences input by a user to a man-machine interaction system.
The user can input the target sentence into the man-machine interaction system through voice or words. The man-machine interaction system can be a man-machine interaction system on any intelligent device, for example, a man-machine interaction system on intelligent devices such as a smart phone, an intelligent vehicle, an intelligent sound box and the like.
S530, acquiring target indication information based on the target sentence and the target entry information by using a term disambiguation model, wherein the target indication information is used for indicating whether the term indicated by the target entry information accords with the semantic meaning of the target sentence.
The term disambiguation model used in this embodiment may be trained by the method of the preceding embodiments, for example the method shown in Fig. 2. One implementation is that the cloud-side device itself trains the term disambiguation model using the method shown in Fig. 2 and then uses it.
Another implementation is that the term disambiguation model is received from a training device, which trained it using the method shown in Fig. 2, and is then used. For example, the cloud-side device trains the term disambiguation model using the method shown in Fig. 2, and the end-side device receives the model from the cloud-side device and uses it.
The term disambiguation model judges whether the term represented by the to-be-processed term information conforms to the semantics of the to-be-processed sentence. For example, the target sentence and the target entry information are input into the term disambiguation model, and the target indication information output by the model based on them is acquired.
In some examples, the target indication information may include an entry; in other examples, the target indication information may include a position index of a first word of the term in the target sentence and a position index of a last word of the term in the target sentence. In still other examples, the target indication information may include not only a location index of the term, but also a type of the term, such as whether the term is of a song name type or a movie name type, or of a place name type.
S550, using a natural language understanding model, acquiring an understanding result based on the target sentence, the target entry information and the target indication information, wherein the understanding result comprises target intention of the target sentence input by the user and key information of the target intention.
The natural language understanding model used in this embodiment may be trained by the method of the preceding embodiments, for example the method shown in Fig. 4. One implementation is that the cloud-side device itself trains the natural language understanding model using the method shown in Fig. 4 and then uses it.
Another implementation is that the natural language understanding model is received from a training device, which trained it using the method shown in Fig. 4, and is then used. For example, the cloud-side device trains the natural language understanding model using the method shown in Fig. 4, and the end-side device receives the model from the cloud-side device and uses it.
The natural language understanding model acquires, based on a sentence to be understood together with the first auxiliary information (the term information) and the second auxiliary information (the indication information), the intent of the user-input sentence and the key information of that intent.
For example, the target sentence, the target term information, and the target instruction information are input into the natural language understanding model, and an understanding result output by the natural language understanding model based on the target sentence, the target term information, and the target instruction information is acquired.
In this implementation, the relationship between the inputs and outputs of the term disambiguation model and the natural language understanding model ensures that the training data of the term disambiguation model needs no additional data annotation, which saves cost, improves training efficiency, and accelerates the realization of natural language understanding.
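Putting S510 to S550 together, a hedged end-to-end sketch might look as follows. It reuses match_terms from the trie sketch above; encode_candidate and encode_with_terms stand for whatever tokenization and feature construction the deployed models expect and are hypothetical helpers.

```python
# Schematic end-to-end pipeline for S510-S550; helper names are hypothetical.

def understand(sentence, trie, disambiguator, nlu_model,
               encode_candidate, encode_with_terms):
    # S510: query the dictionary for candidate terms contained in the sentence
    candidates = match_terms(sentence, trie)

    # S530: keep only the terms the disambiguation model says fit the semantics
    confirmed = []
    for start, end, entry_type in candidates:
        token_ids, span_mask = encode_candidate(sentence, (start, end))
        if disambiguator(token_ids, span_mask).item() > 0.5:
            confirmed.append((start, end, entry_type))

    # S550: feed the sentence plus confirmed term information to the NLU model
    token_ids, term_tags = encode_with_terms(sentence, confirmed)
    intent_logits, slot_logits = nlu_model(token_ids, term_tags)
    return intent_logits.argmax(-1), slot_logits.argmax(-1)
```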
In this embodiment, as shown in Fig. 6, S510 may include: S504, acquiring a target dictionary, where the target dictionary includes at least one entry; and S508, querying from the target dictionary the entries contained in the target sentence to obtain the target entry information.
The dictionary in this embodiment may include a music dictionary, a movie dictionary, an application dictionary, a place name dictionary, or a person name dictionary provided by a third-party service provider, or a user-defined dictionary, etc. One example of a user-defined dictionary is a user naming a smart speaker "iron man".
Compared with existing ways of acquiring entry information, dynamically loading the dictionary and querying the entry information against the dynamically loaded dictionary generalizes better and is simpler to maintain.
In some implementations, the end-side device may load a preset target dictionary stored on the end-side device itself, or may dynamically load the target dictionary from applications on the end-side device. For example, a phone assistant on a smartphone may dynamically load the contact list of the phone application; the singer, song name, and album dictionaries of a music playing application; or the actor and movie name dictionaries of a video playing application.
In other implementations, the target dictionary may be obtained by the end-side device from the cloud-side device. For example, the end-side device transmits request information to the cloud-side device, the cloud-side device transmits a dictionary based on the request information, and the end-side device uses the received dictionary as a target dictionary.
In everyday end-side device applications, it is often desirable to deploy dictionaries on the end-side device to reduce the latency and cost of end-cloud interactions. However, everyday dictionaries such as music dictionaries and place name dictionaries are usually very large and may require gigabytes (GB) of storage, while the storage and computing resources of an end-side device are relatively limited. Therefore, some dictionaries may be deployed on the end-side device and others on the cloud-side device, which means end-cloud collaboration is required when querying the target entry information against the dictionaries.
For example, as shown in Fig. 11, the full dictionary may be deployed on the cloud side, while the hot word dictionary, the common dictionary, the personality dictionary, and the like are deployed on the end-side device. In this way, when a dictionary query is needed, the end-side device can send the target sentence to the cloud-side device, which queries its dictionaries to obtain term matching vector 1 for the target sentence; meanwhile, the end-side device obtains term matching vector 2 for the target sentence based on the dictionaries on the end-side device. After receiving term matching vector 1 from the cloud-side device, the end-side device uses the term disambiguation model to disambiguate both vector 1 and the locally obtained vector 2, and then uses the natural language understanding model to perform natural language understanding according to the disambiguation result.
Because the dictionary queries of the cloud-side device and the end-side device may yield multiple term matching vectors, some of the matching results are reasonable while others are mere character matches that are unreasonable at the semantic level. By disambiguating with the term disambiguation model, the unreasonable matching results, which amount to noise, can be filtered out, helping the natural language understanding model obtain a more accurate understanding result.
Here, the full dictionary refers to a dictionary containing all entries; the hot word dictionary and the common dictionary refer to dictionaries used with high frequency; and the personality dictionary refers to a dictionary specific to each end-side device, which may include user-defined entries.
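A compact sketch of the end-cloud query merge of Fig. 11 is given below; query_cloud_dictionary, which stands for the request to the cloud-side device, is a hypothetical helper, and match_terms is the trie sketch from earlier.

```python
# Sketch of the Fig. 11 end-cloud collaboration; helper names are hypothetical.

def collaborative_match(sentence, local_trie, query_cloud_dictionary):
    cloud_matches = query_cloud_dictionary(sentence)    # term matching vector 1
    local_matches = match_terms(sentence, local_trie)   # term matching vector 2
    # Both sides may return the same span; merge and de-duplicate before the
    # term disambiguation model filters out semantically unreasonable matches.
    return sorted(set(cloud_matches) | set(local_matches))
```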
The end-cloud collaboration method provided by this embodiment has several advantages. With the term disambiguation model and the natural language understanding model deployed on the end-side device, the rich dictionary knowledge on the cloud side can be fully used, improving the accuracy of natural language understanding while saving the operating and deployment cost of the end-side device. In addition, some dictionaries, such as the personality dictionary and the address book, can be stored on the end-side device, which protects user privacy. Deploying the two models on the end-side device also avoids the latency caused by network transmission and the inability to perform natural language understanding without a network.
In one example, the intent of the target sentence may first be identified by the end-side device using a lightweight intent recognition model, and whether the target dictionary corresponding to that intent is deployed on the cloud-side device or the end-side device can be determined from a preset correspondence between intents and devices. If the target dictionary is deployed on the end-side device, the dictionary on the end-side device is queried; otherwise, the dictionary on the cloud-side device is queried. This implementation controls the number of communications with the cloud-side device more flexibly and improves the user experience.
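The routing just described could look like the following sketch; the intent-to-location mapping, the lightweight intent classifier, and the cloud query helper are illustrative assumptions.

```python
# Sketch of intent-based dictionary routing; all names are hypothetical.

DICTIONARY_LOCATION = {"listen to songs": "device",   # hot word/common dictionaries
                       "book ticket": "cloud"}        # full dictionary on the cloud

def route_query(sentence, light_intent_model, local_trie, query_cloud_dictionary):
    intent = light_intent_model(sentence)              # lightweight classifier
    if DICTIONARY_LOCATION.get(intent) == "device":
        return match_terms(sentence, local_trie)       # query on the end side
    return query_cloud_dictionary(sentence)            # query on the cloud side
```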
FIG. 7 is a schematic diagram of a system architecture of one embodiment of the present application. As shown in fig. 7, the system architecture 700 includes an execution device 710, a training device 720, a database 730, a client device 740, a data storage system 750, and a data acquisition system 760.
The data collection device 760 is used to collect training data. After the training data is collected, the data collection device 760 stores the training data in the database 730.
For example, the data collection device 760 may read the preset first training data and the preset dictionary; and executing the method shown in fig. 1, and obtaining second training data and third training data based on the first training data; the second training data and the third training data are then stored in database 730.
In some application scenarios, the training device 720 may train the specified neural network model using the second training data maintained in the database 730 to obtain the target model 701. For example, the training device 720 may perform the method shown in FIG. 2, training to obtain an entry disambiguation model. At this time, the target model 701 is an entry disambiguation model. In embodiments of the present application, the target model may also be referred to as a target rule.
In other application scenarios, the training device 720 may train the specified neural network model using the third training data maintained in the database 730 to obtain the target model 701. For example, the training device 720 may perform the method shown in Fig. 4 and train to obtain a natural language understanding model. At this time, the target model 701 is a natural language understanding model.
In practical applications, the training data maintained in the database 730 is not necessarily collected by the data collection device 760; it may also be received from other devices. It should also be noted that the training device 720 does not necessarily train the target model 701 entirely on the training data maintained in the database 730; it may also acquire training data from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of the present application.
The object model 701 trained from the training device 720 may be applied to different systems or devices, such as the execution device 710 of fig. 7.
For example, after training device 720 trains to obtain the term disambiguation model and the natural language understanding model, the two models may be deployed in computing module 711 of execution device 710. That is, the calculation module 711 of the execution device 710 has deployed therein the term disambiguation model and the natural language understanding model trained by the training device 720.
The execution device 710 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (augmented reality, AR)/virtual reality (VR) device, or a vehicle-mounted terminal; it may also be a chip applicable to the above devices, or a server, a cloud, or the like.
In Fig. 7, the execution device 710 is configured with an input/output (I/O) interface 712 for data interaction with external devices, and a user may input data to the I/O interface 712 through a client device 740. For example, the user may enter a voice sentence or a text sentence through the client device 740.
In addition, the execution device 710 includes a computing module 711, and the computing module 711 includes the target model 701 trained by the training device 720.
When the computing module 711 of the execution device 710 uses the target model 701 to process the data to be processed, the execution device 710 may call data, code, and the like in the data storage system 750 for the corresponding processing, and may also store the data, instructions, and the like obtained by that processing into the data storage system 750. Finally, the I/O interface 712 returns the processing result to the client device 740, which presents it to the user.
For example, the target dictionary may be stored in the data storage system 750. The execution device 710 may perform the method shown in Fig. 5, obtain the user's intent and the key information of the intent based on the target dictionary in the data storage system 750, perform the corresponding task according to the intent and the key information, and send the result of the task to the client device 740 through the I/O interface, so that the client device 740 can provide the user with the result of the task.
It is understood that the executing device 710 and the client device 740 in the embodiment of the present application may be the same device, for example, the same terminal device.
For example, in the case where the execution device 710 and the client device 740 are the same smart phone, the smart phone may obtain a target sentence input by a user through a device such as a microphone, a keyboard, or a handwriting screen, and a mobile phone assistant of the smart phone may perform the method shown in fig. 5, obtain an intention of the target sentence and key information of the intention, call a corresponding third party application (e.g., a ticket booking application, a telephone call application, or a music playing application, etc.) according to the intention, and output the key information to the third party application, so that the third party application may perform a task according to the key information. After the third party application obtains the task result, the mobile phone assistant of the smart phone may display the task result to the user through a display screen or a speaker, etc.
In the system shown in Fig. 7, a user may manually give input data, which may be manipulated through an interface provided by the I/O interface 712. Alternatively, the client device 740 may automatically send input data to the I/O interface 712; if automatic sending requires the user's authorization, the user may set the corresponding permission in the client device 740. The user may view the result output by the execution device 710 on the client device 740, presented for example as a display, a sound, or an action. The client device 740 may also serve as a data collection terminal that stores the input data of the I/O interface 712 and the output results of the I/O interface 712 in the database 730 as new sample data, as shown in Fig. 7. Of course, instead of being collected by the client device 740, the input data and output results of the I/O interface 712 may also be stored in the database 730 as new sample data directly by the I/O interface 712.
It should be understood that fig. 7 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among devices, apparatuses, modules, etc. shown in the drawing is not limited in any way, for example, in fig. 7, the data storage system 750 is an external memory with respect to the execution device 710, and in other cases, the data storage system 750 may be disposed in the execution device 710.
Fig. 8 is a deployment diagram of an apparatus for training a model according to an embodiment of the present application. The apparatus may be deployed in a cloud environment, which is an entity that uses basic resources to provide cloud services to users under the cloud computing mode. The cloud environment includes a cloud data center and a cloud service platform, where the cloud data center includes a large number of basic resources (including computing resources, storage resources, and network resources) owned by a cloud service provider; the computing resources included in the cloud data center may be a large number of computing devices (e.g., servers).
In some examples, the apparatus may be a server in a cloud data center for training an entry disambiguation model; a virtual machine for training an entry disambiguation model in a cloud data center can also be created; it may also be a software device deployed on a server or virtual machine in the cloud data center, for training the term disambiguation model, which may be deployed distributed on multiple servers, or distributed on multiple virtual machines, or distributed on virtual machines and servers.
In other examples, the apparatus may be a server in a cloud data center for training a natural language understanding model; or creating a virtual machine for training a natural language understanding model in a cloud data center; it may also be a software device deployed on a server or virtual machine in a cloud data center for training a natural language understanding model, which may be deployed distributed on multiple servers, or distributed on multiple virtual machines, or distributed on virtual machines and servers.
As shown in Fig. 8, the apparatus may be abstracted by the cloud service provider on the cloud service platform into a cloud service for training a term disambiguation model or a natural language understanding model. After a user purchases this cloud service on the cloud service platform, the cloud environment provides the user with the cloud service of training the term disambiguation model or the natural language understanding model.
For example, the user may upload the first training data to the cloud environment through an application program interface (application program interface, API) or through a web page interface provided by the cloud service platform. The training device receives the first training data, acquires the second training data and the third training data using the method shown in Fig. 1, trains the term disambiguation model using the method shown in Fig. 2, and trains the natural language understanding model using the method shown in Fig. 4. The training device then returns the trained term disambiguation model and natural language understanding model to the execution device used by the user. Thereafter, the user may input a target sentence to the execution device, and the execution device may perform the method shown in Fig. 5, acquire the intent of the target sentence and the key information of the intent, and perform the related task according to the intent and the key information.
When the training apparatus is a software apparatus, the training apparatus may also be deployed separately on one computing device of any environment, for example, on one computing device alone or on one computing device in a data center alone.
FIG. 9 is an exemplary block diagram of a computing device according to one embodiment of the application. As shown in fig. 9, computing device 900 includes a bus 901, a processor 902, a communication interface 903, and a memory 904.
Communication among the processor 902, the memory 904, and the communication interface 903 is via the bus 901. The processor 902 may be a central processing unit (CPU). The memory 904 may include volatile memory, such as random access memory (random access memory, RAM). The memory 904 may also include non-volatile memory (NVM), such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD). The memory 904 stores the executable code of the training apparatus, and the processor 902 reads this executable code from the memory 904 to perform the training method. The memory 904 may also contain an operating system and other software modules required by running processes. The operating system may be LINUX™, UNIX™, WINDOWS™, etc.
For example, the processor 902 of the computing device 900 may read the executable code in the memory 904 to implement the method shown in fig. 1, obtaining the second training data and the third training data.
As another example, the processor 902 of the computing device 900 may read the executable code in the memory 904 to implement the method shown in fig. 2 to obtain the term disambiguation model.
For another example, the processor 902 of the computing device 900 may read the executable code in the memory 904 to implement the method shown in fig. 4 to obtain a natural language understanding model.
As another example, the processor 902 of the computing device 900 may read the executable code in the memory 904 to implement the method shown in fig. 5, obtaining the intent of the sentence entered by the user and key information for that intent.
Fig. 10 is a schematic diagram of a system architecture according to another embodiment of the present application. The execution device 1010 is implemented by one or more servers, optionally in conjunction with other computing devices, such as: data storage, routers, load balancers, etc. The execution device 1010 may be disposed on one physical site or distributed across multiple physical sites. The execution device 1010 may implement the method shown in at least one of fig. 1, 2, and 4 using data in the data storage system 1050 or invoking program code in the data storage system 1050.
For example, various dictionaries may be deployed in the execution device 1010, as well as first training data including training sentences and annotated intents and key information. The execution device 1010 executes the method shown in Fig. 1 based on the dictionaries and the first training data to obtain the second training data and the third training data. Thereafter, the execution device 1010 performs the method shown in Fig. 2 based on the second training data to train the term disambiguation model, and performs the method shown in Fig. 4 based on the third training data to train the natural language understanding model.
The user may operate respective user devices (e.g., local device 1001 and local device 1002) to interact with the execution device 1010. Each local device may represent any computing device, such as a personal computer, computer workstation, smart phone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set top box, game console, etc.
The local device of each user may interact with the execution device 1010 via a communication network of any communication mechanism or communication standard; the communication network may be a wide area network, a local area network, a point-to-point connection, or any combination thereof.
After training the term disambiguation model and the natural language understanding model, the execution device 1010 transmits the term disambiguation model and the natural language understanding model to the user devices (e.g., the local device 1001 and the local device 1002) via the communication network.
After receiving the term disambiguation model and the natural language understanding model, the local device 1001 or the local device 1002 may deploy the two models, and in the case of receiving the target sentence input by the user, execute the method shown in fig. 5 based on the two models, and acquire the intention of the target sentence and the key information of the intention.
In another implementation, one or more aspects of the execution device 1010 may be implemented by each local device, e.g., the local device 1001 may provide first training data, or provide a dictionary, or provide a training sentence for the execution device 1010.
Fig. 12 is a schematic block diagram of an apparatus 1100 for acquiring training data according to an embodiment of the present application. The apparatus 1100 may include an acquisition module 1110, a determination module 1120, and a generation module 1130. The apparatus 1100 may be used to implement the method shown in Fig. 1.
For example, the acquisition module 1110 may be used to perform S110 to S120, the determination module 1120 may be used to perform S130, and the generation module 1130 may be used to perform S140 and S150.
Fig. 13 is a schematic block diagram of an apparatus 1200 for training a model according to an embodiment of the present application. The apparatus 1200 may include an acquisition module 1210 and a training module 1220. The apparatus 1200 may be used to implement the method shown in Fig. 2 or Fig. 4.
For example, the acquisition module 1210 may be used to perform S210 and the training module 1220 may be used to perform S220. As another example, the acquisition module 1210 may be used to perform S410 and the training module 1220 may be used to perform S420.
Fig. 14 is a schematic block diagram of an apparatus 1300 implementing natural language understanding according to an embodiment of the present application. The apparatus 1300 may include an acquisition module 1310, a disambiguation module 1320, and an understanding module 1330. The apparatus 1300 may be used to implement the method shown in Fig. 5.
For example, the acquisition module 1310 may be used to perform S510, the disambiguation module 1320 may be used to perform S530, and the understanding module 1330 may be used to perform S550.
Fig. 15 is a schematic block diagram of an apparatus 1400 according to an embodiment of the present application. The apparatus 1400 includes a processor 1402, a communication interface 1403, and a memory 1404. One example of the apparatus 1400 is a chip.
Communication among the processor 1402, the memory 1404, and the communication interface 1403 may be via a bus. The memory 1404 stores executable code, which the processor 1402 reads from the memory 1404 to perform the corresponding method. The memory 1404 may also contain an operating system and other software modules required by running processes. The operating system may be LINUX™, UNIX™, WINDOWS™, etc.
For example, the executable code in the memory 1404 is used to implement the methods shown in any of Figs. 1, 2, 4, and 5, and the processor 1402 reads the executable code in the memory 1404 to perform those methods.
The processor 1402 may be a CPU. The memory 1404 may include volatile memory, such as random access memory (random access memory, RAM). The memory 1404 may also include non-volatile memory (NVM), such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
In some embodiments of the application, the disclosed methods may be implemented as computer program instructions encoded on a computer-readable storage medium in a machine-readable format, or encoded on other non-transitory media or articles of manufacture. Fig. 16 schematically illustrates a conceptual partial view of an example computer program product, arranged in accordance with at least some embodiments presented herein, which includes a computer program for executing a computer process on a computing device. In one embodiment, the example computer program product 1500 is provided using a signal bearing medium 1501. The signal bearing medium 1501 may include one or more program instructions 1502 that, when executed by one or more processors, may provide the functions, or parts of the functions, described above with respect to the methods shown in any of Figs. 1, 2, 4, and 5. Thus, for example, in the embodiment shown in Fig. 5, one or more features of S510 to S550 may be carried by one or more instructions associated with the signal bearing medium 1501. As another example, referring to the embodiment shown in Fig. 4, one or more features of S410 to S420 may be carried by one or more instructions associated with the signal bearing medium 1501.
In some examples, the signal bearing medium 1501 may include a computer-readable medium 1503, such as, but not limited to, a hard disk drive, a compact disc (CD), a digital video disk (DVD), a digital tape, memory, read-only memory (ROM), or random access memory (random access memory, RAM). In some implementations, the signal bearing medium 1501 may include a computer-recordable medium 1504, such as, but not limited to, memory, a read/write (R/W) CD, an R/W DVD, and so on. In some implementations, the signal bearing medium 1501 may include a communication medium 1505, such as, but not limited to, a digital and/or analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communication link, or a wireless communication link). Thus, for example, the signal bearing medium 1501 may be conveyed by a communication medium 1505 in wireless form (e.g., a wireless communication medium conforming to the IEEE 802.11 standard or another transmission protocol). The one or more program instructions 1502 may be, for example, computer-executable instructions or logic-implemented instructions. In some examples, the foregoing computing device may be configured to provide various operations, functions, or actions in response to the program instructions 1502 conveyed to the computing device through one or more of the computer-readable medium 1503, the computer-recordable medium 1504, and/or the communication medium 1505. It should be understood that the arrangement described herein is for illustrative purposes only. Those skilled in the art will appreciate that other arrangements and other elements (e.g., machines, interfaces, functions, orders, and groupings of functions) can be used instead, and that some elements may be omitted altogether depending on the desired result. In addition, many of the described elements are functional entities that may be implemented as discrete or distributed components, or in combination with other components in any suitable combination and location.
Many scenarios require combining natural language understanding with prior knowledge, such as intent classification and slot extraction in a mobile phone assistant. In "play XXX", the same title may refer to a video or to audio; without introducing external knowledge, the intent cannot be determined. In "call xxx", "xxx" is a person's name in the address book; if the address book is not given, the slot information cannot be extracted accurately. Smart home appliances are also becoming more and more popular; for example, smart speakers are standard in many families, and people like to give them special names, but if this prior knowledge is not input, the smart speaker cannot be woken up by its special name. It is therefore important to introduce external knowledge into the NLP model when intent classification and slot extraction need to be performed accurately. The method for realizing natural language understanding in a man-machine interaction system provided by the embodiments of the present application introduces external knowledge into natural language understanding, and the introduced external knowledge can assist tasks such as sequence labeling and sequence classification, thereby remarkably improving the accuracy and efficiency of those tasks.
Fig. 17 is a schematic diagram of a system architecture involved in a method for implementing natural language understanding in a man-machine interaction system according to an embodiment of the present application. As shown in Fig. 17, the system architecture includes a data storage and algorithm training unit 1, an algorithm deployment unit 2, an algorithm deployment unit 3, and an algorithm deployment unit 4; there may be any number of algorithm deployment units, not limited to the number shown in Fig. 17. The data storage and algorithm training unit 1 stores the dictionaries, supports the training of models, and sends the trained models to the algorithm deployment units (terminal devices) 2, 3, 4, and so on, which perform the operation of the algorithm, that is, the running of the models.
Fig. 18 is a schematic diagram of the modules involved in a method for implementing natural language understanding in a man-machine interaction system according to an embodiment of the present application. As shown in Fig. 18, the modules include a quick matching module, a disambiguation module, and a task related module. The quick matching module quickly matches the sentence input by the user against the dictionary and then converts the matching results into low-dimensional representations. Since the matching results generally include multiple candidates, most of which are unreasonable, the disambiguation module classifies the low-dimensional representations of the matching results to determine which results are reasonable and which are not. Before the disambiguation module classifies the low-dimensional representations, the task related module passes the sentence input by the user through an encoder (e.g., a bidirectional encoder representations from transformers (BERT) model) to obtain a context representation of the input sentence, fuses the low-dimensional representations of the matching results with this context representation, and inputs the fused result into the disambiguation module. The task related module then fuses the reasonable results output by the disambiguation module with the context representation of the input sentence to obtain a secondary fusion result, which is finally input into the task module for intent classification and slot extraction, thereby realizing natural language understanding.
Fig. 19 is a schematic flowchart of a method for implementing natural language understanding in a man-machine interaction system according to an embodiment of the present application. As shown in Fig. 19, the method includes steps S1901 to S1905. The method is end-to-end and may be executed by the data storage and algorithm training unit 1 shown in Fig. 17; the steps are described below.
S1901, obtaining a plurality of target entries matched with target sentences in one or more dictionaries, wherein the target sentences are sentences input by a user to a human-computer interaction system.
Specifically, a target sentence input by the user to the man-machine interaction system is obtained and matched against one or more dictionaries. The specific number and types of the dictionaries can be preset. For example, if the man-machine interaction system is a smart speaker, the dictionary types can be determined according to the application scenario to be a music dictionary, an audiobook dictionary, and the like; user-defined dictionaries are also possible. Matching the target sentence against the dictionaries may be performed with a fast search algorithm, such as the dictionary tree (trie) algorithm, which searches each word of the target sentence in the dictionaries in left-to-right order; the matched target entries typically include several candidates. Taking the target sentence "I want to hear Blue and White Porcelain" with a music dictionary and a movie dictionary as an example, "Blue and White Porcelain" has a match in the music dictionary, that is, it may be a song name, and "Blue and White" has a match in the movie dictionary, that is, it may be a movie name.
S1902, obtaining a plurality of sequences of target sentences according to a plurality of target vocabulary entries, wherein each sequence in the plurality of sequences corresponds to one target vocabulary entry.
Specifically, a sequence corresponding to the target sentence is constructed for each target entry, and each sequence contains a matching entity, including the position and type of that entity. For example, for the "Blue and White Porcelain" matched in the music dictionary, the sequence constructed for it marks "Blue and White Porcelain" as a song name; the position of "Blue and White Porcelain" in the sequence can be seen in Table 1 below.
Alternatively, the sequence may be marked with only the location of the matching entity, and not the type of matching entity.
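As an illustration of S1902, the sketch below constructs one tag sequence per matched term, marking the position and type of the matching entity; the BIO-style tag scheme is an assumption for illustration.

```python
# Hedged sketch of S1902: one tag sequence per matched term.

def build_sequence(sentence_len, start, end, entry_type):
    """Return a tag per character: B-/I-<type> inside the span, 'O' elsewhere.
    start/end are 1-based inclusive indices as in the matching step."""
    tags = ["O"] * sentence_len
    tags[start - 1] = "B-" + entry_type
    for i in range(start, end):
        tags[i] = "I-" + entry_type
    return tags

# e.g. a "song" match spanning positions 3-8 of an 8-character sentence
print(build_sequence(8, 3, 8, "song"))
# ['O', 'O', 'B-song', 'I-song', 'I-song', 'I-song', 'I-song', 'I-song']
```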
S1903, obtaining a plurality of first sequence representations corresponding to the plurality of sequences according to the plurality of sequences and the target sentence.
Specifically, before the multiple sequences obtained in S1902 can be used for the subsequent natural language understanding, they need to be expressed in a computer-understandable manner; that is, each sequence is processed into a low-dimensional representation. Meanwhile, a context representation of the target sentence is obtained, specifically by processing the target sentence with an encoder (e.g., a BERT model). The low-dimensional representation of each sequence is then fused with the context representation of the target sentence, specifically by passing them through a self-attention layer of the neural network model, thereby obtaining the plurality of first sequence representations corresponding to the plurality of sequences.
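A minimal PyTorch sketch of this fusion step follows; embedding the tag sequence and fusing it with the context representation by summation before a self-attention layer is one plausible reading, assumed here rather than prescribed by the embodiments, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class SequenceFusion(nn.Module):
    """Sketch of S1903: embed a tag sequence into a low-dimensional representation
    and fuse it with the sentence's context representation (e.g. from BERT)
    through a self-attention layer."""
    def __init__(self, n_tags=5, hidden=256, nhead=4):
        super().__init__()
        self.tag_embed = nn.Embedding(n_tags, hidden)
        self.attn = nn.MultiheadAttention(hidden, nhead, batch_first=True)

    def forward(self, tag_ids, context_repr):
        # tag_ids: (B, L) tag sequence; context_repr: (B, L, hidden) from the encoder
        seq_repr = self.tag_embed(tag_ids)
        fused_in = seq_repr + context_repr
        fused, _ = self.attn(fused_in, fused_in, fused_in)  # self-attention fusion
        return fused  # one "first sequence representation" per input sequence

fusion = SequenceFusion()
out = fusion(torch.randint(0, 5, (1, 8)), torch.randn(1, 8, 256))
```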
In the prior art, terms are directly converted into low-dimensional representations obtained in advance, so all recognizable terms must be known in advance; when the dictionary is updated or expanded, the newly added terms have no pre-computed low-dimensional representations, and the neural network model must be retrained. In the present application, taking Table 3 as an example, the sentence corresponding to each term is converted into a sequence, and the sequence is then converted into a low-dimensional representation; the entity types in the sequence are fixed and finite in number. For example, the terms in Table 3 only have the types "song" and "movie", so even if new song names and movie names are added to the music dictionary and the movie dictionary, those names will still be converted into the types "song" and "movie". The dictionary can therefore be updated without updating the neural network model, which improves the generalization capability of the neural network model.
S1904, determining, according to the first plurality of sequence representations, whether each target term of the plurality of target terms corresponds to the semantics of the target sentence.
Based on the semantics and grammatical structure of the target sentence input by the user, most of the target entries obtained in S1901 are matching results that do not conform to the semantics of the target sentence, so most of the resulting first sequence representations are unreasonable. If these unreasonable results were input into the subsequent neural network model, they would become interference noise and could degrade the model, so disambiguation processing must be performed on the first sequence representations. Specifically, for a second sequence representation, which is one of the plurality of first sequence representations, the degree of attention it pays to the other first sequence representations is obtained, where the degree of attention is the influence of the other first sequence representations on the second sequence representation. The second sequence representation is processed through a self-attention layer of the neural network model to obtain its degree of attention to the other first sequence representations. In this way, disambiguation considers not only the relation between the matching entity and its context within one sequence, but also the relations between different sequences, making the disambiguation result more accurate. The second sequence representation is then disambiguated through a linear layer of the neural network, which performs binary classification on it, yielding a result of whether the second sequence representation conforms to the semantics of the target sentence. For example, the binary classification may be a 0-1 classification: results that conform to the semantics of the target sentence are classified as 1, and results that do not are classified as 0.
S1905, carrying out natural language understanding processing on the target sentence according to one or more first sequence representations corresponding to the target vocabulary entry conforming to the semantics of the target sentence so as to obtain a processing result.
Specifically, the natural language understanding processing tasks in the embodiments of the present application include intent classification, slot extraction, sentence classification, entity recognition, and the like, and the target sentence can be processed according to the specific task at hand. Taking intent classification and slot extraction as examples, the first sequence representations corresponding to the one or more reasonable target entries are processed by the intent classification layer and the slot extraction layer of the neural network model, thereby obtaining the intent information and slot information of the target sentence.
Optionally, before performing natural language understanding processing on the target sentence according to the first sequence representations corresponding to the one or more reasonable target terms, the method for implementing natural language understanding in the man-machine interaction system according to the embodiment of the application further includes fusing the first sequence representation corresponding to each reasonable target term in the first sequence representations corresponding to the one or more reasonable target terms with the context representation of the target sentence to obtain one or more third sequence representations corresponding to the one or more reasonable target terms. And then carrying out natural language understanding processing on the target sentence according to the third sequence representation to obtain a processing result.
The method for implementing natural language understanding in the man-machine interaction system of the embodiment of the application embeds dictionary knowledge and thus realizes natural language understanding more effectively. Natural language is matched against a dictionary, the matches are converted into sequences, and low-dimensional representations are obtained from those sequences. Because the sequence types are fixed and finite in number, any term matched in the dictionary is converted into one of a finite set of sequences, and the corresponding low-dimensional representations are likewise finite in number. Therefore, after the dictionary is updated or expanded, the neural network model does not need to be updated, which improves the generalization capability of the model.
Fig. 20 is a schematic flowchart of a training method of a neural network model according to an embodiment of the present application, where the neural network model includes a first sub-network model, which may be used to implement steps 1901 to 1903 in fig. 19 described above, a second sub-network model, which is used to implement step 1904 in fig. 19 described above, and a third sub-network model, which is used to implement step 1905 in fig. 19 described above. As shown in fig. 20, the training method of the neural network model according to the embodiment of the present application includes steps 2001 to 2006, which are described below.
S2001, first training data is acquired, the first training data including a training sentence and a plurality of first sequence representations of the training sentence matching the target dictionary.
Specifically, first the term-matching results of the training sentence against the target dictionary are obtained, yielding a plurality of target terms; then a sequence of the training sentence is constructed for each target term, each sequence containing the position and type of the entity corresponding to that term; the sequences are converted into low-dimensional representations, and the training sentence is processed into a context representation; finally, each sequence is fused with the context representation of the training sentence to obtain the plurality of matching results of the training sentence against the target dictionary.
And S2002, training the first sub-network model according to the first training data to obtain a trained first sub-network model.
The training sentence is input into the first sub-network model to be trained, into which the target dictionary has already been imported; the plurality of matching results of the training sentence against the target dictionary serve as the training targets of the first sub-network model, and the first sub-network model is trained accordingly.
And S2003, acquiring second training data, wherein the second training data comprises an output result of the trained first sub-network model and a first sequence representation meeting preset requirements in the plurality of first sequence representations.
Specifically, a matching result satisfying the preset requirement is selected from the plurality of matching results in S2001. Each first sequence representation in the plurality of first sequence representations corresponds to a first target entry, the first target entry is an entry matched with the target dictionary by the training sentence, and the first sequence representation meeting the preset requirement is a sequence representation corresponding to the first target entry conforming to the meaning of the training sentence.
And S2004, training the second sub-network model according to the second training data to obtain a trained second sub-network model.
The output result of the trained first sub-network model is input into the second sub-network model to be trained; the matching results that meet the preset requirement serve as the training targets of the second sub-network model, and the second sub-network model is trained accordingly.
S2005, obtaining third training data, wherein the third training data comprises an output result of the trained second sub-network model and a processing result of natural language understanding processing of training sentences.
And S2006, training the third sub-network model according to the third training data to obtain a trained third sub-network model.
The output result of the trained second sub-network model is input into the third sub-network model to be trained; the processing result of the natural language understanding processing of the training sentence serves as the training target of the third sub-network model, and the third sub-network model is trained accordingly.
Therefore, training of the neural network model according to the embodiment of the present application can be completed through the steps 2001 to 2006, and the trained neural network model can be used to implement the method for implementing natural language understanding in the man-machine interaction system according to the embodiment of the present application.
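For illustration, the staged procedure of steps S2001 to S2006 can be sketched as the following training loop. This is a schematic sketch only: the sub-model attribute names (matcher, disambiguator, nlu), the batch providers, the Adam optimizer and the cross-entropy loss are all assumptions, not details from the original.

```python
import torch
import torch.nn.functional as F

def train_pipeline(model, data, epochs=3, lr=1e-4):
    """Stage-wise training matching S2001-S2006: each sub-network model is
    trained in turn, consuming the previous stage's outputs as its inputs."""
    stages = [
        (model.matcher, data.matching_batches),        # S2001-S2002: sequence matching
        (model.disambiguator, data.disambig_batches),  # S2003-S2004: term disambiguation
        (model.nlu, data.nlu_batches),                 # S2005-S2006: intent/slot understanding
    ]
    for submodel, batches in stages:
        optimizer = torch.optim.Adam(submodel.parameters(), lr=lr)
        for _ in range(epochs):
            for inputs, targets in batches():
                logits = submodel(*inputs)             # forward pass of this stage
                loss = F.cross_entropy(logits, targets)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
```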
Optionally, the first sub-network model is an entry matching model, and is configured to obtain, according to a target sentence and a target dictionary input by a user, multiple sequence representations matched with the target dictionary, where each sequence representation in the multiple sequence representations corresponds to a target entry, and the target entry is an entry matched with the target sentence; the second sub-network model is an entry disambiguation model and is used for determining whether the corresponding target entry of each sequence representation in the sequence representations accords with the semantics of the target sentence according to the target sentence and the sequence representations; the third sub-network model is a natural language understanding model and is used for carrying out natural language understanding processing on the target sentence according to the sequence representation corresponding to the target entry conforming to the semantic meaning of the target sentence.
The neural network model training method is an end-to-end training method, and is simple in flow and high in training speed.
The method for implementing natural language understanding in the man-machine interaction system according to the embodiment of the present application may be realized by the following steps (1) to (6), which are described in detail below with reference to figs. 21 to 23.
(1) Dictionary quick match
Depending on the requirements of the downstream task, the dictionaries contain the relevant terms and their corresponding types, such as a music dictionary, a movie dictionary and the like; several dictionaries may be used for matching, and the number and types of dictionaries can be preset manually. As shown in fig. 21, for each input sentence, the terms and types appearing in the dictionaries are searched by traversing from left to right, starting from each word in sentence order; each sentence may be searched against multiple dictionaries, and the same term may correspond to multiple types. The quick-match set can be obtained with a fast lookup algorithm, for example a trie: starting from each word, traverse left to right to find the terms that appear in the dictionary, and then sort the matches by the length of the matched text in descending order.
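A minimal sketch of such a trie-based quick match is given below; the data structure and helper names are illustrative, and any dictionary contents are invented for the example.

```python
# Trie-based dictionary matcher: from every start position, walk the trie
# left to right and record every entry (with all of its types) that appears.
class TrieNode:
    def __init__(self):
        self.children = {}
        self.types = []  # entry types ending at this node, e.g. ["song", "movie"]

def build_trie(dictionary):
    """dictionary: iterable of (entry_tokens, entry_type) pairs."""
    root = TrieNode()
    for tokens, entry_type in dictionary:
        node = root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())
        node.types.append(entry_type)
    return root

def match(sentence_tokens, root):
    """Return (start, end, type) for every dictionary entry found in the sentence."""
    results = []
    for start in range(len(sentence_tokens)):
        node = root
        for end in range(start, len(sentence_tokens)):
            node = node.children.get(sentence_tokens[end])
            if node is None:
                break
            for entry_type in node.types:  # the same span may carry several types
                results.append((start, end, entry_type))
    # sort matches by matched-text length, longest first
    return sorted(results, key=lambda m: m[1] - m[0], reverse=True)
```

For the sentence of Table 3, every span found in any dictionary is reported together with each of its types, and the matches are then ordered by matched-text length as described above.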
(2) Constructing labels for matching results
For each matching result, a word-level feature sequence corresponding to the dictionary type is constructed, aligned one-to-one with the word sequence of the sentence. Table 3 shows an example of matching the sentence "I want to hear blue and white porcelain" against the dictionaries, with each row representing one sequence. For example, "blue and white porcelain" in the first sequence matches in the music dictionary; the O tag indicates that there is no corresponding dictionary feature at that position, the B-song tag marks the starting position of a song-name entity, and the I-song tag marks the subsequent part of the song-name entity. Each dictionary feature sequence thus contains the position and type information of an entity.
Table 3
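Following Table 3, here is a sketch of how one such word-level feature sequence could be built from a match; the span indices are token positions, and the function name is illustrative.

```python
def build_tag_sequence(sentence_len, start, end, entry_type):
    """One word-level feature sequence per match: O outside the entity,
    B-<type> at its first token, I-<type> on the rest (cf. Table 3)."""
    tags = ["O"] * sentence_len
    tags[start] = f"B-{entry_type}"
    for i in range(start + 1, end + 1):
        tags[i] = f"I-{entry_type}"
    return tags

# For "I want to hear blue-and-white porcelain" (6 tokens) with a song match
# on the last two tokens:
# build_tag_sequence(6, 4, 5, "song")
# -> ['O', 'O', 'O', 'O', 'B-song', 'I-song']
```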
(3) Obtaining a context-fused low-dimensional representation of the tag sequence
The tag sequence generated in (2) consists of tag IDs and must be further processed into word representations before being input into the model. The tag sequence is processed by the word-representation layer of the neural network to obtain a low-dimensional representation rep_match of the tag sequence. As shown in fig. 5, the input sentence is processed by an encoder (e.g., a BERT model) to obtain a context representation rep_input. Using a self-attention mechanism, the low-dimensional representation rep_match of the tag sequence is fused with the context representation rep_input of the input sentence, whereby a context-fused tag representation rep_c-match is obtained.
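The step can be sketched as follows in PyTorch; the embedding-plus-attention layout mirrors the description above, but the single attention head, the cross-attention formulation of the fusion, and the tensor shapes are assumptions rather than details from the original.

```python
import torch
import torch.nn as nn

class TagFusion(nn.Module):
    """Embed a tag-ID sequence into rep_match and fuse it with the encoder's
    context representation rep_input via attention, yielding rep_c_match."""
    def __init__(self, num_tags, hidden_size):
        super().__init__()
        self.tag_embedding = nn.Embedding(num_tags, hidden_size)
        self.attn = nn.MultiheadAttention(hidden_size, num_heads=1, batch_first=True)

    def forward(self, tag_ids, rep_input):
        # tag_ids: (batch, seq_len); rep_input: (batch, seq_len, hidden)
        rep_match = self.tag_embedding(tag_ids)  # low-dimensional tag representation
        # fuse the tag representation with the sentence context
        rep_c_match, _ = self.attn(rep_match, rep_input, rep_input)
        return rep_c_match                        # (batch, seq_len, hidden)
```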
(4) Dictionary disambiguation
Most of the dictionary matching results obtained above are erroneous. If they were input indiscriminately into the subsequent model, they would constitute interference noise: not only would they fail to improve the expressive power of the sequence model, they could even degrade it. The matching results must therefore be disambiguated; the disambiguation step suppresses most of the noisy matching results.
As shown in FIG. 22, after the context-fused tag representation rep_c-match is obtained, the [CLS] tag vector corresponding to rep_c-match is taken; through the self-attention mechanism, the [CLS] position encodes the information of the whole sequence. To screen and disambiguate the results, the relations among the results must be considered, so the set of [CLS] vectors corresponding to the same sentence is processed by the self-attention layer of the neural network model to obtain the attention over the other dictionary feature sequences, and predictive classification is then performed by the linear layer of the neural network model. In this process, the vector finally used for classification encodes, through the neural network, the type and position information of the dictionary features together with the semantic information of the original sentence; the classification identifies the dictionary matching results that are semantically reasonable and thereby screens them, with reasonable results judged as 1 and unreasonable results judged as 0.
The classification uses 0-1 labels to indicate whether the dictionary tag attached to a matching result is reasonable. If the dictionary categories themselves are not of interest, only the 0-1 labels need be used when reconstructing the tags.
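A schematic PyTorch sketch of this disambiguation step follows, under the assumption that the [CLS] vectors of all matching sequences of one sentence are stacked into a single batch; the class and layer names are illustrative.

```python
import torch
import torch.nn as nn

class DictDisambiguator(nn.Module):
    """0-1 classification of matching results: the [CLS] vectors of all
    matching sequences of a sentence attend to one another, then a linear
    layer predicts 1 (semantically reasonable) or 0 (unreasonable)."""
    def __init__(self, hidden_size):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(hidden_size, num_heads=1, batch_first=True)
        self.classifier = nn.Linear(hidden_size, 2)

    def forward(self, cls_vectors):
        # cls_vectors: (1, num_matches, hidden) -- one [CLS] vector per match
        attended, _ = self.self_attn(cls_vectors, cls_vectors, cls_vectors)
        logits = self.classifier(attended)  # (1, num_matches, 2)
        return logits.argmax(dim=-1)        # 0-1 label for each matching result
```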
Table 4 shows the disambiguation results for the dictionary matching sequences of Table 3. In the sentence "I want to hear blue and white porcelain", among the matched results "music: blue and white porcelain", "music: blue-and-white", "movie: blue-and-white" and "music: me", only "music: blue and white porcelain" is a potentially correct entity in the original text; the other results are semantically unreasonable. The unreasonable dictionary matching results can thus be filtered out by the classification.
Table 4
(5) Feature fusion
As shown in fig. 23, the tag sequences of the reasonable matching results are obtained after the disambiguation processing, fused, and input into the task-related module. Specifically, the rep_c-match corresponding to the sequences whose classification result is 1 are selected and further fused with rep_input through the self-attention layer to obtain a context representation rep_fuse that incorporates dictionary knowledge; rep_fuse, a text representation fused with dictionary knowledge, can subsequently be input into the task-specific networks to improve task performance.
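Schematically, the fusion step might look as follows; treating rep_input as the attention query over the retained rep_c-match sequences is one plausible reading of the description under stated assumptions, not the definitive implementation.

```python
import torch
import torch.nn as nn

class KnowledgeFusion(nn.Module):
    """Fuse the retained rep_c_match sequences (those classified 1) with
    rep_input through an attention layer to obtain rep_fuse."""
    def __init__(self, hidden_size):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads=1, batch_first=True)

    def forward(self, rep_input, kept_rep_c_match):
        # rep_input:         (batch, seq_len, hidden)
        # kept_rep_c_match:  (batch, n_kept * seq_len, hidden), the retained
        #                    sequences concatenated along the length dimension
        rep_fuse, _ = self.attn(rep_input, kept_rep_c_match, kept_rep_c_match)
        return rep_fuse  # dictionary-knowledge-fused context representation
```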
(6) Intent classification and slot extraction
After rep_fuse is obtained, the dictionary-knowledge-fused context representation rep_fuse is fused once more with the context representation rep_input of the input sentence, and the result is then fed into the intent classifier and the slot classifier to perform intent classification and slot extraction, respectively.
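A sketch of the two task heads follows; the additive "primary fusion" and the mean pooling used for the intent head are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class NLUHeads(nn.Module):
    """Intent classifier over a pooled sentence vector and a per-token slot
    classifier, both fed the dictionary-fused representation."""
    def __init__(self, hidden_size, num_intents, num_slot_labels):
        super().__init__()
        self.intent_head = nn.Linear(hidden_size, num_intents)
        self.slot_head = nn.Linear(hidden_size, num_slot_labels)

    def forward(self, rep_fuse, rep_input):
        fused = rep_fuse + rep_input                         # primary fusion (additive, assumed)
        intent_logits = self.intent_head(fused.mean(dim=1))  # pooled vector -> intent
        slot_logits = self.slot_head(fused)                  # per-token -> slot labels
        return intent_logits, slot_logits
```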
Fig. 24 is a schematic block diagram of an apparatus for implementing natural language understanding in a man-machine interaction system according to an embodiment of the present application, as shown in fig. 24, including an obtaining module 2410 and a processing module 2420, where:
the obtaining module 2410 is configured to obtain a plurality of target terms that are matched in one or more dictionaries by a target sentence, where the target sentence is a sentence input by a user to the human-computer interaction system.
The processing module 2420 is configured to obtain a plurality of sequences of the target sentence according to a plurality of target terms, where each sequence of the plurality of sequences corresponds to a target term.
The processing module 2420 is further configured to obtain a plurality of first sequence representations corresponding to the plurality of sequences according to the plurality of sequences and the target sentence.
The processing module 2420 is further configured to determine that each target term of the plurality of target terms is reasonable or unreasonable based on the plurality of first sequence representations.
The processing module 2420 is further configured to perform natural language understanding processing on the target sentence according to the first sequence representations corresponding to the one or more reasonable target terms, so as to obtain a processing result.
Optionally, each sequence of the plurality of sequences contains type information of a matching entity and location information of the matching entity in the sequence.
Optionally, the processing module 2420 obtains a plurality of first sequence representations corresponding to the plurality of sequences according to the plurality of sequences and the target sentence, including: obtaining a low-dimensional representation of each of the plurality of sequences; acquiring a context representation of the target sentence; the low-dimensional representation of each of the plurality of sequences and the contextual representation of the target sentence are fused to obtain a plurality of first sequence representations corresponding to the plurality of sequences.
Optionally, the processing module 2420 determines that each target term of the plurality of target terms is reasonable or unreasonable according to the plurality of first sequence representations, including: obtaining the degree of attention of a second sequence representation to the first sequence representations other than the second sequence representation among the plurality of first sequence representations, the second sequence representation being one of the plurality of first sequence representations; and determining whether the target term corresponding to the second sequence representation is reasonable or unreasonable according to the second sequence representation and the degree of attention.
Optionally, before the processing module 2420 performs natural language understanding processing on the target sentence according to the first sequence representation corresponding to the one or more reasonable target terms, the processing module 2420 is further configured to: acquiring a context representation of the target sentence; and fusing the first sequence representation corresponding to each reasonable target entry in the first sequence representations corresponding to the one or more reasonable target entries with the context representation of the target sentence to obtain one or more third sequence representations corresponding to the one or more reasonable target entries.
It should be understood that the device 2400 for implementing natural language understanding in the man-machine interaction system may be used to implement the method for implementing natural language understanding in the man-machine interaction system in fig. 19 and fig. 21 to 23, and specific implementation steps may refer to the descriptions of fig. 19 and fig. 21 to 23, which are not repeated herein for brevity.
Fig. 25 shows a schematic block diagram of a training apparatus of a neural network model of an embodiment of the present application, the neural network model including a first sub-network model, a second sub-network model, and a third sub-network model. As shown in fig. 25, the training apparatus includes an acquisition module 2510 and a training module 2520, wherein:
an acquisition module 2510 configured to acquire first training data, where the first training data includes a training sentence and a plurality of matching results that the training sentence matches a target dictionary;
the training module 2520 is configured to train the first sub-network model according to the first training data, so as to obtain a trained first sub-network model;
the acquiring module 2510 is further configured to acquire second training data, where the second training data includes an output result of the trained first sub-network model and a matching result that meets a preset requirement from the plurality of matching results;
the training module 2520 is further configured to train the second sub-network model according to the second training data, so as to obtain a trained second sub-network model;
the obtaining module 2510 is further configured to obtain third training data, where the third training data includes an output result of the trained second sub-network model and a processing result of the training sentence for performing natural language understanding processing;
The training module 2520 is further configured to train the third sub-network model according to the third training data, so as to obtain a trained third sub-network model.
It should be appreciated that the training apparatus 2500 of the neural network model may be used to implement the training method of the neural network model in fig. 20, and specific implementation steps may refer to the description of fig. 20, which is omitted herein for brevity.
Fig. 26 is a schematic diagram of a hardware configuration of an apparatus according to an embodiment of the present application. The apparatus 2600 shown in fig. 26 (the apparatus 2600 may be a computer device in particular) includes a memory 2601, a processor 2602, a communication interface 2603, and a bus 2604. The memory 2601, the processor 2602, and the communication interface 2603 are connected to each other by a bus 2604.
The memory 2601 may be a read-only memory (read only memory, ROM), a static storage device, a dynamic storage device, or a random access memory (random access memory, RAM). The memory 2601 may store a program, and when the program stored in the memory 2601 is executed by the processor 2602, the processor 2602 is configured to perform the respective steps of the method for realizing natural language understanding in the man-machine interaction system and the training method of the neural network model according to the embodiment of the present application.
The processor 2602 may employ a general-purpose central processing unit (central processing unit, CPU), microprocessor, application specific integrated circuit (application specific integrated circuit, ASIC), graphics processor (graphics processing unit, GPU) or one or more integrated circuits for executing associated programs to implement the method for natural language understanding and the training method for neural network models in the human-computer interaction system of the embodiments of the present application.
The processor 2602 may also be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the method for implementing natural language understanding and the training method of the neural network model in the man-machine interaction system of the present application may be completed by an integrated logic circuit of hardware in the processor 2602 or an instruction in a software form.
The processor 2602 may also be a general purpose processor, a digital signal processor (digital signal processing, DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf programmable gate array (field programmable gate array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in the memory 2601, and the processor 2602 reads information in the memory 2601, and combines with hardware thereof to implement functions required to be executed by units included in a device for implementing natural language understanding and a training device for a neural network model in the man-machine interaction system of the embodiment of the application, or to implement a method for implementing natural language understanding and a training method for a neural network model in the man-machine interaction system of the embodiment of the application.
Communication interface 2603 enables communication between apparatus 2600 and other devices or communication networks using a transceiver apparatus such as, but not limited to, a transceiver. For example, a target sentence or training data input by the user may be acquired through the communication interface 2603.
Bus 2604 may include a path to transfer information between various components of device 2600 (e.g., memory 2601, processor 2602, communication interface 2603).
It should be noted that although the apparatus 2600 described above only shows a memory, a processor, a communication interface, in a particular implementation, those skilled in the art will appreciate that the apparatus 2600 may also include other devices necessary to achieve proper operation. Also, as will be appreciated by those of skill in the art, the apparatus 2600 may also include hardware devices that implement other additional functions, as desired. Furthermore, it will be appreciated by those skilled in the art that the apparatus 2600 may also include only the components necessary to implement an embodiment of the present application, and not necessarily all of the components shown in fig. 26.
According to a method provided by an embodiment of the present application, there is also provided a computer program product comprising: computer program code which, when run on a computer, causes the computer to perform the method of any of the embodiments shown in figures 1, 2, 4, 5, 6, 19, 20.
It should be noted that, the above computer program code may be stored in whole or in part on a first storage medium, where the first storage medium may be packaged together with the processor or may be packaged separately from the processor, which is not specifically limited by the present application.
According to the method provided by the embodiment of the application, the application also provides a chip system, which comprises the following steps: a processor for calling and running a computer program from a memory, causing a communication device in which the chip system is installed to perform the method of any of the embodiments shown in fig. 1, 2, 4, 5, 6, 19, 20.
According to the method provided by the embodiment of the application, the application further provides a computer readable medium, wherein the computer readable medium stores program code which, when run on a computer, causes the computer to execute the method of any one of the embodiments shown in fig. 1, 2, 4, 5, 6, 19 and 20.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a mobile hard disk, a read-only memory, a random access memory, a magnetic disk or an optical disk.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (34)

1. A method for implementing natural language understanding in a human-computer interaction system, comprising:
inquiring vocabulary entries contained in a target sentence from a target dictionary, and acquiring target vocabulary entry information, wherein the target dictionary comprises at least one vocabulary entry, the target vocabulary entry information is used for representing one or more vocabulary entries contained in the target sentence, and the target sentence is a sentence input by a user to the man-machine interaction system;
acquiring target indication information based on the target sentence and the target entry information by using a term disambiguation model, wherein the target indication information is used for indicating whether the term indicated by the target entry information accords with the semantic meaning of the target sentence;
and acquiring an understanding result based on the target sentence, the target entry information and the target indication information by using a natural language understanding model, wherein the understanding result comprises target intention of the target sentence input by the user and key information of the target intention.
2. The method of claim 1, wherein the method is performed by an end-side device, wherein the target dictionary comprises a dictionary on the end-side device.
3. The method of claim 1, wherein the target dictionary comprises a dictionary on a cloud-side device, wherein the querying the target dictionary for terms contained in the target sentence from the target dictionary, obtaining target term information, comprises:
sending the target sentence to the cloud side equipment;
and receiving the target entry information from the cloud side equipment.
4. A method according to claim 3, characterized in that the method further comprises:
using an intent recognition model to acquire candidate intents of the target sentence; the sending the target sentence to the cloud side device includes:
and under the condition that the dictionary corresponding to the candidate intention is judged to be positioned on the cloud side equipment according to a preset corresponding relation, sending the target sentence to the cloud side equipment, wherein the corresponding relation is used for indicating whether the intention is positioned on the cloud side equipment.
5. The method of any one of claims 1 to 4, wherein the term disambiguation model is a binary classification model.
6. A method of model training, comprising:
acquiring first training data, wherein the first training data comprises training sentences, intentions input by a user of the training sentences and key information of the intentions;
Inquiring vocabulary entries contained in the training sentences from a dictionary, and acquiring vocabulary entry information, wherein the dictionary comprises at least one vocabulary entry, and the vocabulary entry information is used for representing one or more vocabulary entries contained in the training sentences;
acquiring indication information according to the first training data and the entry information, wherein the indication information is used for indicating whether the entry represented by the entry information accords with the intention and the semantic represented by the key information;
obtaining second training data according to the training sentences, the entry information and the indication information, wherein the second training data comprises the training sentences, the entry information and the indication information, and the second training data is used for training an entry disambiguation model, and the entry disambiguation model is used for judging whether the entries represented by the entry information to be processed conform to the semantics of the sentences to be processed or not based on the sentences to be processed and the entry information to be processed.
7. The method as recited in claim 6, further comprising:
according to the first training data, the term information and the indication information, third training data are obtained, the third training data comprise the first training data, the term information and the indication information, the third training data are used for training a natural language understanding model, the natural language understanding model is used for obtaining user input intention of a statement to be understood and key information of the intention based on the statement to be understood, first auxiliary information and second auxiliary information, the first auxiliary information is used for representing one or more terms contained in the statement to be understood, and the second auxiliary information is used for indicating whether the terms represented by the first auxiliary information conform to the semantics of the statement to be understood.
8. An apparatus for implementing natural language understanding in a man-machine interaction system, comprising:
the system comprises an acquisition module, a man-machine interaction system and a target dictionary, wherein the acquisition module is used for inquiring vocabulary entries contained in the target sentence from the target dictionary and acquiring target vocabulary entry information, the target dictionary comprises at least one vocabulary entry, the target vocabulary entry information is used for representing one or more vocabulary entries contained in the target sentence, and the target sentence is a sentence input by a user to the man-machine interaction system;
the disambiguation module is used for acquiring target indication information based on the target sentence and the target entry information by using a term disambiguation model, wherein the target indication information is used for indicating whether the entry indicated by the target entry information accords with the semantics of the target sentence;
and the understanding module is used for acquiring an understanding result based on the target sentence, the target entry information and the target indication information by using a natural language understanding model, wherein the understanding result comprises target intention of the target sentence input by the user and key information of the target intention.
9. The apparatus of claim 8, wherein the apparatus is included on an end-side device, wherein the target dictionary comprises a dictionary on the end-side device.
10. The apparatus of claim 8, wherein the target dictionary comprises a dictionary on a cloud-side device, and wherein the acquisition module is specifically configured to:
sending the target sentence to the cloud side equipment;
and receiving the target entry information from the cloud side equipment.
11. The apparatus of claim 10, wherein the obtaining module is specifically configured to:
using an intent recognition model to acquire candidate intents of the target sentence;
and under the condition that the dictionary corresponding to the candidate intention is judged to be positioned on the cloud side equipment according to a preset corresponding relation, sending the target sentence to the cloud side equipment, wherein the corresponding relation is used for indicating whether the intention is positioned on the cloud side equipment.
12. The apparatus of any one of claims 8 to 11, wherein the term disambiguation model is a binary classification model.
13. A model training device, comprising:
the acquisition module is used for acquiring first training data, wherein the first training data comprises training sentences, intentions input by a user and key information of the intentions;
the obtaining module is further configured to query, from a dictionary, terms contained in the training sentence, and obtain term information, where the dictionary includes at least one term, and the term information is used to represent one or more terms contained in the training sentence;
The judging module is used for acquiring indication information according to the first training data and the entry information, wherein the indication information is used for indicating whether the entry represented by the entry information accords with the intention and the semantic represented by the key information;
the generating module is used for acquiring second training data according to the training sentences, the entry information and the indication information, wherein the second training data comprises the training sentences, the entry information and the indication information, and the second training data is used for training an entry disambiguation model, and the entry disambiguation model is used for judging whether the entry represented by the entry information to be processed accords with the semantics of the sentence to be processed or not based on the sentence to be processed and the entry information to be processed.
14. The apparatus of claim 13, wherein the generating module is further configured to obtain third training data according to the first training data, the term information, and the indication information, where the third training data includes the first training data, the term information, and the indication information, and the third training data is used to train a natural language understanding model, where the natural language understanding model is used to obtain, based on a sentence to be understood, first auxiliary information, and second auxiliary information, intent of a user to input the sentence to be understood, and key information of the intent, the first auxiliary information is used to represent one or more terms included in the sentence to be understood, and the second auxiliary information is used to indicate whether the term represented by the first auxiliary information conforms to semantics of the sentence to be understood.
15. An apparatus for implementing natural language understanding in a man-machine interaction system, comprising: a processor coupled to the memory;
the memory is used for storing instructions;
the processor is configured to execute instructions stored in the memory to cause the apparatus to implement the method of any one of claims 1 to 5.
16. A model training device, comprising: a processor coupled to the memory;
the memory is used for storing instructions;
the processor is configured to execute instructions stored in the memory to cause the apparatus to implement the method of claim 6 or 7.
17. A computer readable medium comprising instructions which, when run on a processor, cause the processor to implement the method of any one of claims 1 to 7.
18. A method for implementing natural language understanding in a human-computer interaction system, comprising:
obtaining a plurality of target entries matched with target sentences in one or more dictionaries, wherein the target sentences are sentences input by a user to the man-machine interaction system;
obtaining a plurality of sequences of the target sentence according to a plurality of target entries, wherein each sequence in the plurality of sequences corresponds to one target entry, and the types of the sequences are fixed and finite in number;
Acquiring a plurality of first sequence representations corresponding to the sequences according to the sequences and the target statement;
obtaining a degree of attention of a second sequence representation to the first sequence representations other than the second sequence representation among the plurality of first sequence representations, the second sequence representation being one of the plurality of first sequence representations;
determining whether a target entry corresponding to the second sequence representation accords with the semantics of the target sentence according to the second sequence representation and the attention;
and carrying out natural language understanding processing on the target sentence according to one or more first sequence representations corresponding to the target vocabulary entry conforming to the semantics of the target sentence so as to obtain a processing result.
19. The method of claim 18, wherein each of the plurality of sequences comprises type information of one of the target terms and location information of the target term in the target sentence.
20. The method according to claim 18 or 19, wherein the obtaining a plurality of first sequence representations corresponding to the plurality of sequences from the plurality of sequences and the target sentence comprises:
obtaining a low-dimensional representation of each of the plurality of sequences;
Acquiring a context representation of the target sentence;
the low-dimensional representation of each of the plurality of sequences and the contextual representation of the target sentence are fused to obtain a plurality of first sequence representations corresponding to the plurality of sequences.
21. The method according to claim 18 or 19, wherein said performing natural language understanding processing on said target sentence according to a first sequence representation corresponding to one or more target vocabulary entries conforming to the semantics of said target sentence comprises:
acquiring a context representation of the target sentence;
fusing the first sequence representation corresponding to each target term conforming to the semantics of the target sentence with the context representation of the target sentence to obtain one or more third sequence representations corresponding to the one or more target terms conforming to the semantics of the target sentence;
and carrying out natural language understanding processing on the target sentence according to the one or more third sequence representations.
22. A method of training a neural network model, the neural network model comprising a first sub-network model, a second sub-network model, and a third sub-network model, the method comprising:
Acquiring first training data, wherein the first training data comprises a training sentence and a plurality of first sequence representations obtained by matching the training sentence against a target dictionary, the first sequence representations are obtained according to a plurality of sequences corresponding to a plurality of target entries, the plurality of target entries are obtained by matching the training sentence against the target dictionary, and the types of the sequences are fixed and finite in number;
training the first sub-network model according to the first training data to obtain a trained first sub-network model;
acquiring second training data, wherein the second training data comprises an output result of the trained first sub-network model and a first sequence representation meeting a preset requirement in the plurality of first sequence representations, and the first sequence representation meeting the preset requirement is a sequence representation corresponding to a first target entry conforming to the training sentence semantics;
training the second sub-network model according to the second training data to obtain a trained second sub-network model;
acquiring third training data, wherein the third training data comprises an output result of the trained second sub-network model and a processing result of natural language understanding processing of the training sentences;
And training the third sub-network model according to the third training data to obtain a trained third sub-network model.
23. The method of claim 22, wherein the first sub-network model is a term matching model for obtaining a plurality of sequence representations matching the target dictionary based on a target sentence input by a user and the target dictionary, each sequence representation in the plurality of sequence representations corresponding to a target term, the target term being a term matching the target sentence with the target dictionary;
the second sub-network model is an entry disambiguation model and is used for determining whether a corresponding target entry of each sequence representation in the sequence representations accords with the semantics of the target sentence according to the target sentence and the sequence representations;
the third sub-network model is a natural language understanding model and is used for carrying out natural language understanding processing on the target sentence according to the sequence representation corresponding to the target entry conforming to the target sentence semantic.
24. The method of claim 22 or 23, wherein each first sequence representation of the plurality of first sequence representations corresponds to a first target term, the first target term being a term of the training sentence that matches the target dictionary.
25. An apparatus for implementing natural language understanding in a man-machine interaction system, comprising:
the acquisition module is used for acquiring a plurality of target entries matched with target sentences in one or more dictionaries, wherein the target sentences are sentences input by a user to the man-machine interaction system;
the processing module is used for acquiring a plurality of sequences of the target sentence according to a plurality of target entries, wherein each sequence in the plurality of sequences corresponds to one target entry, and the types of the sequences are fixed and finite in number;
the processing module is further used for obtaining a plurality of first sequence representations corresponding to the sequences according to the sequences and the target sentences;
the acquisition module is further configured to acquire a degree of attention of a second sequence representation to the first sequence representations other than the second sequence representation among the plurality of first sequence representations, where the second sequence representation is one of the plurality of first sequence representations;
the processing module is further configured to determine, according to the second sequence representation and the attention, whether a target term corresponding to the second sequence representation conforms to a semantic meaning of the target sentence;
the processing module is further configured to perform natural language understanding processing on the target sentence according to one or more first sequence representations corresponding to target terms that conform to the semantics of the target sentence, so as to obtain a processing result.
26. The apparatus of claim 25, wherein each sequence of the plurality of sequences comprises type information of one of the target terms and location information of the target term in the target sentence.
27. The apparatus of claim 25 or 26, wherein the processing module obtains a plurality of first sequence representations corresponding to the plurality of sequences from the plurality of sequences and the target sentence, comprising:
obtaining a low-dimensional representation of each of the plurality of sequences;
acquiring a context representation of the target sentence;
the low-dimensional representation of each of the plurality of sequences and the contextual representation of the target sentence are fused to obtain a plurality of first sequence representations corresponding to the plurality of sequences.
28. The apparatus of claim 25 or 26, wherein the processing module performs natural language understanding processing on the target sentence according to a first sequence representation corresponding to one or more target terms that conform to semantics of the target sentence, comprising:
acquiring a context representation of the target sentence;
fusing the first sequence representation corresponding to each target term conforming to the semantics of the target sentence with the context representation of the target sentence to obtain one or more third sequence representations corresponding to the one or more target terms conforming to the semantics of the target sentence;
And carrying out natural language understanding processing on the target sentence according to the one or more third sequence representations.
29. A training apparatus for a neural network model, the neural network model comprising a first sub-network model, a second sub-network model, and a third sub-network model, the apparatus comprising:
an acquisition module, configured to acquire first training data, where the first training data includes a training sentence and a plurality of first sequence representations obtained by matching the training sentence against a target dictionary, the first sequence representations are obtained from a plurality of sequences corresponding to a plurality of target terms, the plurality of target terms are obtained by matching the training sentence against the target dictionary, and the types of the sequences are fixed and finite in number;
the training module is used for training the first sub-network model according to the first training data so as to obtain a trained first sub-network model;
the obtaining module is further configured to obtain second training data, where the second training data includes an output result of the trained first sub-network model and a first sequence representation meeting a preset requirement in the plurality of first sequence representations, and the first sequence representation meeting the preset requirement is a sequence representation corresponding to a first target term that meets the training sentence semantic;
The training module is further configured to train the second sub-network model according to the second training data, so as to obtain a trained second sub-network model;
the acquisition module is further used for acquiring third training data, wherein the third training data comprises an output result of the trained second sub-network model and a processing result of natural language understanding processing of the training sentences;
the training module is further configured to train the third sub-network model according to the third training data, so as to obtain a trained third sub-network model.
30. The apparatus of claim 29, wherein the first sub-network model is a term matching model for obtaining a plurality of sequence representations matching the target dictionary based on a target sentence entered by a user and the target dictionary, each sequence representation in the plurality of sequence representations corresponding to a target term, the target term being a term matching the target sentence with the target dictionary;
the second sub-network model is an entry disambiguation model and is used for determining whether a corresponding target entry of each sequence representation in the sequence representations accords with the semantics of the target sentence according to the target sentence and the sequence representations;
The third sub-network model is a natural language understanding model and is used for carrying out natural language understanding processing on the target sentence according to the sequence representation corresponding to the target entry conforming to the target sentence semantic.
31. The apparatus of claim 29 or 30, wherein each first sequence representation of the plurality of first sequence representations corresponds to a first target term, the first target term being a term of the training sentence that matches the target dictionary.
32. An apparatus for implementing natural language understanding in a man-machine interaction system, comprising: a processor coupled to the memory;
the memory is used for storing instructions;
the processor is configured to execute instructions stored in the memory to cause the apparatus to implement the method of any one of claims 18 to 21.
33. A model training device, comprising: a processor coupled to the memory;
the memory is used for storing instructions;
the processor is configured to execute instructions stored in the memory to cause the apparatus to implement the method of any one of claims 22 to 24.
34. A computer readable medium comprising instructions which, when run on a processor, cause the processor to implement the method of any one of claims 18 to 21 or 22 to 24.
CN202011565278.5A 2020-05-20 2020-12-25 Method and device for realizing natural language understanding in man-machine interaction system Active CN112632962B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2020104292451 2020-05-20
CN202010429245.1A CN111737972A (en) 2020-05-20 2020-05-20 Method and device for realizing natural language understanding in human-computer interaction system

Publications (2)

Publication Number Publication Date
CN112632962A CN112632962A (en) 2021-04-09
CN112632962B true CN112632962B (en) 2023-11-17

Family

ID=72647474

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010429245.1A Pending CN111737972A (en) 2020-05-20 2020-05-20 Method and device for realizing natural language understanding in human-computer interaction system
CN202011565278.5A Active CN112632962B (en) 2020-05-20 2020-12-25 Method and device for realizing natural language understanding in man-machine interaction system

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202010429245.1A Pending CN111737972A (en) 2020-05-20 2020-05-20 Method and device for realizing natural language understanding in human-computer interaction system

Country Status (1)

Country Link
CN (2) CN111737972A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115394293A (en) * 2022-08-08 2022-11-25 湖北星纪时代科技有限公司 Dialog system and method for implementing a dialog


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106383875A (en) * 2016-09-09 2017-02-08 北京百度网讯科技有限公司 Artificial intelligence-based man-machine interaction method and device
CN109726385A (en) * 2017-10-31 2019-05-07 株式会社Ntt都科摩 Word sense disambiguation method and equipment, meaning of a word extended method and device
CN108932945A (en) * 2018-03-21 2018-12-04 北京猎户星空科技有限公司 A kind of processing method and processing device of phonetic order
CN108664472A (en) * 2018-05-08 2018-10-16 腾讯科技(深圳)有限公司 Natural language processing method, apparatus and its equipment
CN108920497A (en) * 2018-05-23 2018-11-30 北京奇艺世纪科技有限公司 A kind of man-machine interaction method and device
CN109948149A (en) * 2019-02-28 2019-06-28 腾讯科技(深圳)有限公司 A kind of file classification method and device
CN110334344A (en) * 2019-06-13 2019-10-15 腾讯科技(深圳)有限公司 A kind of semanteme intension recognizing method, device, equipment and storage medium
CN110309514A (en) * 2019-07-09 2019-10-08 北京金山数字娱乐科技有限公司 A kind of method for recognizing semantics and device
CN110442697A (en) * 2019-08-06 2019-11-12 上海灵羚科技有限公司 A kind of man-machine interaction method, system, computer equipment and storage medium
CN111046653A (en) * 2019-11-14 2020-04-21 深圳市优必选科技股份有限公司 Sentence recognition method, sentence recognition device and intelligent equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Multi-sense Word Vector Representation and Semantic Similarity Computation; Chen Zhenrui; China Master's Theses Full-text Database, Information Science and Technology (Issue 03); pp. I138-1464 *

Also Published As

Publication number Publication date
CN112632962A (en) 2021-04-09
CN111737972A (en) 2020-10-02

Similar Documents

Publication Publication Date Title
CN112685565B (en) Text classification method based on multi-mode information fusion and related equipment thereof
CN112100349B (en) Multi-round dialogue method and device, electronic equipment and storage medium
CN109657054B (en) Abstract generation method, device, server and storage medium
CN111753060A (en) Information retrieval method, device, equipment and computer readable storage medium
CN112215008A (en) Entity recognition method and device based on semantic understanding, computer equipment and medium
CN110619051A (en) Question and sentence classification method and device, electronic equipment and storage medium
CN104462064A (en) Method and system for prompting content input in information communication of mobile terminals
CN113705299A (en) Video identification method and device and storage medium
US11036996B2 (en) Method and apparatus for determining (raw) video materials for news
CN112052333A (en) Text classification method and device, storage medium and electronic equipment
CN112257452A (en) Emotion recognition model training method, device, equipment and storage medium
CN113806588A (en) Method and device for searching video
CN111639228A (en) Video retrieval method, device, equipment and storage medium
CN111401044A (en) Title generation method and device, terminal equipment and storage medium
CN112632962B (en) Method and device for realizing natural language understanding in man-machine interaction system
WO2022160445A1 (en) Semantic understanding method, apparatus and device, and storage medium
CN112712056A (en) Video semantic analysis method and device, storage medium and electronic equipment
CN111191011B (en) Text label searching and matching method, device, equipment and storage medium
CN110807097A (en) Method and device for analyzing data
CN112199954A (en) Disease entity matching method and device based on voice semantics and computer equipment
CN115129885A (en) Entity chain pointing method, device, equipment and storage medium
CN112364649B (en) Named entity identification method and device, computer equipment and storage medium
CN111222011B (en) Video vector determining method and device
CN114925206A (en) Artificial intelligence body, voice information recognition method, storage medium and program product
CN111753548A (en) Information acquisition method and device, computer storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant