WO2024040831A1

WO2024040831A1 - Natural language processing method and apparatus, electronic device, and storage medium

Info

Publication number: WO2024040831A1
Application number: PCT/CN2022/142456
Authority: WO
Inventors: 李林峰
Original assignee: 湖北星纪魅族科技有限公司
Priority date: 2022-08-26
Filing date: 2022-12-27
Publication date: 2024-02-29
Also published as: CN115409038A

Abstract

A natural language processing method and apparatus, an electronic device, and a storage medium. The natural language processing method comprises: acquiring a task text to be subjected to natural language processing, wherein the task text comprises a plurality of characters; performing feature extraction on the task text by using a shared neural network so as to obtain a shared feature of the task text, wherein the shared feature comprises character features of the plurality of characters and a global relationship among the plurality of characters; and inputting the shared feature into a plurality of functional neural networks to obtain a plurality of processing results respectively output by the plurality of functional neural networks, wherein the plurality of functional neural networks are used for respectively executing a plurality of different natural language processing tasks. According to the natural language processing method, a plurality of neural networks used for natural language processing and having a shared neural network are used to execute a plurality of different natural language processing tasks, such that the parameter scale of a multi-task neural network is reduced, thereby saving computing resources and computing costs.

Description

Natural language processing methods and devices, electronic equipment and storage media

This disclosure claims priority from Chinese Patent Application No. 202211030338.2 submitted on August 26, 2022. The disclosure of the above Chinese patent application is hereby cited in its entirety as part of this disclosure.

Technical field

Embodiments of the present disclosure relate to a natural language processing method and device, electronic equipment and storage media.

Background technique

Natural Language Processing (NLP) is an important direction in the field of computer science and artificial intelligence. It is used to study various theories and methods that can achieve effective communication between humans and computers using natural language. With the continuous development of artificial intelligence technology, natural language processing has begun to be widely used in a variety of application scenarios such as customer service systems, replacing a large number of manual operations.

Deep learning is a major branch of machine learning. Deep learning models, such as convolutional neural networks, recurrent neural networks, etc., are used in natural language processing to continuously learn language features by vectorizing words or sentences to complete natural language classification. , The process of understanding meets the natural language processing requirements of a large number of feature engineering.

Contents of the invention

At least one embodiment of the present disclosure provides a natural language processing method. The natural language processing method includes: obtaining a task text to be processed by the natural language, wherein the task text includes a plurality of characters; using a shared neural network to perform feature extraction on the task text to obtain sharing of the task text Features, wherein the shared features include character features of the multiple characters and global connections between the multiple characters; input the shared features into multiple functional neural networks to obtain the respective functional neural networks. Multiple output processing results, wherein the multiple functional neural networks are used to perform multiple different natural language processing tasks respectively.

For example, in the natural language processing method provided by at least one embodiment of the present disclosure, the shared neural network includes an input sub-network, a word embedding sub-network and a feature extraction sub-network, and the shared neural network is used to process the task text Perform feature extraction to obtain the shared features of the task text, including: using the input sub-network to convert the task text into a word index array, where the multiple index values included in the word index array are consistent with the Multiple characters correspond one to one; the word index array is encoded into multiple word vectors using the word embedding sub-network, wherein the multiple word vectors correspond to the multiple characters one-to-one, and the multiple words Each word vector in the vector includes character features of the corresponding character; based on the multiple word vectors, the feature extraction sub-network is used to extract the global connection between the multiple characters to obtain the shared features.

For example, in the natural language processing method provided by at least one embodiment of the present disclosure, the feature extraction sub-network includes a convolutional neural network and a long short-term memory network.

For example, in the natural language processing method provided by at least one embodiment of the present disclosure, the natural language processing task includes a question and answer functional task, and the question and answer functional task is used to parse the questions in the task text, and provide the Answers corresponding to the questions, the plurality of functional neural networks include a first functional neural network, the first functional neural network is used to perform the question and answer functional task, and the shared features are input into the multiple functional neural networks. network, obtaining the plurality of processing results respectively output by the plurality of functional neural networks, including: using the first functional neural network to perform a first process on the shared features to obtain a sentence vector, wherein the sentence vector including the category information of the question in the task text; comparing the sentence vector with multiple knowledge information vectors pre-stored in the database to calculate the vector distance between the multiple knowledge information vectors and the sentence vector The answer corresponding to the smallest knowledge information vector is used as the processing result corresponding to the first functional neural network.

For example, in the natural language processing method provided by at least one embodiment of the present disclosure, the first processing includes convolution processing, pooling processing, feature fusion processing and fully connected processing.

For example, in the natural language processing method provided by at least one embodiment of the present disclosure, the natural language processing task includes a chat-type functional task, and the chat-type functional task is used to parse purposeless dialogue information in the task text, A system answer corresponding to the purposeless dialogue information is given. The plurality of functional neural networks include a second functional neural network. The second functional neural network is used to perform the chat-type functional task. The functional neural network will The shared features are input into the multiple functional neural networks to obtain the multiple processing results respectively output by the multiple functional neural networks, including: using the second functional neural network to perform a second process on the shared features to obtain A sentence is output as a processing result corresponding to the second functional neural network, and the processing result corresponding to the second functional neural network is used as a system answer corresponding to the task text.

For example, in the natural language processing method provided by at least one embodiment of the present disclosure, the second functional neural network includes a coding sub-network and a decoding sub-network, and the second functional neural network is used to perform all operations on the shared features. The second process to obtain the output sentence as a processing result corresponding to the second functional neural network includes: using the encoding subnetwork to encode the shared features to obtain an intermediate index array; using the decoding The sub-network decodes the intermediate index array to obtain the output sentence as a processing result corresponding to the second functional neural network.

For example, in the natural language processing method provided by at least one embodiment of the present disclosure, the natural language processing task includes a task-type functional task, and the task-type functional task is used to parse the task purpose information and task key in the task text. Word information, obtain system questioning or question and answer results according to the task purpose information and the task keyword information, the plurality of functional neural networks include a third functional neural network, and the third functional neural network is used to perform the task Type functional task, inputting the shared features into the plurality of functional neural networks to obtain the plurality of processing results respectively output by the plurality of functional neural networks includes: using the third functional neural network to Perform a third process on the shared features to obtain intent features and at least one named entity corresponding to the task text, where the intent features include the task purpose information in the task text, and the at least one named entity includes The task keyword information; perform dialogue management on the intent feature and the at least one named entity, and obtain the system questioning or the question and answer result as a processing result corresponding to the third functional neural network.

For example, in the natural language processing method provided by at least one embodiment of the present disclosure, the third functional neural network includes an intent recognition sub-network and a named entity recognition sub-network, and the third functional neural network is used to analyze the shared Perform the third processing on the features to obtain the intent features and the at least one named entity corresponding to the task text, including: using the intent recognition sub-network to perform intent recognition based on the shared features to obtain the intent features corresponding to the task text. The intended features of the task text; using the named entity recognition sub-network, perform named entity recognition based on the shared features to obtain the at least one named entity corresponding to the task text.

For example, in the natural language processing method provided by at least one embodiment of the present disclosure, obtaining the task text to be subjected to the natural language processing includes: obtaining the speech segment to be subjected to the natural language processing. ; Convert the voice clip into text form to obtain the task text.

For example, the natural language processing method provided by at least one embodiment of the present disclosure further includes selecting one processing result from the plurality of processing results as the output result of the natural language processing through arbitration selection.

For example, the natural language processing method provided by at least one embodiment of the present disclosure, before obtaining the task text corresponding to the natural language, further includes: obtaining a training text; based on the training text, multiple functional neural networks to be trained Training is performed to obtain the plurality of trained functional neural networks, where the number of the plurality of functional neural networks is N, and N is an integer greater than 1, wherein in the process of training the N functional neural networks to be trained , the N functional neural networks are trained simultaneously, and the weighted sum of the M intermediate loss values corresponding to the N functional neural networks is calculated as the loss value to update the parameters of the N functional neural networks, and the M The intermediate loss values correspond to M weights respectively. The M weights are dynamically adjusted according to the output accuracy of the N functional neural networks. M is an integer greater than or equal to N.

For example, in the natural language processing method provided by at least one embodiment of the present disclosure, the N functional neural networks include a first functional neural network, a second functional neural network, and a third functional neural network, and the third functional neural network It includes an intent recognition sub-network and a named entity recognition sub-network, and training the multiple functional neural networks to be trained based on the training text includes: using the shared neural network to be trained to characterize the training text. Extract and obtain the training shared features of the training text; use the N functional neural networks to process the training shared features respectively, and obtain M sets of first intermediate results respectively output by the N functional neural networks, where, The M sets of first intermediate results include the first intermediate results output by the first functional neural network, the first intermediate results output by the second functional neural network, the first intermediate results output by the intention recognition sub-network, and A first intermediate result output by the named entity recognition sub-network.

For example, in the natural language processing method provided by at least one embodiment of the present disclosure, training the multiple functional neural networks to be trained based on the training text further includes: based on the training text and the M sets of first intermediate results calculate M intermediate loss values corresponding to the N functional neural networks, where the M intermediate loss values include the intermediate loss values corresponding to the first functional neural network, the second functional The intermediate loss value corresponding to the neural network, the intermediate loss value corresponding to the intention recognition sub-network and the intermediate loss value corresponding to the named entity recognition sub-network; calculate the weighted sum of the M intermediate loss values as the loss value; When the loss value does not satisfy the predetermined convergence condition, the parameters of the shared neural network to be trained and the N functional neural networks are updated based on the loss value.

For example, in the natural language processing method provided by at least one embodiment of the present disclosure, training the multiple functional neural networks to be trained based on the training text also includes: obtaining a test text; using the trained The shared neural network and the trained N functional neural networks process the test text to obtain M sets of second intermediate results; based on the M sets of second intermediate results and the test text, determine the respective corresponding The M output accuracies of the N functional neural networks after training, wherein the M output accuracies include the output accuracy of the first functional neural network, the output accuracy of the second functional neural network degree, the output accuracy of the intention recognition sub-network and the output accuracy of the named entity recognition sub-network; adjust the M weights respectively corresponding to the M intermediate loss values based on the M output accuracies; according to the adjustment The subsequent M weights continue to train the multiple functional neural networks to be trained.

For example, in the natural language processing method provided by at least one embodiment of the present disclosure, adjusting the M weights respectively corresponding to the M intermediate loss values based on the M output accuracies includes: determining the M outputs The weight corresponding to the maximum output accuracy in the accuracy is used as the first weight; keep the first weight unchanged, and increase the other M-1 weights among the M weights except the first weight.

For example, in the natural language processing method provided by at least one embodiment of the present disclosure, increasing M-1 weights other than the first weight among the M weights includes: according to the M-1 The magnitude relationship between the M-1 output accuracies corresponding to the M-1 weights is determined, and the M-1 amplification factors of the M-1 weights are determined, wherein, for any one of the M-1 output accuracies, the output accuracy , in response to the greater the accuracy of any of the outputs, the smaller the amplification factor of the weight corresponding to the accuracy of any of the outputs; adjust the M-1 weights according to the amplification factors of the M-1 weights.

At least one embodiment of the present disclosure also provides a natural language processing device. The natural language processing device includes: an acquisition module configured to acquire a task text to be processed by the natural language, wherein the task text includes a plurality of characters; and an extraction module configured to use a shared neural network to perform the task text on the task text. Feature extraction to obtain shared features of the task text, where the shared features include character features of the multiple characters and global connections between the multiple characters; a processing module configured to input the shared features Multiple functional neural networks are used to obtain multiple processing results respectively output by the multiple functional neural networks, where the multiple functional neural networks are used to perform multiple different natural language processing tasks.

For example, in the natural language processing device provided by at least one embodiment of the present disclosure, the acquisition module is further configured to acquire training text.

For example, the natural language processing device provided by at least one embodiment of the present disclosure further includes a training module configured to, based on the training text, train multiple functional neural networks to be trained to obtain the trained Described multiple functional neural networks, wherein the number of the multiple functional neural networks is N, and N is an integer greater than 1, wherein, in the process of training the N functional neural networks to be trained, the N functional neural networks Neural networks are trained simultaneously, and the weighted sum of M intermediate loss values corresponding to the N functional neural networks is calculated as the loss value to update the parameters of the N functional neural networks, and the M intermediate loss values correspond to M respectively. weights, and the M weights are dynamically adjusted according to the output accuracy of the N functional neural networks, where M is an integer greater than or equal to N.

At least one embodiment of the present disclosure also provides an electronic device. The electronic device includes: a processor; a memory including one or more computer program modules; wherein the one or more computer program modules are stored in the memory and configured to be executed by the processor, the One or more computer program modules are included for implementing the natural language processing method provided by any embodiment of the present disclosure.

At least one embodiment of the present disclosure also provides a storage medium for storing non-transitory computer-readable instructions. When the non-transitory computer-readable instructions are executed by a computer, the natural language provided by any embodiment of the present disclosure can be realized. Approach.

Description of drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly introduced below. Obviously, the drawings in the following description only relate to some embodiments of the present disclosure and do not limit the present disclosure. .

Figure 1 is a schematic diagram of a multi-task neural network natural language processing system;

Figure 2 is an exemplary flow chart of a natural language processing method provided by at least one embodiment of the present disclosure;

Figure 3 is a schematic diagram of an example of a shared neural network and multiple functional neural networks provided by at least one embodiment of the present disclosure;

Figure 4 is an exemplary flow chart of an example of step S120 in Figure 2;

Figure 5 is a schematic diagram of another example of a shared neural network and multiple functional neural networks provided by at least one embodiment of the present disclosure;

Figure 6 is a schematic diagram of an example of a first functional neural network provided by at least one embodiment of the present disclosure;

Figure 7 is a schematic diagram of an example of a second functional neural network provided by at least one embodiment of the present disclosure;

Figure 8 is a schematic diagram of an example of a third functional neural network provided by at least one embodiment of the present disclosure;

Figure 9 is a schematic diagram of the training part of the natural language processing method provided by at least one embodiment of the present disclosure;

Figure 10 is a schematic diagram of an example of step S150 in Figure 9;

Figure 11 is a schematic diagram of another example of step S150 in Figure 9;

Figure 12 is a schematic diagram of an example of a loss function of a shared neural network and multiple functional neural networks provided by at least one embodiment of the present disclosure;

Figure 13 is a schematic block diagram of a natural language processing device provided by at least one embodiment of the present disclosure;

Figure 14 is a schematic block diagram of an electronic device provided by at least one embodiment of the present disclosure;

Figure 15 is a schematic block diagram of another electronic device provided by at least one embodiment of the present disclosure; and

Figure 16 is a schematic diagram of a storage medium provided by at least one embodiment of the present disclosure.

Detailed ways

In order to make the purpose, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below in conjunction with the drawings of the embodiments of the present disclosure. Obviously, the described embodiments are some, but not all, of the embodiments of the present disclosure. Based on the described embodiments of the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative efforts fall within the scope of protection of the present disclosure.

Unless otherwise defined, technical terms or scientific terms used in this disclosure shall have the usual meaning understood by a person with ordinary skill in the art to which this disclosure belongs. "First", "second" and similar words used in this disclosure do not indicate any order, quantity or importance, but are only used to distinguish different components. Likewise, similar words such as "a", "an" or "the" do not indicate a quantitative limitation but rather indicate the presence of at least one. Words such as "include" or "comprising" mean that the elements or things appearing before the word include the elements or things listed after the word and their equivalents, without excluding other elements or things. Words such as "connected" or "connected" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Up", "down", "left", "right", etc. are only used to express relative positional relationships. When the absolute position of the described object changes, the relative positional relationship may also change accordingly.

The present disclosure is described below through several specific embodiments. In order to keep the following description of the embodiments of the present disclosure clear and concise, detailed descriptions of known functions and components may be omitted. When any component of the disclosed embodiments appears in more than one drawing, the component is designated by the same or similar reference number in each drawing.

Natural language processing refers to the technology that uses natural language used in human communication to interact with machines, involving multi-dimensional operations such as speech, grammar, semantics, and pragmatics. Simply put, the basic task of natural language processing is to segment the corpus to be processed based on ontology dictionary, word frequency statistics, contextual semantic analysis and other methods to form word units with the smallest part of speech as the unit and rich in semantics.

Natural language processing is widely used in human-computer interactive communication situations, such as human-computer voice interaction in cars and mobile phone voice assistants. Human-computer voice interaction in cars and mobile phone voice assistants are multi-vertical applications that support open domains. The human-computer dialogue includes encyclopedia-like knowledge questions and answers, purposeless chatting, and conversations to complete a specific task. Interaction, such as controlling vehicles, checking train tickets, etc. For example, according to different types of dialogue, natural language processing tasks can be divided into question-and-answer functional tasks, chat-type functional tasks and task-type functional tasks:

(1) Question and answer functional tasks can handle question and answer based on knowledge base, for example:

User: How many meters high is Mount Everest?

System: Mount Everest is 8,848 meters high.

(2) Chat-type functional tasks can handle purposeless conversations, for example:

User: I'm in a bad mood today.

System: Only by smiling often can you prolong your life.

(3) Task-type functional tasks can handle dialogues with word slots or multiple rounds, for example:

User: Help me find the ticket to Beijing.

System: The following tickets were found. What time do you want to depart?

User: Around 3:30, first-class high-speed train.

System: There are the following trains. Does it depart from Hongqiao Station or Shanghai Station?

User: Shanghai Station.

System: Okay, here are the first-class high-speed rail tickets from Shanghai Station to Beijing around 3:30 today.

For example, since different natural language processing tasks perform different functions, the processing required to complete the corresponding functions is also different. For example, in order to complete the above three different dialogue types, neural network implementations with different structures need to be used. Due to the different structures of neural networks, each neural network independently processes the input of the neural network and outputs the processing result of the neural network.

Figure 1 is a schematic diagram of a multi-task neural network for natural language processing. For example, as shown in Figure 1, the multi-task neural network consists of three different neural networks: question and answer neural network, chat neural network and task neural network. The three neural networks are set up independently and include input layer, NLP respectively. Feature extraction part, multiple hidden layers (such as hidden layer 1-1,..., hidden layer 1-x, etc. in Figure 1) and output layer. The hidden layer can be a suitable neural network structure selected according to actual needs. For example, the weight parameters of each layer of the three neural networks and the specific structure and number of layers of the hidden layer may be different, so that they each complete different functions. For example, question-and-answer neural networks are used to perform question-and-answer functional tasks, chat-type neural networks are used to perform chat-type functional tasks, and task-based neural networks are used to perform task-type functional tasks. For example, the same task text can be input into three neural networks respectively, and the three neural networks perform different inferences on the task text to obtain their own processing results. Finally, arbitration selects the best processing result as the answer to the task text.

For example, for the multi-task neural network in Figure 1, the three neural networks are composed of multiple network layers (such as input layer, hidden layer, etc.). The scale of each neural network is large, which will lead to too many neural network models. Moreover, the size of the model is large, so it takes up too much computing resources; especially on end-side devices with limited resources, setting up and running three neural networks at the same time often leads to the problem of insufficient resources.

For example, as shown in Figure 1, the three neural networks all include an input layer and a feature extraction part. The three network layers have similar functions and are all used to extract features from the input task text; since the three input neural networks are the same The task text, the input layer of the three neural networks and the NLP feature extraction part can share parameter weights. However, since the three neural networks are set up independently, the first three layers that can share parameter weights are also set up separately, resulting in duplication and waste of computing resources.

At least one embodiment of the present disclosure provides a natural language processing method. The natural language processing method includes: obtaining a task text to be processed by natural language, where the task text includes multiple characters; using a shared neural network to extract features from the task text to obtain shared features of the task text, where the shared features include multiple characters. Character features of characters and global connections between multiple characters; input shared features into multiple functional neural networks to obtain multiple processing results output by multiple functional neural networks respectively, where multiple functional neural networks are used to execute respectively Multiple different natural language processing tasks.

Various embodiments of the present disclosure also provide a device, electronic device or storage medium corresponding to executing the above natural language processing method.

In the natural language processing method provided by at least one embodiment of the present disclosure, a shared neural network is used to extract shared features that can be shared between different functional tasks, such as character features of the task text itself, contextual connections between characters, etc. Each shared neural network The network performs different subsequent processing on the shared features to perform different natural language processing tasks, thereby enabling multiple functional neural networks to share the weight parameters of the shared neural network, reducing the size of the neural network parameters and thus saving computing resources. Avoid duplication and waste of computing resources and save computing costs.

Below, at least one embodiment of the present disclosure will be described in detail with reference to the accompanying drawings. It should be noted that the same reference numbers in different figures will be used to refer to the same elements that have been described.

Figure 2 is an exemplary flow chart of a natural language processing method provided by at least one embodiment of the present disclosure.

For example, as shown in FIG. 2 , a natural language processing method provided by at least one embodiment of the present disclosure is used to process multiple different natural language tasks at the same time. For example, the natural language processing method includes the following steps S110 to S130.

Step S110: Obtain the task text to be processed by natural language;

Step S120: Use a shared neural network to extract features from the task text to obtain shared features of the task text;

Step S130: Input the shared features into multiple functional neural networks to obtain multiple processing results respectively output by the multiple functional neural networks.

For example, in step S110, the task text to be subjected to natural language processing is, for example, a string of sentences input by the user during human-computer interaction, that is, the task text includes multiple characters.

For example, the task text can be text in various languages, such as Chinese, English, Japanese, etc. For example, when the task text is Chinese text, the characters are in the form of a single Chinese character, and when the task text is English text, the characters are in the form of a single word. For example, the task text can also include various numbers, etc., and a single number can also be used as a character.

For example, the task text can be a text in a single language, such as a pure Chinese text. For example, the task text can also be a mixed text in multiple languages, such as a mixed Chinese and English text. This disclosure does not place specific restrictions on the form and language of the task text.

For example, in some examples, step S110 may include: obtaining a voice segment to be subjected to natural language processing; and converting the voice segment into text form to obtain the task text. For example, in applications such as human-computer voice interaction, the user's voice fragments are first obtained, and then the voice fragments are converted into text form as the task text. This disclosure places no specific restrictions on the specific conversion method of speech clips into task text.

For example, in step S120, the shared features include character features of multiple characters and global connections between multiple characters. For example, the task text is transformed into multiple feature vectors through feature extraction by the shared neural network. The shared features are included in multiple feature vectors, that is, multiple feature vectors contain both the meaning of each character and the connections between all characters. (i.e. global connection), thus containing effective information of the entire input sentence. Here, the global connection between multiple characters is the contextual connection between multiple characters. The character characteristics reflect the meaning of a single character, and the global connection between multiple characters reflects the meaningful context between multiple characters. Relationships, express the effective message of the sentence.

For example, in step S130, multiple functional neural networks process shared features to obtain multiple processing results. For example, multiple functional neural networks are used to perform multiple different natural language processing tasks respectively. For example, depending on the type of conversation, multiple different natural language processing tasks may include question-and-answer functional tasks, chat-type functional tasks, and task-based functional tasks, or may also include other types generated based on user input text during human-computer interaction. Functional tasks, embodiments of the present disclosure do not limit the types of natural language processing tasks.

FIG. 3 is a schematic diagram of an example of a shared neural network and multiple functional neural networks provided by at least one embodiment of the present disclosure. For example, in some examples, the natural language processing method shown in Figure 2 can be implemented by a shared neural network as shown in Figure 3 and N functional neural networks used to perform N different natural languages respectively. Processing tasks, where N is a positive integer.

For example, as shown in Figure 3, the shared neural network can be used to extract features from the task text in step S120 to obtain the shared features of the task text; N functional neural networks can be used to process the shared feature input in step S130. Output multiple processing results separately.

For example, compared with the input layers and NLP feature extraction parts of the three independent neural networks in Figure 1 being set independently, the natural language processing method provided by at least one embodiment of the present disclosure can share the parameter weights of the shared neural network in Figure 3, sharing For example, the neural network can be formed by merging the input layers and NLP feature extraction parts of three independent neural networks in Figure 1; the shared neural network can also be fused by the network layers that share weights in multiple other independent neural networks. This disclosure The embodiment does not limit this.

For example, as shown in Figure 3, the shared neural network includes an input sub-network S1, a word embedding sub-network S2 and a feature extraction sub-network S3.

For example, the input subnetwork S1 can be implemented as a One-Hot conversion layer, configured to perform one-hot encoding on each character in the task text, convert each character into its corresponding index value, and each index value constitutes a word index. array. Of course, the input subnetwork S1 can also be implemented as other structures, and the index value is not limited to the one-hot code form, as long as each character of the task text can be converted into a unique corresponding index value.

For example, the word embedding subnetwork S2 can convert the word index array into a multi-dimensional word vector to represent the meaning of each word (i.e., character features); the word embedding subnetwork S2 can be implemented as a suitable neural network structure according to actual needs.

For example, the feature extraction subnetwork S3 is configured to extract the global connection between multiple characters of the task text and obtain multiple feature vectors. For example, the feature extraction sub-network S can include a convolutional neural network (CNN) and a long-short memory network (Long-Stort Term Memory, LSTM), or even a larger BERT (Bidirectional Encoder Representation from Transformer) network. It can also be other convolutional, fully connected or larger-scale neural networks. The embodiments of the present disclosure do not limit the network structure of the feature extraction sub-network S3.

Since the shared neural network needs to extract shared features that can be shared by N functional neural networks, the network scale or parameter scale of the shared neural network needs to be set larger than the NLP extraction part that is set independently in Figure 1. For example, features can be added The parameter scale of the extraction sub-network S3 can be increased, for example, by increasing the weight parameters in the feature extraction sub-network S3 by 20%. Of course, you can also choose to increase the specific parameter scale as needed. The embodiments of the present disclosure do not limit this. .

For example, as shown in Figure 3, N functional neural networks (S4, S5,..., S(N+3)) respectively include multiple hidden layers and output layers. The hidden layers can be suitable neural networks selected according to actual needs. Network structure. For example, the network parameters of each network layer of N functional neural networks and the specific structure and number of hidden layers can be set as needed to perform different natural language processing tasks (for example, including but not limited to question and answer functional tasks, chatting functional tasks). tasks and task-type functional tasks, etc.).

It should be noted that FIG. 3 shows only an example of a shared neural network and multiple functional neural networks used in the natural language processing method proposed by the embodiment of the present disclosure. The specific neural network layers of the multiple functional neural networks are described in this disclosure. There are no restrictions on the number of structures or functional neural networks.

FIG. 4 is an exemplary flowchart of an example of step S120 in FIG. 2 .

For example, as shown in Figure 4, step S120 in the natural language processing method shown in Figure 2 includes the following steps S121 to S123.

Step S121: Use the input sub-network to convert the task text into a word index array;

Step S122: Use the word embedding subnetwork to encode the word index array into multiple word vectors;

Step S123: Based on multiple word vectors, use the feature extraction sub-network to extract global connections between multiple characters to obtain shared features.

In step S121, after the task text is input into the input sub-network S1, multiple index values included in the word index array output by the input sub-network S1 correspond one-to-one to multiple characters in the task text. For example, if each character corresponds to an index value, the entire task text is converted into an array composed of index values. For example, a corresponding index value can be encoded in advance for all possible characters that need to be used in the language used in the task text. The index value can be an integer value, representing the index of each character.

For example, in order to make the dimension of the input data of the word embedding sub-network S2 fixed, the length of the array can be a preset fixed value step (for example, set step = 70). If the character length of the task text exceeds step, the length of the task text is step + 1 character and the following part will be truncated and discarded. If the character length of the task text is less than step, the missing part will be filled with specific meaningless characters. For example, the specific meaningless characters here can correspond to a predetermined index value. The index value is different from the index value corresponding to any character. When the index value appears, it means that the corresponding character is a meaningless character.

For example, take the task text "Play a song of Andy Lau's Forgetful Love" as an example. After the task text passes through the input sub-network S1, the length of the output word index array is step, and the first 10 elements in the array are steps in the task text. The index values corresponding to 10 characters, and the remaining step-10 index values are represented by the index values corresponding to the specific meaningless characters mentioned above, thereby converting the user input sentence into a word index array composed of index values.

In step S122, the word embedding sub-network S2 embeds the index value corresponding to each character in the task text and encodes the word index array into multiple word vectors. Multiple word vectors correspond to multiple index values one-to-one, and thus correspond to multiple characters one-to-one; each word vector in the multiple word vectors includes the character characteristics of the corresponding character (such as the meaning of each character itself) and the character The connection between the preceding and following characters (such as the meaning of a word).

For example, the word index array output by the input sub-network S1 is transformed into a multi-dimensional word vector through the word embedding sub-network S2. A multi-dimensional word vector is a multi-dimensional floating-point matrix composed of multiple word vectors, which is used to represent the meaning of each word (ie, character characteristics). For example, if the multi-dimensional word vector is a 32-dimensional (DIM) matrix, then each word is represented by a one-dimensional array (ie, word vector) of 32 elements; when the word index array length step=70, the multi-dimensional word vector is a [ 70,32] matrix, each element in the matrix is a floating point number.

For example, taking the task text "Play a song of Andy Lau's Love Affair" as an example, the task text passes through the input sub-network S1 and then outputs a word index array with a length of step. The word index array passes through the word embedding sub-network S2 and becomes a multi-dimensional word. vector. For example, the first 10 word vectors correspond to 10 characters in the task text. The first 10 word vectors contain the character characteristics of the 10 characters (such as the meaning of each character itself in the task text) and the relationship between each of the 10 characters. The connection between the preceding and following characters (such as the meaning of each word in the task text); the remaining step-10 word vectors are represented by floating point numbers (such as null) corresponding to the above-mentioned specific meaningless characters, thus the words corresponding to the task text are The index array is converted into a multidimensional floating-point matrix consisting of word vectors.

In step S123, based on the multiple word vectors output by the word embedding sub-network S2, the feature extraction sub-network S3 extracts the global connection between multiple characters of the task text from the multiple word vectors, and obtains multiple feature vectors. feature vectors contain shared features. Therefore, multiple feature vectors contain both the meaning of each character and the connections between all characters (i.e., global connections), thus containing effective information of the entire task text.

For example, taking the task text "Play a song of Andy Lau's Forgetful Water" as an example, the task text outputs a multi-dimensional floating-point matrix composed of multiple word vectors after passing through the input sub-network S1 and the word embedding sub-network S2. After the feature extraction sub-network S3, it becomes multiple feature vectors. For example, multiple feature vectors contain both the meaning of each of the 10 characters and the connection between the 10 characters (i.e., the global connection), thereby containing effective information of the entire task text (for example, including the user's request to play a song Intention information, as well as keyword information such as "Yiyi", "Andy Lau", "Wangqingshui").

FIG. 5 is a schematic diagram of another example of a shared neural network and multiple functional neural networks provided by at least one embodiment of the present disclosure. For example, the multiple neural networks for natural language processing shown in FIG. 5 may include the shared neural network shown in FIG. 3 and N functional neural networks. In the example shown in FIG. 5, N=3.

For example, as shown in Figure 5, N functional neural networks include a first functional neural network S4, a second functional neural network S5, and a third functional neural network S6, which are respectively used to perform different natural language processing tasks.

For example, in some examples, the first functional neural network S4 can be used to process question-and-answer type functional tasks, the second functional neural network S5 can be used to process chatting type functional tasks, and the third functional neural network S6 can be used to process task-type functions. Task; the first functional neural network S4, the second functional neural network S5 and the third functional neural network S6 can also be used to perform other types of natural language processing tasks respectively, and the embodiments of the present disclosure do not limit this.

The structure of the three functional neural networks and the specific process of obtaining the processing results will be described in detail below in conjunction with Figure 5.

For example, in some examples, natural language processing tasks include question and answer functional tasks. The question and answer functional tasks deal with question and answer based on knowledge bases. For example, the question and answer functional tasks are used to parse questions in the task text and give answers corresponding to the questions.

For example, as shown in Figure 5, N functional neural networks include a first functional neural network S4. The first functional neural network S4 is used to perform a question-and-answer functional task and output a sentence vector. For example, the sentence vector includes the task text. The category information of the question (for example, general knowledge, science, etc.); after that, the sentence vector needs to be first post-processed to obtain the answer corresponding to the question in the task text.

For example, based on the first functional neural network S4, in some examples, step S130 in Figure 2 may further include: performing a first process on the shared features using the first functional neural network S4 to obtain a sentence vector; comparing the sentence vector with the database The multiple pre-stored knowledge information vectors are compared, and the answer corresponding to the knowledge information vector with the smallest vector distance from the sentence vector among the multiple knowledge information vectors is used as the processing result corresponding to the first functional neural network.

For example, the first processing includes convolution processing, pooling processing, feature fusion processing, and fully connected processing.

For example, the first functional neural network may be a convolutional neural network. FIG. 6 is a schematic diagram of an example of a first functional neural network provided by at least one embodiment of the present disclosure.

For example, as shown in Figure 6, in one example, the first functional neural network S4 may include a convolution layer, a pooling layer, a fusion layer, a fully connected layer, and an output layer, respectively used for the convolution in the above-mentioned first processing. Processing, pooling processing, feature fusion processing and full connection processing, and finally the sentence vector is obtained. The specific structures of the convolution layer, pooling layer, fusion layer, fully connected layer, and output layer can be set as needed, and this disclosure does not place specific restrictions on this.

It should be noted that the first functional neural network can be a convolutional fully connected network as shown in Figure 6 (for example, TextCNN, etc.), or the twin-tower model can also be used as the first functional neural network S4, or it can also be Other network structures can realize the task of extracting sentence vectors. The embodiments of the present disclosure do not limit the structure of the first functional neural network.

For example, a large number of questions and their answers containing various types of encyclopedia knowledge information can be set in advance, and the sentence vectors of the preset questions can be extracted and stored in the database as knowledge information vectors. In the first post-processing process of the sentence vector, the sentence vector is compared with multiple knowledge information vectors pre-stored in the database, for example, the distance between each knowledge information vector and the sentence vector is calculated to find the vector with the sentence vector. The knowledge information vector with the smallest distance, and the answer corresponding to the knowledge information vector can be used as the processing result of the first functional neural network S4, that is, the answer corresponding to the question in the task text, that is, the answer to the question raised by the user.

It should be noted that, regarding the specific process of the first post-processing, other feasible methods can also be used to obtain the answer based on the sentence vector, and this disclosure does not impose specific limitations on this.

For example, when performing a question-and-answer functional task, the first functional neural network outputs a sentence vector corresponding to the task text, and the sentence vector needs to be post-processed to obtain the final processing result as the processing result of the first functional neural network S4.

For example, in other examples, natural language processing tasks include small talk functional tasks, which can handle purposeless conversations. For example, the chat function task is used to parse the purposeless dialogue information in the task text and provide system answers corresponding to the purposeless dialogue information.

For example, as shown in Figure 5, the N functional neural networks include the second functional neural network S5. The second functional neural network S5 is used to perform chatting functional tasks and can directly obtain the output sentence as the system answer corresponding to the task text, without Further post-processing operations.

For example, based on the second functional neural network S5, in some examples, step S130 in Figure 2 may further include: using the second functional neural network S5 to perform a second process on the shared features to obtain an output sentence as a corresponding sentence corresponding to the second function. The processing result of the neural network, and the processing result corresponding to the second function neural network is used as the system answer corresponding to the task text.

For example, the second processing includes encoding processing and decoding processing. For example, the second functional neural network is typically based on a recurrent neural network architecture, including an encoding subnetwork and a decoding subnetwork. FIG. 7 is a schematic diagram of an example of a second functional neural network provided by at least one embodiment of the present disclosure.

For example, as shown in Figure 7, in one example, the encoding subnetwork includes a recurrent network, and the decoding subnetwork includes a recurrent network, a fully connected layer, and a decoding layer. In some examples, the decoding layer in the decoding sub-network generally uses Viterbi decoding, and other decoding methods can also be used; the encoding sub-network and the decoding sub-network can also be other structures that can implement encoding or decoding functions. Embodiments of the present disclosure There are no restrictions on this.

For example, the process of using the second functional neural network S5 to perform the second processing on the shared features may further include: using the encoding subnetwork to encode the shared features to obtain an intermediate index array; using the decoding subnetwork to decode the intermediate index array to obtain the output. Sentences as processing results corresponding to the second functional neural network.

For example, the intermediate index array is the encoding result obtained by encoding the shared features through the encoding subnetwork, and the intermediate index array is then decoded by the decoding subnetwork to obtain the output sentence. The output sentence is directly used as the processing result of the second functional neural network S5, that is, the system answer corresponding to the task text, that is, the response to the purposeless chat conversation sent by the user.

It should be noted that the second functional neural network can be the architecture of a recurrent neural network as shown in Figure 7 (for example, the recurrent network is RNN (Recurrent Neural Network), LSTM or GRU (Gate Recurrent Unit), etc.), or it can be Other network structures that convert shared features into output sentences can be implemented, and embodiments of the present disclosure do not limit this.

For example, in some examples, the natural language processing task includes a task-type functional task, and the task-type functional task can process a dialogue with word slots or a multi-turn dialogue. Task-type functional tasks are used to parse the task purpose information and task keyword information in the task text, and obtain system questioning or question and answer results based on the task purpose information and task keyword information. For example, task-based functional tasks include processing some complex dialogue processes, such as sentences with word slots, multi-turn dialogues that consider context, etc.

For example, in order to perform task-based functional tasks, intent recognition, named entity recognition, dialogue management, etc. need to be completed. For example, as shown in Figure 5, the N functional neural networks may include a third functional neural network S6. The third functional neural network S6 is used to perform task-type functional tasks and output intent features and named entities. The intent features and named entities are respectively Corresponding to the task purpose information and task keyword information in the task text; through the second post-processing operation, the intent features and named entities are post-processed to obtain system questioning or question and answer results corresponding to the task text.

For example, based on the third functional neural network S6, in some examples, step S130 in Figure 2 may further include: using the third functional neural network S6 to perform a third process on the shared features to obtain the intent features corresponding to the task text and at least A named entity; performs dialogue management on the intent feature and at least one named entity, and obtains system questioning or question and answer results as processing results corresponding to the third function neural network.

For example, the third functional neural network S6 includes an intention recognition sub-network S61 and a named entity recognition sub-network S62. Using the third functional neural network S6 to perform third processing on the shared features to obtain the intent features and at least one named entity corresponding to the task text, which may include: using the intent recognition sub-network S61 to perform intent recognition based on the shared features to obtain the intent features corresponding to the task text Intentional features of the text; using the named entity recognition sub-network S62, perform named entity recognition based on the shared features to obtain at least one named entity corresponding to the task text.

For example, the intent feature contains the task purpose information in the task text, that is, the intent feature represents the implicit information that the user wants to express. For example, in some specific examples, the intention of "Play me a song by Andy Lau" is to play a song by a certain singer, and the intention of "What will the weather be like in Shanghai tomorrow" is a weather query.

For example, at least one named entity contains task keyword information (also called named entity information or word slot information), that is, named entity recognition (Named Entity Recognition, NER) is to recognize the named entity information (including Keyword location, type).

For example, in one example, the user wants to book a train ticket. The shared neural network and multiple functional neural networks need to complete the interactive dialogue with the user and finally complete the ticket booking task. For example:

User: Book a train ticket for me.

System: OK, where are you going?

User: Shanghai.

System: Okay, then when will we set off?

User: Let’s do it at 10 a.m. tomorrow.

System: OK, train G0001 was found. It will depart from Beijing South Railway Station to Shanghai Hongqiao Station at 10:15 tomorrow. Do you want to make a reservation?

User: Yes.

System: OK, the ticket has been booked.

For example, in order to realize the above-mentioned task-type functional tasks, it is necessary to analyze the task purpose information and task keyword information in the task text corresponding to the voice fragments uttered by the user. For parsing the task purpose information in the task text, for example, it is necessary to understand the intention contained in each sentence of the user, for example, the intention of the first sentence is "book a train ticket"; for parsing the task keyword information in the task text, for example, it is necessary to extract Keywords (named entities, also called word slots) in each sentence, such as "Shanghai", "10 o'clock", etc.; In addition, based on the obtained task purpose information and task keyword information, the user needs to be asked how to complete the ticket booking Other information required, such as the location and time in the example above.

For example, the task text including the above-mentioned task purpose information and task keyword information is converted into shared features through the shared neural network; the shared features are intention-identified through the intention recognition word network S61, and the intention features are output (for example, the intention features include the user's requirements for ordering train ticket intention information), use the named entity sub-network S62 to perform named entity recognition based on shared features, and obtain one or more named entities (for example, the named entities include keyword information such as "Shanghai" and "10 o'clock"); and then Intent features and named entities undergo second post-processing to obtain system questions (for example, asking about location and time, etc.) or question and answer results (for example, answers to complete ticket bookings) corresponding to the task text.

For example, as shown in Figure 5, dialog management (DM) can be performed on the intent feature and at least one named entity in the second post-processing process. For example, dialogue management is to maintain and update the status information and context required for task-type functional tasks, such as what information needs to be asked in the next sentence, when to end the reply, when to ask questions, etc. For example, in the process of human-computer interaction, dialogue is generated through system questioning, thereby continuously improving communication, obtaining valuable information, and obtaining question and answer results.

FIG. 8 is a schematic diagram of an example of a third functional neural network provided by at least one embodiment of the present disclosure.

For example, as shown in Figure 8, in one example, the intent recognition sub-network can include a convolution layer, a pooling layer, a fusion layer, a fully connected layer, and an activation function layer (such as a SOFTMAX layer), which are respectively used to perform shared feature analysis. Convolution processing, pooling processing, feature fusion processing, fully connected processing and classification processing, etc., to obtain the intended features corresponding to the task text. The named entity recognition sub-network can include a Bidirectional Long-Short Term Memory (Bi-LSTM) layer, a fully connected layer and a decoding layer, which are respectively used for contextual information processing, full connection processing and decoding processing of shared features. wait.

It should be noted that the third function neural network can also be other network structures that can realize the conversion of shared features to intention features and named entities. The intention recognition sub-network and the named entity recognition sub-network can also be capable of realizing intention recognition or named entity recognition. Regarding other structures of functions, embodiments of the present disclosure do not limit the specific structures of the intent recognition sub-network and the named entity recognition sub-network.

For example, the natural language processing method provided by at least one embodiment of the present disclosure further includes: selecting one processing result from multiple processing results as the output result of natural language processing through arbitration selection. For example, a task text will produce multiple processing results after being processed by a shared neural network and multiple functional neural networks; a final output can be selected from multiple processing results through arbitration according to the different natural language processing tasks corresponding to the task text. result. For example, as shown in Figure 5, the processing result corresponding to the first neural network is the answer to the question in the task text, the processing result corresponding to the second neural network is the system answer corresponding to the purposeless dialogue information, and the processing result corresponding to the third neural network is The processing result is the result of system questioning or question and answer.

For example, if the task text is a question posed by the user, that is, the natural language processing task is a question and answer functional task, then the answer corresponding to the question output by the functional neural network used to perform the question and answer functional task is selected as the final output result; if The task text mainly contains purposeless dialogue information, that is, the natural language processing task is a chat-type functional task, then the system answer output by the functional neural network used to perform the chat-type functional task is selected as the final output result; if the task text It mainly contains task purpose information and task keyword information. That is, if the natural language processing task is a task-type functional task, then the system questioning or question and answer result output by the functional neural network used to perform the task-type functional task is selected as the final output result. .

For example, in some examples, the arbitration selection may include the following methods: if contextual information, that is, a multi-round conversation scenario, is detected, then the system questioning or question and answer result is selected as the final output result; if no multi-round conversation scenario is detected, then Based on the static priorities set in advance (such as the priorities of question-and-answer tasks and task-based tasks), the output of the functional neural network corresponding to the high-priority task is selected as the final output result; in addition to the static priorities set in advance, criticism also needs to be considered degree (such as the number of word slots), the confidence level inferred by the model, etc. It should be noted that other implementation methods can be selected for arbitration selection according to actual needs, and the embodiments of the present disclosure do not limit this.

The natural language processing method provided by at least one embodiment of the present disclosure uses multiple neural networks for natural language processing with shared neural networks to perform multiple different natural language processing tasks, reducing the scale of neural network parameters, thereby saving money. Computing resources save computing costs.

For example, the shared neural network and multiple functional neural networks provided by at least one embodiment of the present disclosure are obtained through training in advance. Figure 9 is a schematic diagram of the training part of the natural language processing method provided by at least one embodiment of the present disclosure.

For example, as shown in Figure 9, before step S110 in Figure 2, the natural language processing method further includes the following steps S140 to S150.

Step S140: Obtain training text;

Step S150: Based on the training text, train multiple functional neural networks to be trained to obtain multiple trained functional neural networks.

For example, the number of multiple functional neural networks is N, where N is an integer greater than 1. For example, in the process of training N functional neural networks to be trained, N functional neural networks are trained simultaneously, and the weighted sum of M intermediate loss values corresponding to the N functional neural networks is calculated as the loss value to update the N functional neural networks. parameters. For example, M intermediate loss values correspond to M weights respectively, and the M weights are dynamically adjusted according to the output accuracy of N functional neural networks, where M is an integer greater than or equal to N.

For example, referring to Figure 5, when the natural language processing task includes three tasks, N=3, that is, the N functional neural networks include the first functional neural network, the second functional neural network and the third functional neural network, and The third functional neural network includes an intent recognition sub-network and a named entity recognition sub-network, so M=4.

For example, in the process of training the first functional neural network, the second functional neural network and the third functional neural network to be trained, the three functional neural networks are trained simultaneously. The first functional neural network, the second functional neural network, the intent recognition sub-network and the named entity recognition sub-network respectively correspond to 4 intermediate loss values (i.e. M=4), and the weighted sum of the 4 intermediate loss values is calculated as the loss value to update Parameters of 3 functional neural networks. For example, 4 intermediate loss values correspond to 4 weights respectively, and the 4 weights can be dynamically adjusted according to the output accuracy of the 4 functional neural networks.

FIG. 10 is a schematic diagram of an example of step S150 in FIG. 9 .

For example, as shown in Figure 10, step S150 in Figure 9 may include the following steps S151 to S155.

Step S151: Use the shared neural network to be trained to perform feature extraction on the training text to obtain training shared features of the training text.

For example, regarding the structure of the shared neural network to be trained, please refer to the relevant content in Figure 3, which will not be described again here.

For example, the training text is the task text used in the neural network training process. The training text is converted into multiple training feature vectors through feature extraction of the shared neural networks S1 to S3 to be trained. The training shared features are included in multiple training feature vectors. , the training shared features include the character features of multiple characters in the training text and the global connections between multiple characters.

For example, a large number of task texts and standard processing results corresponding to the task texts can be constructed in advance to train the neural network, and any task text can be selected as the training text.

Step S152: Use N functional neural networks to process the training shared features respectively, and obtain M sets of first intermediate results respectively output by the N functional neural networks.

For example, when the N functional neural networks include the first functional neural network, the second functional neural network, and the third functional neural network, the M group of first intermediate results include the first intermediate results output by the first functional neural network S4, the second The first intermediate result output by the functional neural network S5, the first intermediate result output by the intention recognition sub-network and the first intermediate result output by the named entity recognition sub-network.

For example, for the first functional neural network S4 used to perform question-and-answer functional tasks, the first intermediate result output is a training sentence vector, and the training sentence vector includes category information of questions in the training text; for the first functional neural network S4 used to perform small talk functional tasks The first intermediate result output by the second functional neural network S5 is the training output sentence, and the training output sentence includes the system answer corresponding to the training text; for the third functional neural network S6 used to perform task-type functional tasks, the intent recognition sub-network output The first intermediate result is the training intention feature, and the first intermediate result output by the named entity recognition sub-network is one or more training named entities.

Step S153: Calculate M intermediate loss values corresponding to N functional neural networks based on the training text and M sets of first intermediate results.

For example, based on the training text and M sets of first intermediate results output by N functional neural networks respectively, such as training sentence vectors, training output sentences, training intention features, training named entities, etc., each function is calculated according to the loss function corresponding to each functional neural network. The functional neural network corresponds to the intermediate loss value respectively.

For example, M may be equal to N or not equal to N. For example, when the third functional neural network includes two sub-networks, M=N+1.

Step S154: Calculate the weighted sum of M intermediate loss values as the loss value.

For example, the M intermediate loss values corresponding to N functional neural networks are Loss1, Loss2...LosM respectively. The loss function of the shared neural network and N functional neural networks after training in steps S151 to S154 can be used as the following formula (1) express:

Loss＝k ₁ *Loss1+k ₂ *Loss2+……+k _M *LossM (1)

Among them, k ₁ is the weight of the intermediate loss value Loss1, k ₂ is the weight of the intermediate loss value Loss2,..., k _M is the weight of the intermediate loss value LossM. For example, the initial values of k ₁ , k ₂ ...k _M are all set to 1.

Step S155: When the loss value does not meet the predetermined convergence condition, update the parameters of the shared neural network to be trained and the N functional neural networks based on the loss value.

For example, if the loss value satisfies the predetermined convergence condition, a trained functional neural network will be obtained.

For example, during the training process, the weight of each intermediate loss value when calculating the loss value in step S154 can be dynamically adjusted.

For example, N functional neural networks are trained at the same time during the training process. However, due to the uneven bias of the training data and the differences among the N functional neural networks, it is impossible for the N functional neural networks to converge at the same time, and it is even difficult for some networks to converge.

For example, in order to speed up the training process, the output accuracy of the shared neural network and N functional neural networks can be measured during the training process (for example, the output accuracy is measured once every 1/10 of the total number of training rounds), based on N The output accuracy of the functional neural network dynamically adjusts the M weights of the loss function.

For example, FIG. 11 is a schematic diagram of another example of step S150 in FIG. 9 . For example, the method shown in Figure 11 is an example of dynamically adjusting the M weights of the loss function by measuring the output accuracy of N functional neural networks.

For example, as shown in Figure 11, step S150 in Figure 9 may also include the following steps S156 to S1510.

Step S156: Obtain the test text;

Step S157: Use the trained shared neural network and the trained N functional neural networks to process the test text to obtain M sets of second intermediate results;

Step S158: Based on M sets of second intermediate results and test text, determine M output accuracies respectively corresponding to the trained N functional neural networks;

Step S159: Adjust the M weights corresponding to the M intermediate loss values based on the M output accuracies;

Step S1510: Continue to train the multiple functional neural networks to be trained according to the adjusted M weights.

For example, step S159 further includes: determining the weight corresponding to the maximum output accuracy among the M output accuracies as the first weight; keeping the first weight unchanged, and increasing the other M-1 weights except the first weight among the M weights. a weight.

For example, taking the shared neural network and multiple functional neural networks shown in Figure 3 or Figure 5 as an example, in step S156, the test text is the task text used in the neural network testing process.

For example, in step S157, the feature extraction of the shared neural networks S1 to S3 after training (for example, after 1/10 of the total number of training rounds) of the test text is converted into multiple test feature vectors, and the test shared features are included in Among multiple test feature vectors, the test shared features include character features of multiple characters in the test text and global connections between multiple characters.

For example, in step S158, based on the test text and M sets of second intermediate results respectively output by the N functional neural networks, M output accuracies P1, P2, ... respectively corresponding to the trained N functional neural networks are determined. ,PM.

For example, in step S159, the M weights k ₁ , k 2 ...k _M respectively corresponding to the M intermediate loss values are adjusted based on the M output accuracies P1, P2 _, ..., PM; in step S1510, based on The adjusted M weights continue to be trained on multiple functional neural networks to be trained.

Figure 12 is a schematic diagram of an example of a loss function of a shared neural network and multiple functional neural networks provided by at least one embodiment of the present disclosure. For example, the shared neural network and multiple functional neural networks in Figure 12 are, for example, the shared neural network and multiple functional neural networks in Figure 3 or Figure 5 .

For example, as shown in Figure 12, the four intermediate loss values include the intermediate loss value Loss1 corresponding to the first functional neural network S4, the intermediate loss value Loss2 corresponding to the second functional neural network S5, and the intermediate loss value corresponding to the intention recognition sub-network S6-1. The loss value Loss3 and the intermediate loss value Loss4 corresponding to the named entity recognition sub-network S6-2.

For example, as shown in Figure 12, the weighted sum of the four intermediate loss values is calculated as the loss value Loss. The loss functions of the shared neural network and the three functional neural networks trained in steps S151 to S154 can be expressed by the following formula (2) :

Loss＝a*Loss1+b*Loss2+c*Loss3+d*Loss4 (2)

Among them, a is the weight of the intermediate loss value Loss1, b is the weight of the intermediate loss value Loss2, c is the weight of the intermediate loss value Loss3, and d is the weight of the intermediate loss value Loss4. For example, the initial values of a, b, c, and d are all set to 1.

For example, in step S155, if the loss value Loss does not meet the predetermined convergence condition, the parameters of the shared neural networks S1 to S3 and the three functional neural networks to be trained are updated based on the loss value Loss.

For example, in order to speed up the training process, the output accuracy of the shared neural network and N functional neural networks can be measured during the training process (for example, the accuracy is measured once every 1/10 of the total training epochs), according to the N functional The output accuracy of the neural network dynamically adjusts the M weights of the loss function.

For example, three functional neural networks separately process the test shared features to obtain four sets of second intermediate results. The four sets of second intermediate results include the second intermediate results output by the first functional neural network S4 and the second intermediate results output by the second functional neural network S5. The second intermediate result, the second intermediate result output by the intent recognition sub-network and the second intermediate result output by the named entity recognition sub-network.

For example, for the first functional neural network S4 used to perform question-and-answer functional tasks, the second intermediate result output is a test sentence vector, and the test sentence vector includes the category information of the questions in the test text; for the first functional neural network S4 used to perform small talk functional tasks The second intermediate result output by the second functional neural network S5 is the test output sentence, and the test output sentence includes the system answer corresponding to the test text; for the third functional neural network S6 used to perform task-type functional tasks, the intent recognition sub-network output The second intermediate result is the test intention feature, and the second intermediate result output by the named entity recognition sub-network is at least one test named entity.

For example, in step S158, based on the test text and 4 sets of second intermediate results (i.e., test sentence vector, test output sentence, test intention feature, and at least one test named entity) respectively output by the three functional neural networks, it is determined that the test text corresponds to 4 output accuracies of 3 functional neural networks after training. For example, for the shared neural network and multiple functional neural networks as shown in Figure 12, the three output accuracies include the output accuracy P1 of the first functional neural network S4, the output accuracy P2 of the second functional neural network S5, and the intention The output accuracy P3 of the recognition sub-network S6-1 and the output accuracy P4 of the named entity recognition sub-network S6-2.

For example, in step S159, the four weights a, b, c, and d respectively corresponding to the four intermediate loss values are adjusted based on the four output accuracies P1, P2, P3, and P4.

For example, sort P1, P2, P3, and P4 from large to small. In some examples, such as P2>P1>P3>P4, that is, the maximum output accuracy among the 4 output accuracy is the second function. For the output accuracy P2 of the neural network S5, the weight b corresponding to the output accuracy P2 is used as the first weight; keep the first weight b unchanged, and increase the other three weights a, except the first weight b among the four weights. c, d.

For example, the three amplification factors α, β, and γ of the three weights a, c, and d can be determined based on the relationship between the three output accuracies corresponding to the three weights a, c, and d, that is, P1>P3>P4. . For example, for any one of the three output accuracy, the greater the response to any output accuracy, the smaller the amplification factor of the weight corresponding to the output accuracy. For example, for P1>P3>P4, α<β<γ can be determined, for example, α=0.5, β=1.0, and γ=1.5 are set.

For example, the above three weights are adjusted according to the amplification factors of the above three weights. For example, the four adjusted weights a’, b’, c’, and d’ can be expressed by the following formulas (3) to (6):

a′＝a*(1+α) (3)

b′＝b (4)

c′=c*(1+β) (5)

d′＝d*(1+γ) (6)

That is, the weight b corresponding to the maximum output accuracy P2 remains unchanged, and the weights a, c, and d corresponding to the other three output accuracy are expanded to 1.5 times, 2 times, and 2.5 times respectively.

It should be noted that the above description of the training process and testing process using the shared neural network and multiple functional neural networks shown in Figure 3, Figure 5 or Figure 12 is only an example (i.e. N=3, M=4 ), other corresponding N and M can also be selected according to actual needs or based on the number of functional neural networks, and the embodiments of the present disclosure do not limit this.

The natural language processing method provided by at least one embodiment of the present disclosure can accelerate the convergence of the neural network model during the training process by dynamically adjusting the weights of each jointly trained neural network during the training process, thereby reducing the training time.

Figure 13 is a schematic block diagram of a natural language processing device provided by at least one embodiment of the present disclosure.

For example, at least one embodiment of the present disclosure provides a natural language processing device. As shown in FIG. 13 , the natural language processing device 300 includes an acquisition module 310 , an extraction module 320 , a processing module 330 and a training module 340 .

For example, the acquisition module 310 is configured to acquire a task text to be subjected to natural language processing, and the task text includes a plurality of characters; that is, the acquisition module 310 can be configured to perform, for example, step S110 shown in FIG. 2 .

For example, the extraction module 320 is configured to use a shared neural network to perform feature extraction on the task text to obtain shared features of the task text. The shared features include character features of multiple characters and global connections between multiple characters; that is, the extraction module 320 It may be configured to perform, for example, step S120 shown in FIG. 2 .

For example, the processing module 330 is configured to input shared features into multiple functional neural networks to obtain multiple processing results respectively output by the multiple functional neural networks. The multiple functional neural networks are used to perform multiple different natural language processing tasks; that is, the multiple functional neural networks are used to perform multiple different natural language processing tasks; The processing module 330 may be configured to perform, for example, steps S30 to S50 shown in FIG. 6 .

For example, in the process of training a shared neural network and multiple functional neural networks, the acquisition module 310 is also configured to acquire training text; that is, the acquisition module 310 can also be configured to perform, for example, step S140 shown in Figure 9.

For example, the training module 340 is configured to train multiple functional neural networks to be trained based on the training text to obtain multiple trained functional neural networks. The number of multiple functional neural networks is N, where N is greater than 1. integer; that is, the measurement module 340 may be configured to perform, for example, step S150 shown in FIG. 9 .

Since the details of the content involved in the operation of the above-mentioned natural language processing device 300 have been introduced during the description of the natural language processing method shown in FIG. 2 and FIG. 9 , for the sake of brevity, they will not be described again here. For details, please refer to the above description about FIGS. 1 to 12 .

It should be noted that the above-mentioned modules in the natural language processing device 300 shown in FIG. 13 can be respectively configured as software, hardware, firmware or any combination of the above items to perform specific functions. For example, these modules may correspond to dedicated integrated circuits, pure software codes, or modules that combine software and hardware. As an example, the device described with reference to FIG. 13 may be a PC computer, a tablet device, a personal digital assistant, a smartphone, a web application, or other devices capable of executing program instructions, but is not limited thereto.

In addition, although the natural language processing device 300 is described above as being divided into modules for performing corresponding processing respectively, however, it is clear to those skilled in the art that the processing performed by each module can also be performed in the device without any specific module. Divided or there is no clear demarcation between modules. In addition, the natural language processing device 300 described above with reference to FIG. 13 is not limited to including the modules described above, but can also add some other modules (for example, storage module, data processing module, etc.) as needed, or the above modules can also be added. combination.

At least one embodiment of the present disclosure also provides an electronic device, the electronic device includes a processor and a memory; the memory includes one or more computer program modules; the one or more computer program modules are stored in the memory and configured to Executed by a processor, one or more computer program modules include a method for implementing the natural language processing method provided by the embodiments of the present disclosure described above.

FIG. 14 is a schematic block diagram of an electronic device according to at least one embodiment of the present disclosure.

For example, as shown in FIG. 14 , the electronic device 400 includes a processor 410 and a memory 420 . For example, memory 420 is used to store non-transitory computer-readable instructions (eg, one or more computer program modules). The processor 410 is configured to execute non-transitory computer readable instructions. When the non-transitory computer readable instructions are executed by the processor 410, they may perform one or more steps according to the natural language processing method described above. Memory 420 and processor 410 may be interconnected by a bus system and/or other forms of connection mechanisms (not shown).

For example, the processor 410 may be a central processing unit (CPU), a digital signal processor (DSP), or other forms of processing units with data processing capabilities and/or program execution capabilities, such as a field programmable gate array (FPGA), etc.; For example, the central processing unit (CPU) may be of X86 or ARM architecture. The processor 410 may be a general-purpose processor or a special-purpose processor that may control other components in the electronic device 400 to perform desired functions.

For example, memory 420 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache), etc. Non-volatile memory may include, for example, read-only memory (ROM), hard disk, erasable programmable read-only memory (EPROM), portable compact disk read-only memory (CD-ROM), USB memory, flash memory, and the like. One or more computer program modules may be stored on the computer-readable storage medium, and the processor 410 may run the one or more computer program modules to implement various functions of the electronic device 400 . Various application programs and various data, as well as various data used and/or generated by the application programs, etc. can also be stored in the computer-readable storage medium.

It should be noted that in the embodiments of the present disclosure, for the specific functions and technical effects of the electronic device 400, reference can be made to the above description of the natural language processing method provided by at least one embodiment of the present disclosure, which will not be described again here.

FIG. 15 is a schematic block diagram of another electronic device provided by at least one embodiment of the present disclosure.

For example, as shown in FIG. 15 , the electronic device 500 is suitable for implementing the natural language processing method provided by the embodiment of the present disclosure. It should be noted that the electronic device 500 shown in FIG. 15 is only an example, which does not bring any limitations to the functions and scope of use of the embodiments of the present disclosure.

For example, as shown in FIG. 15 , the electronic device 500 may include a processing device (such as a central processing unit, a graphics processor, etc.) 51 , which may include a natural language processing device according to any embodiment of the present disclosure, and may Various appropriate actions and processes are performed based on the program stored in the read-only memory (ROM) 52 or loaded from the storage device 48 into the random access memory (RAM) 53 . In the RAM 53, various programs and data required for the operation of the temperature error detection device 500 are also stored. The processing device 51, the ROM 52 and the RAM 53 are connected to each other via a bus 54. An input/output (I/O) interface 55 is also connected to bus 54 . Generally, the following devices may be connected to the I/O interface 55: input devices 56 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; including, for example, a liquid crystal display (LCD), speakers, vibration An output device 57 such as a computer; a storage device 58 including a magnetic tape, a hard disk, etc.; and a communication device 59. The communication device 59 may allow the temperature error detection device 500 to communicate wirelessly or wiredly with other electronic devices to exchange data.

Although FIG. 15 illustrates electronic device 500 having various means, it should be understood that implementation or provision of all illustrated means is not required and electronic device 500 may alternatively implement or be provided with more or fewer means.

For detailed description and technical effects of the electronic device 500, please refer to the relevant description of the natural language processing method above, which will not be described again here.

For example, as shown in FIG. 16, storage medium 600 is used to store non-transitory computer-readable instructions 610. For example, the non-transitory computer readable instructions 610, when executed by a computer, may perform one or more steps in the natural language processing method described above.

For example, the storage medium 600 can be applied to the above-mentioned electronic device 400. For example, the storage medium 600 may be the memory 420 in the electronic device 400 shown in FIG. 15 . For example, the relevant description of the storage medium 600 may refer to the corresponding description of the memory 420 in the electronic device 400 shown in FIG. 15 , which will not be described again here.

Regarding this disclosure, the following points need to be explained:

(1) In the drawings of the embodiments of the present disclosure, only the structures related to the embodiments of the present disclosure are involved, and other structures may refer to common designs.

(2) Features in the same embodiment and different embodiments of the present disclosure can be combined with each other without conflict.

The above are only specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any person familiar with the technical field can easily think of changes or substitutions within the technical scope disclosed in the present disclosure, and all of them should be covered. within the scope of this disclosure. Therefore, the protection scope of the present disclosure should be subject to the protection scope of the claims.

Claims

A natural language processing method that includes:

Obtain the task text to be performed on the natural language processing, wherein the task text includes a plurality of characters;

Use a shared neural network to perform feature extraction on the task text to obtain shared features of the task text, where the shared features include character features of the multiple characters and global connections between the multiple characters;

The shared features are input into multiple functional neural networks to obtain multiple processing results respectively output by the multiple functional neural networks, where the multiple functional neural networks are used to perform multiple different natural language processing tasks respectively.
The natural language processing method according to claim 1, wherein the shared neural network includes an input sub-network, a word embedding sub-network and a feature extraction sub-network,

The use of the shared neural network to perform feature extraction on the task text to obtain the shared features of the task text includes:

The input sub-network is used to convert the task text into a word index array, wherein the plurality of index values included in the word index array correspond to the plurality of characters one-to-one;

The word index array is encoded into a plurality of word vectors using the word embedding sub-network, wherein the plurality of word vectors correspond to the plurality of characters one-to-one, and each word vector in the plurality of word vectors Includes character characteristics of corresponding characters;

Based on the plurality of word vectors, the feature extraction sub-network is used to extract the global connection between the plurality of characters to obtain the shared features.
The natural language processing method according to claim 2, wherein the feature extraction sub-network includes a convolutional neural network and a long short-term memory network.
The natural language processing method according to any one of claims 1 to 3, wherein the natural language processing task includes a question and answer type functional task, and the question and answer type functional task is used to parse questions in the task text, giving Provide the corresponding answer to the question,

The plurality of functional neural networks include a first functional neural network, the first functional neural network is used to perform the question and answer functional task,

The step of inputting the shared features into the multiple functional neural networks to obtain the multiple processing results respectively output by the multiple functional neural networks includes:

Using the first functional neural network to perform first processing on the shared features to obtain a sentence vector, wherein the sentence vector includes category information of the question in the task text;

The sentence vector is compared with multiple knowledge information vectors pre-stored in the database, so that the answer corresponding to the knowledge information vector with the smallest vector distance from the sentence vector among the multiple knowledge information vectors is used as the answer corresponding to the The processing result of the first functional neural network.
The natural language processing method according to claim 4, wherein the first processing includes convolution processing, pooling processing, feature fusion processing and fully connected processing.
The natural language processing method according to any one of claims 1 to 5, wherein the natural language processing task includes a chat-type functional task, and the chat-type functional task is used to parse purposelessness in the task text. Dialogue information, giving the system answer corresponding to the purposeless dialogue information,

The plurality of functional neural networks include a second functional neural network, the second functional neural network is used to perform the chat-type functional task,

The step of inputting the shared features into the multiple functional neural networks to obtain the multiple processing results respectively output by the multiple functional neural networks includes:

The second functional neural network is used to perform a second process on the shared features to obtain an output sentence as a processing result corresponding to the second functional neural network, and the processing result corresponding to the second functional neural network is As the system answer corresponding to the task text.
The natural language processing method according to claim 6, wherein the second functional neural network includes an encoding sub-network and a decoding sub-network,

Using the second functional neural network to perform the second processing on the shared features to obtain the output sentence as a processing result corresponding to the second functional neural network includes:

Using the encoding sub-network to encode the shared features to obtain an intermediate index array;

The decoding sub-network is used to decode the intermediate index array to obtain the output sentence as a processing result corresponding to the second functional neural network.
The natural language processing method according to any one of claims 1 to 7, wherein the natural language processing task includes a task-type functional task, and the task-type functional task is used to parse task purpose information in the task text. and task keyword information, and obtain system questioning or question and answer results based on the task purpose information and the task keyword information,

The plurality of functional neural networks include a third functional neural network, the third functional neural network is used to perform the task-type functional task,

The step of inputting the shared features into the multiple functional neural networks to obtain the multiple processing results respectively output by the multiple functional neural networks includes:

Using the third functional neural network to perform third processing on the shared features, obtain intent features and at least one named entity corresponding to the task text, wherein the intent features include the task in the task text Purpose information, the at least one named entity contains the task keyword information;

Perform dialogue management on the intent feature and the at least one named entity, and obtain the system questioning or the question and answer result as a processing result corresponding to the third functional neural network.
The natural language processing method according to claim 8, wherein the third functional neural network includes an intent recognition sub-network and a named entity recognition sub-network,

Using the third functional neural network to perform the third processing on the shared features to obtain the intended features and the at least one named entity corresponding to the task text includes:

Utilize the intent recognition sub-network to perform intent recognition based on the shared features to obtain the intent features corresponding to the task text;

The named entity recognition sub-network is used to perform named entity recognition based on the shared characteristics to obtain the at least one named entity corresponding to the task text.
The natural language processing method according to any one of claims 1 to 9, wherein the obtaining the task text to be subjected to the natural language processing includes:

Obtain the speech segments to be subjected to the natural language processing;

Convert the voice clip into text form to obtain the task text.
The natural language processing method according to any one of claims 1-10, further comprising:

Select one processing result from the plurality of processing results as the output result of the natural language processing through arbitration selection.
The natural language processing method according to any one of claims 1-11, before obtaining the task text corresponding to the natural language, further includes:

Get training text;

Based on the training text, multiple functional neural networks to be trained are trained to obtain the multiple trained functional neural networks, where the number of the multiple functional neural networks is N, and N is an integer greater than 1. ,

In the process of training the N functional neural networks to be trained, the N functional neural networks are trained simultaneously, and the weighted sum of the M intermediate loss values corresponding to the N functional neural networks is calculated as the loss value to update the The parameters of the N functional neural networks, the M intermediate loss values respectively correspond to M weights, the M weights are dynamically adjusted according to the output accuracy of the N functional neural networks, M is an integer greater than or equal to N .
The natural language processing method according to claim 12, wherein the N functional neural networks include a first functional neural network, a second functional neural network and a third functional neural network, and the third functional neural network includes intention recognition subnetworks and named entity recognition subnetworks,

The training of the multiple functional neural networks to be trained based on the training text includes:

Use the shared neural network to be trained to perform feature extraction on the training text to obtain the training shared features of the training text;

The training shared features are separately processed using the N functional neural networks to obtain M sets of first intermediate results respectively output by the N functional neural networks, wherein the M sets of first intermediate results include the The first intermediate result output by a functional neural network, the first intermediate result output by the second functional neural network, the first intermediate result output by the intention recognition sub-network and the first intermediate result output by the named entity recognition sub-network result.
The natural language processing method according to claim 13, wherein said training the plurality of functional neural networks to be trained based on the training text further includes:

Calculate M intermediate loss values corresponding to the N functional neural networks based on the training text and the M sets of first intermediate results, where the M intermediate loss values include intermediate loss values corresponding to the first functional neural network The loss value, the intermediate loss value corresponding to the second functional neural network, the intermediate loss value corresponding to the intention recognition sub-network, and the intermediate loss value corresponding to the named entity recognition sub-network;

Calculate the weighted sum of the M intermediate loss values as the loss value;

When the loss value does not satisfy the predetermined convergence condition, the parameters of the shared neural network to be trained and the N functional neural networks are updated based on the loss value.
The natural language processing method according to any one of claims 13 or 14, wherein said training the plurality of functional neural networks to be trained based on the training text further includes:

Get test text;

Use the trained shared neural network and the trained N functional neural networks to process the test text to obtain M sets of second intermediate results;

Based on the M sets of second intermediate results and the test text, M output accuracies respectively corresponding to the trained N functional neural networks are determined, wherein the M output accuracies include the first The output accuracy of the functional neural network, the output accuracy of the second functional neural network, the output accuracy of the intent recognition sub-network, and the output accuracy of the named entity recognition sub-network;

Adjust the M weights respectively corresponding to the M intermediate loss values based on the M output accuracies;

Continue to train the multiple functional neural networks to be trained according to the adjusted M weights.
The natural language processing method according to claim 15, wherein the adjusting the M weights respectively corresponding to the M intermediate loss values based on the M output accuracies includes:

Determine the weight corresponding to the maximum output accuracy among the M output accuracy as the first weight;

Keep the first weight unchanged, and increase the other M-1 weights among the M weights except the first weight.
The natural language processing method according to claim 16, wherein said increasing M-1 weights other than the first weight among the M weights includes:

According to the size relationship of the M-1 output accuracies corresponding to the M-1 weights, M-1 amplification factors of the M-1 weights are determined, wherein, for the M-1 output accuracies, In response to any output accuracy, the greater the accuracy of any output, the smaller the amplification factor of the weight corresponding to any output accuracy;

The M-1 weights are adjusted according to the amplification factors of the M-1 weights.
An electronic device including:

processor;

memory, including one or more computer program modules,

Wherein, the one or more computer program modules are stored in the memory and configured to be executed by the processor, and the one or more computer program modules are used to implement any one of claims 1-17 natural language processing methods.
A storage medium used to store non-transitory computer-readable instructions. When the non-transitory computer-readable instructions are executed by a computer, the natural language processing method described in any one of claims 1-17 can be implemented.