CN113887237A - Slot position prediction method and device for multi-intention text and computer equipment - Google Patents

Slot position prediction method and device for multi-intention text and computer equipment

Info

Publication number
CN113887237A
CN113887237A
Authority
CN
China
Prior art keywords
neural network
network model
text
intention
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111150339.6A
Other languages
Chinese (zh)
Inventor
郭永亮 (Guo Yongliang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Puhui Enterprise Management Co Ltd
Original Assignee
Ping An Puhui Enterprise Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Puhui Enterprise Management Co Ltd filed Critical Ping An Puhui Enterprise Management Co Ltd
Priority to CN202111150339.6A priority Critical patent/CN113887237A/en
Publication of CN113887237A publication Critical patent/CN113887237A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the application belong to the field of artificial intelligence and relate to a slot position prediction method for multi-intention text, which comprises the following steps: acquiring a text and preprocessing it to obtain a text sequence; inputting the text sequence into a first neural network model to generate a word embedding vector sequence corresponding to the text sequence; identifying, by a second neural network model, a set of intentions for the text sequence from the word embedding vector sequence; and constructing a third neural network model based on the intention set, establishing the connection between the intention set and the slot positions through the third neural network model, and predicting the slot position sequence of the text. The application also provides a slot position prediction apparatus for multi-intention text, a computer device, and a storage medium. The application further relates to blockchain technology: the text information can be stored in a blockchain. Applied to language understanding in an intelligent dialogue system, the method and apparatus improve the accuracy of the system's language understanding.

Description

Slot position prediction method and device for multi-intention text and computer equipment
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a slot position prediction method and apparatus for a multi-intent text, a computer device, and a storage medium.
Background
Against the background of economic globalization, the rapid development of artificial intelligence technology has greatly advanced intelligent dialogue systems. Dialogue systems are widely applied across industries: a large number of systems such as intelligent customer service, intelligent sales, and intelligent assistants have been developed, and their application effectively improves production efficiency and makes people's lives more convenient. In most existing dialogue system design frameworks, language understanding is a crucial component and consists of two parts: intention recognition, and slot position prediction and filling. Slot positions hold the key information of the recognized text intention and must be understood in depth to complete the user's instruction. For example, if a user wants to book an airplane ticket, much necessary information must be known; of this information, the most core and critical content (e.g., time, destination) can be designed into slot positions for specific identification, i.e., the slot content expressed by the user is identified as structured information such as time and geographic location.
Most existing language understanding methods are designed for a single intention, but in actual conversations users also express multiple intentions, which single-intention language understanding methods cannot handle effectively. Existing multi-intention language understanding methods mainly recognize intentions and do not predict or fill slot positions, so they clearly cannot fully meet the needs of language understanding. It is therefore necessary to establish the connection between multiple intentions and slot positions and to complete the slot prediction and filling function, solving the problem of low language understanding accuracy in existing intelligent dialogue systems.
Disclosure of Invention
The embodiment of the application aims to provide a slot position prediction method, a slot position prediction device, computer equipment and a storage medium for a multi-intention text, and mainly aims to dynamically establish the relation between multi-intention and slot positions and accurately predict the slot positions, so that the accuracy of language understanding in an intelligent dialogue system is improved.
In order to solve the above technical problem, an embodiment of the present application provides a slot position prediction method for a multi-purpose text, which adopts the following technical solutions:
acquiring a text and preprocessing the text to obtain a text sequence;
inputting the text sequence into a first neural network model, and generating a word embedding vector sequence corresponding to the text sequence;
identifying, by a second neural network model, a set of intentions for the text sequence from the word embedding vector sequence;
and constructing a third neural network model based on the intention set, establishing the connection between the intention set and the slot positions through the third neural network model, and predicting the slot position sequence of the text.
Further, the building a third neural network model based on the intention set includes the steps of:
determining the intention node according to the intents in the intention set;
determining the number of layers according to task needs, and setting a state node on each layer to obtain state nodes of each layer;
and connecting the intention nodes with each other, connecting each intention node with each layer of state nodes, and connecting the state nodes of each layer successively to form the third neural network model.
Further, the step of establishing a connection between the intention set and the slot through the third neural network model and predicting the slot sequence of the text specifically includes:
initializing an intent node and a first level state node of the third neural network model;
and performing connection calculation on the initialized intention node and the first-layer state node, performing connection calculation on the calculation result and the next-layer state node until the last layer of the third neural network model is reached, and taking the output of the last layer as the predicted slot position sequence.
Further, the initializing the intention nodes and the first level state nodes of the third neural network model comprises:
acquiring all intention nodes of the third neural network model and coding the intention nodes into intention vectors;
and inputting the word embedding vector sequence into a fourth neural network model, and taking an output vector of the fourth neural network model as a first-layer state node of the third neural network model.
Further, the first neural network model comprises a pre-trained BERT model, and the step of inputting the text sequence into the first neural network model further comprises pre-training the BERT model:
marking and randomly masking the acquired text sequence to obtain training data;
and inputting the training data into the constructed BERT model for training, and performing fine adjustment to obtain the pre-trained BERT model.
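The marking-and-random-masking step above can be sketched as follows. The 15% masking rate mirrors the original BERT recipe; the function names, the `-` ignore label, and the special-token handling are illustrative assumptions:

```python
import random

# Illustrative sketch of masked-LM data preparation: randomly replace a
# fraction of tokens with [MASK] and keep the originals as prediction labels.

def mask_tokens(tokens, mask_rate=0.15, rng=None):
    """Produce (masked_input, labels) pairs as pre-training data."""
    rng = rng or random.Random(0)   # seeded for reproducibility
    inputs, labels = [], []
    for tok in tokens:
        if tok not in ("[CLS]", "[SEP]") and rng.random() < mask_rate:
            inputs.append("[MASK]")
            labels.append(tok)      # the model must predict this token
        else:
            inputs.append(tok)
            labels.append("-")      # position ignored by the loss
    return inputs, labels
```

Real BERT pre-training additionally keeps some selected tokens unchanged or swaps in random tokens; the sketch shows only the basic masking idea.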
Further, the second neural network model comprises a multi-label intention classifier, and before the step of identifying the intention set of the text sequence from the word embedding vector sequence through the second neural network model, the method further comprises:
acquiring text data marked with a slot position label and performing sequence processing to obtain a word embedding vector sequence;
inputting the word embedding vector sequence into a multi-label intention classifier to obtain an intention set comprising a plurality of intentions;
constructing a second neural network model and a third neural network model, and constructing a combined loss function through the loss functions of the second neural network model and the third neural network model;
jointly training the second and third neural network models through the set of intents and the joint loss function.
Further, the constructing the joint loss function through the loss functions of the second neural network model and the third neural network model includes:
assigning weights to the loss functions of the second neural network model and the loss functions of the third neural network model;
multiplying the loss function of the second neural network model and the loss function of the third neural network model by their corresponding weights and adding the results to obtain the joint loss function;
and optimizing the joint loss function through a back-propagation algorithm.
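The weighted combination described above can be sketched minimally as follows. The weight values `alpha` and `beta` and all names are illustrative assumptions; the application does not specify concrete hyperparameters:

```python
# Minimal sketch of the joint loss: the intent-classification loss and the
# slot-prediction loss are each multiplied by a weight and then summed.
# alpha and beta are hypothetical hyperparameters chosen per task.

def joint_loss(intent_loss, slot_loss, alpha=0.5, beta=0.5):
    """Weighted sum of the two sub-model losses (weighting scheme assumed)."""
    return alpha * intent_loss + beta * slot_loss

# During joint training, this combined value is what backpropagation minimizes:
total = joint_loss(intent_loss=0.8, slot_loss=0.4)
print(total)  # 0.6
```

In practice the two terms would be per-batch loss tensors from the second and third models; the scalar version above only shows the combination rule.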
In order to solve the above technical problem, an embodiment of the present application further provides a slot position prediction apparatus for a multi-purpose text, which adopts the following technical solutions:
the slot position prediction device of the multi-intention text comprises:
the preprocessing module is used for acquiring a text and preprocessing the text to obtain a text sequence;
the generating module is used for inputting the text sequence into a first neural network model and generating a word embedding vector sequence corresponding to the text sequence;
an identification module for identifying a set of intentions of the text sequence from the word embedding vector sequence by a second neural network model;
and the prediction module is used for constructing a third neural network model based on the intention set, establishing the connection between the intention set and the slot positions through the third neural network model and predicting the slot position sequence of the text.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
the computer equipment comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps of the slot prediction device method of the multi-intention text when executing the computer program.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the multi-intent text slot prediction apparatus method.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects: acquiring a text and preprocessing the text to obtain a text sequence; inputting the text sequence into a first neural network model, and generating a word embedding vector sequence corresponding to the text sequence; identifying, by a second neural network model, a set of intentions for the text sequence from the word embedding vector sequence; and constructing a third neural network model based on the intention set, establishing the connection between the intention set and the slot positions through the third neural network model, and predicting the slot position sequence of the text. The method comprises the steps that vectorization processing is carried out on a text sequence obtained through preprocessing through a first neural network model, a word embedding vector sequence rich in context semantic information is obtained, a second neural network model is facilitated to recognize an intention set of a more accurate text sequence from the word embedding vector sequence, a third neural network model is further built based on the intention set, the explicit connection between the intention set and a slot position can be dynamically built, finally, a slot position sequence value of the text can be accurately predicted, and the method is applied to an intelligent dialogue system for language understanding, so that the accuracy of language understanding of the intelligent dialogue system is improved.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is an exemplary architecture diagram of an intelligent dialog system in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a slot prediction method for multi-intent text according to the present application;
FIG. 3 is a flow diagram for one embodiment of step 204 of FIG. 2;
FIG. 4 is a schematic structural diagram of one embodiment of a third neural network model in accordance with the present application;
FIG. 5 is a flow diagram of another embodiment of step 204 of FIG. 2;
FIG. 6 is a flowchart of one embodiment of step 2044 of FIG. 5;
FIG. 7 is a schematic structural diagram of one embodiment of a slot prediction apparatus for multi-intent text according to the present application;
FIG. 8 is a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, fig. 1 is an architecture diagram of an intelligent dialog system that may be used in the present application, and the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages, data, etc. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III, mpeg compression standard Audio Layer 3), MP4 players (Moving Picture Experts Group Audio Layer IV, mpeg compression standard Audio Layer 4), laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103. The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), and a big data and artificial intelligence platform.
It should be noted that the slot position prediction method for the multi-intent text provided in the embodiments of the present application may be applied to an intelligent dialog system, and is generally executed by a server/terminal device, and accordingly, the slot position prediction apparatus for the multi-intent text is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, fig. 2 illustrates a flow diagram of one embodiment of a method of slot prediction of multi-intent text according to the present application. The slot position prediction method of the multi-intention text comprises the following steps:
step 201, acquiring a text and preprocessing the text to obtain a text sequence.
In this embodiment, an electronic device (for example, the server/terminal device of the intelligent dialogue system shown in fig. 1) on which the slot position prediction method for the multi-intention text runs may obtain the text information through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra-wideband) connection, and other wireless connection means now known or developed in the future.
The text can be obtained by reading the text information directly input and stored in the system by the intelligent dialogue system 100, or can be obtained from other electronic devices in a file uploading manner by the wired connection manner or the wireless connection manner. Through a wired connection mode or a wireless connection network mode, the text information of a plurality of electronic devices in the system can be remotely and simultaneously acquired, and the data transmission capability of the system is improved.
It should be emphasized that, in order to further ensure the privacy and security of the obtained text information, the text information may also be stored in a node of a blockchain. The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated by cryptographic methods, each data block containing the information of a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The text may be character text (such as Chinese or English) directly comprising one or more sentences, or may be generated by converting speech acquired by the electronic device, and the text may include one or more intentions to be recognized. The text may need to be preprocessed before parsing, e.g. by dividing the sentences in the text, adding start-of-sentence marker information to the text, etc., and converting each word of the text into a token, yielding the token sequence t_1, t_2, …, t_n of the text and thus the text sequence T = {t_1, t_2, …, t_n}. For example, if the acquired text comprises the two sentences "my dog is cute" and "he likes playing", the text can be preprocessed into the text sequence {[CLS], my, dog, is, cute, [SEP], he, likes, playing, [SEP]}.
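The preprocessing described above can be sketched as follows. This is a minimal illustration using a naive whitespace tokenizer; the [CLS]/[SEP] markers follow BERT conventions, and all function and variable names are assumptions, not the application's implementation:

```python
# Hypothetical sketch of step 201: split sentences into tokens and add the
# [CLS] start marker and [SEP] sentence separators to form the text sequence.

def preprocess(sentences):
    """Turn raw sentences into a BERT-style token sequence T = {t1, ..., tn}."""
    tokens = ["[CLS]"]                            # start-of-text marker
    for sentence in sentences:
        tokens.extend(sentence.lower().split())   # naive word tokenizer
        tokens.append("[SEP]")                    # sentence separator
    return tokens

text_sequence = preprocess(["my dog is cute", "he likes playing"])
print(text_sequence)
# ['[CLS]', 'my', 'dog', 'is', 'cute', '[SEP]', 'he', 'likes', 'playing', '[SEP]']
```

A production system would instead use a subword tokenizer matched to the pre-trained BERT vocabulary; the sketch only shows where the markers go.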
Step 202, inputting the text sequence into a first neural network model, and generating a word embedding vector sequence corresponding to the text sequence.
In this embodiment, the preprocessed text sequence is subjected to sequence vectorization by the first neural network model, so as to generate, for the token sequence t_1, t_2, …, t_n of the text sequence, a corresponding word embedding vector sequence s_1, s_2, …, s_n containing context information.
Specifically, the first neural network model includes a pre-trained BERT model, and before the step of inputting the text sequence into the first neural network model, the method further includes:
marking and randomly masking the acquired text sequence to obtain training data;
and inputting the training data into the constructed BERT model for training, and performing fine adjustment to obtain the pre-trained BERT model.
The BERT model is pre-trained on a large amount of linguistic data, so the learned knowledge can be used in specific tasks, such as language understanding in man-machine conversation, to improve task performance. The BERT model also has the advantage of bidirectional depth: its bidirectional design mechanism enables the model to simultaneously consider the text before and after the current token and output a semantic representation of the current token containing context information, allowing the model to capture various complex semantic relationships, generate better semantic expressions, and thus better produce a semantically rich word embedding vector sequence s_1, s_2, …, s_n. For example, from the above text sequence T = {t_1, t_2, …, t_n}, the model can generate a word vector sequence S = {s_1, s_2, …, s_n} carrying key information such as each word's position, the sentence to which the word belongs, and the word encoding matrix, and can therefore provide more useful textual context semantic information for the other models of the intelligent dialogue system, so that the system understands text content more accurately.
The training of the BERT model is mainly divided into two stages: a pre-training stage and a fine-tuning stage. Pre-training task 1: the first pre-training task of BERT is the masked LM, which randomly covers a portion of the words in a sentence and then predicts the covered words using context information, so that the meaning of the words can be better understood from the full text. Pre-training task 2: the second pre-training task of BERT is Next Sentence Prediction (NSP), which mainly allows the model to better understand the relationship between sentences.
The fine-tuning stage is used for fine-tuning the model when it is subsequently applied to downstream tasks, such as text classification, part-of-speech tagging, and question answering.
Step 203, identifying an intention set of the text sequence from the word embedding vector sequence through a second neural network model.
Further, the second neural network model includes a multi-label intention classifier. The word embedding vector sequence S output by the BERT model is input into the multi-label intention classifier, so that an intention set I = {i_1, …, i_m} comprising the multiple intentions of the text can be obtained.
The intention classifier is a multi-label classification neural network model, which can be expressed by formula (1):
y_I = sigmoid(W_i(LeakyReLU(W_s S + b_s)) + b_i)  (1)
wherein W_i, W_s, b_i and b_s are the weights and biases of the model, which are trainable parameters, and LeakyReLU and sigmoid are activation functions. The classified intention label set is y_I = {y_1^I, …, y_{n_I}^I}, wherein n_I is the number of single intentions. From y_I the predicted intention set I = {i_1, …, i_m} can be obtained, where m is the predicted number of intentions; specifically, when the nth intention label value y_n^I in the classified intention label set exceeds a threshold (e.g., 0.5), its corresponding nth intention i_n is placed in set I.
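Formula (1) and the thresholding that follows can be sketched with plain lists so the shapes stay visible. The dimensions, example weights, and the 0.5 threshold are illustrative assumptions; a real implementation would use a tensor library:

```python
import math

# Hedged sketch of the multi-label intent head:
#   y_I = sigmoid(W_i(LeakyReLU(W_s S + b_s)) + b_i)

def leaky_relu(x, slope=0.01):
    return x if x >= 0.0 else slope * x

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def intent_scores(s, W_s, b_s, W_i, b_i):
    """One hidden layer (W_s, b_s) followed by a per-intent sigmoid head."""
    hidden = [leaky_relu(sum(w * x for w, x in zip(row, s)) + b)
              for row, b in zip(W_s, b_s)]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(W_i, b_i)]

def predict_intents(scores, labels, threshold=0.5):
    """Keep every intent whose label value exceeds the threshold."""
    return [lab for y, lab in zip(scores, labels) if y > threshold]

# Toy forward pass with hand-picked weights (two intents, 2-d input):
scores = intent_scores([1.0, 0.0],
                       W_s=[[1.0, 0.0], [0.0, 1.0]], b_s=[0.0, 0.0],
                       W_i=[[2.0, 0.0], [-2.0, 0.0]], b_i=[0.0, 0.0])
intents = predict_intents(scores, ["book_flight", "order_food"])
```

Because the final activation is a per-label sigmoid rather than a softmax, several intent scores can exceed the threshold at once, which is what makes the classifier multi-label.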
And 204, constructing a third neural network model based on the intention set, establishing the connection between the intention set and the slot positions through the third neural network model, and predicting the slot position sequence of the text.
The third neural network model can be dynamically constructed according to the identified intention set: an intention node is formed for each intention in the intention set, the intention nodes are connected with each other pairwise, and the intention nodes are connected with the state of each layer of the model to obtain the third neural network model; the output of the last layer of the model is then used as the predicted slot position value, so that the relation between the intention set and the slot positions is established dynamically.
Further, as shown in fig. 3, the building of the third neural network model based on the intent set includes the following steps:
step 2041, determining the intention node according to the intentions in the intention set;
step 2042, determining the number of layers according to task needs, and setting a state node on each layer to obtain state nodes of each layer;
and 2043, connecting the intention nodes with each other, connecting each intention node with each layer of state nodes, and connecting the state nodes of each layer successively to form the third neural network model.
In the embodiment of the inventionIn step 203, an intention set I ═ { I ═ is identified by the intention classifier1,…,imAfter the step of calculating the number of the intention nodes of the third neural network model, determining the number of the intention nodes of the third neural network model according to the predicted number of the intentions in the intention set I, taking the coded value of each intention as the initial value of each intention node, connecting all the intention nodes pairwise, determining the layer number of the third neural network model according to task requirements, and setting a state node s at each layertRepresenting the state of the corresponding network hierarchy, and connecting all the intention nodes with each state node, connecting the state node of the first layer of the model with the state node of the second layer, connecting the state node of the second layer of the model with the state node of the third layer of the model, and successively connecting until the state node of the last layer of the model, and the models required by different tasks are different, that is, the parameters of the structure and the layer number of the model are different, for example, as shown in fig. 4, fig. 4 shows a schematic structural diagram of an embodiment of a third neural network model according to the present application, in the present application, only the intention number m is 2 and the state node of the model is 3 as an example, but not limited thereto; that is, when the number of intents m is 2 and the state node of the model is 3, it is determined that the third neural network model includes a three-layer network and two intention nodes
i_1^(l) and i_2^(l). Each layer includes a state node s^(l), where l denotes the layer index. The two intention nodes i_1^(l), i_2^(l) and the state node s^(l) are connected with each other pairwise; then the state node s^(1) of the first layer is connected with the state node s^(2) of the second layer, the state node s^(2) of the second layer is connected with the state node s^(3) of the third layer, and so on until the last layer of the model.
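For illustration only, the wiring described above can be sketched as follows (the function and node names are hypothetical; the patent does not prescribe a concrete data structure):

```python
def build_graph(num_intents, num_layers):
    """Build the edge set of the third neural network model:
    intention nodes are connected with each other pairwise, every
    intention node is connected with each layer's state node, and
    state nodes of successive layers are connected in turn."""
    intents = ["i%d" % k for k in range(1, num_intents + 1)]
    states = ["s%d" % l for l in range(1, num_layers + 1)]
    edges = set()
    for a in range(len(intents)):            # intention-intention edges
        for b in range(a + 1, len(intents)):
            edges.add((intents[a], intents[b]))
    for i in intents:                        # intention-state edges
        for s in states:
            edges.add((i, s))
    for l in range(len(states) - 1):         # layer-to-layer state chain
        edges.add((states[l], states[l + 1]))
    return edges

edges = build_graph(2, 3)  # the m = 2, three-layer example of fig. 4
```

With m = 2 and three layers this yields 1 + 6 + 2 = 9 edges, matching the structure shown in fig. 4.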
Further, as shown in fig. 5, the step of establishing a connection between the intent set and the slot through the third neural network model and predicting the slot sequence of the text specifically includes:
step 2044, initializing the intention nodes and first-layer state nodes of the third neural network model;
step 2045, performing connection calculation on the initialized intention node and the first-layer state node, performing connection calculation on the calculation result and the next-layer state node until the last layer of the third neural network model is reached, and taking the output of the last layer as the predicted slot position sequence.
Specifically, the obtained values of the intention nodes of the third neural network model may be vectorized and encoded, and then connected with the initialized first-layer state node for calculation; the calculation result is then connected with the second-layer state node for calculation, and so on until the last layer of the model.
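As a minimal sketch of this layer-by-layer connection calculation (element-wise averaging stands in here for the model's trainable connection operation; all names are illustrative):

```python
def propagate(intent_vecs, state0, num_layers):
    """Mix the current state vector with every intention vector at
    each layer; the result becomes the next layer's state, and the
    last layer's state is used for slot prediction."""
    state = state0
    for _ in range(num_layers):
        mixed = list(state)
        for iv in intent_vecs:
            mixed = [(m + v) / 2.0 for m, v in zip(mixed, iv)]
        state = mixed
    return state
```

For example, `propagate([[1.0, 1.0], [3.0, 3.0]], [0.0, 0.0], 1)` mixes the initial first-layer state with both intention vectors once before the result is passed on.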
Further, as shown in fig. 6, in the step 2044, the step of initializing the intention node and the first-layer state node of the third neural network model includes:
step 20441, acquiring all intention nodes of the third neural network model, and encoding into intention vectors;
step 20442, inputting the word embedding vector sequence to a fourth neural network model, and using the output vector of the fourth neural network model as the first-layer state node of the third neural network model.
In the present embodiment, for a third neural network model with N intention nodes, each layer of the network takes the intention-node representations i_1^(l), …, i_N^(l) as input, performs connection calculation between pairs of intention nodes, and then outputs more abstract intention-node features i_1^(l+1), …, i_N^(l+1). The specific calculation process is shown in the following formula (2):

α_ij^(l) = σ( a^T ( W_h i_i^(l) || W_h i_j^(l) ) )   (2)

where α_ij^(l) represents an intermediate result of the intention inter-node connection calculation, i_i^(l) represents intention node i, || represents the connection operation on intention nodes (namely splicing the node vectors), σ is an activation function, and a and W_h are trainable weight parameters.
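The inter-node connection calculation can be illustrated in plain Python (assuming a sigmoid activation; the actual activation and weight shapes are design choices of the model, and the helper names are hypothetical):

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def pair_score(a, W_h, node_i, node_j):
    """Intermediate result of the intention inter-node connection:
    transform both node vectors with W_h, splice them (the ||
    operation), project with the trainable vector a, and activate."""
    concat = matvec(W_h, node_i) + matvec(W_h, node_j)  # splicing
    z = sum(ai * ci for ai, ci in zip(a, concat))
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation
```

In a trained model, a and W_h are learned; here any fixed values show the mechanics of the pairwise score.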
Firstly, at a certain time t, all intention nodes of the third neural network model are obtained, and the values of all the intention nodes are encoded through a matrix into the corresponding intention vectors i_1^(1), …, i_m^(1), thereby initializing the intention nodes. Then, the word embedding vector sequence obtained in step 202 is input into a fourth neural network model, and the output vector of the fourth neural network model is taken as the state vector s_t^(1) of the first-layer state node of the third neural network model. The fourth neural network model comprises a pre-trained unidirectional LSTM (long short-term memory) network; the word embedding vector sequence is not used directly as the state vector of the first-layer state node, so that the LSTM network can fully extract the context information in the word embedding vector sequence, further improving the accuracy of the model. The initialization input of the third neural network at time t can thus be obtained as follows:
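A single step of the unidirectional LSTM cell mentioned above can be sketched as follows (a toy version with a scalar hidden state and a hypothetical per-gate weight layout, only to show how the final hidden state plays the role of the first-layer state vector):

```python
import math

def lstm_step(x, h, c, W):
    """One unidirectional LSTM step: gate the concatenated input and
    previous hidden state, update the cell state, and emit h_new."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    dot = lambda w, v: sum(a * b for a, b in zip(w, v))
    xh = list(x) + list(h)          # concatenate input and hidden state
    i = sig(dot(W["i"], xh))        # input gate
    f = sig(dot(W["f"], xh))        # forget gate
    o = sig(dot(W["o"], xh))        # output gate
    g = math.tanh(dot(W["g"], xh))  # candidate cell state
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new

def encode(embeddings, W):
    """Run the word embedding sequence through the cell; the final
    hidden state stands in for the first-layer state s_t^(1)."""
    h, c = 0.0, 0.0
    for x in embeddings:
        h, c = lstm_step(x, [h], c, W)
    return h
```

A real implementation would use a vector-valued hidden state and pre-trained weights; the scalar form above keeps the gating arithmetic visible.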
[ i_1^(1), …, i_m^(1), s_t^(1) ]   (3)

Then, using formula (2), the representation of the i-th node at layer l+1 can be calculated as:

i_i^(l+1) = σ( Σ_j α_ij^(l) · W_h i_j^(l) )   (4)

where α_ij^(l) is the intermediate inter-node connection result given by formula (2). The representation of each layer of the third neural network is calculated in turn according to formulas (2), (3) and (4) until the representation s_t^(L) of the last layer of the network is obtained. Finally, the slot sequence value o_t of the corresponding text can be obtained by the following formulas (5) and (6):

y_t = softmax( W_y s_t^(L) )   (5)

o_t = argmax( y_t )   (6)
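Formulas (5) and (6) amount to a softmax projection of the last-layer state followed by an arg max; a sketch (W_y is an assumed projection matrix onto the slot-label space, not named in the original text):

```python
import math

def slot_output(last_state, W_y):
    """Project the last-layer state onto the slot labels, softmax to
    the distribution y_t (formula (5)), and take the arg max as the
    slot value o_t (formula (6))."""
    logits = [sum(w * s for w, s in zip(row, last_state)) for row in W_y]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # numerically stable softmax
    total = sum(exps)
    y_t = [e / total for e in exps]
    o_t = max(range(len(y_t)), key=y_t.__getitem__)
    return y_t, o_t
```

Running this once per text position t yields the predicted slot sequence o_1, …, o_M.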
Further, the second neural network model comprises a multi-label intention classifier, and before the step of identifying the intention set of the text sequence from the word embedding vector sequence through the second neural network model, the method further includes:
acquiring text data marked with a slot position label as a training data set, and performing sequence processing to obtain a word embedding vector sequence;
inputting the word embedding vector sequence into a multi-label intention classifier to obtain an intention set comprising a plurality of intentions;
constructing a second neural network model and a third neural network model, and constructing a combined loss function through the loss functions of the second neural network model and the third neural network model;
jointly training the second and third neural network models through the set of intents and the joint loss function.
The second neural network model is an intention recognition model, and its loss function is:

L1 = − Σ_{k=1}^{n_I} [ ŷ_k log y_k + (1 − ŷ_k) log(1 − y_k) ]   (7)

The third neural network model is a slot position prediction model, and its loss function is:

L2 = − Σ_{t=1}^{M} Σ_{j=1}^{n_s} ŷ_{t,j} log y_{t,j}   (8)

where ŷ represents a label, I is the intention set, k represents a single intention, n_I represents the number of labels of a single intention, n_s represents the number of slot tags, M represents the text length, and y_t is the predicted slot-label distribution at position t, i.e. the distribution whose arg max gives o_t.
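The two losses are standard cross-entropies and can be sketched as follows (variable names are illustrative; the intention loss is a multi-label binary cross-entropy, the slot loss a token-level cross-entropy over the M positions of the text):

```python
import math

def intent_loss(y_true, y_pred):
    """Multi-label binary cross-entropy over the intention labels,
    as in formula (7); y_pred entries must lie strictly in (0, 1)
    for the logarithms to be defined."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, y_pred))

def slot_loss(tags_true, dists_pred):
    """Token-level cross-entropy over the n_s slot tags for each of
    the M text positions, as in formula (8)."""
    return -sum(math.log(dist[tag])
                for tag, dist in zip(tags_true, dists_pred))
```

Both functions return scalars, which is what allows them to be weighted and summed into the joint objective of formula (9).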
Further, the constructing the joint loss function through the loss functions of the second neural network model and the third neural network model includes:
assigning weights to the loss functions of the second neural network model and the loss functions of the third neural network model;
multiplying the loss functions of the second neural network model and the third neural network model by corresponding weights and then adding the multiplied loss functions to obtain a combined loss function;
and optimizing the joint loss function through a back propagation algorithm.
I.e. the joint loss function can be expressed as:
L=γL1+(1-γ)L2 (9)
the model can be trained through a back propagation algorithm to continuously optimize the formula (9), a combined model capable of performing slot prediction and filling based on the multi-intention text is obtained through optimization training, and then the combined model is applied to the intelligent dialogue system to perform language understanding, so that the accuracy rate of the language understanding of the intelligent dialogue system can be improved, and the user experience is improved.
In summary, the first neural network model vectorizes the text sequence obtained by preprocessing to obtain a word embedding vector sequence rich in context semantic information, which helps the second neural network model identify a more accurate intention set of the text sequence from the word embedding vector sequence. The third neural network model is then constructed based on the intention set, so that an explicit connection between the intention set and the slots can be dynamically established; after joint training, the slot sequence value of the text can finally be predicted accurately. Applying the method to an intelligent dialogue system for language understanding improves the accuracy of the system's language understanding.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the computer program is executed. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
With further reference to fig. 7, as an implementation of the slot prediction method for the multi-intent text shown in fig. 2, the present application provides an embodiment of a slot prediction apparatus for a multi-intent text, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied to various electronic devices.
As shown in fig. 7, the slot prediction apparatus 300 for a multi-intent text according to the present embodiment includes: a preprocessing module 301, a generating module 302, a recognition module 303, and a prediction module 304. Wherein:
the preprocessing module 301 is configured to obtain a text and preprocess the text to obtain a text sequence;
a generating module 302, configured to input the text sequence into a first neural network model, and generate a word embedding vector sequence corresponding to the text sequence;
an identifying module 303, configured to identify, through a second neural network model, a set of intentions of the text sequence from the word embedding vector sequence;
and the prediction module 304 is configured to construct a third neural network model based on the intention set, establish a connection between the intention set and the slot through the third neural network model, and predict a slot sequence of the text.
Further, the prediction module 304 comprises:
a determination submodule for determining the intent node from the intents in the intent set;
the setting submodule is used for determining the number of layers according to task needs, and each layer is provided with a state node to obtain state nodes of each layer;
and the connection submodule is used for connecting the intention nodes with each other and connecting each intention node with each layer of state nodes, and the state nodes of each layer are connected successively to form the third neural network model.
Further, the prediction module 304 further includes:
an initialization module to initialize an intent node and a first level state node of the third neural network model;
and the calculation module is used for performing connection calculation on the initialized intention node and the first-layer state node, performing connection calculation on the calculation result and the next-layer state node until the last layer of the third neural network model is reached, and taking the output of the last layer as the predicted slot position sequence.
Further, the initialization module includes:
the acquisition submodule is used for acquiring all intention nodes of the third neural network model and encoding the intention nodes into intention vectors;
and the output sub-module is used for inputting the word embedding vector sequence into a fourth neural network model and taking an output vector of the fourth neural network model as a first-layer state node of the third neural network model.
Further, the second neural network model comprises a multi-label intention classifier, and the prediction apparatus 300 further comprises:
the acquisition module is used for acquiring the text data marked with the slot position label and performing sequence processing to obtain a word embedding vector sequence;
the input module is used for embedding the words into the vector sequence and inputting the words into the multi-label intention classifier to obtain an intention set comprising a plurality of intentions;
the building module is used for building a second neural network model and a third neural network model and building a combined loss function through the loss functions of the second neural network model and the third neural network model;
jointly training the second and third neural network models through the set of intents and the joint loss function.
Further, the building module comprises:
an assigning module for assigning weights to the loss functions of the second neural network model and the third neural network model;
the obtaining module is used for multiplying the loss functions of the second neural network model and the third neural network model by corresponding weights and then adding the multiplied loss functions to obtain a combined loss function;
and the optimization module is used for optimizing the joint loss function through a back propagation algorithm.
The slot position prediction device for multi-intention text provided by this embodiment of the application can realize the implementations in the method embodiments of fig. 2 to 6 and the corresponding beneficial effects. That is, the first neural network model of the generating module 302 vectorizes the text sequence preprocessed by the preprocessing module 301 to obtain a word embedding vector sequence rich in context semantic information, which helps the second neural network model of the recognizing module 303 identify a more accurate intention set of the text sequence from the word embedding vector sequence. The third neural network model of the predicting module 304 is constructed based on the intention set, so that an explicit connection between the intention set and the slots can be dynamically established and the slot sequence values of the text can finally be predicted accurately. Applying the device to an intelligent dialogue system for language understanding improves the accuracy of the system's language understanding.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 8, fig. 8 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 8 comprises a memory 81, a processor 82, and a network interface 83 that are communicatively connected to each other via a system bus. It is noted that only a computer device 8 having components 81-83 is shown, but it should be understood that not all of the shown components need be implemented; more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 81 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Programmable Read-Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 81 may be an internal storage unit of the computer device 8, such as a hard disk or memory of the computer device 8. In other embodiments, the memory 81 may also be an external storage device of the computer device 8, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the computer device 8. Of course, the memory 81 may also comprise both an internal storage unit of the computer device 8 and an external storage device thereof. In this embodiment, the memory 81 is generally used for storing the operating system installed on the computer device 8 and various types of application software, such as the program code of the slot prediction method for multi-intention text. Further, the memory 81 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 82 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 82 is typically used to control the overall operation of the computer device 8. In this embodiment, the processor 82 is configured to execute the program code stored in the memory 81 or process data, for example, execute the program code of the slot prediction method for the multi-intent text.
The network interface 83 may comprise a wireless network interface or a wired network interface, and the network interface 83 is generally used for establishing communication connections between the computer device 8 and other electronic devices.
By running the program of the slot prediction method for multi-intention text stored in the memory 81 on the processor 82, the first neural network model running on the processor 82 vectorizes the text sequence obtained by preprocessing to obtain a word embedding vector sequence rich in context semantic information, which helps the second neural network model running on the processor 82 identify a more accurate intention set of the text sequence from the word embedding vector sequence. The third neural network model running on the processor 82 is constructed based on the intention set, so that an explicit connection between the intention set and the slots can be dynamically established and the slot sequence value of the text can finally be predicted accurately. Applying this to an intelligent dialogue system for language understanding improves the accuracy of the system's language understanding.
The present application provides yet another embodiment, which provides a computer-readable storage medium storing a multi-intended text slot prediction program, which is executable by at least one processor to cause the at least one processor to perform the steps of the multi-intended text slot prediction method as described above. The computer-readable storage medium provided by the application can effectively store the slot prediction program of the multi-intention text, is convenient for the processor to quickly load and execute, and improves the execution efficiency of the slot prediction program of the multi-intention text.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It should be understood that the above-described embodiments are merely some, not all, embodiments of the present application, and that the drawings illustrate preferred embodiments without limiting the scope of the patent. This application may be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of the technical features therein. Any equivalent structure made using the contents of the specification and drawings of the present application, applied directly or indirectly in other related technical fields, falls within the protection scope of the present application.

Claims (10)

1. A slot position prediction method of multi-intention text is characterized by comprising the following steps:
acquiring a text and preprocessing the text to obtain a text sequence;
inputting the text sequence into a first neural network model, and generating a word embedding vector sequence corresponding to the text sequence;
identifying, by a second neural network model, a set of intentions for the text sequence from the word embedding vector sequence;
and constructing a third neural network model based on the intention set, establishing the connection between the intention set and the slot positions through the third neural network model, and predicting the slot position sequence of the text.
2. The method of claim 1, wherein said building a third neural network model based on said set of intents comprises the steps of:
determining an intention node according to the intents in the intention set;
determining the number of layers according to task needs, and setting a state node on each layer to obtain state nodes of each layer;
and connecting the intention nodes with each other, connecting each intention node with each layer of state nodes, and connecting the state nodes of each layer successively to form the third neural network model.
3. The method of claim 2, wherein the establishing, by the third neural network model, the connection between the intent set and the slots, and the predicting the sequence of slots of text specifically comprises:
initializing an intent node and a first level state node of the third neural network model;
and performing connection calculation on the initialized intention node and the first-layer state node, performing connection calculation on the calculation result and the next-layer state node until the last layer of the third neural network model is reached, and taking the output of the last layer as the predicted slot position sequence.
4. The method of claim 3, in which the initializing the intent nodes and first level state nodes of the third neural network model comprises:
acquiring all intention nodes of the third neural network model and coding the intention nodes into intention vectors;
and inputting the word embedding vector sequence into a fourth neural network model, and taking an output vector of the fourth neural network model as a first-layer state node of the third neural network model.
5. The method of claim 4, wherein the first neural network model comprises a pre-trained BERT model, and wherein the step of inputting the text sequence into the first neural network model further comprises, prior to the step of:
marking and randomly masking the acquired text sequence to obtain training data;
and inputting the training data into the constructed BERT model for training, and performing fine adjustment to obtain the pre-trained BERT model.
6. The method of any one of claims 1 or 5, wherein the second neural network model comprises a multi-label intent classifier, and wherein the step of identifying the set of intentions for the text sequence from the sequence of word-embedding vectors by the second neural network model further comprises, prior to the step of:
acquiring text data marked with a slot position label and performing sequence processing to obtain a word embedding vector sequence;
inputting the word embedding vector sequence into a multi-label intention classifier to obtain an intention set comprising a plurality of intentions;
constructing a second neural network model and a third neural network model, and constructing a combined loss function through the loss functions of the second neural network model and the third neural network model;
jointly training the second and third neural network models through the set of intents and the joint loss function.
7. The method of claim 6, wherein constructing a joint loss function from the loss functions of the second neural network model and the third neural network model comprises:
assigning weights to the loss functions of the second neural network model and the loss functions of the third neural network model;
multiplying the loss functions of the second neural network model and the third neural network model by corresponding weights and then adding the multiplied loss functions to obtain a combined loss function;
and optimizing the joint loss function through a back propagation algorithm.
8. A slot prediction apparatus for multi-intent text, comprising:
the preprocessing module is used for acquiring a text and preprocessing the text to obtain a text sequence;
the generating module is used for inputting the text sequence into a first neural network model and generating a word embedding vector sequence corresponding to the text sequence;
an identification module for identifying a set of intentions of the text sequence from the word embedding vector sequence by a second neural network model;
and the prediction module is used for constructing a third neural network model based on the intention set, establishing the connection between the intention set and the slot positions through the third neural network model and predicting the slot position sequence of the text.
9. A computer device comprising a memory and a processor, the memory having stored therein a computer program, wherein the processor, when executing the computer program, implements the steps of the slot prediction method of multi-intent text of any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, carries out the steps of the slot prediction method of multi-intent text according to any one of claims 1 to 7.
CN202111150339.6A 2021-09-29 2021-09-29 Slot position prediction method and device for multi-intention text and computer equipment Pending CN113887237A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111150339.6A CN113887237A (en) 2021-09-29 2021-09-29 Slot position prediction method and device for multi-intention text and computer equipment

Publications (1)

Publication Number Publication Date
CN113887237A true CN113887237A (en) 2022-01-04

Family

ID=79008070


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116092495A (en) * 2023-04-07 2023-05-09 广州小鹏汽车科技有限公司 Voice interaction method, server and computer readable storage medium
CN116757254A (en) * 2023-08-16 2023-09-15 阿里巴巴(中国)有限公司 Task processing method, electronic device and storage medium
WO2023231676A1 (en) * 2022-05-30 2023-12-07 京东方科技集团股份有限公司 Instruction recognition method and device, training method, and computer readable storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination