CN111724767A - Spoken language understanding method based on Dirichlet variational self-encoder and related equipment - Google Patents

Spoken language understanding method based on Dirichlet variational self-encoder and related equipment

Info

Publication number
CN111724767A
CN111724767A (application CN201911247568.2A; granted as CN111724767B)
Authority
CN
China
Prior art keywords
corpus
sampling
dirichlet
encoder
spoken language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911247568.2A
Other languages
Chinese (zh)
Other versions
CN111724767B (en)
Inventor
高望
朱珣
邓宏涛
王煜炜
曾凡琮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jianghan University
Original Assignee
Jianghan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jianghan University filed Critical Jianghan University
Priority to CN201911247568.2A priority Critical patent/CN111724767B/en
Publication of CN111724767A publication Critical patent/CN111724767A/en
Application granted granted Critical
Publication of CN111724767B publication Critical patent/CN111724767B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/08 - Speech classification or search
    • G10L 15/18 - Speech classification or search using natural language modelling
    • G10L 15/1822 - Parsing for meaning understanding
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 - Training
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a spoken language understanding method based on a Dirichlet variational self-encoder, which belongs to the field of computer technology and comprises the following steps: sampling the training corpus by using a Dirichlet variational self-encoder to generate a sampling corpus set; performing data enhancement according to the sampling corpus set; and generating a training corpus. The invention introduces a semi-supervised learning method based on the Dirichlet variational self-encoder into the modeling process of spoken language understanding, learns the latent semantic features of the original data and generates high-quality new data, reduces the labeling cost, and achieves the beneficial effect of improving the spoken language understanding model.

Description

Spoken language understanding method based on Dirichlet variational self-encoder and related equipment
Technical Field
The invention relates to the technical field of computers, in particular to a spoken language understanding method based on a Dirichlet variational self-encoder and related equipment.
Background
A task-based dialog system is a human-computer interaction system that helps a user complete a specific task through multiple rounds of dialog; it is a research direction that has attracted wide attention and has broad application prospects. Currently, many research institutes and technology companies are active in the field of task-based dialog systems, with products such as Siri and Microsoft's Xiaona (Cortana). Spoken language understanding is a core technology for building task-based dialog systems: it parses the natural language originally input by the user into a computer-understandable structured semantic expression. This expression contains the semantic units that best represent the user's intention and is important for the development of human-computer interaction systems.
In recent years, great progress has been made in spoken language understanding models based on deep neural networks, in particular joint learning models of semantic slot filling (Slot Filling) and intent recognition (Intent Classification). The basic idea of such a model is to use a neural network to learn the semantic information of an input sentence and then output the intent category of the whole sentence together with the semantic slot label corresponding to each word. In such a model, the prediction of the intent category and of the semantic slot labels can learn from each other, and performance improves jointly. Compared with traditional methods based on machine learning and hand-written rules, the joint learning model has higher accuracy, does not require handwritten templates, and adapts well.
However, like most natural language processing tasks, the joint learning model faces a serious data scarcity problem. Moreover, the sparsity problem is exacerbated by the near-infinite domain space of spoken language understanding datasets and by the labor-intensive labeling task. Traditional data enhancement and generation methods rely on hand-designed enhancement/generation functions, and the sentences they generate are generally poor in robustness and diversity. As a result, the joint learning model tends to overfit and lacks generalization ability, which degrades the spoken language understanding effect; this is the key problem to be solved by the invention.
Disclosure of Invention
The invention provides a spoken language understanding method based on a Dirichlet variational self-encoder and related equipment, which are used for solving the technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides a spoken language understanding method based on a Dirichlet variational self-encoder, where the method includes: sampling the training corpus by using a Dirichlet variational self-encoder to generate a sampling corpus set; performing data enhancement according to the sampling corpus set; and generating a training corpus.
Further, in the first aspect, sampling the training corpus with the Dirichlet variational self-encoder to generate the sampling corpus set specifically includes: giving the number n of sampled corpora, and initializing an empty corpus set M; while the number of corpora in M is less than n, looping S1121-S1124: S1121, selecting a real word sequence w; S1122, deducing the approximate posterior parameters α̂ and β̂ by the inverse gamma distribution function approximation method; S1123, sampling ŵ from the variational distribution q_φ(w|z); S1124, adding the sampled corpus ŵ to M; and generating the sampling corpus set.
Further, in the first aspect, generating the training corpus specifically includes the following steps: first sampling z ~ q_φ(z), and then approximating p_η(w|z) with the Dirichlet variational self-encoder; sampling from p_η(w|z) to obtain a generated word sequence ŵ; using the generated word sequence ŵ to train the joint model for spoken language understanding and perform inference; generating the slot filling and intent recognition results (ŝ, ŷ); and combining ŵ and (ŝ, ŷ) into a new corpus (ŵ, ŝ, ŷ), which is added to the generated corpus set.
Further, in the first aspect, performing data enhancement specifically includes: performing data enhancement for the semantic slot filling and intent recognition tasks through the latent variable z and the sampled corpus ŵ.
In a second aspect, an embodiment of the present invention provides a spoken language understanding system based on a Dirichlet variational self-encoder, where the system includes: a sampling corpus generating module configured to sample the training corpus with a Dirichlet variational self-encoder to generate a sampling corpus set; a data enhancement module configured to perform data enhancement according to the sampling corpus set; and a corpus generating module configured to generate a training corpus.
Further, in the second aspect, the sampling corpus generating module specifically includes: a first sub-module configured to initialize an empty corpus set M given the number n of sampled corpora; a second sub-module configured to loop S1121-S1124 while the number of corpora in M is less than n: S1121, selecting a real word sequence w; S1122, deducing the approximate posterior parameters α̂ and β̂ by the inverse gamma distribution function approximation method; S1123, sampling ŵ from the variational distribution q_φ(w|z); S1124, adding the sampled corpus ŵ to M; and a third sub-module configured to generate the sampling corpus set.
Further, in the second aspect, the corpus generating module specifically includes: a first subunit configured to first sample z ~ q_φ(z) and then approximate p_η(w|z) with the Dirichlet variational self-encoder; a second subunit configured to sample from p_η(w|z) to obtain a generated word sequence ŵ; a third subunit configured to use the generated word sequence ŵ to train the joint model for spoken language understanding and perform inference; a fourth subunit configured to generate the slot filling and intent recognition results (ŝ, ŷ); and a fifth subunit configured to combine ŵ and (ŝ, ŷ) into a new corpus (ŵ, ŝ, ŷ) and add it to the generated corpus set.
Further, in the second aspect, the data enhancement module is further specifically configured to: perform data enhancement for the semantic slot filling and intent recognition tasks through the latent variable z and the sampled corpus ŵ.
In a third aspect, the present invention further provides an apparatus for spoken language understanding based on a Dirichlet variational self-encoder, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the following steps when executing the program: sampling the training corpus by using a Dirichlet variational self-encoder to generate a sampling corpus set; performing data enhancement according to the sampling corpus set; and generating a training corpus.
In a fourth aspect, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of: sampling the training corpus by using a Dirichlet variational self-encoder to generate a sampling corpus set; performing data enhancement according to the sampling corpus set; and generating a training corpus.
One or more technical solutions provided in the embodiments of the invention have at least the following technical effects or advantages:
The invention provides a spoken language understanding method based on a Dirichlet variational self-encoder, which first samples the training corpus with the Dirichlet variational self-encoder to generate a sampling corpus set; then performs data enhancement according to the sampling corpus set; and finally generates a training corpus. The semi-supervised learning method based on the Dirichlet variational self-encoder is thereby introduced into the modeling process of spoken language understanding, the latent semantic features of the original data are learned and high-quality new data are generated, the labeling cost is reduced, and the beneficial effect of improving the spoken language understanding model is achieved.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a flowchart of a spoken language understanding method based on a dirichlet variational auto-encoder in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of DirVAE-SLU model in the embodiment of the present application;
FIG. 3 is a schematic structural diagram of a computer device in an embodiment of the present application;
fig. 4 is a schematic structural diagram of a computer-readable storage medium in an embodiment of the present application.
Detailed Description
The spoken language understanding method based on the Dirichlet variational self-encoder provided by the invention realizes that a semi-supervised learning method based on the Dirichlet variational self-encoder is introduced into a modeling process of spoken language understanding, potential semantic features of original data are learned and high-quality new data are generated, the labeling cost is reduced, and the beneficial effect of improving a spoken language understanding model is achieved.
Referring to fig. 1-2, the technical solution in the embodiment of the present invention is as follows:
s11, sampling the training corpus by using a Dirichlet variational self-encoder to generate a sampling corpus set;
s12, enhancing data according to the sampling corpus;
and S13, generating a training corpus.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Moreover, the terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The term "and/or" in the description and claims of the present invention and the above drawings is only one kind of association relationship describing the associated object, and means that there may be three relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
Example one
An embodiment of the present invention provides a spoken language understanding method based on a dirichlet variational self-encoder, please refer to fig. 1, where the method includes:
s11, sampling the training corpus by using a Dirichlet variational self-encoder to generate a sampling corpus set;
s12, enhancing data according to the sampling corpus;
and S13, generating a training corpus.
According to the research of the inventor, the joint learning model, like most natural language processing tasks, faces a serious data scarcity problem. Moreover, the sparsity problem is exacerbated by the near-infinite domain space of spoken language understanding datasets and by the labor-intensive labeling task. Traditional data enhancement and generation methods rely on hand-designed enhancement/generation functions, and the sentences they generate are generally poor in robustness and diversity. This causes the joint learning model to overfit and lack generalization ability, thereby affecting the spoken language understanding effect. Based on this, the present invention provides a spoken language understanding method and related device based on a Dirichlet variational auto-encoder, so as to solve the above technical problems.
In the following, the spoken language understanding method based on a Dirichlet variational self-encoder according to an embodiment of the present invention is described in detail with reference to fig. 1:
s11, sampling the training corpus by using a Dirichlet variational self-encoder to generate a sampling corpus set;
the standard spoken language understanding model is a discriminant model highly related to a data set, and the data set understood by the spoken language at least comprises an input word sequence w, a tag sequence s filled with semantic slots and a tag y recognized by an intention. For the training data set (w, s, y), the loss function is shown in equation (1):
L(θ;w,s,y)=-logpθ(s,y|w) (1)
where θ represents the parameters that the model needs to solve. Given an input word sequence w, the joint model can simultaneously predict the semantic slot sequence ŝ and the recognized intent ŷ by maximizing the log-likelihood, as shown in equation (2):

(ŝ, ŷ) = argmax_(s,y) log p_θ(s, y|w)    (2)
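To make the joint objective of equations (1)-(2) concrete, the short sketch below shows one way a slot filling / intent recognition joint model and its negative log-likelihood loss could be written. It is a minimal PyTorch-style illustration; the module layout, layer sizes, and mean-pooling for the intent are assumptions for illustration, not the patent's reference implementation.

    import torch
    import torch.nn as nn

    class JointSLU(nn.Module):
        # Joint model p_theta(s, y | w): per-word slot labels and a sentence-level intent.
        def __init__(self, vocab_size, num_slots, num_intents, emb_dim=300, hidden=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.encoder = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
            self.slot_head = nn.Linear(2 * hidden, num_slots)      # semantic slot filling
            self.intent_head = nn.Linear(2 * hidden, num_intents)  # intent recognition

        def forward(self, w):                                  # w: (batch, seq_len) word ids
            h, _ = self.encoder(self.embed(w))                 # (batch, seq_len, 2*hidden)
            slot_logits = self.slot_head(h)                    # per-token slot scores
            intent_logits = self.intent_head(h.mean(dim=1))    # pooled sentence representation
            return slot_logits, intent_logits

    def joint_loss(slot_logits, intent_logits, s, y):
        # Equation (1): L = -log p_theta(s, y | w), factored into slot and intent terms;
        # prediction as in equation (2) is the argmax over both heads.
        ce = nn.CrossEntropyLoss()
        return ce(slot_logits.flatten(0, 1), s.flatten()) + ce(intent_logits, y)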
the sampling process is a key step of the Dirichlet variational self-encoder, and the training corpus can be sampled through the sampling process, so that the semantic features of sentences or vocabularies are obtained. A good sampling process can effectively improve the performance of the data-enhanced spoken language understanding model. Assuming that the corpus x is sampled from a real but unknown probability distribution P (x) e P, the exploratory sampling process is a sampling process that approximates the real distribution P (x) by introducing a latent variable z. Specifically, the dirichlet variational self-encoder approximates the true distribution p (x) by using the variational posterior distribution q (z | x) and the parameters (h, f), and measures the difference between the variational posterior distribution q (z | x) and the true posterior distribution p (z | x) by the KL divergence (KL divergence), and the loss function of the model is shown in formula (3):
Figure BDA0002308096470000051
applying equation (3) to the spoken language understanding task for data enhancement, then:
L(η, φ; w) = -E_(z~q_φ(z|w))[log p_η(w|z)] + KL(q_φ(z|w) ‖ p(z))    (4)

When the optimized parameters of the model (η*, φ*) have been solved, a new word sequence ŵ can be obtained by sampling from the variational distribution of w, and data enhancement is performed on the spoken language understanding model, as shown in formula (5):

z ~ q_φ*(z|w),  ŵ ~ p_η*(w|z)    (5)

The conventional variational auto-encoder assumes that the prior distribution of the latent variables is a continuous random variable, while the Dirichlet variational auto-encoder uses the Dirichlet distribution, conjugate to the multinomial distribution, as the prior distribution of the latent variables, which is more suitable for the spoken language understanding model, as shown in equation (6):

z ~ p(z) = Dirichlet(α),  w ~ p_η(w|z)    (6)

where α denotes the Dirichlet hyperparameter. In the encoder, ẑ is sampled from the approximate variational posterior distribution q_φ(z|w), whose approximate posterior parameters are α̂ and β̂. Instead of sampling z directly from the Dirichlet distribution, the method exploits the property that a Dirichlet distribution can be composed from several independent gamma distributions: the latent variables are sampled by gamma composition as v ~ MultiGamma(α, β, 1_K), where MultiGamma(α, β, 1_K) represents K random variables each following a gamma distribution; v is then normalized by the sum term ∑_i v_i. The loss function is:
L(η, φ; w) = -E_(v~q_φ(v|w))[log p_η(w | v/∑_i v_i)] + KL(q_φ(v|w) ‖ p(v))    (7)
for equation (7), an inverse Gamma Distribution Function Approximation (inverse Gamma Distribution Function) method may enable a back-propagating flow to an input through a stochastic gradient method to infer model parameters-1(u;α,β)≈β-1(ua(α))1/αTherefore, the invention replaces the randomness of v by introducing auxiliary variables u-Uniform (0,1), and takes the Gamma sampled v as the determined values of α and β. the exploratory sampling process of DirVAE-SLU may specifically include the following steps:
s111, giving the number n of the sampled corpuses, and initializing a null corpus set M;
s112, when the number of the corpora in the M is less than n, circulating S1121-S1124:
s1121, selecting a real word sequence w;
s1122, deducing approximate posterior parameters by an inverse gamma distribution function approximation method
Figure BDA0002308096470000061
S1123, distributing q by variationφ(w | z) sampling
Figure BDA0002308096470000062
S1124, corpus of samples
Figure BDA0002308096470000063
Adding into M;
and S13, generating the sampling corpus set.
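The reparameterized sampling referred to above can be sketched as follows; this is a minimal illustration under the inverse gamma CDF approximation given before the step listing, and the encoder/decoder interfaces, the random selection of real sentences, and all variable names are assumptions introduced for illustration only.

    import random
    import torch

    def inverse_gamma_cdf_approx(u, alpha, beta):
        # F^-1(u; alpha, beta) ~= beta^-1 * (u * alpha * Gamma(alpha))^(1/alpha);
        # u ~ Uniform(0,1) carries the randomness, so the sample is a
        # differentiable (deterministic) function of alpha and beta.
        return (u * alpha * torch.exp(torch.lgamma(alpha))) ** (1.0 / alpha) / beta

    def sample_dirichlet(alpha_hat, beta_hat):
        # Gamma composition: v_k ~ Gamma(alpha_k, beta_k), then z = v / sum(v).
        u = torch.rand_like(alpha_hat)
        v = inverse_gamma_cdf_approx(u, alpha_hat, beta_hat)
        return v / v.sum(dim=-1, keepdim=True)

    def build_sampling_corpus(encoder, decoder, real_corpus, n):
        # Exploratory sampling loop S111-S113 (assumed encoder/decoder interfaces).
        M = []
        while len(M) < n:                                  # S112
            w = random.choice(real_corpus)                 # S1121: pick a real word sequence
            alpha_hat, beta_hat = encoder(w)               # S1122: approximate posterior parameters
            z = sample_dirichlet(alpha_hat, beta_hat)      # reparameterized Dirichlet latent
            w_hat = decoder.sample(z)                      # S1123: sample a resampled sentence
            M.append(w_hat)                                # S1124: add to M
        return M                                           # S113: the sampling corpus set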
Then S12 is executed: data enhancement is performed according to the sampling corpus set;
specifically, after obtaining the sample corpus, the DirVAE-SLU passes the latent variable z and the sample corpus
ŵ for data enhancement of the semantic slot filling and intent recognition tasks, and equation (1) can be transformed into:

L(φ, ζ; w, ŵ, s, y) = -log p_φ(s, y|w) - log p_ζ(s, y|ŵ)    (8)

where φ represents the parameters for the original corpus w, and ζ represents the parameters for filling semantic slots and identifying intent on the sampled corpus ŵ. Considering the Dirichlet variational auto-encoder for data enhancement and spoken language understanding together, the joint training loss function of DirVAE-SLU is as follows:
L(η, φ, ζ; w, s, y) = L(η, φ; w) + L(φ, ζ; w, ŵ, s, y)    (9)

Structurally, the DirVAE-SLU model can be divided into two parts: a data enhancement part that performs latent variable inference with the Dirichlet variational auto-encoder and generates the sampling corpus, and a part that realizes spoken language understanding through the sampling corpus. In the data enhancement part, the DirVAE-SLU model uses a bidirectional Long Short-Term Memory (LSTM) network in the encoder and three unidirectional LSTM networks in the decoder. The training process of the model solves for the optimal parameters (η*, φ*, ζ*) by minimizing the loss function of equation (9):

(η*, φ*, ζ*) = argmin_(η,φ,ζ) L(η, φ, ζ; w, s, y)    (10)
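To make the two-part structure and the training objective of equations (9)-(10) concrete, the sketch below outlines a DirVAE-SLU-style model with a bidirectional LSTM encoder producing the approximate posterior parameters, three unidirectional LSTM decoders, and a combined loss. The class layout, pooling, positivity transform, KL term, and loss weighting are illustrative assumptions, not the patent's reference implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def sample_dirichlet(alpha, beta):
        # Reparameterized gamma composition (see the earlier sampling sketch).
        u = torch.rand_like(alpha)
        v = (u * alpha * torch.exp(torch.lgamma(alpha))) ** (1.0 / alpha) / beta
        return v / v.sum(dim=-1, keepdim=True)

    class DirVAESLU(nn.Module):
        def __init__(self, vocab_size, num_slots, num_intents,
                     emb_dim=300, enc_hidden=256, dec_hidden=1024, latent_dim=100):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            # Data enhancement part: BiLSTM encoder -> Dirichlet posterior parameters.
            self.encoder = nn.LSTM(emb_dim, enc_hidden, bidirectional=True, batch_first=True)
            self.to_alpha = nn.Linear(2 * enc_hidden, latent_dim)
            self.to_beta = nn.Linear(2 * enc_hidden, latent_dim)
            # Three unidirectional LSTM decoders: word reconstruction, slots, intent.
            self.word_dec = nn.LSTM(latent_dim, dec_hidden, batch_first=True)
            self.slot_dec = nn.LSTM(latent_dim, dec_hidden, batch_first=True)
            self.intent_dec = nn.LSTM(latent_dim, dec_hidden, batch_first=True)
            self.word_out = nn.Linear(dec_hidden, vocab_size)
            self.slot_out = nn.Linear(dec_hidden, num_slots)
            self.intent_out = nn.Linear(dec_hidden, num_intents)

        def forward(self, w):                                    # w: (batch, seq_len) word ids
            h, _ = self.encoder(self.embed(w))
            pooled = h.mean(dim=1)
            alpha_hat = F.softplus(self.to_alpha(pooled)) + 1e-4  # keep gamma parameters positive
            beta_hat = F.softplus(self.to_beta(pooled)) + 1e-4
            z = sample_dirichlet(alpha_hat, beta_hat)
            z_seq = z.unsqueeze(1).expand(-1, w.size(1), -1)      # broadcast latent along the sequence
            word_logits = self.word_out(self.word_dec(z_seq)[0])  # p_eta(w | z)
            slot_logits = self.slot_out(self.slot_dec(z_seq)[0])  # slot filling head
            intent_logits = self.intent_out(self.intent_dec(z_seq)[0][:, -1])  # intent head
            return word_logits, slot_logits, intent_logits, (alpha_hat, beta_hat)

    def dirvae_slu_loss(model, w, s, y, alpha0=0.99):
        # Joint objective in the spirit of equations (9)-(10): reconstruction + KL term
        # of the Dirichlet VAE plus the slot filling and intent recognition losses.
        word_logits, slot_logits, intent_logits, (alpha_hat, beta_hat) = model(w)
        ce = nn.CrossEntropyLoss()
        recon = ce(word_logits.flatten(0, 1), w.flatten())
        slu = ce(slot_logits.flatten(0, 1), s.flatten()) + ce(intent_logits, y)
        prior = torch.distributions.Gamma(torch.full_like(alpha_hat, alpha0),
                                          torch.ones_like(beta_hat))
        posterior = torch.distributions.Gamma(alpha_hat, beta_hat)
        kl = torch.distributions.kl_divergence(posterior, prior).sum(-1).mean()
        return recon + kl + slu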
Finally, S13 is executed to generate the training corpus.
In detail, in the process of generating the training corpus, the DirVAE-SLU model performs sampling with the inverse gamma distribution function approximation method. The method can comprehensively consider factors such as the corpus balance of the real dataset and the computational resource overhead when selecting data. Once data are selected, DirVAE-SLU generates a sufficient number of corpora using the following process:
1. First, sample z ~ q_φ(z), and then approximate p_η(w|z) with the Dirichlet variational self-encoder;
2. Sample from p_η(w|z) to obtain a generated word sequence ŵ;
3. Use the generated word sequence ŵ to train the joint model for spoken language understanding and perform inference;
4. Generate the slot filling and intent recognition results (ŝ, ŷ);
5. Combine ŵ and (ŝ, ŷ) into a new corpus (ŵ, ŝ, ŷ) and add it to the generated corpus set.
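As a rough illustration of steps 1-5, the following sketch assembles generated word sequences and their inferred labels into the generated corpus set; the prior_sampler, decoder, and joint_slu interfaces are assumptions introduced for illustration, not names from the patent.

    def generate_corpus(prior_sampler, decoder, joint_slu, n_generate):
        # Steps 1-5: draw latents, decode word sequences, label them with the
        # trained joint model, and collect (w_hat, s_hat, y_hat) triples.
        generated = []
        for _ in range(n_generate):
            z = prior_sampler()                             # step 1: z ~ q_phi(z)
            w_hat = decoder.sample(z)                       # step 2: w_hat from p_eta(w | z)
            slot_logits, intent_logits = joint_slu(w_hat)   # step 3: inference with the joint model
            s_hat = slot_logits.argmax(dim=-1)              # step 4: slot filling result
            y_hat = intent_logits.argmax(dim=-1)            #         intent recognition result
            generated.append((w_hat, s_hat, y_hat))         # step 5: add the new corpus entry
        return generated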
The efficiency of the method provided by the invention can be verified by comparing data enhancement experiments on a reference model. The present invention uses two open-source evaluation datasets for the experiments: the Airline Travel Information Systems (ATIS) dataset and the virtual assistant corpus Snips. In the experiments, α is 0.99·1_100 and β is 1, the input layer uses GloVe 300-dimensional word vectors, the hidden layer dimension of the bidirectional LSTM in the encoder is 256, the hidden layer dimension of each of the three unidirectional LSTMs in the decoder is 1024, and a Slot-Gated model is used as the reference model.
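For quick reference, the experimental configuration described above can be collected into a simple settings dictionary; the key names are hypothetical, and the values restate the setup given in the text.

    experiment_config = {
        "datasets": ["ATIS", "Snips"],
        "dirichlet_alpha": 0.99,          # per-dimension Dirichlet hyperparameter
        "gamma_beta": 1.0,
        "word_vectors": "GloVe-300d",
        "encoder_bilstm_hidden": 256,
        "decoder_lstm_hidden": 1024,      # each of the three unidirectional LSTMs
        "baseline_model": "Slot-Gated",
    }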
TABLE 1 comparison of data enhancement effects on different datasets
As can be seen from the experimental results in Table 1, after data enhancement with DirVAE-SLU, the spoken language understanding performance of the reference model improves on both datasets, which verifies the advantage of the invention.
That is, the embodiment of the invention introduces the semi-supervised learning method based on the Dirichlet variational self-encoder into the modeling process of spoken language understanding, learns the latent semantic features of the original data and generates high-quality new data, reduces the labeling cost, and achieves the beneficial effect of improving the spoken language understanding model.
Based on the same inventive concept, an embodiment of the invention further provides a device corresponding to the method in the first embodiment; see the second embodiment.
Example two
An embodiment of the present invention provides a system, where the system includes:
the sampling corpus generating module is configured to sample the training corpus by using a Dirichlet variational self-encoder to generate a sampling corpus;
the data enhancement module is configured to enhance data according to the sampling corpus;
and the corpus generating module is configured to generate corpus.
In the second embodiment of the present invention, the sampling corpus generating module specifically includes: a first sub-module configured to give the number n of sampled corpora and initialize an empty corpus set M; a second sub-module configured to loop S1121-S1124 while the number of corpora in M is less than n: S1121, selecting a real word sequence w; S1122, deducing the approximate posterior parameters α̂ and β̂ by the inverse gamma distribution function approximation method; S1123, sampling ŵ from the variational distribution q_φ(w|z); S1124, adding the sampled corpus ŵ to M; and a third sub-module configured to generate the sampling corpus set.
In the second embodiment of the present invention, the corpus generating module specifically includes: a first subunit configured to first sample z ~ q_φ(z) and then approximate p_η(w|z) with the Dirichlet variational self-encoder; a second subunit configured to sample from p_η(w|z) to obtain a generated word sequence ŵ; a third subunit configured to use the generated word sequence ŵ to train the joint model for spoken language understanding and perform inference; a fourth subunit configured to generate the slot filling and intent recognition results (ŝ, ŷ); and a fifth subunit configured to combine ŵ and (ŝ, ŷ) into a new corpus (ŵ, ŝ, ŷ) and add it to the generated corpus set.
In the second embodiment of the present invention, the data enhancement module is further specifically configured to: perform data enhancement for the semantic slot filling and intent recognition tasks through the latent variable z and the sampled corpus ŵ.
Since the system described in the second embodiment of the present invention is a device used for implementing the method of the first embodiment of the present invention, based on the method described in the first embodiment of the present invention, a person skilled in the art can understand the specific structure and the deformation of the device, and thus the detailed description is omitted here. All the devices adopted by the method of the first embodiment of the invention belong to the protection scope of the invention.
EXAMPLE III
Based on the same inventive concept as the first and second embodiments, a third embodiment of the present invention provides an apparatus, including: Radio Frequency (RF) circuitry 310, memory 320, an input unit 330, a display unit 340, audio circuitry 350, a WiFi module 360, a processor 370, and a power supply 380. The memory 320 stores a computer program operable on the processor 370, and the processor 370, when executing the computer program, implements the steps S11, S12 and S13 described in the first embodiment.
In a specific implementation process, when the processor executes the computer program, either implementation manner of the first embodiment or the second embodiment can be realized.
Those skilled in the art will appreciate that the device configuration shown in fig. 3 is not intended to be limiting of the device itself and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The following describes the components of the computer device in detail with reference to fig. 3:
RF circuitry 310 may be used for receiving and transmitting signals, and in particular, for receiving downlink information from base stations and processing the received downlink information to processor 370. In general, the RF circuit 310 includes, but is not limited to, at least one Amplifier, transceiver, coupler, Low Noise Amplifier (LNA), duplexer, and the like.
The memory 320 may be used to store software programs and modules, and the processor 370 may execute various functional applications of the computer device and data processing by operating the software programs and modules stored in the memory 320. The memory 320 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to use of the computer device, and the like. Further, the memory 320 may include a high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The input unit 330 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer apparatus. Specifically, the input unit 330 may include a keypad 331 and other input devices 332. The keyboard 331 can collect the input operation of the user thereon and drive the corresponding connection device according to a preset program. The keyboard 331 collects the output information and sends it to the processor 370. The input unit 330 may include other input devices 332 in addition to the keyboard 331. In particular, other input devices 332 may include, but are not limited to, one or more of a touch panel, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 340 may be used to display information input by a user or information provided to the user and various menus of the computer device. The Display unit 340 may include a Display panel 341, and optionally, the Display panel 341 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like. Further, the keyboard 331 may cover the display panel 341, and when the keyboard 331 detects a touch operation thereon or nearby, the keyboard 331 transmits to the processor 370 to determine the type of the touch event, and then the processor 370 provides a corresponding visual output on the display panel 341 according to the type of the input event. Although the keyboard 331 and the display panel 341 are shown in fig. 3 as two separate components to implement input and output functions of the computer device, in some embodiments, the keyboard 331 and the display panel 341 may be integrated to implement input and output functions of the computer device.
Audio circuitry 350, speaker 351, microphone 352 may provide an audio interface between a user and a computer device. The audio circuit 350 may transmit the electrical signal converted from the received audio data to the speaker 351, and convert the electrical signal into a sound signal by the speaker 351 for output;
WiFi belongs to short-distance wireless transmission technology, and computer equipment can help a user to receive and send e-mails, browse webpages, access streaming media and the like through the WiFi module 360, and provides wireless broadband internet access for the user. Although fig. 3 shows the WiFi module 360, it is understood that it does not belong to the essential constitution of the computer device, and can be omitted entirely within the scope not changing the essence of the invention as needed.
The processor 370 is a control center of the computer device, connects various parts of the entire computer device using various interfaces and lines, performs various functions of the computer device and processes data by operating or executing software programs and/or modules stored in the memory 320 and calling data stored in the memory 320, thereby monitoring the computer device as a whole. Alternatively, processor 370 may include one or more processing units; preferably, the processor 370 may be integrated with an application processor, wherein the application processor mainly handles operating systems, user interfaces, application programs, and the like.
The computer device also includes a power supply 380 (such as a power adapter) for powering the various components, which may preferably be logically connected to the processor 370 through a power management system.
Example four
Based on the same inventive concept, as shown in fig. 4, the fourth embodiment provides a computer-readable storage medium 400 on which a computer program 411 is stored; when the computer program 411 is executed by a processor, the steps S11, S12 and S13 described in the first embodiment are implemented.
In a specific implementation, the computer program 411 may implement any one of the first, second, and third embodiments when executed by a processor.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The technical scheme provided by the embodiment of the invention at least has the following technical effects or advantages:
the semi-supervised learning method based on the Dirichlet variational self-encoder is introduced into the modeling process of spoken language understanding, potential semantic features of original data are learned, high-quality new data are generated, the labeling cost is reduced, and the beneficial effect of improving the spoken language understanding model is achieved.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention are within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.

Claims (10)

1. A method for spoken language understanding based on a dirichlet variational auto-encoder, the method comprising:
s11, sampling the training corpus by using a Dirichlet variational self-encoder to generate a sampling corpus set;
s12, enhancing data according to the sampling corpus;
and S13, generating a training corpus.
2. The spoken language understanding method based on a Dirichlet variational auto-encoder as claimed in claim 1, wherein sampling the training corpus with the Dirichlet variational auto-encoder to generate the sampling corpus set specifically comprises:
S111, giving the number n of sampled corpora, and initializing an empty corpus set M;
S112, while the number of corpora in M is less than n, looping S1121-S1124:
S1121, selecting a real word sequence w;
S1122, deducing the approximate posterior parameters α̂ and β̂ by an inverse gamma distribution function approximation method;
S1123, sampling ŵ from the variational distribution q_φ(w|z);
S1124, adding the sampled corpus ŵ to M;
and S113, generating the sampling corpus set.
3. The spoken language understanding method based on a Dirichlet variational auto-encoder as claimed in claim 2, wherein generating the training corpus specifically comprises the following steps:
S131, first sampling z ~ q_φ(z), and then approximating p_η(w|z);
S132, sampling from p_η(w|z) to obtain a generated word sequence ŵ;
S133, using the generated word sequence ŵ to train the joint model for spoken language understanding and perform inference;
S134, generating the slot filling and intent recognition results (ŝ, ŷ);
S135, combining ŵ and (ŝ, ŷ) into a new corpus (ŵ, ŝ, ŷ) and adding it to the generated corpus set.
4. The spoken language understanding method based on a Dirichlet variational auto-encoder as claimed in claim 3, wherein performing data enhancement specifically comprises:
performing data enhancement for the semantic slot filling and intent recognition tasks through the latent variable z and the sampled corpus ŵ.
5. A spoken language understanding system based on a dirichlet variational auto-encoder, the system comprising:
the system comprises a sampling corpus generating module, a data processing module and a data processing module, wherein the sampling corpus generating module is configured to sample a training corpus by using a Dirichlet variational self-encoder to generate a sampling corpus;
the data enhancement module is configured to enhance data according to the sampling corpus;
and the corpus generating module is configured to generate corpus.
6. The spoken language understanding system of claim 5, wherein the sampling corpus generating module specifically comprises:
a first sub-module configured to give the number n of sampled corpora and initialize an empty corpus set M;
a second sub-module configured to loop S1121-S1124 while the number of corpora in M is less than n:
S1121, selecting a real word sequence w;
S1122, deducing the approximate posterior parameters α̂ and β̂ by an inverse gamma distribution function approximation method;
S1123, sampling ŵ from the variational distribution q_φ(w|z);
S1124, adding the sampled corpus ŵ to M;
and a third sub-module configured to generate the sampling corpus set.
7. The system according to claim 6, wherein the corpus generating module specifically comprises:
a first subunit configured to first sample z ~ q_φ(z) and then approximate p_η(w|z) with the Dirichlet variational auto-encoder;
a second subunit configured to sample from p_η(w|z) to obtain a generated word sequence ŵ;
a third subunit configured to use the generated word sequence ŵ to train the joint model for spoken language understanding and perform inference;
a fourth subunit configured to generate the slot filling and intent recognition results (ŝ, ŷ);
and a fifth subunit configured to combine ŵ and (ŝ, ŷ) into a new corpus (ŵ, ŝ, ŷ) and add it to the generated corpus set.
8. The system of claim 6, wherein the data enhancement module is further specifically configured to:
perform data enhancement for the semantic slot filling and intent recognition tasks through the latent variable z and the sampled corpus ŵ.
9. An apparatus for spoken language understanding based on a dirichlet variational auto-encoder, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of:
s11, sampling the training corpus by using a Dirichlet variational self-encoder to generate a sampling corpus set;
s12, enhancing data according to the sampling corpus;
and S13, generating a training corpus.
10. A computer-readable storage medium, on which a computer program is stored, which program, when executed by a processor, carries out the steps of:
s11, sampling the training corpus by using a Dirichlet variational self-encoder to generate a sampling corpus set;
s12, enhancing data according to the sampling corpus;
and S13, generating a training corpus.
CN201911247568.2A 2019-12-09 2019-12-09 Spoken language understanding method based on Dirichlet variation self-encoder and related equipment Active CN111724767B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911247568.2A CN111724767B (en) 2019-12-09 2019-12-09 Spoken language understanding method based on Dirichlet variation self-encoder and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911247568.2A CN111724767B (en) 2019-12-09 2019-12-09 Spoken language understanding method based on Dirichlet variation self-encoder and related equipment

Publications (2)

Publication Number Publication Date
CN111724767A true CN111724767A (en) 2020-09-29
CN111724767B CN111724767B (en) 2023-06-02

Family

ID=72563990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911247568.2A Active CN111724767B (en) 2019-12-09 2019-12-09 Spoken language understanding method based on Dirichlet variation self-encoder and related equipment

Country Status (1)

Country Link
CN (1) CN111724767B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112597769A (en) * 2020-12-15 2021-04-02 中山大学 Short text topic identification method based on Dirichlet variational self-encoder

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886388A (en) * 2019-01-09 2019-06-14 平安科技(深圳)有限公司 A kind of training sample data extending method and device based on variation self-encoding encoder
US10373055B1 (en) * 2016-05-20 2019-08-06 Deepmind Technologies Limited Training variational autoencoders to generate disentangled latent factors
CN110134951A (en) * 2019-04-29 2019-08-16 淮阴工学院 A kind of method and system for analyzing the potential theme phrase of text data
US20190258937A1 (en) * 2016-11-04 2019-08-22 Google Llc Training neural networks using a variational information bottleneck
CN110211575A (en) * 2019-06-13 2019-09-06 苏州思必驰信息科技有限公司 Voice for data enhancing adds method for de-noising and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10373055B1 (en) * 2016-05-20 2019-08-06 Deepmind Technologies Limited Training variational autoencoders to generate disentangled latent factors
US20190258937A1 (en) * 2016-11-04 2019-08-22 Google Llc Training neural networks using a variational information bottleneck
CN109886388A (en) * 2019-01-09 2019-06-14 平安科技(深圳)有限公司 A kind of training sample data extending method and device based on variation self-encoding encoder
CN110134951A (en) * 2019-04-29 2019-08-16 淮阴工学院 A kind of method and system for analyzing the potential theme phrase of text data
CN110211575A (en) * 2019-06-13 2019-09-06 苏州思必驰信息科技有限公司 Voice for data enhancing adds method for de-noising and system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112597769A (en) * 2020-12-15 2021-04-02 中山大学 Short text topic identification method based on Dirichlet variational self-encoder
CN112597769B (en) * 2020-12-15 2022-06-03 中山大学 Short text topic identification method based on Dirichlet variational self-encoder

Also Published As

Publication number Publication date
CN111724767B (en) 2023-06-02

Similar Documents

Publication Publication Date Title
US10937416B2 (en) Cross-domain multi-task learning for text classification
CN111755078B (en) Drug molecule attribute determination method, device and storage medium
US9965465B2 (en) Distributed server system for language understanding
EP3320490B1 (en) Transfer learning techniques for disparate label sets
WO2019022842A1 (en) Domain addition systems and methods for a language understanding system
EP3259713A1 (en) Pre-training and/or transfer learning for sequence taggers
WO2018039049A1 (en) Multi-turn cross-domain natural language understanding systems, building platforms, and methods
CN108920666A (en) Searching method, system, electronic equipment and storage medium based on semantic understanding
CN111428520A (en) Text translation method and device
US20220374776A1 (en) Method and system for federated learning, electronic device, and computer readable medium
US11423307B2 (en) Taxonomy construction via graph-based cross-domain knowledge transfer
CN107112009B (en) Method, system and computer-readable storage device for generating a confusion network
US20210248498A1 (en) Method and apparatus for training pre-trained knowledge model, and electronic device
CN113641830B (en) Model pre-training method, device, electronic equipment and storage medium
TWI741877B (en) Network model quantization method, device, and electronic apparatus
CN108415939A (en) Dialog process method, apparatus, equipment and computer readable storage medium based on artificial intelligence
EP4057283A2 (en) Method for detecting voice, method for training, apparatuses and smart speaker
US20220238098A1 (en) Voice recognition method and device
CN112528654A (en) Natural language processing method and device and electronic equipment
CN115062617A (en) Task processing method, device, equipment and medium based on prompt learning
US11361031B2 (en) Dynamic linguistic assessment and measurement
CN111724767A (en) Spoken language understanding method based on Dirichlet variational self-encoder and related equipment
WO2023174189A1 (en) Method and apparatus for classifying nodes of graph network model, and device and storage medium
US20220300717A1 (en) Method and apparatus for generating dialogue state
CN115687934A (en) Intention recognition method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant