CN111462734A - Semantic slot filling model training method and system
- Publication number: CN111462734A
- Application number: CN202010248117.7A
- Authority: CN (China)
- Prior art keywords: training, semantic slot, semantic, value pair, filling model
- Legal status: Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
Abstract
The embodiment of the invention provides a semantic slot filling model training method. The method comprises the following steps: training a first training data set with labels to generate a first semantic slot filling model; inputting a second training data set of automatic speech recognition into the first semantic slot filling model, and determining a first semantic slot value pair; correcting the first semantic slot value pair by a rule-based error correction module to determine a second semantic slot value pair, wherein the error correction module corrects the first semantic slot value pair based on preset rules; and performing policy gradient training on the first semantic slot filling model based on the second semantic slot value pair, and determining a trained second semantic slot filling model. The embodiment of the invention also provides a semantic slot filling model training system. The embodiment of the invention directly introduces rule-based error correction into the training method through reinforcement learning for the slot filling task in spoken language semantic understanding, thereby improving the robustness of semantic understanding to speech recognition errors.
Description
Technical Field
The invention relates to the field of intelligent voice, in particular to a semantic slot filling model training method and system.
Background
Spoken language semantic understanding is a technique for converting the output of automatic speech recognition into a structured semantic representation, and it is therefore very sensitive to speech recognition errors. Semantic slot filling is typically used in semantic understanding. To improve the robustness of semantic understanding to speech recognition errors, the predicted slot values of semantic slot filling are corrected with a rule-based correction model, thereby ensuring the accuracy of spoken language semantic understanding.
In the process of implementing the invention, the inventors found that the related art has at least the following problems:
the drawback of these methods is that the slot filling model and the rule-based error correction model are independent of each other. Because the two models are trained separately, the quality of the correction result is greatly limited by the rule-based error correction model. Error correction, however, should remain a post-processing module and should not overly affect spoken language semantic understanding. This makes spoken language semantic understanding less robust to speech recognition errors.
Disclosure of Invention
The method aims to at least solve the problem in the prior art that the slot filling model and the rule-based error correction model used in spoken language semantic understanding are independent of each other, which makes spoken language understanding poorly robust to speech recognition errors.
In a first aspect, an embodiment of the present invention provides a semantic slot filling model training method, including:
training a first training data set with labels to generate a first semantic slot filling model;
inputting a second training data set of automatic speech recognition into the first semantic slot filling model, and determining a first semantic slot value pair;
correcting the first semantic slot value pair by a rule-based error correction module to determine a second semantic slot value pair, wherein the error correction module corrects the first semantic slot value pair based on preset rules;
and performing policy gradient training on the first semantic slot filling model based on the second semantic slot value pair, and determining a trained second semantic slot filling model.
In a second aspect, an embodiment of the present invention provides a semantic slot filling model training system, including:
the data training program module is used for training a first training data set with labels to generate a first semantic slot filling model;
a semantic slot value pair determining program module, configured to input a second training data set for automatic speech recognition to the first semantic slot filling model, and determine a first semantic slot value pair;
a correcting program module, configured to correct the first semantic slot value pair by using a rule-based error correcting module, and determine a second semantic slot value pair, where the error correcting module corrects the first semantic slot value pair based on a preset rule;
and the semantic slot filling model training program module is used for performing policy gradient training on the first semantic slot filling model based on the second semantic slot value pair to determine a trained second semantic slot filling model.
In a third aspect, an embodiment of the present invention provides an electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the semantic slot filling model training method of any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the steps of the semantic slot filling model training method according to any embodiment of the present invention.
The embodiments of the invention have the following beneficial effects: rule-based error correction is directly introduced into the training method through reinforcement learning and is used for the slot filling task in spoken language semantic understanding. On the one hand, domain knowledge is utilized; on the other hand, the two modules of slot filling and error correction are connected, thereby improving the robustness of semantic understanding to speech recognition errors.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a semantic slot filling model training method according to an embodiment of the present invention;
FIG. 2 is a model architecture diagram of a semantic slot filling model training method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating results of a test set of a semantic slot filling model training method according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating an example of a semantic slot filling model training method according to an embodiment of the present invention;
FIG. 5 is a performance diagram of a semantic slot filling model training method according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a semantic slot filling model training system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a semantic slot filling model training method according to an embodiment of the present invention, which includes the following steps:
s11: training a first training data set with labels to generate a first semantic slot filling model;
s12: inputting a second training data set of automatic speech recognition into the first semantic slot filling model, and determining a first semantic slot value pair;
s13: correcting the first semantic slot value pair by a rule-based error correction module to determine a second semantic slot value pair, wherein the error correction module corrects the first semantic slot value pair based on preset rules;
s14: performing policy gradient training on the first semantic slot filling model based on the second semantic slot value pair, and determining a trained second semantic slot filling model.
In this embodiment, to overcome the defects of the prior art, an error correction module is further introduced into the training process of the slot filling model. Since the correction process is rule-based and non-differentiable, training is performed with the policy gradient method from reinforcement learning. Because the correction module is taken into account during training, the output of the slot filling model becomes better suited to the correction module, which improves the robustness of semantic understanding to speech recognition errors. The method thus comprises two modules: a semantic slot filling model and a rule-based error correction module.
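For illustration only, the overall flow of steps S11 to S14 can be sketched as follows; this is a minimal sketch in Python in which every function name is a hypothetical placeholder, not an interface defined by the invention.

```python
# Hypothetical skeleton of the training method (S11-S14). The helper callables
# (pretrain, rule_based_correction, policy_gradient_update) stand in for the
# components sketched later in this description.
def train_slot_filling_model(labeled_transcripts, asr_hypotheses,
                             pretrain, rule_based_correction,
                             policy_gradient_update):
    model = pretrain(labeled_transcripts)                   # S11: first model
    for utterance in asr_hypotheses:
        first_pairs = model.predict(utterance)              # S12: first slot value pairs
        second_pairs = rule_based_correction(first_pairs)   # S13: second slot value pairs
        policy_gradient_update(model, first_pairs, second_pairs)  # S14: RL update
    return model                                            # trained second model
```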
For step S11, suitable data must be prepared for training the semantic slot filling model, including manually labeled real text and the text of speech recognition hypotheses; both are used during the training phase. The first kind of data is manually labeled real text in which every word is explicitly labeled, so the slot filling task can be cast as a sequence labeling task for training the semantic slot filling model.
As an embodiment, the first training data set with labels is trained via a bidirectional long short-term memory network.
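As a sketch of such a bidirectional long short-term memory tagger, the following minimal example assumes PyTorch; the vocabulary size, label set, and dimensions are illustrative, not prescribed by the invention.

```python
# Illustrative BLSTM slot tagger: each input token receives a "BIO" slot label.
import torch
import torch.nn as nn

class BLSTMTagger(nn.Module):
    def __init__(self, vocab_size, num_labels, emb_dim=200, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bidirectional LSTM; forward and backward states are concatenated.
        self.encoder = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, num_labels)  # linear layer before softmax

    def forward(self, tokens):                   # tokens: (batch, seq_len)
        h, _ = self.encoder(self.embed(tokens))  # h: (batch, seq_len, 2*hidden)
        return self.out(h)                       # per-token label logits

# Supervised training on labeled transcripts with negative log-likelihood.
model = BLSTMTagger(vocab_size=5000, num_labels=7)
loss_fn = nn.CrossEntropyLoss()
tokens = torch.randint(0, 5000, (2, 10))   # toy batch of token ids
labels = torch.randint(0, 7, (2, 10))      # toy BIO label ids
loss = loss_fn(model(tokens).reshape(-1, 7), labels.reshape(-1))
loss.backward()
```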
For step S12, the text of the speech recognition hypotheses, i.e., the text produced by automatic speech recognition, is input to the semantic slot filling model trained in step S11. As shown in the model architecture diagram of FIG. 2, the user says "I want to go to the quiet zone", but due to a speech recognition error this is misrecognized as "I want to go to quiet bay". A wrong first semantic slot value pair is thus obtained; for convenience of representation it is written as a semantic triple, yielding (inform-endpoint-quiet bay). Because of this erroneous slot value pair, "I want to go to quiet bay" cannot receive the correct alignment labels.
For step S13, the error correction module consists of a number of rules that are gradually enriched by continuously collecting speech recognition errors from daily use. The triple (inform-endpoint-quiet bay) obtained from "I want to go to quiet bay" is corrected by the error correction module, recovering the user's original meaning "I want to go to the quiet zone". The resulting slot value pair is (inform-endpoint-quiet zone). The real text is then aligned and labeled, yielding word-level "BIO" labels of the form "O, O, B-inform-endpoint, I-inform-endpoint, I-inform-endpoint".
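A minimal sketch of such a rule table follows, assuming Python; the single rule shown mirrors the "quiet bay" → "quiet zone" example above and is illustrative only, not the patent's actual rule set.

```python
# Hypothetical rule-based error correction module: preset rules map frequently
# misrecognized slot values to corrected values, keyed by (act, slot).
CORRECTION_RULES = {
    ("inform", "endpoint"): {"quiet bay": "quiet zone"},  # invented example rule
}

def correct_triples(triples):
    """Apply the preset rules to predicted (act, slot, value) triples."""
    corrected = []
    for act, slot, value in triples:
        rules = CORRECTION_RULES.get((act, slot), {})
        corrected.append((act, slot, rules.get(value, value)))
    return corrected

print(correct_triples([("inform", "endpoint", "quiet bay")]))
# -> [('inform', 'endpoint', 'quiet zone')]
```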
For step S14, the correction process is a rule-based, non-differentiable process, and it is therefore trained with the policy gradient method from reinforcement learning, which comprises a pre-training phase and an RL-training (reinforcement learning) phase.
It can be seen from this embodiment that rule-based error correction is introduced directly into the training method through reinforcement learning for the slot filling task in spoken language semantic understanding. On the one hand, domain knowledge is utilized; on the other hand, the two modules of slot filling and error correction are connected, thereby improving the robustness of semantic understanding to speech recognition errors.
As an implementation manner, in this embodiment, after determining the trained second semantic slot filling model, the method further includes:
receiving a test data set;
inputting the test data set into the second semantic slot filling model, and determining slot value pairs before correction;
and inputting the slot value pair before correction into the error correction module to obtain a final slot value pair.
In this embodiment, to verify the effect of the trained second semantic slot filling model, a pre-prepared test data set (for example, hypothesis text from speech recognition) is input into the second semantic slot filling model to obtain the slot value pairs before correction. The slot value pairs before correction are then input into the error correction module for correction, yielding the final slot value pairs.
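The test procedure can be sketched as a simple pipeline, with `predict` and `correct` standing in for the trained second model and the error correction module; both names are hypothetical.

```python
# Hedged sketch of the test flow: model prediction followed by rule-based
# correction. `predict` and `correct` are callables supplied by the caller.
def evaluate(predict, correct, test_set):
    final_pairs = []
    for asr_hypothesis in test_set:
        before_correction = predict(asr_hypothesis)     # slot value pairs before correction
        final_pairs.append(correct(before_correction))  # final slot value pairs
    return final_pairs
```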
It can be seen from this embodiment that such testing further verifies the robustness of semantic understanding to speech recognition errors.
To address this issue, the method proposes a policy gradient-based reinforcement learning (RL) approach to optimize the SLU (spoken language understanding) model so that it takes into account the final performance after error correction.
The present method is described by first defining some symbols used hereinafter. Let $r = (r_1, \ldots, r_{|r|})$ and $u = (u_1, \ldots, u_{|u|})$ denote the ASR (Automatic Speech Recognition) best hypothesis text and the real text, respectively; let $y = (y_1, \ldots, y_{|y|})$ denote the sentence-level semantic labels in the form of act(slot=value) triples; and let $o = (o_1, \ldots, o_{|u|})$ denote the word-level labels on $u$ in the "BIO" scheme (B-begin, I-inside, O-outside).
A BLSTM (Bidirectional Long Short-Term Memory) encoder reads the input sequence $x$ ($u$ or $r$) and generates a hidden state at the $t$-th time step, $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$, where $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$ are the forward and backward hidden states. An LSTM decoder recursively updates its hidden state at the $t$-th time step via $s_t = \mathrm{LSTM}(s_{t-1}, \psi(o_{t-1}), c_t)$, where $\psi(\cdot)$ is a label embedding function and $c_t = h_t$ under the focus mechanism, i.e., only the aligned hidden state is considered; $s_0$ is initialized from the final encoder hidden state. The slot tag $o_t$ is then generated via $P(o_t \mid o_{<t}; x) = g(s_t)$, where $g$ denotes a linear layer followed by a Softmax function for classification.
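For illustration only, one decoder step under this focus mechanism can be sketched as below, assuming PyTorch; the dimensions and variable names are hypothetical.

```python
# Sketch of a single decoder step s_t = LSTM(s_{t-1}, psi(o_{t-1}), c_t) with
# the focus mechanism c_t = h_t (the aligned encoder state, not an attention sum).
import torch
import torch.nn as nn

hidden, num_labels, label_dim = 256, 7, 32
decoder = nn.LSTMCell(label_dim + 2 * hidden, hidden)  # input: [psi(o_{t-1}); c_t]
psi = nn.Embedding(num_labels, label_dim)              # label embedding function psi
g = nn.Linear(hidden, num_labels)                      # linear layer before Softmax

c_t = torch.randn(1, 2 * hidden)                       # aligned BLSTM state h_t
prev_label = torch.tensor([0])                         # previous tag o_{t-1}
s_prev = (torch.zeros(1, hidden), torch.zeros(1, hidden))

s_t = decoder(torch.cat([psi(prev_label), c_t], dim=-1), s_prev)
p_ot = torch.softmax(g(s_t[0]), dim=-1)                # P(o_t | o_<t; x)
```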
Assume there is a predicted act(slot=value) triple with predicted value $v$. Denote the corresponding candidate value set in the current domain ontology as $V = (V_1, \ldots, V_{|V|})$. Based on the ontology, an n-gram vocabulary $G_n$ is first constructed. Each value is treated as a word sequence $v = (v_1, \ldots, v_M)$ with n-gram set $v_n = \{(v_i, \ldots, v_{i+n-1}) \mid i = 1, \ldots, M-n+1\}$. Then, a binary-valued feature vector $d = (d_1, \ldots, d_{|G_n|})$ is built for $v$, where $d_i$ indicates whether the $i$-th n-gram of $G_n$ occurs in $v$, and it is normalized as $\hat{d} = d / \lVert d \rVert_2$. Similarly, the candidate value set $V$ can be represented as a feature matrix (after normalization) $\hat{D}$, whose $k$-th column is the feature vector of $V_k$. Therefore, the best candidate value can be found by cosine similarity; the index of the best value is
$$k^{*} = \arg\max_{k} \big(\hat{d}^{\top} \hat{D}\big)_{k}.$$
Since some slots have many possible values in the ontology, computing all similarities as a single matrix multiplication greatly improves efficiency. In practice, $n$ ranges from 1 to 2, so the vocabulary size equals $|G_1| + |G_2|$. A threshold (here 0.5) is set to reject bad selections.
To prune the large search space, the model is pre-trained on labeled real text before the RL training.
Let $D_{tscp}$ denote the real text data with alignment labels. The slot filling model is supervised with the negative log-likelihood loss
$$\mathcal{L}(\theta) = -\sum_{(u, o) \in D_{tscp}} \sum_{t=1}^{|u|} \log P(o_t \mid o_{<t}; u; \theta),$$
where $\theta$ denotes the model parameters.
In the RL training phase, automatic speech recognition hypotheses with unaligned labels are used, denoted $D_{hyp} = \{(r, y)\}$. The slot filling model samples $K$ label sequences via beam search, which are then converted into act(slot=value) triples. Finally, a set of semantic tuples $\hat{y}$ is generated after the EC (error correction) module.
For each input utterance $r$, the reward is considered at both the triple level and the sentence level:
$$R(\hat{y}, y) = -\big(\mathrm{FP}(\hat{y}, y) + \mathrm{FN}(\hat{y}, y)\big) + \mathbb{1}[\hat{y} = y],$$
where the first term penalizes false positives (FP) and false negatives (FN) at the triple level, and the second term is a binary value indicating whether the entire sentence is predicted correctly. The model is optimized by maximizing the expected cumulative reward using policy gradients. The policy gradient can be estimated as
$$\nabla_{\theta} J(\theta) \approx \frac{1}{K} \sum_{k=1}^{K} \big(R(\hat{y}^{(k)}, y) - \bar{R}\big)\, \nabla_{\theta} \log P\big(o^{(k)} \mid r; \theta\big),$$
where $\bar{R}$ is a baseline for reducing the variance of the gradient estimate, obtained by averaging the rewards inside the beam.
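As an illustration of this update, the following minimal sketch assumes PyTorch; the toy log-probabilities and rewards stand in for the beam-searched sequences and their post-correction rewards.

```python
# REINFORCE-style update with a beam-mean baseline. log_probs[k] is the summed
# log-probability of the k-th beam hypothesis; rewards[k] is its reward after
# error correction. All values here are toy placeholders.
import torch

K = 5
log_probs = torch.randn(K, requires_grad=True)      # log P(o^(k) | r; theta)
rewards = torch.tensor([2.0, 1.0, 0.0, -1.0, 0.5])  # R(y_hat^(k), y)

baseline = rewards.mean()                           # average reward inside the beam
loss = -((rewards - baseline) * log_probs).mean()   # negative expected reward
loss.backward()                                     # policy gradient estimate
```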
To stabilize the training process, it is beneficial to train alternately on $D_{tscp}$ and $D_{hyp}$.
Experiments were conducted on the first Chinese Audio-Textual Spoken Language Understanding challenge (CATSLU) dataset, which contains four dialogue domains (map, music, video, weather).
The 200-dimensional character embeddings are initialized by pre-training an LSTM-based bidirectional language model (biLM) on the zhwiki corpus; the LSTM is a single layer with 256 hidden units. During training, parameters are uniformly sampled in the range (-0.2, 0.2), and dropout with probability 0.5 is applied to the non-recurrent layers. Adam is selected as the optimizer, with the learning rate set to 0.001 in pre-training and 0.0005 in RL training, fixed during training. The beam size is set to 5 in the decoding phase. The best model is selected based on performance on the validation set, and the F-score and sentence-level accuracy of act(slot=value) triples are then evaluated.
The main results are compared against different baselines below. In the evaluation phase, error correction is applied in all experiments. The following baselines were studied:
HD: only unaligned data is employed.
Focus: trained on the annotated real text and evaluated on the ASR hypotheses.
UA: unsupervised adaptation, transferring the BLSTM slot filling model trained as in "Focus" from real text to ASR hypotheses.
DA: a data augmentation method in which the training data is augmented with pseudo-aligned ASR hypotheses obtained in two ways: (1) generated by a pre-trained tagging model (Gen); (2) aligned with the real text by minimum edit distance (Align).
FIG. 3 shows the overall results on the test set. The results show that models trained in an end-to-end fashion on unaligned data ("HD") are less effective than models trained on labeled transcripts ("Focus"). The "UA" method transfers from real text to ASR hypotheses and obtains results comparable to "Focus". No gains were found with the "UA" and "DA" methods, possibly due to the noisy dataset. Compared with the "Focus" and "DA" baselines, the proposed model achieves significant improvements except in the music domain (significance level 95% in the video and weather domains and 90% in the map domain).
FIG. 4 gives an example of how RL training benefits slot filling. The baseline model identifies two slot chunks, "company" and "Ganhezi town", separated by the special word "is", which generates erroneous slot value pairs. Through RL training, the SLU model learns to produce outputs that are more suitable for correction.
The effectiveness of each sub-module in the model was studied through ablation experiments. As can be seen from the upper half of the performance diagram in FIG. 5, if only the ASR hypotheses $D_{hyp}$ are used for training (i.e., without "Tscp"), performance degrades due to the lack of the strong supervisory signal from the real text. Without pre-training ("PT"), the performance of the system also decreases (by 0.47% F-score and 0.72% joint accuracy), indicating the importance of pre-training. Furthermore, without any real-text supervision, the average performance drops dramatically, because searching in a large space is difficult.
Fig. 6 is a schematic structural diagram of a semantic slot filling model training system according to an embodiment of the present invention, which can execute the semantic slot filling model training method according to any of the above embodiments and is configured in a terminal.
The semantic slot filling model training system provided by the embodiment comprises: data training program module 11, semantic slot value pair determination program module 12, correction program module 13, and semantic slot filling model training program module 14.
The data training program module 11 is configured to train a first training data set with labels to generate a first semantic slot filling model; the semantic slot value pair determining program module 12 is configured to input the second training data set of automatic speech recognition to the first semantic slot filling model and determine a first semantic slot value pair; the correcting program module 13 is configured to correct the first semantic slot value pair by using the rule-based error correction module and determine a second semantic slot value pair, where the error correction module corrects the first semantic slot value pair based on preset rules; the semantic slot filling model training program module 14 is configured to perform policy gradient training on the first semantic slot filling model based on the second semantic slot value pair and determine a trained second semantic slot filling model.
Further, the system includes a test program module for:
receiving a test data set;
inputting the test data set into the second semantic slot filling model, and determining slot value pairs before correction;
and inputting the slot value pair before correction into the error correction module to obtain a final slot value pair.
Further, the data training program module is to:
training the first training data set with the labels through a bidirectional long short-term memory network.
Further, the semantic slot value pair comprises a semantic triple.
Further, the policy gradient training comprises pre-training and RL (reinforcement learning) training.
The embodiment of the present invention also provides a non-volatile computer storage medium, which stores computer-executable instructions that can execute the semantic slot filling model training method of any of the above method embodiments.
as one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
training a first training data set with labels to generate a first semantic slot filling model;
inputting a second training data set of automatic voice recognition into the first semantic slot filling model, and determining a first semantic slot value pair;
correcting the first semantic slot value pair by an error correction module based on rules to determine a second semantic slot value pair, wherein the error correction module corrects the first semantic slot value pair based on preset rules;
and performing strategy gradient training on the first semantic slot filling model based on the second semantic slot value pair, and determining a trained second semantic slot filling model.
As a non-volatile computer-readable storage medium, it may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the semantic slot filling model training method of any of the method embodiments described above.
The non-volatile computer-readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the device, and the like. Further, the non-volatile computer-readable storage medium may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the non-transitory computer readable storage medium optionally includes memory located remotely from the processor, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides an electronic device, which includes: the apparatus includes at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the semantic slot filling model training method of any of the embodiments of the present invention.
The client of the embodiment of the present application exists in various forms, including but not limited to:
(1) mobile communication devices, which are characterized by mobile communication capabilities and are primarily targeted at providing voice and data communications. Such terminals include smart phones, multimedia phones, functional phones, and low-end phones, among others.
(2) The ultra-mobile personal computer equipment belongs to the category of personal computers, has calculation and processing functions and generally has the characteristic of mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as tablet computers.
(3) Portable entertainment devices: such devices can display and play multimedia content. They include audio and video players, handheld game consoles, electronic books, intelligent toys, and portable vehicle-mounted navigation devices.
(4) Other electronic devices with data processing capabilities.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A semantic slot filling model training method comprises the following steps:
training a first training data set with labels to generate a first semantic slot filling model;
inputting a second training data set of automatic speech recognition into the first semantic slot filling model, and determining a first semantic slot value pair;
correcting the first semantic slot value pair by a rule-based error correction module to determine a second semantic slot value pair, wherein the error correction module corrects the first semantic slot value pair based on preset rules;
and performing policy gradient training on the first semantic slot filling model based on the second semantic slot value pair, and determining a trained second semantic slot filling model.
2. The method of claim 1, wherein after determining the trained second semantic slot filling model, the method further comprises:
receiving a test data set;
inputting the test data set into the second semantic slot filling model, and determining slot value pairs before correction;
and inputting the slot value pair before correction into the error correction module to obtain a final slot value pair.
3. The method of claim 1, wherein the training the labeled first training data set comprises:
training the first training data set with the labels through a bidirectional long short-term memory network.
4. The method of claim 1, wherein the semantic slot value pairs comprise semantic triples.
5. The method of claim 1, wherein the policy gradient training comprises pre-training and RL (reinforcement learning) training.
6. A semantic slot filling model training system, comprising:
the data training program module is used for training a first training data set with labels to generate a first semantic slot filling model;
a semantic slot value pair determining program module, configured to input a second training data set for automatic speech recognition to the first semantic slot filling model, and determine a first semantic slot value pair;
a correcting program module, configured to correct the first semantic slot value pair by using a rule-based error correcting module, and determine a second semantic slot value pair, where the error correcting module corrects the first semantic slot value pair based on a preset rule;
and the semantic slot filling model training program module is used for performing policy gradient training on the first semantic slot filling model based on the second semantic slot value pair to determine a trained second semantic slot filling model.
7. The system of claim 6, wherein the system further comprises a test program module to:
receiving a test data set;
inputting the test data set into the second semantic slot filling model, and determining slot value pairs before correction;
and inputting the slot value pair before correction into the error correction module to obtain a final slot value pair.
8. The system of claim 6, wherein the data training program module is to:
training the first training data set with the labels through a bidirectional long short-term memory network.
9. The system of claim 6, wherein the semantic slot value pairs comprise semantic triples.
10. The system of claim 6, wherein the policy gradient training comprises pre-training and RL (reinforcement learning) training.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010248117.7A CN111462734B (en) | 2020-03-31 | 2020-03-31 | Semantic slot filling model training method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111462734A (en) | 2020-07-28
CN111462734B (en) | 2022-07-26
Family
ID=71684351
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010248117.7A Active CN111462734B (en) | 2020-03-31 | 2020-03-31 | Semantic slot filling model training method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111462734B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110144986A1 (en) * | 2009-12-10 | 2011-06-16 | Microsoft Corporation | Confidence calibration in automatic speech recognition systems |
CN107240398A (en) * | 2017-07-04 | 2017-10-10 | 科大讯飞股份有限公司 | Intelligent sound exchange method and device |
CN108417205A (en) * | 2018-01-19 | 2018-08-17 | 苏州思必驰信息科技有限公司 | Semantic understanding training method and system |
CN108628830A (en) * | 2018-04-24 | 2018-10-09 | 北京京东尚科信息技术有限公司 | A kind of method and apparatus of semantics recognition |
CN108920497A (en) * | 2018-05-23 | 2018-11-30 | 北京奇艺世纪科技有限公司 | A kind of man-machine interaction method and device |
CN108962224A (en) * | 2018-07-19 | 2018-12-07 | 苏州思必驰信息科技有限公司 | Speech understanding and language model joint modeling method, dialogue method and system |
CN110929875A (en) * | 2019-10-12 | 2020-03-27 | 平安国际智慧城市科技股份有限公司 | Intelligent language learning method, system, device and medium based on machine learning |
Non-Patent Citations (1)
Title |
---|
HOU Lixian et al., "Joint recognition of intent and semantic slot filling fusing multiple constraints", Journal of Frontiers of Computer Science & Technology *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111951789A (en) * | 2020-08-14 | 2020-11-17 | 北京达佳互联信息技术有限公司 | Training of speech recognition model, speech recognition method, apparatus, device and medium |
CN111951789B (en) * | 2020-08-14 | 2021-08-17 | 北京达佳互联信息技术有限公司 | Training of speech recognition model, speech recognition method, apparatus, device and medium |
CN112380327A (en) * | 2020-11-09 | 2021-02-19 | 天翼爱音乐文化科技有限公司 | Cold-start slot filling method, system, device and storage medium |
CN112380327B (en) * | 2020-11-09 | 2022-03-04 | 天翼爱音乐文化科技有限公司 | Cold-start slot filling method, system, device and storage medium |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| CB02 | Change of applicant information | Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province; Applicant after: Sipic Technology Co.,Ltd.; Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province; Applicant before: AI SPEECH Co.,Ltd. |
| GR01 | Patent grant | |