CN110795945A - Semantic understanding model training method, semantic understanding device and storage medium


Info

Publication number
CN110795945A
Authority
CN
China
Prior art keywords
semantic understanding
understanding model
training
semantic
sample set
Prior art date
Legal status
Granted
Application number
CN201911047125.9A
Other languages
Chinese (zh)
Other versions
CN110795945B (en)
Inventor
袁刚
赵学敏
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201911047125.9A
Publication of CN110795945A
Application granted
Publication of CN110795945B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a semantic understanding model training method, which comprises the following steps: recalling training samples matching the vehicle-mounted environment from a data source; performing boundary corpus expansion on the noisy sentence samples matching the vehicle-mounted environment; labeling the noisy sentence samples matching the vehicle-mounted environment that have undergone boundary corpus expansion, to form a first training sample set; processing a second training sample set, obtained by denoising the first, through a semantic understanding model; and, according to the updated parameters of the semantic understanding model, iteratively updating the semantic representation layer network parameters and the task-related output layer network parameters of the semantic understanding model through the second training sample set. The invention also provides a semantic understanding method, a semantic understanding device and a storage medium. The method can improve the training precision and training speed of the semantic understanding model, adapt the model to the full-duplex usage scenario of a vehicle-mounted environment, and avoid the influence of environmental noise on the semantic understanding model.

Description

Semantic understanding model training method, semantic understanding device and storage medium
Technical Field
The invention relates to machine learning technology, and in particular to a semantic understanding model training method, a semantic understanding device and a storage medium.
Background
In a full-duplex voice interaction scenario, the following operations must be carried out in a multi-sound-source environment in which several sources emit sound continuously and simultaneously: for example, speaker identity recognition (male, female, child) that triggers dialogs with different content, speech emotion recognition, music/singing voice recognition and the like, as well as background noise recognition and echo cancellation in environment processing. In this process, in a full-duplex dialogue scenario, background noise and Out-of-Domain (OOD) corpora irrelevant to the task, such as other people chatting, are more easily picked up by the assistant; if the intelligent assistant mistakenly responds to such corpora, the interaction success rate drops and the user experience suffers. Therefore, in a full-duplex scenario, especially in a vehicle-mounted environment, the domain intention recognition in the dialog system must be more accurate, and the semantic understanding model must know when to reject (i.e. refuse to respond) and when to respond to what the user says, so as to improve the user experience.
Disclosure of Invention
In view of this, embodiments of the present invention provide a semantic understanding model training method, a semantic understanding method and device, and a storage medium, which make the generalization capability of the semantic understanding model stronger and improve its training precision and training speed. At the same time, by effectively and fully exploiting the gain that existing noisy sentences matching the vehicle-mounted environment bring to model training, a larger number of training samples matching the vehicle-mounted environment can be obtained, so that the semantic understanding model is better targeted at the full-duplex speech scenario of the vehicle-mounted environment, the accuracy of its information processing in the vehicle-mounted environment improves, and the influence of environmental noise on the semantic understanding model is avoided.
The technical scheme of the embodiment of the invention is realized as follows:
The invention provides a semantic understanding model training method, comprising the following steps:
recalling training samples matching the vehicle-mounted environment from a data source;
triggering a corresponding active learning process according to the result of the recall processing, so as to obtain noisy sentence samples matching the vehicle-mounted environment from the data source;
responding to the active learning process and triggering an active exploration process, so as to perform boundary corpus expansion on the noisy sentence samples matching the vehicle-mounted environment;
labeling the noisy sentence samples matching the vehicle-mounted environment that have undergone boundary corpus expansion, to form a first training sample set;
denoising the first training sample set to form a corresponding second training sample set;
processing the second training sample set through a semantic understanding model to determine initial parameters of the semantic understanding model;
in response to the initial parameters of the semantic understanding model, processing the second training sample set through the semantic understanding model and determining updated parameters of the semantic understanding model;
and, according to the updated parameters of the semantic understanding model, iteratively updating the semantic representation layer network parameters and the task-related output layer network parameters of the semantic understanding model through the second training sample set; a sketch of this pipeline is given below.
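The following is a minimal, runnable sketch of this pipeline. All helper names (recall_samples, active_learning_select, expand_boundary_corpus, label_samples, denoise) and their toy bodies are hypothetical stand-ins for the processes named above, not an API defined by the patent:

```python
# Hypothetical sketch of the training-data pipeline; toy logic throughout.
from typing import List, Tuple

def recall_samples(data_source: List[str]) -> List[str]:
    # Step 1: recall candidates that match the vehicle-mounted environment.
    return [s for s in data_source if "navigate" in s or "music" in s]

def active_learning_select(samples: List[str]) -> List[str]:
    # Step 2: keep the most informative noisy sentence samples (toy: keep all).
    return samples

def expand_boundary_corpus(samples: List[str]) -> List[str]:
    # Step 3: active exploration, expanding the boundary corpus.
    return samples + [s + " please" for s in samples]

def label_samples(samples: List[str]) -> List[Tuple[str, int]]:
    # Step 4: labeling to form the first training sample set (toy labels).
    return [(s, int("music" in s)) for s in samples]

def denoise(first_set: List[Tuple[str, int]], threshold: int = 40) -> List[Tuple[str, int]]:
    # Step 5: denoising to form the second training sample set.
    return [(s, y) for s, y in first_set if len(s) <= threshold]

second_set = denoise(label_samples(expand_boundary_corpus(
    active_learning_select(recall_samples(
        ["navigate home", "play some music", "random chatter"])))))
print(second_set)  # steps 6-8 would now fit model parameters on this set
```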
In the foregoing solution, denoising the first training sample set to form a corresponding second training sample set comprises:
determining a dynamic noise threshold matching the usage environment of the semantic understanding model;
and denoising the first training sample set according to the dynamic noise threshold to form a second training sample set matching that threshold.
The invention also provides a semantic understanding method based on the semantic understanding model, comprising the following steps:
acquiring voice instruction information and converting the voice instruction into corresponding recognizable text information;
determining at least one word-level hidden variable corresponding to the recognizable text information through the semantic representation layer network of the semantic understanding model;
determining, through the domain-independent detector network of the semantic understanding model, an object matching the word-level hidden variables according to the at least one word-level hidden variable;
determining, through the domain classification network of the semantic understanding model, the task domain corresponding to the word-level hidden variables according to the at least one word-level hidden variable;
and triggering the corresponding business process according to the object matching the word-level hidden variables and the task domain corresponding to them, so as to complete the task corresponding to the voice instruction information; a sketch of this inference flow is given below.
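A runnable sketch of this inference flow, under stated assumptions: speech has already been converted to text, and the word-level hidden variables, the domain-independent (OOD) detector and the domain classifier are toy stand-ins rather than the trained networks described here:

```python
# Toy stand-in for the inference flow: text -> hidden variables -> OOD check
# -> domain classification -> dispatch. Keyword sets are illustrative.
KNOWN_DOMAINS = {"weather": {"weather", "rain"}, "music": {"play", "song"}}

def understand(text: str) -> dict:
    tokens = text.lower().split()                     # recognizable text information
    hidden = [sum(map(ord, t)) % 97 for t in tokens]  # stand-in word-level hidden variables
    # Domain-independent detector: reject if no domain keyword is present (OOD).
    domain = next((d for d, kw in KNOWN_DOMAINS.items() if kw & set(tokens)), None)
    if domain is None:
        return {"action": "reject"}                   # refuse to respond
    # Domain classification picked the task domain; dispatch the business process.
    return {"action": "dispatch", "domain": domain, "hidden": hidden}

print(understand("will it rain tomorrow"))   # dispatched to the weather domain
print(understand("haha that is so funny"))   # {'action': 'reject'}
```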
The embodiment of the invention also provides a training device of the semantic understanding model, which comprises:
the semantic understanding model training module is used for recalling training samples matched with the vehicle-mounted environment in the data source;
the semantic understanding model training module is used for triggering a corresponding active learning process according to the result of the recall processing, so as to obtain noisy sentence samples from the data source;
the semantic understanding model training module is used for responding to the active learning process and triggering an active exploration process so as to realize boundary corpus expansion processing on the sentence sample with noise matched with the vehicle-mounted environment;
the semantic understanding model training module is used for labeling the noisy sentence samples matching the vehicle-mounted environment that have undergone boundary corpus expansion, to form a first training sample set;
the denoising module is used for denoising the first training sample set to form a corresponding second training sample set;
the semantic understanding model training module is used for processing the second training sample set through a semantic understanding model so as to determine initial parameters of the semantic understanding model;
the semantic understanding model training module is used for responding to the initial parameters of the semantic understanding model, processing the second training sample set through the semantic understanding model and determining the updating parameters of the semantic understanding model;
and the semantic understanding model training module is used for performing iterative updating on semantic representation layer network parameters and task-related output layer network parameters of the semantic understanding model through the second training sample set according to the updating parameters of the semantic understanding model.
In the above-mentioned scheme,
the semantic understanding model training module is used for responding to the active learning process and triggering a text similarity clustering network in an active exploration process, so as to determine the text clustering center of the noisy sentence samples matching the vehicle-mounted environment;
the semantic understanding model training module is used for retrieving the data source according to the text clustering center of the noisy sentence samples matching the vehicle-mounted environment, so as to achieve text augmentation of those noisy sentence samples;
and the semantic understanding model training module is used for triggering a corresponding manifold learning process to perform dimension reduction on the text augmentation result of the noisy sentence samples matching the vehicle-mounted environment, so as to achieve boundary corpus expansion of those samples.
In the above-mentioned scheme,
the denoising module is used for determining a dynamic noise threshold value matched with the using environment of the semantic understanding model;
and the denoising module is used for denoising the first training sample set according to the dynamic noise threshold value to form a second training sample set matched with the dynamic noise threshold value.
In the above-mentioned scheme,
the denoising module is used for determining a fixed noise threshold corresponding to the semantic understanding model;
and the denoising module is used for denoising the first training sample set according to the fixed noise threshold value to form a second training sample set matched with the fixed noise threshold value.
In the above-mentioned scheme,
the semantic understanding model training module is used for substituting different sentence samples from the second training sample set into the loss function corresponding to the task-related output layer network composed of the domain-independent detector network and the domain classification network of the semantic understanding model;
and the semantic understanding model training module is used for taking, when the loss function satisfies the corresponding convergence condition, the corresponding domain-independent detector network parameters and domain classification network parameters in the semantic understanding model as the updated parameters of the semantic understanding model.
In the above-mentioned scheme,
the semantic understanding model training module is used for determining, according to the updated parameters of the semantic understanding model, a second noise parameter matching the second training sample set, the second noise parameter being used to represent the noise value of parallel sentence samples in the second training sample set;
the semantic understanding model training module is used, when the second noise parameter reaches the corresponding noise value threshold,
for iteratively updating the semantic representation layer network parameters and task-related output layer network parameters of the semantic understanding model according to the noise value of the second noise parameter, until the loss function corresponding to the task-related output layer network composed of the domain-independent detector network and the domain classification network of the semantic understanding model satisfies the corresponding convergence condition.
In the above-mentioned scheme,
the semantic understanding model training module is used for responding to the loss function corresponding to the task-related output layer network composed of the domain-independent detector network and the domain classification network of the semantic understanding model,
and for adjusting the parameters of the semantic representation layer network of the semantic understanding model, so that the semantic representation layer network parameters are adapted to the loss function corresponding to the task-related output layer network.
In the above-mentioned scheme,
the semantic understanding model training module is used for carrying out negative example processing on the second training sample set to form a negative example sample set corresponding to the second training sample set, wherein the negative example sample set is used for adjusting the domain-independent detector network parameters and the domain classification network parameters of the semantic understanding model;
the semantic understanding model training module is used for determining corresponding bilingual evaluation understudy (BLEU) values according to the negative example sample set, where the BLEU values serve as supervision parameters for evaluating the semantic understanding results of the semantic understanding model.
In the above-mentioned scheme,
the semantic understanding model training module is used for randomly combining sentences to be output in the domain classification network of the semantic understanding model, to form a negative example sample set corresponding to the first training sample set;
and the semantic understanding model training module is used for performing random deletion or replacement on the sentences to be output in the domain classification network of the semantic understanding model, to form a negative example sample set corresponding to the first training sample set, as sketched below.
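A sketch of this negative-example construction; the number of negatives per sentence and the word-level granularity of deletion and replacement are illustrative assumptions, not details fixed by the patent:

```python
# Build negatives by random combination, deletion, and replacement.
import random

def make_negative_examples(sentences, seed=0):
    rng = random.Random(seed)
    negatives = []
    for s in sentences:
        words = s.split()
        if len(words) > 1:                               # random deletion
            i = rng.randrange(len(words))
            negatives.append(" ".join(words[:i] + words[i + 1:]))
        donor = rng.choice(sentences).split()            # random replacement
        words[rng.randrange(len(words))] = rng.choice(donor)
        negatives.append(" ".join(words))
    if len(sentences) > 1:                               # random combination
        a, b = rng.sample(sentences, 2)
        negatives.append(a + " " + b)
    return negatives

print(make_negative_examples(["play my favourite song", "navigate to the office"]))
```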
In the above-mentioned scheme,
the semantic understanding model training module is used for recalling training samples matched with the vehicle-mounted environment in the data source;
the semantic understanding model training module is used for triggering a corresponding active learning process according to the result of the recall processing, so as to obtain noisy sentence samples from the data source;
and the semantic understanding model training module is used for labeling the sentence samples with noise acquired in the active learning process to form the first training sample set.
In the above-mentioned scheme,
the semantic understanding model training module is used for determining the sample type of the noisy sentence samples;
the semantic understanding model training module is used for ranking the negative examples within the sample types of the sentence samples,
and for configuring corresponding weights for the negative examples according to the ranking result, to form a first training sample set comprising training samples with different weights.
The embodiment of the invention also provides a semantic understanding model processing device, which comprises:
the text conversion module is used for acquiring voice instruction information and converting the voice instruction into corresponding recognizable text information;
the semantic representation layer network module is used for determining at least one word-level hidden variable corresponding to the recognizable text information through a semantic representation layer network of the semantic understanding model;
a domain-independent detector network module, configured to determine, according to the at least one word-level hidden variable, an object that matches the word-level hidden variable, through a domain-independent detector network of the semantic understanding model;
the domain classification network module is used for determining, through the domain classification network of the semantic understanding model, the task domain corresponding to the word-level hidden variables according to the at least one word-level hidden variable;
the information processing module is used for triggering the corresponding business process according to the object matching the word-level hidden variables and the task domain corresponding to the word-level hidden variables, so as to complete the task corresponding to the voice instruction information.
the embodiment of the invention also provides a training device of the semantic understanding model, which comprises:
a memory for storing executable instructions;
and the processor is used for realizing the training method of the semantic understanding model of the preamble when the executable instructions stored in the memory are run.
An embodiment of the present invention further provides a semantic understanding apparatus, where the apparatus includes:
a memory for storing executable instructions;
and the processor is used for realizing the semantic understanding method of the semantic understanding model of the preamble when the executable instructions stored in the memory are run.
An embodiment of the present invention further provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the training method of the semantic understanding model described above, or implement the semantic understanding method described above.
The embodiment of the invention has the following beneficial effects:
the training samples matched with the vehicle-mounted environment in the data source are recalled; triggering a corresponding active learning process according to the recall processing result so as to obtain a noisy statement sample matched with a vehicle-mounted environment in the data source; responding to the active learning process, and triggering an active exploration process to realize boundary corpus expansion processing on the sentence sample with noise matched with the vehicle-mounted environment; labeling the noisy statement samples matched with the vehicle-mounted environment and subjected to boundary corpus expansion processing to form a first training sample set; denoising the first training sample set to form a corresponding second training sample set; processing the second training sample set through a semantic understanding model to determine initial parameters of the semantic understanding model; responding to initial parameters of the semantic understanding model, processing the second training sample set through the semantic understanding model, and determining updated parameters of the semantic understanding model; according to the updating parameters of the semantic understanding model, iterative updating is carried out on semantic expression layer network parameters and task-related output layer network parameters of the semantic understanding model through the second training sample set, so that the generalization capability of the semantic understanding model is stronger, the training precision and the training speed of the semantic understanding model are improved, meanwhile, the gain of the existing noise sentences matched with the vehicle-mounted environment on model training can be effectively and fully utilized, more training samples matched with the vehicle-mounted environment are obtained, the semantic understanding model can achieve better pertinence to the full-duplex voice scene of the vehicle-mounted environment, the accuracy of the semantic understanding model on information processing in the vehicle-mounted environment is improved, and the influence of environmental noise on the semantic understanding model is avoided.
Drawings
Fig. 1 is a schematic view of a usage scenario of a semantic understanding model training method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a semantic understanding model training apparatus according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating semantic understanding results generated by a Seq2Seq model based on RNN in the prior art;
FIG. 4 is an alternative flow chart of the semantic understanding model training method according to the embodiment of the present invention;
FIG. 5 is an alternative structural diagram of a semantic representation layer network model in an embodiment of the invention;
FIG. 6 is a diagram of an alternative word-level machine-readable representation of a semantic representation layer network model in an embodiment of the invention;
FIG. 7 is a diagram of an alternative structure of an encoder in a semantic representation layer network model according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of vector splicing of encoders in a semantic representation layer network model according to an embodiment of the present invention;
FIG. 9 is a diagram illustrating an encoding process of an encoder in a semantic representation layer network model according to an embodiment of the present invention;
FIG. 10 is a diagram illustrating a decoding process of a decoder in a semantic representation layer network model according to an embodiment of the present invention;
FIG. 11 is a diagram illustrating a decoding process of a decoder in a semantic representation layer network model according to an embodiment of the present invention;
FIG. 12 is a diagram illustrating a decoding process of a decoder in a semantic representation layer network model according to an embodiment of the present invention;
FIG. 13 is an alternative statement level machine reading diagram of a semantic representation layer network model in accordance with embodiments of the present invention;
FIG. 14 is an alternative flow chart of the semantic understanding model training method according to the embodiment of the present invention;
FIG. 15 is an alternative flow chart of a semantic understanding model training method according to an embodiment of the present invention;
FIG. 16A is an alternative flow chart of a semantic understanding model training method according to an embodiment of the present invention;
FIG. 16B is a schematic diagram illustrating an optional boundary corpus expansion of the semantic understanding model training method according to the embodiment of the present invention;
fig. 17 is a schematic structural diagram of a semantic understanding model processing apparatus according to an embodiment of the present invention;
FIG. 18 is a schematic flow chart illustrating an alternative semantic understanding method of the semantic understanding model according to an embodiment of the present invention;
FIG. 19 is a schematic view of a usage scenario of a semantic understanding model training method according to an embodiment of the present invention;
FIG. 20 is a schematic view of a usage scenario of a semantic understanding model training method according to an embodiment of the present invention;
FIG. 21 is a schematic view of an alternative processing flow of the semantic understanding model training method provided by the present invention;
fig. 22 is a schematic diagram of an active learning process in a processing process of the semantic understanding model training method according to the embodiment of the present invention;
FIG. 23 is a diagram illustrating an alternative model structure of the semantic understanding model according to an embodiment of the present invention;
FIG. 24 is a schematic diagram of a wake application using a semantic understanding model encapsulated in an in-vehicle system;
FIG. 25 is a schematic diagram of a semantic understanding model for weather lookup encapsulated in an on-board system.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
Before further detailed description of the embodiments of the present invention, the terms and expressions used in the embodiments of the present invention are explained; the following interpretations apply to them.
1) Machine reading comprehension: automatic question-answering technology that takes a text question and related documents as input and produces a text answer as output.
2) BERT: short for Bidirectional Encoder Representations from Transformers, a language model pre-training method that exploits massive amounts of text. It is widely applied to a variety of natural language processing tasks such as text classification, text matching and machine reading comprehension.
3) Artificial neural network: a Neural Network (NN) is a mathematical or computational model that imitates the structure and function of biological neural networks, used in machine learning and cognitive science to estimate or approximate functions.
4) Model parameters: quantities that use general variables to establish relationships between functions and variables. In artificial neural networks, model parameters are typically real-valued matrices.
5) API: short for Application Programming Interface, a set of predefined functions or conventions for connecting the different components of a software system. The goal is to give applications and developers the ability to access a set of routines based on certain software or hardware without having to access native code or understand the details of the internal workings.
6) SDK: short for Software Development Kit, a collection of development tools used to build application software for a specific software package, software framework, hardware platform, operating system and the like; broadly, it includes the related documents, examples and tools that assist in developing a certain class of software.
7) Generative Adversarial Network (GAN): an unsupervised learning method that learns by letting two neural networks play a game against each other; it generally consists of a generator network and a discriminator network. The generator takes random samples from a latent space as input, and its output needs to imitate the real samples in the training set as closely as possible. The discriminator takes either a real sample or the generator's output as input and aims to distinguish the generator's output from real samples as well as possible, while the generator tries to deceive the discriminator as much as possible. The two networks oppose each other and continually adjust their parameters; the final goal is for the discriminator to be unable to judge whether the generator's output is real.
8) Full duplex: in a human-computer interaction scenario, the intelligent assistant does not need to be repeatedly woken up and, based on streaming speech and semantic technology, has the ability to listen while thinking and to be interrupted at any time.
9) Natural language understanding: NLU (Natural Language Understanding); in a dialog system, extracting semantic information from what the user says, including domain intention recognition and slot filling.
10) Multi-task learning: in the field of machine learning, jointly learning and optimizing several related tasks at the same time to achieve better model accuracy than a single task, with the tasks assisting each other by sharing a representation layer; this training method is called Multi-task Learning (Joint Learning).
11) Active learning: in supervised learning, a machine learning model learns the mapping between data and predictions by fitting training data. Active Learning designs a data sampling method that selects, for labeling, the sample data carrying the largest amount of information for the model; compared with random sampling, adding such labeled samples back into the training set gives the model the maximum benefit. A minimal sketch follows.
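A minimal uncertainty-sampling sketch in the spirit of this definition; the confidence scores below are toy stand-ins for a real model's class probabilities:

```python
# Pick the pool samples the current model is least confident about.
def select_for_labeling(model_confidence, pool, budget=2):
    # model_confidence: sample -> max class probability from the model.
    # The least confident samples carry the most information for the model.
    return sorted(pool, key=lambda s: model_confidence(s))[:budget]

conf = {"play music": 0.99, "uh that thing over there": 0.41,
        "navigate home": 0.97, "what about the one before": 0.45}
print(select_for_labeling(conf.get, list(conf)))
# ['uh that thing over there', 'what about the one before']
```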
12) OOD: Out of Domain. Task-oriented dialog systems usually predefine a number of vertical domains (weather lookup, navigation, music and so on) to satisfy the user's task requirements. A user query that falls into none of the task domains is an OOD corpus, such as chit-chat, knowledge Q&A or a semantic understanding error; a query that belongs to some predefined domain is an In-Domain (IND) corpus.
13) FAR: False Acceptance Rate, the ratio of OOD corpora mistakenly recalled by any domain to all OOD corpora. This index reflects the false-response rate of the intelligent assistant; the lower, the better. In a full-duplex scenario this index is strictly constrained and must stay at a very low level.
14) FRR: False Rejection Rate, the ratio of IND corpora not recalled by any domain to all IND corpora. This index reflects the rejection rate of the intelligent assistant; the lower, the better. Both metrics can be computed as sketched below.
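A sketch computing both metrics from a labeled evaluation set; the field names (is_ood, recalled) are illustrative:

```python
# FAR and FRR as defined above: `is_ood` is the gold label, `recalled`
# means some vertical domain accepted the query.
def far_frr(examples):
    ood = [e for e in examples if e["is_ood"]]
    ind = [e for e in examples if not e["is_ood"]]
    far = sum(e["recalled"] for e in ood) / max(len(ood), 1)       # lower is better
    frr = sum(not e["recalled"] for e in ind) / max(len(ind), 1)   # lower is better
    return far, frr

data = [
    {"is_ood": True,  "recalled": True},    # OOD wrongly accepted -> raises FAR
    {"is_ood": True,  "recalled": False},
    {"is_ood": False, "recalled": True},
    {"is_ood": False, "recalled": False},   # IND wrongly rejected -> raises FRR
]
print(far_frr(data))  # (0.5, 0.5)
```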
15) Speech semantic understanding (Speech Translation): also called automatic speech semantic understanding, the technology of using a computer to understand the speech semantics of one natural language as text or speech of another natural language; it generally comprises two stages, semantic understanding and machine semantic understanding.
Fig. 1 is a schematic view of a usage scenario of the semantic understanding model training method provided in an embodiment of the present invention. Referring to fig. 1, a terminal (including terminal 10-1 and terminal 10-2) is provided with a semantic understanding software client through which a user can input a sentence to be semantically understood; the client can also receive the corresponding semantic understanding result and display it to the user. The terminal is connected to the server 200 through the network 300, which may be a wide area network, a local area network, or a combination of the two, and uses wireless links for data transmission.
As an example, the server 200 is configured to lay out the semantic understanding model and train the semantic understanding model, so as to iteratively update semantic representation layer network parameters and task-related output layer network parameters of the semantic understanding model, so as to generate a semantic understanding result for a target sentence to be semantically understood through a semantic representation layer network and task-related output layer network in the semantic understanding model, and display the semantic understanding result corresponding to the sentence to be semantically understood, which is generated by the semantic understanding model, through a terminal (terminal 10-1 and/or terminal 10-2).
Of course, before the semantic understanding model processes the target sentence to be semantically understood and generates the corresponding semantic understanding result, the model needs to be trained, which specifically includes: recalling training samples matching the vehicle-mounted environment from a data source;
triggering a corresponding active learning process according to the result of the recall processing, so as to obtain noisy sentence samples matching the vehicle-mounted environment from the data source;
responding to the active learning process and triggering an active exploration process, so as to perform boundary corpus expansion on the noisy sentence samples matching the vehicle-mounted environment;
labeling the noisy sentence samples matching the vehicle-mounted environment that have undergone boundary corpus expansion, to form a first training sample set;
denoising the first training sample set to form a corresponding second training sample set;
processing the second training sample set through a semantic understanding model to determine initial parameters of the semantic understanding model;
in response to the initial parameters of the semantic understanding model, processing the second training sample set through the semantic understanding model and determining updated parameters of the semantic understanding model;
and, according to the updated parameters of the semantic understanding model, iteratively updating the semantic representation layer network parameters and the task-related output layer network parameters of the semantic understanding model through the second training sample set.
The following describes in detail the structure of the training apparatus for the semantic understanding model according to the embodiment of the present invention. The training apparatus can be implemented in various forms, such as a dedicated terminal with a semantic understanding model training function, or a server with such a function, for example the server 200 in fig. 1. Fig. 2 is a schematic structural diagram of the components of the training apparatus according to an embodiment of the present invention; it can be understood that fig. 2 shows only an exemplary structure rather than the whole structure, and part or all of the structure shown in fig. 2 may be implemented as needed.
The training device of the semantic understanding model provided by the embodiment of the invention comprises: at least one processor 201, memory 202, user interface 203, and at least one network interface 204. The various components in the training apparatus 20 of the semantic understanding model are coupled together by a bus system 205. It will be appreciated that the bus system 205 is used to enable communications among the components. The bus system 205 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 205 in fig. 2.
The user interface 203 may include, among other things, a display, a keyboard, a mouse, a trackball, a click wheel, a key, a button, a touch pad, or a touch screen.
It will be appreciated that the memory 202 can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The memory 202 in embodiments of the present invention is capable of storing data to support operation of the terminal (e.g., 10-1). Examples of such data include: any computer program, such as an operating system and application programs, for operating on a terminal (e.g., 10-1). The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application program may include various application programs.
In some embodiments, the training apparatus for semantic understanding models provided in the embodiments of the present invention may be implemented by a combination of hardware and software, and as an example, the training apparatus for semantic understanding models provided in the embodiments of the present invention may be a processor in the form of a hardware decoding processor, which is programmed to execute the training method for semantic understanding models provided in the embodiments of the present invention. For example, a processor in the form of a hardware decoding processor may employ one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
As an example of the training device of the semantic understanding model provided by the embodiment of the present invention implemented by combining software and hardware, the training device of the semantic understanding model provided by the embodiment of the present invention may be directly embodied as a combination of software modules executed by the processor 201, the software modules may be located in a storage medium, the storage medium is located in the memory 202, the processor 201 reads executable instructions included in the software modules in the memory 202, and the training device of the semantic understanding model provided by the embodiment of the present invention completes the training method of the semantic understanding model provided by the embodiment of the present invention in combination with necessary hardware (for example, including the processor 201 and other components connected to the bus 205).
By way of example, the Processor 201 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor or the like.
As an example of the training apparatus for semantic understanding models provided by the embodiments of the present invention implemented by hardware, the apparatus provided by the embodiments of the present invention may be implemented by directly using a processor 201 in the form of a hardware decoding processor, for example, by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components, to implement the training method for semantic understanding models provided by the embodiments of the present invention.
The memory 202 in the embodiment of the present invention is used to store various types of data to support the operation of the training apparatus 20 of the semantic understanding model. Examples of such data include: any executable instructions for operating on the training apparatus 20 of the semantic understanding model; a program implementing the training method of the semantic understanding model of the embodiment of the present invention may be contained in these executable instructions.
In other embodiments, the training apparatus for the semantic understanding model provided by the embodiments of the present invention may be implemented in software. Fig. 2 illustrates the training apparatus stored in the memory 202, which may be software in the form of programs, plug-ins and the like, and which includes a series of modules; as an example of the programs stored in the memory 202, the training apparatus of the semantic understanding model includes the following software modules: a semantic understanding model training module 2081 and a denoising module 2082. When the software modules in the training apparatus are read into RAM by the processor 201 and executed, the training method of the semantic understanding model provided by the embodiment of the present invention is implemented. The functions of each software module are described below, wherein,
the semantic understanding model training module 2081 is used for recalling training samples matching the vehicle-mounted environment from the data source;
the semantic understanding model training module 2081 is configured to trigger a corresponding active learning process according to the result of the recall processing, so as to obtain noisy sentence samples from the data source;
the semantic understanding model training module 2081 is configured to respond to the active learning process and trigger an active exploration process, so as to perform boundary corpus expansion on the noisy sentence samples matching the vehicle-mounted environment;
the semantic understanding model training module 2081 is used for labeling the noisy sentence samples matching the vehicle-mounted environment that have undergone boundary corpus expansion, to form a first training sample set;
a denoising module 2082, configured to perform denoising processing on the first training sample set to form a corresponding second training sample set;
the semantic understanding model training module 2081, configured to process the second training sample set through a semantic understanding model to determine initial parameters of the semantic understanding model;
the semantic understanding model training module 2081, configured to respond to the initial parameter of the semantic understanding model, process the second training sample set through the semantic understanding model, and determine an updated parameter of the semantic understanding model;
the semantic understanding model training module 2081 is configured to iteratively update semantic representation layer network parameters and task-related output layer network parameters of the semantic understanding model through the second training sample set according to the update parameters of the semantic understanding model.
Before describing the training method of the semantic understanding model provided by the embodiment of the present invention, the process by which the semantic understanding model in the present application generates the corresponding semantic understanding result from a sentence to be semantically understood is first described. Fig. 3 is a schematic diagram of generating a semantic understanding result in a conventional scheme: the seq2seq model is an architecture represented by an encoder (Encode) and a decoder (Decode), and it generates an output sequence Y from an input sequence X. In the seq2seq model, the encoder converts the input sequence into a fixed-length vector, and the decoder decodes this fixed-length vector into the output sequence. As shown in fig. 3, the encoder encodes the input sentence to be semantically understood to obtain the text features of that sentence; the decoder then decodes the text features and produces the corresponding semantic understanding result, with the encoder and decoder in one-to-one correspondence.
It can be seen that the Seq2Seq-based semantic understanding model of the related art shown in fig. 3 has the following disadvantages: the model establishes only a one-to-one relationship between the target text y and the label information in the training data, and optimizing it with MLE leads it to generate many high-frequency, generic replies that are often meaningless and short. Meanwhile, in many practical scenarios the same target text y can carry several kinds of label information; because the encoder and decoder correspond one-to-one, the existing Seq2Seq model cannot handle this one-to-many problem effectively, is easily disturbed by noise information, triggers useless recognition, and gives a poor user experience. A toy illustration of the fixed-length encoding bottleneck follows.
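A toy numeric illustration of that bottleneck, assuming a simple RNN-style update; the embeddings and the update rule are arbitrary stand-ins, not the prior-art model's actual parameters:

```python
# An RNN-style encoder squeezes the whole input into one fixed-length vector;
# the decoder sees nothing else, so each input is bound to exactly one output.
import numpy as np

def encode(tokens, dim=8, seed=0):
    rng = np.random.default_rng(seed)
    emb = {t: rng.standard_normal(dim) for t in sorted(set(tokens))}
    h = np.zeros(dim)
    for t in tokens:                    # final hidden state = fixed-length code
        h = np.tanh(0.5 * h + emb[t])
    return h

code = encode("turn on the radio".split())
print(code.shape)  # (8,): the only information the decoder receives
```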
To solve the defects in the related art, referring to fig. 4, fig. 4 is an optional flowchart of the semantic understanding model training method provided in the embodiment of the present invention, and it can be understood that the steps shown in fig. 4 may be executed by various electronic devices operating the semantic understanding model training apparatus, such as a dedicated terminal with a sample generating function, a server with a semantic understanding model training function, or a server cluster. The following is a description of the steps shown in fig. 4.
Step 401: the semantic understanding model training device acquires a first training sample set, where the first training sample set consists of noisy sentence samples acquired through an active learning process.
In some embodiments of the present invention, the first training sample set may consist of language samples of the same language or of different languages; this is not limited here. The language of the first training sample set can be set according to the actual translation requirements. For example, when the translation model is applied to a Chinese-English translation scenario, the language of the first training sample set may be Chinese; when applied to an English-language translation scenario, it may be English; and when applied to a Chinese-French translation scenario, it may include Chinese and/or French.
In some embodiments of the present invention, the first training sample set may take the form of speech or of text. A first training sample set in text form and/or speech form may be collected in advance, for example through ordinary sentence collection, and stored in a preset storage device, so that when the translation model is trained, the first training sample set can be obtained from that storage device.
Step 402: and denoising the first training sample set to form a corresponding second training sample set.
In some embodiments of the present invention, the denoising processing on the first training sample set to form the corresponding second training sample set may be implemented by:
determining a dynamic noise threshold matching the usage environment of the semantic understanding model; and denoising the first training sample set according to the dynamic noise threshold to form a second training sample set matching that threshold. Because the usage environments of the translation model differ, the dynamic noise threshold matching the usage environment differs as well; for example, in an academic translation environment, the dynamic noise threshold needs to be smaller than in an article-reading environment.
In some embodiments of the present invention, the denoising processing on the first training sample set to form the corresponding second training sample set may be implemented by:
determining a fixed noise threshold corresponding to the semantic understanding model; and denoising the first training sample set according to the fixed noise threshold to form a second training sample set matching that threshold. When the translation model is solidified in a corresponding hardware mechanism, such as a vehicle-mounted terminal, and the usage environment is spoken-language translation, the noise is relatively uniform, so fixing the noise threshold corresponding to the translation model can effectively speed up the training of the model and reduce the user's waiting time. A sketch covering both the dynamic and the fixed threshold variants follows.
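A sketch covering both variants; the environment names, threshold values and noise scores are illustrative assumptions, not values given in the patent:

```python
# Denoise the first training sample set with either a dynamic threshold
# (chosen per usage environment) or a fixed one, yielding the second set.
DYNAMIC_THRESHOLDS = {"academic": 0.2, "reading": 0.5, "vehicle": 0.35}

def denoise(samples, environment=None, fixed_threshold=0.4):
    # samples: list of (sentence, noise_score) pairs.
    threshold = DYNAMIC_THRESHOLDS.get(environment, fixed_threshold)
    return [s for s, noise in samples if noise <= threshold]

first_set = [("navigate to the office", 0.1), ("background chatter ...", 0.8)]
print(denoise(first_set, environment="vehicle"))   # dynamic threshold
print(denoise(first_set))                          # fixed threshold
```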
Step 403: the semantic understanding model training device processes the second training sample set through a semantic understanding model to determine initial parameters of the semantic understanding model.
Step 404: and the semantic understanding model training device responds to the initial parameters of the semantic understanding model, processes the second training sample set through the semantic understanding model and determines the updating parameters of the semantic understanding model.
In some embodiments of the present invention, determining the updated parameters of the semantic understanding model by processing the second training sample set through the semantic understanding model in response to its initial parameters may be implemented as follows:
substituting different sentence samples from the second training sample set into the loss function corresponding to the task-related output layer network composed of the domain-independent detector network and the domain classification network of the semantic understanding model; and, when the loss function satisfies the corresponding convergence condition, taking the corresponding domain-independent detector network parameters and domain classification network parameters as the updated parameters of the semantic understanding model. Here the semantic understanding model may be composed of a semantic representation layer network and a task-related output layer network, the latter further comprising the domain-independent detector network and the domain classification network. A generic sketch of such a convergence-driven parameter update follows.
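A generic sketch of a convergence-driven update under a stand-in quadratic loss; the real loss would be the one of the task-related output layer network described above:

```python
# Iterate gradient steps until the convergence condition is met, then take
# the current weights as the model's updated parameters.
import numpy as np

def fit(loss_grad, w, lr=0.1, tol=1e-6, max_steps=1000):
    for _ in range(max_steps):
        g = loss_grad(w)
        if np.linalg.norm(g) < tol:       # convergence condition met
            break
        w = w - lr * g
    return w                              # updated parameters

w_star = fit(lambda w: 2 * (w - 3.0), np.array([0.0]))
print(w_star)  # approximately [3.0]
```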
In some embodiments of the invention, the semantic representation layer network may be a bidirectional attention neural network model (BERT, Bidirectional Encoder Representations from Transformers). With continuing reference to fig. 5, fig. 5 is an optional schematic structural diagram of the semantic representation layer network model in an embodiment of the present invention, where the encoder consists of N = 6 identical layers, each containing two sub-layers: first a multi-head attention layer, then a simple fully connected layer. A residual connection and normalization are applied around each sub-layer.
The decoder likewise consists of N = 6 identical layers, but its layers differ from the encoder's: each contains three sub-layers, namely a self-attention layer, an encoder-decoder attention layer, and finally a fully connected layer. The first two sub-layers are based on multi-head attention.
With continued reference to fig. 6, fig. 6 is an alternative word-level machine reading diagram of the semantic representation layer network model according to an embodiment of the present invention, in which the encoder and decoder parts each comprise 6 layers. The input to the first encoder layer combines the word embedding and the positional embedding. After passing through the 6 encoder layers, the output is fed to every decoder layer of the decoder part; for the input "I am a student", the semantic representation layer network model processes it and outputs the machine reading result: "student".
With continuing reference to fig. 7, fig. 7 is an alternative structural diagram of the encoder in the semantic representation layer network model according to an embodiment of the present invention, where the input consists of queries (Q) and keys (K) of dimension d and values (V) of dimension d; the dot products of the query with all keys are computed, scaled, and passed through a softmax function to obtain the weights on the values.
With continued reference to fig. 7, Q, K and V are obtained by multiplying the input vector x of the encoder by W^Q, W^K and W^V. In the paper, the dimensions of W^Q, W^K and W^V are (512, 64); suppose the input dimension is (m, 512), where m is the number of words. Then the dimensions of Q, K and V obtained after multiplying the input vector by W^Q, W^K and W^V are (m, 64).
With continued reference to fig. 8, fig. 8 is a schematic diagram of vector concatenation of an encoder in a semantic representation layer network model according to an embodiment of the present invention, where Z0 to Z7 are the corresponding 8 parallel heads (each of dimension (m, 64)); concatenating the 8 heads yields a matrix of dimension (m, 512). After a final multiplication with W^O, the output matrix has dimension (m, 512), consistent with the dimension expected by the next encoder.
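The 8-head concatenation described above can then be sketched as follows, again in NumPy with placeholder weights; the head count and dimensions are taken from the text:

```python
import numpy as np

def attn(Q, K, V):
    # single-head scaled dot-product attention, as in the previous sketch
    s = Q @ K.T / np.sqrt(K.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ V

rng = np.random.default_rng(0)
m, d_model, h, d_head = 10, 512, 8, 64          # 8 heads of size 64

x = rng.normal(size=(m, d_model))
W_Q = rng.normal(size=(h, d_model, d_head)) * 0.02
W_K = rng.normal(size=(h, d_model, d_head)) * 0.02
W_V = rng.normal(size=(h, d_model, d_head)) * 0.02
W_O = rng.normal(size=(h * d_head, d_model)) * 0.02

heads = [attn(x @ W_Q[i], x @ W_K[i], x @ W_V[i]) for i in range(h)]  # Z0..Z7
Z = np.concatenate(heads, axis=-1)   # concat 8 heads -> (m, 512)
out = Z @ W_O                        # (m, 512), matches the next encoder's input
```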
With continued reference to fig. 9, fig. 9 is a schematic diagram illustrating the encoding process of an encoder in a semantic representation layer network model according to an embodiment of the present invention: x1 passes through self-attention to the state z1; the tensor that has passed through self-attention then undergoes a residual connection and LayerNorm processing before entering a fully connected feed-forward network, and the feed-forward network output undergoes the same operations, residual processing and normalization. The tensor finally output can enter the next encoder; this is iterated 6 times, and the result of the iterative processing enters the decoder.
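The encoding step just described (self-attention, residual connection plus LayerNorm, feed-forward network with the same treatment, stacked six times) can be sketched as a PyTorch module; the feed-forward width of 2048 is the conventional Transformer choice and is an assumption here, not a value given by the embodiment:

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        z, _ = self.attn(x, x, x)        # self-attention: Q = K = V = x
        x = self.norm1(x + z)            # residual connection + LayerNorm
        x = self.norm2(x + self.ffn(x))  # feed-forward with the same treatment
        return x

# iterate through N = 6 identical encoder layers before the decoder
encoder = nn.Sequential(*[EncoderLayer() for _ in range(6)])
out = encoder(torch.randn(1, 10, 512))   # (batch, words, d_model)
```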
With continuing reference to fig. 10, fig. 10 is a schematic diagram illustrating the decoding process of a decoder in a semantic representation layer network model according to an embodiment of the present invention, where the decoder's input, output and decoding process are as follows:
Output: the probability distribution of the output word corresponding to position i;
Input: the output of the encoder & the output of the decoder at position i-1. Therefore the middle attention sub-layer is not self-attention: its K and V come from the encoder, while Q comes from the output of the decoder at the previous position.
With continuing reference to fig. 11 and 12, fig. 11 is a schematic diagram of the decoding process of a decoder in a semantic representation layer network model according to an embodiment of the present invention: the vector output by the last decoder of the decoder network passes through a Linear layer and a softmax layer. Fig. 12 is a schematic diagram of the same decoding process, where the Linear layer maps the vector from the decoder portion into a logits vector, the softmax layer then converts the logits vector into probability values, and finally the position of the maximum probability value is found, completing the output of the decoder.
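A minimal PyTorch sketch of this final projection; the vocabulary size is an assumed value for illustration only:

```python
import torch
import torch.nn as nn

vocab_size, d_model = 30000, 512            # vocab size is assumed
linear = nn.Linear(d_model, vocab_size)     # maps decoder vector to logits

decoder_out = torch.randn(1, d_model)       # vector from the last decoder
logits = linear(decoder_out)                # the "logits vector"
probs = torch.softmax(logits, dim=-1)       # converted to probability values
next_token = probs.argmax(dim=-1)           # position of the maximum value
```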
In some embodiments of the invention, the first reading semantic annotation network may likewise be a bidirectional attention neural network model (BERT, Bidirectional Encoder Representations from Transformers). With continuing reference to fig. 5, fig. 5 is an optional schematic structural diagram of a semantic representation layer network model in an embodiment of the present invention, where the Encoder includes N = 6 identical layers, each layer containing two sub-layers: the first sub-layer is a multi-head attention layer, followed by a simple fully connected feed-forward layer. Each sub-layer is wrapped with a residual connection and layer normalization.
The Decoder likewise consists of N = 6 identical layers, but unlike the encoder layers, each decoder layer contains three sub-layers: a self-attention layer, an encoder-decoder attention layer, and finally a fully connected layer. The first two sub-layers are both based on multi-head attention.
With continuing reference to fig. 13, fig. 13 is an alternative sentence-level machine reading representation of the semantic representation layer network model according to an embodiment of the present invention, in which the encoder and decoder portions each include 6 encoders and 6 decoders. The input to the first encoder combines the token embedding and the positional embedding; after passing through the 6 encoders, the output is fed to each decoder of the decoder portion. For the English input "I am a student", after processing by the semantic representation layer network model, the output sentence-level machine reading result is: "I am a student".
Step 405: the semantic understanding model training device iteratively updates the semantic representation layer network parameters and the task-related output layer network parameters of the semantic understanding model through the second training sample set according to the updated parameters of the semantic understanding model.
Of course, the BERT model in the present invention may also be replaced by a bidirectional long short-term memory network model (Bi-LSTM, Bi-directional Long Short-Term Memory), a gated recurrent unit network model (GRU, Gated Recurrent Unit), an ELMo model (Embeddings from Language Models), a GPT model, or a GPT-2 model, which are not described in detail herein.
With continuing reference to fig. 14, fig. 14 is an optional schematic flow chart of the semantic understanding model training method according to the embodiment of the present invention, and it can be understood that the steps shown in fig. 14 can be executed by various electronic devices operating the semantic understanding model training apparatus, for example, a dedicated terminal with a semantic understanding model training function, a server with a semantic understanding model training function, or a server cluster. The following is a description of the steps shown in fig. 14.
Step 1401: the semantic understanding model training device determines a second noise parameter matched with the second training sample set through the updated parameters of the semantic understanding model.
Wherein the second noise parameter is used to characterize the noise value of the parallel statement samples in the second training sample set; the weight of each training sample in the second training sample set is the same, and these equally weighted training samples may be referred to as parallel statement samples.
Step 1402: when the second noise parameter reaches the corresponding noise value threshold, the semantic understanding model training device iteratively updates the semantic representation layer network parameters and the task-related output layer network parameters of the semantic understanding model according to the noise value of the second noise parameter, until the loss function corresponding to the task-related output layer network formed by the domain-independent detector network and the domain classification network of the semantic understanding model meets the corresponding convergence condition.
Step 1403: and the semantic understanding model training device responds to a loss function corresponding to a task-related output layer network formed by a domain-independent detector network and a domain classification network of the semantic understanding model.
Step 1404: and the semantic understanding model training device adjusts parameters of a semantic representation layer network of the semantic understanding model.
Therefore, the parameters of the semantic representation layer network are adapted to the loss function corresponding to the task-related output layer network.
Wherein the loss function of the encoder network is expressed as:
loss_A = Σ (decoder_A(encoder(warp(x1))) - x1)², where decoder_A is decoder A, warp is the warping function applied to the sentence to be recognized, x1 is the sentence to be recognized, and encoder is the encoder.
In the iterative training process, the sentence to be recognized is substituted into the loss function of the encoder network, the parameters of encoder A and decoder A are solved by descending the loss function along its gradient (for example, the direction of maximum gradient), and when the loss function converges (that is, when the word-level hidden variables corresponding to the sentence to be recognized can be formed), the training ends.
In the training process of the encoder network, the loss function of the encoder network is expressed as: loss_B = Σ (decoder_B(encoder(warp(x2))) - x2)², where decoder_B is decoder B, warp is the warping function applied to the sentence to be recognized, x2 is the sentence to be recognized, and encoder is the encoder.
In the iterative training process, the sentence to be recognized is substituted into the loss function of the encoder network and the parameters of encoder B and decoder B are solved by descending the loss function along its gradient (for example, the direction of maximum gradient); when the loss function converges (that is, when decoding yields the selection probability of the translation result corresponding to the sentence to be recognized), the adaptation and training end.
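The two training loops above share the same shape: substitute the sentence into the reconstruction loss and descend its gradient until convergence. The sketch below illustrates this for loss_A with hypothetical linear encoder/decoder modules and an identity placeholder for warp, since the embodiment does not define these components concretely:

```python
import torch

def reconstruction_loss(decoder, encoder, warp, x):
    # loss = sum((decoder(encoder(warp(x))) - x)^2), as for loss_A / loss_B
    return ((decoder(encoder(warp(x))) - x) ** 2).sum()

encoder = torch.nn.Linear(512, 256)     # hypothetical encoder
decoder_a = torch.nn.Linear(256, 512)   # hypothetical decoder A
warp = lambda x: x                      # identity placeholder for warp

params = list(encoder.parameters()) + list(decoder_a.parameters())
opt = torch.optim.SGD(params, lr=1e-3)
x1 = torch.randn(8, 512)                # batch of sentence representations

for _ in range(100):                    # iterate until the loss converges
    opt.zero_grad()
    loss = reconstruction_loss(decoder_a, encoder, warp, x1)
    loss.backward()                     # gradient of loss_A
    opt.step()                          # descend along the gradient
```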
With continuing reference to fig. 15, fig. 15 is an optional schematic flow chart of the semantic understanding model training method according to the embodiment of the present invention, and it can be understood that the steps shown in fig. 15 may be executed by various electronic devices operating the semantic understanding model training apparatus, for example, a dedicated terminal with a semantic understanding model training function, a server with a semantic understanding model training function, or a server cluster. The following is a description of the steps shown in fig. 15.
Step 1501: the semantic understanding model training device carries out negative example processing on the second training sample set to form a negative example sample set corresponding to the second training sample set.
Wherein the negative example sample set is used to adjust the domain-independent detector network parameters and the domain classification network parameters of the semantic understanding model.
In some embodiments of the present invention, the negative example processing on the first training sample set may be implemented by:
randomly combining statements to be output in a domain classification network of the semantic understanding model to form a negative example sample set corresponding to the first training sample set; or,
and carrying out random deletion processing or replacement processing on the sentences to be output in the domain classification network of the semantic understanding model to form a negative example sample set corresponding to the first training sample set.
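A minimal sketch of the two negative-example constructions just listed (random combination, and random deletion or replacement); the whitespace tokenization and the mix of operations are illustrative assumptions:

```python
import random

def make_negative_samples(sentences, n=1000, seed=0):
    rng = random.Random(seed)
    negatives = []
    for _ in range(n):
        s = rng.choice(sentences).split()
        op = rng.choice(["combine", "delete", "replace"])
        if op == "combine":                       # randomly combine sentences
            other = rng.choice(sentences).split()
            s = s[:rng.randrange(1, len(s) + 1)] + other[rng.randrange(len(other)):]
        elif op == "delete" and len(s) > 1:       # random deletion
            del s[rng.randrange(len(s))]
        else:                                     # random replacement
            donor = rng.choice(sentences).split()
            s[rng.randrange(len(s))] = rng.choice(donor)
        negatives.append(" ".join(s))
    return negatives
```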
Step 1502: the semantic understanding model training device determines a corresponding bilingual evaluation understudy (BLEU) value according to the negative example sample set. When the usage scenario of the full-duplex voice interaction to which the semantic understanding model is applied is a non-Chinese usage scenario (a usage scenario of English or another single language, or a usage scenario including sound sources in at least two languages), the BLEU value determined according to the negative example sample set may be used as a supervision parameter to evaluate the semantic understanding results of the semantic understanding model.
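As one possible realization of this supervision signal, the BLEU score between a model output and a reference can be computed with NLTK; the token lists below are invented for illustration:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["how", "is", "the", "weather", "today"]    # assumed reference
hypothesis = ["how", "the", "weather", "is", "today"]   # assumed model output

smooth = SmoothingFunction().method1
bleu = sentence_bleu([reference], hypothesis, smoothing_function=smooth)
# bleu can then serve as a supervision parameter for evaluating the
# semantic understanding results in non-Chinese or mixed-language scenarios
```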
In some embodiments of the present invention, the encoder and the corresponding decoder of the semantic representation layer network may be bidirectional network models; for example, bidirectional GRU (Bi-GRU) models may be used as the corresponding encoder and decoder, where the Bi-GRU model is a model capable of recognizing inverted sentence structures. When a user inputs a dialogue sentence, the dialogue sentence may have an inverted structure, that is, a structure different from the normal sentence structure; for example, the dialogue sentence input by the user is "how today weather" while the normal sentence structure is "how is the weather today". Using a Bi-GRU model to recognize dialogue sentences with inverted structures enriches the functions of the trained model and improves the robustness of the target model obtained by the final training.
With reference to fig. 16A in conjunction with the foregoing fig. 3, fig. 16A is an optional flowchart of the semantic understanding model training method provided in the embodiment of the present invention, and it can be understood that the steps shown in fig. 16A may be executed by various electronic devices operating the semantic understanding model training apparatus, for example, a dedicated terminal with a semantic understanding model training function, a server with a semantic understanding model training function, or a server cluster, so as to obtain corresponding training samples, and the following description is made with respect to the steps shown in fig. 16A.
Step 1601: the semantic understanding model training device recalls training samples matched with the vehicle-mounted environment in the data source.
For example, the semantic understanding model provided by the invention may be packaged as a software module in vehicle-mounted electronic equipment, may be packaged in different smart home appliances (including but not limited to a speaker, a television, a refrigerator, an air conditioner, a washing machine and a kitchen range), and may also be embedded in the hardware of an intelligent robot; according to the different usage scenarios of the semantic understanding model, corresponding training samples can be used to train it specifically.
Step 1602: and triggering a corresponding active learning process by the semantic understanding model training device according to the recall processing result so as to obtain the noisy statement sample matched with the vehicle-mounted environment in the data source.
Step 1603: and the semantic understanding model training device responds to the active learning process and triggers an active exploration process so as to realize the boundary corpus expansion processing of the statement sample with noise matched with the vehicle-mounted environment.
Step 1604: the semantic understanding model training device labels the noisy statement samples which are subjected to boundary corpus expansion processing and matched with the vehicle-mounted environment to form a first training sample set.
In some embodiments of the present invention, the triggering of an active exploration process in response to the active learning process, to implement boundary corpus expansion processing on the noisy statement samples matched with the vehicle-mounted environment, may be implemented as follows:
responding to the active learning process, triggering a text similarity clustering network in the active exploration process to determine a text clustering center of the noisy statement samples matched with the vehicle-mounted environment; retrieving the data source according to that text clustering center to realize text augmentation of the noisy statement samples matched with the vehicle-mounted environment; and, according to the text augmentation result, triggering a corresponding manifold learning process to perform dimension reduction processing on the text augmentation result, so as to realize boundary corpus expansion of the noisy statement samples matched with the vehicle-mounted environment.

Referring to fig. 16B, fig. 16B is an optional boundary corpus expansion schematic diagram of the semantic understanding model training method provided by the embodiment of the present invention. The text similarity clustering network in the active exploration process determines the text clustering center of the noisy statement samples matched with the vehicle-mounted environment, and the data source is retrieved to obtain statement samples associated with those noisy statement samples, which can effectively increase their number. However, in the process of expanding the training sample sentences, the dimensionality of the training samples also increases, so the text augmentation result is subjected to dimension reduction by a manifold learning process. This reduces the influence of the data dimensionality on the training accuracy of the semantic understanding model in the subsequent model training, reduces the training difficulty, and shortens the user's waiting time.
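The three stages (clustering the noisy samples, retrieving associated sentences from the data source, then reducing dimensionality by manifold learning) could be prototyped with off-the-shelf scikit-learn components, as in the sketch below; the TF-IDF features, KMeans clustering, cosine retrieval and t-SNE reduction are stand-ins chosen for illustration, and the sentences are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.metrics.pairwise import cosine_similarity

noisy_samples = ["navigate to the gas station", "play some music louder",
                 "turn the music up please"]
data_source = ["navigate home", "find a gas station nearby", "turn up music",
               "what is the weather", "open the window"]

vec = TfidfVectorizer().fit(noisy_samples + data_source)

# 1) text-similarity clustering: centers of the noisy statement samples
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vec.transform(noisy_samples))

# 2) retrieval: pull the data-source sentences closest to each center
sims = cosine_similarity(km.cluster_centers_, vec.transform(data_source))
augmented = [data_source[i] for i in sims.argmax(axis=1)]

# 3) manifold learning: dimension reduction of the augmented sample set
Z = vec.transform(noisy_samples + augmented).toarray()
low_dim = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(Z)
```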
In some embodiments of the present invention, the labeling of the noisy sentence samples obtained in the active learning process to form the first training sample set may be implemented by:
determining the sample type of the noisy statement samples; sorting the negative example samples within the sample type of the statement samples, and configuring corresponding weights for the negative example samples according to the sorting result, so as to form a first training sample set comprising training samples with different weights.
As described in detail below, the semantic understanding model processing apparatus according to the embodiment of the present invention may be implemented in various forms, such as a dedicated terminal capable of running a semantic understanding model, or a server with a semantic understanding function, so as to generate a corresponding processing result from a voice instruction received by an application program in the terminal (e.g., the server 200 in fig. 1 in the foregoing). Fig. 17 is a schematic diagram of the constituent structure of a semantic understanding model processing apparatus according to an embodiment of the present invention; it should be understood that fig. 17 only shows an exemplary structure of the apparatus, not its whole structure, and part or all of the structure shown in fig. 17 may be implemented as needed.
The semantic understanding model processing apparatus provided by the embodiment of the invention includes: at least one processor 1301, a memory 1302, a user interface 1303, and at least one network interface 1304. The various components in the semantic understanding model processing apparatus 130 are coupled together by a bus system 1305. It will be appreciated that the bus system 1305 is used to implement connection and communication between these components. In addition to a data bus, the bus system 1305 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as the bus system 1305 in fig. 17.
The user interface 1303 may include a display, a keyboard, a mouse, a trackball, a click wheel, a key, a button, a touch pad, a touch screen, or the like, among others.
It will be appreciated that the memory 1302 can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The memory 1302 in embodiments of the present invention is capable of storing data to support operation of the terminal (e.g., 10-1). Examples of such data include: any computer program, such as an operating system and application programs, for operating on a terminal (e.g., 10-1). The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application program may include various application programs.
In some embodiments, the semantic understanding model processing apparatus provided in the embodiments of the present invention may be implemented by a combination of hardware and software, and as an example, the semantic understanding model processing apparatus provided in the embodiments of the present invention may be a processor in the form of a hardware decoding processor, which is programmed to execute the semantic understanding method of the semantic understanding model provided in the embodiments of the present invention. For example, a processor in the form of a hardware decoding processor may employ one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
As an example of the semantic understanding model processing apparatus provided by the embodiment of the present invention implemented by combining software and hardware, the semantic understanding model processing apparatus provided by the embodiment of the present invention may be directly embodied as a combination of software modules executed by the processor 1301, the software modules may be located in a storage medium, the storage medium is located in the memory 1302, the processor 1301 reads executable instructions included in the software modules in the memory 1302, and the semantic understanding method provided by the embodiment of the present invention is completed in combination with necessary hardware (for example, including the processor 1301 and other components connected to the bus 1305).
By way of example, the processor 1301 may be an integrated circuit chip having signal processing capabilities, such as a general purpose processor, a digital signal processor (DSP), another programmable logic device, discrete gate or transistor logic, or discrete hardware components, where the general purpose processor may be a microprocessor or any conventional processor.
As an example of the semantic understanding model processing apparatus provided in the embodiment of the present invention implemented by hardware, the apparatus provided in the embodiment of the present invention may be implemented by directly using the processor 1301 in the form of a hardware decoding processor, for example, the semantic understanding method for implementing the semantic understanding model provided in the embodiment of the present invention is implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
The memory 1302 in the embodiment of the present invention is used to store various types of data to support the operation of the semantic understanding model processing apparatus 130. Examples of such data include: any executable instructions for operating on the semantic understanding model processing apparatus 130; a program implementing the semantic understanding method of the semantic understanding model according to the embodiment of the present invention may be included in these executable instructions.
In other embodiments, the semantic understanding model processing apparatus provided by the embodiments of the present invention may be implemented in software, and fig. 17 illustrates the semantic understanding model processing apparatus stored in the memory 1302, which may be software in the form of programs, plug-ins, and the like, and includes a series of modules, and as an example of the programs stored in the memory 1302, the semantic understanding model processing apparatus may include the following software modules: a text conversion module 13081, a semantic representation layer network module 13082, a domain independent detector network module 13083, a domain classification network module 13084, and an information processing module 13085. When the software modules in the semantic understanding model processing apparatus are read into the RAM by the processor 1301 and executed, the semantic understanding method of the semantic understanding model provided by the embodiment of the present invention is implemented, and the functions of each software module in the semantic understanding model processing apparatus include:
the text conversion module 13081 is configured to obtain voice instruction information and convert the voice instruction into corresponding recognizable text information;
a semantic representation layer network module 13082, configured to determine, through a semantic representation layer network of the semantic understanding model, at least one word-level hidden variable corresponding to the recognizable text information;
a domain-independent detector network module 13083, configured to determine, according to the at least one word-level hidden variable, an object that matches the word-level hidden variable through a domain-independent detector network of the semantic understanding model;
a domain classification network module 13084, configured to determine, through the domain classification network of the semantic understanding model, a task domain corresponding to the word-level hidden variable according to the at least one word-level hidden variable;
and the information processing module 13085 is configured to trigger a corresponding business process according to the object matched with the hidden variable at the word level and the task field corresponding to the hidden variable at the word level, so as to complete the task corresponding to the voice instruction information.
Referring to fig. 18, fig. 18 is an optional flowchart of the semantic understanding method of the semantic understanding model provided in the embodiment of the present invention, and it is understood that the steps shown in fig. 18 may be executed by various electronic devices operating the semantic understanding model processing apparatus, for example, a dedicated terminal with a voice instruction processing function, a server with a semantic understanding function, or a server cluster. The following is a description of the steps shown in fig. 18.
Step 1801: the semantic understanding model processing device acquires voice instruction information and converts the voice instruction into corresponding recognizable text information;
step 1802: the semantic understanding model processing device determines at least one word-level hidden variable corresponding to the recognizable text information through a semantic representation layer network of the semantic understanding model;
step 1803: the semantic understanding model processing device determines an object matched with the hidden variables of the word level according to the hidden variables of the at least one word level through a domain-independent detector network of the semantic understanding model;
Step 1804: the semantic understanding model processing device determines, through the domain classification network of the semantic understanding model, a task domain corresponding to the word-level hidden variable according to the at least one word-level hidden variable;
step 1805: and triggering a corresponding business process by the semantic understanding model processing device according to the object matched with the hidden variable of the word level and the task field corresponding to the hidden variable of the word level.
Therefore, the task corresponding to the voice instruction information is completed.
The following description takes a vehicle-mounted semantic understanding model as an example, referring to fig. 19 and 20. Fig. 19 is a schematic view of a usage scenario of the semantic understanding model training method provided in an embodiment of the present invention: the semantic understanding model training method provided by the invention may serve clients (packaged in a vehicle-mounted terminal or in different mobile electronic devices) in the form of a cloud service. Fig. 20 is a schematic view of another usage scenario of the semantic understanding model training method, in which the method is provided as a cloud service to enterprise clients to help them train semantic understanding models according to different device usage environments; the specific usage scenario is not specifically limited in the present application.
With continuing reference to fig. 21, fig. 21 is a schematic diagram of an alternative processing flow of the semantic understanding model training method provided by the present invention, including the following steps:
step 2101: and acquiring voice information and converting the voice information into corresponding text information.
Referring to the modules shown in fig. 19, the user's speech signal is converted into a text signal by speech recognition, structured information such as the user's domain, intent and parameters is extracted from the text by the natural language understanding module, these semantic elements are transmitted to the dialogue management module for query processing or state management, and finally the output of the system is broadcast to the user by speech synthesis.
Step 2102: and triggering an active learning process to acquire a corresponding training sample in response to the text information.
Referring to fig. 22, fig. 22 is a schematic diagram of the active learning process in the semantic understanding model training method provided in an embodiment of the present invention. Both the negative corpus model (OOD model) and the domain classifier model need to mine a large number of negative samples, while the manual labeling budget is limited. The goal is therefore to mine, with limited annotation manpower, the most valuable samples from massive data, namely those with the largest information content and the largest model gain. To this end, a data mining process as shown in fig. 22 can be constructed based on the idea of Active Learning, giving a closed-loop mining pipeline that runs from data generation and picking to labeling and then to model training. This ensures that the generated samples are the most urgently needed and most helpful ones, and screening the samples effectively reduces the labor cost of labeling.
Step 2103: and performing optimization processing on the acquired training samples.
A large amount of OOD corpora, domain negative sample corpora and domain positive sample corpora are mined and accumulated through step 2102. When training the semantic understanding model, positive and negative samples are organized in a One-vs-All manner, so the positive-to-negative sample ratio of a domain classifier is unbalanced, reaching 1:100 in some optional scenarios and 1:2000 in some extreme cases. In practical use of the semantic understanding model, even when the negative samples in some domains are sufficient, the FAR (false acceptance rate) index of the trained model remains high. By analyzing bad cases and experiments, a negative sample distribution optimization strategy can therefore be provided, specifically: the negative samples are grouped by importance (common negative samples, domain negative samples, positive samples of other related domains, positive samples of other unrelated domains), each group of samples is given a different weight, with the domain negative samples and the positive samples of other related domains given higher weights and the other negative samples given lower weights, as sketched below.
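A minimal sketch of this grouping strategy; the group names mirror the four groups above, while the numeric weights are invented placeholders to be tuned by the bad-case analysis described in the text:

```python
# higher weight for domain negatives and related-domain positives,
# lower weight for the remaining negative samples (values are assumed)
GROUP_WEIGHTS = {
    "common_negative": 0.5,
    "domain_negative": 2.0,
    "related_domain_positive": 1.5,
    "unrelated_domain_positive": 0.5,
}

def sample_weights(samples):
    # samples: list of (sentence, group) pairs -> per-sample loss weights,
    # e.g. multiplied into the per-sample cross-entropy during training
    return [GROUP_WEIGHTS[group] for _, group in samples]
```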
Therefore, the false recognition rate of the model can be effectively reduced by finely adjusting the grouping weight of the negative samples.
Step 2104: training a semantic understanding model through the training samples subjected to optimization processing to determine parameters of the semantic understanding model.
Therefore, the trained semantic understanding model can be used to recognize and process voice instructions in high-noise environments.
Referring to fig. 23, fig. 23 is an optional model structure diagram of the semantic understanding model provided in the embodiment of the present invention; the OOD model and the domain classification model may be jointly trained on the model network side using a multi-task learning training mode. As shown in the network structure of fig. 23, the whole network structure is divided into two layers:
1) the pre-training model based on the BERT is used as a semantic representation layer.
2) The output layers associated with the downstream tasks may both be represented using a fully connected network.
The semantic understanding model training method provided by the invention can jointly train the OOD detector model and the domain classification model. The OOD model is a binary classification task used to judge whether a corpus is IND (in-domain) or Out-of-Domain. The domain classifier model is composed of multiple binary classifiers and can adopt a One-vs-All data organization; the domain classifier judges which domain within IND (weather, navigation, music, and so on) the corpus belongs to. Further, since the OOD and domain classifiers are two highly related classes of tasks, if a corpus is OOD it must be a negative sample for all domain binary classifiers, and if it is IND it must be a positive sample of one or more domains in the domain classifiers. Using this correlation between tasks, a joint loss function can be constructed:
L(·) = L_D(·) + α·L_O(·)

where L_D(·) is the loss produced by the domain classifiers, L_O(·) is the loss produced by the OOD detector, and α is a hyper-parameter controlling the degree to which the OOD loss influences the loss of the whole model; α can be set to 1 during actual training. The loss of the output layer can adopt cross entropy:

L_D(·) = −p′ log p

where p is the soft-max prediction probability of the sample and p′ is the ground-truth label of the sample. During training, the parameters of the semantic representation layer BERT are fine-tuned, while the output layer parameters of the OOD detector and of each domain classifier are optimized independently.
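A minimal PyTorch sketch of this two-layer structure and the joint loss; the hidden size of 768 (BERT-base) and the domain count are assumptions, and the One-vs-All domain classifiers are modeled here as per-domain binary cross entropy, which is one straightforward reading of the One-vs-All organization:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointHead(nn.Module):
    # task-related output layers on top of the BERT representation h:
    # a binary OOD detector and K One-vs-All domain classifiers,
    # each a fully connected layer as described above
    def __init__(self, hidden=768, num_domains=10):
        super().__init__()
        self.ood = nn.Linear(hidden, 2)               # IND vs Out-of-Domain
        self.domains = nn.Linear(hidden, num_domains)

    def forward(self, h):
        return self.ood(h), self.domains(h)

def joint_loss(ood_logits, dom_logits, ood_y, dom_y, alpha=1.0):
    # L = L_D + alpha * L_O, with cross entropy on both output layers
    l_o = F.cross_entropy(ood_logits, ood_y)
    l_d = F.binary_cross_entropy_with_logits(dom_logits, dom_y)
    return l_d + alpha * l_o

head = JointHead()
h = torch.randn(4, 768)                      # BERT [CLS] representations
ood_y = torch.tensor([0, 1, 0, 0])           # 1 = Out-of-Domain
dom_y = torch.zeros(4, 10); dom_y[0, 2] = 1  # One-vs-All domain labels
loss = joint_loss(*head(h), ood_y, dom_y)
```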
In a full-duplex conversation scenario, the user's conversation object may shift: the user may occasionally talk to surrounding friends, chat, or talk to himself. The semantic understanding model training method provided by the invention can effectively reduce the false recognition rate of such conversations and ensure that the assistant does not respond wrongly while the user is talking.

Furthermore, a large number of negative samples are mined through Active Learning for model training; after several cycles of iteration, the semantic understanding model drops from an initially high false recognition rate into a reasonable range. Meanwhile, the distribution of internal samples is adjusted by grouping the negative samples and giving different weights to different groups, further reducing the false recognition rate. Through this distribution adjustment, the semantic understanding model can learn important information from the negative samples with larger weights, while the information content of the negative samples with lower weights tends toward saturation. Finally, on the model structure side, the OOD rejection model is introduced for joint learning, so the false recognition rate on the internal development set and test set can ultimately be reduced to different degrees. Therefore, by optimizing the false recognition rate of the intelligent assistant in a full-duplex scenario, the invention ensures that the assistant effectively responds to the user's genuine dialogue appeals, rejects non-dialogue appeals, guarantees the feasibility and fluency of interaction, and effectively improves the user experience.

FIG. 24 is a schematic diagram of a semantic understanding model wake-up application packaged in a vehicle-mounted system; FIG. 25 is a schematic diagram of weather lookup by a semantic understanding model packaged in a vehicle-mounted system.

Of course, in some embodiments of the present invention, a post-processing rank model may further be connected after the task-specific layers, whose inputs are the prediction scores of the OOD detector and of the classifiers in each domain, and whose output is the prediction result of the whole model. In the present invention, only one level of logic is applied to the OOD prediction result and the domain classifier prediction results: when the OOD model predicts out-of-domain, the result is returned directly without running the domain classifier prediction. However, the OOD model may make prediction errors while the domain classifier model has high confidence that the final result is IND; by learning the combination relation, this alternative scheme can give a reasonable prediction result based on comprehensive comparison, so as to reduce the error rate of the semantic understanding results of the semantic understanding model.
The invention has the following beneficial technical effects:
training samples matched with the vehicle-mounted environment in the data source are recalled; a corresponding active learning process is triggered according to the recall processing result to obtain the noisy statement samples matched with the vehicle-mounted environment in the data source; in response to the active learning process, an active exploration process is triggered to realize boundary corpus expansion processing on the noisy statement samples matched with the vehicle-mounted environment; the noisy statement samples matched with the vehicle-mounted environment and subjected to boundary corpus expansion processing are labeled to form a first training sample set; the first training sample set is denoised to form a corresponding second training sample set; the second training sample set is processed through the semantic understanding model to determine the initial parameters of the semantic understanding model; in response to the initial parameters, the second training sample set is processed through the semantic understanding model to determine the updated parameters of the semantic understanding model; and, according to the updated parameters, the semantic representation layer network parameters and the task-related output layer network parameters of the semantic understanding model are iteratively updated through the second training sample set. As a result, the generalization capability of the semantic understanding model is stronger, and its training precision and training speed are improved; at the same time, the gain that existing noisy sentences matched with the vehicle-mounted environment contribute to model training can be effectively and fully utilized, more training samples matched with the vehicle-mounted environment are obtained, and the semantic understanding model becomes better targeted at the full-duplex voice scenario of the vehicle-mounted environment, improving the accuracy of the semantic understanding model's information processing in the vehicle-mounted environment and avoiding the influence of environmental noise on the semantic understanding model.
The above description is only exemplary of the present invention and should not be taken as limiting the scope of the present invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (15)

1. A semantic understanding model training method, the method comprising:
recalling training samples matched with a vehicle-mounted environment in a data source;
triggering a corresponding active learning process according to the recall processing result so as to obtain a noisy statement sample matched with a vehicle-mounted environment in the data source;
responding to the active learning process, and triggering an active exploration process to realize boundary corpus expansion processing on the sentence sample with noise matched with the vehicle-mounted environment;
labeling the noisy statement samples matched with the vehicle-mounted environment and subjected to boundary corpus expansion processing to form a first training sample set;
denoising the first training sample set to form a corresponding second training sample set;
processing the second training sample set through a semantic understanding model to determine initial parameters of the semantic understanding model;
responding to initial parameters of the semantic understanding model, processing the second training sample set through the semantic understanding model, and determining updated parameters of the semantic understanding model;
and according to the updating parameters of the semantic understanding model, iteratively updating the semantic representation layer network parameters and the task-related output layer network parameters of the semantic understanding model through the second training sample set.
2. The method according to claim 1, wherein the triggering an active exploration process in response to the active learning process to implement boundary corpus expansion processing on the noisy statement samples matched with the vehicle-mounted environment comprises:
responding to the active learning process, triggering a text similarity clustering network in an active exploration process to determine a text clustering center of the sentence sample with the noise matched with the vehicle-mounted environment;
retrieving the data source according to the text clustering center of the noisy statement sample matched with the vehicle-mounted environment so as to realize text augmentation of the noisy statement sample matched with the vehicle-mounted environment;
and triggering a corresponding manifold learning process to perform dimension reduction processing on the text augmentation result according to the text augmentation result of the noisy sentence sample matched with the vehicle-mounted environment so as to realize boundary corpus expansion on the noisy sentence sample matched with the vehicle-mounted environment.
3. The method according to claim 1, wherein labeling the noisy statement samples matched with the vehicle-mounted environment after the boundary corpus expansion processing to form the first training sample set comprises:
determining a sample type of the noisy statement sample;
sorting negative example samples in the sample type of the statement sample, and configuring corresponding weights for the negative example samples according to the sorting result of the negative example samples, to form a first training sample set comprising training samples with different weights.
4. The method of claim 1, wherein denoising the first set of training samples to form a corresponding second set of training samples comprises:
determining a fixed noise threshold corresponding to the semantic understanding model;
and denoising the first training sample set according to the fixed noise threshold value to form a second training sample set matched with the fixed noise threshold value.
5. The method of claim 1, wherein the processing the second set of training samples by the semantic understanding model in response to the initial parameters of the semantic understanding model to determine updated parameters of the semantic understanding model comprises:
substituting different statement samples in the second training sample set into a loss function corresponding to a task-related output layer network consisting of a domain-independent detector network and a domain classification network of the semantic understanding model;
and when the loss function meets the corresponding convergence condition, determining the domain-independent detector network parameters and the domain classification network parameters of the semantic understanding model corresponding to the loss function as the updated parameters of the semantic understanding model.
6. The method of claim 5, wherein iteratively updating semantic representation layer network parameters and task-related output layer network parameters of the semantic understanding model with the second set of training samples according to the updated parameters of the semantic understanding model comprises:
determining a second noise parameter matched with the second training sample set through the updated parameter of the semantic understanding model, wherein the second noise parameter is used for representing the noise value of the parallel statement samples in the second training sample set;
when the second noise parameter reaches the corresponding noise value threshold,
and according to the noise value of the second noise parameter, iteratively updating the semantic representation layer network parameters and the task-related output layer network parameters of the semantic understanding model until a loss function corresponding to the task-related output layer network formed by the field-independent detector network of the semantic understanding model and the field classification network meets the corresponding convergence condition.
7. The method of claim 5, further comprising:
in response to a loss function corresponding to a task-dependent output-layer network formed by a domain-independent detector network and a domain classification network of the semantic understanding model,
and adjusting parameters of a semantic representation layer network of the semantic understanding model to realize that the parameters of the semantic representation layer network are matched with the loss function corresponding to the task-related output layer network.
8. The method of claim 1, further comprising:
negative example processing is carried out on the second training sample set to form a negative example sample set corresponding to the second training sample set, wherein the negative example sample set is used for adjusting domain-independent detector network parameters and domain classification network parameters of the semantic understanding model;
and determining a corresponding bilingual evaluation understudy (BLEU) value according to the negative example sample set, wherein the BLEU value is used as a supervision parameter for evaluating semantic understanding results of the semantic understanding model.
9. The method of claim 5, wherein the performing negative example processing on the first training sample set comprises:
randomly combining statements to be output in a domain classification network of the semantic understanding model to form a negative example sample set corresponding to the first training sample set; or,
and carrying out random deletion processing or replacement processing on the sentences to be output in the domain classification network of the semantic understanding model to form a negative example sample set corresponding to the first training sample set.
10. A semantic understanding method of a semantic understanding model, the method comprising:
acquiring voice instruction information, and converting the voice instruction into corresponding recognizable text information;
determining at least one word-level hidden variable corresponding to the recognizable text information through a semantic representation layer network of the semantic understanding model;
determining, by a domain-independent detector network of the semantic understanding model, an object that matches the word-level hidden variables from the at least one word-level hidden variable;
determining, through a domain classification network of the semantic understanding model, a task domain corresponding to the word-level hidden variable according to the at least one word-level hidden variable;
triggering a corresponding business process according to the object matched with the hidden variable of the word level and the task field corresponding to the hidden variable of the word level so as to complete the task corresponding to the voice instruction information,
wherein the semantic understanding model is trained based on the method of any one of claims 1 to 9.
11. A training apparatus for a semantic understanding model, the training apparatus comprising:
the semantic understanding model training module is used for recalling training samples matched with the vehicle-mounted environment in the data source;
the semantic understanding model training module is used for triggering a corresponding active learning process according to the recall processing result so as to obtain a statement sample with noise in the data source;
the semantic understanding model training module is used for responding to the active learning process and triggering an active exploration process so as to realize boundary corpus expansion processing on the sentence sample with noise matched with the vehicle-mounted environment;
the semantic understanding model training module is used for marking the noisy statement samples which are subjected to boundary corpus expansion processing and matched with a vehicle-mounted environment to form a first training sample set;
the denoising module is used for denoising the first training sample set to form a corresponding second training sample set;
the semantic understanding model training module is used for processing the second training sample set through a semantic understanding model so as to determine initial parameters of the semantic understanding model;
the semantic understanding model training module is used for responding to the initial parameters of the semantic understanding model, processing the second training sample set through the semantic understanding model and determining the updating parameters of the semantic understanding model;
and the semantic understanding model training module is used for performing iterative updating on semantic representation layer network parameters and task-related output layer network parameters of the semantic understanding model through the second training sample set according to the updating parameters of the semantic understanding model.
12. A semantic understanding model processing apparatus, characterized in that the apparatus comprises:
the text conversion module is used for acquiring voice instruction information and converting the voice instruction into corresponding recognizable text information;
the semantic representation layer network module is used for determining at least one word-level hidden variable corresponding to the recognizable text information through a semantic representation layer network of the semantic understanding model;
a domain-independent detector network module, configured to determine, according to the at least one word-level hidden variable, an object that matches the word-level hidden variable, through a domain-independent detector network of the semantic understanding model;
the domain classification network module is used for determining, through the domain classification network of the semantic understanding model, a task domain corresponding to the word-level hidden variable according to the at least one word-level hidden variable;
and the information processing module is used for triggering a corresponding business process according to the object matched with the hidden variable of the word level and the task field corresponding to the hidden variable of the word level so as to complete the task corresponding to the voice instruction information.
13. A training apparatus for a semantic understanding model, the training apparatus comprising:
a memory for storing executable instructions;
a processor configured to implement the method of training a semantic understanding model according to any one of claims 1 to 9 when executing the executable instructions stored in the memory.
14. A semantic understanding model processing apparatus, characterized in that the processing apparatus comprises:
a memory for storing executable instructions;
a processor for implementing the semantic understanding method of the semantic understanding model of claim 10 when executing the executable instructions stored by the memory.
15. A computer-readable storage medium storing executable instructions, wherein the executable instructions, when executed by a processor, implement a training method of a semantic understanding model according to any one of claims 1 to 9 or implement a semantic understanding method of a semantic understanding model according to claim 10.
CN201911047125.9A 2019-10-30 2019-10-30 Semantic understanding model training method, semantic understanding device and storage medium Active CN110795945B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911047125.9A CN110795945B (en) 2019-10-30 2019-10-30 Semantic understanding model training method, semantic understanding device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911047125.9A CN110795945B (en) 2019-10-30 2019-10-30 Semantic understanding model training method, semantic understanding device and storage medium

Publications (2)

Publication Number Publication Date
CN110795945A true CN110795945A (en) 2020-02-14
CN110795945B CN110795945B (en) 2023-11-14

Family

ID=69442285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911047125.9A Active CN110795945B (en) 2019-10-30 2019-10-30 Semantic understanding model training method, semantic understanding device and storage medium

Country Status (1)

Country Link
CN (1) CN110795945B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017228272A (en) * 2016-06-17 2017-12-28 パナソニックIpマネジメント株式会社 Semantic generation method, semantic generation device, and program
CN108304439A (en) * 2017-10-30 2018-07-20 腾讯科技(深圳)有限公司 A kind of semantic model optimization method, device and smart machine, storage medium
CN108491380A (en) * 2018-03-12 2018-09-04 苏州思必驰信息科技有限公司 Confrontation multitask training method for speech understanding
CN110083834A (en) * 2019-04-24 2019-08-02 北京百度网讯科技有限公司 Semantic matches model training method, device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN Yaodong et al.: "Shallow semantic parsing combining semi-supervised learning and active learning", Journal of Chinese Information Processing (中文信息学报), vol. 22, no. 02 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111538822B (en) * 2020-04-24 2023-05-09 支付宝(杭州)信息技术有限公司 Method and system for generating training data of intelligent customer service robot
CN111538822A (en) * 2020-04-24 2020-08-14 支付宝(杭州)信息技术有限公司 Method and system for generating training data of intelligent customer service robot
CN111652302A (en) * 2020-05-28 2020-09-11 泰康保险集团股份有限公司 Method and device for explaining insurance underwriting classification result
CN111652302B (en) * 2020-05-28 2023-05-23 泰康保险集团股份有限公司 Method and device for explaining insurance underwriting classification result
CN111539225A (en) * 2020-06-25 2020-08-14 北京百度网讯科技有限公司 Search method and device of semantic understanding framework structure
CN112949674A (en) * 2020-08-22 2021-06-11 上海昌投网络科技有限公司 Multi-model fused corpus generation method and device
CN112035649A (en) * 2020-09-02 2020-12-04 腾讯科技(深圳)有限公司 Question-answering model processing method and device, computer equipment and storage medium
CN112035649B (en) * 2020-09-02 2023-11-17 腾讯科技(深圳)有限公司 Question-answering model processing method and device, computer equipment and storage medium
CN112148877A (en) * 2020-09-23 2020-12-29 网易(杭州)网络有限公司 Corpus text processing method and device and electronic equipment
CN112148877B (en) * 2020-09-23 2023-07-04 网易(杭州)网络有限公司 Corpus text processing method and device and electronic equipment
CN112417895A (en) * 2020-12-15 2021-02-26 广州博冠信息科技有限公司 Bullet screen data processing method, device, equipment and storage medium
CN113609863A (en) * 2021-02-04 2021-11-05 腾讯科技(深圳)有限公司 Method, device and computer equipment for training and using data conversion model
CN113609863B (en) * 2021-02-04 2024-05-07 腾讯科技(深圳)有限公司 Method, device and computer equipment for training and using data conversion model
CN113345422A (en) * 2021-04-23 2021-09-03 北京巅峰科技有限公司 Voice data processing method, device, equipment and storage medium
CN113345422B (en) * 2021-04-23 2024-02-20 北京巅峰科技有限公司 Voice data processing method, device, equipment and storage medium
CN113516196A (en) * 2021-07-20 2021-10-19 云知声智能科技股份有限公司 Method, device, electronic equipment and medium for named entity identification data enhancement
CN113516196B (en) * 2021-07-20 2024-04-12 云知声智能科技股份有限公司 Named entity recognition data enhancement method and device, electronic equipment and medium
TWI795001B (en) * 2021-09-30 2023-03-01 日商邦德電子商務股份有限公司 Content generation method, content delivery method, content generation program and content delivery program
CN114049884A (en) * 2022-01-11 2022-02-15 广州小鹏汽车科技有限公司 Voice interaction method, vehicle and computer-readable storage medium
CN115168408A (en) * 2022-08-16 2022-10-11 北京永洪商智科技有限公司 Query optimization method, device, equipment and storage medium based on reinforcement learning
CN115168408B (en) * 2022-08-16 2024-05-28 北京永洪商智科技有限公司 Query optimization method, device, equipment and storage medium based on reinforcement learning

Also Published As

Publication number Publication date
CN110795945B (en) 2023-11-14

Similar Documents

Publication Publication Date Title
CN110807332B (en) Training method, semantic processing method, device and storage medium for semantic understanding model
CN110795945B (en) Semantic understanding model training method, semantic understanding device and storage medium
CN110807333B (en) Semantic processing method, device and storage medium of semantic understanding model
CN110956018B (en) Training method of text processing model, text processing method, text processing device and storage medium
US11503155B2 (en) Interactive voice-control method and apparatus, device and medium
CN110795552B (en) Training sample generation method and device, electronic equipment and storage medium
CN112100349A (en) Multi-turn dialogue method and device, electronic equipment and storage medium
CN111931517B (en) Text translation method, device, electronic equipment and storage medium
CN114596844B (en) Training method of acoustic model, voice recognition method and related equipment
CN112101010B (en) Telecom industry OA office automation manuscript auditing method based on BERT
CN111739520B (en) Speech recognition model training method, speech recognition method and device
CN110941958B (en) Text category labeling method and device, electronic equipment and storage medium
CN113392265A (en) Multimedia processing method, device and equipment
Amanova et al. Creating annotated dialogue resources: Cross-domain dialogue act classification
CN116069916A (en) Tourist attraction question-answering system
CN117332789A (en) Semantic analysis method and system for dialogue scene
CN114373443A (en) Speech synthesis method and apparatus, computing device, storage medium, and program product
CN116913278B (en) Voice processing method, device, equipment and storage medium
CN113609873A (en) Translation model training method, device and medium
CN111968646A (en) Voice recognition method and device
CN117493548A Text classification method, model training method and model training device
CN116978367A (en) Speech recognition method, device, electronic equipment and storage medium
CN112528645A (en) Text processing method and device, electronic equipment and computer readable storage medium
CN110909142B Question sentence processing method and device for question-answering model, electronic equipment and storage medium
CN113744737B Speech recognition model training method, man-machine interaction method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40022023
Country of ref document: HK

SE01 Entry into force of request for substantive examination
GR01 Patent grant