WO2021082786A1 - Training method and apparatus for a semantic understanding model, electronic device, and storage medium
Training method and apparatus for a semantic understanding model, electronic device, and storage medium
- Publication number: WO2021082786A1
- Application: PCT/CN2020/115755 (CN2020115755W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- semantic understanding
- understanding model
- sample set
- semantic
- training sample
- Prior art date
Classifications
- G06F16/3329: Natural language query formulation or dialogue systems
- G06F16/355: Class or cluster creation or modification
- G06F40/30: Semantic analysis
- G06N3/02: Neural networks; G06N3/08: Learning methods
- G10L15/063: Training (creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice)
- G10L15/1815: Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
- G10L15/1822: Parsing for meaning understanding
- G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/26: Speech to text systems
- Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- This application relates to machine learning technology, and in particular to a training method, a semantic processing method, an apparatus, an electronic device, and a storage medium for a semantic understanding model.
- In a full-duplex dialogue scenario, the following operations need to be implemented in a multi-sound-source environment where multiple sound sources continue to emit sound at the same time: for example, recognition of voice identities (male, female, child), triggering of dialogues with different content, voice emotion recognition, music/singing voice recognition, etc., as well as environmental processing such as background noise recognition and echo cancellation. In this process, because the semantic understanding model works in a full-duplex dialogue scenario, out-of-domain (OOD, Out-Of-Domain) corpus such as background noise and other people's small talk is more likely to be listened to by the assistant. If such corpus is mis-responded to by the intelligent assistant, the interaction success rate will be lower, which affects the user's experience.
- Therefore, the domain intent recognition accuracy of the dialogue system needs to be higher, and the semantic understanding model needs to know when to reject (i.e., refuse to respond) and when to respond to what the user said, so as to improve the user experience and reduce the power consumption of electronic devices caused by frequent invalid triggers.
- In view of this, the embodiments of the present application provide a training method for a semantic understanding model, a semantic processing method, an apparatus, an electronic device, and a storage medium, which can make the semantic understanding model more generalized and improve the training accuracy and training speed of the semantic understanding model; at the same time, the existing noisy sentences can be fully exploited to obtain gains in model training, so that the semantic understanding model can adapt to different usage scenarios and the impact of environmental noise on the semantic understanding model is reduced.
- This application provides a method for training a semantic understanding model, including:
- obtaining a first training sample set, where the first training sample set is a set of noisy sentence samples acquired through an active learning process;
- performing denoising processing on the first training sample set to form a corresponding second training sample set;
- processing the second training sample set through the semantic understanding model to determine initial parameters of the semantic understanding model;
- in response to the initial parameters of the semantic understanding model, processing the second training sample set through the semantic understanding model to determine update parameters of the semantic understanding model;
- iteratively updating, according to the update parameters of the semantic understanding model, the semantic representation layer network parameters and the task-related output layer network parameters of the semantic understanding model through the second training sample set.
- This application also provides a semantic processing method of a semantic understanding model, including:
- the semantic understanding model is obtained by training based on the aforementioned method.
- An embodiment of the present application also provides a training device for a semantic understanding model, including:
- a data transmission module configured to obtain a first training sample set, where the first training sample set is a sentence sample with noise obtained through an active learning process;
- a denoising module configured to perform denoising processing on the first training sample set to form a corresponding second training sample set
- a semantic understanding model training module configured to process the second training sample set through the semantic understanding model to determine the initial parameters of the semantic understanding model
- the semantic understanding model training module is configured to respond to the initial parameters of the semantic understanding model and process the second training sample set through the semantic understanding model to determine the update parameters of the semantic understanding model;
- the semantic understanding model training module is configured to iteratively update the semantic representation layer network parameters and task-related output layer network parameters of the semantic understanding model through the second training sample set according to the update parameters of the semantic understanding model .
- the embodiment of the present application also provides a semantic understanding model processing device, including:
- a text conversion module configured to obtain voice command information and convert the voice command into corresponding recognizable text information
- the semantic presentation layer network module is configured to determine at least one word-level hidden variable corresponding to the identifiable text information through the semantic presentation layer network of the semantic understanding model;
- a domain-independent detector network module configured to determine an object matching the word-level hidden variable according to the at least one word-level hidden variable through the domain-independent detector network of the semantic understanding model;
- a domain classification network module configured to determine, through the domain classification network of the semantic understanding model and according to the at least one word-level hidden variable, a task domain corresponding to the word-level hidden variable;
- an information processing module configured to trigger a corresponding business process according to the object matching the word-level hidden variable and the task domain corresponding to the word-level hidden variable, so as to complete the task corresponding to the voice command information.
- An embodiment of the application also provides an electronic device, including:
- Memory configured to store executable instructions
- the processor is configured to implement the aforementioned training method of the semantic understanding model when running the executable instructions stored in the memory.
- An embodiment of the application also provides an electronic device, including:
- Memory configured to store executable instructions
- the processor is configured to implement the aforementioned semantic processing method of the semantic understanding model when running the executable instructions stored in the memory.
- An embodiment of the present application also provides a computer-readable storage medium storing executable instructions, where the executable instructions, when executed by a processor, implement the aforementioned training method of the semantic understanding model or the aforementioned semantic processing method.
- In the embodiments of the present application, a first training sample set is obtained, where the first training sample set is a set of noisy sentence samples acquired through an active learning process; denoising processing is performed on the first training sample set to form a corresponding second training sample set; the second training sample set is processed through the semantic understanding model to determine the initial parameters of the semantic understanding model; in response to the initial parameters of the semantic understanding model, the second training sample set is processed through the semantic understanding model to determine the update parameters of the semantic understanding model; and, according to the update parameters of the semantic understanding model, the semantic representation layer network parameters and the task-related output layer network parameters of the semantic understanding model are iteratively updated through the second training sample set.
- In this way, the generalization ability of the semantic understanding model becomes stronger, the training accuracy and training speed of the semantic understanding model are improved, and the existing noisy sentences are fully exploited to obtain gains in model training, so that the semantic understanding model can adapt to different usage scenarios, the impact of environmental noise on the semantic understanding model is reduced, invalid triggers of electronic devices are reduced, and deployment of the semantic understanding model in mobile terminals is facilitated.
- FIG. 1 is a schematic diagram of a usage scenario of a semantic understanding model training method provided by an embodiment of the application
- FIG. 2 is a schematic diagram of the composition structure of a semantic understanding model training device provided by an embodiment of the application;
- Figure 3 is a schematic diagram of the RNN-based Seq2Seq model generating semantic understanding results
- FIG. 4 is a schematic diagram of an optional process of a semantic understanding model training method provided by an embodiment of the application.
- FIG. 5 is a schematic diagram of an optional structure of a semantic presentation layer network model in an embodiment of this application.
- FIG. 6 is a schematic diagram of an optional word-level machine reading of the semantic presentation layer network model in an embodiment of this application.
- FIG. 7 is a schematic diagram of an optional structure of an encoder in a semantic presentation layer network model in an embodiment of this application.
- FIG. 8 is a schematic diagram of vector splicing of the encoder in the semantic representation layer network model in an embodiment of the application
- FIG. 9 is a schematic diagram of the encoding process of the encoder in the semantic presentation layer network model in an embodiment of the application.
- FIG. 10 is a schematic diagram of the decoding process of the decoder in the semantic presentation layer network model in an embodiment of this application;
- FIG. 11 is a schematic diagram of the decoding process of the decoder in the semantic presentation layer network model in an embodiment of the application;
- FIG. 12 is a schematic diagram of the decoding process of the decoder in the semantic presentation layer network model in an embodiment of the application;
- FIG. 13 is a schematic diagram of an optional sentence-level machine reading of the semantic presentation layer network model in an embodiment of this application.
- FIG. 14 is a schematic diagram of an optional process of a semantic understanding model training method provided by an embodiment of this application.
- FIG. 15 is a schematic diagram of an optional process of the semantic understanding model training method provided by an embodiment of this application.
- FIG. 16A is an optional flowchart of the semantic understanding model training method provided by an embodiment of this application.
- FIG. 16B is a schematic diagram of an optional boundary corpus expansion of the semantic understanding model training method provided by an embodiment of the application.
- FIG. 17 is a schematic diagram of the composition structure of a semantic understanding model processing device provided by an embodiment of the application.
- FIG. 19 is a schematic diagram of a usage scenario of a semantic understanding model training method provided by an embodiment of the application.
- FIG. 20 is a schematic diagram of a usage scenario of a semantic understanding model training method provided by an embodiment of the application.
- FIG. 21 is a schematic diagram of an optional processing flow of the semantic understanding model training method provided by this application.
- FIG. 22 is a schematic diagram of the active learning process in the processing flow of the semantic understanding model training method provided by an embodiment of the application;
- FIG. 23 is a schematic diagram of an optional model structure of a semantic understanding model provided by an embodiment of this application.
- Figure 24 is a schematic diagram of using a semantic understanding model to wake up applications encapsulated in a vehicle system
- Figure 25 is a schematic diagram of checking weather encapsulated in a vehicle-mounted system using a semantic understanding model.
- Machine reading comprehension: an automatic question-and-answer technology that takes text questions and related documents as input and text answers as output.
- BERT: short for Bidirectional Encoder Representations from Transformers, a language model training method using massive text. This method is widely used in a variety of natural language processing tasks, such as text classification, text matching, and machine reading comprehension.
- Artificial Neural Network (ANN), or simply Neural Network (NN): in machine learning and cognitive science, a mathematical or computational model that imitates the structure and function of biological neural networks and is used to estimate or approximate functions.
- Model parameters: quantities that use common variables to establish the relationship between functions and variables. In artificial neural networks, the model parameters are usually real-valued matrices.
- API: Application Programming Interface, a set of predefined functions, or the conventions for connecting different components of a software system. Its purpose is to provide applications and developers with the ability to access a set of routines based on certain software or hardware without having to access the source code or understand the details of the internal working mechanism.
- SDK: Software Development Kit, a collection of development tools used to build application software for a specific software package, software framework, hardware platform, operating system, etc.; in a broad sense, it includes a collection of related documents, examples, and tools that assist in developing such software.
- Generative Adversarial Network (GAN): a method of unsupervised learning that learns by letting two neural networks play against each other, generally composed of a generative network and a discriminative network.
- The generative network randomly samples from a latent space as input, and its output needs to imitate the real samples in the training set as much as possible.
- The input of the discriminative network is either a real sample or the output of the generative network, and its purpose is to distinguish the output of the generative network from the real samples as much as possible.
- The generative network, in turn, must deceive the discriminative network as much as possible.
- The two networks confront each other and constantly adjust their parameters; the ultimate goal is to make the discriminative network unable to judge whether the output of the generative network is real.
- Natural Language Understanding (NLU): extracts semantic information from the words spoken by the user in the dialogue system, including domain intent recognition and slot filling.
- Multi-task learning: in the field of machine learning, jointly learning and optimizing multiple related tasks at the same time can achieve better model accuracy than a single task, with the tasks helping each other by sharing the representation layer. This training method is called multi-task learning, also called joint learning.
- Active learning: in supervised learning, a machine learning model learns the mapping between data and prediction results by fitting training data; active learning designs data sampling strategies so that the samples carrying the most information for the model are selected for labeling. Compared with random sampling, re-adding the labeled data to training yields the largest gain for the model.
- OOD: Out-Of-Domain.
- Domain: e.g., weather query, navigation, music, etc.
- OOD corpus: e.g., small talk, knowledge question answering, semantic understanding errors, etc.
- IND: In-Domain.
- FAR: False Acceptance Rate, the proportion of all OOD corpus that is misrecognized as belonging to some domain. This indicator reflects the false-response rate of the intelligent assistant; the lower, the better. In the full-duplex scenario, this indicator is strictly constrained and must be kept at a very low level.
- FRR: False Rejection Rate.
- Speech semantic understanding: also known as automatic speech semantic understanding, a technology that uses a computer to convert the speech of one natural language into the text or speech of another natural language; generally speaking, it is composed of two stages, speech recognition and machine semantic understanding.
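- To make the active-learning selection described above concrete, the following is a minimal sketch of entropy-based uncertainty sampling over model-predicted domain probabilities; the function name and the toy probability matrix are illustrative assumptions, not part of the patent.
```python
import numpy as np

def select_samples_for_labeling(probs: np.ndarray, k: int) -> np.ndarray:
    """Entropy-based uncertainty sampling.

    probs: (num_samples, num_classes) predicted domain/intent probabilities
    k:     number of unlabeled samples to send to annotators
    Returns the indices of the k most informative (highest-entropy) samples.
    """
    eps = 1e-12
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    return np.argsort(-entropy)[:k]

# Usage: pick the 2 most uncertain of 4 unlabeled utterances.
probs = np.array([[0.98, 0.01, 0.01],   # confident, low labeling value
                  [0.40, 0.35, 0.25],   # uncertain, worth labeling
                  [0.70, 0.20, 0.10],
                  [0.34, 0.33, 0.33]])  # most uncertain
print(select_samples_for_labeling(probs, k=2))  # [3 1]
```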
- FIG. 1 is a schematic diagram of the usage scenario of the semantic understanding model training method provided by an embodiment of the application.
- a client terminal including terminal 10-1 and terminal 10-2
- Through the semantic understanding software client, the user can input the corresponding sentence to be semantically understood, and the client can also receive the corresponding semantic understanding result and display the received semantic understanding result to the user;
- The terminal connects to the server 200 through the network 300, where the network 300 can be a wide area network or a local area network, or a combination of the two, and uses wireless links to achieve data transmission.
- The server 200 is configured to deploy the semantic understanding model and train it, so as to iteratively update the semantic representation layer network parameters and the task-related output layer network parameters of the semantic understanding model; the semantic understanding result of the target sentence to be understood is then generated through the semantic representation layer network and the task-related output layer network of the semantic understanding model, and the terminal (terminal 10-1 and/or terminal 10-2) displays the semantic understanding result corresponding to the sentence to be understood.
- the semantic understanding model also needs to be trained.
- Specifically, the training may include: obtaining a first training sample set, where the first training sample set is a set of noisy sentence samples obtained through an active learning process; performing denoising processing on the first training sample set to form a corresponding second training sample set; processing the second training sample set through the semantic understanding model to determine the initial parameters of the semantic understanding model; in response to the initial parameters of the semantic understanding model, processing the second training sample set through the semantic understanding model to determine the update parameters of the semantic understanding model; and, according to the update parameters of the semantic understanding model, iteratively updating the semantic representation layer network parameters and the task-related output layer network parameters of the semantic understanding model through the second training sample set.
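- The training flow just summarized can be illustrated with a small, self-contained control-flow sketch; the Sample dataclass, the noise_score field, and the dummy parameter updates are placeholders for illustration only, not the patent's actual data structures or update rules.
```python
from dataclasses import dataclass
from typing import List

@dataclass
class Sample:
    text: str
    noise_score: float  # assumed per-sample noise estimate

def train_semantic_understanding_model(samples: List[Sample], noise_threshold: float,
                                        num_iterations: int = 3) -> dict:
    """Illustrative control flow only; 'params' stands in for the real networks."""
    first_set = samples                                       # step 1: noisy samples from active learning
    second_set = [s for s in first_set if s.noise_score <= noise_threshold]  # step 2: denoising
    params = {"representation_layer": 0.0, "task_output_layer": 0.0}         # step 3: initial parameters
    for _ in range(num_iterations):                           # steps 4-5: iterative updates of both layers
        update = sum(len(s.text) for s in second_set) * 1e-3  # dummy "update parameter"
        params["representation_layer"] += update
        params["task_output_layer"] += update
    return params

data = [Sample("navigate home", 0.1), Sample("asdf gargle", 0.9), Sample("play music", 0.2)]
print(train_semantic_understanding_model(data, noise_threshold=0.5))
```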
- The training device for the semantic understanding model can be implemented in various forms, such as a dedicated terminal with a semantic understanding model training function, or a server equipped with the semantic understanding model training function, for example, the server 200 in the preceding FIG. 1.
- FIG. 2 is a schematic diagram of the composition structure of the training device for the semantic understanding model provided by an embodiment of the application. It should be understood that FIG. 2 only shows an exemplary structure of the training device rather than the entire structure, and part or all of the structure shown in FIG. 2 may be implemented as needed.
- the training device for the semantic understanding model includes: at least one processor 201, a memory 202, a user interface 203, and at least one network interface 204.
- the various components in the training device 20 of the semantic understanding model are coupled together through the bus system 205.
- the bus system 205 is configured to implement connection and communication between these components.
- the bus system 205 also includes a power bus, a control bus, and a status signal bus.
- various buses are marked as the bus system 205 in FIG. 2.
- the user interface 203 may include a display, a keyboard, a mouse, a trackball, a click wheel, keys, buttons, a touch panel, or a touch screen.
- the memory 202 may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memory.
- the memory 202 in the embodiment of the present application can store data to support the operation of the terminal (such as 10-1). Examples of such data include: any computer program configured to operate on a terminal (such as 10-1), such as an operating system and application programs.
- the operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, etc., which are configured to implement various basic services and process hardware-based tasks.
- Applications can include various applications.
- The semantic understanding model training device provided in the embodiments of the present application may be implemented by a combination of software and hardware. As an example, the semantic understanding model training device provided in the embodiments of the present application may be a processor in the form of a hardware decoding processor, which is programmed to execute the semantic understanding model training method provided in the embodiments of the present application. For example, the processor in the form of a hardware decoding processor may adopt one or more Application Specific Integrated Circuits (ASIC), DSPs, Programmable Logic Devices (PLD), Complex Programmable Logic Devices (CPLD), Field-Programmable Gate Arrays (FPGA), or other electronic components.
- As an example in which the training device of the semantic understanding model provided by the embodiment of the present application adopts a combination of software and hardware, the training device can be directly embodied as a combination of software modules executed by the processor 201; the software modules may be located in a storage medium, the storage medium is located in the memory 202, and the processor 201 reads the executable instructions included in the software modules from the memory 202 and, in combination with the necessary hardware (for example, the processor 201 and other components connected to the bus 205), completes the semantic understanding model training method provided by the embodiments of the present application.
- The processor 201 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc., where the general-purpose processor may be a microprocessor or any conventional processor.
- As an example in which the device provided in the embodiment of the application is implemented in hardware, the processor 201 in the form of a hardware decoding processor can directly complete the execution, for example, through one or more Application Specific Integrated Circuits (ASIC), DSPs, Programmable Logic Devices (PLD), Complex Programmable Logic Devices (CPLD), Field-Programmable Gate Arrays (FPGA), or other electronic components, to implement the semantic understanding model training method provided in the embodiments of the present application.
- the memory 202 in the embodiment of the present application is configured to store various types of data to support the operation of the training device 20 of the semantic understanding model. Examples of these data include: any executable instructions configured to operate on the training device 20 of the semantic understanding model, such as executable instructions, the program implementing the semantic understanding model training method of the embodiment of the present application may be included in the executable instructions .
- The training device for the semantic understanding model provided in the embodiments of the present application can also be implemented in software. FIG. 2 shows the training device for the semantic understanding model stored in the memory 202, which can be software in the form of a program, a plug-in, etc., and includes a series of modules. As an example of the program stored in the memory 202, it may include a training device for the semantic understanding model, which includes the following software modules: a data transmission module 2081, a denoising module 2082, and a semantic understanding model training module 2083.
- When the software modules in the training device of the semantic understanding model are read into RAM and executed by the processor 201, the semantic understanding model training method provided in the embodiments of the present application will be implemented.
- The function of each software module in the training device of the semantic understanding model in the embodiment of the present application is described below, among which:
- the data transmission module 2081 is configured to obtain a first training sample set, where the first training sample set is a noisy sentence sample obtained through an active learning process;
- the denoising module 2082 is configured to perform denoising processing on the first training sample set to form a corresponding second training sample set;
- the semantic understanding model training module 2083 is configured to process the second training sample set through the semantic understanding model to determine the initial parameters of the semantic understanding model
- the semantic understanding model training module 2083 is configured to process the second training sample set through the semantic understanding model in response to the initial parameters of the semantic understanding model, and determine the update parameters of the semantic understanding model;
- The semantic understanding model training module 2083 is configured to iteratively update the semantic representation layer network parameters and the task-related output layer network parameters of the semantic understanding model through the second training sample set according to the update parameters of the semantic understanding model.
- Figure 3 is a schematic diagram of the semantic understanding result generated in the traditional scheme.
- The Seq2Seq model is composed of an encoder (Encode) and a decoder (Decode), and generates an output sequence Y based on an input sequence X.
- The encoder converts the input sequence into a fixed-length vector, and the decoder decodes this fixed-length vector into the output sequence.
- Specifically, the encoder encodes the input sentence to be semantically understood to obtain the text features of the sentence, and the decoder decodes the text features and outputs the corresponding semantic understanding result, where the encoder and the decoder are in one-to-one correspondence.
- The disadvantage of a semantic understanding model based on the Seq2Seq model is that, in the related technology, the model only establishes a one-to-one relationship between the training target text y and its labeling information and is optimized with MLE, which causes the model to generate many high-frequency, generic responses that are often short and meaningless. In practice, the same target text y can have many kinds of annotation information, yet the existing Seq2Seq model keeps a strict one-to-one correspondence between the encoder and the decoder and therefore has difficulty dealing with such one-to-many cases; at the same time, it is easily interfered with by noise information, triggers useless recognition, and leads to a poor user experience.
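- For reference, the following is a minimal runnable PyTorch sketch of such an RNN-based Seq2Seq model; the vocabulary size, embedding size, and hidden size are illustrative values, not taken from the patent.
```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal RNN Seq2Seq: the encoder compresses X into one fixed-length vector,
    which a single paired decoder unrolls into Y (the one-to-one structure
    discussed above)."""
    def __init__(self, vocab_size=1000, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src_ids, tgt_ids):
        _, h = self.encoder(self.embed(src_ids))       # fixed-length context vector
        dec_out, _ = self.decoder(self.embed(tgt_ids), h)
        return self.out(dec_out)                        # logits over the vocabulary

model = Seq2Seq()
src = torch.randint(0, 1000, (2, 7))   # batch of 2 input sequences X
tgt = torch.randint(0, 1000, (2, 5))   # batch of 2 output sequences Y (teacher forcing)
print(model(src, tgt).shape)           # torch.Size([2, 5, 1000])
```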
- FIG. 4 is an optional flowchart of the semantic understanding model training method provided by an embodiment of this application. Understandably, the steps shown in FIG. 4 can be executed by various electronic devices running the semantic understanding model training device, for example, a dedicated terminal with a sample generation function, a server with a semantic understanding model training function, or a server cluster. The steps shown in FIG. 4 are described below.
- Step 401 The semantic understanding model training device obtains a first training sample set, where the first training sample set is a sentence sample with noise obtained through an active learning process.
- the first training sample set may be language samples of the same language, or may also be language samples of different languages, which is not limited.
- The language of the first training sample set can be set according to actual translation requirements. For example, when the translation model is applied to a Chinese-to-English application scenario, the language of the first training sample set may be Chinese; in other scenarios, the language of the first training sample set may be English, or may include Chinese and/or French.
- The first training sample set may be in the form of speech or text. The first training sample set in text form and/or in speech form may be collected in advance, for example, in the usual sentence collection manner, and the collected samples are stored in a preset storage device. Therefore, in the present application, when training the translation model, the first training sample set can be obtained from the above-mentioned storage device.
- Step 402 Perform denoising processing on the first training sample set to form a corresponding second training sample set.
- the denoising processing on the first training sample set to form a corresponding second training sample set can be implemented in the following manner:
- Since usage environments differ, the dynamic noise threshold that matches the use environment of the translation model is also different; for example, in an academic translation environment, the dynamic noise threshold that matches the use environment of the translation model needs to be smaller than the dynamic noise threshold in an article-reading environment.
- the denoising processing on the first training sample set to form a corresponding second training sample set can be implemented in the following manner:
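- As a concrete illustration of such threshold-based denoising, the following sketch keeps only the samples whose noise value stays under a dynamic threshold chosen per environment; the threshold values and the noise_score field are assumptions for illustration only.
```python
def denoise(first_sample_set, environment):
    """Keep samples whose noise value is below the environment's dynamic threshold."""
    dynamic_thresholds = {
        "academic_translation": 0.2,   # stricter environment, smaller threshold
        "article_reading": 0.5,
    }
    threshold = dynamic_thresholds.get(environment, 0.3)  # fallback fixed threshold
    return [s for s in first_sample_set if s["noise_score"] <= threshold]

first_set = [{"text": "check the weather", "noise_score": 0.1},
             {"text": "uh hmm background chatter", "noise_score": 0.7}]
print(denoise(first_set, "academic_translation"))  # only the clean sample survives
```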
- Step 403 The semantic understanding model training device processes the second training sample set through the semantic understanding model to determine the initial parameters of the semantic understanding model.
- Step 404 In response to the initial parameters of the semantic understanding model, the semantic understanding model training device processes the second training sample set through the semantic understanding model to determine update parameters of the semantic understanding model.
- In response to the initial parameters of the semantic understanding model, the second training sample set is processed by the semantic understanding model to determine the update parameters of the semantic understanding model. This can be achieved in the following ways:
- the composition of the semantic understanding model may include: a semantic representation layer network and a task-related output layer network. Further, the task-related output layer network includes a domain-independent detector network and a domain classification network.
- the semantic representation layer network may be a bidirectional attention neural network model (BERT Bidirectional Encoder Representations from Transformers).
- Each layer here contains three sub-layers: a self-attention layer, an encoder-decoder attention layer, and a fully connected layer at the end.
- the first two sub-layers are based on the multi-head attention layer.
- FIG. 6 is an optional word-level machine reading schematic diagram of the semantic presentation layer network model in the embodiment of the application, where the encoder and decoder parts both include 6 encoders and decoders.
- The inputs that enter the first encoder combine embedding and positional embedding. After passing through the 6 encoders, they are output to each decoder in the decoder section; the input target is "I am a student", which is processed by the semantic representation layer network model, and the output machine reading result is: "student".
- FIG. 7 is an optional structural diagram of the encoder in the semantic representation layer network model in the embodiment of the application, where the input is composed of queries (Q) and keys (K) of dimension d and values (V) of dimension d: the dot products of the query with all keys are calculated, and the softmax function is applied to obtain the weights of the values.
- In FIG. 7, Q, K, and V are obtained by multiplying the vector x input to the encoder with W^Q, W^K, and W^V, respectively.
- The dimensions of W^Q, W^K, and W^V here are (512, 64); assuming the dimension of the input is (m, 512), where m represents the number of words, the dimensions of Q, K, and V obtained by multiplying the input vector with W^Q, W^K, and W^V are (m, 64).
- FIG. 8 is a schematic diagram of vector splicing in the encoder of the semantic representation layer network model in the embodiment of the application, where Z0 to Z7 are the 8 parallel heads (each of dimension (m, 64)); after concatenating the 8 heads, an (m, 512) tensor is obtained, and finally, after multiplying by W^O, an output matrix of dimension (m, 512) is obtained, whose dimension is consistent with the input dimension of the next encoder.
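- The attention computation described for FIG. 7 and FIG. 8 can be sketched in a few lines of numpy; the (m, 512) input, the (512, 64) projection matrices, and the 8 heads follow the dimensions given above, while the scaling by sqrt(d_k) is an assumption borrowed from the standard Transformer formulation.
```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

m, d_model, d_k, heads = 10, 512, 64, 8        # m words, dimensions as in the text above
x = np.random.randn(m, d_model)                # encoder input of shape (m, 512)
W_Q = np.random.randn(heads, d_model, d_k)     # per-head (512, 64) projections
W_K = np.random.randn(heads, d_model, d_k)
W_V = np.random.randn(heads, d_model, d_k)
W_O = np.random.randn(heads * d_k, d_model)    # (512, 512) output projection

head_outputs = []
for h in range(heads):                         # Z0 ... Z7, each of shape (m, 64)
    Q, K, V = x @ W_Q[h], x @ W_K[h], x @ W_V[h]
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # dot products of Q with all keys, then softmax
    head_outputs.append(weights @ V)

Z = np.concatenate(head_outputs, axis=-1)      # concat of the 8 heads -> (m, 512)
output = Z @ W_O                               # (m, 512), same dimension as the next encoder's input
print(output.shape)                            # (10, 512)
```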
- FIG. 9 is a schematic diagram of the encoding process of the encoder in the semantic representation layer network model in the embodiment of the application, where x1 becomes the state z1 through self-attention; the tensor that has passed through self-attention then undergoes residual connection and LayerNorm processing and enters the fully connected feed-forward network, and the feed-forward network performs the same operations, residual processing and normalization. The final output tensor then enters the next encoder; after this operation has been repeated 6 times, the result of the iterative processing enters the decoder.
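- A compact PyTorch sketch of one such encoder block (self-attention, residual connection plus LayerNorm, feed-forward network, and a second residual plus normalization), stacked 6 times, is given below; the feed-forward width of 2048 is an assumption borrowed from the standard Transformer, not a value stated in the patent.
```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One encoder block as described above."""
    def __init__(self, d_model=512, heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        z, _ = self.self_attn(x, x, x)      # x1 -> z1 through self-attention
        x = self.norm1(x + z)               # residual connection + LayerNorm
        x = self.norm2(x + self.ffn(x))     # feed-forward, then residual + LayerNorm
        return x

x = torch.randn(1, 10, 512)                 # (batch, m words, 512)
encoder = nn.Sequential(*[EncoderLayer() for _ in range(6)])  # 6 stacked encoders
print(encoder(x).shape)                     # torch.Size([1, 10, 512])
```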
- FIG. 10 is a schematic diagram of the decoding process of the decoder in the semantic representation layer network model in the embodiment of the application, showing the input, output, and decoding process of the decoder: the output is the probability distribution of the output word corresponding to position i; the input is the output of the encoder together with the decoder output corresponding to position i-1. Hence, the middle attention is not self-attention: its K and V come from the encoder, and Q comes from the output of the decoder at the previous position.
- FIG. 11 is a schematic diagram of the decoding process of the decoder in the semantic representation layer network model in an embodiment of this application.
- the vector output by the last decoder of the decoder network will pass through the Linear layer and the softmax layer.
- Figure 12 is a schematic diagram of the decoding process of the decoder in the semantic representation layer network model in the embodiment of the application.
- The function of the Linear layer is to map the vector from the decoder into a logits vector; the softmax layer then converts the logits vector into probability values, and finally the position of the maximum probability is found, which completes the output of the decoder.
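- A minimal sketch of this final Linear plus softmax step follows; the vocabulary size and model dimension are illustrative values.
```python
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 512
linear = nn.Linear(d_model, vocab_size)        # maps the decoder vector to a logits vector

decoder_vector = torch.randn(1, d_model)       # vector output by the last decoder
logits = linear(decoder_vector)                # logits vector
probs = torch.softmax(logits, dim=-1)          # softmax converts logits to probability values
next_token_id = probs.argmax(dim=-1)           # position of the maximum probability
print(next_token_id.shape)                     # torch.Size([1])
```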
- the first reading semantic annotation network may be a bidirectional attention neural network model (BERT Bidirectional Encoder Representations from Transformers).
- Each layer here contains three sub-layers: a self-attention layer, an encoder-decoder attention layer, and a fully connected layer at the end.
- the first two sub-layers are based on multi-head attention layer.
- FIG. 13 is an optional sentence-level machine reading schematic diagram of the semantic presentation layer network model in the embodiment of the application, where the encoder and decoder parts both include 6 encoders and decoders.
- The inputs that enter the first encoder combine embedding and positional embedding. After passing through the 6 encoders, they are output to each decoder in the decoder section; the input target is the English sentence "I am a student", which is processed by the semantic representation layer network model, and the output machine reading result is: "I am a student".
- The BERT model in this application can also be replaced by a bidirectional long short-term memory network model (Bi-LSTM, Bi-directional Long Short-Term Memory), a gated recurrent unit network model (GRU, Gated Recurrent Unit), a deep contextualized word representation network model (ELMo, Embedding from Language Model), a GPT model, or a GPT2 model, which will not be repeated in this application.
- Step 405 The semantic understanding model training device iteratively updates the semantic representation layer network parameters and task-related output layer network parameters of the semantic understanding model through the second training sample set according to the update parameters of the semantic understanding model.
- FIG. 14 is an optional flowchart of the semantic understanding model training method provided by an embodiment of the application. Understandably, the steps shown in FIG. 14 can be executed by various electronic devices running the semantic understanding model training device. For example, it can be a dedicated terminal with a semantic understanding model training function, a server or a server cluster with a semantic understanding model training function. The steps shown in FIG. 14 will be described below.
- Step 1401 The semantic understanding model training device determines a second noise parameter matching the second training sample set through the update parameters of the semantic understanding model.
- The second noise parameter is used to characterize the noise value of the parallel sentence samples in the second training sample set; here, the weight of each training sample in the second training sample set is the same, and training samples with the same weight can be called parallel sentence samples.
- Step 1402 When the second noise parameter reaches the corresponding noise value threshold, the semantic understanding model training device iteratively updates, according to the noise value of the second noise parameter, the semantic representation layer network parameters and the task-related output layer network parameters of the semantic understanding model until the loss function corresponding to the task-related output layer network, formed by the domain-independent detector network and the domain classification network of the semantic understanding model, meets the corresponding convergence condition.
- Step 1403 The semantic understanding model training device responds to the loss function corresponding to the task-related output layer network composed of the domain-independent detector network and the domain classification network of the semantic understanding model.
- Step 1404 The semantic understanding model training device adjusts the parameters of the semantic representation layer network of the semantic understanding model.
- the parameters of the semantic representation layer network are adapted to the loss function corresponding to the task-related output layer network.
- the loss function of the encoder network is expressed as:
- loss_A = Σ (decoder_A(encoder(warp(x1))) - x1)²; wherein decoder_A is decoder A, warp is the function applied to the sentence to be recognized, x1 is the sentence to be recognized, and encoder is the encoder.
- The parameters of encoder A and decoder A are solved as the loss function decreases along its gradient; when the loss function converges, the training is ended.
- loss_B = Σ (decoder_B(encoder(warp(x2))) - x2)²; where decoder_B is decoder B, warp is the function applied to the sentence to be recognized, x2 is the sentence to be recognized, and encoder is the encoder.
- The parameters of encoder B and decoder B are solved as the loss function decreases along its gradient; when the loss function converges (that is, when decoding obtains the selected probability of the translation result corresponding to the sentence to be recognized), the adjustment and training are ended.
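- Such a reconstruction-style loss can be minimized with ordinary gradient descent; the following sketch uses small stand-in linear layers for encoder and decoder_A and a random perturbation as warp, so it only illustrates the loss and the gradient updates, not the patent's actual sequence networks.
```python
import torch
import torch.nn as nn

# Stand-ins for the encoder and decoder A described above (purely illustrative).
encoder = nn.Linear(16, 8)
decoder_A = nn.Linear(8, 16)

def warp(x):                         # illustrative perturbation of the sentence features
    return x + 0.01 * torch.randn_like(x)

x1 = torch.randn(4, 16)              # feature vectors of the sentences to be recognized
optimizer = torch.optim.SGD(list(encoder.parameters()) + list(decoder_A.parameters()), lr=0.1)

for step in range(100):              # loss_A = sum((decoder_A(encoder(warp(x1))) - x1)^2)
    loss_A = ((decoder_A(encoder(warp(x1))) - x1) ** 2).sum()
    optimizer.zero_grad()
    loss_A.backward()                # follow the gradient until the loss converges
    optimizer.step()
print(float(loss_A))                 # small value after training
```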
- FIG. 15 is an optional flowchart of the semantic understanding model training method provided by an embodiment of this application. Understandably, the steps shown in FIG. 15 can be executed by various electronic devices running the semantic understanding model training device. For example, it can be a dedicated terminal with a semantic understanding model training function, a server or a server cluster with a semantic understanding model training function. The steps shown in FIG. 15 will be described below.
- Step 1501 The semantic understanding model training device performs negative example processing on the second training sample set to form a negative example sample set corresponding to the second training sample set.
- The negative example sample set is configured for adjusting the domain-independent detector network parameters and the domain classification network parameters of the semantic understanding model.
- processing of negative examples on the first training sample set may be implemented in the following manner:
- Random deletion processing or replacement processing is performed on the sentences to be output in the domain classification network of the semantic understanding model to form a negative example sample set corresponding to the first training sample set.
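- A minimal sketch of such negative-example construction by random deletion or replacement follows; the probabilities and the toy vocabulary are illustrative assumptions.
```python
import random

def make_negative_example(tokens, vocab, p_delete=0.3, p_replace=0.3):
    """Build one negative example by randomly deleting or replacing tokens."""
    negative = []
    for tok in tokens:
        if random.random() < p_delete:
            continue                                  # random deletion
        if random.random() < p_replace:
            negative.append(random.choice(vocab))     # random replacement
        else:
            negative.append(tok)
    return negative

random.seed(0)
vocab = ["weather", "music", "navigate", "home", "play"]
print(make_negative_example(["check", "the", "weather", "today"], vocab))
```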
- Step 1502 The semantic understanding model training device determines the corresponding bilingual evaluation understudy (BLEU) value according to the set of negative example samples.
- When the full-duplex voice interaction scenario to which the semantic understanding model is applied is a use environment other than Chinese (it can be a single-language environment such as English or another language, or an environment that includes at least two languages), the corresponding bilingual evaluation understudy value determined according to the negative example sample set can be configured as a supervision parameter to evaluate the semantic understanding result of the semantic understanding model.
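- As an illustration of how such a bilingual evaluation understudy style score can be computed, the following sketch implements a simplified modified n-gram precision without a brevity penalty; it is a stand-in for, not the exact definition of, the metric referenced above.
```python
from collections import Counter

def ngram_precision(candidate, reference, n=2):
    """Simplified BLEU-style modified n-gram precision (no brevity penalty)."""
    cand = [tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)]
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    if not cand:
        return 0.0
    matched = sum(min(c, ref[g]) for g, c in Counter(cand).items())
    return matched / len(cand)

cand = "play some relaxing music".split()
ref = "please play some relaxing music".split()
print(round(ngram_precision(cand, ref, n=2), 3))  # 1.0: every candidate bigram appears in the reference
```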
- The encoder and the corresponding decoder of the semantic representation layer network may be bidirectional network models; for example, a Bi-GRU (bidirectional GRU) model may be selected as the corresponding encoder and decoder. The Bi-GRU bidirectional GRU model is a model that can recognize inverted sentence structures: when the user enters a dialogue sentence, the dialogue sentence may be inverted, that is, different from the normal sentence structure. Because the Bi-GRU bidirectional GRU model can identify dialogue sentences with an inverted sentence structure, it can enrich the functions of the trained model and improve the robustness of the final trained target model.
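- A minimal PyTorch sketch of such a Bi-GRU encoder follows; the vocabulary, embedding, and hidden sizes are illustrative.
```python
import torch
import torch.nn as nn

# Reading the dialogue sentence in both directions helps with inverted sentence structures.
embed = nn.Embedding(1000, 64)
bi_gru = nn.GRU(input_size=64, hidden_size=128, batch_first=True, bidirectional=True)

token_ids = torch.randint(0, 1000, (1, 6))       # one dialogue sentence of 6 tokens
outputs, h_n = bi_gru(embed(token_ids))
print(outputs.shape)  # torch.Size([1, 6, 256]): forward and backward states concatenated
```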
- FIG. 16A is an optional flowchart of the semantic understanding model training method provided by an embodiment of the application. Understandably, the steps shown in FIG. 16A can be executed by various electronic devices running the semantic understanding model training device. For example, it can be a dedicated terminal with a semantic understanding model training function, a server or a server cluster with a semantic understanding model training function. The steps shown in FIG. 16A will be described below.
- Step 1601 The semantic understanding model training device performs recall processing on the training samples in the data source.
- The data source includes data from various types of application scenarios, which serves as the data source of the corresponding training samples.
- The semantic understanding model provided in this application can be packaged as a software module in an in-vehicle electronic device, or it can be packaged in different smart home devices (including, but not limited to, speakers, TVs, refrigerators, air conditioners, washing machines, and stoves); of course, it can also be solidified in the hardware of smart robots.
- For these different usage scenarios, corresponding training samples can be used to perform targeted training of the semantic understanding model.
- Step 1602 The semantic understanding model training device triggers a corresponding active learning process according to the result of the recall processing, so as to obtain a sentence sample with noise in the data source.
- Step 1603 The semantic understanding model training device marks the sentence samples with noise acquired in the active learning process to form the first training sample set.
- the labeling of noisy sentence samples obtained in the active learning process to form the first training sample set can be implemented in the following manner:
- the semantic understanding model training device may trigger an active exploration process in response to the active learning process, so as to implement boundary corpus expansion processing on the noisy sentence samples matching the vehicle environment .
- The semantic understanding model training device may, in response to the active learning process, trigger the text similarity clustering network in the active exploration process to determine the text clustering centers of the noisy sentence samples matching the vehicle environment; retrieve the data source according to these text clustering centers, so as to implement text augmentation for the noisy sentence samples matching the vehicle environment; and, according to the results of the text augmentation, trigger the corresponding manifold learning process to perform dimensionality reduction processing on the results of the text augmentation, so as to achieve boundary corpus expansion for the noisy sentence samples matching the vehicle environment. Referring to FIG. 16B, FIG. 16B is a schematic diagram of an optional boundary corpus expansion of the semantic understanding model training method provided by an embodiment of the application.
- In this way, the text clustering centers of the noisy sentence samples matching the vehicle environment are determined, the data source is retrieved based on them, and the number of sentence samples associated with the noisy sentence samples matching the vehicle environment can be effectively increased. However, because the dimensionality of the training samples increases during the augmentation of the training sample sentences, performing dimensionality reduction on the results of the text augmentation through the manifold learning process can reduce the impact of data dimensionality on the accuracy of semantic understanding model training in the subsequent training process, while reducing the difficulty of training and the waiting time of users (see the sketch after this paragraph).
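- The clustering, retrieval, and dimensionality-reduction pipeline described above can be sketched with off-the-shelf components; in the sketch below, TF-IDF plus KMeans stands in for the text similarity clustering network and t-SNE stands in for the manifold learning process, and all sentences are toy data rather than the patent's corpus.
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.metrics.pairwise import cosine_similarity

# Noisy in-vehicle sentences and a larger data source to retrieve from (toy data).
noisy = ["navigate to the nearest gas station", "play my driving playlist",
         "turn up the air conditioning", "how long until we arrive"]
data_source = ["navigate home", "find a gas station nearby", "play some music",
               "open the sunroof", "set cabin temperature to 22", "estimated arrival time",
               "tell me a joke", "what is the capital of France"]

vectorizer = TfidfVectorizer()
X_all = vectorizer.fit_transform(noisy + data_source)
X_noisy, X_source = X_all[:len(noisy)], X_all[len(noisy):]

# 1) Text-similarity clustering: cluster centers of the noisy samples.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_noisy)

# 2) Retrieval: pull the data-source sentences closest to each cluster center
#    to augment the boundary corpus.
sims = cosine_similarity(kmeans.cluster_centers_, X_source)
augmented = {data_source[j] for row in sims for j in row.argsort()[-2:]}
print(augmented)

# 3) Manifold learning: reduce the dimensionality of the expanded corpus vectors.
expanded = noisy + sorted(augmented)
low_dim = TSNE(n_components=2, perplexity=3, random_state=0).fit_transform(
    vectorizer.transform(expanded).toarray())
print(low_dim.shape)  # (len(expanded), 2)
```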
- FIG. 17 is a schematic diagram of the composition structure of the semantic understanding model processing device provided by an embodiment of the application. It should be understood that FIG. 17 only shows an exemplary structure of the semantic understanding model processing device rather than the entire structure, and part or all of the structure shown in FIG. 17 may be implemented as needed.
- the semantic understanding model processing apparatus includes: at least one processor 1301, memory 1302, user interface 1303, and at least one network interface 1304.
- the various components in the semantic understanding model processing device 130 are coupled together through the bus system 1305.
- the bus system 1305 is configured to implement connection and communication between these components.
- in addition to a data bus, the bus system 1305 also includes a power bus, a control bus, and a status signal bus.
- for clarity, the various buses are all marked as the bus system 1305 in FIG. 17.
- the user interface 1303 may include a display, a keyboard, a mouse, a trackball, a click wheel, keys, buttons, a touch panel, or a touch screen.
- the memory 1302 may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memory.
- the memory 1302 in the embodiment of the present application can store data to support the operation of the terminal (such as 10-1). Examples of such data include: any computer program configured to operate on a terminal (such as 10-1), such as an operating system and application programs.
- the operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, etc., which are configured to implement various basic services and process hardware-based tasks.
- Applications can include various applications.
- the semantic understanding model processing apparatus provided in the embodiments of the present application may be implemented by a combination of software and hardware. As an example, it may take the form of a processor in the form of a hardware decoding processor, which is programmed to execute the semantic processing method of the semantic understanding model provided in the embodiments of the present application.
- for example, the processor in the form of a hardware decoding processor may adopt one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Programmable Logic Devices (PLD), Complex Programmable Logic Devices (CPLD), Field-Programmable Gate Arrays (FPGA), or other electronic components.
- as an example in which the semantic understanding model processing apparatus provided by the embodiments of the present application is implemented by a combination of software and hardware, the apparatus may be directly embodied as a combination of software modules executed by the processor 1301. The software modules may be located in a storage medium, the storage medium is located in the memory 1302, and the processor 1301 reads the executable instructions included in the software modules in the memory 1302 and, in combination with necessary hardware (for example, the processor 1301 and other components connected to the bus 1305), completes the semantic processing method of the semantic understanding model provided by the embodiments of the present application.
- the processor 1301 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, where the general-purpose processor may be a microprocessor or any conventional processor.
- as an example in which the semantic understanding model processing apparatus provided by the embodiments of the present application is implemented in hardware, the apparatus may be executed directly by the processor 1301 in the form of a hardware decoding processor, for example, by one or more Application Specific Integrated Circuits (ASIC), DSPs, Programmable Logic Devices (PLD), Complex Programmable Logic Devices (CPLD), Field-Programmable Gate Arrays (FPGA), or other electronic components, to implement the semantic processing method of the semantic understanding model provided by the embodiments of the present application.
- the memory 1302 in the embodiment of the present application is configured to store various types of data to support the operation of the semantic understanding model processing device 130. Examples of such data include: any executable instructions configured to operate on the semantic understanding model processing device 130; the program implementing the semantic processing method of the semantic understanding model of the embodiments of the present application may be included in these executable instructions.
- in other embodiments, the semantic understanding model processing device provided by the embodiments of the present application may be implemented in software. FIG. 17 shows the semantic understanding model processing device stored in the memory 1302, which may be software in the form of a program, a plug-in, or the like, and which includes a series of modules. As an example of a program stored in the memory 1302, it may include the semantic understanding model processing device.
- the semantic understanding model processing device includes the following software modules: a text conversion module 13081, a semantic representation layer network module 13082, a domain-independent detector network module 13083, a domain classification network module 13084, and an information processing module 13085.
- the text conversion module 13081 is configured to obtain voice command information and convert the voice command into corresponding recognizable text information
- the semantic representation layer network module 13082 is configured to determine, through the semantic representation layer network of the semantic understanding model, at least one word-level hidden variable corresponding to the recognizable text information;
- the domain-independent detector network module 13083 is configured to determine an object matching the word-level hidden variable according to the at least one word-level hidden variable through the domain-independent detector network of the semantic understanding model;
- the domain classification network module 13084 is configured to determine, through the domain classification network of the semantic understanding model and according to the at least one word-level hidden variable, the task domain corresponding to the word-level hidden variable;
- the information processing module 13085 is configured to trigger the corresponding business process according to the object matching the word-level hidden variable and the task domain corresponding to the word-level hidden variable, so as to complete the task corresponding to the voice command information.
- FIG. 18 is a schematic flowchart of an optional semantic processing method of the semantic understanding model provided by an embodiment of the present application. It is understandable that the steps shown in FIG. 18 can be executed by various electronic devices running the semantic understanding model processing device, for example, a dedicated terminal with a to-be-translated sentence processing function, or a server or server cluster with such a function. The steps shown in FIG. 18 will be described below.
- Step 1801 The semantic understanding model processing device obtains voice command information, and converts the voice command into corresponding recognizable text information;
- Step 1802 The semantic understanding model processing device determines at least one word-level hidden variable corresponding to the identifiable text information through the semantic representation layer network of the semantic understanding model;
- Step 1803 The semantic understanding model processing device determines an object that matches the word-level hidden variable according to the at least one word-level hidden variable through the domain-independent detector network of the semantic understanding model;
- Step 1804 The semantic understanding model processing device determines, through the domain classification network of the semantic understanding model and according to the at least one word-level hidden variable, the task domain corresponding to the word-level hidden variable;
- Step 1805 The semantic understanding model processing device triggers a corresponding business process according to the object matching the word-level hidden variable and the task domain corresponding to the word-level hidden variable, so as to complete the task corresponding to the voice command information.
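- By way of a hedged illustration only, the minimal sketch below walks through steps 1801 to 1805 with a Hugging Face BERT encoder standing in for the semantic representation layer and untrained linear heads standing in for the domain-independent detector network and the One-vs-All domain classifiers; the ASR front end and the business-process dispatch are stubs, and the checkpoint name, threshold, and domain list are assumptions rather than details from this application.

```python
# Minimal sketch of steps 1801-1805 with an untrained demo setup: a BERT encoder as
# the semantic representation layer, one linear head as the OOD (domain-independent)
# detector and one One-vs-All head per task domain. ASR and dispatch are stubbed.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")
hidden = encoder.config.hidden_size
domains = ["weather", "navigation", "music"]
ood_head = torch.nn.Linear(hidden, 2)                            # IND vs OOD
domain_heads = {d: torch.nn.Linear(hidden, 2) for d in domains}  # One-vs-All classifiers

def speech_to_text(audio):          # Step 1801: stub for the ASR front end
    return "今天天气怎么样"

def handle_voice_command(audio):
    text = speech_to_text(audio)
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Step 1802: word-level hidden variables from the semantic representation layer
        token_states = encoder(**inputs).last_hidden_state
        sentence_state = token_states[:, 0]                      # [CLS] summary vector
        # Step 1803: domain-independent detector decides whether to reject (OOD)
        ood_prob = torch.softmax(ood_head(sentence_state), dim=-1)[0, 1].item()
        if ood_prob > 0.5:
            return "reject"                                      # out of domain: refuse to respond
        # Step 1804: domain classification network picks the task domain
        scores = {d: torch.softmax(h(sentence_state), dim=-1)[0, 1].item()
                  for d, h in domain_heads.items()}
        domain = max(scores, key=scores.get)
    # Step 1805: trigger the business process for that domain
    return f"dispatch to {domain} service"

print(handle_voice_command(audio=None))
```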
- Figure 19 is a schematic diagram of the usage scenario of the semantic understanding model training method provided by an embodiment of this application.
- the semantic understanding model training method provided by this application can be provided in the form of a cloud service to various types of customers (packaged in a vehicle-mounted terminal or packaged in different mobile electronic devices).
- Figure 20 is a schematic diagram of a usage scenario of the semantic understanding model training method provided by an embodiment of this application; the specific usage scenario is not limited in this application. For example, the method can be provided as a cloud service to enterprise customers to help them train the semantic understanding model according to the usage environments of different devices.
- FIG. 21 is a schematic diagram of an optional processing flow of the semantic understanding model training method provided by this application, including the following steps:
- Step 2101 Acquire voice information, and convert the voice information into corresponding text information.
- referring to the natural language understanding module in Figure 19, the user's voice signal is converted into a text signal through the semantic understanding module; the natural language understanding module then extracts structured information such as the user's domain, intent, and parameters from the text; these semantic elements are passed to the dialogue management module for query processing or state management strategies; finally, the output of the system is broadcast to the user through speech synthesis.
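- A high-level sketch of one such dialogue turn, with every stage stubbed out, might look as follows; the structured NLU output format and the canned replies are illustrative assumptions, not details of this application.

```python
# High-level sketch of one dialogue turn: ASR -> NLU (domain, intent, slots) ->
# dialogue management -> TTS broadcast. All four stages are stubs.
def asr(audio):
    return "帮我查一下明天的天气"

def nlu(text):
    return {"domain": "weather", "intent": "query_weather", "slots": {"date": "明天"}}

def dialogue_manager(semantics):
    if semantics["domain"] == "weather":
        return "明天多云，气温 15 到 22 度"
    return "抱歉，我没有听懂"

def tts(reply):
    print(f"[speaker] {reply}")

def one_turn(audio):
    text = asr(audio)                    # speech signal -> text signal
    semantics = nlu(text)                # extract domain / intent / slot structure
    reply = dialogue_manager(semantics)  # query handling or state management
    tts(reply)                           # broadcast the system output to the user

one_turn(audio=None)
```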
- Step 2102 In response to the text information, trigger an active learning process to obtain corresponding training samples.
- FIG. 22 is a schematic diagram of the active learning process in the processing of the semantic understanding model training method provided by the embodiment of the application.
- both the negative corpus model (the OOD model) and the domain classifier model need to mine a large number of negative samples, but the amount of manual labeling available is limited. Therefore, it is necessary to mine, with limited annotation manpower, the most valuable samples from the massive data: the samples that carry the most information and bring the largest gains to the model.
- for this purpose, the data mining process shown in Figure 22 can be constructed based on the idea of Active Learning, yielding a complete closed-loop data mining process, from data generation and selection to labeling and then to model training. This ensures that the generated samples are the samples most needed by, and most helpful to, the semantic understanding model, and screening the samples effectively reduces the labeling labor cost.
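- One common way such a selection step is realized (an assumption here, not a detail of this application) is least-margin uncertainty sampling over the current model's predictions, as in the sketch below; the scoring function, the margin criterion, and the annotation budget are all illustrative.

```python
# Sketch of the sample-selection step in the Active Learning loop: score unlabeled
# candidates with the current model and send only the most uncertain (smallest-margin)
# ones to the annotators; their labels are then fed back into the next training round.
import random

def select_for_annotation(candidates, predict_proba, budget=100):
    """candidates: list of sentences; predict_proba: sentence -> list of domain probabilities."""
    scored = []
    for sentence in candidates:
        probs = sorted(predict_proba(sentence), reverse=True)
        margin = probs[0] - probs[1] if len(probs) > 1 else probs[0]
        scored.append((margin, sentence))        # small margin: model is unsure, sample is informative
    scored.sort(key=lambda pair: pair[0])
    return [sentence for _, sentence in scored[:budget]]

# Example with a dummy scorer standing in for the current semantic understanding model.
to_label = select_for_annotation(
    candidates=[f"query-{i}" for i in range(1000)],
    predict_proba=lambda s: [random.random() for _ in range(5)],
    budget=10,
)
print(to_label)
```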
- Step 2103 Perform optimization processing on the acquired training samples.
- through step 2102, a large number of OOD corpora, domain negative sample corpora, and domain positive sample corpora are mined and accumulated. When training the semantic understanding model, the One-vs-All method is used to organize the positive and negative samples. This method inherently makes the ratio of positive to negative samples of a domain classifier unbalanced; in some scenarios the ratio reaches 1:100, and in some extreme cases 1:2000. In actual use of the semantic understanding model, even when certain domains have sufficient negative samples, the FAR of the trained model remains relatively high. Therefore, a negative sample distribution optimization strategy can be proposed through the analysis of bad cases and through experiments.
- in some embodiments, the strategy is as follows: the negative samples are grouped according to their importance (common negative samples, domain negative samples, positive samples of other related domains, and positive samples of other unrelated domains), and each group is assigned a different weight; domain negative samples and positive samples of other related domains are assigned higher weights, and the other negative samples are assigned lower weights. By fine-tuning the grouped weights of the negative samples in this way, the false recognition rate of the model can be effectively reduced.
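- A minimal sketch of this weighting idea, assuming illustrative group names and weight values rather than the ones actually used, could map each sample's group to a per-sample loss weight as follows.

```python
# Sketch of the negative-sample distribution strategy: group negatives by importance
# and give each group a different per-sample loss weight. The numeric weights are
# illustrative assumptions, not values from this application.
GROUP_WEIGHTS = {
    "domain_negative": 2.0,           # negatives mined inside the target domain
    "related_domain_positive": 2.0,   # positives of other related domains
    "common_negative": 0.5,
    "unrelated_domain_positive": 0.5,
    "positive": 1.0,
}

def sample_weights(samples):
    """samples: list of (text, group) pairs -> list of per-sample loss weights."""
    return [GROUP_WEIGHTS[group] for _, group in samples]

# With a framework such as PyTorch these weights can be applied per sample, e.g.
#   losses = torch.nn.functional.cross_entropy(logits, labels, reduction="none")
#   loss = (losses * torch.tensor(weights)).mean()
batch = [("导航去机场", "positive"), ("随便聊聊", "common_negative"), ("查下股票", "related_domain_positive")]
print(sample_weights(batch))
```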
- Step 2104 Train the semantic understanding model through the optimized training samples to determine the parameters of the semantic understanding model.
- the trained semantic understanding model can be used to recognize and process voice commands in a noisy environment.
- Figure 23 is a schematic diagram of an optional model structure of the semantic understanding model provided by an embodiment of the application.
- on the model network side, a multi-task learning (Multi-task Learning) training method can be used to jointly train the OOD model and the domain classification model.
- An optional network structure is shown in Figure 23. The entire network structure is divided into two layers:
- the pre-training model based on BERT is used as the semantic representation layer.
- the output layer related to downstream tasks can be represented by a fully connected network.
- the semantic understanding model training method provided in this application can jointly train the OOD detector model and the domain classification model.
- the OOD model is a binary classification task used to determine whether the corpus is IND or Out of Domain.
- the domain classifier model is composed of multiple binary classifiers, which can adopt the One-vs-All data organization method.
- the domain classifier is used to determine which domain (weather, navigation, music, etc.) the corpus is in the IND.
- OOD detection and domain classification are two closely related tasks: if the corpus is OOD, it must be a negative sample for the binary classifiers of all domains; if the corpus is IND, it must be a positive sample of one or more of the domain classifiers.
- using the correlation between the tasks, a joint loss function can be constructed: L(·) = L_D(·) + α·L_O(·), where L_D(·) is the loss produced by the domain classifier, L_O(·) is the loss produced by the OOD detector, and α is a hyperparameter that controls the degree of influence of the OOD task on the loss of the entire model; α can be set to 1 in actual training. The loss of the output layer can use cross entropy: L_D(·) = -∑ p'·log p, where p is the soft-max prediction probability of the sample and p' is the ground-truth label of the sample.
- the parameters of the semantic representation layer BERT are fine-tuned during the training process, and the output layer parameters of the OOD detector and of the classifiers in each domain are optimized independently.
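- Assuming a PyTorch-style setup, the joint objective and the independent optimization of the output layers might be sketched as follows; the stand-in encoder, the head sizes, the learning rates, and α = 1 are illustrative assumptions, with the summed One-vs-All cross-entropy playing the role of L_D and the OOD cross-entropy playing the role of L_O.

```python
# Sketch of the joint multi-task objective L = L_D + alpha * L_O with cross-entropy
# output losses, and of giving the shared representation layer and each output layer
# their own optimizer parameter group. Shapes, alpha and the tiny encoder are assumptions.
import torch
import torch.nn.functional as F

alpha = 1.0
encoder = torch.nn.Linear(128, 768)      # stand-in for the fine-tuned BERT representation layer
ood_head = torch.nn.Linear(768, 2)
domain_heads = torch.nn.ModuleDict(
    {d: torch.nn.Linear(768, 2) for d in ["weather", "navigation", "music"]}
)

def joint_loss(features, ood_labels, domain_labels):
    """domain_labels: dict domain -> 0/1 labels under the One-vs-All organization."""
    h = encoder(features)
    loss_o = F.cross_entropy(ood_head(h), ood_labels)                    # L_O: OOD detector loss
    loss_d = sum(F.cross_entropy(domain_heads[d](h), domain_labels[d])   # L_D: domain classifier losses
                 for d in domain_heads)
    return loss_d + alpha * loss_o

# The representation layer and the task-specific output layers sit in separate
# parameter groups, so the output layers can be optimized independently.
optimizer = torch.optim.Adam(
    [{"params": encoder.parameters(), "lr": 2e-5},
     {"params": ood_head.parameters(), "lr": 1e-3},
     {"params": domain_heads.parameters(), "lr": 1e-3}]
)

features = torch.randn(4, 128)
ood_labels = torch.tensor([0, 0, 1, 1])
domain_labels = {d: torch.randint(0, 2, (4,)) for d in domain_heads}
loss = joint_loss(features, ood_labels, domain_labels)
loss.backward()
optimizer.step()
```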
- in a full-duplex dialogue scenario, the object of the user's conversation may shift: the user may chat with friends nearby or talk to themselves from time to time. The semantic understanding model training method provided by this application can effectively reduce the false recognition rate of the dialogue and ensure that the assistant does not respond incorrectly during the dialogue. Further, by mining a large number of negative samples through Active Learning for model training, after several cycles of iteration the false recognition rate of the semantic understanding model dropped from an initially high level to a reasonable range. At the same time, by grouping the negative samples and assigning different weights to different groups to adjust the internal sample distribution, the false recognition rate decreased further. Finally, on the model structure side, introducing the OOD rejection model for joint learning reduced the false recognition rate on the internal development and test sets to varying degrees.
- Figure 24 is a schematic diagram of using the semantic understanding model to wake up the application packaged in the vehicle system
- Figure 25 is a schematic diagram of using the semantic understanding model to check the weather packaged in the vehicle system.
- in some embodiments of this application, a post-processed rank model can also be added on top of the task-specific layers.
- the input of this rank model is the prediction scores of the OOD detector and of the classifiers in the various domains, and its output is the prediction result of the entire model.
- in this application, only one level of logical processing is performed on the OOD prediction result and the domain classifier prediction results: when the OOD model predicts out of domain, the result is returned directly and no domain classifier prediction is performed. However, the OOD model may make prediction errors while the domain classifier model predicts with a high degree of confidence, so that the final result is actually IND. By learning this combination relationship, the alternative solution can give a reasonable prediction result on the basis of a comprehensive comparison, so as to reduce the error rate of the semantic understanding results of the semantic understanding model.
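- As a sketch of how such a rank model could combine the scores (logistic regression is an assumed choice here, and the training rows are toy examples for illustration only), consider the following.

```python
# Sketch of the post-processed rank model alternative: instead of hard-rejecting
# whenever the OOD detector fires, feed the OOD score and the per-domain scores into
# a small learned ranker that makes the final decision.
import numpy as np
from sklearn.linear_model import LogisticRegression

LABELS = ["reject", "weather", "navigation", "music"]

# Feature vector = [ood_score, weather_score, navigation_score, music_score]
train_x = np.array([
    [0.9, 0.1, 0.2, 0.1],   # clearly OOD -> reject
    [0.7, 0.1, 0.9, 0.1],   # OOD detector fired, but navigation is very confident -> IND
    [0.2, 0.8, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.9],
])
train_y = np.array([0, 2, 1, 3])        # indices into LABELS

ranker = LogisticRegression(max_iter=1000).fit(train_x, train_y)

def final_decision(ood_score, domain_scores):
    features = np.array([[ood_score, *domain_scores]])
    return LABELS[int(ranker.predict(features)[0])]

print(final_decision(0.65, [0.05, 0.92, 0.03]))   # high OOD score but navigation dominates
```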
- to summarize: a first training sample set is obtained, where the first training sample set consists of noisy sentence samples acquired through an active learning process; denoising is performed on the first training sample set to form a corresponding second training sample set; the second training sample set is processed through the semantic understanding model to determine the initial parameters of the semantic understanding model; in response to the initial parameters of the semantic understanding model, the second training sample set is processed through the semantic understanding model to determine the update parameters of the semantic understanding model; and, according to the update parameters of the semantic understanding model, the semantic representation layer network parameters and the task-related output layer network parameters of the semantic understanding model are iteratively updated through the second training sample set. This makes the generalization ability of the semantic understanding model stronger, improves the training accuracy and training speed of the semantic understanding model, makes full use of the gain that existing noisy sentences provide for model training, enables the semantic understanding model to adapt to different usage scenarios, reduces the impact of environmental noise on the semantic understanding model, reduces invalid triggering of electronic devices, and is more conducive to deploying the semantic understanding model in mobile terminals.
Abstract
A semantic understanding model training method, a semantic processing method of a semantic understanding model, an apparatus, an electronic device, and a storage medium. The semantic understanding model training method includes: acquiring a first training sample set, where the first training sample set consists of noisy sentence samples acquired through an active learning process (401); performing denoising processing on the first training sample set to form a corresponding second training sample set (402); processing the second training sample set through the semantic understanding model to determine initial parameters of the semantic understanding model (403); in response to the initial parameters of the semantic understanding model, processing the second training sample set through the semantic understanding model to determine update parameters of the semantic understanding model (404); and, according to the update parameters of the semantic understanding model, iteratively updating the semantic representation layer network parameters and the task-related output layer network parameters of the semantic understanding model through the second training sample set (405).
Description
本申请基于申请号为201911047037.9、申请日为2019年10月30日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本申请作为参考。
本申请涉及机器学习技术,尤其涉及一种语义理解模型的训练方法、语义处理方法、装置、电子设备及存储介质。
全双工语音交互的使用场景中,需要在多个音源同时持续发出声音的多声源环境中实现以下操作:例如对比语音身份的识别(男、女、儿童),触发不同内容的对话,语音情绪识别、音乐/歌声识别等;环境处理,针对背景的噪声识别与回声消除,这一过程中语义理解模型全双工的对话场景下,背景噪声、和他人的闲聊等领域无关(OOD,Out-Of-Domain)的语料更容易被助手收听进来,这样的语料如果被智能助手误响应,那么交互成功率较低,影响用户的使用体验。因此,在全双工场景下对对话系统中的领域意图识别精度要求更高,需要语义理解模型懂得何时该拒识(即拒绝响应),何时该响应用户说的话,以提升用户的使用体验,也减少电子设备由于频繁地无效触发所造成的电量消耗。
发明内容
有鉴于此,本申请实施例提供一种语义理解模型的训练方法、语义处理方法、装置、电子设备及存储介质,能够使得语义理解模型的泛化能力更强,提升语义理解模型的训练精度与训练速度;同时还可以充分利用已有 的噪声语句来获取模型训练上的增益,使得语义理解模型能够适应不同的使用场景,减少环境噪声对语义理解模型的影响。
本申请实施例的技术方案是这样实现的:
本申请提供了一种语义理解模型训练方法,包括:
获取第一训练样本集合,其中所述第一训练样本集合为通过主动学习进程所获取的带有噪声的语句样本;
对所述第一训练样本集合进行去噪处理,以形成相应的第二训练样本集合;
通过语义理解模型对所述第二训练样本集合进行处理,以确定所述语义理解模型的初始参数;
响应于所述语义理解模型的初始参数,通过所述语义理解模型对所述第二训练样本集合进行处理,确定所述语义理解模型的更新参数;
根据所述语义理解模型的更新参数,通过所述第二训练样本集合对所述语义理解模型的语义表示层网络参数和任务相关输出层网络参数进行迭代更新。
本申请还提供了一种语义理解模型的语义处理方法,包括:
获取语音指令信息,并将所述语音指令转换为相应的可识别文本信息;
通过所述语义理解模型的语义表示层网络,确定与可识别文本信息所对应的至少一个词语级的隐变量;
通过所述语义理解模型的领域无关检测器网络,根据所述至少一个词语级的隐变量,确定与所述词语级的隐变量相匹配的对象;
通过所述语义理解模型的领域分类网络;根据所述至少一个词语级的隐变量,确定与所述词语级的隐变量相对应的任务领域;
根据与所述词语级的隐变量相匹配的对象,和与所述词语级的隐变量相对应的任务领域,触发相应的业务进程,以实现完成与所述语音指令信 息相对应的任务,
其中,所述语义理解模型基于如前述的方法训练得到。
本申请实施例还提供了一种语义理解模型的训练装置,包括:
数据传输模块,配置为获取第一训练样本集合,其中所述第一训练样本集合为通过主动学习进程所获取的带有噪声的语句样本;
去噪模块,配置为对所述第一训练样本集合进行去噪处理,以形成相应的第二训练样本集合;
语义理解模型训练模块,配置为通过语义理解模型对所述第二训练样本集合进行处理,以确定所述语义理解模型的初始参数;
所述语义理解模型训练模块,配置为响应于所述语义理解模型的初始参数,通过所述语义理解模型对所述第二训练样本集合进行处理,确定所述语义理解模型的更新参数;
所述语义理解模型训练模块,配置为根据所述语义理解模型的更新参数,通过所述第二训练样本集合对所述语义理解模型的语义表示层网络参数和任务相关输出层网络参数进行迭代更新。
本申请实施例还提供了一种语义理解模型处理装置,包括:
文本转换模块,配置为获取语音指令信息,并将所述语音指令转换为相应的可识别文本信息;
语义表示层网络模块,配置为通过所述语义理解模型的语义表示层网络,确定与可识别文本信息所对应的至少一个词语级的隐变量;
领域无关检测器网络模块,配置为通过所述语义理解模型的领域无关检测器网络,根据所述至少一个词语级的隐变量,确定与所述词语级的隐变量相匹配的对象;
领域分类网络模块,配置为通过所述语义理解模型的领域分类网络;根据所述至少一个词语级的隐变量,确定与所述词语级的隐变量相对应的 任务领域;
信息处理模块,配置为根据与所述词语级的隐变量相匹配的对象,和与所述词语级的隐变量相对应的任务领域,触发相应的业务进程,以实现完成与所述语音指令信息相对应的任务,
本申请实施例还提供了一种电子设备,包括:
存储器,配置为存储可执行指令;
处理器,配置为运行所述存储器存储的可执行指令时,实现前序的语义理解模型的训练方法。
本申请实施例还提供了一种电子设备,包括:
存储器,配置为存储可执行指令;
处理器,配置为运行所述存储器存储的可执行指令时,实现前序的语义理解模型的语义处理方法。
本申请实施例还提供了一种计算机可读存储介质,存储有可执行指令,其中,所述可执行指令被处理器执行时实现前序的语义理解模型的训练方法,或者实现前序的语义处理方法。
本申请实施例具有以下有益效果:
通过获取第一训练样本集合,其中所述第一训练样本集合为通过主动学习进程所获取的带有噪声的语句样本;对所述第一训练样本集合进行去噪处理,以形成相应的第二训练样本集合;通过语义理解模型对所述第二训练样本集合进行处理,以确定所述语义理解模型的初始参数;响应于所述语义理解模型的初始参数,通过所述语义理解模型对所述第二训练样本集合进行处理,确定所述语义理解模型的更新参数;根据所述语义理解模型的更新参数,通过所述第二训练样本集合对所述语义理解模型的语义表示层网络参数和任务相关输出层网络参数进行迭代更新,由此,使得语义理解模型的泛化能力更强,提升语义理解模型的训练精度与训练速度,同 时还可以有效充分利用已有的噪声语句对模型训练的增益,使得语义理解模型能够适应不同的使用场景,减少环境噪声对语义理解模型的影响,减少电子设备的无效触发,有利于语义理解模型部署在移动终端中。
为了更清楚地说明本申请实施例或相关技术中的技术方案,下面将对实施例或相关技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1为本申请实施例提供的语义理解模型训练方法的使用场景示意图;
图2为本申请实施例提供的语义理解模型训练装置的组成结构示意图;
图3为基于RNN的Seq2Seq模型生成语义理解结果的示意图;
图4为本申请实施例提供的语义理解模型训练方法一个可选的流程示意图;
图5为本申请实施例中语义表示层网络模型一个可选的结构示意图;
图6为本申请实施例中语义表示层网络模型一个可选的词语级机器阅读示意图;
图7为本申请实施例中语义表示层网络模型中编码器一个可选的结构示意图;
图8为本申请实施例中语义表示层网络模型中编码器的向量拼接示意图;
图9为本申请实施例中语义表示层网络模型中编码器的编码过程示意图;
图10为本申请实施例中语义表示层网络模型中解码器的解码过程示意图;
图11为本申请实施例中语义表示层网络模型中解码器的解码过程示意图;
图12为本申请实施例中语义表示层网络模型中解码器的解码过程示意图;
图13为本申请实施例中语义表示层网络模型一个可选的语句级机器阅读示意图;
图14为本申请实施例提供的语义理解模型训练方法一个可选的流程示意图;
图15为本申请实施例提供的语义理解模型训练方法一个可选的流程示意图;
图16A为本申请实施例提供的语义理解模型训练方法一个可选的流程示意图;
图16B为本申请实施例提供的语义理解模型训练方法一个可选的边界语料扩充示意图;
图17为本申请实施例提供的语义理解模型处理装置的组成结构示意图;
图18为本申请实施例提供的语义理解模型的语义处理方法一个可选的流程示意图;
图19为本申请实施例提供的语义理解模型训练方法的使用场景示意图;
图20为本申请实施例提供的语义理解模型训练方法的使用场景示意图;
图21为本申请所提供的语义理解模型训练方法的一个可选的处理流程示意图;
图22为本申请实施例提供的语义理解模型训练方法的处理过程中主动 学习进程示意图;
图23为本申请实施例提供的语义理解模型一个可选的模型结构示意图;
图24为封装于车载系统中使用语义理解模型唤醒应用的示意图;
图25为封装于车载系统中使用语义理解模型查阅天气的示意图。
为了使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请作进一步地详细描述,所描述的实施例不应视为对本申请的限制,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其它实施例,都属于本申请保护的范围。
在以下的描述中,涉及到“一些实施例”,其描述了所有可能实施例的子集,但是可以理解,“一些实施例”可以是所有可能实施例的相同子集或不同子集,并且可以在不冲突的情况下相互结合。
对本申请实施例进行进一步详细说明之前,对本申请实施例中涉及的名词和术语进行说明,本申请实施例中涉及的名词和术语适用于如下的解释。
1)机器阅读理解:一种将文本问题和相关文档作为输入将文本答案作为输出的自动问答技术
2)BERT:全称为Bidirectional Encoder Representations from Transformers,一种利用海量文本的语言模型训练方法。该方法被广泛用于多种自然语言处理任务,如文本分类、文本匹配、机器阅读理解等。
3)人工神经网络:简称神经网络(Neural Network,NN),在机器学习和认知科学领域,是一种模仿生物神经网络结构和功能的数学模型或计算模型,用于对函数进行估计或近似。
4)模型参数:是使用通用变量来建立函数和变量之间关系的一个数量。 在人工神经网络中,模型参数通常是实数矩阵。
5)API:全称Application Programming Interface,可语义理解成应用程序接口,是一些预先定义的函数,或指软件系统不同组成部分衔接的约定。目的是提供应用程序与开发人员基于某软件或硬件得以访问一组例程的能力,而又无需访问原码,或理解内部工作机制的细节。
6)SDK:全称Software Development Kit,可语义理解成软件开发工具包,是为特定的软件包、软件框架、硬件平台、操作系统等建立应用软件时的开发工具的集合广义上包括辅助开发某一类软件的相关文档、范例和工具的集合。
7)生成对抗网络(Generative Adversarial Network,简称GAN):非监督式学习的一种方法,通过让两个神经网络相互博弈的方式进行学习,一般由一个生成网络与一个判别网络组成。生成网络从潜在空间(latent space)中随机采样作为输入,其输出结果需要尽量模仿训练集中的真实样本。判别网络的输入则为真实样本或生成网络的输出,其目的是将生成网络的输出从真实样本中尽可能分辨出来。而生成网络则要尽可能地欺骗判别网络。两个网络相互对抗、不断调整参数,最终目的是使判别网络无法判断生成网络的输出结果是否真实。
8)全双工:在人机交互对话场景下,不用重复唤醒,基于流式语音、语义技术让智能助手拥有边听边想,随时打断的交互能力。
9)自然语言理解:NLU(Natural Language Understanding),在对话系统中对用户所说的话进行语义的信息抽取,包括领域意图识别和槽填充(slot filling)。
10)多任务学习:Multi-task Learning,在机器学习领域,通过同时对多个相关任务进行联合学习、优化,可以达到比单个任务更好的模型精度,多个任务通过共享表示层来彼此帮助,这种训练方法称为多任务学习,也 叫联合学习(Joint Learning)。
11)主动学习:Active Learning,在监督学习中,机器学习模型通过对训练数据的拟合,来学习数据到预测结果之间的映射关系,主动学习通过设计数据采样方法来挑选对于模型而言信息量最大的样本数据来标注,相对于随机采样方法,标注后的数据重新加入样本训练后,模型的收益最大。
12)OOD:Out of Domain,对于任务型(task-oriented)的对话系统而言,通常会预先定义多个垂直领域(domain):查天气,导航,音乐等,来满足用户的任务需求。不落入任何一个任务型领域中的用户query即为OOD语料,比如有闲聊、知识问答、语义理解错误等,与之相对的是In domain(IND)语料,即属于任意一个预先定义领域中的语料。
13)FAR:False Acceptance Rate,被错误识别到任何一个领域中的OOD语料占所有OOD语料的比例。该指标反映了智能助手的误识率,该指标越低越好。在全双工场景下,对该指标有严格的限制,必须处于一个非常低的水平。
14)FRR:False Rejection Rate,在所有的IND语料中,未被任意一个领域召回的语料数量占所有IND语料数的比例。该指标越低越好,反映了智能助手的拒识率。
15)语音语义理解(Speech Translation):又称自动语音语义理解,是通过计算机将一种自然语言的语音语义理解为另一种自然语言的文本或语音的技术,一般可以由语义理解和机器语义理解两阶段组成。
图1为本申请实施例提供的语义理解模型训练方法的使用场景示意图,参考图1,终端(包括终端10-1和终端10-2)上设置有语义理解软件的客户端,用户通过所设置的语义理解软件客户端可以输入相应的待语义理解语句,聊天客户端也可以接收相应的语义理解结果,并将所接收的语义理解结果向用户进行展示;终端通过网络300连接服务器200,网络300可以 是广域网或者局域网,又或者是二者的组合,使用无线链路实现数据传输。
作为一个示例,服务器200配置为布设所述语义理解模型并对所述语义理解模型进行训练,以对所述语义理解模型的语义表示层网络参数和任务相关输出层网络参数进行迭代更新,以实现将通过语义理解模型中语义表示层网络和任务相关输出层网络生成针对目标待语义理解语句的语义理解结果,并通过终端(终端10-1和/或终端10-2)展示语义理解模型所生成的与待语义理解语句相对应的语义理解结果。
当然在通过语义理解模型对目标待语义理解语句进行处理以生成相应的语义理解结果之前,还需要对语义理解模型进行训练,在本申请的一些实施例中可以包括:获取第一训练样本集合,其中所述第一训练样本集合为通过主动学习进程所获取的带有噪声的语句样本;对所述第一训练样本集合进行去噪处理,以形成相应的第二训练样本集合;通过语义理解模型对所述第二训练样本集合进行处理,以确定所述语义理解模型的初始参数;响应于所述语义理解模型的初始参数,通过所述语义理解模型对所述第二训练样本集合进行处理,确定所述语义理解模型的更新参数;根据所述语义理解模型的更新参数,通过所述第二训练样本集合对所述语义理解模型的语义表示层网络参数和任务相关输出层网络参数进行迭代更新。
下面对本申请实施例的语义理解模型的训练装置的结构做详细说明,语义理解模型的训练装置可以各种形式来实施,如带有语义理解模型训练功能的专用终端,也可以为设置有语义理解模型训练功能的服务器,例如前序图1中的服务器200。图2为本申请实施例提供的语义理解模型的训练装置的组成结构示意图,可以理解,图2仅仅示出了语义理解模型的训练装置的示例性结构而非全部结构,根据需要可以实施图2示出的部分结构或全部结构。
本申请实施例提供的语义理解模型的训练装置包括:至少一个处理器 201、存储器202、用户接口203和至少一个网络接口204。语义理解模型的训练装置20中的各个组件通过总线系统205耦合在一起。可以理解,总线系统205配置为实现这些组件之间的连接通信。总线系统205除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图2中将各种总线都标为总线系统205。
其中,用户接口203可以包括显示器、键盘、鼠标、轨迹球、点击轮、按键、按钮、触感板或者触摸屏等。
可以理解,存储器202可以是易失性存储器或非易失性存储器,也可包括易失性和非易失性存储器两者。本申请实施例中的存储器202能够存储数据以支持终端(如10-1)的操作。这些数据的示例包括:配置为在终端(如10-1)上操作的任何计算机程序,如操作系统和应用程序。其中,操作系统包含各种系统程序,例如框架层、核心库层、驱动层等,配置为实现各种基础业务以及处理基于硬件的任务。应用程序可以包含各种应用程序。
在一些实施例中,本申请实施例提供的语义理解模型的训练装置可以采用软硬件结合的方式实现,作为示例,本申请实施例提供的语义理解模型训练装置可以是采用硬件译码处理器形式的处理器,其被编程以执行本申请实施例提供的语义理解模型训练方法。例如,硬件译码处理器形式的处理器可以采用一个或多个应用专用集成电路(ASIC,Application Specific Integrated Circuit)、DSP、可编程逻辑器件(PLD,Programmable Logic Device)、复杂可编程逻辑器件(CPLD,Complex Programmable Logic Device)、现场可编程门阵列(FPGA,Field-Programmable Gate Array)或其他电子元件。
作为本申请实施例提供的语义理解模型的训练装置采用软硬件结合实施的示例,本申请实施例所提供的语义理解模型的训练装置可以直接体现 为由处理器201执行的软件模块组合,软件模块可以位于存储介质中,存储介质位于存储器202,处理器201读取存储器202中软件模块包括的可执行指令,结合必要的硬件(例如,包括处理器201以及连接到总线205的其他组件)完成本申请实施例提供的语义理解模型训练方法。
作为示例,处理器201可以是一种集成电路芯片,具有信号的处理能力,例如通用处理器、数字信号处理器(DSP,Digital Signal Processor),或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等,其中,通用处理器可以是微处理器或者任何常规的处理器等。
作为本申请实施例提供的语义理解模型的训练装置采用硬件实施的示例,本申请实施例所提供的装置可以直接采用硬件译码处理器形式的处理器201来执行完成,例如,被一个或多个应用专用集成电路(ASIC,Application Specific Integrated Circuit)、DSP、可编程逻辑器件(PLD,Programmable Logic Device)、复杂可编程逻辑器件(CPLD,Complex Programmable Logic Device)、现场可编程门阵列(FPGA,Field-Programmable Gate Array)或其他电子元件执行实现本申请实施例提供的语义理解模型训练方法。
本申请实施例中的存储器202配置为存储各种类型的数据以支持语义理解模型的训练装置20的操作。这些数据的示例包括:配置为在语义理解模型的训练装置20上操作的任何可执行指令,如可执行指令,实现本申请实施例的从语义理解模型训练方法的程序可以包含在可执行指令中。
在另一些实施例中,本申请实施例提供的语义理解模型的训练装置可以采用软件方式实现,图2示出了存储在存储器202中的语义理解模型的训练装置,其可以是程序和插件等形式的软件,并包括一系列的模块,作为存储器202中存储的程序的示例,可以包括语义理解模型的训练装置,语义理解模型的训练装置中包括以下的软件模块:数据传输模块2081,去 噪模块2082和语义理解模型训练模块2083。当语义理解模型的训练装置中的软件模块被处理器201读取到RAM中并执行时,将实现本申请实施例提供的语义理解模型训练方法,下面介绍本申请实施例中语义理解模型的训练装置中各个软件模块的功能,其中,
数据传输模块2081,配置为获取第一训练样本集合,其中所述第一训练样本集合为通过主动学习进程所获取的带有噪声的语句样本;
去噪模块2082,配置为对所述第一训练样本集合进行去噪处理,以形成相应的第二训练样本集合;
语义理解模型训练模块2083,配置为通过语义理解模型对所述第二训练样本集合进行处理,以确定所述语义理解模型的初始参数;
所述语义理解模型训练模块2083,配置为响应于所述语义理解模型的初始参数,通过所述语义理解模型对所述第二训练样本集合进行处理,确定所述语义理解模型的更新参数;
所述语义理解模型训练模块2083,配置为根据所述语义理解模型的更新参数,通过所述第二训练样本集合对所述语义理解模型的语义表示层网络参数和任务相关输出层网络参数进行迭代更新。
结合图2示出的语义理解模型的训练装置20说明本申请实施例提供的语义理解模型的训练方法,在介绍本申请实施例提供的语义理解模型的训练方法之前,首先介绍本申请中语义理解模型根据待语义理解语句生成相应语义理解结果的过程中,图3为传统方案中生成语义理解结果的示意图,其中,eq2seq模型是以编码器(Encode)和解码器(Decode)为代表的架构方式,seq2seq模型是根据输入序列X来生成输出序列Y。编码器(Encode)和解码器(Decode)为代表的seq2seq模型中,编码器(Encode)是将输入序列转化成一个固定长度的向量,解码器(Decode)将输入的固定长度向量解码成输出序列。如图3所示,编码器(Encoder)对输入的待语义理解 语句进行编码,得到待语义理解语句的文本特征;解码器(Decoder)对文本特征进行解码后输出生成相应的语义理解结果,其中,编码器(Encode)和解码器(Decode)是一一对应的。
可见,对于图3所示的相关技术来说基于Seq2Seq模型的语义理解模型的缺点在于,相关技术中的模型本身只对训练数据目标文本y-标注信息建立一对一的关系,并且使用MLE进行模型的优化,这导致了模型会生成很多高频的通用回复,这些回复往往没有意义且很短。同时,很多实际场景中,同一个目标文本y可以有很多种标注信息,现有的Seq2Seq模型由于编码器(Encode)和解码器(Decode)是一一对应的,并不能够有效对这种一对多问题进行处理,同时很容易受到噪声信息的干扰,触发无用的识别,用户体验差。
为解决这一相关技术中的缺陷,参见图4,图4为本申请实施例提供的语义理解模型训练方法一个可选的流程示意图,可以理解地,图4所示的步骤可以由运行语义理解模型训练装置的各种电子设备执行,例如可以是如带有样本生成功能的专用终端、带有语义理解模型训练功能的服务器或者服务器集群。下面针对图4示出的步骤进行说明。
步骤401:语义理解模型训练装置获取第一训练样本集合,其中所述第一训练样本集合为通过主动学习进程所获取的带有噪声的语句样本。
在本申请的一些实施例中,第一训练样本集合可以为同一语种的语言样本,或者也可以为不同语种的语言样本,对此不作限制。其中,第一训练样本集合的语种可以根据实际翻译需求进行设置。例如,当翻译模型应用于中译英的应用场景时,第一训练样本集合的语种可以为中文,再例如,当翻译模型应用于英译法的应用场景时,第一训练样本集合的语种可以为英文,又例如,当翻译模型应用于中法互译的应用场景时,第一训练样本集合的语种可以包括中文和/或法文。
在本申请的一些实施例中,第一训练样本集合可以为语音形式,或者也可以为文本形式,可以预先采集文本形式的第一训练样本集合和/或语音形式的第一训练样本集合,例如,可以通常的语句收集方式,采集文本形式的第一训练样本集合和/或语音形式的第一训练样本集合,并将采集的文本形式的第一训练样本集合和/或语音形式的第一训练样本集合存储在预设存储装置中。从而,本申请中,在对翻译模型进行训练时,可以从上述存储装置中,获取第一训练样本集合。
步骤402:对所述第一训练样本集合进行去噪处理,以形成相应的第二训练样本集合。
在本申请的一些实施例中,所述对所述第一训练样本集合进行去噪处理,以形成相应的第二训练样本集合,可以通过以下方式实现:
确定与所述语义理解模型的使用环境相匹配的动态噪声阈值;根据所述动态噪声阈值对所述第一训练样本集合进行去噪处理,以形成与所述动态噪声阈值相匹配的第二训练样本集合。其中由于翻译模型的使用环境不同,与所述翻译模型的使用环境相匹配的动态噪声阈值也不相同,例如,学术翻译的使用环境中,与所述翻译模型的使用环境相匹配的动态噪声阈值需要小于文章阅读环境中的动态噪声阈值。
在本申请的一些实施例中,所述对所述第一训练样本集合进行去噪处理,以形成相应的第二训练样本集合,可以通过以下方式实现:
确定与所述语义理解模型相对应的固定噪声阈值;根据所述固定噪声阈值对所述第一训练样本集合进行去噪处理,以形成与所述固定噪声阈值相匹配的第二训练样本集合。其中,当翻译模型固化于相应的硬件机构中,例如车载终端,使用环境为口语化翻译时,由于噪声较为单一,通过固定翻译模型相对应的固定噪声阈值,能够有效提神翻译模型的训练速度,减少用户的等待时间。
步骤403:语义理解模型训练装置通过语义理解模型对所述第二训练样本集合进行处理,以确定所述语义理解模型的初始参数。
步骤404:语义理解模型训练装置响应于所述语义理解模型的初始参数,通过所述语义理解模型对所述第二训练样本集合进行处理,确定所述语义理解模型的更新参数。
在本申请的一些实施例中,所述响应于所述语义理解模型的初始参数,通过所述语义理解模型对所述第二训练样本集合进行处理,确定所述语义理解模型的更新参数,可以通过以下方式实现:
将所述第二训练样本集合中不同语句样本,代入由所述语义理解模型的领域无关检测器网络和领域分类网络所组成的任务相关输出层网络所对应的损失函数;确定所述损失函数满足相应的收敛条件时对应所述语义理解模型中领域无关检测器网络参数和领域分类网络参数作为所述语义理解模型的更新参数。其中,语义理解模型的组成可以包括:语义表示层网络和任务相关输出层网络,进一步地,任务相关输出层网络包括了领域无关检测器网络和领域分类网络。
在本申请的一些实施例中,语义表示层网络可以为双向注意力神经网络模型(BERT Bidirectional Encoder Representations from Transformers)。继续参考图5,图5为本申请实施例中语义表示层网络模型一个可选的结构示意图,其中,Encoder包括:N=6个相同的layers组成,每一层包含两个sub-layers。第一个sub-layer就是多头注意力层(multi-head attention layer)然后是一个简单的全连接层。其中每个sub-layer都加了残差连接(residual connection)和归一化(normalisation)。
Decoder包括:由N=6个相同的Layer组成,其中layer和encoder并不相同,这里的layer包含了三个sub-layers,其中有一个self-attention layer,encoder-decoder attention layer最后是一个全连接层。前两个 sub-layer都是基于multi-head attention layer。
继续参考图6,图6为本申请实施例中语义表示层网络模型一个可选的词语级机器阅读示意图,其中,其中,encoder和decoder部分都包含了6个encoder和decoder。进入到第一个encoder的inputs结合embedding和positional embedding。通过了6个encoder之后,输出到了decoder部分的每一个decoder中;输入目标为“我是一个学生t”经过语义表示层网络模型的处理,输出的机器阅读示结果为:“学生”。
继续参考图7,图7为本申请实施例中语义表示层网络模型中编码器一个可选的结构示意图,其中,其输入由维度为d的查询(Q)和键(K)以及维度为d的值(V)组成,所有键计算查询的点积,并应用softmax函数获得值的权重。
继续参考图7,图7本申请实施例中语义表示层网络模型中编码器的向量示意图,其中Q,K和V的是通过输入encoder的向量x与W^Q,W^K,W^V相乘得到Q,K和V。W^Q,W^K,W^V在文章的维度是(512,64),然后假设我们inputs的维度是(m,512),其中m代表了字的个数。所以输入向量与W^Q,W^K,W^V相乘之后得到的Q、K和V的维度就是(m,64)。
继续参考图8,图8为本申请实施例中语义表示层网络模型中编码器的向量拼接示意图,其中,Z0到Z7就是对应的8个并行的head(维度是(m,64)),然后concat这个8个head之后就得到了(m,512)维度。最后与W^O相乘之后就到了维度为(m,512)的输出的矩阵,那么这个矩阵的维度就和进入下一个encoder的维度保持一致。
继续参考图9,图9为本申请实施例中语义表示层网络模型中编码器的编码过程示意图,其中,x1经过self-attention到了z1的状态,通过了self-attetion的张量还需要进过残差网络和LaterNorm的处理,然后进入到 全连接的前馈网络中,前馈网络需要进行同样的操作,进行的残差处理和正规化。最后输出的张量才可以的进入到了下一个encoder之中,然后这样的操作,迭代经过了6次,迭代处理的结果进入到decoder中。
继续参考图10,图10为本申请实施例中语义表示层网络模型中解码器的解码过程示意图,其中,decoder的输入输出和解码过程:
输出:对应i位置的输出词的概率分布;
输入:encoder的输出&对应i-1位置decoder的输出。所以中间的attention不是self-attention,它的K,V来自encoder,Q来自上一位置decoder的输出。
继续参考图11和图12,图11为本申请实施例中语义表示层网络模型中解码器的解码过程示意图,其中。解码器网络的最后一个decoder输出的向量会经过Linear层和softmax层。图12为本申请实施例中语义表示层网络模型中解码器的解码过程示意图,Linear层的作用就是对decoder部分出来的向量做映射成一个logits向量,然后softmax层根据这个logits向量,将其转换为了概率值,最后找到概率最大值的位置,即完成了解码器的输出。
在本申请的一些实施例中,第一阅读语义标注网络可以为双向注意力神经网络模(BERT Bidirectional Encoder Representations from Transformers)。继续参考图5,图5为本申请实施例中语义表示层网络模型一个可选的结构示意图,其中,Encoder包括:N=6个相同的layers组成,每一层包含两个sub-layers。第一个sub-layer就是多头注意力层(multi-head attention layer)然后是一个简单的全连接层。其中每个sub-layer都加了残差连接(residual connection)和归一化(normalisation)。
Decoder包括:由N=6个相同的Layer组成,其中layer和encoder并不相同,这里的layer包含了三个sub-layers,其中有一个self-attention layer,encoder-decoder attention layer最后是一个全连接层。前两个sub-layer都是基于multi-head attention layer。
继续参考图13,图13为本申请实施例中语义表示层网络模型一个可选的语句级机器阅读示意图,其中,其中,encoder和decoder部分都包含了6个encoder和decoder。进入到第一个encoder的inputs结合embedding和positional embedding。通过了6个encoder之后,输出到了decoder部分的每一个decoder中;输入目标为英语“I am a student”经过语义表示层网络模型的处理,输出的机器阅读示结果为:“我是一个学生”。
当然,本申请中的BERT模型也使用前向神经网络模型(Bi-LSTM Bi-directional Long Short-Term Memory)、门控循环单元网络模型(GRU Gated Recurrent Unit)模型、深度语境化词表征网络模型(ELMo embedding from language model)、GPT模型、GPT2模型代替,对此,本申请不再赘述。
步骤405:语义理解模型训练装置根据所述语义理解模型的更新参数,通过所述第二训练样本集合对所述语义理解模型的语义表示层网络参数和任务相关输出层网络参数进行迭代更新。
继续参考图14,图14为本申请实施例提供的语义理解模型训练方法一个可选的流程示意图,可以理解地,图14所示的步骤可以由运行语义理解模型训练装置的各种电子设备执行,例如可以是如带有语义理解模型训练功能的专用终端、带有语义理解模型训练功能的服务器或者服务器集群。下面针对图14示出的步骤进行说明。
步骤1401:语义理解模型训练装置通过所述语义理解模型的更新参数,确定与所述第二训练样本集合相匹配的第二噪声参数。
其中,所述第二噪声参数用于表征所述第二训练样本集合中平行语句样本的噪声值;其中,第二训练样本集合中的每一个训练样本的权重都是相同的,这些权重相同训练样本可以称为平行语句样本。
步骤1402:语义理解模型训练装置当所述第二噪声参数到达相应的噪声值阈值时,根据所述第二噪声参数的噪声值,对所述语义理解模型的语义表示层网络参数和任务相关输出层网络参数进行迭代更新,直至所述语义理解模型的领域无关检测器网络和所述领域分类网络构成的任务相关输出层网络对应的损失函数满足对应的收敛条件。
步骤1403:语义理解模型训练装置响应于所述语义理解模型的领域无关检测器网络和领域分类网络所组成的任务相关输出层网络所对应的损失函数。
步骤1404:语义理解模型训练装置对所述语义理解模型的语义表示层网络进行参数调整。
由此,以实现所述语义表示层网络的参数与所述任务相关输出层网络所对应的损失函数相适配。
其中，编码器网络的损失函数表示为：
loss_A=∑(decoder_A(encoder(warp(x1)))-x1)²；其中，decoder_A为解码器A，warp为待识别语句的函数，x1为待识别语句，encoder为编码器。
在迭代训练的过程中,通过将待识别语句代入编码器网络的损失函数,求解损失函数按照梯度(例如最大梯度)下降时编码器A和解码器A的参数,当损失函数收敛时(即确定能够形成与所述待识别语句所对应的词语级的隐变量时),结束训练。
对编码器网络的训练过程中,编码器网络的损失函数表示为:loss_B=∑(decoder_B(encoder(warp(x2)))-x2)2;其中,decoder_B为解码器B,warp为待识别语句的函数,x2为待识别语句,encoder为编码器。
在迭代训练的过程中,通过将待识别语句代入编码器网络的损失函数,求解损失函数按照梯度(例如最大梯度)下降时编码器B和解码器B的参 数;当损失函数收敛时(即当解码得到与所述待识别语句相对应的翻译结果的被选取概率时),结束调整和训练。
继续参考图15,图15为本申请实施例提供的语义理解模型训练方法一个可选的流程示意图,可以理解地,图15所示的步骤可以由运行语义理解模型训练装置的各种电子设备执行,例如可以是如带有语义理解模型训练功能的专用终端、带有语义理解模型训练功能的服务器或者服务器集群。下面针对图15示出的步骤进行说明。
步骤1501:语义理解模型训练装置对所述第二训练样本集合进行负例处理,以形成与所述第二训练样本集合相对应的负例样本集合。
其中,所述负例样本集合配置为调整所述语义理解模型的领域无关检测器网络参数和领域分类网络参数调整。
在本申请的一些实施例中,所述对所述第一训练样本集合进行负例处理,可以通过以下方式实现:
将所述语义理解模型的领域分类网络中待输出语句进行随机组合,以形成与所述第一训练样本集合相对应的负例样本集合;或者,
对所述语义理解模型的领域分类网络中待输出语句进行随机删除处理或替换处理以形成与所述第一训练样本集合相对应的负例样本集合。
步骤1502:语义理解模型训练装置根据所述负例样本集合确定相应的双语评估研究值。其中,当语义理解模型所应用的全双工语音交互的使用场景为非中文(可以是单一的英语或其他语种的使用环境,也可以是至少包括两种语言声源的使用环境)使用环境时,根据所述负例样本集合所确定相应的双语评估研究值可以配置为作为监督参数对所述语义理解模型的语义理解结果进行评价。
在本申请的一些实施例中,语义表示层网络对应的编码器和对应的解码器可以为双向网络模型,例如可以均选用Bi-GRU双向GRU模型作为对 应的编码器和对应的解码器,此处的Bi-GRU双向GRU模型是一种可以识别倒装句结构的模型。由于用户在输入对话语句时,可能使得该对话语句为倒装句结构,即与正常的语句结构不一样,例如用户输入的对话语句为“天气怎么样今天”,而正常的语句结构为“今天天气怎么样”,采用Bi-GRU双向GRU模型可以识别出倒装句结构的对话语句,从而可以丰富训练后的模型的功能,进而可以提高最终训练得到的目标模型的鲁棒性。
继续参考图16A,图16A为本申请实施例提供的语义理解模型训练方法一个可选的流程示意图,可以理解地,图15所示的步骤可以由运行语义理解模型训练装置的各种电子设备执行,例如可以是如带有语义理解模型训练功能的专用终端、带有语义理解模型训练功能的服务器或者服务器集群。下面针对图16A示出的步骤进行说明。
步骤1601:语义理解模型训练装置对数据源中的训练样本进行召回处理。
其中,数据源中包括各类型应用场景的数据作为相应的训练本的数据来源,例如,本申请所提供的语义理解模型可以作为软件模块封装于车载电子设备中,也可以封装于不同的智能家居(包括但不限于:音箱、电视、冰箱、空调、洗衣机、灶具),当然也可以固化于智能机器人的硬件设备中,针对这些语义理解模型的不同使用场景,可以使用相对应的训练样本对语义理解模型进行针对性性的训练。
步骤1602:语义理解模型训练装置根据所述召回处理的结果,触发相应的主动学习进程,以实现获取所述数据源中带有噪声的语句样本。
步骤1603:语义理解模型训练装置对所述主动学习进程中所获取的带有噪声的语句样本进行标注,以形成所述第一训练样本集合。
在本申请的一些实施例中,所述对所述主动学习进程中所获取的带有噪声的语句样本进行标注,以形成所述第一训练样本集合,可以通过以下 方式实现:
确定所述带有噪声的语句样本的样本类型;对所述语句样本的样本类型中的负例样本进行排序,根据对所述负例样本的排序结果,为所述负例样本配置相应的权重,以形成包括不同权重训练样本的第一训练样本集合。
在本申请的一些实施例中,语义理解模型训练装置可以响应于所述主动学习进程,触发主动探索进程,以实现对所述与车载环境相匹配的带有噪声的语句样本进行边界语料扩充处理。
其中,语义理解模型训练装置可以响应于所述主动学习进程,触发主动探索进程中的文本相似聚类网络,以确定所述与车载环境相匹配的带有噪声的语句样本的文本聚类中心;根据所述与车载环境相匹配的带有噪声的语句样本的文本聚类中心,对所述数据源进行检索,以实现对所述与车载环境相匹配的带有噪声的语句样本进行文本增广;根据对所述与车载环境相匹配的带有噪声的语句样本进行文本增广的结果,触发相应的流形学习进程对所述文本增广的结果进行降维处理,以实现对所述与车载环境相匹配的带有噪声的语句样本进行边界语料扩充。其中,参考图16B,图16B为本申请实施例提供的语义理解模型训练方法一个可选的边界语料扩充示意图,通过主动探索进程中的文本相似聚类网络,确定与车载环境相匹配的带有噪声的语句样本的文本聚类中心,并以此对所述数据源进行检索,获取与车载环境相匹配的带有噪声的语句样本相关联的语句样本,可以有效增加与车载环境相匹配的带有噪声的语句样本的数量,但是由于训练样本语句的增广过程中,训练样本的维度增高,因此,通过流形学习进程对文本增广的结果进行降维处理,可以减少后续模型训练过程数据维度对于语义理解模型训练准确性的影响,同时降低训练难度,减少用户的等待时间。
下面对本申请实施例的语义理解模型处理装置的结构做详细说明,语 义理解模型处理装置可以各种形式来实施,如带有根据能够运行语义理解模型的专用终端,也可以为带有回答的功能的服务器,以根据终端中的应用程序所接收的待翻译语句生成相应的翻译结果(例如前序图1中的服务器200)。图17为本申请实施例提供的语义理解模型处理装置的组成结构示意图,可以理解,图17仅仅示出了语义理解模型处理装置的示例性结构而非全部结构,根据需要可以实施图17示出的部分结构或全部结构。
本申请实施例提供的语义理解模型处理装置包括:至少一个处理器1301、存储器1302、用户接口1303和至少一个网络接口1304。语义理解模型处理装置130中的各个组件通过总线系统1305耦合在一起。可以理解,总线系统1305配置为实现这些组件之间的连接通信。总线系统1305除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图17中将各种总线都标为总线系统1305。
其中,用户接口1303可以包括显示器、键盘、鼠标、轨迹球、点击轮、按键、按钮、触感板或者触摸屏等。
可以理解,存储器1302可以是易失性存储器或非易失性存储器,也可包括易失性和非易失性存储器两者。本申请实施例中的存储器1302能够存储数据以支持终端(如10-1)的操作。这些数据的示例包括:配置为在终端(如10-1)上操作的任何计算机程序,如操作系统和应用程序。其中,操作系统包含各种系统程序,例如框架层、核心库层、驱动层等,配置为实现各种基础业务以及处理基于硬件的任务。应用程序可以包含各种应用程序。
在一些实施例中,本申请实施例提供的语义理解模型处理装置可以采用软硬件结合的方式实现,作为示例,本申请实施例提供的语义理解模型处理装置可以是采用硬件译码处理器形式的处理器,其被编程以执行本申请实施例提供的语义理解模型的语义处理方法。例如,硬件译码处理器形 式的处理器可以采用一个或多个应用专用集成电路(ASIC,Application Specific Integrated Circuit)、DSP、可编程逻辑器件(PLD,Programmable Logic Device)、复杂可编程逻辑器件(CPLD,Complex Programmable Logic Device)、现场可编程门阵列(FPGA,Field-Programmable Gate Array)或其他电子元件。
作为本申请实施例提供的语义理解模型处理装置采用软硬件结合实施的示例,本申请实施例所提供的语义理解模型处理装置可以直接体现为由处理器1301执行的软件模块组合,软件模块可以位于存储介质中,存储介质位于存储器1302,处理器1301读取存储器1302中软件模块包括的可执行指令,结合必要的硬件(例如,包括处理器1301以及连接到总线1305的其他组件)完成本申请实施例提供的语义理解模型的语义处理方法。
作为示例,处理器1301可以是一种集成电路芯片,具有信号的处理能力,例如通用处理器、数字信号处理器(DSP,Digital Signal Processor),或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等,其中,通用处理器可以是微处理器或者任何常规的处理器等。
作为本申请实施例提供的语义理解模型处理装置采用硬件实施的示例,本申请实施例所提供的装置可以直接采用硬件译码处理器形式的处理器1301来执行完成,例如,被一个或多个应用专用集成电路(ASIC,Application Specific Integrated Circuit)、DSP、可编程逻辑器件(PLD,Programmable Logic Device)、复杂可编程逻辑器件(CPLD,Complex Programmable Logic Device)、现场可编程门阵列(FPGA,Field-Programmable Gate Array)或其他电子元件执行实现本申请实施例提供的语义理解模型的语义处理方法。
本申请实施例中的存储器1302配置为存储各种类型的数据以支持语义理解模型处理装置130的操作。这些数据的示例包括:配置为在语义理解 模型处理装置130上操作的任何可执行指令,如可执行指令,实现本申请实施例的从语义理解模型的语义处理方法的程序可以包含在可执行指令中。
在另一些实施例中,本申请实施例提供的语义理解模型处理装置可以采用软件方式实现,图17示出了存储在存储器1302中的语义理解模型处理装置,其可以是程序和插件等形式的软件,并包括一系列的模块,作为存储器1302中存储的程序的示例,可以包括语义理解模型处理装置,语义理解模型处理装置中包括以下的软件模块:文本转换模块13081,语义表示层网络模块13082,领域无关检测器网络模块13083、领域分类网络模块13084和信息处理模块13085。当语义理解模型处理装置中的软件模块被处理器1301读取到RAM中并执行时,将实现本申请实施例提供的语义理解模型的语义处理方法,语义理解模型处理装置中各个软件模块的功能包括:
文本转换模块13081,配置为获取语音指令信息,并将所述语音指令转换为相应的可识别文本信息;
语义表示层网络模块13082,配置为通过所述语义理解模型的语义表示层网络,确定与可识别文本信息所对应的至少一个词语级的隐变量;
领域无关检测器网络模块13083,配置为通过所述语义理解模型的领域无关检测器网络,根据所述至少一个词语级的隐变量,确定与所述词语级的隐变量相匹配的对象;
领域分类网络模块13084,配置为通过所述语义理解模型的领域分类网络;根据所述至少一个词语级的隐变量,确定与所述词语级的隐变量相对应的任务领域;
信息处理模块13085,配置为根据与所述词语级的隐变量相匹配的对象,和与所述词语级的隐变量相对应的任务领域,触发相应的业务进程,以实现完成与所述语音指令信息相对应的任务。
结合图17示出的语义理解模型处理装置130说明本申请实施例提供的语义理解模型的语义处理方法,参见图18,图18为本申请实施例提供的语义理解模型的语义处理方法一个可选的流程示意图,可以理解地,图18所示的步骤可以由运行语义理解模型处理装置的各种电子设备执行,例如可以是如带有待翻译语句处理功能的专用终端、带有待翻译语句处理功能的服务器或者服务器集群。下面针对图18示出的步骤进行说明。
步骤1801:语义理解模型处理装置获取语音指令信息,并将所述语音指令转换为相应的可识别文本信息;
步骤1802:语义理解模型处理装置通过所述语义理解模型的语义表示层网络,确定与可识别文本信息所对应的至少一个词语级的隐变量;
步骤1803:语义理解模型处理装置通过所述语义理解模型的领域无关检测器网络,根据所述至少一个词语级的隐变量,确定与所述词语级的隐变量相匹配的对象;
步骤1804:语义理解模型处理装置通过所述语义理解模型的领域分类网络;根据所述至少一个词语级的隐变量,确定与所述词语级的隐变量相对应的任务领域;
步骤1805:语义理解模型处理装置根据与所述词语级的隐变量相匹配的对象,和与所述词语级的隐变量相对应的任务领域,触发相应的业务进程。
由此,以实现完成与所述语音指令信息相对应的任务。
下面以车载语义理解模型为例,对本申请所提供的语义理解模型训练方法的使用环境进行说明,参考图19和图20,图19为本申请实施例提供的语义理解模型训练方法的使用场景示意图,本申请所提供的语义理解模型训练方法可以作为云服务的形式服务可类型的客户(封装于车载终端或者封装于不同的移动电子设备中),图20为本申请实施例提供的语义理解 模型训练方法的使用场景示意图,具体使用场景本申请不做具体限制,其中,作为云服务提供给企业客户,帮助其根据不同的设备使用环境对语义理解模型进行训练。
继续参考图21,图21为本申请所提供的语义理解模型训练方法的一个可选的处理流程示意图,包括以下步骤:
步骤2101:获取语音信息,并将所述语音信息转换为对应的文本信息。
其中,参考图19的自然语言理解模块,用户的语音信号通过语义理解模块转换成文本信号,文本通过自然语言理解模块抽取出用户的领域、意图和参数等结构化信息,这些语义要素传递给对话管理模块进行询参处理,或者状态管理等策略,最后系统的输出通过语音合成播报给用户。
步骤2102:响应于所述文本信息,触发主动学习进程以获取相应的训练样本。
其中,参考图22,图22为本申请实施例提供的语义理解模型训练方法的处理过程中主动学习进程示意图,由于负语料模型(OOD模型)和领域分类器模型都需要挖掘大量的负样本,但是人工标注成本是有限的。因此需要在有限的标注人力情况下从海量的数据中挖掘到最有价值的、信息量最大的、对模型增益最大的样本。为此,可以基于Active Learning的思想,构建如图22所示的数据挖掘进程,由此,基于Active Learning的整套数据闭环挖掘流程,从数据产生、挑选到标注,再到模型训练。保障了所产生的样本对于语义理解模型来说是最亟需的、帮助最大的样本,并且通过筛选样本有效降低了标注人力成本。
步骤2103:对所获取的训练样本进行优化处理。
其中,通过步骤2102挖掘和积累了大量的OOD语料、领域负样本语料以及领域的正样本语料。在训练语义理解模型的时候,采用了One V.S All的方式进行正负样本组织,这种方式决定了一个领域分类器的正负样本比 例是不均衡的,在一些可选的场景下正负样本比例达到了1:100,在一些极端的情况下达到了1:2000。在所述语义理解模型的实际使用中,即便某些领域的负样本充足,训练出来的模型FAR指标依然比较高,因此可以通过分析bad cases和实验提出了一种负样本分布调优的策略,在本申请的一些实施例中包括:对负样本按重要程度进行分组(公共负样本、领域负样本、其他相关领域正样本、其他不相关领域正样本),每组样本赋予不同的权重,对领域负样本和其他领域相关的正样本赋予较高的权重,其他负样本赋予较低的权重。
由此,通过对负样本进行分组权重的精细化调优,能够有效地降低模型的误识率。
步骤2104:通过经过优化处理的训练样本对语义理解模型进行训练,以确定所述语义理解模型的参数。
由此,可以通过训练完成的语义理解模型对噪声环境较大的环境中的语音指令进行识别与处理。
其中,参考图23,图23为本申请实施例提供的语义理解模型一个可选的模型结构示意图,在模型网络侧可以使用多任务学习(Multi-task Learning)的训练方式来联合对OOD模型和领域分类模型进行训练。一个可选的网络结构如图23所示,整个网络结构分为两层:
1)基于BERT的预训练模型作为语义表示层。
2)与下游任务相关的输出层,二者可以使用一个全连接网络来表示。
本申请所提供的语义理解模型的训练方法,可以将OOD检测器模型和领域分类模型进行联合训练,OOD模型是一个二分类任务,用来判断该语料是IND还是Out of Domain。领域分类器模型是多个二分类器构成的,可以采用了One V.S All的数据组织方式,领域分类器用来判断该语料是IND中的哪个领域(天气、导航、音乐等)。进一步地,由于OOD和领域分类 器是非常相关的两类任务,如果该语料是OOD那么一定是所有领域二分类器的负样本,如果该语料是IND,那么一定是领域分类器中的一个或者多个领域的正样本。利用任务之间的相关性,可以构建了一个联合损失函数:
L(·)=L_D(·)+α·L_O(·)
其中L_D(·)为领域分类器产生的loss，L_O(·)为OOD检测器产生的loss，α是一个超参数，控制了OOD对整个模型loss的影响程度，α可以在实际训练的时候设置为1。输出层的loss可以采用交叉熵：
L_D(·)=-∑p'·log p
其中p为样本的soft-max预测概率，p'为样本的ground-truth标签。语义表示层BERT的参数在训练过程中进行fine tuning，OOD和各个领域分类器的输出层参数独立优化。
由此,全双工对话场景下,用户的对话对象会发生转移,用户会时不时地和周围朋友的交谈、闲聊,以及自言自语等。通过本申请所提供的语义理解模型训练方法可以实现在将对话的误识率有效地降低,保障在对话的时候助手不会错误响应。进一步地,通过Active Learning挖掘了大量的负样本进行模型训练,在迭代了数次周期之后,语义理解模型从初始较高的误识率下降到一个合理的范围。同时,通过对负样本分组、对不同组赋予不同的权重来调整内部的样本分布,误识率进一步下降。说明语义理解模型通过负样本的分布调整,能够从权重较大的负样本中学习到重要的信息,而权重较低的负样本信息量已经趋于饱和。最后,在模型结构侧,通过引入OOD拒识模型进行联合学习,可以最终在内部的开发集和测试集上误识率均有不同程度的下降。由此,本申请通过优化智能助手在全双工场景下的误识率,能够保证智能助手有效地响应用户正确的对话诉求,对非对话诉求进行拒识,保障了交互的可行性和流畅性,有效提升用户的使用体验。其中,图24为封装于车载系统中使用语义理解模型唤醒应用的示意 图;图25为封装于车载系统中使用语义理解模型查阅天气的示意图。当然,在本申请的一些实施例中,还可以在task specific layers之上再接一个后处理的rank模型,模的输入是OOD和各个领域分类器的预测得分,输出整个模型的预测结果。而本申请中,只是将OOD预测结果和领域分类器预测结果进行了一个层次的逻辑处理,即OOD模型预测为out of domain时,直接返回结果不再进行领域分类器的预测。但是OOD模型有可能预测错误,领域分类器模型预测的置信度很高,最终的结果却是IND,替代方案通过学习这种组合关系,可以在综合比较的基础上给出一个合理的预测结果,以降低语义理解模型的语义理解结果的错误率。
本申请具有以下有益技术效果:
通过获取第一训练样本集合,其中所述第一训练样本集合为通过主动学习进程所获取的带有噪声的语句样本;对所述第一训练样本集合进行去噪处理,以形成相应的第二训练样本集合;通过语义理解模型对所述第二训练样本集合进行处理,以确定所述语义理解模型的初始参数;响应于所述语义理解模型的初始参数,通过所述语义理解模型对所述第二训练样本集合进行处理,确定所述语义理解模型的更新参数;根据所述语义理解模型的更新参数,通过所述第二训练样本集合对所述语义理解模型的语义表示层网络参数和任务相关输出层网络参数进行迭代更新,由此,使得语义理解模型的泛化能力更强,提升语义理解模型的训练精度与训练速度,同时还可以有效充分利用已有的噪声语句对模型训练的增益,使得语义理解模型能够适应不同的使用场景,减少环境噪声对语义理解模型的影响。
以上所述,仅为本申请的实施例而已,并非配置为限定本申请的保护范围,凡在本申请的精神和原则之内所作的任何修改、等同替换和改进等,均应包含在本申请的保护范围之内。
本申请实施例中通过获取第一训练样本集合,其中所述第一训练样本集合为通过主动学习进程所获取的带有噪声的语句样本;对所述第一训练样本集合进行去噪处理,以形成相应的第二训练样本集合;通过语义理解模型对所述第二训练样本集合进行处理,以确定所述语义理解模型的初始参数;响应于所述语义理解模型的初始参数,通过所述语义理解模型对所述第二训练样本集合进行处理,确定所述语义理解模型的更新参数;根据所述语义理解模型的更新参数,通过所述第二训练样本集合对所述语义理解模型的语义表示层网络参数和任务相关输出层网络参数进行迭代更新,由此,使得语义理解模型的泛化能力更强,提升语义理解模型的训练精度与训练速度,同时还可以有效充分利用已有的噪声语句对模型训练的增益,使得语义理解模型能够适应不同的使用场景,减少环境噪声对语义理解模型的影响,减少电子设备的无效触发,更有利于语义理解模型部署在移动终端中。
Claims (17)
- 一种语义理解模型训练方法,所述方法由电子设备执行,所述语义理解模型训练方法包括:获取第一训练样本集合,其中所述第一训练样本集合为通过主动学习进程所获取的带有噪声的语句样本;对所述第一训练样本集合进行去噪处理,以形成相应的第二训练样本集合;通过语义理解模型对所述第二训练样本集合进行处理,以确定所述语义理解模型的初始参数;响应于所述语义理解模型的初始参数,通过所述语义理解模型对所述第二训练样本集合进行处理,确定所述语义理解模型的更新参数;根据所述语义理解模型的更新参数,通过所述第二训练样本集合对所述语义理解模型的语义表示层网络参数和任务相关输出层网络参数进行迭代更新。
- 根据权利要求1所述的方法,其中,所述对所述第一训练样本集合进行去噪处理,以形成相应的第二训练样本集合,包括:确定与所述语义理解模型的使用环境相匹配的动态噪声阈值;根据所述动态噪声阈值对所述第一训练样本集合进行去噪处理,以形成与所述动态噪声阈值相匹配的第二训练样本集合。
- 根据权利要求1所述的方法,其中,所述对所述第一训练样本集合进行去噪处理,以形成相应的第二训练样本集合,包括:确定与所述语义理解模型相对应的固定噪声阈值;根据所述固定噪声阈值对所述第一训练样本集合进行去噪处理,以形成与所述固定噪声阈值相匹配的第二训练样本集合。
- 根据权利要求1所述的方法,其中,所述响应于所述语义理解模型的初始参数,通过所述语义理解模型对所述第二训练样本集合进行处理,确定所述语义理解模型的更新参数,包括:将所述第二训练样本集合中不同语句样本,代入由所述语义理解模型的领域无关检测器网络和领域分类网络所组成的任务相关输出层网络所对应的损失函数;确定所述损失函数满足相应的收敛条件时对应所述语义理解模型中领域无关检测器网络参数和领域分类网络参数作为所述语义理解模型的更新参数。
- 根据权利要求4所述的方法,其中,所述根据所述语义理解模型的更新参数,通过所述第二训练样本集合对所述语义理解模型的语义表示层网络参数和任务相关输出层网络参数进行迭代更新,包括:通过所述语义理解模型的更新参数,确定与所述第二训练样本集合相匹配的第二噪声参数,所述第二噪声参数配置为表征所述第二训练样本集合中平行语句样本的噪声值;当所述第二噪声参数到达相应的噪声值阈值时,根据所述第二噪声参数的噪声值,对所述语义理解模型的语义表示层网络参数和任务相关输出层网络参数进行迭代更新,直至所述语义理解模型的领域无关检测器网络和所述领域分类网络构成的任务相关输出层网络对应的损失函数满足对应的收敛条件。
- 根据权利要求4所述的方法,其中,所述方法还包括:响应于所述语义理解模型的领域无关检测器网络和领域分类网络所组成的任务相关输出层网络所对应的损失函数,对所述语义理解模型的语义表示层网络进行参数调整,以实现所述语义表示层网络的参数与所述任务相关输出层网络所对应的损失函数相适 配。
- 根据权利要求1所述的方法,其中,所述方法还包括:对所述第二训练样本集合进行负例处理,以形成与所述第二训练样本集合相对应的负例样本集合,其中,所述负例样本集合配置为调整所述语义理解模型的领域无关检测器网络参数和领域分类网络参数调整;根据所述负例样本集合确定相应的双语评估研究值,其中,所述双语评估研究值,配置为作为监督参数对所述语义理解模型的语义理解结果进行评价。
- 根据权利要求7所述的方法,其中,所述对所述第一训练样本集合进行负例处理,包括:将所述语义理解模型的领域分类网络中待输出语句进行随机组合,以形成与所述第一训练样本集合相对应的负例样本集合;或者,对所述语义理解模型的领域分类网络中待输出语句进行随机删除处理或替换处理以形成与所述第一训练样本集合相对应的负例样本集合。
- 根据权利要求1所述的方法,其中,所述方法还包括:对数据源中的训练样本进行召回处理;根据所述召回处理的结果,触发相应的主动学习进程,以实现获取所述数据源中带有噪声的语句样本;对所述主动学习进程中所获取的带有噪声的语句样本进行标注,以形成所述第一训练样本集合。
- 根据权利要求9所述的方法,其中,所述对所述主动学习进程中所获取的带有噪声的语句样本进行标注,以形成所述第一训练样本集合,包括:确定所述带有噪声的语句样本的样本类型;对所述语句样本的样本类型中的负例样本进行排序,根据对所述负例样本的排序结果,为所述负例样本配置相应的权重,以形成包括不同权重训练样本的第一训练样本集合。
- 一种语义理解模型的语义处理方法,所述方法由电子设备执行,所述语义理解模型的语义处理方法包括:获取语音指令信息,并将所述语音指令转换为相应的可识别文本信息;通过所述语义理解模型的语义表示层网络,确定与可识别文本信息所对应的至少一个词语级的隐变量;通过所述语义理解模型的领域无关检测器网络,根据所述至少一个词语级的隐变量,确定与所述词语级的隐变量相匹配的对象;通过所述语义理解模型的领域分类网络;根据所述至少一个词语级的隐变量,确定与所述词语级的隐变量相对应的任务领域;根据与所述词语级的隐变量相匹配的对象,和与所述词语级的隐变量相对应的任务领域,触发相应的业务进程,以实现完成与所述语音指令信息相对应的任务,其中,所述语义理解模型基于如权利要求1至10任一项所述的方法训练得到。
- 根据权利要求11所述的方法,其中,所述方法还包括:对数据源中的与车载环境相匹配的语义理解模型对应的训练样本进行召回处理;根据所述召回处理的结果,触发相应的主动学习进程,以实现获取所述数据源中与所述车载环境相匹配的语义理解模型对应的带有噪声的语句样本;对所述主动学习进程中所获取的带有噪声的语句样本进行标注,以形成所述第一训练样本集合,其中,所述第一训练样本集合包括至少一个经过标注的与所述车载环境相匹配的语义理解模型对应的带有噪声的语句样 本。
- 一种语义理解模型的训练装置,所述训练装置包括:数据传输模块,配置为获取第一训练样本集合,其中所述第一训练样本集合为通过主动学习进程所获取的带有噪声的语句样本;去噪模块,配置为对所述第一训练样本集合进行去噪处理,以形成相应的第二训练样本集合;语义理解模型训练模块,配置为通过语义理解模型对所述第二训练样本集合进行处理,以确定所述语义理解模型的初始参数;所述语义理解模型训练模块,配置为响应于所述语义理解模型的初始参数,通过所述语义理解模型对所述第二训练样本集合进行处理,确定所述语义理解模型的更新参数;所述语义理解模型训练模块,配置为根据所述语义理解模型的更新参数,通过所述第二训练样本集合对所述语义理解模型的语义表示层网络参数和任务相关输出层网络参数进行迭代更新。
- 一种语义理解模型处理装置,所述装置包括:文本转换模块,配置为获取语音指令信息,并将所述语音指令转换为相应的可识别文本信息;语义表示层网络模块,配置为通过所述语义理解模型的语义表示层网络,确定与可识别文本信息所对应的至少一个词语级的隐变量;领域无关检测器网络模块,配置为通过所述语义理解模型的领域无关检测器网络,根据所述至少一个词语级的隐变量,确定与所述词语级的隐变量相匹配的对象;领域分类网络模块,配置为通过所述语义理解模型的领域分类网络;根据所述至少一个词语级的隐变量,确定与所述词语级的隐变量相对应的任务领域;信息处理模块,配置为根据与所述词语级的隐变量相匹配的对象,和与所述词语级的隐变量相对应的任务领域,触发相应的业务进程,以实现完成与所述语音指令信息相对应的任务。
- 一种电子设备,所述电子设备包括:存储器,配置为存储可执行指令;处理器,配置为运行所述存储器存储的可执行指令时,实现权利要求1至10任一项所述的语义理解模型的训练方法。
- 一种电子设备,所述电子设备包括:存储器,配置为存储可执行指令;处理器,配置为运行所述存储器存储的可执行指令时,实现权利要求11-12任一项所述的语义理解模型的语义处理方法。
- 一种计算机可读存储介质,存储有可执行指令,所述可执行指令被处理器执行时实现权利要求1至10任一项所述的语义理解模型的训练方法,或者实现权利要求11-12任一项所述的语义理解模型的语义处理方法。