WO2022057712A1 - Electronic device and semantic parsing method thereof, medium, and human-machine dialogue system - Google Patents


Info

Publication number
WO2022057712A1
Authority
WO
WIPO (PCT)
Prior art keywords
slot
intent
word
semantic
corpus data
Prior art date
Application number
PCT/CN2021/117251
Other languages
English (en)
Chinese (zh)
Inventor
童甜甜
祝官文
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2022057712A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems

Definitions

  • the invention relates to the technical field of man-machine dialogue, in particular to an electronic device and a semantic analysis method, medium and man-machine dialogue system thereof.
  • human-machine dialogue systems are increasingly applied in various intelligent terminal electronic devices, such as smart speakers, smartphones, in-vehicle intelligent systems (for example, in-vehicle voice navigation), robots, etc.
  • the human-computer dialogue system uses technologies such as speech recognition, semantic analysis and language generation to realize dialogue and information exchange between humans and machines.
  • the spoken language comprehension task in semantic parsing technology includes two sub-tasks, intent recognition and slot filling.
  • intent recognition and slot filling are mainly for single-intent and single-slot identification, that is, for the same speech, the closest intent is selected from multiple intent recognition result options as the recognition result.
  • the corpus of multiple intents and the corpus of a single intent may share the same sentence pattern, and a single-intent classification model cannot distinguish multi-intent corpora, which eventually leads to a high misrecognition rate of the model, that is, a high error rate in the intent recognition and slot filling results.
  • the prior-art intent and slot identification architecture cannot explicitly model the relationship between intents and slots, its accuracy of intent identification and slot filling for multiple labels is poor, and it is not compatible with mixed single-intent and multi-intent scenarios.
  • Embodiments of the present application provide an electronic device, a semantic parsing method thereof, a medium, and a human-machine dialogue system. By recognizing, from a user's voice, multiple intents close to the user's true intent, and then using the identified multiple intents to predict slot information, the accuracy of slot filling is improved, the speed or efficiency of slot filling is correspondingly improved, and thus the accuracy of semantic parsing in human-computer dialogue is improved.
  • an embodiment of the present application provides a semantic parsing method, the method including: acquiring corpus data to be parsed; determining the degree of intent correlation between each word contained in the corpus data to be parsed and the intent represented by the corpus data, and the degree of slot correlation between the word and the slot represented by the corpus data; and predicting the slots of the corpus data to be parsed based on the semantic information of the word, the preceding semantic information of the word, and the degrees of intent correlation and slot correlation of the word.
  • the corpus data can be obtained by performing voice recognition and conversion on the user's voice command.
  • the degree of intent correlation between a word included in the corpus data to be parsed and the intent represented by the corpus data can be represented by an intent attention vector, and the degree of slot correlation between the word and the slot represented by the corpus data can be represented by a slot attention vector.
  • the semantic information of a word can be understood as the word meaning information of the word, that is, the literal meaning of the word and the meaning it refers to in context (for example, as part of a song title such as "hello old times").
  • the preceding semantic information of a word can be the semantic information of the word immediately before the current word in the corpus data; if the currently processed word is the first word, the preceding semantic information can be the sentence semantic information of the corpus data.
  • the preceding semantic information matters mainly because of its significance to the slot prediction of the current word.
  • the preceding semantic information can be expressed by the hidden state vector output at the previous moment (relative to the current moment).
  • the above-mentioned method further includes: predicting multiple intents from the corpus data to be parsed; and determining, from the predicted slots, the slot corresponding to each of the multiple intents.
  • multiple intents are obtained by parsing the corpus data converted from a user's voice command. If the corpus data contains only a single intent, the present application can also parse that single intent from such single-intent corpus data, giving it a certain versatility and a better user experience.
  • each intent should correspond to at least one slot, and some intents may have three or more slots corresponding to it.
  • the present application can accurately sort out the correspondence between multiple intents and multiple slots.
  • the above-mentioned method further includes: the preceding semantic information includes the semantic information of at least one word located before the word in the corpus data to be parsed.
  • the preceding semantic information of the first word is the sentence semantic information of the corpus data.
  • the preceding semantic information of the second word is the semantic information of the first word, which at that point contains the sentence semantic information of the corpus data passed to the first word at the first moment.
  • the preceding semantic information of each subsequent character is the semantic information of the previous character, and the semantic information of the previous character includes the semantic information transmitted by the character before it. This transfer relationship is progressive.
  • the word meaning correlation of two adjacent words is the largest, while the correlation of two non-adjacent words is small, and gradually approaches 0 as the number of characters between them increases.
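The progressive transfer of preceding semantic information described above can be sketched as a simple recurrence; the function below is only an illustration (the mixing factor `decay` and all names are assumptions, not the patent's actual model), showing why adjacent characters correlate most strongly and a distant character's influence decays toward 0:

```python
# Illustrative sketch only: preceding semantic information flows forward
# character by character, so adjacent characters are most strongly
# correlated and the influence of distant characters decays toward 0.
def propagate_context(char_embeddings, sentence_vector, decay=0.5):
    """Return a preceding-context vector for each character.

    The first character's context is the sentence vector; every later
    character's context blends the previous context (weight `decay`)
    with the previous character's embedding.
    """
    contexts = []
    prev = sentence_vector
    for emb in char_embeddings:
        contexts.append(prev)
        # context passed on to the next character
        prev = [decay * c + (1 - decay) * e for c, e in zip(prev, emb)]
    return contexts
```

Running this on a one-dimensional toy input shows the first character's contribution halving at each subsequent position, matching the "gradually approaches 0" behaviour described above.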
  • the method further includes: generating sentence semantic information of the corpus data to be parsed and semantic information of each word in the corpus data to be parsed.
  • the sentence character representing the sentence in the corpus data is encoded by the encoder, so that the sentence character can express specific semantic information, and this specific semantic information is the same as or close to the semantic information obtained by a human understanding the sentence.
  • the word character of each word in the corpus data is encoded by the encoder, so that the word character can express specific word meaning information, and this specific word meaning information is the same as or close to the word meaning obtained by a human understanding the word in the sentence.
  • the sentence semantic information of the corpus data can be represented by a sentence vector, and the word meaning information of each word in the corpus data can be represented by a word vector.
  • the above-mentioned method further includes: the method is implemented by a neural network model.
  • the neural network model includes a fully connected layer and a long short-term memory network model.
  • a semantic parsing model is trained through a neural network model combined with a BERT model, an attention mechanism, a slot gate mechanism, and a sigmoid activation function, enabling it to implement the above method.
  • the above-mentioned method further includes: the sentence semantic information of the corpus data to be parsed, the preceding semantic information of a word, and the degree of intent correlation and the degree of slot correlation of the word are represented in the form of vectors in the neural network model.
  • the sentence semantic information of the corpus data to be parsed is represented by a sentence vector;
  • the preceding semantic information of a word is represented by the hidden state vector at the previous moment;
  • the intent correlation degree and slot correlation degree of a word are represented by an intent attention vector and a slot attention vector, respectively.
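As a rough illustration of how such vectors can interact, the sketch below computes dot-product attention weights and a slot-gate style scalar in plain Python; the function names, shapes, and the exact form of the gate are assumptions for illustration only and do not reproduce the patent's trained model:

```python
import math

# Minimal sketch (not the patent's actual parameters): softmax attention
# weights between a word vector and candidate vectors, and a slot-gate
# style scalar combining slot and intent attention contexts.
def attention_weights(query, keys):
    """Softmax of dot products: how strongly `query` attends to each key."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def slot_gate(slot_context, intent_context, v, w):
    """Slot-gate style scalar: g = sum(v * tanh(c_slot + w * c_intent))."""
    return sum(vi * math.tanh(cs + w * ci)
               for vi, cs, ci in zip(v, slot_context, intent_context))
```

The attention weights sum to 1, and the gate scalar lets the intent context modulate how much the slot context contributes to slot prediction.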
  • an embodiment of the present application provides a man-machine dialogue method, which includes: receiving a user voice command; converting the user voice command into a corpus to be parsed in text form; parsing the intents in the corpus and the slot corresponding to each intent; and, based on the parsed intents and the slot corresponding to each intent, executing the operation corresponding to the user's voice command or generating a response voice.
  • the method further includes: the operations include one or more of sending instructions to the smart home device, opening application software, searching web pages, making calls, and sending and receiving short messages.
  • the parsed intent is to book a ticket and a hotel
  • the slots corresponding to these two intents are the origin, destination, (hotel) location, and (hotel) star rating
  • the operation performed by the smartphone may be to open ticket and hotel booking software, query the ticket information corresponding to the departure place and destination for the user to choose from, and recommend a five-star hotel in a certain location to the user for selection.
  • the electronic device may include, but is not limited to, laptop computers, desktop computers, tablet computers, smartphones, wearable devices, portable music players, reader devices, or other electronic devices capable of accessing a network.
  • an embodiment of the present application provides a human-machine dialogue system, the system including: a speech recognition module for converting a user's voice command into corpus data in text form; a semantic parsing module for performing the above semantic parsing method; a problem solving module for finding a solution based on the results of the semantic parsing module; a language generation module for generating natural language sentences corresponding to the solution; a speech synthesis module for synthesizing the natural language sentences into a response voice; and a dialogue management module for scheduling the speech recognition module, the semantic parsing module, the problem solving module, the language generation module, and the speech synthesis module to cooperate with each other to realize the man-machine dialogue.
  • an embodiment of the present application provides a readable medium, where an instruction is stored on the readable medium, and the instruction, when executed on an electronic device, causes the electronic device to execute the above semantic parsing method or the above man-machine dialogue method.
  • an embodiment of the present application provides an electronic device, including: a memory for storing instructions to be executed by one or more processors of the electronic device; and a processor, which is one of the processors of the electronic device, for executing the above semantic parsing method or the above man-machine dialogue method.
  • FIG. 1 is a schematic software block diagram of a common man-machine dialogue system;
  • FIG. 2 is a schematic diagram of a man-machine dialogue scene to which an embodiment of the present application is applicable;
  • FIG. 3 is a schematic structural diagram of an exemplary structure of a semantic parsing model in an embodiment of the present application;
  • FIG. 4 is a schematic diagram of processing results of corpus data at different stages in the semantic parsing method according to an embodiment of the present application;
  • FIG. 5 is a schematic diagram of a training process of a semantic parsing model in the semantic parsing method according to an embodiment of the present application;
  • FIG. 6 is a schematic diagram of an interaction flow between a mobile phone 100 and a user according to an embodiment of the present application;
  • FIG. 7 is a schematic interface diagram of a mobile phone 100 according to an embodiment of the present application performing corresponding operations according to user voice commands;
  • FIG. 8 is an exemplary structural diagram of a mobile phone 100 according to an embodiment of the present application.
  • Illustrative embodiments of the present application include, but are not limited to, electronic devices and semantic parsing methods and media thereof.
  • the embodiment of the present application first identifies multiple intents close to the user's true intent from the user's voice, and then uses the identified multiple intents to predict slot information, thereby improving the accuracy of slot filling and, correspondingly, the speed or efficiency of slot filling, thus improving the accuracy of semantic parsing in human-machine dialogue.
  • NLP: Natural Language Processing
  • Natural language is human language
  • natural language processing is the processing of human language.
  • Natural language processing is the process of systematically analyzing, understanding, and extracting information from text data in an intelligent and efficient manner.
  • NER: Named Entity Recognition
  • RE: Relation Extraction
  • IE: Information Extraction
  • Other NLP tasks include sentiment analysis, speech recognition, question answering, topic segmentation, etc.
  • natural language processing tasks can fall into the following categories.
  • Sequence tagging: each word in a sentence requires the model to give a categorical label based on the context, such as Chinese word segmentation, part-of-speech tagging, named entity recognition, and semantic role tagging.
  • Classification tasks: output a classification value for the entire sentence, such as text classification.
  • Sentence relationship inference: given two sentences, determine whether they have a certain relationship, for example entailment, QA, semantic rewriting, and natural language inference.
  • Generative tasks: take a piece of text as input and generate another piece of text.
  • Intent: the voice commands input by the user all correspond to the user's intention. It is understandable that the so-called intent is the expression of the user's will. In the human-machine dialogue system, an intent is generally named after "verb + noun", for example, checking the weather, booking hotels, etc.
  • intent recognition also known as intent classification, mainly extracts the intent corresponding to the current voice command according to the voice command input by the user.
  • An intent is a collection of one or more expressions; for example, "I want to watch a movie" and "I want to see an action movie made by a certain star in a certain year" can belong to the same intent of playing a video.
  • An intent can be configured with one or more slots.
  • the slot is the key information used to express the user's intention, and the accuracy of the slot filling directly affects whether the electronic device can match the correct intention.
  • a slot corresponds to a keyword of a type of attribute, and the information in the slot can be filled with keywords of the same type, that is, slot filling.
  • the query pattern corresponding to the intent to play a song could be "I want to hear {song} of {singer}".
  • {singer} is the singer slot;
  • {song} is the song slot.
  • the electronic device can extract the slot information filled in the {singer} slot from the voice command as Faye Wong, and the slot information filled in the {song} slot as Red Bean. In this way, the electronic device (or server) can identify, from the two pieces of slot information, that the user's intention for this voice input is to play Faye Wong's song Red Bean.
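To make the example above concrete, here is a toy pattern-matching sketch for the "play a song" query pattern; a real system would use a trained sequence-labeling model rather than a regular expression, and the English pattern here is only illustrative:

```python
import re

# Toy illustration of slot filling against a query pattern like
# "I want to hear {song} of {singer}" (pattern and slot names follow the
# example above; this is not the patent's actual slot-filling method).
PATTERN = re.compile(r"I want to hear (?P<song>.+) of (?P<singer>.+)")

def fill_slots(utterance):
    """Return a dict of slot name -> extracted value, or {} on no match."""
    m = PATTERN.match(utterance)
    return m.groupdict() if m else {}
```

For the utterance "I want to hear Red Bean of Faye Wong", this returns the song slot "Red Bean" and the singer slot "Faye Wong".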
  • the semantic parsing method of the present application is suitable for various scenarios requiring semantic parsing, for example, a user sends a voice command to an intelligent electronic device, and a user conducts a man-machine dialogue with a voice assistant of the intelligent electronic device.
  • the following introduces the semantic parsing solution of the present application based on the human-machine dialogue system.
  • a common human-machine dialogue system 110 mainly includes the following six technical modules: speech recognition module 111; semantic parsing module 112; problem solving module 113; language generation module 114; dialogue management module 115; and speech synthesis module 116. Among them:
  • the speech recognition module 111 is used to realize speech-to-text recognition and conversion through speech recognition technology (Automatic Speech Recognition, ASR).
  • the output corpus data is generally in the form of the top n (n ≥ 1) sentences or word lattices with the highest scores.
  • the semantic parsing module 112, also known as the Natural Language Understanding (NLU) module, is mainly used for performing natural language processing (NLP) tasks, including semantically parsing and identifying the corpus data output by the speech recognition module.
  • the function of the semantic parsing module is implemented by a pre-trained semantic parsing model 121, and the semantic parsing model 121 will be described in detail below, and will not be repeated here.
  • the problem solving module 113 is mainly used for reasoning or querying according to the intention identified by the semantic analysis and the corresponding slot, so as to feed back the solution corresponding to the intention and the corresponding slot to the user.
  • the language generation module 114 mainly generates a natural language sentence for the solution found by the problem solving module 113 that needs to be output to the user, and feeds it back to the user as text or after further conversion into voice.
  • the dialogue management module 115 is the central hub in the human-machine dialogue system; based on the dialogue history, it schedules the mutual cooperation of the other modules in the human-computer interaction system, assists the semantic parsing module in correctly understanding the speech recognition results, provides assistance to the problem solving module, and guides the natural language generation process of the language generation module.
  • the speech synthesis module 116 is used for converting the natural language sentences generated by the language generation module into speech output.
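The cooperation of these six modules for one dialogue turn can be sketched as a simple function chain; the callables below are stand-ins supplied by the caller, not the patent's actual module interfaces:

```python
# Hedged sketch of how the dialogue management module might schedule the
# other modules for a single dialogue turn (module behaviors are stand-in
# callables, not the patent's implementation).
def run_dialogue_turn(audio, asr, nlu, solver, nlg, tts):
    text = asr(audio)                      # speech recognition module
    intents_and_slots = nlu(text)          # semantic parsing module
    solution = solver(intents_and_slots)   # problem solving module
    sentence = nlg(solution)               # language generation module
    return tts(sentence)                   # speech synthesis module
```

In a real system the dialogue management module would also consult the dialogue history between these steps; that bookkeeping is omitted here for brevity.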
  • FIG. 2 shows a schematic diagram of a man-machine dialogue scene according to an embodiment of the present application.
  • the application scenario includes the electronic device 100 and the electronic device 200 .
  • the electronic device 100 is a terminal intelligent device that interacts with a user, and an application system capable of semantic analysis, such as the above-mentioned human-machine dialogue system 110 , is installed thereon.
  • the electronic device 100 can recognize the user's voice command through the man-machine dialogue system 110, and perform corresponding operations according to the voice command or answer the questions raised by the user.
  • the electronic device 100 may include, but is not limited to, smart speakers, smart phones, wearable devices, head-mounted displays, in-vehicle intelligent systems such as in-vehicle intelligent voice navigation, as well as intelligent robots, portable music players, and readers.
  • the electronic device 200 can be used to train the semantic parsing model 121 , and transplant the trained semantic parsing model 121 to the electronic device 100 for the electronic device 100 to perform semantic parsing and perform corresponding operations.
  • the electronic device 200 can also perform semantic parsing on the corpus data sent by the electronic device 100 through the trained semantic parsing model 121, and feed the result back to the electronic device 100, and the electronic device 100 further performs corresponding operations.
  • electronic device 200 may include, but is not limited to, clouds, servers, laptops, desktops, tablet computers, and other electronic devices capable of accessing a network with one or more processors embedded or coupled therein.
  • the technical solutions of the present application are described in detail below by taking the electronic device 100 as a mobile phone and the electronic device 200 as a server as an example.
  • the mobile phone 100 is installed with the human-machine dialogue system 110, and the semantic analysis module 112 in the human-machine dialogue system 110 has a semantic analysis model 121, which can perform semantic analysis on user speech based on the technical solution of the present application.
  • the semantic parsing model 121 of the present application will be described in detail below.
  • the semantic parsing model 121 is a natural language processing model pre-trained by the server 200 based on natural language processing and the above-mentioned various neural network structures and models.
  • the pre-trained semantic parsing model 121 can extract multiple intents from a single piece of corpus data and predict slots based on those multiple intents, so as to accurately identify the intents and corresponding slots in the corpus data, which can greatly improve the accuracy of slot filling.
  • the data input into the semantic parsing model 121 is the data obtained after preprocessing the corpus data, wherein the corpus data is obtained after the user's voice instruction is recognized and transformed.
  • the preprocessing of the corpus data is a routine operation for understanding text in the human-machine dialogue system 110, and is one of the natural language processing tasks performed by the semantic parsing module 112.
  • preprocessing generally includes segmenting the corpus data, filling the token (Token) sequence, adding segmentation marks (Segmentation), and creating masks.
  • the data preprocessing finally obtains the Token sequence containing the text characters of the sentence and the text characters of each word in the sentence, the segmentation mark representing the sentence position corresponding to each word, and the corresponding mask indicating whether each character position in the Token sequence is a valid character.
  • the word segmentation process mainly uses word segmentation tools (such as a Chinese vocabulary) to divide the corpus data into sentences and the individual words that make up the sentences, and to mark the obtained sentences and words with possible slot labels.
  • word segmentation processing is to prepare data for the next step of filling the Token sequence.
  • for example, for the corpus "please play hello old times for me", the marked intent labels may be PLAY_MUSIC, PLAY_VIDEO, and PLAY_VOICE, and the slot labels marked for each word are:
  • the word "you" (the first character of "hello old times") corresponds to 3 slot labels: songName-B, videoName-B, and mediaName-B;
  • the four characters "good", "old", "time", and "light" each correspond to the three slot labels songName-I, videoName-I, and mediaName-I.
  • Forming the Token sequence mainly uses the data obtained by word segmentation to obtain a Token sequence that meets the character length requirements by truncating sentences or filling characters.
  • the Token sequence contains sentence characters corresponding to the entire sentence of the voice command, and word characters corresponding to each word in the sentence.
  • the first character in the Token sequence is generally <CLS>, which marks the sentence obtained by word segmentation (for example, the character <CLS> marks the sentence: please play hello old times for me);
  • the ending character in the Token sequence is generally the truncation character <SEP>;
  • <SEP> indicates that the preceding sentence is a complete sentence that meets the character length requirement of a single sentence;
  • the characters between <CLS> and <SEP> together form a complete sentence;
  • the words are marked with the segmentation mark "Sentence 1", indicating that these words are the words that make up Sentence 1.
  • the character length of the sentence plus 2 (for <CLS> and <SEP>) must meet the maximum character length requirement; generally, the character length of a user instruction plus 2 falls within the maximum character length of 32.
  • Creating a mask is mainly to create a mask (Mask) corresponding to each character in the Token sequence obtained by the above filling.
  • the purpose of creating a mask is to mark whether each character in the Token sequence expresses valid information into a computer-readable marking code.
  • the value of the created mask element corresponding to the character <pad> in the Token sequence is 0, and the value of the mask element corresponding to any character other than <pad> is 1.
  • Token sequence: <CLS> please play hello old times for me <pad> ... <pad> <SEP>;
  • the corpus data recognized by the voice command input by the user is "help me book a train ticket from Shanghai to Beijing and book a five-star hotel near Beijing Railway Station"
  • the three data obtained after the above data preprocessing are:
  • Token sequence: <CLS> help me book a train ticket from Shanghai to Beijing and book a five-star hotel near Beijing Railway Station <SEP>;
  • Segmentation mark: each character of the sentence "help me book a train ticket from Shanghai to Beijing and book a five-star hotel near Beijing Railway Station" is marked "Sentence 1";
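Based on the description above, the preprocessing step can be sketched as follows; the token names, the fixed length of 32, and the single-sentence assumption are taken from the example, and this is an illustration rather than the patent's exact implementation:

```python
# Illustrative preprocessing: build the Token sequence, segmentation
# marks, and mask for one sentence (names and MAX_LEN are assumptions
# based on the example in the text).
MAX_LEN = 32  # maximum character length mentioned in the description

def preprocess(chars):
    """chars: the list of characters/words of one segmented sentence."""
    body = chars[: MAX_LEN - 2]                # truncate over-long sentences
    tokens = ["<CLS>"] + body + ["<SEP>"]
    tokens += ["<pad>"] * (MAX_LEN - len(tokens))  # fill to fixed length
    segments = [1] * MAX_LEN                   # every position in Sentence 1
    mask = [0 if t == "<pad>" else 1 for t in tokens]  # 1 = valid character
    return tokens, segments, mask
```

The three returned sequences correspond to the three preprocessed data items described above, and would be fed into the semantic parsing model together.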
  • the three data obtained after the above-mentioned data preprocessing of the corpus data can be input into the semantic parsing model 121 for semantic parsing.
  • the semantic parsing model 121 will be described in detail below.
  • the semantic parsing model 121 includes a BERT encoding layer 1211, an intent classification layer 1212, an attention layer 1213, a slot filling layer 1214, and a post-processing layer 1215.
  • the BERT encoding layer 1211 takes as input the Token sequence, segmentation marks, and mask obtained after data preprocessing of the corpus data, and outputs the encoded vector sequence after encoding.
  • the coding vector sequence includes sentence vector and word vector
  • the sentence vector represents the semantic information of the corpus data to be parsed
  • the word vector contains the lexical information of each word in the corpus data to be parsed.
  • the semantic information and word meaning information are meaning expressions of the corpus data based on natural language understanding, and this semantic and word meaning information can express the real intention of the user and the real slots corresponding to that intention.
  • the semantic information represented by the sentence vector h0 may include PLAY_MUSIC, PLAY_VIDEO, PLAY_VOICE, hello, old times, hello old times, etc.
  • the word meaning information represented by the word vectors h1, h2, ..., ht may include songName, videoName, mediaName, and the literal meaning of each word constituting the sentence, where the word corresponding to h1 is "please", the word corresponding to h2 is "for", the word corresponding to h3 is "me", ..., and the word corresponding to h10 is "light" (the last character of the song title).
  • the corpus data to be parsed is "help me book a train ticket from Shanghai to Beijing and book a five-star hotel near Beijing Railway Station"
  • in the encoded vector sequence {h0, h1, h2, ..., ht} output by the BERT encoding layer 1211, the semantic information represented by the sentence vector h0 may include ticket booking, hotel booking, departure place, destination, Shanghai, Beijing, hotel, star, five-star, and so on.
  • the word meaning information represented by the word vectors h1, h2, ..., ht may include the departure place, destination, Shanghai, Beijing, hotel, star, five-star, and the literal meaning of each word composing the sentence, where the word corresponding to h1 is "help", the word corresponding to h2 is "me", the word corresponding to h3 is "book", ..., and the word corresponding to h30 is "shop" (the last character of "hotel").
  • the Token sequence, segmented tag and the mask generated by the corresponding Token sequence obtained after data preprocessing will be used as the input of the BERT coding layer 1211 .
  • the BERT encoding layer 1211 sequentially identifies the valid characters <CLS>, x1, x2, ..., xt-1, <SEP> in the Token sequence according to the mask (the mask element value of a valid character is 1, and the mask element value of a blank character is 0).
  • the character <CLS> that marks the sentence in the Token sequence is input into the trained BERT encoding layer 1211 for semantic encoding; the character <CLS> is thereby given the semantic information of the corpus data, generating a high-dimensional sentence vector h0.
  • the characters x1, x2, ..., xt-1 between the character <CLS> and the truncation character <SEP> in the Token sequence correspond to the words that constitute the sentence in the corpus data; the characters x1, x2, ..., xt-1 are input into the trained BERT encoding layer 1211 for semantic encoding, which assigns the semantic information of the corpus data to them and correspondingly generates the high-dimensional word vectors h1, h2, ..., ht.
  • the mask element value corresponding to the blank character <pad> in the Token sequence is 0 and marks no word, so it is not used as input to the BERT encoding layer 1211.
  • the BERT coding layer 1211 can be obtained by training based on the BERT model.
  • the BERT model is a multi-layer bidirectional transformer encoder model based on fine-tuning, and the key technological innovation of the BERT model is to apply the bidirectional training of the transformer to language modeling.
  • a striking feature of the BERT model is its unified architecture across different tasks, so there is little difference between its pretrained architecture and the final downstream architecture.
  • the BERT model can further increase the generalization ability of the word vector model, and fully describe the character-level, word-level, sentence-level and even inter-sentence relationship features.
  • the BERT encoding layer 1211 can also be obtained by training other encoders or encoding models, which is not limited here.
  • the intent classification layer 1212 is used to predict candidate intents in the corpus data, wherein the intent classification layer 1212 can extract multiple intent labels in the corpus data, and retain the intent labels that meet the conditions as candidate intent outputs.
  • the intent classification layer 1212 takes the sentence vector h 0 obtained by the above-mentioned BERT encoding layer 1211 as input; based on the semantic information represented by the sentence vector h 0 , the intent classification layer 1212 can extract all possible intent labels and, for each extracted intent label, calculate the intent confidence to judge whether the intent label satisfies the output condition.
  • the intent confidence represents the closeness of the extracted intent label to the real intent expressed by the corpus data, and may also be referred to as intent reliability.
  • the intent with higher intent confidence is closer to the real intent expressed by the corpus data.
  • a certain threshold can be set for the intent confidence; for example, if the threshold of the intent confidence is set to 0.5, an intent label whose intent confidence is greater than or equal to the threshold satisfies the output condition and will be output as a candidate intent, while an intent label whose intent confidence is less than the threshold does not meet the output condition and will be deleted rather than output from the intent classification layer 1212.
  • the semantic information represented by the sentence vector h 0 output by the BERT coding layer may include 3 possible intent labels: PLAY_MUSIC, PLAY_VIDEO, PLAY_VOICE.
  • the intent classification layer 1212 extracts the above three possible intent labels, and calculates the intent confidence of each intent label as 0.8, 0.75, and 0.5, respectively.
  • the intent classification layer 1212 sets the intent confidence threshold to 0.5; the intent confidences of the above three intent labels all meet the condition of being greater than or equal to 0.5, that is, the above three intent labels satisfy the output condition, and finally the intent classification layer 1212 outputs three candidate intents: PLAY_MUSIC, PLAY_VIDEO, PLAY_VOICE.
  • the semantic information represented by the sentence vector h 0 output by the BERT coding layer may include 4 possible intent labels: check train times, book tickets, find hotels, and book hotels.
  • the intent classification layer 1212 extracts the above four possible intent labels, and calculates the intent confidence of each intent label as 0.48, 0.87, 0.45, and 0.7, respectively.
  • the intent confidence threshold set by the intent classification layer 1212 is 0.5; among the above four intent labels, those whose intent confidence is greater than or equal to 0.5 are book tickets and book hotels, which satisfy the output conditions, so the intent classification layer 1212 outputs 2 candidate intents: book tickets, book hotels. The two intent labels whose intent confidence is less than 0.5, check train times and find hotels, do not meet the output conditions and are not output from the intent classification layer 1212.
  • the working process of the intent classification layer 1212 is shown in FIG. 3 :
  • the intent classification layer 1212 takes the sentence vector h 0 in the encoded vector sequence output by the BERT encoding layer 1211 as input; by decoding and activating the sentence vector h 0 , the intent classification layer 1212 extracts all possible intent labels in the semantic information represented by the sentence vector h 0 and computes the intent confidence y I for each intent label.
  • the calculation formula of the intent confidence y I obtained after passing through the Sigmoid activation function is as follows:
  • y I = Sigmoid(W I · h 0 + b I)    (1)
  • where the superscript I denotes the intent, W I is the random weight coefficient of the sentence vector h 0 , and b I represents the bias value.
  • the intent classification layer 1212 can be obtained by training a fully connected layer (dense) as the decoder and a Sigmoid function as the activation function; in other embodiments, a deep neural network with the same function as the fully connected layer can also be used as the decoder, and other functions with the same function as the Sigmoid function can also be used as the activation function of the corresponding deep neural network decoder, which is not limited here.
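Formula (1) and the threshold test can be sketched as follows (the weight values, labels, and numbers here are illustrative assumptions, not values from the patent):

```python
import math

def intent_confidences(h0, weights, biases):
    """Dense layer + Sigmoid per intent label, as in formula (1):
    y = Sigmoid(W . h0 + b)."""
    scores = {}
    for label, w in weights.items():
        z = sum(wi * hi for wi, hi in zip(w, h0)) + biases[label]
        scores[label] = 1.0 / (1.0 + math.exp(-z))  # Sigmoid
    return scores

def candidate_intents(scores, threshold=0.5):
    """Keep only the intent labels whose confidence meets the output condition."""
    return [label for label, c in scores.items() if c >= threshold]
```

With an intent confidence threshold of 0.5, only labels scoring at or above 0.5 survive as candidate intents, mirroring the PLAY_MUSIC example above.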
  • the attention layer 1213 is used to quantify the degree of correlation between each word in the corpus data and the intent expressed by the sentence, represented for example by an intent attention vector (which can also be understood as an intent context vector); the attention layer 1213 is also used to quantify the degree of correlation between each word in the corpus data and the slot expressed by the sentence, represented for example by a slot attention vector.
  • the intent attention vector output by the attention layer 1213 is used as an input of the slot filling layer 1214 to guide slot prediction and improve its accuracy; the slot attention vector output by the attention layer 1213 is used as the bias value of the slot calculation to correct the deviation of the slot prediction calculation.
  • the attention layer 1213 takes the encoding vector sequence output by the BERT encoding layer 1211 as input; based on the semantic information represented by the sentence vector h 0 and the word meaning information represented by the word vectors h 1 , h 2 , ..., h t , the intent attention vector output by the attention layer can be understood as quantifying the degree of correlation between the word corresponding to each word vector and the intent expressed by the sentence corresponding to the sentence vector, and the slot attention vector output by the attention layer can be understood as quantifying the degree of correlation between the word corresponding to each word vector and the slot expressed by the sentence corresponding to the sentence vector.
  • the semantic information represented by the sentence vector h 0 in the encoding vector sequence output by the BERT coding layer may include 3 possible intent labels: PLAY_MUSIC, PLAY_VIDEO, PLAY_VOICE; the word meaning information represented by the word vectors h 1 , h 2 , ..., h t may include songName, videoName, mediaName, and the literal meaning of each word composing the sentence.
  • for example, the intent attention vector C I output by the attention layer 1213 corresponds to the sentence "please play hello old times for me": the intent expressed by the sentence may be PLAY_MUSIC, PLAY_VIDEO, or PLAY_VOICE, the words "play, play" have a relatively high degree of correlation with the intent expressed by the sentence, and "you, good, old, time, light, please, for, me" have a low degree of correlation or no correlation with the intent expressed by the sentence.
  • for example, a correlation degree of 0.9 means that the degree of correlation is relatively high; in the end, it can be concluded that the degree of correlation between "you, good, old, time, light" and the above three slots is relatively high, while "play, play, please, for, me" have a low degree of correlation or no correlation with the slots expressed by the sentence.
  • for another example, the intent attention vector output by the attention layer 1213 corresponds to the sentence "help me book a train ticket from Shanghai to Beijing and book a five-star hotel near Beijing Railway Station": the intent expressed by the sentence may be booking a ticket or booking a hotel, the words "book, book, train, ticket, hotel" have a relatively high degree of correlation with the intent expressed by the sentence, and the characters composing "Shanghai, Beijing, railway station, five-star" as well as "help, me" have a low degree of correlation or no correlation with the intent expressed by the sentence.
  • for example, the degree of correlation between the first character of "Shanghai" and the slot "departure" is 0.9, while its degree of correlation with the other 3 slots (destination, location, star rating) is 0.3, which indicates that this character has a high degree of correlation with the slot "departure" and a low degree of correlation with the other three slots (destination, location, star rating).
  • the attention layer 1213 takes the encoding vector sequence {h 0 , h 1 , h 2 , ..., h t } output by the BERT encoding layer 1211 as input; the attention layer 1213 extracts the semantic information represented by the sentence vector h 0 and the word meaning information represented by the word vectors h 1 , h 2 , ..., h t , and outputs a hidden state vector at each time step t, which represents the semantic and word meaning information extracted before the previous moment (time t-1) of the corresponding time step t.
  • the attention vector calculation formula based on the attention mechanism is as follows:
  • Attention(Q, K, V) = softmax(Q · K^T / √d_k) · V    (2)
  • for the intent attention vector, Q in the above formula (2) represents the sentence vector h 0 in the encoding vector sequence input to the attention layer 1213, and K and V represent the word vectors h 1 , h 2 , ..., h t in the encoding vector sequence input to the attention layer 1213 at each time step t; the attention vector obtained by the above formula (2) can quantify the degree of correlation between each word vector and the sentence vector.
  • since the semantic information represented by the sentence vector h 0 contains all possible intent label information, the sentence vector h 0 is combined with the attention vector calculated by the above formula (2) to obtain the intent attention vector C I ; the obtained intent attention vector C I is used to quantify the degree of correlation between the word corresponding to each word vector and the intent expressed by the sentence corresponding to the sentence vector.
  • for the slot attention vector, Q in the above formula (2) represents the hidden state vector C output by the attention layer 1213 at the previous moment (time t-1), and K and V represent the encoding vector sequence {h 0 , h 1 , h 2 , ..., h t } input to the attention layer 1213; the attention vector obtained by the above formula (2) can be combined with the hidden state vector at the previous moment to learn the degree of correlation of the word vector processed at the current moment t.
  • the hidden state vector C output at time t-1 is combined with the attention vector calculated by the above formula (2) to obtain the slot attention vector; the resulting slot attention vector is used to quantify the degree of correlation between the word corresponding to each word vector and the slot expressed by the sentence corresponding to the sentence vector.
  • the attention layer 1213 can be obtained by training a Long Short Term Memory (LSTM) model together with an attention mechanism; for the specific training process, please refer to the detailed description below, which will not be repeated here.
  • in other embodiments, other neural network models that have the same function as the LSTM model, as well as other mechanisms that are used to learn the degree of correlation between the words in a sentence and the intent or slot expressed by the sentence, can also be used, which is not limited here.
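The attention computation of formula (2) can be sketched in plain Python as follows (toy vectors; using the word vectors as both keys and values is an assumption of this sketch):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """softmax(Q . K^T / sqrt(d_k)) . V for a single query vector."""
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
              for key in keys]
    weights = softmax(scores)
    context = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
    return context, weights

# Intent attention: the sentence vector h0 attends over the word vectors,
# so the weights quantify each word's correlation with the sentence intent.
h0 = [1.0, 0.0]
word_vectors = [[1.0, 0.0], [0.0, 1.0]]
context, weights = attention(h0, word_vectors, word_vectors)
```

The slot attention vector follows the same computation with the previous hidden state as the query, as described above.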
  • the slot filling layer 1214 is used to predict candidate slots in the corpus data and fill in the slot values; the slot filling layer 1214 can predict multiple slot labels in the corpus data and retain the slot labels that meet the conditions as candidate slot outputs.
  • the slot filling layer 1214 takes as input the encoding vector h t output by the BERT coding layer 1211, the hidden state vector C output by the attention layer 1213 at time t-1 (that is, the semantic information of the sentence before the currently processed word or the word meaning information of that word), and the intent attention vector C I and the slot attention vector output by the attention layer 1213 at time t, and outputs the candidate slot at time t.
  • the slot filling layer 1214 predicts possible slot labels based on the four input vectors at each time step t, and calculates the slot position reliability for the predicted slot label to determine whether the slot label satisfies the output condition.
  • the slot filling layer 1214 obtains the possible slot labels of the corpus data to be parsed based on the encoding vector including the word vector (containing the word meaning information of each word in the corpus data to be parsed), the semantic information of the sentence before the currently processed word or the word meaning information of that word, the degree of correlation between the currently processed word and the intent expressed by the sentence, and the degree of correlation between the currently processed word and the slot expressed by the sentence; it calculates the slot position reliability of each slot label, and then outputs as candidate slots the slot labels that satisfy the condition, that is, those whose degree of correlation with the slot actually expressed by the corpus data to be parsed exceeds the threshold.
  • the slot position reliability represents the closeness of the predicted slot label to the actual slot expressed by the corpus data, and may also be referred to as slot reliability.
  • the slot label with higher slot position reliability is closer to the real slot expressed by the corpus data.
  • a certain threshold can be set for the slot position reliability; for example, if the threshold of the slot position reliability is set to 0.5, a slot label whose slot position reliability is greater than or equal to the threshold satisfies the output condition and will be output as a candidate slot, while a slot label whose slot position reliability is less than the threshold does not meet the output condition and will be deleted rather than output from the slot filling layer 1214.
  • for example, the threshold set for the slot position reliability in the slot filling layer 1214 is 0.5; among the slot labels predicted by the slot filling layer 1214 for the 5 words "please, for, me, play, play", the slot position reliability of slot O (for example, 0.7) is greater than or equal to 0.5, while the slot position reliability of the other slots (such as songName) (for example, 0.3) is less than 0.5; therefore, the candidate slots output for the five words "please, for, me, play, play" are all O slots.
  • among the slot labels predicted by the slot filling layer 1214 for the five words "you, good, old, time, light", the slot position reliabilities of songName, videoName, and mediaName (for example, 0.86, 0.7, and 0.55) are greater than or equal to 0.5, while the slot position reliability of slot O (for example, 0.3) is less than 0.5; therefore, the candidate slots output for "you" are songName-B, videoName-B, mediaName-B, and the candidate slots output for "good, old, time, light" are songName-I, videoName-I, mediaName-I, where B marks the word at the starting position of the name (meaning that "you" is the first word in the name) and I marks the words after the start of the name; since the O slot represents an empty or unimportant slot, the slot filling layer 1214 finally outputs three candidate slots songName, videoName, and mediaName, and fills each candidate slot with the slot value "hello old times".
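The B/I/O tagging scheme described above can be decoded with a small helper like the following (a sketch; the `label-B` / `label-I` suffix convention follows the example in the text):

```python
def decode_bio(tokens, tags):
    """Collect slot values from per-token tags such as songName-B / songName-I;
    O marks an empty or unimportant slot and is skipped."""
    slots, label, parts = {}, None, []
    for token, tag in zip(tokens, tags):
        if tag.endswith("-B"):
            if label:
                slots[label] = "".join(parts)
            label, parts = tag[:-2], [token]   # start a new slot value
        elif tag.endswith("-I") and label:
            parts.append(token)                # continue the current value
        else:  # "O" or a stray continuation tag
            if label:
                slots[label] = "".join(parts)
            label, parts = None, []
    if label:
        slots[label] = "".join(parts)
    return slots
```

A real slot filling layer would emit one tag per candidate slot label in parallel; this sketch shows the decoding for a single tag sequence.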
  • for another example, the threshold set for the slot position reliability in the slot filling layer 1214 is 0.5.
  • among the slot labels predicted by the slot filling layer 1214 for the two characters composing "Shanghai", the slot position reliability (for example, 0.7) of the slot label "departure" is greater than or equal to 0.5; therefore, the candidate slots output for the two characters composing "Shanghai" are both "departure".
  • among the slot labels predicted by the slot filling layer 1214 for the two characters composing "Beijing", the slot position reliability (for example, 0.8) of the slot label "destination" is greater than or equal to 0.5; therefore, the candidate slots output for the two characters composing "Beijing" are both "destination".
  • among the slot labels predicted by the slot filling layer 1214 for the five characters composing "Beijing Railway Station", the slot position reliability (for example, 0.75) of the slot label "location" is greater than or equal to 0.5; therefore, the candidate slots output for these five characters are all "location".
  • among the slot labels predicted by the slot filling layer 1214 for the three characters composing "five-star", the slot position reliability (for example, 0.75) of the slot label "star rating" is greater than or equal to 0.5; therefore, the candidate slots output for these three characters are all "star rating".
  • the slot filling layer 1214 finally outputs 4 candidate slots: departure, destination, location, and star rating; the slot value filled into the slot (departure) is (Shanghai), the slot value filled into the slot (destination) is (Beijing), the slot value filled into the slot (location) is (Beijing Railway Station), and the slot value filled into the slot (star rating) is (five-star).
  • the sentence vector h 0 is input as the initial value; since the semantic information represented by the intent attention vector and the sentence vector includes all possible intent labels, the slot filling layer 1214 predicts possible slot labels based on the possible intent labels, so that the slot label is associated with the intent label; therefore, the accuracy of slot prediction is greatly improved, and the speed and efficiency of slot prediction are also improved accordingly.
  • the working process of the slot filling layer 1214 is shown in FIG. 3:
  • the slot filling layer 1214 uses the coding vector h t output by the BERT coding layer 1211 at time t, the intent attention vector C I output by the attention layer 1213 at time t, the slot attention vector output by the attention layer 1213 at time t, and the hidden state vector C output by the attention layer 1213 at time t-1 as input.
  • the slot filling layer 1214 first models the relationship between the intent and the slot based on the slot gate mechanism and obtains the fusion vector g S of the intent attention vector C I and the slot attention vector; it then predicts the slot label corresponding to each time step t and calculates the slot position reliability of each slot label.
  • the fusion vector g S is calculated based on the slot gate mechanism as follows:
  • g S = Σ v · tanh(C S + W · C I)    (3)
  • where v represents the random weight coefficient of the hyperbolic tangent function tanh(x) in the above formula (3), W represents the random weight coefficient of the intent attention vector C I , and C S denotes the slot attention vector; W greater than 1 means that the influence of the intent attention vector C I on slot prediction is greater than that of the slot attention vector, W less than 1 means that its influence is smaller than that of the slot attention vector, and W equal to 1 means that the intent attention vector C I and the slot attention vector have the same degree of influence on slot prediction.
  • the slot filling layer 1214 can obtain a slot vector representing the slot label information based on the above four input vectors, and then calculate the slot position reliability of the corresponding slot label based on the slot vector; the calculation formula of the slot position reliability obtained after passing through the Sigmoid activation function is as follows:
  • y S = Sigmoid(W S · h S + b S)    (4)
  • where the superscript S denotes the slot, W S is the random weight coefficient of the slot vector h S , and b S represents the bias value.
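Formulas (3) and (4) can be sketched together as follows (toy scalar values; the variable names mirror the description above and are otherwise assumptions of this sketch):

```python
import math

def slot_gate(c_slot, c_intent, v, W):
    """Fuse slot and intent attention, in the spirit of formula (3):
    g = sum(v * tanh(c_slot + W * c_intent))."""
    return sum(vi * math.tanh(cs + W * ci)
               for vi, cs, ci in zip(v, c_slot, c_intent))

def slot_confidence(slot_vec, w_s, b_s):
    """Dense layer + Sigmoid over the slot vector, as in formula (4)."""
    z = sum(wi * si for wi, si in zip(w_s, slot_vec)) + b_s
    return 1.0 / (1.0 + math.exp(-z))
```

Raising W above 1 in `slot_gate` lets the intent context dominate the fusion, matching the description of W's role above.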
  • for example, the slot filling layer 1214 takes as input the encoding vector h 3 (corresponding to: me), the hidden state vector C output by the attention layer 1213 at time t-1, and the intent attention vector C I (corresponding to: please play hello old times for me); here the hidden state vector C includes the word meaning information passed by the preceding word vectors and also the semantic information passed by the sentence vector (corresponding to: please play hello old times for me).
  • for another example, the slot filling layer 1214 takes as input the encoding vector h 6 (corresponding to: you), the hidden state vector C output by the attention layer 1213 at time t-1 (corresponding to: play), the intent attention vector C I (corresponding to: please play hello old times for me, play, play) and the slot attention vector (corresponding to: playing, you); here the hidden state vector C (corresponding to: play) includes the word meaning information passed by the previous word vector (corresponding to: playing), and so on, back to the word vector (corresponding to: please), which also includes the semantic information passed by the sentence vector (corresponding to: please play hello old times for me).
  • if, for the word "you", the slot position reliability of the slot label songName is 0.86, that of videoName is 0.7, that of mediaName is 0.55, and that of slot O is 0.2, then the finally predicted slots for "you" are songName, videoName, and mediaName, which serve as the output of the slot filling layer 1214.
  • the slot filling layer 1214 can be obtained by training based on the slot-gate mechanism, the LSTM model and the Sigmoid activation function.
  • the slot gate mechanism focuses on learning the relationship between the intent attention vector and the slot attention vector, and obtains a better semantic frame through global optimization.
  • the slot gate mechanism mainly uses the intent context vector to model the relationship between intent and slot to improve slot filling performance.
  • in other embodiments, other deep neural network models with the same function as the LSTM model can be used as the decoder, and other functions with the same function as the Sigmoid function can also be used as the activation function of the corresponding deep neural network decoder, which is not limited here.
  • the post-processing layer 1215 is used to sort out the correspondence between candidate intents and candidate slots.
  • the result obtained after the candidate intent corresponds to the candidate slot is output from the post-processing layer 1215 as the semantic parsing result.
  • for example, after the candidate intents (PLAY_MUSIC, PLAY_VIDEO, PLAY_VOICE) output by the intent classification layer 1212 and the candidate slots (songName, videoName, mediaName) output by the slot filling layer 1214 are input to the post-processing layer 1215, the semantic parsing result output after inference and prediction based on the intent-slot mapping table in the post-processing layer 1215 is:
  • the candidate intents PLAY_MUSIC, PLAY_VIDEO, and PLAY_VOICE are the intents identified by parsing the corpus data, the candidate slots songName, videoName, and mediaName are the slots obtained by parsing the corpus data, and "hello old times" is the filled slot value.
  • for another example, after the candidate intents (booking a ticket, booking a hotel) output by the intent classification layer 1212 and the candidate slots (departure, destination, location, star rating) output by the slot filling layer 1214 are input to the post-processing layer 1215, the semantic parsing result output after inference and prediction based on the intent-slot mapping table in the post-processing layer 1215 is:
  • the candidate intents (booking a ticket, booking a hotel) are the intents identified by parsing the corpus data, the candidate slots (departure, destination, location, star rating) are the slots obtained by parsing the corpus data, and (Shanghai), (Beijing), (Beijing Railway Station), and (five-star) are the slot values filled in for the corresponding slots (departure, destination, location, star rating).
  • the working process of the post-processing layer 1215 is shown in FIG. 3:
  • the post-processing layer 1215 takes the candidate intents obtained by the above-mentioned intent classification layer 1212 and the candidate slots obtained by the slot filling layer 1214 as input, and sorts out the correspondence between candidate intents and candidate slots based on the intent-slot mapping table obtained during the pre-training process of the semantic parsing model 121.
  • the intent slot mapping table obtained based on the pre-training process of the semantic parsing model 121 is described in detail below, and details are not repeated here.
  • the intent-slot mapping table is based on the result of sorting candidate intents and candidate slots obtained by training on a large number of samples; therefore, in the process of performing the semantic parsing task, the intent-slot mapping table can be continuously updated based on more corpus data from practical applications.
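The intent-slot mapping table can be pictured as a simple dictionary lookup (the table contents here are hypothetical, patterned on the PLAY_MUSIC example above):

```python
# Hypothetical intent-slot mapping table: each intent lists the slots it accepts.
INTENT_SLOT_MAP = {
    "PLAY_MUSIC": {"songName", "mediaName"},
    "PLAY_VIDEO": {"videoName", "mediaName"},
    "PLAY_VOICE": {"mediaName"},
}

def match_intents_to_slots(candidate_intents, candidate_slots, mapping):
    """Pair each candidate intent with the candidate slots the table allows."""
    return {intent: sorted(set(candidate_slots) & mapping.get(intent, set()))
            for intent in candidate_intents}
```

Updating the table over time then amounts to editing the per-intent slot sets as more corpus data is observed.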
  • the above BERT encoding layer 1211 , intent classification layer 1212 , attention layer 1213 , slot filling layer 1214 and post-processing layer 1215 together constitute the semantic parsing model 121 .
  • each layer in the structure of the semantic parsing model 121 needs to be pre-trained with a large amount of sample corpus data so that it has the corresponding function of each layer described above.
  • the semantic parsing model 121 is pre-trained by the server 200; afterwards, the trained semantic parsing model 121 can either be transplanted to the mobile phone 100 to perform the semantic parsing task directly, or continue to reside in the server 200 to execute semantic parsing tasks requested by the mobile phone 100.
  • the pre-training process of the semantic parsing model 121 will be described in detail below.
  • the pre-training process of the semantic parsing model 121 includes:
  • the server 200 collects sample corpus data for training the semantic parsing model 121 .
  • the collected sample corpus should cover as many fields as possible and as many verbs, proper nouns, common nouns, etc. as possible, so that the generalization performance of the trained semantic parsing model 121 will be better.
  • sample corpus data used for training the semantic parsing model 121 needs to be input into the layers of the semantic parsing model 121 for training in batches.
  • concepts related to sample data are introduced below.
  • (a) batch: the loss function required for each parameter update in deep learning is not obtained from a single data-label pair {data: label}, but is weighted over a set of data; the number of items in this set is the batchsize.
  • (b) batchsize: the batch size, that is, the number of samples in a batch; each training step takes batchsize samples from the training set for training.
  • (c) iteration: the number of iterations is the number of batches needed to complete one epoch; 1 iteration is equal to training once with batchsize samples, so within one epoch the number of batches and the number of iterations are equal.
  • (d) epoch: when the complete dataset passes through the neural network once and returns once, the process is called an epoch; that is, 1 epoch is equivalent to training once with all the samples in the training set.
  • for example, training the entire sample set requires 100 iterations and 1 epoch.
  • for another example, for a dataset with 2000 training samples, dividing the 2000 samples into batches of size 500 means that it takes 4 iterations to complete an epoch.
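The relationship between batchsize, iteration, and epoch reduces to a bit of arithmetic:

```python
def iterations_per_epoch(num_samples, batch_size):
    """Number of batches (= iterations) needed for all samples to pass
    through the network once, i.e. one epoch."""
    return -(-num_samples // batch_size)  # ceiling division

# The example from the text: 2000 samples split into batches of 500
# takes 4 iterations to complete one epoch.
```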
  • the server 200 performs data preprocessing on the sample corpus data to be input into the training of the semantic parsing model 121 through the NLP module.
  • for the data preprocessing of the sample corpus data, please refer to the relevant description of data preprocessing in the BERT coding layer 1211 above, which will not be repeated here; after data preprocessing, a Token sequence, a segment tag, and the mask corresponding to the Token sequence are obtained for each piece of sample corpus data.
  • in one epoch of training, the server 200 respectively inputs the Token sequence, the segment tag, and the mask corresponding to the Token sequence obtained by data preprocessing for each sample corpus into the BERT coding layer 1211 of the semantic parsing model 121 for training, so that it can output an encoding vector sequence as described for the BERT encoding layer 1211 above.
  • the BERT coding layer 1211 is obtained based on the training of the BERT model; during the training process, it is necessary to continuously fine-tune the upstream and downstream parameters of the semantic parsing model 121, so that the BERT coding layer can output the above encoding vector sequence {h 0 , h 1 , h 2 , ..., h t }.
  • in one epoch of training, the server 200 respectively inputs the sentence vector h 0 output by the BERT encoding layer 1211 in the above process 503 into the intent classification layer 1212 of the semantic parsing model 121 for training, so that it can output the candidate intents as described for the intent classification layer 1212 above, which will not be repeated here.
  • the intent classification layer 1212 is obtained by training based on the fully connected layer and the Sigmoid function as the activation function; during the training process, it is necessary to continuously fine-tune the upstream and downstream parameters of the semantic parsing model 121 so that, after learning from the sample corpus data for a long enough time or over a large enough number of samples, the intent classification layer 1212 can extract all possible intent labels and the intent confidence corresponding to each intent label, and then extract multiple intent labels that meet the output conditions as candidate intents to be output from the intent classification layer 1212; for details, please refer to the above formula (1) and the related description, which will not be repeated here.
  • the candidate intents output by the intent classification layer 1212 are input to the post-processing layer 1215.
  • in one epoch of training, the server 200 respectively inputs the encoding vector sequence {h 0 , h 1 , h 2 , ..., h t } output by the BERT coding layer 1211 trained in the above process 503 into the attention layer 1213 of the semantic parsing model 121 for training, so that it can output the intent attention vector C I and the slot attention vector as described for the attention layer 1213 above, which will not be repeated here.
  • the attention layer 1213 is obtained by training based on the attention mechanism and the LSTM model; during the training process, it is necessary to continuously fine-tune the upstream and downstream parameters of the semantic parsing model 121, so that the attention layer 1213 can quantify the degree of correlation between the word corresponding to each word vector and the expressed intent, quantify the degree of correlation between the word corresponding to each word vector and the expressed slot, and finally output the intent attention vector and the slot attention vector.
  • the LSTM model is a special RNN model proposed to solve the gradient vanishing problem of the RNN model; its core is the cell state, which can be understood as a conveyor belt: it is the memory space of the entire model and changes over time.
  • the working principle of the LSTM model can be briefly described as: (1) forget gate: choose to forget some past information; (2) input gate: choose to remember some current information; (3) merge the past and present memory; (4) output gate: choose to output some information.
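The four steps above can be sketched as a single LSTM time step. This is a generic textbook formulation with made-up parameter names, not the patent's actual decoder:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step; the cell state c is the 'conveyor belt'
    that carries memory through time."""
    z = np.concatenate([h_prev, x])
    f = sigmoid(p["Wf"] @ z + p["bf"])        # (1) forget gate: drop past info
    i = sigmoid(p["Wi"] @ z + p["bi"])        # (2) input gate: admit new info
    c_cand = np.tanh(p["Wc"] @ z + p["bc"])   #     candidate memory
    c = f * c_prev + i * c_cand               # (3) merge past and present memory
    o = sigmoid(p["Wo"] @ z + p["bo"])        # (4) output gate: expose part of c
    h = o * np.tanh(c)
    return h, c

# Tiny random parameters: hidden size 3, input size 2.
rng = np.random.default_rng(0)
p = {k: rng.standard_normal((3, 5)) for k in ("Wf", "Wi", "Wc", "Wo")}
p.update({k: np.zeros(3) for k in ("bf", "bi", "bc", "bo")})
h, c = lstm_step(np.ones(2), np.zeros(3), np.zeros(3), p)
print(h.shape, c.shape)
```

Because the cell state is updated additively (`f * c_prev + i * c_cand`) rather than through repeated matrix multiplication, gradients flow along it more easily, which is what mitigates the vanishing-gradient problem mentioned above.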
  • the attention mechanism imitates the internal process of biological observation behavior, that is, a mechanism that aligns internal experience with external sensation to increase the fineness of observation in certain regions; it can use limited attention resources to quickly screen out high-value information from a large amount of information.
  • the attention mechanism can quickly extract the important features of sparse data.
  • the essential idea of the attention mechanism can be written as the following formula:

    Attention(Query, Source) = Σ_{i=1}^{Lx} Similarity(Query, Key_i) · Value_i

    where Lx represents the length of Source.
  • the meaning of the formula is: imagine that the constituent elements of Source are composed of a series of <Key, Value> data pairs.
  • given an element Query in the target Target, the similarity between the Query and each Key is computed to obtain the weight coefficient of the Value corresponding to that Key, and the weighted sum of the Values then gives the final Attention value.
  • in essence, the Attention mechanism performs a weighted sum of the Value elements in Source, with Query and Key used to calculate the weight coefficient of the corresponding Value.
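The weighted-sum formulation above can be sketched as follows. Dot-product similarity and softmax normalization are assumptions for the illustration, since the text does not fix the similarity function:

```python
import numpy as np

def attention(query, keys, values):
    """Attention(Query, Source) = sum_i weight_i * Value_i, where the
    weights come from the similarity between Query and each Key."""
    scores = keys @ query                     # similarity of Query to each Key
    weights = np.exp(scores - scores.max())   # softmax normalization into...
    weights = weights / weights.sum()         # ...the weight coefficients
    return weights @ values                   # weighted sum of the Values

keys = np.array([[1.0, 0.0], [0.0, 1.0]])
values = np.array([[10.0, 0.0], [0.0, 10.0]])
query = np.array([5.0, 0.0])                  # much closer to the first key
print(attention(query, keys, values))         # heavily weights the first value
```

When the query resembles one key far more than the others, its weight approaches 1 and the output approaches that key's value — the "screening out high-value information" behavior described above.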
  • the server 200 inputs the encoding vector ht output at time t by the BERT encoding layer 1211 trained in the above process 503, the intent attention vector and slot attention vector output at time t by the attention layer 1213 trained in the above process 505, and the hidden state vector C output at time t-1 by the LSTM model in the attention layer 1213 (that is, the semantic information of the sentence before the currently processed word, or the meaning of that word) into the slot filling layer 1214 of the semantic parsing model 121 for training, so that it can output candidate slots as described for the slot filling layer 1214 above, which will not be repeated here.
  • the slot filling layer 1214 is trained based on the slot gate mechanism, an LSTM model as the decoder, and the Sigmoid function as the activation function. During training, the upstream and downstream parameters of the semantic parsing model 121 are continuously fine-tuned, so that after learning from the data for a sufficiently long time, or over a sufficiently large number of samples, the slot filling layer 1214 can predict, for the possible intent labels, all possible slot labels and the slot confidence corresponding to each slot label, and then extract the multiple candidate slots that meet the output conditions.
  • for the output of the slot filling layer 1214, refer to the above formulas (3) to (4) and the related descriptions, which will not be repeated here.
  • the candidate slots output by the slot filling layer 1214 are input to the post-processing layer 1215 .
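The slot gate idea can be sketched as below. This follows the commonly published slot-gate formulation — a scalar gate that lets the intent context modulate slot prediction — and all names and shapes are assumptions, not the patent's exact layers:

```python
import numpy as np

def slot_gate(c_slot, c_intent, W, v):
    """Scalar gate g = sum(v * tanh(c_slot + W @ c_intent)): how much
    the intent context should influence the slot decision."""
    return float(np.sum(v * np.tanh(c_slot + W @ c_intent)))

def slot_label_logits(h_t, c_slot, g, W_out):
    """Fuse the hidden state with the gated slot context before
    scoring the slot labels at time t."""
    return W_out @ (h_t + c_slot * g)

rng = np.random.default_rng(1)
d, n_labels = 4, 5
g = slot_gate(rng.standard_normal(d), rng.standard_normal(d),
              rng.standard_normal((d, d)), rng.standard_normal(d))
logits = slot_label_logits(rng.standard_normal(d), rng.standard_normal(d),
                           g, rng.standard_normal((n_labels, d)))
print(logits.shape)
```

The gate ties slot prediction to the recognized intents: when the intent and slot contexts agree, the slot context contributes strongly to the per-time-step slot logits, which is how multi-intent information can sharpen slot filling.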
  • the server 200 determines whether the training results of the above-mentioned processes 501-506 satisfy the training termination condition. If the training result satisfies the training termination condition, go to 508 ; if the training result does not satisfy the training termination condition, go to 509 .
  • an Early Stopping mechanism may be used to decide when to terminate model training: when the number of training epochs reaches an epoch threshold, or the number of epochs since the last optimal model exceeds a set interval threshold, the training result satisfies the training termination condition; otherwise, it does not.
  • the early stopping mechanism gives the trained neural network model good generalization performance, that is, it can fit the data well. Its basic idea is to monitor the model's performance on the validation set during training, and to stop training when that performance starts to drop, avoiding the overfitting that continued training would cause.
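A minimal sketch of the early-stopping rule described above — a hard epoch budget plus a patience interval counted from the last optimal model; the thresholds here are illustrative:

```python
class EarlyStopping:
    """Stop when the epoch budget is reached, or when no new best
    validation score has appeared for `patience` epochs."""

    def __init__(self, patience=3, max_epochs=50):
        self.patience = patience
        self.max_epochs = max_epochs
        self.best_loss = float("inf")
        self.best_epoch = 0

    def should_stop(self, epoch, val_loss):
        if val_loss < self.best_loss:          # new optimal model found
            self.best_loss, self.best_epoch = val_loss, epoch
        return (epoch >= self.max_epochs
                or epoch - self.best_epoch >= self.patience)

stopper = EarlyStopping(patience=3)
losses = [1.0, 0.8, 0.9, 0.95, 0.99]           # validation loss per epoch
flags = [stopper.should_stop(e, l) for e, l in enumerate(losses, start=1)]
print(flags)   # [False, False, False, False, True]
```

The best model here is the one from epoch 2; training stops at epoch 5, three epochs after the last improvement, matching the interval-threshold condition above.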
  • the server 200 terminates the training of the BERT encoding layer 1211, the intent classification layer 1212, the attention layer 1213, and the slot filling layer 1214 in the semantic parsing model 121, and then inputs the large number of candidate intents and candidate slots obtained into the post-processing layer 1215 in the semantic parsing model 121 to sort out their relationships, for example, sorting candidate slots by candidate intent to obtain an intent-slot mapping table.
  • the semantic parsing model training ends.
  • after the training of the above processes 502 to 506, a candidate intent and a candidate slot are obtained for each piece of sample corpus data, so the candidate intents and candidate slots input to the post-processing layer 1215 are also sufficient in number.
  • the post-processing layer 1215 is trained on a sufficient number of candidate intents and candidate slots, so that it can sort the candidate slots by candidate intent and output an ordered correspondence between intents and slots, for example, an intent-slot mapping table. Based on the intent-slot mapping table, the post-processing layer 1215 can accurately and quickly find the correspondence between any candidate intent and candidate slot given as input.
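How such an intent-slot mapping table might be assembled can be sketched as follows. This is a deliberate simplification — each sample's candidate slots are simply attributed to its candidate intents — and the intent and slot names are made up for illustration:

```python
from collections import defaultdict

def build_intent_slot_table(samples):
    """samples: iterable of (candidate_intents, candidate_slots) pairs,
    one per sample corpus datum; returns intent -> sorted slot names."""
    table = defaultdict(set)
    for intents, slots in samples:
        for intent in intents:
            table[intent].update(slots)
    return {intent: sorted(slots) for intent, slots in table.items()}

table = build_intent_slot_table([
    (["book_train_ticket"], ["departure", "destination"]),
    (["book_hotel"], ["location", "star_rating"]),
    (["book_train_ticket"], ["departure_time"]),
])
print(table["book_train_ticket"])
# ['departure', 'departure_time', 'destination']
```

At inference time, a lookup in this table lets the post-processing step quickly attach each candidate slot to a compatible candidate intent instead of re-deriving the pairing from scratch.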
  • the server 200 continues to input the sample corpus data of the next epoch and repeats the processes 502 to 507 to continue training the semantic parsing model 121 .
  • the objective loss function adopted for the joint optimization of intent and slot is the sum of the intent classification loss function, the slot filling loss function, and a regularization term on the weights:

    L = L_y(y, f(x)) + L_c(y, f(x)) + (λ/2m) · Σ_l Σ_k Σ_j (W_{j,k}^{(l)})²

  • the intent classification loss function L_y(y, f(x)) adopts the multi-label Sigmoid cross-entropy loss (Cross Entropy Loss) function, calculated according to the above formula 6.
  • the slot filling loss function L_c(y, f(x)) adopts the serialized multi-label Sigmoid Cross Entropy Loss function, likewise calculated according to the above formula 6.
  • λ is the hyperparameter, and m is the number of data items in a batch.
  • the reason for dividing by 2 is so that it cancels out during differentiation; Σ_k Σ_j (W_{j,k}^{(l)})² represents the sum of the squared W parameters of the l-th layer; W^{(l)} is a matrix, and k and j index its rows and columns.
  • the joint optimization function mainly serves to jointly optimize the intent classification loss and the slot filling loss generated during the matrix transformations in the neural network.
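The composition of the loss — two multi-label sigmoid cross-entropies plus an L2 weight penalty — can be sketched as below. The numerically stable BCE form and the particular argument layout are assumptions consistent with the description above, not the patent's exact formula 6:

```python
import numpy as np

def sigmoid_cross_entropy(logits, targets):
    """Multi-label sigmoid cross-entropy in the numerically stable
    form max(z, 0) - z*y + log(1 + exp(-|z|)), summed over labels."""
    z = np.asarray(logits, dtype=float)
    y = np.asarray(targets, dtype=float)
    return float(np.sum(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))))

def joint_loss(intent_logits, intent_y, slot_logits_seq, slot_y_seq,
               weight_matrices, lam, m):
    """L = L_y + L_c + (lam / (2*m)) * sum of squared weights."""
    l_intent = sigmoid_cross_entropy(intent_logits, intent_y)
    l_slot = sum(sigmoid_cross_entropy(z, y)                 # serialized: one
                 for z, y in zip(slot_logits_seq, slot_y_seq))  # term per step
    l2 = (lam / (2 * m)) * sum(float(np.sum(W ** 2)) for W in weight_matrices)
    return l_intent + l_slot + l2

loss = joint_loss(np.zeros(3), np.zeros(3),
                  [np.zeros(2)], [np.zeros(2)],
                  [np.ones((2, 2))], lam=0.1, m=4)
print(loss)   # 5 labels at log(2) each, plus 0.05 of L2 penalty
```

Dividing the regularizer by 2 cancels against the exponent when differentiating W², so the gradient of the penalty is simply (λ/m)·W.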
  • the semantic parsing model 121 trained by the server 200 can parse the corpus data to be parsed into candidate intents and candidate slots that are closer to the real intent and the real slot.
  • the trained semantic parsing model 121 can either be transplanted to the mobile phone 100 to perform the semantic parsing task directly, or remain on the server 200 to perform the semantic parsing tasks requested by the mobile phone 100. Specifically, as shown in FIG. 6, the user enters a voice command by waking up the voice assistant of the mobile phone 100, and the mobile phone 100, through its internal human-machine dialogue system 110 based on the above semantic parsing model 121, extracts one or more intents corresponding to the user's voice command, together with the corresponding slot information.
  • the mobile phone 100 further performs corresponding operations based on the identified intent and the slot, for example, opening an application software, or performing a web page search.
  • for the specific interaction process between the user and the mobile phone 100 onto which the semantic parsing model 121 has been transplanted, refer to the following examples:
  • the mobile phone 100 obtains the user's voice instruction.
  • a voice assistant is installed in the mobile phone 100 , and the user can send a voice command to the mobile phone 100 by waking up the voice assistant of the mobile phone 100 .
  • the mobile phone 100 acquires the user's voice instruction "help me book a train ticket from Shanghai to Beijing and book a five-star hotel near Beijing Railway Station".
  • the speech recognition module 111 in the man-machine dialogue system 110 of the mobile phone 100 recognizes and converts the acquired user speech instruction into corpus data in the form of text. For example, converting the above voice command into textual corpus data "help me book a train ticket from Shanghai to Beijing and book a five-star hotel near Beijing Railway Station".
  • the semantic parsing module 112 in the human-machine dialogue system 110 of the mobile phone 100 is configured to perform semantic parsing on the corpus data to obtain a semantic parsing result in which intents correspond to slots.
  • the semantic parsing module 112 preprocesses the corpus data to obtain a Token sequence, a sentence segmentation mark, and a mask created corresponding to the Token sequence.
  • the semantic parsing module 112 uses the Token sequence, the sentence segmentation mark, and the mask created corresponding to the Token sequence as the input of the semantic parsing model 121 to perform semantic parsing and extract multiple candidate intents and multiple candidate slots; finally, the semantic parsing model 121 sorts out the correspondence between the multiple candidate intents and the multiple candidate slots and outputs it as the semantic parsing result.
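The preprocessing step can be sketched as a toy tokenizer in BERT style — character-level tokens wrapped in [CLS]/[SEP] markers. The vocabulary and `max_len` here are made up for the illustration:

```python
def preprocess(text, vocab, max_len=16):
    """Build the Token id sequence, the sentence segmentation marks,
    and the mask (1 = real token, 0 = padding) for one sentence."""
    tokens = ["[CLS]"] + list(text)[: max_len - 2] + ["[SEP]"]
    ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    pad = max_len - len(ids)
    token_ids = ids + [0] * pad
    segment_ids = [0] * max_len          # single sentence -> one segment
    mask = [1] * len(ids) + [0] * pad
    return token_ids, segment_ids, mask

vocab = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "[UNK]": 3}
ids, segments, mask = preprocess("hi", vocab, max_len=6)
print(ids)    # [1, 3, 3, 2, 0, 0]
print(mask)   # [1, 1, 1, 1, 0, 0]
```

The mask tells the encoder which positions carry real tokens, so padding added to reach a fixed length does not influence the encoding vectors.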
  • a simple single-intent corpus can also be parsed by the semantic parsing model 121 to extract a single candidate intent and one or more corresponding candidate slots, which is not limited herein.
  • the semantic parsing result obtained by the semantic parsing module 112 of the human-machine dialogue system 110 through the semantic parsing model 121 is as follows:
  • the problem solving module 113 in the human-machine dialogue system 110 of the mobile phone 100 searches for a corresponding application or network resource based on the semantic analysis result obtained by the semantic analysis module 112 to obtain a solution to the intent and slot in the semantic analysis result.
  • the solution found by the problem solving module 113 is that the mobile phone 100 can open an installed booking service software application or travel software application to query train ticket information and hotel information for the user to select and book, or select a train ticket by default according to the user's historical usage records, enter the booking interface, and ask the user to confirm.
  • the mobile phone interface is shown in Figure 7.
  • if the intent-slot mapping result obtained by parsing the corpus data recognized from the instruction includes the user's three intents, the three slots corresponding to each intent, and the slot value filled into each slot, then the mobile phone 100 can, based on the user's usage habits, open the music player software to play the local song "Hello Old Times" by default, or open the audio/video player software to obtain music or video files about "Hello Old Times" for the user to choose to play.
  • the language generation module 114 in the man-machine dialogue system 110 of the mobile phone 100 generates a natural language sentence for the solution found by the problem solving module 113 , and feeds it back to the user through the display interface of the mobile phone 100 .
  • the solution found by the above problem solving module 113 is:
  • the mobile phone 100 can open the installed booking service software application or travel software application to query train ticket information and hotel information for the user to select and reserve, or select a train ticket by default according to the user's historical usage record to enter the reservation interface and ask the user to confirm.
  • the language generation module 114 can correspondingly generate the train number information of the train ticket or the introduction information of the hotel, and feed it back to the user through the display interface of the mobile phone 100 , as shown in FIG. 7 .
  • the user's voice command obtained by the mobile phone 100 is to query the weather for the next three days.
  • the solution found by the problem solving module 113 is to open the browser on the mobile phone 100, or to open the weather query software installed on the mobile phone 100, to search for the weather conditions for the next three days.
  • the language generation module 114 generates natural language texts from the searched weather conditions as follows:
  • the weather today is 28-32°C;
  • the dialogue management module 115 in the human-machine dialogue system 110 of the mobile phone 100 may schedule other modules based on the user's dialogue history to further improve the understanding of the user's voice command. For example, if the location is not clearly indicated in the user's voice command while the problem solving module 113 searches the weather, the dialogue management module 115 can, based on the user's dialogue history, schedule the problem solving module 113 to use Beijing, which the user frequently inquires about, as the search address, and feed the result back to the user.
  • the dialogue management module 115 can also, based on the location information of the mobile phone 100, dispatch the problem solving module 113 to search for the weather at the user's current location for the next three days, and further dispatch the language generation module 114 to generate the following natural language sentences:
  • the weather today is 28-32°C;
  • the dialogue management module 115 in the human-machine dialogue system 110 of the mobile phone 100 can flexibly schedule other modules in the human-machine dialogue system 110 to perform corresponding functions.
  • the speech synthesis module 116 in the man-machine dialogue system 110 of the mobile phone 100 further synthesizes and converts the natural language sentences generated by the language generation module 114 into speech, which is played back to the user through the mobile phone 100 .
  • the weather conditions generated by the language generation module 114 in the above process 605 are converted into voice and played to the user, so that the user can hear the weather conditions without looking at the mobile phone.
  • the trained semantic parsing model 121 may also continue to exist in the server 200 to perform the semantic parsing task requested from the mobile phone 100 .
  • the user inputs a voice command by waking up the voice assistant of the mobile phone 100; the mobile phone 100 converts the user's voice command into corpus data through its internal human-machine dialogue system 110, and then interacts with the server 200 to send the converted corpus data to the server 200 for semantic parsing.
  • the server 200 extracts multiple candidate intents and candidate slots corresponding to the intents in the user's voice instruction based on the semantic parsing model 121 . Further, the server 200 feeds back the extracted intent and the corresponding result of the slot to the mobile phone 100, and the mobile phone 100 further performs corresponding operations based on the identified intent and the slot, such as opening an application software or performing a web page search.
  • FIG. 8 shows a schematic structural diagram of a mobile phone 100 according to an embodiment of the present application.
  • the mobile phone 100 may include a processor 101, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
  • the structures illustrated in the embodiments of the present invention do not constitute a specific limitation on the mobile phone 100 .
  • the mobile phone 100 may include more or less components than shown, or some components are combined, or some components are separated, or different components are arranged.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the mobile phone 100 can obtain the user's voice command and feed back the response voice to the user through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor.
  • the mobile phone 100 obtains the user's voice command through the receiver 170B or the microphone 170C, and sends the obtained user's voice command to the human-machine dialogue system 110 for voice recognition and semantic analysis.
  • the corresponding solution is matched and executed through the mobile phone 100.
  • the corresponding operation is used to realize the solution corresponding to the semantic parsing result.
  • the man-machine dialogue system 110 can also generate a response voice from the solution corresponding to the semantic analysis result and feed back the response voice to the user through the speaker 170A of the mobile phone 100 or the earphone plugged in the earphone interface 170D.
  • the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 101 , or some functional modules of the audio module 170 may be provided in the processor 101 .
  • Speaker 170A, also referred to as a "horn", is used to convert audio electrical signals into sound signals.
  • the electronic device 100 can listen to music through the speaker 170A, or listen to a hands-free call.
  • the receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals.
  • when answering a call or listening to a voice message, the receiver 170B can be placed close to the human ear to hear the voice.
  • the microphone 170C, also called a "mike" or a "voice tube", is used to convert sound signals into electrical signals.
  • the user can input a sound signal into the microphone 170C by making a sound with the mouth close to the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
  • the processor 101 may include one or more processing units. For example, the processor 101 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated into one or more processors.
  • the processor 101 implements the function of the semantic parsing model 121 by running its program: the human-machine dialogue system 110 recognizes the user's voice command and converts it into text corpus data, which, after data preprocessing, is input into the semantic parsing model 121 run by the processor 101 for semantic parsing to obtain the semantic parsing result.
  • the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 101 for storing instructions and data.
  • the memory in processor 101 is a cache memory.
  • the memory may hold instructions or data that have just been used or recycled by the processor 101 . If the processor 101 needs to use the instruction or data again, it can be called directly from the memory. Repeated access is avoided, and the waiting time of the processor 101 is reduced, thereby improving the efficiency of the system.
  • the processor 101 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a general-purpose input/output (GPIO) interface, a SIM interface, and/or a USB interface, etc.
  • the interface connection relationship between the modules illustrated in the embodiment of the present invention is only a schematic illustration, and does not constitute a structural limitation of the mobile phone 100 .
  • the mobile phone 100 may also adopt different interface connection manners in the foregoing embodiments, or a combination of multiple interface connection manners.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module 140 may receive charging input from the wired charger through the USB interface 130.
  • the charging management module 140 may receive wireless charging input through the wireless charging coil of the mobile phone 100 . While the charging management module 140 charges the battery 142 , it can also supply power to the electronic device through the power management module 141 .
  • the power management module 141 is used to connect the battery 142 , the charging management module 140 and the processor 101 .
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 101, the internal memory 121, the display screen 194, the camera 193, and the wireless communication module 160.
  • the wireless communication function of the mobile phone 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in handset 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 can provide wireless communication solutions applied to the mobile phone 100, including 2G/3G/4G/5G.
  • the wireless communication module 160 can provide wireless communication solutions applied to the mobile phone 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and the like.
  • the antenna 1 of the mobile phone 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the mobile phone 100 can communicate with the network and other devices through wireless communication technology.
  • the mobile phone 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
  • Display screen 194 is used to display images, videos, and the like. Display screen 194 includes a display panel. In some embodiments, the handset 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • the SIM card interface 195 is used to connect a SIM card.
  • the present disclosure also relates to apparatuses for performing the operations herein.
  • This apparatus may be specially constructed for the required purposes or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer.
  • a computer program may be stored on a computer-readable medium, such as, but not limited to, any type of disk, including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memory (ROM), random access memory (RAM), EPROMs, EEPROMs, magnetic or optical cards, application-specific integrated circuits (ASICs), or any type of medium suitable for storing electronic instructions, each of which may be coupled to a computer system bus.
  • the computers referred to in the specification may include a single processor or may be architectures employing multiple processors for increased computing power.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to the technical field of human-machine dialogue, and in particular to an electronic device and a semantic parsing method therefor, as well as a medium and a human-machine dialogue system. The semantic parsing method of the invention comprises: obtaining corpus data to be parsed; calculating the intent correlation degree between a word contained in the corpus data and an intent represented by the corpus data, and the slot correlation degree between the word and a slot represented by the corpus data; and predicting the slot of the corpus data according to the semantic information of the word, the preceding semantic information of the word, and the intent correlation degree and slot correlation degree of the word. A plurality of intents close to the user's real intent are recognized from the user's speech, and slot information is then predicted by means of the plurality of recognized intents, which improves the accuracy of slot filling, correspondingly improves the speed or efficiency of slot filling, and further improves the accuracy of semantic parsing in human-machine dialogue.
PCT/CN2021/117251 2020-09-15 2021-09-08 Dispositif électronique et procédé d'analyse sémantique associé, support et système de dialogue homme-machine WO2022057712A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010970477.8A CN114186563A (zh) 2020-09-15 2020-09-15 电子设备及其语义解析方法、介质和人机对话系统
CN202010970477.8 2020-09-15

Publications (1)

Publication Number Publication Date
WO2022057712A1 true WO2022057712A1 (fr) 2022-03-24

Family

ID=80539263

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/117251 WO2022057712A1 (fr) 2020-09-15 2021-09-08 Dispositif électronique et procédé d'analyse sémantique associé, support et système de dialogue homme-machine

Country Status (2)

Country Link
CN (1) CN114186563A (fr)
WO (1) WO2022057712A1 (fr)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114818659A (zh) * 2022-06-29 2022-07-29 北京澜舟科技有限公司 一种文本情感来源分析方法、系统及存储介质
CN115358186A (zh) * 2022-08-31 2022-11-18 南京擎盾信息科技有限公司 一种槽位标签的生成方法、装置及存储介质
CN115934922A (zh) * 2023-03-09 2023-04-07 杭州心识宇宙科技有限公司 一种对话业务执行方法、装置、存储介质及电子设备
CN116050427A (zh) * 2022-12-30 2023-05-02 北京百度网讯科技有限公司 信息生成方法、训练方法、装置、电子设备以及存储介质
CN116227496A (zh) * 2023-05-06 2023-06-06 国网智能电网研究院有限公司 一种基于深度学习的电力舆情实体关系抽取方法及系统
CN116227629A (zh) * 2023-05-10 2023-06-06 荣耀终端有限公司 一种信息解析方法、模型训练方法、装置及电子设备
CN116959442A (zh) * 2023-07-29 2023-10-27 浙江阳宁科技有限公司 用于智能开关面板的芯片及其方法
CN117238277A (zh) * 2023-11-09 2023-12-15 北京水滴科技集团有限公司 意图识别方法、装置、存储介质及计算机设备

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115440200B (zh) * 2021-06-02 2024-03-12 上海擎感智能科技有限公司 车机系统的控制方法及控制系统
CN115292463B (zh) * 2022-08-08 2023-05-12 云南大学 一种基于信息抽取的联合多意图检测和重叠槽填充的方法
CN115934913B (zh) * 2022-12-23 2024-03-22 国义招标股份有限公司 基于深度学习数据生成的碳排放核算方法和系统
CN115906874A (zh) * 2023-03-08 2023-04-04 小米汽车科技有限公司 语义解析方法、系统、电子设备及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309276A (zh) * 2018-03-28 2019-10-08 蔚来汽车有限公司 电动汽车对话状态管理方法及系统
CN110309277A (zh) * 2018-03-28 2019-10-08 蔚来汽车有限公司 人机对话语义解析方法和系统
US20190385611A1 (en) * 2018-06-18 2019-12-19 Sas Institute Inc. System for determining user intent from text
CN110705267A (zh) * 2019-09-29 2020-01-17 百度在线网络技术(北京)有限公司 语义解析方法、装置及存储介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309276A (zh) * 2018-03-28 2019-10-08 蔚来汽车有限公司 电动汽车对话状态管理方法及系统
CN110309277A (zh) * 2018-03-28 2019-10-08 蔚来汽车有限公司 人机对话语义解析方法和系统
US20190385611A1 (en) * 2018-06-18 2019-12-19 Sas Institute Inc. System for determining user intent from text
CN110705267A (zh) * 2019-09-29 2020-01-17 百度在线网络技术(北京)有限公司 语义解析方法、装置及存储介质

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114818659B (zh) * 2022-06-29 2022-09-23 北京澜舟科技有限公司 Text sentiment source analysis method, system, and storage medium
CN114818659A (zh) * 2022-06-29 2022-07-29 北京澜舟科技有限公司 Text sentiment source analysis method, system, and storage medium
CN115358186B (zh) * 2022-08-31 2023-11-14 南京擎盾信息科技有限公司 Slot label generation method, apparatus, and storage medium
CN115358186A (zh) * 2022-08-31 2022-11-18 南京擎盾信息科技有限公司 Slot label generation method, apparatus, and storage medium
CN116050427A (zh) * 2022-12-30 2023-05-02 北京百度网讯科技有限公司 Information generation method, training method, apparatus, electronic device, and storage medium
CN116050427B (zh) * 2022-12-30 2023-10-27 北京百度网讯科技有限公司 Information generation method, training method, apparatus, electronic device, and storage medium
CN115934922A (zh) * 2023-03-09 2023-04-07 杭州心识宇宙科技有限公司 Dialogue service execution method, apparatus, storage medium, and electronic device
CN115934922B (zh) * 2023-03-09 2024-01-30 杭州心识宇宙科技有限公司 Dialogue service execution method, apparatus, storage medium, and electronic device
CN116227496A (zh) * 2023-05-06 2023-06-06 国网智能电网研究院有限公司 Deep-learning-based entity relation extraction method and system for electric power public opinion
CN116227496B (zh) * 2023-05-06 2023-07-14 国网智能电网研究院有限公司 Deep-learning-based entity relation extraction method and system for electric power public opinion
CN116227629B (zh) * 2023-05-10 2023-10-20 荣耀终端有限公司 Information parsing method, model training method, apparatus, and electronic device
CN116227629A (zh) * 2023-05-10 2023-06-06 荣耀终端有限公司 Information parsing method, model training method, apparatus, and electronic device
CN116959442A (zh) * 2023-07-29 2023-10-27 浙江阳宁科技有限公司 Chip for smart switch panels and method thereof
CN116959442B (zh) * 2023-07-29 2024-03-19 浙江阳宁科技有限公司 Chip for smart switch panels and method thereof
CN117238277A (zh) * 2023-11-09 2023-12-15 北京水滴科技集团有限公司 Intent recognition method, apparatus, storage medium, and computer device
CN117238277B (zh) * 2023-11-09 2024-01-19 北京水滴科技集团有限公司 Intent recognition method, apparatus, storage medium, and computer device

Also Published As

Publication number Publication date
CN114186563A (zh) 2022-03-15

Similar Documents

Publication Publication Date Title
WO2022057712A1 (fr) Electronic device and semantic parsing method thereof, medium, and human-machine dialogue system
CN111933129B (zh) Audio processing method, language model training method, apparatus, and computer device
WO2021072875A1 (fr) Intelligent dialogue generation method, device, computer apparatus, and computer storage medium
CN113205817B (zh) Speech semantic recognition method, system, device, and medium
CN110516253B (zh) Spoken Chinese semantic understanding method and system
JP2021021955A (ja) Voiceprint creation and registration method and apparatus
CN112100349A (zh) Multi-turn dialogue method, apparatus, electronic device, and storage medium
WO2021147041A1 (fr) Semantic parsing method and apparatus, device, and storage medium
CN108986790A (zh) Method and apparatus for recognizing contacts by speech
CN110047481A (zh) Method and apparatus for speech recognition
WO2021218028A1 (fr) Artificial-intelligence-based interview content refinement method, apparatus, device, and medium
CN112052333B (zh) Text classification method and apparatus, storage medium, and electronic device
CN111062217A (zh) Language information processing method, apparatus, storage medium, and electronic device
CN112101044B (zh) Intent recognition method, apparatus, and electronic device
CN116955699B (zh) Video cross-modal search model training method, search method, and apparatus
CN110866090A (zh) Method, apparatus, electronic device, and computer storage medium for voice interaction
CN112632244A (zh) Optimization method and apparatus for human-machine calls, computer device, and storage medium
US20190303393A1 (en) Search method and electronic device using the method
CN112669842A (zh) Human-machine dialogue control method, apparatus, computer device, and storage medium
WO2023272616A1 (fr) Text comprehension method and system, terminal device, and storage medium
CN110647613A (zh) Courseware construction method, apparatus, server, and storage medium
CN106971721A (zh) Regional-accent speech recognition system based on embedded mobile devices
CN113393841B (zh) Speech recognition model training method, apparatus, device, and storage medium
KR102297480B1 (ko) System and method for structured paraphrasing of unstructured question or request utterances
CN110809796B (zh) Speech recognition system and method with decoupled wake phrases

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21868537

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry into the European phase

Ref document number: 21868537

Country of ref document: EP

Kind code of ref document: A1