WO2021190259A1 - Slot recognition method and electronic device - Google Patents

Slot recognition method and electronic device (一种槽位识别方法及电子设备)

Info

Publication number
WO2021190259A1
WO2021190259A1 PCT/CN2021/078762
Authority
WO
WIPO (PCT)
Prior art keywords
vector
word segmentation
slot
word
sequence
Prior art date
Application number
PCT/CN2021/078762
Other languages
English (en)
French (fr)
Inventor
季冬
孟函可
祝官文
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2021190259A1 publication Critical patent/WO2021190259A1/zh

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/72Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72433User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for voice messaging, e.g. dictaphones
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • G10L2015/0631Creating reference templates; Clustering

Definitions

  • This application relates to the field of terminal technology, and in particular to a slot identification method and electronic equipment.
  • The user's input may be question information such as asking about the weather, booking air tickets, or a medical consultation.
  • the man-machine dialogue system can feed back response dialogue information to the user based on the user's questioning information.
  • For example, the question information input by the user can be "What's the weather like tomorrow in Beijing".
  • The man-machine dialogue system can then search a preset database and feed back to the user the response dialogue message "Tomorrow's weather in Beijing will be sunny to cloudy". It can be seen that, for a human-machine dialogue system, it is very important to accurately recognize the question information input by the user.
  • In the process of speech recognition, intent recognition and slot filling technology are the key to ensuring the accuracy of the speech recognition results.
  • Intent recognition can be abstracted as a classification problem, and the intent recognition model can then be trained with a classifier based on convolution and knowledge representation: in addition to word-embedding the user's spoken question, the model also introduces a semantic representation of knowledge to increase the generalization ability of the representation layer. In practical applications, however, the model suffers from slot filling deviation, which affects the accuracy of the intent recognition model.
  • The essence of slot filling is to formalize the sentence sequence into a labeling sequence. There are many commonly used sequence labeling methods, such as hidden Markov models or conditional random field models, but in specific application scenarios these slot filling models lack contextual information, which leads to slot ambiguity under different semantic intents and thus fails to meet actual application requirements. It can be seen that in the prior art the two models are trained independently, with no joint optimization of the intent recognition task and the slot filling task, which ultimately leads to low recognition accuracy of the trained models in speech recognition and reduces the user experience.
  • The present application provides a slot recognition method and electronic device, which are used to perform joint optimization training on intent recognition and slot filling and to use the joint training model to recognize voice conversations, so as to improve the accuracy of speech recognition.
  • an embodiment of the present application provides a slot identification method, which can be executed by a human-machine dialogue system or a man-machine dialogue device.
  • The method includes: preprocessing the user command to obtain an original word sequence; performing BERT encoding on the original word sequence to obtain an intent vector and a hidden state vector for each word segment; and, for any word segment (referred to as the first word segment), performing the following processing: determining the attention vector of the first word segment according to its hidden state vector and the intent vector; then concatenating the hidden state vector and the attention vector of the first word segment to determine its slot probability vector; and determining the slot corresponding to the first word segment according to the K probability values in that slot probability vector.
  • In this way, the intent is used as an input of the slot filling task to correct the slot prediction result, which helps to improve the accuracy with which the dialogue system understands the user's request and improves the user experience.
  • In a possible design, for any word segment other than the first one in the original word sequence, the attention vector can be determined by further combining the slot probability vector of the previous word segment; that is, the attention vector of a word segment can be determined according to its hidden state vector, the intent vector, and the slot probability vector corresponding to the previous word segment. Using the intent and the slot of the previous word segment as inputs of the slot filling task for the next word segment helps to further correct the slot prediction results, and thus to further improve the accuracy with which the dialogue system understands the user's request.
  • In a possible design, preprocessing the user command to obtain the original word sequence includes: first generating a Token sequence from the user command; then randomly sorting the Token sequences and dividing them into multiple batches according to batch_size; and finally truncating or padding the Token sequence of each batch to obtain the preprocessed original word sequence. Preprocessing the user command in this way helps to filter out invalid information.
  • In a possible design, the hidden state vectors can be generated as follows: first, BERT semantic encoding is performed on the original word sequence to generate the vector sequence $h_0, h_1, \ldots, h_T$, where $h_0$ is the sentence-vector encoding of the user command and $h_1, \ldots, h_T$ are the hidden state vectors corresponding to the T word segments; then the intent vector of the user command is generated from $h_0$, where the intent vector satisfies $y^I = \mathrm{softmax}(W^I h_0 + b^I)$, with $y^I \in \mathbb{R}^{1 \times I}$. Here $I$ is the number of possible intents of the user command, the intent with the maximum probability value in $y^I$ is the intent of the user command, $h_0$ is the sentence-vector encoding of the user command, $b^I$ is the bias term, and $W^I$ is the weight matrix.
  • In a possible embodiment, the slot probability vector can be calculated as follows: the hidden state vector $h_i$ of the first word segment and its attention vector $c_i$ are concatenated to generate the deep vector encoding $\tilde{h}_i = \mathrm{concat}(h_i, c_i)$, where concat is the concatenation function and $\tilde{h}_i$ is the concatenated deep vector encoding. Then a logistic-regression (softmax) transformation is applied to $\tilde{h}_i$ to obtain the slot probability vector of the first word segment, which satisfies $y_i^S = \mathrm{softmax}(W^S \tilde{h}_i + b^S)$, where softmax is the normalized exponential function, $W^S$ is the weight matrix, $\tilde{h}_i$ is the deep vector encoding, and $b^S$ is the bias term. A sketch of these two heads follows.
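  • As an illustration only, the sketch below implements the two classification heads described above in PyTorch; the dimensions, the class name, and the symbol c_i for the attention vector are assumptions made for this example, not values given by the patent.

```python
import torch
import torch.nn as nn

class IntentSlotHeads(nn.Module):
    """Intent head: y^I = softmax(W^I h_0 + b^I).
    Slot head:   y_i^S = softmax(W^S concat(h_i, c_i) + b^S)."""
    def __init__(self, hidden_size: int, num_intents: int, num_slots: int):
        super().__init__()
        self.intent_head = nn.Linear(hidden_size, num_intents)   # W^I, b^I
        self.slot_head = nn.Linear(2 * hidden_size, num_slots)   # W^S, b^S

    def forward(self, h0, h_tokens, attn):
        # h0: [B, H] sentence-vector encoding; h_tokens: [B, T, H] hidden
        # states of the T word segments; attn: [B, T, H] attention vectors c_i
        intent_probs = torch.softmax(self.intent_head(h0), dim=-1)   # y^I
        deep = torch.cat([h_tokens, attn], dim=-1)    # deep vector encoding
        slot_probs = torch.softmax(self.slot_head(deep), dim=-1)     # y_i^S
        return intent_probs, slot_probs

# toy usage with random tensors
heads = IntentSlotHeads(hidden_size=768, num_intents=5, num_slots=9)
y_I, y_S = heads(torch.randn(1, 768), torch.randn(1, 6, 768), torch.randn(1, 6, 768))
print(y_I.shape, y_S.shape)  # torch.Size([1, 5]) torch.Size([1, 6, 9])
```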
  • In a second aspect, an embodiment of the present application provides an electronic device including a processor and a memory, where the memory is used to store one or more computer programs; when the one or more computer programs stored in the memory are executed by the processor, the electronic device is enabled to implement any possible design of any of the above aspects.
  • an embodiment of the present application further provides a device, which includes a module/unit that executes any one of the possible design methods in any of the foregoing aspects.
  • modules/units can be realized by hardware, or by hardware executing corresponding software.
  • An embodiment of the present application also provides a computer-readable storage medium. The computer-readable storage medium includes a computer program which, when run on an electronic device, causes the electronic device to execute any possible design of any of the above aspects.
  • The embodiments of the present application also provide a computer program product which, when run on a terminal, causes the electronic device to execute any possible design of any of the above aspects.
  • an embodiment of the present application further provides a chip, which is coupled with a memory and is used to execute a computer program stored in the memory to execute any possible design method of any one of the above aspects.
  • FIG. 1 is a schematic diagram of a possible dialog system architecture to which an embodiment of this application is applicable;
  • Figure 2 is a schematic diagram of a joint training model provided by an embodiment of the application.
  • FIG. 3 is a schematic flowchart of a slot recognition method provided by an embodiment of the application;
  • FIG. 4 is a schematic diagram of interaction between system modules provided by an embodiment of the application;
  • FIG. 5 is a schematic diagram of a dialogue interface of a dialogue system provided by an embodiment of the application;
  • FIG. 6 is a schematic diagram of an example of joint intention estimation and slot filling according to an embodiment of this application.
  • FIG. 7 is an exemplary block diagram of a device provided by an embodiment of this application.
  • FIG. 8 is a schematic diagram of a device provided by an embodiment of the application.
  • In the field of human-machine dialogue, a user command is input by the user and may also be called a user requirement. In the embodiments of this application, the user command may be one or a combination of voice, image, video, audio-video, text, and the like. For example, if the user command is voice input by the user through a microphone, it may also be called a "voice command"; if the user command is text input by the user through a keyboard or virtual keyboard, it may also be called a "text command"; if the user command is an image input through the camera together with the text "Who is the person in the image?" entered through the virtual keyboard, the user command is a combination of image and text; and if the user command is a piece of audio-video input through the camera and microphone, it may also be called an "audio-video command".
  • Speech recognition technology, also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT), is a method of converting human speech into the corresponding text by computer.
  • When the user command is a voice command or a command containing voice, the user command can be converted into text through ASR. Generally, ASR works as follows: first, the audio signal input by the user is split into frames to obtain frame information; second, the frame information is recognized as states, where several frames correspond to one state; third, the states are combined into phonemes, where every three states form one phoneme; fourth, the phonemes are combined into words, where several phonemes form one word. Once the state corresponding to each frame is known, the speech recognition result follows; to determine the state corresponding to each frame, the frame is generally assigned to the state it matches with the greatest probability.
  • an acoustic model (AM) and a language model (LM) can be used to determine a set of word sequences corresponding to a speech.
  • The acoustic model can be understood as a model of the utterance: it converts the speech input into an acoustic representation, that is, it decodes the acoustic features of a piece of speech into units such as phonemes or words, or, more precisely, gives the probability that the speech belongs to a certain acoustic symbol (such as a phoneme).
  • the language model gives the probability that a set of word sequences is the speech, that is, it decodes the words into a set of word sequences (ie a complete sentence).
  • Natural language understanding aims to give machines the language understanding ability of a normal person.
  • An important function of NLU is intent recognition. For example, if the user command is "How far is the Hilton Hotel from Baiyun Airport?", the intent of the user command is "query distance", the slots configured for that intent are "origin" and "destination", the information of the slot "origin" is "Hilton Hotel", and the information of the slot "destination" is "Baiyun Airport"; with the intent and slot information, the machine can answer.
  • Intent refers to identifying what the user specifically wants to do. Intent recognition can be understood as a semantic-expression classification problem; that is, intent recognition is a classifier (also referred to as an intent classifier in the embodiments of this application) that determines which intent a user command belongs to. Commonly used intent classifiers are support vector machines (SVM), decision trees, and deep neural networks (DNN). The deep neural network can be a convolutional neural network (CNN) or a recurrent neural network (RNN), and the RNN can include a long short-term memory (LSTM) network, a stacked recurrent neural network (SRNN), and so on.
  • The general process of intent recognition is as follows: first, the corpus (i.e., a set of word sequences) is preprocessed, for example by removing punctuation and stop words; second, a word embedding algorithm such as word2vec generates word embeddings from the preprocessed corpus; then an intent classifier (such as an LSTM) performs feature extraction, intent classification, and other tasks. In the embodiments of this application, the intent classifier is a trained model that can recognize intents in one or more scenarios, or recognize arbitrary intents. For example, the intent classifier may recognize intents in the air-ticket booking scenario, including booking tickets, filtering tickets, querying ticket prices, querying ticket information, refunding tickets, rebooking tickets, and querying the distance to the airport; alternatively, the intent classifier may recognize intents in multiple scenarios. A sketch of this generic pipeline is given below.
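  • A minimal sketch, assuming PyTorch, of the generic word-embedding-plus-LSTM intent classification pipeline mentioned above; the embodiment itself uses BERT, and all names and dimensions here are illustrative:

```python
import torch
import torch.nn as nn

class LSTMIntentClassifier(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, hidden=256, num_intents=7):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)    # word embeddings
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_intents)            # intent classes

    def forward(self, token_ids):                 # [batch, seq_len]
        x = self.embed(token_ids)
        _, (h_n, _) = self.lstm(x)                # final hidden state as features
        return self.fc(h_n[-1])                   # intent logits

clf = LSTMIntentClassifier()
logits = clf(torch.randint(0, 10000, (2, 12)))   # two preprocessed utterances
print(logits.argmax(dim=-1))                     # predicted intent ids
```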
  • Once the user's intent has been determined, the NLU module needs to further understand the content of the user command. For simplicity, only the most essential parts need to be understood and the rest can be ignored; those most important parts are called slots. In other words, a slot is the definition of key information in the user's expression (such as the set of word sequences recognized from the user command).
  • One or more slots can be configured for the intent of a user command; once the slot information is obtained, the machine can respond to the user command. For example, for the ticket-booking intent, the slots are "departure time", "origin", and "destination"; these three pieces of key information need to be recognized during natural language understanding. Accurately identifying slots requires slot types (Slot-Type): to precisely identify the slots "departure time", "origin", and "destination" in the example above, the corresponding slot types "time" and "city name" are needed. A slot type is thus a structured knowledge base of specific knowledge, used to recognize and normalize the slots in users' spoken expressions. From a programming-language perspective, intent+slot can be seen as a function describing the user's need, where the intent corresponds to the function, a slot corresponds to a function parameter, and the slot_type corresponds to the parameter type, as in the illustration below.
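  • The function analogy, as a hypothetical Python illustration; the intent name and slots follow the ticket-booking example above:

```python
from datetime import datetime

def book_flight(departure_time: datetime, origin: str, destination: str):
    """Intent "book a ticket": the three slots are the parameters, and their
    slot types ("time", "city name") are the parameter types."""
    return f"Booking {origin} -> {destination} at {departure_time:%Y-%m-%d %H:%M}"

# filling the slots extracted from a user command
print(book_flight(datetime(2021, 3, 23, 9, 0), "Shenzhen", "Shanghai"))
```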
  • The slots configured for different intents can be divided into necessary slots and optional slots: necessary slots must be filled to execute the user command, while optional slots may or may not be filled. Unless otherwise stated, a slot in this application may be a necessary slot or an optional slot.
  • Slot filling is the extraction of structured fields from the user command, or, put differently, the reading of certain semantic components in a sentence (the user command in the embodiments of this application). Slot filling can therefore be regarded as a sequence labeling problem. Sequence labeling problems in natural language processing include word segmentation, part-of-speech tagging, named entity recognition (NER), keyword extraction, semantic role labeling, and so on. Given a specific label set, sequence labeling can be performed. Methods for sequence labeling include the maximum entropy Markov model (MEMM), the conditional random field (CRF), and the recurrent neural network (RNN).
  • Sequence labeling is to label each character in a given text, which is essentially a problem of categorizing each element in a linear sequence according to the context content. That is, for a one-dimensional linear input sequence, each element in the linear input sequence is labeled with a certain label in the label set.
  • In the embodiments of this application, a slot extraction classifier can be used to label the text of the user command. The linear sequence is the text of the user command (the text input by the user or converted from the user's input), and a Chinese character can often be regarded as an element of the linear sequence. The labels in the label set represent different meanings; sequence labeling puts a suitable label on each Chinese character according to its context, that is, it determines its slot. The example below illustrates such labeling with a BIO-style label set.
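  • A hedged illustration of slot filling as sequence labeling; the tokens, label names, and helper function are examples for this document, not the patent's own label set:

```python
tokens = ["book", "a", "ticket", "from", "Shenzhen", "to", "Shanghai"]
labels = ["O",    "O", "O",      "O",    "B-FromLoc", "O", "B-ToLoc"]

def extract_slots(tokens, labels):
    """Collect contiguous B-/I- spans into slot-name -> text mappings."""
    slots, current, name = {}, [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if current:
                slots[name] = " ".join(current)
            name, current = lab[2:], [tok]
        elif lab.startswith("I-") and current:
            current.append(tok)
        else:
            if current:
                slots[name] = " ".join(current)
            current, name = [], None
    if current:
        slots[name] = " ".join(current)
    return slots

print(extract_slots(tokens, labels))  # {'FromLoc': 'Shenzhen', 'ToLoc': 'Shanghai'}
```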
  • When the filling information of a slot is missing from the user command, for example when the user command is "How far is this hotel from Hongqiao Airport?", the machine needs to know which hotel "this hotel" refers to before it can respond. The machine may therefore ask the user "Which hotel do you want to check the distance from Hongqiao Airport?" to obtain the slot information. It can be seen that the machine may need to interact with the user multiple times to obtain slot information that is missing from the user command.
  • FIG. 1 is a schematic diagram of a possible human-machine dialogue system architecture to which an embodiment of this application is applicable.
  • the human-machine dialogue system architecture may include a server 101 and one or more user-side devices (such as the user-side device 1021 and the user-side device 1022 shown in FIG. 1).
  • the dialogue system architecture may also include one or more customer service side devices (for example, the customer service side device 1023 and the customer service side device 1024 shown in FIG. 1).
  • the user-side device or the customer service-side device may be a terminal device, such as a mobile phone, a tablet computer, or a desktop computer, which is not specifically limited.
  • In the embodiments of this application, client devices are divided into user-side devices (client devices operated by users) and customer-service-side devices (client devices operated by human customer service). The user-side device may also be referred to as a user client or by other names, and the customer-service-side device may also be referred to as a human-agent client or by other names, which is not specifically limited.
  • The user-side device may be used to obtain the information input by the user and send it to the server. For example, if the user enters text information in a dialog box, the user-side device can obtain the text information and send it to the server; if the user enters voice information in the dialog box, the user-side device can use speech recognition technology to convert the voice into text information and then send the text information to the server.
  • the user side device may also communicate with the customer service side device. For example, the user side device sends the user's input information to the customer service side device and receives the information returned by the customer service side device, thereby realizing manual customer service to provide services to the user.
  • the server is used to process various calculations required by the human-machine dialogue system, such as question-and-answer matching, that is, searching a preset database according to the user's request information to obtain response information corresponding to the request information.
  • the preset database may include a question database and a response database corresponding to the question database.
  • the question database includes multiple preset request information
  • the response database includes multiple preset request information corresponding response information.
  • Specifically, the server can compare the user's request information with the multiple preset request information entries, and then feed back to the user-side device the response information corresponding to the preset request information that is most similar to the user's request; the user-side device then presents it to the user. There may be multiple ways of presentation, which are not specifically limited. A sketch of such matching is given below.
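  • A minimal sketch of such question-answer matching; the bag-of-words cosine similarity used here is only an illustrative stand-in for whatever similarity measure the server actually applies, and the database entries are invented for the example:

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# preset request -> preset response
question_db = {
    "what is the weather tomorrow in beijing":
        "Tomorrow's weather in Beijing will be sunny to cloudy.",
    "book a ticket to shanghai":
        "Which day do you want to book a ticket?",
}

def answer(request: str) -> str:
    req = Counter(request.lower().split())
    best = max(question_db, key=lambda q: cosine(req, Counter(q.split())))
    return question_db[best]

print(answer("What's the weather like tomorrow in Beijing"))
```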
  • When virtual personal assistants or smart customer service conduct task-oriented multi-round conversations with users, they can return response information that meets the user's expectations based on the request information entered by the user.
  • However, if the task of intent recognition and the task of slot filling are performed independently, then, because the slots corresponding to different intents may differ, the intent in the recognition result may not be aligned with the slot. This can cause the virtual personal assistant or smart customer service to misunderstand the user's semantics and return response information that does not meet the user's expectations.
  • For example, if the voice content entered by the user is "buy a ticket to Shanghai tomorrow morning", the intent recognition task in the human-machine dialogue system may determine the intent to be "book a ticket", while the slot filling task may mark "Shanghai" as the "navigation destination". Because the intent recognition task and the slot filling task are performed independently, the smart customer service may well believe that the "navigation starting point" slot is to be filled and therefore return an erroneous guidance message to the user, such as "Is the navigation point of departure the current location?".
  • In other words, the traditional slot filling process fails to consider the result of intent recognition, which results in misalignment between the intent and the slot and affects the result of semantic analysis. Slot filling is related to intent: the same location entity, such as Chenghuang Temple, may correspond to different slots under different intents.
  • In view of this, an embodiment of the application provides a slot recognition method that improves the training model in the existing human-machine dialogue system. The improved joint training model uses the intent recognition result as an input to slot filling, associating intent recognition with slot filling so that the intent recognition result can correct the slot filling. Using a joint training model trained in this way to perform semantic analysis on user input can improve the accuracy of the semantic analysis results.
  • The joint training model includes a BERT encoding layer, a dense (fully connected) layer, a masked attention layer, and a softmax (logistic regression) layer.
  • The logistic regression (softmax) layer of the joint training model can output the intent and slot information corresponding to the dialogue material. From the connection between the dense layer and the masked attention layer it can be seen that the intent is an input of the masked attention layer; the dense layer and the masked attention layer thus realize the association between the intent recognition result and the slot filling.
  • The objective loss function for the joint optimization of intent and slot is the sum of the intent classification loss, the slot filling loss, and a weight regularization term, where the intent classification loss can use a binary cross-entropy loss and the slot filling loss can use a multi-class cross-entropy loss, as sketched below.
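  • A minimal sketch of this joint objective, assuming PyTorch; the equal weighting of the two losses and the L2 coefficient are illustrative assumptions, not values from the patent:

```python
import torch
import torch.nn.functional as F

def joint_loss(intent_logits, intent_gold, slot_logits, slot_gold, model, l2=1e-5):
    """Intent cross-entropy + slot cross-entropy + L2 weight regularization."""
    loss_intent = F.cross_entropy(intent_logits, intent_gold)   # [B, I] vs [B]
    loss_slot = F.cross_entropy(slot_logits.flatten(0, 1),      # [B*T, K]
                                slot_gold.flatten())            # [B*T]
    reg = sum((p ** 2).sum() for p in model.parameters())
    return loss_intent + loss_slot + l2 * reg

# toy usage: any nn.Module works for the regularization term
model = torch.nn.Linear(4, 3)
loss = joint_loss(torch.randn(2, 5), torch.tensor([1, 3]),
                  torch.randn(2, 7, 9), torch.randint(0, 9, (2, 7)), model)
print(loss)
```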
  • The training termination condition that ends the joint training can be: when the number of training epochs reaches a set threshold, or when the number of batches since the last optimal model exceeds a set threshold, training is terminated and the final joint training model is generated.
  • the slot identification method provided in the embodiment of the present application can be applied to a variety of possible human-machine dialogue systems, and is especially suitable for task-type multi-round dialogue systems.
  • the following takes a task-type multi-round dialogue system as an example to describe the slot identification method.
  • FIG. 3 is a schematic flowchart of a slot identification method provided by an embodiment of the application, including:
  • Step 301 After obtaining the user command, the server preprocesses the user command to obtain the original word sequence.
  • the user command may be one or a combination of voice, image, video, audio and video, text, etc.
  • For example, if the user command is voice input by the user through the microphone, it may also be called a "voice command"; if the user command is text input by the user through the keyboard or virtual keyboard, it may also be called a "text command".
  • Specifically, the server can first tokenize the user command to generate a Token sequence, then randomly sort the Token sequences and divide them into multiple batches according to batch_size (batch size), and finally truncate or pad the Token sequence of each batch to obtain the preprocessed original word sequence. Optionally, the server may also create a mask of the same dimension for the truncated or padded Token sequence, where an element of the mask is 0 at positions corresponding to the <pad> element in the Token sequence and 1 otherwise. The sketch below illustrates these steps.
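  • A minimal sketch of this preprocessing, in plain Python; batch_size, max_len, and the sample sequences are illustrative:

```python
import random

PAD = "<pad>"

def make_batches(token_seqs, batch_size=2, max_len=6, seed=0):
    random.Random(seed).shuffle(token_seqs)           # random sorting
    batches = [token_seqs[i:i + batch_size]
               for i in range(0, len(token_seqs), batch_size)]
    out = []
    for batch in batches:
        padded, masks = [], []
        for seq in batch:
            seq = seq[:max_len]                       # truncate
            pad_n = max_len - len(seq)
            padded.append(seq + [PAD] * pad_n)        # fill with <pad>
            masks.append([1] * (max_len - pad_n) + [0] * pad_n)  # 0 at <pad>
        out.append((padded, masks))
    return out

seqs = [["<CLS>", "play", "red", "bean"],
        ["<CLS>", "book", "a", "ticket", "to", "Shanghai", "tomorrow"]]
for tokens, mask in make_batches(seqs):
    print(tokens, mask)
```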
  • the mobile phone user can run the voice assistant software program in the mobile phone.
  • For example, the voice assistant software program converts the voice into English text and sends it to the server corresponding to the program. The server serializes the English text, using WordPiece (word fragments) to generate a Token sequence from the text. The server then randomly sorts the serialized corpus, divides the corpus into multiple batches according to batch_size, and truncates or pads the Token sequence of each batch.
  • In step 302, the joint training model in the server first performs BERT encoding on the original word sequence to obtain the intent vector and the hidden state vectors corresponding to the T word segments.
  • BERT (bidirectional encoder representations from Transformers) is a deep bidirectional pre-trained language understanding model used here as a feature extractor.
  • After the server performs BERT semantic encoding on the original word sequence input to the joint training model, it generates the hidden state vector sequence $h_0, h_1, \ldots, h_T$, where $h_0$ is the sentence-vector encoding corresponding to the user command (i.e., the encoding vector at the <CLS> position) and $h_1, \ldots, h_T$ are the hidden state vectors corresponding to the T word segments (i.e., the encoding vectors at the remaining positions). The BERT encoding layer inputs the sentence vector $h_0$ to the logistic regression (softmax) layer to generate the intent vector $y^I = \mathrm{softmax}(W^I h_0 + b^I)$, where $y^I \in \mathbb{R}^{1 \times I}$, $I$ is the number of intents for the user command, the intent with the maximum probability value in $y^I$ is the intent of the user command, $h_0$ is the sentence-vector encoding of the user command, $b^I$ is the bias term, and $W^I$ is the weight matrix.
  • the output intention is "play music", as shown in Figure 2.
  • The hidden state vector $h_1$ corresponding to "play", the hidden state vector $h_2$ corresponding to "red", and the hidden state vector $h_3$ corresponding to "breast" are input to the fully connected layer.
  • In step 303, the joint training model in the server performs the following processing for the first word segment in the original word sequence, where the first word segment is any one of the T word segments: determine the attention vector of the first word segment according to its hidden state vector and the intent vector; then concatenate the hidden state vector and the attention vector of the first word segment to determine its slot probability vector; and determine the slot corresponding to the first word segment according to the K probability values in that slot probability vector.
  • Specifically, the hidden state vector and the attention vector of the first word segment can be concatenated as follows: the hidden state vector $h_i$ of the first word segment and its attention vector $c_i$ are concatenated to generate the deep vector encoding $\tilde{h}_i = \mathrm{concat}(h_i, c_i)$, where concat is the concatenation function and $\tilde{h}_i$ is the concatenated deep vector encoding. The server then inputs $\tilde{h}_i$ to the softmax layer for conversion, obtaining the slot probability vector of the first word segment, which satisfies $y_i^S = \mathrm{softmax}(W^S \tilde{h}_i + b^S)$, where softmax is the normalized exponential function, $W^S$ is the weight matrix, $\tilde{h}_i$ is the deep vector encoding, and $b^S$ is the bias term.
  • For example, c3 in the fully connected layer is the intent vector $y^I$; the intent vector $y^I$ and the hidden state vector $h_2$ are used as inputs of the masked attention layer, thereby generating the attention vector corresponding to the third word segment "breast".
  • Optionally, the attention vector can also be determined as follows: according to the hidden state vector of the word segment, the intent vector, and the slot probability vector corresponding to the previous word segment, the attention vector of the word segment is determined. For the first word segment "play", c1 in the fully connected layer is the intent vector $y^I$, and the intent vector $y^I$ and the hidden state vector $h_1$ are used as inputs of the masked attention layer, finally obtaining the slot probability vector corresponding to "play".
  • The masked attention layer can calculate the attention vector step by step, computing the attention vector information at the current moment from the hidden states together with the inputs described above. A plausible sketch of one such step is given below.
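  • Since the exact attention formula is not recoverable from this text, the following is only a plausible sketch of one masked-attention step: an additive attention over the hidden states, conditioned on the intent vector and the previous slot probability vector, with <pad> positions masked out (PyTorch assumed):

```python
import torch
import torch.nn as nn

class MaskedAttentionStep(nn.Module):
    def __init__(self, hidden, intent_dim, slot_dim):
        super().__init__()
        self.query = nn.Linear(hidden + intent_dim + slot_dim, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, h_i, y_intent, prev_slot, h_all, pad_mask):
        # h_i: [B, H]; y_intent: [B, I]; prev_slot: [B, K]
        # h_all: [B, T, H]; pad_mask: [B, T] with 0 at <pad> positions
        q = self.query(torch.cat([h_i, y_intent, prev_slot], dim=-1))  # [B, H]
        scores = self.score(torch.tanh(h_all + q.unsqueeze(1))).squeeze(-1)
        scores = scores.masked_fill(pad_mask == 0, float("-inf"))      # mask pads
        alpha = torch.softmax(scores, dim=-1)                          # [B, T]
        return (alpha.unsqueeze(-1) * h_all).sum(dim=1)    # attention vector c_i

step = MaskedAttentionStep(hidden=8, intent_dim=3, slot_dim=5)
c_i = step(torch.randn(1, 8), torch.randn(1, 3), torch.randn(1, 5),
           torch.randn(1, 4, 8), torch.tensor([[1, 1, 1, 0]]))
print(c_i.shape)  # torch.Size([1, 8])
```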
  • In summary, when the server receives a user command, it first performs preprocessing according to step 301 to generate the original word sequence, and then uses the original word sequence as the input of the joint training model; after model prediction and inference, the intent and slots are obtained. In the intent vector $y^I$, the intent corresponding to the maximum probability can be selected as the predicted intent; in the i-th slot probability vector $y_i^S$, the slot corresponding to the maximum probability can be selected as the i-th predicted slot, as in the decoding sketch below.
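  • A minimal sketch of this argmax decoding; the label names and probabilities are invented for the example:

```python
import torch

intent_labels = ["book a ticket", "play music", "query weather"]
slot_labels = ["O", "B-FromLoc", "B-ToLoc"]

y_I = torch.tensor([0.7, 0.2, 0.1])        # intent probability vector
y_S = torch.tensor([[0.8, 0.1, 0.1],       # one row per word segment
                    [0.1, 0.8, 0.1],
                    [0.1, 0.1, 0.8]])

predicted_intent = intent_labels[int(y_I.argmax())]
predicted_slots = [slot_labels[int(row.argmax())] for row in y_S]
print(predicted_intent, predicted_slots)  # book a ticket ['O', 'B-FromLoc', 'B-ToLoc']
```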
  • It can be seen that the intent is used as an input of the slot filling task to correct the slot prediction result, thereby improving the accuracy with which the dialogue system understands the user's request and improving the user experience.
  • the slot identification method provided in this application can be specifically applied to the system architecture shown in FIG. 4, where the NLU module integrates the joint training model.
  • the specific application process of the above method will be explained with examples in conjunction with the system architecture. The specific steps are as follows.
  • Step 401 The user opens the voice assistant software program of the mobile phone, and sends out the voice message "Help me book a flight from Shenzhen to Shanghai" in the dialog box.
  • In step 402, the ASR module in the mobile phone voice assistant converts the voice information into text information, as shown in FIG. 5, and sends the converted text to the DM (dialogue management) module in the voice assistant.
  • In step 403, the DM module obtains from the dialog box the context information corresponding to the voice information (the historical dialogue information and current dialogue information in FIG. 5), state information, and so on, and sends the voice information and the other related information to the NLU (natural language understanding) module.
  • In step 404, the NLU module recognizes the intent and slots in the text "Book a ticket from Shenzhen to Shanghai" according to the method provided in the embodiments of this application. For example, taking the hidden state vector $h_6$ and the intent vector $y^I$ as input, the slot probability vector corresponding to the word segment "Shanghai" is generated; the slot corresponding to the highest probability value in it is the "destination", so it is marked as "destination" or "ToLoc". Thus, for "book a ticket from Shenzhen to Shanghai", the joint training model outputs the intent "book a ticket", the slot corresponding to "Shenzhen" is the "departure place", and the slot corresponding to "Shanghai" is the "destination".
  • step 405 the NLU module returns the intent and slot identification result to the DM module.
  • In step 406, the DM module inputs the intent and slot recognition result to the NLG (natural language generation) module.
  • The DM module is divided into two sub-modules, namely dialogue state tracking (DST) and dialogue policy learning (DPL). Its main function is to update the state of the dialogue system according to the recognition results of the NLU module and to generate corresponding system actions, for example querying air tickets.
  • In steps 407 and 408, the NLG module textualizes the system actions output by the DM module, expressing them in text form, and returns them together with other related information to the DM module.
  • In step 409, the DM module sends the execution result of the system action to the TTS (text-to-speech) module.
  • The TTS module then converts the text into speech and outputs the speech to the user, for example the voice content corresponding to the queried flight information.
  • In addition, the server can further send guidance information to the user-side device, for example "Which day do you want to book a ticket", as shown in FIG. 5; the guidance information is used to guide the user to provide information related to the request. Guiding the user in this way allows the server to query the preset database for response information that meets the user's expectations based on the related information, thereby avoiding unnecessary transfers to human agents and improving user satisfaction.
  • In a possible design, the guidance information may also include third historical dialogue information, where the similarity between the historical request information in the third historical dialogue information and the current request information is greater than a sixth threshold; that is, the guidance information may additionally include historical request information similar to the user's request, thereby reminding the user that similar questions have been asked before.
  • Of course, the guidance information can also include other possible content. Those skilled in the art can set the content of the guidance information according to actual experience and needs; any information sent to the user for guiding, prompting, reassuring, and the like in order to actively meet user expectations falls within the protection scope of the present invention.
  • In the above embodiments, intent recognition and slot filling are carried out jointly, and the intent is taken into account during the slot filling process; the slot filling is therefore finer-grained and more accurate, so that only one joint model needs to be trained to accomplish both tasks well.
  • It should be noted that the step numbering is only an example of the execution process of the embodiments of this application; there is no strict execution order among steps that have no time-sequence dependency on each other. Moreover, not every one of steps 401 to 410 must be performed; in specific implementations, some of the steps can be selectively performed according to actual needs.
  • The foregoing mainly introduces the solution provided by this application from the perspective of interaction between the various devices. It can be understood that, in order to implement the above functions, the implementing devices include hardware structures and/or software modules corresponding to the respective functions.
  • Those skilled in the art should readily realize that the present invention can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or by computer-software-driven hardware depends on the specific application and the design constraints of the technical solution. Skilled professionals may use different methods for each specific application to implement the described functions, but such implementations should not be considered beyond the scope of the present invention.
  • FIG. 7 shows a possible exemplary block diagram of a device involved in an embodiment of the present application, and the device 700 may exist in the form of software.
  • the apparatus 700 may include: a processing unit 702 and a communication unit 703.
  • the processing unit 702 is used to control and manage the actions of the device 700.
  • the communication unit 703 is used to support communication between the apparatus 700 and other devices (such as user-side equipment or customer service-side equipment).
  • the device 700 may further include a storage unit 701 for storing program codes and data of the device 700.
  • The processing unit 702 may be a processor or a controller, for example a general-purpose central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and it can implement or execute the various exemplary logical blocks, modules, and circuits described in conjunction with the disclosure of the present invention.
  • the processor may also be a combination for realizing computing functions, for example, including a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and so on.
  • the communication unit 703 may be a communication interface, a transceiver, or a transceiver circuit, etc., where the communication interface is a general term. In a specific implementation, the communication interface may include multiple interfaces.
  • the storage unit 701 may be a memory.
  • the apparatus 700 may be the server in the foregoing embodiment, or may also be a semiconductor chip provided in the server.
  • The processing unit 702 can support the apparatus 700 in performing the actions of the server in the above method examples, and the communication unit 703 can support communication between the apparatus 700 and the user-side or customer-service-side equipment; for example, the processing unit 702 is used to support the apparatus 700 in executing steps 301 to 303 in FIG. 3, and the communication unit 703 is used to support the apparatus 700 in executing step 405 in FIG. 4.
  • Specifically, in one embodiment, the processing unit 702 is configured to preprocess the user command to obtain the original word sequence after the user command is obtained; the joint training model in the server first performs BERT encoding on the original word sequence to obtain the intent vector and the hidden state vectors corresponding to the T word segments. For the first word segment in the original word sequence (any one of the T word segments), the joint training model performs the following processing: determine the attention vector of the first word segment according to its hidden state vector and the intent vector; then concatenate the hidden state vector and the attention vector of the first word segment to determine its slot probability vector; and from the K probability values in the slot probability vector, select the slot corresponding to the maximum probability value as the slot of the first word segment. In a possible design, the processing unit 702 may determine the attention vector of the first word segment according to the hidden state vector of the first word segment, the intent vector, and the slot probability vector corresponding to the previous word segment.
  • the apparatus may be the above-mentioned server, or may also be a chip set in the server.
  • the device 800 includes a processor 802, a communication interface 803, and a memory 801.
  • the device 800 may further include a communication line 804.
  • the communication interface 803, the processor 802, and the memory 801 may be connected to each other through a communication line 804;
  • The communication line 804 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The communication line 804 can be divided into an address bus, a data bus, a control bus, and so on; for ease of representation, only one thick line is used in FIG. 8, but this does not mean that there is only one bus or one type of bus.
  • the processor 802 may be a CPU, a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the program of the present application.
  • The communication interface 803 uses any transceiver-like device to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), a wireless local area network (WLAN), or a wired access network.
  • The memory 801 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
  • the memory may exist independently, and is connected to the processor through a communication line 804. The memory can also be integrated with the processor.
  • the memory 801 is used to store computer-executed instructions for executing the solutions of the present application, and the processor 802 controls the execution.
  • the processor 802 is configured to execute computer-executable instructions stored in the memory 801, so as to implement the method provided in the foregoing embodiment of the present application.
  • the computer-executable instructions in the embodiments of the present application may also be referred to as application program codes, which are not specifically limited in the embodiments of the present application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from a website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
  • These computer program instructions can also be stored in a computer-readable memory that can guide a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including the instruction device.
  • the device implements the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.
  • These computer program instructions can also be loaded on a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, so as to execute on the computer or other programmable equipment.
  • the instructions provide steps for implementing the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.
  • the embodiments of the present application also provide a computer storage medium that stores computer instructions, and when the computer instructions run on an electronic device, the electronic device executes the above-mentioned related method steps to implement the method in the above-mentioned embodiment.
  • the embodiments of the present application also provide a computer program product, which when the computer program product runs on a computer, causes the computer to execute the above-mentioned related steps, so as to implement the method in the above-mentioned embodiment.
  • the embodiments of the present application also provide a device.
  • the device may specifically be a chip, component or module.
  • the device may include a processor and a memory connected to each other.
  • the memory is used to store computer execution instructions.
  • the processor can execute the computer-executable instructions stored in the memory, so that the chip executes the methods in the foregoing method embodiments.
  • the disclosed device and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • The division into modules or units is only a logical function division; in actual implementation there may be other division methods, for example multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate parts may or may not be physically separate, and the parts displayed as units may be one physical unit or multiple physical units, that is, they may be located in one place, or they may be distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium.
  • the technical solutions of the embodiments of the present application are essentially or the part that contributes to the prior art, or all or part of the technical solutions can be embodied in the form of a software product, and the software product is stored in a storage medium. It includes several instructions to make a device (which may be a single-chip microcomputer, a chip, etc.) or a processor (processor) execute all or part of the steps of the methods of the various embodiments of the present application.
  • the aforementioned storage media include: U disk, mobile hard disk, read only memory (read only memory, ROM), random access memory (random access memory, RAM), magnetic disk or optical disk and other media that can store program codes.

Abstract

A slot recognition method and an electronic device, belonging to the technical field of speech recognition; the method can be executed by a human-machine dialogue system or a human-machine dialogue apparatus. The method includes: preprocessing a user command to obtain an original word sequence, and performing BERT encoding on the original word sequence to obtain an intent vector and a hidden state vector for each word segment; for any word segment, performing the following processing: determining the attention vector of the word segment according to its hidden state vector and the intent vector, concatenating the hidden state vector and the attention vector of the word segment to determine its slot probability vector, and finally determining the slot corresponding to the word segment according to the K probability values in the slot probability vector. It can be seen that in this method the intent is used as an input of the slot filling task, which helps to improve the accuracy with which the dialogue system understands the user's request and improves the user experience.

Description

Slot recognition method and electronic device
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to the Chinese patent application No. 202010210034.9, entitled "Slot recognition method and electronic device", filed with the Chinese Patent Office on March 23, 2020, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
This application relates to the field of terminal technology, and in particular to a slot recognition method and an electronic device.
BACKGROUND
With the rapid development of the Internet, human-machine dialogue systems are used more and more widely. Taking a task-oriented human-machine dialogue system as an example, the user's input may be question information such as asking about the weather, booking air tickets, or a medical consultation, and the human-machine dialogue system can feed back response dialogue information to the user based on the user's question. For example, if the question input by the user is "What's the weather like tomorrow in Beijing", the human-machine dialogue system can search a preset database and feed back the response "Tomorrow's weather in Beijing will be sunny to cloudy". It can be seen that, for a human-machine dialogue system, accurately recognizing the question information input by the user is crucial. In the process of speech recognition, intent recognition and slot filling technology are the key to ensuring the accuracy of the speech recognition results.
Intent recognition can be abstracted as a classification problem, and the intent recognition model can then be trained with a classifier based on convolution and knowledge representation; in addition to word-embedding the user's spoken question, the model introduces a semantic representation of knowledge to increase the generalization ability of the representation layer, but in practical applications the model suffers from slot filling deviation, which affects the accuracy of the intent recognition model. The essence of slot filling is to formalize the sentence sequence into a labeling sequence; there are many commonly used labeling methods, such as hidden Markov models or conditional random field models, but in specific application scenarios these slot filling models lack contextual information, which causes slots to be ambiguous under different semantic intents and thus fails to meet actual application requirements. It can be seen that in the prior art the two models are trained independently, without joint optimization of the intent recognition task and the slot filling task, which ultimately leads to low recognition accuracy of the trained models in speech recognition and reduces the user experience.
SUMMARY
This application provides a slot recognition method and an electronic device, which are used to perform joint optimization training on intent recognition and slot filling and to use the joint training model to recognize voice conversations, so as to improve the accuracy of speech recognition.
In a first aspect, an embodiment of this application provides a slot recognition method, which can be executed by a human-machine dialogue system or a human-machine dialogue apparatus. The method includes: preprocessing a user command to obtain an original word sequence, and performing BERT encoding on the original word sequence to obtain an intent vector and a hidden state vector for each word segment; for any word segment, i.e. a first word segment, performing the following processing: determining the attention vector of the first word segment according to its hidden state vector and the intent vector; then concatenating the hidden state vector and the attention vector of the first word segment to determine its slot probability vector; and determining the slot corresponding to the first word segment according to the K probability values in the slot probability vector.
It can be seen that in the embodiments of this application the intent is used as an input of the slot filling task to correct the slot prediction result, which helps to improve the accuracy with which the dialogue system understands the user's request and improves the user experience.
In a possible design, for any one of the T-1 word segments other than the first word segment of the original word sequence, the attention vector can be determined by further combining the slot probability vector of the previous word segment; that is, the attention vector of the word segment can be determined according to its hidden state vector, the intent vector, and the slot probability vector corresponding to the previous word segment.
In the embodiments of this application, using the intent and the slot of the previous word segment as inputs of the slot filling task of the next word segment helps to further correct the slot prediction results, and thus to further improve the accuracy with which the dialogue system understands the user's request.
In a possible design, preprocessing the user command to obtain the original word sequence includes: first generating a Token sequence from the user command, then randomly sorting the Token sequences and dividing them into multiple batches according to batch_size, and finally truncating or padding the Token sequence of each batch to obtain the preprocessed original word sequence. Preprocessing the user command in this way helps to filter out invalid information.
In one possible design, the hidden state vectors may be generated as follows: BERT semantic encoding is first performed on the original word sequence to generate a vector sequence $h_0, h_1, \ldots, h_T$, where $h_0$ is the sentence vector encoding information of the user command and $h_1, \ldots, h_T$ are the hidden state vectors respectively corresponding to the T tokens; then the intent vector of the user command is generated according to the sentence vector encoding information $h_0$ of the user command, where the intent vector satisfies

$$y^I = \mathrm{softmax}(W^I h_0 + b^I)$$

where $y^I \in R^{1 \times I}$, I denotes the number of possible intents of the user command, the intent corresponding to the maximum probability value in $y^I$ is the intent of the user command, $h_0$ is the sentence vector encoding information of the user command, $b^I$ is a bias term, and $W^I$ is a weight matrix.
In one possible embodiment, the slot probability vector may be computed as follows: the hidden state vector $h_i$ of the first token and the attention vector $c_i$ of the first token are concatenated to generate deep vector encoding information, where the deep vector encoding information $h_i^S$ satisfies

$$h_i^S = \mathrm{concat}(h_i, c_i)$$

where concat is the concatenation function and $h_i^S$ denotes the concatenated deep vector encoding information. Then the deep vector encoding information $h_i^S$ is transformed by the logistic regression model softmax to obtain the slot probability vector of the first token, where the slot probability vector $y_i^S$ of the first token satisfies

$$y_i^S = \mathrm{softmax}(W^S h_i^S + b^S)$$

where softmax denotes the normalized exponential function, $W^S$ denotes a weight matrix, $h_i^S$ denotes the deep vector encoding information, and $b^S$ denotes a bias term.
According to a second aspect, an embodiment of this application provides an electronic device, including a processor and a memory, where the memory is configured to store one or more computer programs; when the one or more computer programs stored in the memory are executed by the processor, the electronic device is enabled to implement the method of any possible design of any of the above aspects.
According to a third aspect, an embodiment of this application further provides an apparatus, which includes modules/units that perform the method of any possible design of any of the above aspects. These modules/units may be implemented by hardware, or by hardware executing corresponding software.
According to a fourth aspect, an embodiment of this application further provides a computer-readable storage medium, where the computer-readable storage medium includes a computer program; when the computer program runs on an electronic device, the electronic device is caused to perform the method of any possible design of any of the above aspects.
According to a fifth aspect, an embodiment of this application further provides a computer program product; when the computer program product runs on a terminal, the electronic device is caused to perform the method of any possible design of any of the above aspects.
According to a sixth aspect, an embodiment of this application further provides a chip, where the chip is coupled to a memory and is configured to execute a computer program stored in the memory, so as to perform the method of any possible design of any of the above aspects.
Brief description of the drawings
FIG. 1 is a schematic diagram of a possible dialog system architecture to which embodiments of this application are applicable;
FIG. 2 is a schematic diagram of a joint training model according to an embodiment of this application;
FIG. 3 is a schematic flowchart of a slot recognition method according to an embodiment of this application;
FIG. 4 is a schematic diagram of interaction between system modules according to an embodiment of this application;
FIG. 5 is a schematic diagram of a dialog interface of a dialog system according to an embodiment of this application;
FIG. 6 is a schematic diagram of an example of joint intent inference and slot filling according to an embodiment of this application;
FIG. 7 is an exemplary block diagram of an apparatus according to an embodiment of this application;
FIG. 8 is a schematic diagram of an apparatus according to an embodiment of this application.
Detailed description of embodiments
First, some terms used in this application are explained so that those skilled in the art can understand them.
(1) User command
In the human-machine dialog field, a user command is what the user inputs, and may also be called a user demand. In the embodiments of this application, a user command may be one or a combination of speech, image, video, audio-video, text, and the like. For example, if the user command is speech input by the user through a microphone, the user command may also be called a "voice command"; as another example, if the user command is text input by the user through a keyboard or virtual keyboard, the user command may also be called a "text command"; as another example, if the user inputs an image through a camera and inputs "Who is the person in the image?" through a virtual keyboard, the user command is a combination of image and text; as yet another example, if the user inputs a segment of audio-video through a camera and a microphone, the user command may also be called an "audio-video command".
(2) Speech recognition
Speech recognition technology, also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT), is a method of converting human speech into corresponding text by a computer.
When the user command is a voice command or a command containing speech, ASR can convert the user command into text. ASR typically works as follows: first, the audio signal input by the user is split into frames to obtain frame information; second, the obtained frame information is recognized as states, where several frames of information correspond to one state; third, states are combined into phonemes, where every three states form one phoneme; fourth, phonemes are combined into words, where several phonemes form one word. It can be seen that once the state corresponding to each frame of information is known, the speech recognition result follows. To determine the state corresponding to each frame of information, the state with the maximum probability for that frame is usually identified, and the frame is assigned to that state.
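As a loose, toy illustration of the frame-state-phoneme-word pipeline just described (none of the states, phonemes, or lexicon entries below come from this application; all are hypothetical stand-ins):

```python
from itertools import groupby

# Toy acoustic posteriors: for each frame, a probability per state (hypothetical data).
frame_posteriors = [
    {"s1": 0.7, "s2": 0.2, "s3": 0.1},
    {"s1": 0.6, "s2": 0.3, "s3": 0.1},
    {"s1": 0.2, "s2": 0.7, "s3": 0.1},
    {"s1": 0.1, "s2": 0.2, "s3": 0.7},
]

# Each frame is assigned the state with the maximum probability.
states = [max(p, key=p.get) for p in frame_posteriors]

# Collapse consecutive identical states into runs and map each run to a
# phoneme via a toy table (a stand-in for the three-states-per-phoneme rule).
state_runs = [s for s, _ in groupby(states)]
state_to_phoneme = {"s1": "n", "s2": "i", "s3": "h"}   # hypothetical mapping
phonemes = [state_to_phoneme[s] for s in state_runs]

# A lexicon maps phoneme strings to words (again, hypothetical toy data).
lexicon = {"nih": "ni"}
print(lexicon.get("".join(phonemes), "<unk>"))
```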
In the speech recognition process, an acoustic model (AM) and a language model (LM) may be used to determine the character sequence corresponding to a piece of speech. The acoustic model can be understood as modeling of pronunciation: it converts the speech input into an acoustic representation, that is, it decodes the acoustic features of the speech into units such as phonemes or characters, or more precisely, it gives the probability that the speech belongs to a certain acoustic symbol (such as a phoneme). The language model then gives the probability that a character sequence corresponds to the speech, that is, it decodes the characters into a character sequence (a complete sentence).
(3) Natural language understanding (NLU)
Natural language understanding aims to give machines the language understanding ability of a normal person. One important function is intent recognition. For example, for the user command "How far is the Hilton Hotel from Baiyun Airport?", the intent of the user command is "query distance"; the slots configured for this intent are "origin" and "destination", the information of the slot "origin" is "Hilton Hotel", and the information of the slot "destination" is "Baiyun Airport". With the intent and slot information, the machine can respond.
(4) Intent and intent recognition
An intent identifies what the user command specifically wants to do. Intent recognition can be understood as a semantic-expression classification problem; in other words, intent recognition is a classifier (also called an intent classifier in the embodiments of this application) that determines which intent a user command belongs to. Commonly used intent classifiers include support vector machines (SVM), decision trees, and deep neural networks (DNN). The deep neural network may be a convolutional neural network (CNN) or a recurrent neural network (RNN), and the RNN may include a long short-term memory (LSTM) network, a stacked recurrent neural network (SRNN), and the like.
The general flow of intent recognition is as follows: first, the corpus (a character sequence) is preprocessed, for example by removing punctuation and stop words; next, a word embedding algorithm, such as word2vec, generates word vectors from the preprocessed corpus; then an intent classifier (for example, an LSTM) performs feature extraction, intent classification, and other work. In the embodiments of this application, the intent classifier is a trained model that can recognize intents in one or more scenarios, or recognize arbitrary intents. For example, the intent classifier may recognize intents in a flight booking scenario, including booking a ticket, filtering tickets, querying ticket prices, querying ticket information, refunding a ticket, rebooking a ticket, querying the distance to the airport, and so on. As another example, the intent classifier may recognize intents in multiple scenarios.
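A minimal sketch of such a pipeline, assuming PyTorch and an LSTM classifier; the vocabulary size, dimensions, and intent count are illustrative assumptions rather than values fixed by this application:

```python
import torch
import torch.nn as nn

class IntentClassifier(nn.Module):
    """Toy intent classifier: embedding -> LSTM -> linear intent logits."""
    def __init__(self, vocab_size=5000, embed_dim=128, hidden_dim=256, num_intents=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_intents)

    def forward(self, token_ids):           # token_ids: (batch, seq_len)
        x = self.embed(token_ids)           # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(x)          # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1])             # intent logits: (batch, num_intents)

logits = IntentClassifier()(torch.randint(0, 5000, (2, 10)))
intent = logits.softmax(-1).argmax(-1)      # predicted intent index per utterance
```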
(5) Slot
Once the user's intent is determined, the NLU module needs to further understand the content of the user command. For simplicity, only the most essential parts can be selected for understanding and the rest can be ignored; those most important parts are called slots. That is, a slot is the definition of key information in the user's expression (such as the character sequence recognized from the user command). One or more slots can be configured for the intent of a user command, and once the information of the slots is obtained, the machine can respond to the user command. For example, in the flight booking intent, the slots are "departure time", "origin", and "destination"; these three pieces of key information need to be recognized during natural language understanding. Accurately recognizing slots requires slot types (Slot-Type). Continuing the example above, precisely recognizing the "departure time", "origin", and "destination" slots requires corresponding slot types behind them, namely "time" and "city name". A slot type can be regarded as a structured knowledge base of specific knowledge, used to recognize and normalize slots in the user's colloquial expressions. From a programming-language point of view, intent+slot can be seen as describing the user's demand with a function, where the intent corresponds to the function, the slots correspond to the function's parameters, and slot_type corresponds to the parameter types. The slots configured for different intents can be divided into required slots and optional slots, where a required slot must be filled to execute the user command and an optional slot may or may not be filled. Unless otherwise stated, a slot in this application may be a required slot or an optional slot, or may be a required slot.
The "book a flight" example above defines three core slots: "departure time", "origin", and "destination". If everything a user needs to enter when booking a flight is considered comprehensively, more slots come to mind, such as the number of passengers, the airline, the departure airport, and the landing airport; a slot designer can design slots based on the granularity of the intent.
(6) Slot filling
Slot filling extracts the structured fields in a user command; in other words, it reads certain semantic components in a sentence (in the embodiments of this application, the user command). Slot filling can therefore be viewed as a sequence labeling problem. Sequence labeling problems include word segmentation, part-of-speech tagging, named entity recognition (NER), keyword extraction, and semantic role labeling in natural language processing. Given a specific label set, sequence labeling can be carried out. Methods for solving sequence labeling problems include maximum entropy Markov models (MEMM), conditional random fields (CRF), and recurrent neural networks (RNN).
Sequence labeling tags every character in a given text; in essence, it classifies each element of a linear sequence according to its context. That is, for a one-dimensional linear input sequence, each element of the sequence is tagged with some label from a label set. In the embodiments of this application, a slot extraction classifier can label the slots in the text of the user command. In the NLU concerned here, the linear sequence is the text of the user command (text entered by the user, or text recognized from input speech); a Chinese character can often be regarded as one element of the linear sequence. The label set means different things for different tasks; sequence labeling tags each character with an appropriate label according to its context, that is, determines its slot, as in the toy labeling shown below.
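For concreteness, the toy labeling below shows one tag per token, using BIO-style slot tags as an assumed convention (the label names FromLoc/ToLoc echo the example of FIG. 6 later in this description; the segmentation and labels here are illustrative):

```python
# One label per token; "O" marks tokens that fill no slot (illustrative labels).
tokens = ["帮", "我", "订", "深圳", "飞往", "上海", "的", "机票"]
labels = ["O",  "O",  "O",  "B-FromLoc", "O", "B-ToLoc", "O", "O"]
for tok, lab in zip(tokens, labels):
    print(f"{tok}\t{lab}")
```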
Illustratively, when the filling information of a slot is missing from the user command, for example when the user command is "How far is this hotel from Hongqiao Airport?", the machine needs to know which hotel "this hotel" refers to in order to respond. In the prior art, the machine may ask the user "Which hotel's distance from Hongqiao Airport would you like to query?" to obtain the information of that slot. It can be seen that the machine needs multiple interactions with the user to obtain the information of the slots missing from the user command.
To make the objectives, technical solutions, and advantages of this application clearer, this application is further described in detail below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of a possible human-machine dialog system architecture to which embodiments of this application are applicable. As shown in FIG. 1, the architecture may include a server 101 and one or more user-side devices (such as the user-side devices 1021 and 1022 shown in FIG. 1). Optionally, the dialog system architecture may further include one or more agent-side devices (such as the agent-side devices 1023 and 1024 shown in FIG. 1).
The user-side device or agent-side device may be a terminal device, such as a mobile phone, a tablet, or a desktop computer, which is not specifically limited. In the embodiments of this application, client devices are divided, according to their operation interfaces, into user-side devices (client devices operated by users) and agent-side devices (client devices operated by human customer-service agents). In other embodiments, the user-side device may also be called a user client or another name, and the agent-side device may also be called a human agent client or another name, which is not specifically limited.
The user-side device may be used to obtain information input by the user and send the information to the server. For example, if the user enters text information in the dialog box, the user-side device can obtain the text information and send it to the server; if the user enters voice information in the dialog box, the user-side device can convert the speech into text information through speech recognition technology and then send the text information to the server. Optionally, the user-side device may also communicate with the agent-side device, for example sending the user's input information to the agent-side device and receiving the information returned by the agent-side device, so that a human agent can serve the user.
The server is used to handle the various computations required by the human-machine dialog system, such as question-answer matching, that is, searching a preset database according to the user's request information to obtain the answer information corresponding to the request. The preset database may include a question library and a corresponding answer library; the question library includes multiple pieces of preset request information, and the answer library includes the answer information corresponding to them. The server may compare the user's request information with the multiple pieces of preset request information, and feed back to the user-side device the answer information that corresponds, in the question-answer library, to the preset request information with the greatest similarity to the user's request; the user-side device then presents it to the user, possibly in any of various ways, which are not specifically limited.
Taking a task-oriented multi-turn dialog system as an example, its application scenarios include virtual personal assistants and intelligent customer service. At present, when conducting a task-oriented multi-turn dialog with a user, a virtual personal assistant or intelligent customer service can usually return answer information meeting the user's expectation based on the request information input by the user. However, because language expression is highly flexible, the intent recognition task and the slot filling task are currently performed independently when the virtual personal assistant or intelligent customer service understands the semantics of the sentence the user inputs. Since different intents may correspond to different slots, the recognition result may suffer from misalignment between the intent and the slots, so the virtual personal assistant or intelligent customer service may misunderstand the user's semantics and return answer information that does not meet the user's expectation. For example, if the content of the user's speech is "buy a plane ticket to Shanghai tomorrow morning", the intent recognition task of the human-machine dialog system determines the intent as "book a flight", while the slot filling task may label "Shanghai" as "navigation destination". Because the intent recognition task and the slot filling task run independently, the intelligent customer service may very well consider the "navigation origin" slot to be unfilled and therefore send the user erroneous answer information such as the guidance "Is the navigation origin your current location?". In other words, the traditional slot filling process fails to take the intent recognition result into account, causing the intent and slots to be misaligned and degrading the semantic parsing result.
In practical applications it is usually found that slot filling is related to the intent: for example, the same location entity (such as Chenghuangmiao) denotes a restaurant under a food-search intent, while under a navigation intent it may denote an origin or a destination. Based on this analysis, an embodiment of this application provides a slot recognition method that improves the training model of the existing human-machine dialog system. In the improved joint training model, the intent recognition result is used as an input parameter of slot filling; that is, the intent recognition result is associated with slot filling, so that the intent recognition result can be used to correct slot filling. Using the joint training model obtained by this method to semantically parse the user's input information improves the accuracy of the semantic parsing result.
The improved joint training model is shown in FIG. 2. The joint training model includes a BERT encoding layer, a fully connected layer (dense layer), a masked attention layer, and a softmax (logistic regression model) layer. When dialog corpus is input into the joint training model, the logistic regression model of the joint training model can output the intent and slot information corresponding to the corpus. As can be seen from the connection between the dense layer and the masked attention layer, the intent is one input of the masked attention layer, and the dense layer and the masked attention layer together associate the intent recognition result with slot filling.
It should be noted that, during training of the joint training model, the objective loss function for joint optimization of intent and slots is the sum of the intent classification loss, the slot filling loss, and a weight regularization term, where the intent classification loss may be a two-class cross entropy loss and the slot filling loss may be a multi-class cross entropy loss. The training termination condition of the joint training model may be: when the number of training epochs reaches a set threshold, or the batch interval from the last best model exceeds a set threshold, training terminates and the final joint training model is generated.
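To make the joint objective concrete, the following is a minimal sketch in PyTorch, assuming the intent loss is a cross entropy term over intent logits, the slot loss is a token-level cross entropy term, and the regularizer is an L2 penalty over trainable weights; the coefficient l2_coeff and the padding label are illustrative assumptions, not values fixed by this application:

```python
import torch
import torch.nn.functional as F

def joint_loss(intent_logits, intent_target, slot_logits, slot_targets,
               model, l2_coeff=1e-5, pad_label=-100):
    """Joint objective = intent CE + slot CE + L2 regularization (a sketch)."""
    intent_loss = F.cross_entropy(intent_logits, intent_target)
    # slot_logits: (batch, seq_len, num_slots) -> flatten for token-level CE;
    # padded positions carry pad_label and are ignored.
    slot_loss = F.cross_entropy(slot_logits.flatten(0, 1), slot_targets.flatten(),
                                ignore_index=pad_label)
    l2 = sum(p.pow(2).sum() for p in model.parameters() if p.requires_grad)
    return intent_loss + slot_loss + l2_coeff * l2
```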
It should be noted that the slot recognition method provided in the embodiments of this application is applicable to many possible human-machine dialog systems, and is especially suitable for task-oriented multi-turn dialog systems. For ease of understanding the solutions of the embodiments of this application, the slot recognition method is described below taking a task-oriented multi-turn dialog system as an example.
FIG. 3 is a schematic flowchart of a slot recognition method according to an embodiment of this application, including the following steps.
Step 301: after obtaining a user command, the server preprocesses the user command to obtain an original word sequence.
The user command may be one or a combination of speech, image, video, audio-video, text, and the like. For example, if the user command is speech input by the user through a microphone, the user command may also be called a "voice command"; as another example, if the user command is text input by the user through a keyboard or virtual keyboard, the user command may also be called a "text command". Specifically, the server may first tokenize the user command to generate a Token sequence, then shuffle the Token sequence randomly, divide the Token sequence into multiple batches of Token sequences according to batch_size, and finally truncate or pad the Token sequences of each batch to obtain the preprocessed original word sequence. Optionally, the server may also create a mask of the same dimension for the truncated or padded Token sequence, where the mask element is 0 at positions corresponding to <pad> elements of the Token sequence and 1 otherwise.
Illustratively, a mobile phone user may run the voice assistant program on the phone; when the user speaks "play red breast" into the microphone, the voice assistant program converts the speech into English text and sends it to the server corresponding to the program. First, the server serializes the English text, for example using the WordPiece technique to generate a Token sequence from the text. It should be noted that if the text is Chinese, a character-based approach may be used to generate the Token sequence instead. Second, the server shuffles the serialized corpus randomly and divides the corpus into multiple batches according to batch_size; then the Token sequences of each batch are truncated or padded. Specifically, for each Token sequence of each batch, if its length + 2 is less than the predetermined maximum sequence length (usually maxLength = 512), <pad> is appended; if its length + 2 exceeds the predetermined maximum sequence length, the excess Token words are truncated. After truncation/padding, <CLS> is prepended to the Token sequence to mark the classification task, and <SEP> is appended to the end of the Token sequence for sentence segmentation, indicating that what precedes it is a complete sentence, as shown, for example, in Table 1.
Table 1 (rendered as an image in the source: the <CLS>/<SEP>-delimited, padded Token sequence for the example command)
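A hedged sketch of this preprocessing in Python follows; the tokenize callable stands in for WordPiece (or character-based) tokenization, and batch_size and max_length are illustrative defaults rather than values mandated here:

```python
import random

def preprocess(utterances, tokenize, batch_size=32, max_length=512):
    """Tokenize, shuffle, batch, then truncate/pad each Token sequence (a sketch)."""
    sequences = [tokenize(u) for u in utterances]       # e.g. WordPiece tokens
    random.shuffle(sequences)                           # random ordering
    batches = [sequences[i:i + batch_size]
               for i in range(0, len(sequences), batch_size)]
    processed = []
    for batch in batches:
        out = []
        for toks in batch:
            toks = toks[:max_length - 2]                # leave room for <CLS>/<SEP>
            seq = ["<CLS>"] + toks + ["<SEP>"]
            seq += ["<pad>"] * (max_length - len(seq))  # pad short sequences
            mask = [0 if t == "<pad>" else 1 for t in seq]  # mask as described above
            out.append((seq, mask))
        processed.append(out)
    return processed
```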
Step 302: the joint training model in the server first performs BERT encoding on the original word sequence to obtain an intent vector and the hidden state vectors respectively corresponding to the T tokens.
BERT (bidirectional encoder representations from Transformers) is a deep bidirectional pre-trained language understanding model used as a feature extractor. Specifically, the server performs BERT semantic encoding on the original word sequence input into the joint training model to generate a hidden state vector sequence $h_0, h_1, \ldots, h_T$, where $h_0$ is the sentence vector encoding information corresponding to the user command (that is, the encoding vector at the <CLS> position) and $h_1, \ldots, h_T$ are the hidden state vectors respectively corresponding to the T tokens (that is, the encoding vectors at the remaining positions). Further, the BERT encoding layer inputs the sentence vector $h_0$ into the logistic regression (softmax) layer to generate the intent vector, namely

$$y^I = \mathrm{softmax}(W^I h_0 + b^I)$$

where $y^I \in R^{1 \times I}$, I denotes the number of intents corresponding to the user command, the intent corresponding to the maximum probability value in $y^I$ is the intent of the user command, $h_0$ is the sentence vector encoding information of the user command, $b^I$ is a bias term, and $W^I$ is a weight matrix.
Illustratively, when the user inputs the voice command "play red breast" shown in Table 1 and the original word sequence corresponding to the voice command is input into the joint training model, the output intent is "play music", as shown in FIG. 2. In addition, the hidden state vector $h_1$ corresponding to "play", the hidden state vector $h_2$ corresponding to "red", and the hidden state vector $h_3$ corresponding to "breast" are input into the fully connected layer.
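As a rough illustration of step 302, the sketch below uses the HuggingFace transformers API as a stand-in for the BERT encoding layer; the model name, the number of intents I, and the single linear intent head are illustrative assumptions:

```python
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
num_intents = 8                                       # I, illustrative
intent_head = nn.Linear(bert.config.hidden_size, num_intents)  # W^I, b^I

enc = tokenizer("play red breast", return_tensors="pt")
hidden = bert(**enc).last_hidden_state                # h_0 .. h_T (plus <SEP>)
h0 = hidden[:, 0, :]                                  # sentence vector h_0 (<CLS>)
y_I = intent_head(h0).softmax(-1)                     # intent vector y^I in R^{1xI}
intent = y_I.argmax(-1)                               # index of the max probability
```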
Step 303: for a first token in the original word sequence, the first token being any one of the T tokens, the joint training model in the server performs the following processing:
determining the attention vector of the first token according to the hidden state vector of the first token and the intent vector; then concatenating the hidden state vector of the first token with the attention vector of the first token to determine the slot probability vector of the first token; and determining the slot corresponding to the first token according to the K probability values in the slot probability vector of the first token.
The hidden state vector of the first token and the attention vector of the first token may be concatenated as follows: the hidden state vector $h_i$ of the first token and the attention vector $c_i$ of the first token are concatenated to generate deep vector encoding information, where the deep vector encoding information $h_i^S$ satisfies

$$h_i^S = \mathrm{concat}(h_i, c_i)$$

where concat is the concatenation function and $h_i^S$ denotes the concatenated deep vector encoding information. Further, the server inputs the deep vector encoding information $h_i^S$ into the softmax layer for softmax transformation to obtain the slot probability vector of the first token, where the slot probability vector $y_i^S$ of the first token satisfies

$$y_i^S = \mathrm{softmax}(W^S h_i^S + b^S)$$

where softmax denotes the normalized exponential function, $W^S$ denotes a weight matrix, $h_i^S$ denotes the deep vector encoding information, and $b^S$ denotes a bias term.
With reference to FIG. 2: for the first token "play" of the voice command "play red breast", c1 in the fully connected layer is the intent vector $y^I$, and the intent vector $y^I$ and the hidden state vector $h_1$ serve as inputs to the masked attention layer, thereby generating the attention vector corresponding to the first token "play". For the second token "red", c2 in the fully connected layer is the intent vector $y^I$, and the intent vector $y^I$ and the hidden state vector $h_2$ serve as inputs to the masked attention layer, thereby generating the attention vector corresponding to the second token "red". For the third token "breast", c3 in the fully connected layer is the intent vector $y^I$, and the intent vector $y^I$ and the hidden state vector $h_3$ serve as inputs to the masked attention layer, thereby generating the attention vector corresponding to the third token "breast".
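The concatenation and softmax of step 303 can be sketched directly from the formulas above; the dimensions (with K as the number of slot labels) and the random tensors standing in for $h_i$ and the attention vector $c_i$ are illustrative:

```python
import torch
import torch.nn as nn

d_hidden, d_attn, num_slots = 768, 768, 32            # illustrative sizes (K = num_slots)
slot_head = nn.Linear(d_hidden + d_attn, num_slots)   # W^S, b^S

def slot_probability(h_i, attn_vec):
    """h_i^S = concat(h_i, c_i); y_i^S = softmax(W^S h_i^S + b^S)."""
    deep = torch.cat([h_i, attn_vec], dim=-1)          # deep vector encoding h_i^S
    y_S = slot_head(deep).softmax(-1)                  # slot probability vector (K values)
    return y_S, y_S.argmax(-1)                         # slot = index of max probability

y_S, slot_idx = slot_probability(torch.randn(1, d_hidden), torch.randn(1, d_attn))
```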
In another possible implementation, assuming the first token is any one of the T-1 tokens other than the initial token, its attention vector may be determined as follows: the attention vector of the first token is determined according to the hidden state vector of the first token, the intent vector, and the slot probability vector corresponding to the preceding token of the first token. For example, with reference to FIG. 2: for the first token "play" of the voice command "play red breast", c1 in the fully connected layer is the intent vector $y^I$; the intent vector $y^I$ and the hidden state vector $h_1$ serve as inputs to the masked attention layer, thereby generating the attention vector corresponding to the first token "play", from which the slot probability vector $y_1^S$ corresponding to "play" is finally obtained. For the second token "red", c2 in the fully connected layer is the intent vector $y^I$; the intent vector $y^I$, the hidden state vector $h_2$, and the slot probability vector $y_1^S$ corresponding to "play" serve as inputs to the masked attention layer, thereby generating the attention vector $c_2$ corresponding to the second token "red". For the third token "breast", c3 in the fully connected layer is the intent vector $y^I$; the intent vector $y^I$, the hidden state vector $h_3$, and the slot probability vector $y_2^S$ corresponding to the second token "red" serve as inputs to the masked attention layer, thereby generating the attention vector $c_3$ corresponding to the third token "breast".
Specifically, when computing the attention vectors, the masked attention layer may proceed as follows:
Step a: for the hidden vector $h_i$ at each time step (i = 1, ..., T), a query vector $q_i = f(y^I, h_i, y_1^S, \ldots, y_{i-1}^S)$ is formed; it receives only the intent vector information $y^I$, the hidden vector information $h_i$ of the current time step, and the slot output information $y_t^S$ predicted at previous time steps (t from 1 to i-1), where $q_i \in R^{1 \times d}$.
Step b: the key vector information is obtained by the linear transformation $k_i = h_i W^K$, where $k_i \in R^{1 \times d}$; the key vectors of all time steps form a matrix $K = (k_1, k_2, \ldots, k_T) \in R^{T \times d}$.
Step c: the value vector information is obtained by the linear transformation $v_i = h_i W^V$, where $v_i \in R^{1 \times d}$; the value vectors of all time steps form a matrix $V = (v_1, v_2, \ldots, v_T) \in R^{T \times d}$.
Step d: the attention vector information of the current time step is computed as

$$c_i = \mathrm{softmax}\!\left(\frac{q_i K^\top}{\sqrt{d}} + M_i\right) V$$

where $M_i$ is the i-th row of the mask matrix M, which takes the form of an upper-triangular unit matrix: $m_{ij} = 1$ when $i \le j$, and $m_{ij} = -\infty$ when $i > j$ (adding the same constant to all unmasked entries of a row leaves the softmax unchanged, so 1 and 0 are equivalent here).
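Steps a to d can be sketched as follows. The exact map producing the query from $y^I$, $h_i$, and the earlier slot outputs appears only as an image in the source, so a concatenation of $y^I$, $h_i$, and the preceding slot probability vector followed by a linear layer is assumed here; the additive 0/−∞ mask reproduces the upper-triangular visibility rule of step d:

```python
import math
import torch
import torch.nn as nn

d, d_h, n_int, n_slot, T = 64, 768, 8, 32, 5        # illustrative sizes

W_q = nn.Linear(n_int + d_h + n_slot, d)            # query from y^I, h_i, prev slot (assumed form)
W_k = nn.Linear(d_h, d)                             # key:   k_i = linear(h_i)
W_v = nn.Linear(d_h, d)                             # value: v_i = linear(h_i)

H = torch.randn(T, d_h)                             # hidden states h_1..h_T (stand-ins)
y_I = torch.randn(1, n_int).softmax(-1)             # intent vector (stand-in)
K, V = W_k(H), W_v(H)                               # (T, d) each

# Additive mask: 0 where i <= j (visible), -inf where i > j, matching step d.
mask = torch.full((T, T), float("-inf")).tril(-1)

prev_slot = torch.zeros(1, n_slot)                  # no slot prediction before token 1
ctx = []
for i in range(T):
    q_i = W_q(torch.cat([y_I, H[i:i+1], prev_slot], dim=-1))   # (1, d)
    scores = q_i @ K.T / math.sqrt(d) + mask[i:i+1]            # (1, T)
    c_i = scores.softmax(-1) @ V                               # attention vector c_i
    ctx.append(c_i)
    # prev_slot would be updated here from the slot head's y_i^S (omitted).
```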
It can be seen that, in this embodiment of this application, after receiving the user command, the server first performs preprocessing according to step 301 to generate the original word sequence, and then takes the original word sequence as the input of the joint training model; after model prediction and inference, the intent vector $y^I$ and the slot vector sequence $y_i^S$ are obtained. For the intent vector $y^I$, the intent corresponding to the maximum probability can be selected as the predicted intent; for the i-th slot vector $y_i^S$, the slot corresponding to the maximum probability can be selected as the i-th predicted slot. In this method, the intent is fed into the slot filling task as an input to correct the slot prediction result, thereby improving the accuracy with which the dialog system understands the user's request information and enhancing the user experience.
The slot recognition method provided in this application can specifically be applied to the system architecture shown in FIG. 4, in which the NLU module integrates the joint training model. The specific application process of the above method is illustrated below by example with reference to this system architecture, with the following steps.
Step 401: the user opens the voice assistant program on the mobile phone and speaks into the dialog box: 帮我订深圳飞往上海的机票 ("book me a flight ticket from Shenzhen to Shanghai").
Step 402: the ASR module of the voice assistant program converts the voice information into text information, as shown in FIG. 5, and sends the converted text information to the DM (dialog management) module of the voice assistant program.
Step 403: the DM module of the voice assistant program obtains from the dialog box the context information corresponding to the voice information (such as the historical dialog information and the current dialog information in FIG. 5) as well as state information, and the DM module sends the voice information and the other related information to the NLU (natural language understanding) module.
Step 404: the NLU module recognizes the intent and slots in the text 帮我订深圳飞往上海的机票 according to the method provided in the embodiments of this application.
Specifically, as shown in FIG. 6, after BERT semantic encoding of 帮我订深圳飞往上海的机票, the hidden state vector sequence $h_0, h_1, \ldots, h_8$ is generated; the BERT encoding layer inputs the sentence vector $h_0$ into the softmax layer to generate the intent vector $y^I$, and the intent corresponding to the maximum probability value in $y^I$ is "book a flight". During slot filling, the hidden state vector $h_1$ and the intent vector $y^I$ serve as inputs to generate the slot probability vector $y_1^S$ corresponding to the first token 帮 ("help"); the slot corresponding to the maximum probability value in $y_1^S$ is empty, so it is labeled "o". Further, during slot filling, the hidden state vector $h_4$ and the intent vector $y^I$, together with the slot probability vector $y_3^S$ of the preceding token 订 ("book"), serve as inputs to generate the slot probability vector $y_4^S$ corresponding to the fourth token 深圳 (Shenzhen); the slot corresponding to the maximum probability value in $y_4^S$ is "origin", so it is labeled "origin" or "FromLoc". By analogy, during slot filling, the hidden state vector $h_6$ and the intent vector $y^I$, together with the slot probability vector of the preceding token, serve as inputs to generate the slot probability vector $y_6^S$ corresponding to the sixth token 上海 (Shanghai); the slot corresponding to the maximum probability value in $y_6^S$ is "destination", so it is labeled "destination" or "ToLoc". The joint training model therefore outputs, for 帮我订深圳飞往上海的机票, the intent "book a flight", the slot "origin" for 深圳, and the slot "destination" for 上海.
Step 405: the NLU module returns the intent and slot recognition results to the DM module.
Step 406: the DM module inputs the intent and slot recognition results into the NLG (natural language generation) module.
The DM module is divided into two submodules, dialog state tracking (DST) and dialog policy learning (DPL); its main role is to update the state of the dialog system according to the recognition results of the NLU module and to generate the corresponding system action, for example querying flights.
Steps 407 and 408: the NLG module turns the system action output by the DM module into text, expressing the system's action in textual form, and sends it together with other related information to the DM module.
Step 409: the DM module sends the system action execution result to the TTS (text-to-speech) module.
Step 410: the TTS module converts the text into speech and outputs the speech to the user, for example the speech content corresponding to the queried flight information.
Further, if the server determines that a slot remains vacant, for example the "time" slot of the request 帮我订深圳飞往上海的机票 is vacant, the server may further send guidance information to the user-side device, for example "Which day would you like to book the ticket for?", as shown in FIG. 5; the guidance information is used to guide the user to provide information associated with the request information. In this way, the guidance information guides the user to provide the associated information of the request, making it easier for the server to find, in the preset database, answer information that meets the user's expectation based on the associated information of the request, thereby avoiding unnecessary transfers to human agents and improving user satisfaction.
In this embodiment of this application, the guidance information may also include third historical dialog information, where the similarity between the historical request information in the third historical dialog information and the current request information is greater than a sixth threshold; that is, the guidance information may additionally attach historical request information similar to the user's request information, thereby reminding the user that a similar question has been asked before.
Understandably, in other possible cases the guidance information may include other possible content, and those skilled in the art can set the content included in the guidance information based on practical experience and needs. Any information sent to the user that serves to guide, prompt, or reassure in actively seeking answer information that meets the user's expectation falls within the protection scope of the present invention.
It can be seen that in this embodiment of this application, intent recognition and slot filling are performed jointly, and the intent is taken into account during slot filling, so slot filling is finer-grained and more accurate; both tasks can be completed well by training only one joint model.
It should be noted that: (1) the above step numbering is only an example of the execution flow of the embodiments of this application, and steps with no temporal dependency on each other have no strict execution order. Not all of steps 401 to 410 are mandatory; in specific implementations, some of them may be selectively executed according to actual needs.
The above mainly describes the solutions provided by this application from the perspective of interaction among the devices. It can be understood that, to implement the above functions, each device includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art should easily realize that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the present invention can be implemented by hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Professionals may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
When integrated units are used, FIG. 7 shows a possible exemplary block diagram of the apparatus involved in the embodiments of this application; the apparatus 700 may exist in software form. The apparatus 700 may include a processing unit 702 and a communication unit 703. The processing unit 702 is used to control and manage the actions of the apparatus 700. The communication unit 703 is used to support communication between the apparatus 700 and other devices (such as user-side devices or agent-side devices). The apparatus 700 may further include a storage unit 701 for storing program code and data of the apparatus 700.
The processing unit 702 may be a processor or controller, for example a general-purpose central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the disclosure of the present invention. The processor may also be a combination implementing computing functions, for example a combination including one or more microprocessors, or a combination of a DSP and a microprocessor. The communication unit 703 may be a communication interface, a transceiver, a transceiver circuit, or the like, where "communication interface" is a collective term and, in specific implementations, may include multiple interfaces. The storage unit 701 may be a memory.
The apparatus 700 may be the server in the above embodiments, or may be a semiconductor chip disposed in the server. The processing unit 702 can support the apparatus 700 in performing the actions of the server in the method examples above, and the communication unit 703 can support communication between the apparatus 700 and user-side devices or agent-side devices; for example, the processing unit 702 is configured to support the apparatus 700 in performing steps 301 to 303 in FIG. 3, and the communication unit 703 is configured to support the apparatus 700 in performing step 405 in FIG. 4.
Specifically, in one embodiment, the processing unit 702 is configured to: after obtaining a user command, preprocess the user command to obtain an original word sequence; the joint training model in the server first performs BERT encoding on the original word sequence to obtain an intent vector and the hidden state vectors respectively corresponding to the T tokens. For a first token in the original word sequence, the first token being any one of the T tokens, the joint training model in the server performs the following processing: determining the attention vector of the first token according to the hidden state vector of the first token and the intent vector; then concatenating the hidden state vector of the first token with the attention vector of the first token to determine the slot probability vector of the first token; and selecting, from the K probability values in the slot probability vector of the first token, the slot corresponding to the maximum probability value as the slot corresponding to the first token.
In one possible embodiment, assuming the first token is any one of the T-1 tokens other than the initial token, when determining the attention vector of the first token according to the hidden state vector of the first token and the intent, the processing unit 702 may specifically determine the attention vector of the first token according to the hidden state vector of the first token, the intent, and the slot probability vector corresponding to the preceding token of the first token.
For related specific implementations, refer to the content of the above method; details are not repeated here.
Refer to FIG. 8, a schematic diagram of an apparatus provided by this application; the apparatus may be the above server, or may be a chip disposed in the server. The apparatus 800 includes a processor 802, a communication interface 803, and a memory 801. Optionally, the apparatus 800 may further include a communication line 804. The communication interface 803, the processor 802, and the memory 801 may be connected to each other through the communication line 804; the communication line 804 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The communication line 804 may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 8, but this does not mean there is only one bus or one type of bus.
The processor 802 may be a CPU, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the programs of the solutions of this application.
The communication interface 803 uses any transceiver-like apparatus for communicating with other devices or communication networks, such as Ethernet, a radio access network (RAN), a wireless local area network (WLAN), or a wired access network.
The memory 801 may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, or an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium capable of carrying or storing desired program code in the form of instructions or data structures and accessible by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through the communication line 804, or the memory may be integrated with the processor.
The memory 801 is configured to store computer-executable instructions for executing the solutions of this application, and the execution is controlled by the processor 802. The processor 802 is configured to execute the computer-executable instructions stored in the memory 801, thereby implementing the methods provided in the above embodiments of this application.
Optionally, the computer-executable instructions in the embodiments of this application may also be called application program code, which is not specifically limited in the embodiments of this application.
In the above embodiments, implementation may be wholly or partly by software, hardware, firmware, or any combination thereof. When software is used for implementation, implementation may be wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present invention are wholly or partly generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (such as a floppy disk, hard disk, or magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (SSD)), or the like.
This application is described with reference to the flowcharts and/or block diagrams of the method, device (system), and computer program product according to this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, where the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
An embodiment of this application further provides a computer storage medium storing computer instructions; when the computer instructions run on an electronic device, the electronic device is caused to perform the above related method steps to implement the methods in the above embodiments.
An embodiment of this application further provides a computer program product; when the computer program product runs on a computer, the computer is caused to perform the above related steps to implement the methods in the above embodiments.
In addition, an embodiment of this application further provides an apparatus, which may specifically be a chip, component, or module; the apparatus may include a connected processor and memory, where the memory is configured to store computer-executable instructions, and when the apparatus runs, the processor can execute the computer-executable instructions stored in the memory so that the chip performs the methods in the above method embodiments.
From the description of the above implementations, those skilled in the art can understand that, for convenience and brevity of description, the division into the above functional modules is merely used as an example for illustration; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division into modules or units is merely a logical functional division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another apparatus, or some features may be discarded or not performed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may be one physical unit or multiple physical units, that is, they may be located in one place or distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on such an understanding, the technical solutions of the embodiments of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods of the embodiments of this application. The aforementioned storage media include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above is only the specific implementation of this application, but the protection scope of this application is not limited thereto; any changes or substitutions that a person skilled in the art can easily think of within the technical scope disclosed in this application shall be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (12)

  1. A slot recognition method, comprising:
    preprocessing a user command to obtain an original word sequence;
    performing deep bidirectional pre-trained language understanding model (BERT) encoding on the original word sequence to obtain an intent vector and hidden state vectors respectively corresponding to T tokens in the original word sequence;
    for a first token among the T tokens, the first token being any one of the T tokens, performing the following processing:
    determining an attention vector of the first token according to the hidden state vector of the first token and the intent vector;
    concatenating the hidden state vector of the first token and the attention vector of the first token to determine a slot probability vector of the first token; and
    determining a slot corresponding to the first token according to K probability values in the slot probability vector of the first token.
  2. The method according to claim 1, wherein determining the attention vector of the first token according to the hidden state vector of the first token and the intent vector comprises:
    determining the attention vector of the first token according to the hidden state vector of the first token, the intent vector, and the slot probability vector corresponding to the preceding token of the first token;
    wherein the first token is any one of the T-1 tokens other than the initial token among the T tokens.
  3. The method according to claim 1 or 2, wherein preprocessing the user command to obtain the original word sequence comprises:
    generating a Token sequence from the user command;
    shuffling the Token sequence randomly, and dividing the Token sequence into multiple batches of Token sequences according to a batch size batch_size; and
    truncating or padding the Token sequences of each batch to obtain the preprocessed original word sequence.
  4. The method according to any one of claims 1 to 3, wherein performing BERT encoding on the original word sequence to obtain the intent vector and the hidden state vectors respectively corresponding to the T tokens comprises:
    performing BERT semantic encoding on the original word sequence to generate a vector sequence $h_0, h_1, \ldots, h_T$, wherein $h_0$ is sentence vector encoding information of the user command and $h_1, \ldots, h_T$ are the hidden state vectors respectively corresponding to the T tokens; and
    generating the intent vector of the user command according to the sentence vector encoding information $h_0$ of the user command, wherein the intent vector satisfies
    $$y^I = \mathrm{softmax}(W^I h_0 + b^I)$$
    wherein $y^I \in R^{1 \times I}$, I denotes the number of possible intents of the user command, the intent corresponding to the maximum probability value in $y^I$ is the intent of the user command, $h_0$ is the sentence vector encoding information of the user command, $b^I$ is a bias term, and $W^I$ is a weight matrix.
  5. The method according to any one of claims 1 to 4, wherein concatenating the hidden state vector of the first token and the attention vector of the first token to determine the slot probability vector of the first token comprises:
    concatenating the hidden state vector $h_i$ of the first token and the attention vector $c_i$ of the first token to generate deep vector encoding information, wherein the deep vector encoding information $h_i^S$ satisfies
    $$h_i^S = \mathrm{concat}(h_i, c_i)$$
    wherein concat is a concatenation function and $h_i^S$ denotes the concatenated deep vector encoding information; and
    performing logistic regression model softmax transformation on the deep vector encoding information $h_i^S$ to obtain the slot probability vector of the first token, wherein the slot probability vector $y_i^S$ of the first token satisfies
    $$y_i^S = \mathrm{softmax}(W^S h_i^S + b^S)$$
    wherein softmax denotes a normalized exponential function, $W^S$ denotes a weight matrix, $h_i^S$ denotes the deep vector encoding information, and $b^S$ denotes a bias term.
  6. An electronic device, comprising a processor and a memory;
    wherein the memory stores program instructions; and
    the processor is configured to run the program instructions stored in the memory, so that the electronic device performs:
    preprocessing a user command to obtain an original word sequence;
    performing deep bidirectional pre-trained language understanding model (BERT) encoding on the original word sequence to obtain an intent vector and hidden state vectors respectively corresponding to T tokens;
    for a first token among the T tokens, the first token being any one of the T tokens, performing the following processing:
    determining an attention vector of the first token according to the hidden state vector of the first token and the intent vector;
    concatenating the hidden state vector of the first token and the attention vector of the first token to determine a slot probability vector of the first token; and
    determining a slot corresponding to the first token according to K probability values in the slot probability vector of the first token.
  7. The electronic device according to claim 6, wherein, when determining the attention vector of the first token according to the hidden state vector of the first token and the intent vector, the processor specifically performs:
    determining the attention vector of the first token according to the hidden state vector of the first token, the intent vector, and the slot corresponding to the preceding token of the first token;
    wherein the first token is any one of the T-1 tokens other than the initial token among the T tokens.
  8. The electronic device according to claim 6 or 7, wherein, when preprocessing the user command to obtain the original word sequence, the processor specifically performs:
    generating a Token sequence from the user command;
    shuffling the Token sequence randomly, and dividing the Token sequence into multiple batches of Token sequences according to a batch size batch_size; and
    truncating or padding the Token sequences of each batch to obtain the preprocessed original word sequence.
  9. The electronic device according to any one of claims 6 to 8, wherein, when performing BERT encoding on the original word sequence to obtain the intent vector and the hidden state vectors respectively corresponding to the T tokens, the processor specifically performs:
    performing BERT semantic encoding on the original word sequence to generate a vector sequence $h_0, h_1, \ldots, h_T$, wherein $h_0$ is sentence vector encoding information of the user command and $h_1, \ldots, h_T$ are the hidden state vectors respectively corresponding to the T tokens; and
    generating the intent vector of the user command according to the sentence vector encoding information $h_0$ of the user command, wherein the intent vector satisfies
    $$y^I = \mathrm{softmax}(W^I h_0 + b^I)$$
    wherein $y^I \in R^{1 \times I}$, I denotes the number of possible intents of the user command, the intent corresponding to the maximum probability value in $y^I$ is the intent of the user command, $h_0$ is the sentence vector encoding information of the user command, $b^I$ is a bias term, and $W^I$ is a weight matrix.
  10. The electronic device according to any one of claims 6 to 9, wherein, when concatenating the hidden state vector of the first token and the attention vector of the first token to determine the slot probability vector of the first token, the processor specifically performs:
    concatenating the hidden state vector $h_i$ of the first token and the attention vector $c_i$ of the first token to generate deep vector encoding information, wherein the deep vector encoding information $h_i^S$ satisfies
    $$h_i^S = \mathrm{concat}(h_i, c_i)$$
    wherein concat is a concatenation function and $h_i^S$ denotes the concatenated deep vector encoding information; and
    performing logistic regression model softmax transformation on the deep vector encoding information $h_i^S$ to obtain the slot probability vector of the first token, wherein the slot probability vector $y_i^S$ of the first token satisfies
    $$y_i^S = \mathrm{softmax}(W^S h_i^S + b^S)$$
    wherein softmax denotes a normalized exponential function, $W^S$ denotes a weight matrix, $h_i^S$ denotes the deep vector encoding information, and $b^S$ denotes a bias term.
  11. A computer-readable storage medium, wherein the computer-readable storage medium comprises program instructions; when the program instructions run on a processor, the processor is caused to perform the method according to any one of claims 1 to 5.
  12. A chip, wherein the chip is coupled to a memory and is configured to execute a computer program stored in the memory, so as to perform the method according to any one of claims 1 to 5.
PCT/CN2021/078762 2020-03-23 2021-03-02 一种槽位识别方法及电子设备 WO2021190259A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010210034.9A CN113505591A (zh) 2020-03-23 2020-03-23 一种槽位识别方法及电子设备
CN202010210034.9 2020-03-23

Publications (1)

Publication Number Publication Date
WO2021190259A1 true WO2021190259A1 (zh) 2021-09-30

Family

ID=77890951

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/078762 WO2021190259A1 (zh) 2020-03-23 2021-03-02 一种槽位识别方法及电子设备

Country Status (2)

Country Link
CN (1) CN113505591A (zh)
WO (1) WO2021190259A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737996B (zh) * 2020-05-29 2024-03-26 北京百度网讯科技有限公司 基于语言模型获取词向量的方法、装置、设备及存储介质
CN115273849B (zh) * 2022-09-27 2022-12-27 北京宝兰德软件股份有限公司 一种关于音频数据的意图识别方法及装置
CN115935994B (zh) * 2022-12-12 2024-03-08 芽米科技(广州)有限公司 一种智能识别电商标题方法
CN116153313A (zh) * 2023-04-07 2023-05-23 广州小鹏汽车科技有限公司 语音交互方法、服务器和计算机可读存储介质
CN116092495B (zh) * 2023-04-07 2023-08-29 广州小鹏汽车科技有限公司 语音交互方法、服务器和计算机可读存储介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109785833A (zh) * 2019-01-02 2019-05-21 苏宁易购集团股份有限公司 用于智能设备的人机交互语音识别方法及系统
CN110008476A (zh) * 2019-04-10 2019-07-12 出门问问信息科技有限公司 语义解析方法、装置、设备及存储介质
CN110334210A (zh) * 2019-05-30 2019-10-15 哈尔滨理工大学 一种基于bert与lstm、cnn融合的中文情感分析方法
CN110309514A (zh) * 2019-07-09 2019-10-08 北京金山数字娱乐科技有限公司 一种语义识别方法及装置
CN110413785A (zh) * 2019-07-25 2019-11-05 淮阴工学院 一种基于bert和特征融合的文本自动分类方法
CN110853626A (zh) * 2019-10-21 2020-02-28 成都信息工程大学 基于双向注意力神经网络的对话理解方法、装置及设备

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QIAN CHEN, ZHU ZHUO, WEN WANG: "BERT for Joint Intent Classification and Slot Filling", ARXIV.ORG, 28 February 2019 (2019-02-28), pages 1-6, XP081034620 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114302028A (zh) * 2021-12-24 2022-04-08 贝壳找房网(北京)信息技术有限公司 提词方法、装置以及电子设备、存储介质、程序产品
CN114021582A (zh) * 2021-12-30 2022-02-08 深圳市北科瑞声科技股份有限公司 结合语音信息的口语理解方法、装置、设备及存储介质
CN114021582B (zh) * 2021-12-30 2022-04-01 深圳市北科瑞声科技股份有限公司 结合语音信息的口语理解方法、装置、设备及存储介质
CN115358186A (zh) * 2022-08-31 2022-11-18 南京擎盾信息科技有限公司 一种槽位标签的生成方法、装置及存储介质
CN115358186B (zh) * 2022-08-31 2023-11-14 南京擎盾信息科技有限公司 一种槽位标签的生成方法、装置及存储介质
CN117725189A (zh) * 2024-02-18 2024-03-19 国家超级计算天津中心 专业领域的生成式问答方法及电子设备
CN117725189B (zh) * 2024-02-18 2024-04-16 国家超级计算天津中心 专业领域的生成式问答方法及电子设备

Also Published As

Publication number Publication date
CN113505591A (zh) 2021-10-15

Similar Documents

Publication Publication Date Title
WO2021190259A1 (zh) 一种槽位识别方法及电子设备
US11676067B2 (en) System and method for creating data to train a conversational bot
CN111862977B (zh) 一种语音对话处理方法和系统
WO2020140487A1 (zh) 用于智能设备的人机交互语音识别方法及系统
WO2021042904A1 (zh) 会话意图识别方法、装置、计算机设备和存储介质
CN112100349A (zh) 一种多轮对话方法、装置、电子设备及存储介质
US20240153489A1 (en) Data driven dialog management
CN110597961B (zh) 一种文本类目标注方法、装置、电子设备及存储介质
US11481387B2 (en) Facet-based conversational search
JP2023522083A (ja) 音声認識方法、装置、設備及び記憶媒体
WO2022252636A1 (zh) 基于人工智能的回答生成方法、装置、设备及存储介质
WO2021147041A1 (zh) 语义分析方法、装置、设备及存储介质
US10872601B1 (en) Natural language processing
WO2021063089A1 (zh) 规则匹配方法、规则匹配装置、存储介质及电子设备
CA3050202A1 (en) Visualization interface for voice input
US20230094730A1 (en) Model training method and method for human-machine interaction
US11170765B2 (en) Contextual multi-channel speech to text
CN114220461A (zh) 客服话术的引导方法、装置、设备及存储介质
CN116304748A (zh) 一种文本相似度计算方法、系统、设备及介质
CN111209297B (zh) 数据查询方法、装置、电子设备及存储介质
CN115392264A (zh) 一种基于rasa的任务型智能多轮对话方法及相关设备
WO2023045186A1 (zh) 意图识别方法、装置、电子设备和存储介质
CN111368066B (zh) 获取对话摘要的方法、装置和计算机可读存储介质
CN112036186A (zh) 语料标注方法、装置、计算机存储介质及电子设备
WO2023137903A1 (zh) 基于粗糙语义的回复语句确定方法、装置及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21774763

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21774763

Country of ref document: EP

Kind code of ref document: A1