WO2020140487A1 - Human-machine interactive speech recognition method and system for smart devices - Google Patents

Human-machine interactive speech recognition method and system for smart devices

Info

Publication number
WO2020140487A1
Authority
WO
WIPO (PCT)
Prior art keywords
slot
vector
context
intent
word sequence
Prior art date
Application number
PCT/CN2019/106778
Other languages
English (en)
French (fr)
Inventor
孙鹏飞
贾洪园
李春生
Original Assignee
苏宁云计算有限公司
苏宁易购集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏宁云计算有限公司, 苏宁易购集团股份有限公司 filed Critical 苏宁云计算有限公司
Priority to CA3166784A1 (en)
Publication of WO2020140487A1 (zh)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26 Speech to text systems
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/22 Interactive procedures; Man-machine interfaces

Definitions

  • The present invention relates to the field of speech recognition technology, and in particular to a human-machine interactive speech recognition method and system for smart devices.
  • For intent recognition, the task can be abstracted as a classification problem, and a classifier combining a CNN with knowledge representations can then be used to train the intent recognition model. Besides word-embedding the user's spoken question, the intent recognition model also introduces semantic knowledge representations to increase the generalization ability of the representation layer; in practical applications, however, the model suffers from slot-information filling deviation, which degrades the accuracy of the intent recognition model.
  • Slot filling is essentially the task of formalizing a sentence sequence into a labeled sequence. Many sequence-labeling methods exist, such as hidden Markov models or conditional random field models, but in specific application scenarios these slot filling models lack context information, so slots become ambiguous under different semantic intents.
  • An object of the present invention is to provide a human-machine interactive speech recognition method and system for smart devices that improves speech recognition accuracy through joint optimization training of intent recognition and slot filling.
  • One aspect of the present invention provides a human-machine interactive speech recognition method for a smart device, including:
  • segmenting the user's spoken question to obtain an original word sequence, and vectorizing the original word sequence through embedding;
  • using a slot gate g to concatenate the slot context vector c_i^S and the intent context vector c^I, and re-expressing the slot label model y_i^S through the slot gate g;
  • jointly optimizing an intent prediction model y^I and the converted slot label model y_i^S to construct an objective function, and performing intent recognition on the user's spoken question based on the objective function.
  • The method of segmenting the user's spoken question to obtain the original word sequence and vectorizing the original word sequence through embedding includes:
  • applying word embedding to the original word sequence to obtain a vectorized representation of each word in the original word sequence.
  • The method of computing the hidden state vector h_i and the slot context vector c_i^S of each word vector and obtaining the slot label model y_i^S by weighting the hidden state vector h_i and the slot context vector c_i^S includes:
  • The method of computing the hidden state vector h_T and the intent context vector c^I of the vectorized original word sequence and obtaining the intent prediction model y^I by weighting the hidden state vector h_T and the intent context vector c^I includes:
  • The method of using the slot gate g to concatenate the slot context vector c_i^S and the intent context vector c^I and re-expressing the slot label model y_i^S through the slot gate g includes:
  • v represents the weight vector obtained by training
  • W represents the weight matrix obtained by training
  • The objective function constructed by jointly optimizing the intent prediction model y^I and the converted slot label model y_i^S is p(y^S, y^I | X) = p(y^I | X) · Π_{i=1}^{T} p(y_i^S | X), where p(y^S, y^I | X) represents the conditional probability of the slot filling and intent prediction outputs given the original word sequence, and X is the vectorized original word sequence.
  • The method of performing intent recognition on the user's spoken question based on the objective function includes:
  • selecting the word with the highest probability value and recognizing it as the intent of the user's spoken question.
  • Compared with the prior art, the human-machine interactive speech recognition method for smart devices provided by the present invention has the following beneficial effects:
  • The acquired user's spoken question is first converted into recognizable text, and the original word sequence is generated by segmenting that text; word embedding is then applied to the original word sequence to obtain its vectorized representation. After that, the slot label model y_i^S and the intent prediction model y^I are constructed from the vectorized original word sequence. The slot label model y_i^S is constructed by computing the hidden state vector h_i and the slot context vector c_i^S of each word vector and then weighting them; the intent prediction model y^I is constructed by computing the hidden state vector h_T and the intent context vector c^I of the original word sequence and then weighting them.
  • To fuse the intent prediction model y^I and the slot label model y_i^S, an additional decoder layer is added to the existing encoder-decoder architecture to construct the intent prediction model y^I, and a slot gate g is introduced to concatenate the slot context vector c_i^S and the intent context vector c^I. Finally, the intent prediction model y^I and the converted slot label model y_i^S are jointly optimized to obtain an objective function, which is used to obtain, in turn, the intent conditional probability corresponding to each word in the original word sequence; the word with the highest probability value is then selected as the intent of the user's spoken question, which ensures the accuracy of speech recognition.
  • Another aspect of the present invention provides a human-machine interactive speech recognition system for smart devices, applied in the human-machine interactive speech recognition method for smart devices described in the above technical solution, the system including:
  • a word segmentation unit, used to segment the user's spoken question to obtain the original word sequence, and to vectorize the original word sequence through embedding;
  • a first computation unit, used to compute the hidden state vector h_i and the slot context vector c_i^S of each word vector, and to obtain the slot label model y_i^S by weighting the hidden state vector h_i and the slot context vector c_i^S;
  • a second computation unit, used to compute the hidden state vector h_T and the intent context vector c^I of the vectorized original word sequence, and to obtain the intent prediction model y^I by weighting the hidden state vector h_T and the intent context vector c^I;
  • a model conversion unit, used to concatenate the slot context vector c_i^S and the intent context vector c^I with a slot gate g, and to re-express the slot label model y_i^S through the slot gate g;
  • a joint optimization unit, used to jointly optimize the intent prediction model y^I and the converted slot label model y_i^S to construct an objective function, and to perform intent recognition on the user's spoken question based on the objective function.
  • The word segmentation unit includes:
  • a word segmentation module, used to convert the received user's spoken question into recognizable text, and to segment the recognizable text with a tokenizer to obtain the original word sequence;
  • an embedding module, used to apply word embedding to the original word sequence to obtain a vectorized representation of each word in the original word sequence.
  • The first computation unit includes:
  • a hidden state computation module, used to encode each word vector with a bidirectional LSTM network and to output the hidden state vector h_i corresponding to each word vector;
  • a slot context computation module, used to compute the slot context vector c_i^S corresponding to each word vector through the formula c_i^S = Σ_{j=1}^{T} α_{i,j}^S · h_j, where α_{i,j}^S denotes the slot attention weight, σ denotes the slot activation function, and W^S denotes the slot weight matrix;
  • a slot label model module, used to construct the slot label model based on the hidden state vector h_i and the slot context vector c_i^S.
  • The beneficial effects of the human-machine interactive speech recognition system for smart devices provided by the present invention are the same as those of the human-machine interactive speech recognition method provided by the foregoing technical solutions, and are not repeated here.
  • FIG. 1 is a schematic flowchart of a human-machine interactive speech recognition method for a smart device according to Embodiment 1 of the present invention;
  • FIG. 2 is an example diagram of the encoder-decoder fusion model in Embodiment 1 of the present invention;
  • FIG. 3 is an example diagram of the slot gate g in FIG. 2;
  • FIG. 4 is a structural block diagram of a human-machine interactive speech recognition system for smart devices in Embodiment 2 of the present invention.
  • FIG. 1 is a schematic flowchart of a human-machine interactive speech recognition method for a smart device according to Embodiment 1 of the present invention.
  • Referring to FIG. 1, this embodiment provides a human-machine interactive speech recognition method for a smart device, including:
  • The acquired user's spoken question is first converted into recognizable text, and the original word sequence is generated by segmenting that text; word embedding is then applied to the original word sequence to obtain its vectorized representation.
  • The slot label model y_i^S and the intent prediction model y^I are then constructed from the vectorized original word sequence.
  • The slot label model y_i^S is constructed by computing the hidden state vector h_i and the slot context vector c_i^S of each word vector and then weighting the hidden state vector h_i and the slot context vector c_i^S to obtain the slot label model y_i^S; the intent prediction model y^I is constructed by computing the hidden state vector h_T and the intent context vector c^I of the original word sequence and then weighting the hidden state vector h_T and the intent context vector c^I to obtain the intent prediction model y^I, as shown in FIG. 2.
  • The slot context vector c_i^S and the intent context vector c^I are concatenated through the slot gate g.
  • The intent prediction model y^I and the converted slot label model y_i^S are jointly optimized to obtain the objective function, which is used to obtain, in turn, the intent conditional probability corresponding to each word in the original word sequence; the word with the highest probability value is then selected as the intent of the user's spoken question, ensuring the accuracy of speech recognition.
  • The method of segmenting the user's spoken question to obtain the original word sequence and vectorizing the original word sequence through embedding includes:
  • The received user's spoken question is converted into recognizable text, and a tokenizer is used to segment the recognizable text to obtain the original word sequence; word embedding is applied to the original word sequence to obtain a vectorized representation of each word in the original word sequence.
  • A bidirectional LSTM network is used to encode each word vector and output the corresponding hidden state vector h_i. The slot context vector c_i^S corresponding to each word vector is computed through the formula c_i^S = Σ_{j=1}^{T} α_{i,j}^S · h_j, where α_{i,j}^S denotes the slot attention weight, computed as α_{i,j}^S = exp(e_{i,j}) / Σ_{k=1}^{T} exp(e_{i,k}); σ denotes the slot activation function and W^S denotes the slot weight matrix. The slot label model y_i^S = softmax(W_hy^S · (h_i + c_i^S)) is constructed based on the hidden state vector h_i and the slot context vector c_i^S.
  • In a specific implementation, feeding the word vectors into the bidirectional LSTM network yields one hidden state vector h_i per word vector. In the slot context formula, i indexes the i-th word vector and j the j-th element of the i-th word vector; in the attention weight formula, T denotes the total number of elements in the word vector and k indexes the k-th of the T elements.
  • The slot activation function σ and the slot weight matrix W^S can be derived by training on the vector matrix of the original word sequence; the specific training process is a common technique in the art, which is not repeated here in this embodiment.
  • The method of computing the hidden state vector h_T and the intent context vector c^I of the vectorized original word sequence in the above embodiment, and obtaining the intent prediction model y^I by weighting the hidden state vector h_T and the intent context vector c^I, includes:
  • The training method of the intent prediction model y^I is the same as that of the slot label model y_i^S, the difference being that the hidden state vector h_T is obtained using only the hidden units of the bidirectional LSTM network; after flattening the vector matrix into one dimension, the formula c^I = Σ_{j=1}^{T} α_j^I · h_j is invoked to compute the intent context vector c^I of the original word sequence.
  • Here α_j^I denotes the intent attention weight, σ′ denotes the intent activation function, and W^I denotes the intent weight matrix; the intent activation function σ′ and the intent weight matrix W^I can be derived by training on the flattened one-dimensional vectors.
  • The specific training process is a common technique in the art, and this embodiment does not repeat it here.
  • The method of using the slot gate g to concatenate the slot context vector c_i^S and the intent context vector c^I in the above embodiment, and re-expressing the slot label model y_i^S through the slot gate g, includes: the slot gate is formalized as g = Σ v · tanh(c_i^S + W · c^I), and the converted slot label model as y_i^S = softmax(W_hy^S · (h_i + c_i^S · g)), where:
  • v represents the weight vector obtained by training
  • W represents the weight matrix obtained by training
  • Fig. 3 shows a structural model of the slot gate g.
  • The objective function constructed by jointly optimizing the intent prediction model y^I and the converted slot label model y_i^S in the above embodiment is p(y^S, y^I | X) = p(y^I | X) · Π_{i=1}^{T} p(y_i^S | X).
  • p(y^S, y^I | X) represents the conditional probability of the slot filling and intent prediction outputs given the original word sequence, where X represents the vectorized original word sequence.
  • x_i represents the i-th word vector;
  • T represents the total number of word vectors.
  • This embodiment provides a human-machine interactive speech recognition system for smart devices, including:
  • a word segmentation unit 1, used to segment the user's spoken question to obtain an original word sequence, and to vectorize the original word sequence through embedding;
  • a second computation unit 3, used to compute the hidden state vector h_T and the intent context vector c^I of the vectorized original word sequence, and to obtain the intent prediction model y^I by weighting the hidden state vector h_T and the intent context vector c^I;
  • a joint optimization unit 5, used to jointly optimize the intent prediction model y^I and the converted slot label model y_i^S to construct an objective function, and to perform intent recognition on the user's spoken question based on the objective function.
  • The word segmentation unit includes:
  • a word segmentation module, used to convert the received user's spoken question into recognizable text, and to segment the recognizable text with a tokenizer to obtain the original word sequence;
  • an embedding module, used to apply word embedding to the original word sequence to obtain a vectorized representation of each word in the original word sequence.
  • The first computation unit includes:
  • a hidden state computation module, used to encode each word vector with a bidirectional LSTM network and to output the hidden state vector h_i corresponding to each word vector;
  • a slot context computation module, used to compute the slot context vector c_i^S corresponding to each word vector through the formula c_i^S = Σ_{j=1}^{T} α_{i,j}^S · h_j, where α_{i,j}^S denotes the slot attention weight, σ denotes the slot activation function, and W^S denotes the slot weight matrix;
  • a slot label model module, used to construct the slot label model based on the hidden state vector h_i and the slot context vector c_i^S.
  • The beneficial effects of the human-machine interactive speech recognition system for smart devices provided by the embodiments of the present invention are the same as those of the human-machine interactive speech recognition method provided by Embodiment 1, and are not repeated here.
  • The above program can be stored in a computer-readable storage medium; when executed, the program performs the steps of the method in the foregoing embodiments, and the storage medium may be a ROM/RAM, magnetic disk, optical disc, memory card, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

A human-machine interactive speech recognition method and system for smart devices, belonging to the field of speech recognition technology, which improves speech recognition accuracy through joint optimization training of intent recognition and slot filling. The method includes: segmenting the user's spoken question to obtain an original word sequence, and vectorizing the original word sequence through embedding; weighting the hidden state vector h_i and the slot context vector c_i^S to obtain a slot label model y_i^S; weighting the hidden state vector h_T and the intent context vector c^I to obtain an intent prediction model y^I; using a slot gate g to concatenate the slot context vector c_i^S and the intent context vector c^I, and re-expressing the slot label model y_i^S through the slot gate g; and jointly optimizing the intent prediction model y^I and the converted slot label model y_i^S to construct an objective function, and performing intent recognition on the user's spoken question based on the objective function.

Description

Human-Machine Interactive Speech Recognition Method and System for Smart Devices
Technical Field
The present invention relates to the field of speech recognition technology, and in particular to a human-machine interactive speech recognition method and system for smart devices.
Background Art
With the development of Internet technology, more and more smart devices use speech for human-machine interaction; existing voice interaction systems include Siri, AliMe (小蜜), Cortana, XiaoIce (小冰), and Duer (度秘). Compared with traditional manual-input interaction, voice interaction is convenient and efficient and has a wide range of application scenarios. In the speech recognition process, intent recognition and slot filling are the key technologies for guaranteeing the accuracy of speech recognition results.
For intent recognition, the task can be abstracted as a classification problem, and a classifier combining a CNN with knowledge representations can then be used to train the intent recognition model. Besides word-embedding the user's spoken question, the intent recognition model also introduces semantic knowledge representations to increase the generalization ability of the representation layer; in practical applications, however, the model suffers from slot-information filling deviation, which degrades the accuracy of the intent recognition model. Slot filling is essentially the task of formalizing a sentence sequence into a labeled sequence. Many sequence-labeling methods are in common use, such as hidden Markov models or conditional random field models, but in specific application scenarios these slot filling models lack context information, so slots become ambiguous under different semantic intents and cannot meet practical requirements. Evidently, the two models in the prior art are trained independently, without joint optimization of the intent recognition task and the slot filling task; the trained models therefore suffer from low recognition accuracy in speech recognition, degrading the user experience.
Summary of the Invention
An object of the present invention is to provide a human-machine interactive speech recognition method and system for smart devices that improves speech recognition accuracy through joint optimization training of intent recognition and slot filling.
To achieve the above object, one aspect of the present invention provides a human-machine interactive speech recognition method for smart devices, including:
segmenting the user's spoken question to obtain an original word sequence, and vectorizing the original word sequence through embedding;
computing the hidden state vector h_i and the slot context vector c_i^S of each word vector, and obtaining a slot label model y_i^S by weighting the hidden state vector h_i and the slot context vector c_i^S;
computing the hidden state vector h_T and the intent context vector c^I of the vectorized original word sequence, and obtaining an intent prediction model y^I by weighting the hidden state vector h_T and the intent context vector c^I;
using a slot gate g to concatenate the slot context vector c_i^S and the intent context vector c^I, and re-expressing the slot label model y_i^S through the slot gate g;
jointly optimizing the intent prediction model y^I and the converted slot label model y_i^S to construct an objective function, and performing intent recognition on the user's spoken question based on the objective function.
Preferably, the method of segmenting the user's spoken question to obtain the original word sequence and vectorizing the original word sequence through embedding includes:
converting the received user's spoken question into recognizable text, and segmenting the recognizable text with a tokenizer to obtain the original word sequence;
applying word embedding to the original word sequence to obtain a vectorized representation of each word in the original word sequence.
Preferably, the method of computing the hidden state vector h_i and the slot context vector c_i^S of each word vector and obtaining the slot label model y_i^S by weighting the hidden state vector h_i and the slot context vector c_i^S includes:
encoding each word vector with a bidirectional LSTM network, and outputting the hidden state vector h_i corresponding to each word vector;
computing the slot context vector c_i^S corresponding to each word vector through the formula
c_i^S = Σ_{j=1}^{T} α_{i,j}^S · h_j
where α_{i,j}^S denotes the slot attention weight, computed as
α_{i,j}^S = exp(e_{i,j}) / Σ_{k=1}^{T} exp(e_{i,k}), with e_{i,k} = σ(W^S · h_k)
where σ denotes the slot activation function and W^S denotes the slot weight matrix;
constructing the slot label model based on the hidden state vector h_i and the slot context vector c_i^S:
y_i^S = softmax(W_hy^S · (h_i + c_i^S))
Further, the method of computing the hidden state vector h_T and the intent context vector c^I of the vectorized original word sequence and obtaining the intent prediction model y^I by weighting the hidden state vector h_T and the intent context vector c^I includes:
encoding the vectorized original word sequence with the hidden units of the bidirectional LSTM network to obtain the hidden state vector h_T;
computing the intent context vector c^I of the original word sequence through the formula
c^I = Σ_{j=1}^{T} α_j^I · h_j
where α_j^I denotes the intent attention weight, computed as
α_j^I = exp(e_j) / Σ_{k=1}^{T} exp(e_k), with e_k = σ′(W^I · h_k)
where σ′ denotes the intent activation function and W^I denotes the intent weight matrix;
constructing the intent prediction model based on the hidden state vector h_T and the intent context vector c^I:
y^I = softmax(W_hy^I · (h_T + c^I))
Preferably, the method of using the slot gate g to concatenate the slot context vector c_i^S and the intent context vector c^I and re-expressing the slot label model y_i^S through the slot gate g includes:
the slot gate g is formalized as
g = Σ v · tanh(c_i^S + W · c^I)
where v denotes the weight vector obtained by training and W denotes the weight matrix obtained by training;
the conversion of the slot label model y_i^S through the slot gate g is formalized as
y_i^S = softmax(W_hy^S · (h_i + c_i^S · g))
Optionally, the objective function constructed by jointly optimizing the intent prediction model y^I and the converted slot label model y_i^S is:
p(y^S, y^I | X) = p(y^I | X) · Π_{i=1}^{T} p(y_i^S | X)
where p(y^S, y^I | X) denotes the conditional probability of the slot filling and intent prediction outputs given the original word sequence, and X is the vectorized original word sequence.
Preferably, the method of performing intent recognition on the user's spoken question based on the objective function includes:
obtaining, through the objective function, the intent conditional probability corresponding to each word in the original word sequence in turn;
selecting the word with the highest probability value and recognizing it as the intent of the user's spoken question.
Compared with the prior art, the human-machine interactive speech recognition method for smart devices provided by the present invention has the following beneficial effects:
In the human-machine interactive speech recognition method for smart devices provided by the present invention, the acquired user's spoken question is first converted into recognizable text, and the original word sequence is generated by segmenting that text; word embedding is then applied to the original word sequence to obtain its vectorized representation. After that, the slot label model y_i^S and the intent prediction model y^I are constructed from the vectorized original word sequence. The slot label model y_i^S is constructed by computing the hidden state vector h_i and the slot context vector c_i^S of each word vector and then weighting the hidden state vector h_i and the slot context vector c_i^S; the intent prediction model y^I is constructed by computing the hidden state vector h_T and the intent context vector c^I of the original word sequence and then weighting the hidden state vector h_T and the intent context vector c^I. Thus, to fuse the intent prediction model y^I and the slot label model y_i^S, we add an additional decoder layer to the existing encoder-decoder architecture to construct the intent prediction model y^I, and introduce a slot gate g to concatenate the slot context vector c_i^S and the intent context vector c^I. Finally, the intent prediction model y^I and the converted slot label model y_i^S are jointly optimized to obtain an objective function, which is used to obtain, in turn, the intent conditional probability corresponding to each word in the original word sequence; the word with the highest probability value is then selected as the intent of the user's spoken question, ensuring the accuracy of speech recognition.
Another aspect of the present invention provides a human-machine interactive speech recognition system for smart devices, applied in the human-machine interactive speech recognition method for smart devices described in the above technical solution, the system including:
a word segmentation unit, used to segment the user's spoken question to obtain an original word sequence, and to vectorize the original word sequence through embedding;
a first computation unit, used to compute the hidden state vector h_i and the slot context vector c_i^S of each word vector, and to obtain the slot label model y_i^S by weighting the hidden state vector h_i and the slot context vector c_i^S;
a second computation unit, used to compute the hidden state vector h_T and the intent context vector c^I of the vectorized original word sequence, and to obtain the intent prediction model y^I by weighting the hidden state vector h_T and the intent context vector c^I;
a model conversion unit, used to concatenate the slot context vector c_i^S and the intent context vector c^I with a slot gate g, and to re-express the slot label model y_i^S through the slot gate g;
a joint optimization unit, used to jointly optimize the intent prediction model y^I and the converted slot label model y_i^S to construct an objective function, and to perform intent recognition on the user's spoken question based on the objective function.
Preferably, the word segmentation unit includes:
a word segmentation module, used to convert the received user's spoken question into recognizable text, and to segment the recognizable text with a tokenizer to obtain the original word sequence;
an embedding module, used to apply word embedding to the original word sequence to obtain a vectorized representation of each word in the original word sequence.
Preferably, the first computation unit includes:
a hidden state computation module, used to encode each word vector with a bidirectional LSTM network and to output the hidden state vector h_i corresponding to each word vector;
a slot context computation module, used to compute the slot context vector c_i^S corresponding to each word vector through the formula c_i^S = Σ_{j=1}^{T} α_{i,j}^S · h_j, where α_{i,j}^S denotes the slot attention weight, computed as α_{i,j}^S = exp(e_{i,j}) / Σ_{k=1}^{T} exp(e_{i,k}); σ denotes the slot activation function and W^S denotes the slot weight matrix;
a slot label model module, used to construct the slot label model y_i^S = softmax(W_hy^S · (h_i + c_i^S)) based on the hidden state vector h_i and the slot context vector c_i^S.
Compared with the prior art, the beneficial effects of the human-machine interactive speech recognition system for smart devices provided by the present invention are the same as those of the human-machine interactive speech recognition method provided by the above technical solution, and are not repeated here.
Brief Description of the Drawings
The drawings described here are provided for further understanding of the present invention and constitute a part of it; the illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
FIG. 1 is a schematic flowchart of the human-machine interactive speech recognition method for smart devices in Embodiment 1 of the present invention;
FIG. 2 is an example diagram of the encoder-decoder fusion model in Embodiment 1 of the present invention;
FIG. 3 is an example diagram of the slot gate g in FIG. 2;
FIG. 4 is a structural block diagram of the human-machine interactive speech recognition system for smart devices in Embodiment 2 of the present invention.
Reference numerals:
1 - word segmentation unit; 2 - first computation unit; 3 - second computation unit; 4 - model conversion unit; 5 - joint optimization unit.
Detailed Description of the Embodiments
To make the above objects, features, and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
Embodiment 1
FIG. 1 is a schematic flowchart of the human-machine interactive speech recognition method for smart devices in Embodiment 1 of the present invention. Referring to FIG. 1, this embodiment provides a human-machine interactive speech recognition method for smart devices, including:
segmenting the user's spoken question to obtain an original word sequence, and vectorizing the original word sequence through embedding; computing the hidden state vector h_i and the slot context vector c_i^S of each word vector, and obtaining the slot label model y_i^S by weighting the hidden state vector h_i and the slot context vector c_i^S; computing the hidden state vector h_T and the intent context vector c^I of the vectorized original word sequence, and obtaining the intent prediction model y^I by weighting the hidden state vector h_T and the intent context vector c^I; using a slot gate g to concatenate the slot context vector c_i^S and the intent context vector c^I, and re-expressing the slot label model y_i^S through the slot gate g; and jointly optimizing the intent prediction model y^I and the converted slot label model y_i^S to construct an objective function, and performing intent recognition on the user's spoken question based on the objective function.
In the human-machine interactive speech recognition method for smart devices provided by this embodiment, the acquired user's spoken question is first converted into recognizable text, and the original word sequence is generated by segmenting that text; word embedding is then applied to the original word sequence to obtain its vectorized representation. After that, the slot label model y_i^S and the intent prediction model y^I are constructed from the vectorized original word sequence. The slot label model y_i^S is constructed by computing the hidden state vector h_i and the slot context vector c_i^S of each word vector and then weighting them; the intent prediction model y^I is constructed by computing the hidden state vector h_T and the intent context vector c^I of the original word sequence and then weighting them. As shown in FIG. 2, to fuse the intent prediction model y^I and the slot label model y_i^S, we add an additional decoder layer on top of the encoder-decoder architecture to construct the intent prediction model y^I, and introduce a slot gate g to concatenate the slot context vector c_i^S and the intent context vector c^I. Finally, the intent prediction model y^I and the converted slot label model y_i^S are jointly optimized to obtain the objective function, which is used to obtain, in turn, the intent conditional probability corresponding to each word in the original word sequence; the word with the highest probability value is then selected as the intent of the user's spoken question, ensuring the accuracy of speech recognition.
Specifically, in the above embodiment, the method of segmenting the user's spoken question to obtain the original word sequence and vectorizing the original word sequence through embedding includes:
converting the received user's spoken question into recognizable text, and segmenting the recognizable text with a tokenizer to obtain the original word sequence; applying word embedding to the original word sequence to obtain a vectorized representation of each word in the original word sequence.
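As a minimal illustration of this segmentation-and-embedding step (not part of the patent: the tokenizer choice, toy vocabulary, and embedding dimension are assumptions made for the sketch), a Python/PyTorch snippet might look as follows:

```python
# Illustrative sketch only: the patent does not prescribe a tokenizer or
# embedding size; jieba and dim=128 are assumptions for this example.
import jieba                     # a common Chinese tokenizer, assumed here
import torch
import torch.nn as nn

text = "播放周杰伦的歌"                        # recognized text of a spoken question
words = list(jieba.cut(text))                # original word sequence
vocab = {w: i for i, w in enumerate(words)}  # toy vocabulary for the example

embed = nn.Embedding(num_embeddings=len(vocab), embedding_dim=128)
ids = torch.tensor([vocab[w] for w in words])
word_vectors = embed(ids)                    # shape (T, 128): one vector per word
```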
It should be noted that, in the above embodiment, the method of computing the hidden state vector h_i and the slot context vector c_i^S of each word vector and obtaining the slot label model y_i^S by weighting the hidden state vector h_i and the slot context vector c_i^S includes:
encoding each word vector with a bidirectional LSTM network, and outputting the hidden state vector h_i corresponding to each word vector; computing the slot context vector c_i^S corresponding to each word vector through the formula
c_i^S = Σ_{j=1}^{T} α_{i,j}^S · h_j
where α_{i,j}^S denotes the slot attention weight, computed as
α_{i,j}^S = exp(e_{i,j}) / Σ_{k=1}^{T} exp(e_{i,k}), with e_{i,k} = σ(W^S · h_k)
where σ denotes the slot activation function and W^S denotes the slot weight matrix; and constructing the slot label model
y_i^S = softmax(W_hy^S · (h_i + c_i^S))
based on the hidden state vector h_i and the slot context vector c_i^S.
In a specific implementation, feeding the word vectors into the bidirectional LSTM network yields one hidden state vector h_i per word vector. In the slot context formula, α_{i,j}^S denotes the slot attention weight, i indexes the i-th word vector, and j the j-th element of the i-th word vector; in the attention weight formula, T denotes the total number of elements in the word vector and k indexes the k-th of the T elements. In addition, the slot activation function σ and the slot weight matrix W^S can be derived by training on the vector matrix of the original word sequence; the specific training process is a common technique in the art, and this embodiment does not repeat it here.
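A compact sketch of this slot branch is given below, assuming PyTorch; the pairwise score function, hidden size, and label count are illustrative assumptions, since the patent only names the activation function σ and the weight matrix W^S:

```python
# Sketch of the slot branch: BiLSTM encoding, slot attention, slot labels.
# Hidden sizes and the exact score function are assumptions for illustration.
import torch
import torch.nn as nn

T, emb_dim, hid, num_slot_labels = 6, 128, 64, 10
x = torch.randn(1, T, emb_dim)                      # embedded word sequence

bilstm = nn.LSTM(emb_dim, hid, bidirectional=True, batch_first=True)
h, _ = bilstm(x)                                    # (1, T, 2*hid): one h_i per word

W_s = nn.Linear(2 * hid, 2 * hid, bias=False)       # slot weight matrix W^S
e = torch.sigmoid(W_s(h)) @ h.transpose(1, 2)       # e_{i,k}: score per (i, k) pair
alpha = torch.softmax(e, dim=-1)                    # alpha_{i,j}^S attention weights
c_s = alpha @ h                                     # c_i^S = sum_j alpha_{i,j}^S h_j

W_hy = nn.Linear(2 * hid, num_slot_labels)
y_s = torch.softmax(W_hy(h + c_s), dim=-1)          # y_i^S = softmax(W_hy^S (h_i + c_i^S))
```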
In the above embodiment, the method of computing the hidden state vector h_T and the intent context vector c^I of the vectorized original word sequence and obtaining the intent prediction model y^I by weighting the hidden state vector h_T and the intent context vector c^I includes:
encoding the vectorized original word sequence with the hidden units of the bidirectional LSTM network to obtain the hidden state vector h_T; computing the intent context vector c^I of the original word sequence through the formula
c^I = Σ_{j=1}^{T} α_j^I · h_j
where α_j^I denotes the intent attention weight, computed as
α_j^I = exp(e_j) / Σ_{k=1}^{T} exp(e_k), with e_k = σ′(W^I · h_k)
where σ′ denotes the intent activation function and W^I denotes the intent weight matrix; and constructing the intent prediction model
y^I = softmax(W_hy^I · (h_T + c^I))
based on the hidden state vector h_T and the intent context vector c^I.
In a specific implementation, the training method of the intent prediction model y^I is the same as that of the slot label model y_i^S, the difference being that the hidden state vector h_T is obtained using only the hidden units of the bidirectional LSTM network; after flattening the vector matrix into one dimension, the formula c^I = Σ_{j=1}^{T} α_j^I · h_j is invoked to compute the intent context vector c^I of the original word sequence, where α_j^I denotes the intent attention weight, σ′ denotes the intent activation function, and W^I denotes the intent weight matrix. The intent activation function σ′ and the intent weight matrix W^I can be derived by training on the flattened one-dimensional vectors; the specific training process is a common technique in the art, and this embodiment does not repeat it here.
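Under the same assumptions as the slot-branch sketch, the intent branch could look like the following; taking h_T from the last time step and using a single attention head are illustrative choices, not specified by the patent:

```python
# Sketch of the intent branch: one attention pooling over the BiLSTM states
# combined with the final state h_T. Sizes and score function are illustrative.
import torch
import torch.nn as nn

T, hid, num_intents = 6, 64, 5
h = torch.randn(1, T, 2 * hid)                         # BiLSTM states from the encoder
h_T = h[:, -1, :]                                      # final hidden state h_T

W_i = nn.Linear(2 * hid, 1, bias=False)                # intent weight matrix W^I
alpha_I = torch.softmax(torch.sigmoid(W_i(h)), dim=1)  # alpha_j^I over positions
c_I = (alpha_I * h).sum(dim=1)                         # c^I = sum_j alpha_j^I h_j

W_hy_I = nn.Linear(2 * hid, num_intents)
y_I = torch.softmax(W_hy_I(h_T + c_I), dim=-1)         # y^I = softmax(W_hy^I (h_T + c^I))
```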
Further, in the above embodiment, the method of using the slot gate g to concatenate the slot context vector c_i^S and the intent context vector c^I and re-expressing the slot label model y_i^S through the slot gate g includes:
the slot gate g is formalized as
g = Σ v · tanh(c_i^S + W · c^I)
where v denotes the weight vector obtained by training and W denotes the weight matrix obtained by training; the conversion of the slot label model y_i^S through the slot gate g is formalized as
y_i^S = softmax(W_hy^S · (h_i + c_i^S · g))
FIG. 3 shows the structural model of the slot gate g.
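In this formalization the gate reduces to one scalar per position that rescales the slot context before the slot softmax; a sketch under the same assumptions (v and W as trainable parameters of matching size) is:

```python
# Sketch of the slot gate g = sum(v * tanh(c_i^S + W c^I)) and the converted
# slot label model. Dimensions are illustrative assumptions.
import torch
import torch.nn as nn

T, d, num_slot_labels = 6, 128, 10          # d = 2*hid from the BiLSTM
h = torch.randn(1, T, d)                    # hidden states h_i
c_s = torch.randn(1, T, d)                  # slot context vectors c_i^S
c_I = torch.randn(1, d)                     # intent context vector c^I

v = nn.Parameter(torch.randn(d))            # trained weight vector v
W = nn.Linear(d, d, bias=False)             # trained weight matrix W

# g is one scalar per position i, broadcast back over the feature dimension
g = (v * torch.tanh(c_s + W(c_I).unsqueeze(1))).sum(dim=-1, keepdim=True)

W_hy = nn.Linear(d, num_slot_labels)
y_s = torch.softmax(W_hy(h + c_s * g), dim=-1)   # converted slot label model y_i^S
```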
Preferably, in the above embodiment, the objective function constructed by jointly optimizing the intent prediction model y^I and the converted slot label model y_i^S is:
p(y^S, y^I | X) = p(y^I | X) · Π_{i=1}^{T} p(y_i^S | X)
where p(y^S, y^I | X) denotes the conditional probability of the slot filling and intent prediction outputs given the original word sequence, and X denotes the vectorized original word sequence. Expanded, this becomes
p(y^S, y^I | X) = p(y^I | x_1, …, x_T) · Π_{i=1}^{T} p(y_i^S | x_1, …, x_T)
where x_i denotes the i-th word vector and T denotes the total number of word vectors. The intent probability value of each word vector can be obtained by computing the objective function, and the word with the highest probability value among the word vectors is selected and recognized as the intent of the user's spoken question.
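Maximizing this likelihood is equivalent to minimizing the sum of the intent cross-entropy and the per-word slot cross-entropies, which a joint training step might compute as follows (a sketch with toy tensors, not the patent's training code; a real model would produce the logits):

```python
# Sketch of the joint objective: -log p(y^I|X) - sum_i log p(y_i^S|X).
import torch
import torch.nn.functional as F

T, num_slot_labels, num_intents = 6, 10, 5
slot_logits = torch.randn(1, T, num_slot_labels, requires_grad=True)
intent_logits = torch.randn(1, num_intents, requires_grad=True)
slot_labels = torch.randint(0, num_slot_labels, (1, T))
intent_label = torch.randint(0, num_intents, (1,))

loss = F.cross_entropy(intent_logits, intent_label) + \
       F.cross_entropy(slot_logits.view(-1, num_slot_labels), slot_labels.view(-1))
loss.backward()   # gradients flow to the shared encoder and both decoders
```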
Embodiment 2
Referring to FIG. 1 and FIG. 4, this embodiment provides a human-machine interactive speech recognition system for smart devices, including:
a word segmentation unit 1, used to segment the user's spoken question to obtain an original word sequence, and to vectorize the original word sequence through embedding;
a first computation unit 2, used to compute the hidden state vector h_i and the slot context vector c_i^S of each word vector, and to obtain the slot label model y_i^S by weighting the hidden state vector h_i and the slot context vector c_i^S;
a second computation unit 3, used to compute the hidden state vector h_T and the intent context vector c^I of the vectorized original word sequence, and to obtain the intent prediction model y^I by weighting the hidden state vector h_T and the intent context vector c^I;
a model conversion unit 4, used to concatenate the slot context vector c_i^S and the intent context vector c^I with a slot gate g, and to re-express the slot label model y_i^S through the slot gate g;
a joint optimization unit 5, used to jointly optimize the intent prediction model y^I and the converted slot label model y_i^S to construct an objective function, and to perform intent recognition on the user's spoken question based on the objective function. Specifically, the word segmentation unit includes:
a word segmentation module, used to convert the received user's spoken question into recognizable text, and to segment the recognizable text with a tokenizer to obtain the original word sequence;
an embedding module, used to apply word embedding to the original word sequence to obtain a vectorized representation of each word in the original word sequence.
Specifically, the first computation unit includes:
a hidden state computation module, used to encode each word vector with a bidirectional LSTM network and to output the hidden state vector h_i corresponding to each word vector;
a slot context computation module, used to compute the slot context vector c_i^S corresponding to each word vector through the formula c_i^S = Σ_{j=1}^{T} α_{i,j}^S · h_j, where α_{i,j}^S denotes the slot attention weight, computed as α_{i,j}^S = exp(e_{i,j}) / Σ_{k=1}^{T} exp(e_{i,k}); σ denotes the slot activation function and W^S denotes the slot weight matrix;
a slot label model module, used to construct the slot label model y_i^S = softmax(W_hy^S · (h_i + c_i^S)) based on the hidden state vector h_i and the slot context vector c_i^S.
Compared with the prior art, the beneficial effects of the human-machine interactive speech recognition system for smart devices provided by the embodiments of the present invention are the same as those of the human-machine interactive speech recognition method provided by Embodiment 1, and are not repeated here.
Those of ordinary skill in the art can understand that all or part of the steps of the above inventive method can be completed by instructing related hardware through a program; the program can be stored in a computer-readable storage medium and, when executed, performs the steps of the method in the above embodiments, and the storage medium may be a ROM/RAM, magnetic disk, optical disc, memory card, or the like.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed by the present invention, and all of them should be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

  1. A human-machine interactive speech recognition method for smart devices, characterized by including:
    segmenting the user's spoken question to obtain an original word sequence, and vectorizing the original word sequence through embedding;
    computing the hidden state vector h_i and the slot context vector c_i^S of each word vector, and obtaining a slot label model y_i^S by weighting the hidden state vector h_i and the slot context vector c_i^S;
    computing the hidden state vector h_T and the intent context vector c^I of the vectorized original word sequence, and obtaining an intent prediction model y^I by weighting the hidden state vector h_T and the intent context vector c^I;
    using a slot gate g to concatenate the slot context vector c_i^S and the intent context vector c^I, and re-expressing the slot label model y_i^S through the slot gate g;
    jointly optimizing the intent prediction model y^I and the converted slot label model y_i^S to construct an objective function, and performing intent recognition on the user's spoken question based on the objective function.
  2. The method according to claim 1, characterized in that the method of segmenting the user's spoken question to obtain the original word sequence and vectorizing the original word sequence through embedding includes:
    converting the received user's spoken question into recognizable text, and segmenting the recognizable text with a tokenizer to obtain the original word sequence;
    applying word embedding to the original word sequence to obtain a vectorized representation of each word in the original word sequence.
  3. The method according to claim 1, characterized in that the method of computing the hidden state vector h_i and the slot context vector c_i^S of each word vector and obtaining the slot label model y_i^S by weighting the hidden state vector h_i and the slot context vector c_i^S includes:
    encoding each word vector with a bidirectional LSTM network, and outputting the hidden state vector h_i corresponding to each word vector;
    computing the slot context vector c_i^S corresponding to each word vector through the formula c_i^S = Σ_{j=1}^{T} α_{i,j}^S · h_j, where α_{i,j}^S denotes the slot attention weight, computed as α_{i,j}^S = exp(e_{i,j}) / Σ_{k=1}^{T} exp(e_{i,k}); σ denotes the slot activation function and W^S denotes the slot weight matrix;
    constructing the slot label model y_i^S = softmax(W_hy^S · (h_i + c_i^S)) based on the hidden state vector h_i and the slot context vector c_i^S.
  4. The method according to claim 1, characterized in that the method of computing the hidden state vector h_T and the intent context vector c^I of the vectorized original word sequence and obtaining the intent prediction model y^I by weighting the hidden state vector h_T and the intent context vector c^I includes:
    encoding the vectorized original word sequence with the hidden units of the bidirectional LSTM network to obtain the hidden state vector h_T;
    computing the intent context vector c^I of the original word sequence through the formula c^I = Σ_{j=1}^{T} α_j^I · h_j, where α_j^I denotes the intent attention weight, computed as α_j^I = exp(e_j) / Σ_{k=1}^{T} exp(e_k); σ′ denotes the intent activation function and W^I denotes the intent weight matrix;
    constructing the intent prediction model y^I = softmax(W_hy^I · (h_T + c^I)) based on the hidden state vector h_T and the intent context vector c^I.
  5. The method according to claim 1, characterized in that the method of using the slot gate g to concatenate the slot context vector c_i^S and the intent context vector c^I and re-expressing the slot label model y_i^S through the slot gate g includes:
    the slot gate g is formalized as g = Σ v · tanh(c_i^S + W · c^I), where v denotes the weight vector obtained by training and W denotes the weight matrix obtained by training;
    the conversion of the slot label model y_i^S through the slot gate g is formalized as y_i^S = softmax(W_hy^S · (h_i + c_i^S · g)).
  6. The method according to claim 1, characterized in that the objective function constructed by jointly optimizing the intent prediction model y^I and the converted slot label model y_i^S is:
    p(y^S, y^I | X) = p(y^I | X) · Π_{i=1}^{T} p(y_i^S | X)
    where p(y^S, y^I | X) denotes the conditional probability of the slot filling and intent prediction outputs given the original word sequence, and X is the vectorized original word sequence.
  7. The method according to claim 6, characterized in that the method of performing intent recognition on the user's spoken question based on the objective function includes:
    obtaining, through the objective function, the intent conditional probability corresponding to each word in the original word sequence in turn;
    selecting the word with the highest probability value and recognizing it as the intent of the user's spoken question.
  8. A human-machine interactive speech recognition system for smart devices, characterized by including:
    a word segmentation unit, used to segment the user's spoken question to obtain an original word sequence, and to vectorize the original word sequence through embedding;
    a first computation unit, used to compute the hidden state vector h_i and the slot context vector c_i^S of each word vector, and to obtain the slot label model y_i^S by weighting the hidden state vector h_i and the slot context vector c_i^S;
    a second computation unit, used to compute the hidden state vector h_T and the intent context vector c^I of the vectorized original word sequence, and to obtain the intent prediction model y^I by weighting the hidden state vector h_T and the intent context vector c^I;
    a model conversion unit, used to concatenate the slot context vector c_i^S and the intent context vector c^I with a slot gate g, and to re-express the slot label model y_i^S through the slot gate g;
    a joint optimization unit, used to jointly optimize the intent prediction model y^I and the converted slot label model y_i^S to construct an objective function, and to perform intent recognition on the user's spoken question based on the objective function.
  9. The system according to claim 8, characterized in that the word segmentation unit includes:
    a word segmentation module, used to convert the received user's spoken question into recognizable text, and to segment the recognizable text with a tokenizer to obtain the original word sequence;
    an embedding module, used to apply word embedding to the original word sequence to obtain a vectorized representation of each word in the original word sequence.
  10. The system according to claim 8, characterized in that the first computation unit includes:
    a hidden state computation module, used to encode each word vector with a bidirectional LSTM network and to output the hidden state vector h_i corresponding to each word vector;
    a slot context computation module, used to compute the slot context vector c_i^S corresponding to each word vector through the formula c_i^S = Σ_{j=1}^{T} α_{i,j}^S · h_j, where α_{i,j}^S denotes the slot attention weight, computed as α_{i,j}^S = exp(e_{i,j}) / Σ_{k=1}^{T} exp(e_{i,k}); σ denotes the slot activation function and W^S denotes the slot weight matrix;
    a slot label model module, used to construct the slot label model y_i^S = softmax(W_hy^S · (h_i + c_i^S)) based on the hidden state vector h_i and the slot context vector c_i^S.
PCT/CN2019/106778 2019-01-02 2019-09-19 Human-machine interactive speech recognition method and system for smart devices WO2020140487A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA3166784A CA3166784A1 (en) 2019-01-02 2019-09-19 Human-machine interactive speech recognizing method and system for intelligent devices

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910002748.8A 2019-01-02 Human-machine interactive speech recognition method and system for smart devices
CN201910002748.8 2019-01-02

Publications (1)

Publication Number Publication Date
WO2020140487A1 true WO2020140487A1 (zh) 2020-07-09

Family

ID=66499837

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/106778 WO2020140487A1 (zh) 2019-01-02 2019-09-19 Human-machine interactive speech recognition method and system for smart devices

Country Status (3)

Country Link
CN (1) CN109785833A (zh)
CA (1) CA3166784A1 (zh)
WO (1) WO2020140487A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112765959A (zh) * 2020-12-31 2021-05-07 康佳集团股份有限公司 Intent recognition method, apparatus, device, and computer-readable storage medium
CN117151121A (zh) * 2023-10-26 2023-12-01 安徽农业大学 Multi-intent spoken language understanding method based on fluctuation thresholds and segmentation

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109785833A (zh) * 2019-01-02 2019-05-21 苏宁易购集团股份有限公司 Human-machine interactive speech recognition method and system for smart devices
CN110532355B (zh) * 2019-08-27 2022-07-01 华侨大学 Joint intent and slot recognition method based on multi-task learning
CN110750628A (zh) * 2019-09-09 2020-02-04 深圳壹账通智能科技有限公司 Conversational information interaction processing method and apparatus, computer device, and storage medium
CN110795532A (zh) * 2019-10-18 2020-02-14 珠海格力电器股份有限公司 Voice information processing method and apparatus, intelligent terminal, and storage medium
CN110853626B (zh) * 2019-10-21 2021-04-20 成都信息工程大学 Dialogue understanding method, apparatus, and device based on a bidirectional attention neural network
CN110827816A (zh) * 2019-11-08 2020-02-21 杭州依图医疗技术有限公司 Voice command recognition method and apparatus, electronic device, and storage medium
CN111090728B (zh) * 2019-12-13 2023-05-26 车智互联(北京)科技有限公司 Dialogue state tracking method and apparatus, and computing device
CN111062209A (zh) * 2019-12-16 2020-04-24 苏州思必驰信息科技有限公司 Natural language processing model training method and natural language processing model
CN111177381A (zh) * 2019-12-21 2020-05-19 深圳市傲立科技有限公司 Joint slot filling and intent detection modeling method based on context vector feedback
US20230040394A1 (en) * 2020-01-06 2023-02-09 7Hugs Labs System and method for controlling a plurality of devices
CN111339770B (zh) * 2020-02-18 2023-07-21 百度在线网络技术(北京)有限公司 Method and apparatus for outputting information
CN111833849A (zh) * 2020-03-10 2020-10-27 北京嘀嘀无限科技发展有限公司 Speech recognition and speech model training method, storage medium, and electronic device
CN113505591A (zh) * 2020-03-23 2021-10-15 华为技术有限公司 Slot recognition method and electronic device
CN111597342B (zh) * 2020-05-22 2024-01-26 北京慧闻科技(集团)有限公司 Multi-task intent classification method, apparatus, device, and storage medium
CN113779975B (zh) * 2020-06-10 2024-03-01 北京猎户星空科技有限公司 Semantic recognition method, apparatus, device, and medium
CN112069828B (zh) * 2020-07-31 2023-07-04 飞诺门阵(北京)科技有限公司 Text intent recognition method and apparatus
CN112800190B (zh) * 2020-11-11 2022-06-10 重庆邮电大学 Joint intent recognition and slot value filling prediction method based on the BERT model
CN114969339B (zh) * 2022-05-30 2023-05-12 中电金信软件有限公司 Text matching method and apparatus, electronic device, and readable storage medium
CN115358186B (zh) * 2022-08-31 2023-11-14 南京擎盾信息科技有限公司 Slot label generation method and apparatus, and storage medium
CN115273849B (zh) * 2022-09-27 2022-12-27 北京宝兰德软件股份有限公司 Intent recognition method and apparatus for audio data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180182380A1 (en) * 2016-12-28 2018-06-28 Amazon Technologies, Inc. Audio message extraction
CN108415923A (zh) * 2017-10-18 2018-08-17 北京邮电大学 Intelligent human-machine dialogue system for closed domains
CN108876527A (zh) * 2018-06-06 2018-11-23 北京京东尚科信息技术有限公司 Service method and service apparatus, open application platform, and storage medium
CN109065053A (zh) * 2018-08-20 2018-12-21 百度在线网络技术(北京)有限公司 Method and apparatus for processing information
CN109785833A (zh) * 2019-01-02 2019-05-21 苏宁易购集团股份有限公司 Human-machine interactive speech recognition method and system for smart devices

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491541B (zh) * 2017-08-24 2021-03-02 北京丁牛科技有限公司 Text classification method and apparatus
CN108417205B (zh) * 2018-01-19 2020-12-18 苏州思必驰信息科技有限公司 Semantic understanding training method and system
CN108874782B (zh) * 2018-06-29 2019-04-26 北京寻领科技有限公司 Multi-turn dialogue management method based on hierarchical attention LSTM and knowledge graphs

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180182380A1 (en) * 2016-12-28 2018-06-28 Amazon Technologies, Inc. Audio message extraction
CN108415923A (zh) * 2017-10-18 2018-08-17 北京邮电大学 Intelligent human-machine dialogue system for closed domains
CN108876527A (zh) * 2018-06-06 2018-11-23 北京京东尚科信息技术有限公司 Service method and service apparatus, open application platform, and storage medium
CN109065053A (zh) * 2018-08-20 2018-12-21 百度在线网络技术(北京)有限公司 Method and apparatus for processing information
CN109785833A (zh) * 2019-01-02 2019-05-21 苏宁易购集团股份有限公司 Human-machine interactive speech recognition method and system for smart devices

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112765959A (zh) * 2020-12-31 2021-05-07 康佳集团股份有限公司 Intent recognition method, apparatus, device, and computer-readable storage medium
CN112765959B (zh) * 2020-12-31 2024-05-28 康佳集团股份有限公司 Intent recognition method, apparatus, device, and computer-readable storage medium
CN117151121A (zh) * 2023-10-26 2023-12-01 安徽农业大学 Multi-intent spoken language understanding method based on fluctuation thresholds and segmentation
CN117151121B (zh) * 2023-10-26 2024-01-12 安徽农业大学 Multi-intent spoken language understanding method based on fluctuation thresholds and segmentation

Also Published As

Publication number Publication date
CN109785833A (zh) 2019-05-21
CA3166784A1 (en) 2020-07-09

Similar Documents

Publication Publication Date Title
WO2020140487A1 (zh) Human-machine interactive speech recognition method and system for smart devices
CN108733792B (zh) Entity relation extraction method
CN109033068B (zh) Attention-mechanism-based method, apparatus, and electronic device for reading comprehension
CN113268609B (zh) Knowledge-graph-based dialogue content recommendation method, apparatus, device, and medium
WO2021190259A1 (zh) Slot recognition method and electronic device
CN113239169B (zh) Artificial-intelligence-based answer generation method, apparatus, device, and storage medium
CN110990555B (zh) End-to-end retrieval-based dialogue method and system, and computer device
CN110678882B (zh) Method and system for selecting answer spans from electronic documents using machine learning
CN114676234A (zh) Model training method and related device
CN111625634A (zh) Word slot recognition method and apparatus, computer-readable storage medium, and electronic device
CN110399454B (zh) Text encoding representation method based on a Transformer model and multiple reference frames
CN109933792A (zh) Reading comprehension method for opinion questions based on multi-layer bidirectional LSTM and a verification model
CN111814489A (zh) Spoken language semantic understanding method and system
CN115203409A (zh) Video emotion classification method based on gated fusion and multi-task learning
CN116304748A (zh) Text similarity computation method, system, device, and medium
CN113705315A (zh) Video processing method, apparatus, device, and storage medium
CN116341651A (zh) Entity recognition model training method and apparatus, electronic device, and storage medium
CN111597816A (zh) Self-attention named entity recognition method, apparatus, device, and storage medium
CN116955644A (zh) Knowledge-graph-based knowledge fusion method, system, and storage medium
US20240037335A1 (en) Methods, systems, and media for bi-modal generation of natural languages and neural architectures
CN115659242A (zh) Multimodal emotion classification method based on modality-enhanced convolutional graphs
CN116258147A (zh) Multimodal review sentiment analysis method and system based on heterogeneous graph convolution
CN115240712A (zh) Multimodal emotion classification method, apparatus, device, and storage medium
CN115130461A (zh) Text matching method and apparatus, electronic device, and storage medium
CN113822018A (zh) Joint entity-relation extraction method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19908004

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19908004

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 3166784

Country of ref document: CA

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07.02.2022)
