WO2020155766A1 - Method, device and apparatus for identification rejection in intention identification, and storage medium - Google Patents


Publication number
WO2020155766A1
Authority
WO
WIPO (PCT)
Prior art keywords
input information
information
model
input
text
Prior art date
Application number
PCT/CN2019/118278
Other languages
French (fr)
Chinese (zh)
Inventor
许开河
杨坤
王少军
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2020155766A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

A method, device and apparatus for identification rejection in intention identification and a storage medium, relating to the technical field of artificial intelligence. The method comprises: obtaining input information to be identified (S1); inputting the input information into an intention identification model comprising a text classification model and a text similarity model, and obtaining by means of the intention identification model a classification category and a confidence score corresponding to the input information (S2); and determining whether the confidence score exceeds a preset threshold (S3); if yes, obtaining from a knowledge base knowledge point information corresponding to the classification category, and if not, rejecting the identification of the input information. The conditional probability obtained by means of the text classification model is corrected to obtain a confidence score, and the confidence score is used as the determination basis for rejecting the identification of the input information, thereby improving the accuracy of intention identification.

Description

Rejection method, apparatus, device, and storage medium in intention recognition
This application claims priority to Chinese Patent Application No. 201910100204.5, filed on January 31, 2019, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of artificial intelligence, and in particular to a rejection method, apparatus, device, and storage medium for intention recognition.
Background
Intention recognition, that is, identifying the intent behind a behavior, is the most important component of a question answering robot. There are two main approaches. Retrieval-based intention recognition works like a search engine: the robot searches its own knowledge base and returns the answer that best matches the user's question. Classification-based intention recognition trains a text classification model on the knowledge points in the knowledge base, uses that model to classify the user's question into a knowledge point, and returns the answer associated with that knowledge point. Text classification models based on deep networks tend to answer more accurately than retrieval models, but they cannot correctly handle questions that fall outside the knowledge base: the classifier is forced to assign a category to every user question. The final output layer of existing text classification models usually uses softmax to score the probability that a sample belongs to each category. The score of the sample for each category is computed first, and each score is then divided by the total score to give the probability of that category. The probability obtained in this way is actually a conditional probability: the probability that the sample belongs to a given class, given that the sample belongs to the knowledge base. When the sample does not belong to the knowledge base, this probability is essentially arbitrary. The sample may resemble none of the knowledge points, so its score for every knowledge point is low, and softmax simply normalizes these small numbers into the range 0 to 1. One category may therefore be amplified and receive a fairly large probability, which lowers the classification accuracy of the text classification model and, in turn, the accuracy of intention recognition.
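The amplification effect described above can be illustrated with a small numerical sketch. The raw scores below are made-up values for a question that resembles none of three knowledge points; they are assumptions chosen for illustration, not figures from the application.

```python
import math

def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1."""
    top = max(scores)
    # Subtracting the maximum does not change the result but avoids overflow.
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for an out-of-knowledge-base question:
# every score is low, but the first is slightly less low than the rest.
low_scores = [-2.0, -6.0, -7.0]
probs = softmax(low_scores)

# The first category ends up with a probability above 0.9,
# even though the question matched none of the knowledge points well.
print(probs)
```

The normalization only looks at the scores relative to each other, so a uniformly poor match can still yield a confident-looking output.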
Summary of the invention
This application provides a rejection method, apparatus, device, and storage medium for intention recognition, to solve the problem of low intention recognition accuracy in the prior art.
To achieve the above objective, a first aspect of this application provides a rejection method in intention recognition, including:
obtaining input information to be recognized; inputting the input information into a trained intention recognition model, and obtaining, through the intention recognition model, a classification category and a confidence score corresponding to the input information; and determining whether the confidence score exceeds a preset threshold: if it does, obtaining the knowledge point information corresponding to the classification category from a knowledge base, and if it does not, rejecting the input information; wherein the intention recognition model includes a text classification model and a text similarity model, the classification category corresponding to the input information and the conditional probability that the input information belongs to the classification category are obtained through the text classification model, and the confidence score is obtained through the text similarity model and the conditional probability.
To achieve the above objective, a second aspect of the present invention provides a rejection apparatus in intention recognition, including:
an input information obtaining module, configured to obtain input information to be recognized; a recognition module, configured to input the input information into a trained intention recognition model for recognition, wherein the intention recognition model includes a text classification model and a text similarity model, the classification category corresponding to the input information and the conditional probability that the input information belongs to the classification category are obtained through the text classification model, and the confidence score is obtained through the text similarity model and the conditional probability; a confidence obtaining module, configured to obtain, through the intention recognition model, the classification category and the confidence score corresponding to the input information; and a judgment module, configured to determine whether the confidence score exceeds a preset threshold, to obtain the knowledge point information corresponding to the classification category from the knowledge base if it does, and to reject the input information if it does not.
To achieve the above objective, a third aspect of this application provides an electronic device including a processor and a memory, the memory storing a rejection program for intention recognition which, when executed by the processor, implements the rejection method in intention recognition described above.
To achieve the above objective, a fourth aspect of this application provides a non-volatile computer-readable storage medium storing a rejection program for intention recognition which, when executed by a processor, implements the rejection method in intention recognition described above.
Compared with the prior art, this application has the following advantages and beneficial effects:
The intention recognition model of this application includes a text classification model and a text similarity model. The conditional probability produced by the text classification model is corrected to obtain a confidence score, and the confidence score is used to decide whether to reject the input information, which improves the accuracy of intention recognition.
Description of the drawings
Fig. 1 is a schematic flowchart of the rejection method in intention recognition described in this application;
Fig. 2 compares the recognition results of the intention recognition model of this application and an existing text classification model for questions inside the knowledge base;
Fig. 3 compares the recognition results of the intention recognition model of this application and an existing text classification model for questions outside the knowledge base;
Fig. 4 is a schematic diagram of the modules of the rejection apparatus in intention recognition of this application.
The realization of the objectives, the functional features, and the advantages of this application are further described below with reference to the embodiments and the drawings.
Detailed description
The embodiments of this application are described below with reference to the drawings. Those of ordinary skill in the art will recognize that the described embodiments can be modified in various ways, or combinations thereof, without departing from the spirit and scope of this application. The drawings and description are therefore illustrative in nature: they serve only to explain this application and do not limit the scope of protection of the claims. In addition, in this specification the drawings are not drawn to scale, and the same reference numerals denote the same parts.
The rejection method in intention recognition described in this application is applied in a question answering robot. For a given user question, the intention recognition model outputs a classification result and a score. The classification result identifies the corresponding knowledge point information in the knowledge base, and the score is the confidence score. When the confidence score is low, the question can be rejected, which identifies input questions that fall outside the knowledge base. The knowledge base consists of one or more pieces of knowledge point information, each corresponding to a specific answer to a question. After a user question is received, either the knowledge point information corresponding to the question is returned to the user, or the question is rejected.
Fig. 1 is a schematic flowchart of the rejection method in intention recognition described in this application. As shown in Fig. 1, the rejection method includes:
Step S1: obtain input information to be recognized;
Step S2: input the input information into a trained intention recognition model, and obtain, through the intention recognition model, a classification category and a confidence score corresponding to the input information;
Step S3: determine whether the confidence score exceeds a preset threshold; if it does, obtain the knowledge point information corresponding to the classification category from the knowledge base, and if it does not, reject the input information;
wherein the intention recognition model includes a text classification model and a text similarity model, the classification category corresponding to the input information and the conditional probability that the input information belongs to the classification category are obtained through the text classification model, and the confidence score is obtained through the text similarity model and the conditional probability.
This application uses the text similarity model in the intention recognition model to correct the conditional probability produced by the text classification model, yielding a confidence score. The confidence score is then used as the basis for deciding whether to reject the input information, which improves the accuracy of intention recognition.
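Steps S2 and S3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: `recognize_or_reject`, `toy_intent_model`, and `toy_kb` are hypothetical names, and the toy model returns hard-coded confidence scores in place of a trained intention recognition model.

```python
from typing import Callable, Dict, Optional, Tuple

def recognize_or_reject(
    input_info: str,
    intent_model: Callable[[str], Tuple[str, float]],
    knowledge_base: Dict[str, str],
    threshold: float,
) -> Optional[str]:
    """Return knowledge point information, or None when the input is rejected."""
    # Step S2: the model yields a classification category and a confidence score.
    category, confidence = intent_model(input_info)
    # Step S3: answer only when the confidence score exceeds the preset threshold.
    if confidence > threshold:
        return knowledge_base[category]
    return None

# Hypothetical stand-in for the trained intention recognition model.
def toy_intent_model(text: str) -> Tuple[str, float]:
    if "credit card" in text:
        return "credit card application", 0.92
    # A classifier always outputs some category, here with a low confidence.
    return "credit card application", 0.15

toy_kb = {"credit card application": "Answer text for credit card applications"}

answer = recognize_or_reject(
    "How do I apply for a credit card?", toy_intent_model, toy_kb, 0.6)
rejected = recognize_or_reject(
    "What is the weather today?", toy_intent_model, toy_kb, 0.6)
```

Returning `None` for a rejection is one possible design; a caller could equally map it to a fallback reply.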
In this application, the input information to be recognized is information that, after processing, can be input directly into the intention recognition model. Further, it can be input directly into the text classification model to obtain the classification category, or directly into the similarity model to obtain its similarity to the knowledge point information. Preferably, the step of obtaining the input information to be recognized includes:
obtaining voice information to be recognized; converting the voice information into text information in a preset format; and processing the text information to obtain the input information to be recognized. The voice information may be, for example, a voice command or chat speech from the user. Further, processing the text information includes denoising, word segmentation, and the like. Denoising removes meaningless phrases without changing the real meaning of the input information. Word segmentation splits the text information into words, after which the part of speech of each word can be tagged and named entities can be recognized.
In this application, the input information to be recognized may be a sentence, a phrase, or the like, and includes the wording of the question the user wants to ask. For example, the question may be "I have online banking; how do I apply for a credit card?", and the corresponding knowledge point information is "credit card application". Further, the input information includes user information, which includes but is not limited to the user's age, gender, identity, occupation, region, and hometown, so that the user's input information can be clustered by preference and the user's interests identified.
This application obtains the confidence score from the output of the text similarity model. In an optional embodiment of this application, the step of obtaining the confidence score through the text similarity model and the conditional probability includes: inputting the input information and the knowledge point information in the knowledge base into the text similarity model; obtaining, through the text similarity model, the similarity between the input information and each piece of knowledge point information in the knowledge base; selecting the maximum of the obtained similarities; and multiplying the maximum similarity by the conditional probability to obtain the confidence score.
As shown in the following formula:
$$\mathrm{Score}(x \in C_i) = P(x \in C_i, x \in C) + P(x \in C_i, x \notin C) = P(x \in C_i \mid x \in C)\,P(x \in C) = P(x \in C_i \mid x \in C) \cdot \max_j \mathrm{sim}(x, C_j)$$
In the formula, $x$ is the input information; $C_i$ is the $i$-th class of knowledge point information in the knowledge base; $C$ is the knowledge base; $\mathrm{Score}(x \in C_i)$ is the confidence score that the input information $x$ belongs to the $i$-th class of knowledge point information; $P(x \in C_i, x \in C)$ is the probability that $x$ is within the scope of the knowledge base and belongs to the $i$-th class; $P(x \in C_i, x \notin C)$ is the probability that $x$ is outside the scope of the knowledge base and belongs to the $i$-th class, which is generally 0; $P(x \in C_i \mid x \in C)$ is the conditional probability that $x$ belongs to the $i$-th class given that it is within the scope of the knowledge base, which is output by the text classification model and allows the joint probability to be computed by expanding it with Bayes' formula; $j$ is the index over the classes of knowledge point information in the knowledge base; $P(x \in C)$ is the probability that the input information belongs to the knowledge base; and $\mathrm{sim}(x, C_j)$ is the similarity between $x$ and the $j$-th class of knowledge point information. If $x$ is very similar to any one piece of knowledge point information in the knowledge base, $x$ is considered to belong to the knowledge base, so the maximum similarity is used to compute the confidence score.
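The maximum-similarity variant above can be sketched as follows. The function `toy_sim` is a word-overlap (Jaccard) placeholder standing in for the trained text similarity model, and each knowledge point is represented by a single sentence; both are simplifying assumptions made for this illustration.

```python
from typing import Callable, Dict

def confidence_max_similarity(
    x: str,
    cond_prob: float,
    knowledge_points: Dict[str, str],
    sim: Callable[[str, str], float],
) -> float:
    """Score = P(x in C_i | x in C) * max_j sim(x, C_j)."""
    # Compare x with every knowledge point and keep the best match,
    # which serves as the estimate of P(x in C).
    best = max(sim(x, text) for text in knowledge_points.values())
    return cond_prob * best

def toy_sim(a: str, b: str) -> float:
    """Hypothetical placeholder similarity: Jaccard overlap of word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0

kb = {
    "credit card application": "apply for a credit card",
    "loan application": "apply for a loan",
}

# cond_prob would come from the text classification model; 0.9 is made up.
in_kb_score = confidence_max_similarity("apply for a credit card", 0.9, kb, toy_sim)
out_kb_score = confidence_max_similarity("weather forecast tomorrow", 0.9, kb, toy_sim)
```

Even with the same conditional probability, the out-of-knowledge-base question receives a score of 0 because it matches no knowledge point.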
In one embodiment of this application, the preset confidence threshold is organized into levels. For example, a confidence score of 0.9 is set as the first-level threshold, 0.8 as the second-level threshold, 0.6 as the third-level threshold, and 0.4 as the fourth-level threshold. When the intention recognition result is obtained on the basis of the confidence score, one or more pieces of knowledge point information corresponding to the input information are obtained according to the threshold level that the confidence score reaches. Specifically, the input information and the knowledge point information in the knowledge base are input into the text similarity model, which returns the similarity between the input information and each piece of knowledge point information. The similarities are sorted in descending order, a preset number of the top-ranked similarities are selected, and the corresponding preset number of confidence scores are computed. Depending on actual needs, the knowledge point information whose confidence score exceeds a chosen threshold level can be returned to the user as the knowledge point information corresponding to the classification category; if the largest of the confidence scores is below the lowest threshold level, the input information is rejected. For example, suppose the confidence scores obtained through the intention recognition model are 0.95, 0.85, and 0.5. If the first-level threshold is selected, only the knowledge point information with a confidence score of 0.95 is returned; if the second-level threshold is selected, the knowledge point information with confidence scores of 0.95 and 0.85 can be returned for the user's reference. If the confidence scores are 0.38, 0.3, and 0.25, the largest score, 0.38, is below the fourth-level threshold, so the input information is rejected.
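The tiered thresholds and the worked numbers in this embodiment can be reproduced with a short sketch. The function name `select_knowledge_points` and the labels `kp_a`, `kp_b`, `kp_c` are hypothetical; the threshold values and scores are the ones given in the text. The sketch assumes a non-empty list of scored candidates.

```python
from typing import Dict, List, Optional, Tuple

# Threshold levels from the embodiment: level 1 is the strictest.
THRESHOLDS: Dict[int, float] = {1: 0.9, 2: 0.8, 3: 0.6, 4: 0.4}

def select_knowledge_points(
    scored: List[Tuple[str, float]], level: int
) -> Optional[List[str]]:
    """Return knowledge points above the chosen level, or None to reject."""
    # Reject when even the best score is below the lowest threshold level.
    if max(score for _, score in scored) < min(THRESHOLDS.values()):
        return None
    cutoff = THRESHOLDS[level]
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [kp for kp, score in ranked if score > cutoff]

scored = [("kp_a", 0.95), ("kp_b", 0.85), ("kp_c", 0.5)]
level_one = select_knowledge_points(scored, 1)
level_two = select_knowledge_points(scored, 2)
low = select_knowledge_points([("kp_a", 0.38), ("kp_b", 0.3), ("kp_c", 0.25)], 4)
```

Note that an input whose best score clears the lowest level but not the chosen level yields an empty list rather than a rejection; how to treat that case is left open here.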
Assume that the classification algorithm in the text classification model is trustworthy. Then, if the input information x belongs to the knowledge base, the text classification model will necessarily classify it into the knowledge point information category that is most similar to it. Preferably, the step of obtaining the confidence score through the text similarity model and the conditional probability includes: inputting the input information and the knowledge point information in the knowledge base that corresponds to the classification category into the text similarity model; obtaining, through the text similarity model, the similarity between the input information and the knowledge point information corresponding to the classification category; and multiplying the conditional probability by the similarity obtained from the text similarity model to obtain the confidence score.
As shown in the following formula:
$$\mathrm{Score}(x \in C_i) = P(x \in C_i, x \in C) + P(x \in C_i, x \notin C) = P(x \in C_i \mid x \in C)\,P(x \in C) = P(x \in C_i \mid x \in C) \cdot \mathrm{sim}(x, C_i)$$
In the formula, $x$ is the input information; $C_i$ is the $i$-th class of knowledge point information in the knowledge base; $C$ is the knowledge base; $\mathrm{Score}(x \in C_i)$ is the confidence score that the input information $x$ belongs to the $i$-th class of knowledge point information; $P(x \in C_i, x \in C)$ is the probability that $x$ is within the scope of the knowledge base and belongs to the $i$-th class; $P(x \in C_i, x \notin C)$ is the probability that $x$ is outside the scope of the knowledge base and belongs to the $i$-th class, which is generally 0; $P(x \in C_i \mid x \in C)$ is the conditional probability that $x$ belongs to the $i$-th class given that it is within the scope of the knowledge base, which is output by the text classification model; $P(x \in C)$ is the probability that the input information belongs to the knowledge base; and $\mathrm{sim}(x, C_i)$ is the similarity between $x$ and the $i$-th class of knowledge point information.
A classification result is first obtained with the text classification model, giving the classification category corresponding to the input information, and that result is then used to compute the text similarity and the confidence score. This greatly reduces the number of text similarity comparisons and improves computational efficiency: deciding whether an input belongs to the knowledge base no longer requires computing its similarity to every piece of knowledge point information in the knowledge base.
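The classify-first variant can be sketched as below, with a counter showing that only one similarity comparison is made regardless of the size of the knowledge base. `toy_classify` and `counting_sim` are hypothetical stand-ins for the trained text classification and text similarity models, and the conditional probability of 0.8 is a made-up value.

```python
from typing import Callable, Dict, List, Tuple

def confidence_for_predicted_class(
    x: str,
    classify: Callable[[str], Tuple[str, float]],
    knowledge_points: Dict[str, str],
    sim: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Score = P(x in C_i | x in C) * sim(x, C_i) for the predicted class i only."""
    category, cond_prob = classify(x)
    # Only the knowledge point of the predicted category is compared with x.
    similarity = sim(x, knowledge_points[category])
    return category, cond_prob * similarity

calls: List[Tuple[str, str]] = []

def counting_sim(a: str, b: str) -> float:
    """Placeholder Jaccard similarity that records every comparison it makes."""
    calls.append((a, b))
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def toy_classify(x: str) -> Tuple[str, float]:
    return "credit card application", 0.8

kb = {
    "credit card application": "apply for a credit card",
    "loan application": "apply for a loan",
    "account opening": "open a bank account",
}

category, score = confidence_for_predicted_class(
    "apply for a credit card online", toy_classify, kb, counting_sim)
```

With three knowledge points in `kb`, the maximum-similarity variant would need three comparisons; this variant needs one.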
The text classification model classifies the input information (which may be a sentence, a phrase, or the like) and outputs the classification category and the corresponding score. Preferably, the text classification model includes an input layer, an embedding layer, a convolutional layer, a pooling layer, a normalization layer, and an output layer. The input information is fed into the input layer; the embedding layer converts it into a word vector matrix; the convolutional layer performs the convolution operation; the pooling layer performs the pooling operation; the normalization layer normalizes the scores of the input information for each category; and the output layer outputs the classification category corresponding to the input information and the score of the input information for that category. The score of the input information for each category is obtained first, and that score is then divided by the total score to give the probability that the input information belongs to the category, as shown in the following formula:
$$P(x \in C_i) = \frac{s(x \in C_i)}{\sum_{j=1}^{n} s(x \in C_j)}$$
In the formula, $x$ is the input information; $C_i$ is the $i$-th class of knowledge point information in the knowledge base; $s$ is the score; $P(x \in C_i)$ is the probability that the input information $x$ belongs to the $i$-th class of knowledge point information; $s(x \in C_i)$ is the score of $x$ for the $i$-th class of knowledge point information; $j$ is the index over the classes of knowledge point information in the knowledge base; and $n$ is the total number of classes of knowledge point information in the knowledge base.
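The normalization step can be written out directly. This sketch assumes the per-category scores are non-negative with a positive sum (for example, scores that have already been exponentiated); the function names and the example scores are illustrative and not taken from the application.

```python
from typing import List, Sequence, Tuple

def normalize_scores(scores: Sequence[float]) -> List[float]:
    """P(x in C_i) = s(x in C_i) / sum_j s(x in C_j)."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return [s / total for s in scores]

def classify_from_scores(
    scores: Sequence[float], categories: Sequence[str]
) -> Tuple[str, float]:
    """Return the highest-probability category and its probability."""
    probs = normalize_scores(scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return categories[best], probs[best]

categories = ["credit card application", "loan application", "account opening"]
category, prob = classify_from_scores([6.0, 3.0, 1.0], categories)
```

Here the first category receives 6 / (6 + 3 + 1) = 0.6, which is the conditional probability that the confidence score later corrects.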
In this application, the text classification model may use a CNN network structure, a DNN network structure, or the like.
In one embodiment of this application, the text similarity model is a network model based on a Siamese network, comprising two identical neural networks running in parallel. The input information is fed into one network and the knowledge point information from the knowledge base into the other. The two networks convert the input information into a first vector and the knowledge point information into a second vector, and the similarity between the input information and the knowledge point information is obtained by computing the similarity between the first vector and the second vector. The text similarity model can be used to obtain the similarity between the input information and every piece of knowledge point information in the knowledge base, or only the similarity between the input information and the knowledge point information corresponding to the classification category output by the text classification model.
Further, the similarity between the first vector and the second vector is calculated by the following formula:
Figure PCTCN2019118278-appb-000006
In the formula, $Y_1$ is the first vector, $Y_2$ is the second vector, and $\mathrm{sim}(Y_1, Y_2)$ is the similarity between the first vector and the second vector.
The similarity between the first vector and the second vector characterizes the similarity between the input information and the knowledge point information, and is used to determine how likely it is that the knowledge point information corresponding to the input information lies within the knowledge base.
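The similarity formula itself is an image that is not reproduced in the text, so the exact measure is unknown here. As an assumption made purely for illustration, the sketch below uses cosine similarity, a common choice for comparing the two output vectors of a Siamese network; the application may use a different measure.

```python
import math
from typing import Sequence

def cosine_similarity(y1: Sequence[float], y2: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (assumed measure)."""
    if len(y1) != len(y2):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(y1, y2))
    norm1 = math.sqrt(sum(a * a for a in y1))
    norm2 = math.sqrt(sum(b * b for b in y2))
    # A zero vector has no direction; treat it as dissimilar to everything.
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)

# Hypothetical encoder outputs for the input and a knowledge point.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Cosine similarity ranges from -1 to 1, so if it were used as the estimate of P(x∈C) in the confidence score, negative values would need to be clipped or rescaled.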
The two neural networks in the text similarity model share the same parameters. Each may be an RNN, a CNN, an LSTM network, or the like; a bidirectional LSTM network is preferred in the present application.
The text similarity model is trained using knowledge point information from the knowledge base as training samples. Each training sample contains two pieces of knowledge point information and is labeled: 1 if the two pieces are semantically consistent, 0 otherwise. The training samples are thus divided by similarity into positive samples (the two pieces are similar, label 1) and negative samples (the two pieces are not similar, label 0). For example, among the knowledge point information in the knowledge base, one standard question is matched with multiple extended questions, and a standard question and its matched extended questions are similar. A positive sample consists of a standard question and one of its matched extended questions; a negative sample consists of a standard question and either an extended question that does not match it or another standard question. Dividing the samples into positive and negative samples improves the accuracy of the text similarity model.
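The construction of positive and negative samples described above can be sketched as follows. This is a minimal illustration under stated assumptions: the knowledge base is represented as a hypothetical dictionary mapping each standard question to its extended questions, and the function name `build_training_pairs` is invented for this example.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[str, str, int]  # (text_a, text_b, label)


def build_training_pairs(knowledge_base: Dict[str, List[str]]) -> List[Pair]:
    """Build labeled pairs from {standard question: [extended questions]}."""
    pairs: List[Pair] = []
    # Positive samples: a standard question with each of its own extended questions.
    for standard, extended in knowledge_base.items():
        for ext in extended:
            pairs.append((standard, ext, 1))
    # Negative samples: a standard question with another standard question,
    # or with an extended question that belongs to a different standard question.
    for std_a, std_b in combinations(knowledge_base, 2):
        pairs.append((std_a, std_b, 0))
        for ext in knowledge_base[std_b]:
            pairs.append((std_a, ext, 0))
        for ext in knowledge_base[std_a]:
            pairs.append((std_b, ext, 0))
    return pairs
```

Enumerating every negative pair grows quadratically with the number of standard questions, so a real training set would probably sample negatives rather than list them all.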
The parameters of the Siamese network may be trained with existing training methods; the present application places no limitation on this.
Figure 2 compares the recognition results of the intention recognition model of the present application with those of an existing text classification model for questions inside the knowledge base. As shown in Figure 2, for such questions the distribution of scores that the input information belongs to a given piece of knowledge point information, as produced by the intention recognition model of the present application, differs little from the distribution produced by the existing text classification model. Figure 3 makes the same comparison for questions outside the knowledge base. As shown in Figure 3, for such questions the scores produced by the existing text classification model are generally high, whereas those produced by the intention recognition model of the present application are generally low, which makes it possible to reject inputs by comparing the score with a preset threshold and thereby improves the accuracy of intention recognition. In Figures 2 and 3, the abscissa is the score that the input information belongs to a given classification category and the ordinate is the number of samples input to the model; "existing model" in the figures refers to the text classification model used in existing intention recognition.
The rejection method in intention recognition described in the present application is applied to an electronic device, which may be a terminal device such as a television, smartphone, tablet computer, or computer.
The electronic device includes a processor and a memory. The memory stores a rejection program for intention recognition, and the processor executes that program to implement the following rejection method in intention recognition:
acquiring input information to be recognized; inputting the input information into a trained intention recognition model and obtaining, through the intention recognition model, a classification category and a confidence score corresponding to the input information; and determining whether the confidence score exceeds a preset threshold, and if so, obtaining the knowledge point information corresponding to the classification category from the knowledge base, and if not, refusing to recognize the input information; wherein the intention recognition model includes a text classification model and a text similarity model, the classification category corresponding to the input information and the conditional probability that the input information belongs to the classification category are obtained through the text classification model, and the confidence score is obtained through the text similarity model and the conditional probability.
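The accept-or-reject decision in the method above can be sketched as follows. This is an illustrative outline only: the `Decision` type, the `recognize_or_reject` function, the callable `intent_model`, and the default threshold of 0.5 are all hypothetical names and values chosen for the example, and the trained models are replaced by a callable supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Decision:
    rejected: bool
    category: Optional[str]
    knowledge_point: Optional[str]
    score: float


def recognize_or_reject(
    text: str,
    intent_model: Callable[[str], Tuple[str, float]],
    knowledge_base: Dict[str, str],
    threshold: float = 0.5,  # assumed value; the text only says "preset threshold"
) -> Decision:
    """Return the matched knowledge point, or a rejection if the score is too low."""
    category, score = intent_model(text)
    if score > threshold:  # "exceeds" is read here as strictly greater than
        return Decision(False, category, knowledge_base.get(category), score)
    return Decision(True, None, None, score)
```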
The electronic device further includes a network interface, a communication bus, and the like. The network interface may include a standard wired interface and a wireless interface, and the communication bus provides connection and communication between the components.
The memory includes at least one type of readable storage medium, which may be a non-volatile storage medium such as flash memory, a hard disk, or an optical disc, or a plug-in hard disk or the like, without being limited thereto; it may be any device that stores instructions or software and any associated data files in a non-transitory manner and provides the instructions or software program to the processor so that the processor can execute them. In the present application, the software program stored in the memory includes the rejection program for intention recognition, which the memory provides to the processor so that the processor can execute it and thereby implement the rejection method in intention recognition.
The processor may be a central processing unit, a microprocessor, another data processing chip, or the like, and can run programs stored in the memory, for example the rejection program for intention recognition of the present application.
The electronic device may further include a display, also referred to as a display screen or display unit. In some embodiments the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch device, or the like. The display shows the information processed in the electronic device and a visual working interface, including the input information and the output of the intention recognition model.
The electronic device may further include a user interface, which may include an input unit (such as a keyboard), a voice output device (such as speakers or earphones), and the like.
It should be noted that the specific implementation of the electronic device of the present application is substantially the same as that of the rejection method in intention recognition described above and is not repeated here.
Figure 4 is a schematic diagram of the modules of the rejection device for intention recognition in the present application. As shown in Figure 4, the rejection device includes: an input information acquisition module 1 for acquiring input information to be recognized; a recognition module 2 for inputting the input information into a trained intention recognition model for recognition, wherein the intention recognition model includes a text classification model and a text similarity model, the classification category corresponding to the input information and the conditional probability that the input information belongs to the classification category are obtained through the text classification model, and the confidence score is obtained through the text similarity model and the conditional probability; a confidence acquisition module 3 for obtaining, through the intention recognition model, the classification category and confidence score corresponding to the input information; and a judgment module 4 for determining whether the confidence score exceeds a preset threshold, and if so, obtaining the knowledge point information corresponding to the classification category from the knowledge base, and if not, refusing to recognize the input information.
In the present application, the input information to be recognized is information that, after processing, can be fed directly into the intention recognition model. It can be input directly into the text classification model to obtain a classification category, or directly into the similarity model to obtain its similarity to knowledge point information. Preferably, the rejection device acquires the input information to be recognized through the following steps:
acquiring voice information to be recognized; converting the acquired voice information into text information in a preset format; and processing the text information to obtain the input information to be recognized. The voice information to be recognized may be, for example, a voice command or chat speech from a user. Processing the text information includes denoising, word segmentation, and the like: denoising removes meaningless words without changing the actual meaning of the input, and word segmentation splits the text into words, which may additionally be tagged with their parts of speech and scanned for named entities.
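The text-processing step (after speech has already been converted to text) might look like the sketch below. It is a simplified stand-in rather than the processing used in the application: the filler-word list `FILLERS` and the function `preprocess` are hypothetical, segmentation is done with a plain regular expression suited to English, and Chinese text would instead need a dedicated word segmenter. Part-of-speech tagging and named-entity recognition are omitted.

```python
import re
from typing import List

FILLERS = {"um", "uh", "er", "hmm"}  # hypothetical list of meaningless filler words


def preprocess(text: str) -> List[str]:
    """Lower-case the text, split it into word tokens, and drop filler words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in FILLERS]
```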
In the present application, the input information to be recognized may be a sentence, a phrase, or the like, and contains the user's statement of the question to be asked. For example, the question may be "I have online banking; how do I apply for a credit card?", and the corresponding knowledge point information is "credit card application". Further, the input information includes user information, including but not limited to the user's age, gender, identity, occupation, region, and hometown, so that the user's inputs can be clustered by preference on the basis of the user information to identify the user's inclinations and interests.
The rejection device of the present application includes a confidence acquisition module that obtains the confidence score from the output of the text similarity model. In an optional embodiment of the present application, the confidence acquisition module includes: a first information input unit, which inputs the input information and the knowledge point information in the knowledge base into the text similarity model; a first similarity acquisition unit, which obtains, through the text similarity model, the similarity between the input information and each piece of knowledge point information in the knowledge base; a selection unit, which selects the maximum similarity from the obtained similarities; and a first confidence calculation unit, which multiplies the maximum similarity by the conditional probability to obtain the confidence score.
This is expressed by the following formula:
Figure PCTCN2019118278-appb-000007
where x denotes the input information; C_i denotes the i-th category of knowledge point information in the knowledge base; C denotes the knowledge base; Score(x∈C_i) denotes the confidence score that the input information x belongs to the i-th category of knowledge point information in the knowledge base; P(x∈C_i, x∈C) denotes the probability that x is within the scope of the knowledge base and belongs to the i-th category; the term shown in Figure PCTCN2019118278-appb-000008 denotes the probability that x is outside the scope of the knowledge base and belongs to the i-th category, which is generally 0; P(x∈C_i|x∈C) denotes the conditional probability that x belongs to the i-th category given that it is within the scope of the knowledge base, which is output by the text classification model and through which the joint probability can be expanded using Bayes' formula; j is the index over categories of knowledge point information in the knowledge base; P(x∈C) denotes the probability that the input information belongs to the knowledge base; and sim(x, C_j) denotes the similarity between x and the j-th category of knowledge point information in the knowledge base. If x is very similar to any one piece of knowledge point information in the knowledge base, x is considered to belong to the knowledge base, so the maximum of the similarities is used to compute the confidence score.
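Since the formula is available here only as an image, the sketch below follows the verbal description instead: the confidence score is the conditional probability from the text classification model multiplied by the largest similarity between the input and any knowledge point. The names `confidence_score_max` and `word_overlap` are hypothetical, and `word_overlap` is a toy word-set similarity that merely stands in for the trained text similarity model.

```python
from typing import Callable, Dict


def word_overlap(a: str, b: str) -> float:
    """Toy similarity (Jaccard overlap of word sets); a stand-in for the trained model."""
    wa, wb = set(a.split()), set(b.split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def confidence_score_max(
    x: str,
    cond_prob: float,
    knowledge_points: Dict[str, str],
    sim: Callable[[str, str], float] = word_overlap,
) -> float:
    """Score(x in C_i) = P(x in C_i | x in C) * max_j sim(x, C_j)."""
    if not knowledge_points:
        return 0.0  # nothing to compare against, so x cannot be in the knowledge base
    max_sim = max(sim(x, kp) for kp in knowledge_points.values())
    return cond_prob * max_sim
```

Note that this variant calls the similarity model once per knowledge point, so its cost grows with the size of the knowledge base.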
Assume the classification algorithm in the text classification model is trustworthy. Then, if the input information x belongs to the knowledge base, the text classification model will necessarily classify it into the category of knowledge point information most similar to it. Preferably, the confidence acquisition module includes: a second information input unit, which inputs the input information and the knowledge point information in the knowledge base corresponding to the classification category into the text similarity model; a second similarity acquisition unit, which obtains, through the text similarity model, the similarity between the input information and the knowledge point information corresponding to the classification category; and a second confidence calculation unit, which multiplies the conditional probability by the similarity obtained by the text similarity model to obtain the confidence score.
This is expressed by the following formula:
Figure PCTCN2019118278-appb-000009
where x denotes the input information; C_i denotes the i-th category of knowledge point information in the knowledge base; C denotes the knowledge base; Score(x∈C_i) denotes the confidence score that the input information x belongs to the i-th category of knowledge point information in the knowledge base; P(x∈C_i, x∈C) denotes the probability that x is within the scope of the knowledge base and belongs to the i-th category; the term shown in Figure PCTCN2019118278-appb-000010 denotes the probability that x is outside the scope of the knowledge base and belongs to the i-th category, which is generally 0; P(x∈C_i|x∈C) denotes the conditional probability that x belongs to the i-th category given that it is within the scope of the knowledge base, which is output by the text classification model; P(x∈C) denotes the probability that the input information belongs to the knowledge base; and sim(x, C_i) denotes the similarity between x and the i-th category of knowledge point information.
A classification result is first obtained with the text classification model, giving the classification category corresponding to the input information, and the text similarity is then computed for that result only to obtain the confidence score. This greatly reduces the number of text similarity comparisons and improves computational efficiency: determining whether an input belongs to the knowledge base no longer requires computing its similarity to every piece of knowledge point information in the knowledge base.
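The efficiency argument can be made concrete with a small sketch in which the similarity model is called only for the category returned by the classifier. All names here (`confidence_score_for_category`, `word_overlap`, `counting_sim`, the sample knowledge base) are hypothetical, and the toy word-overlap similarity stands in for the trained text similarity model.

```python
from typing import Callable, Dict, List


def confidence_score_for_category(
    x: str,
    category: str,
    cond_prob: float,
    knowledge_points: Dict[str, str],
    sim: Callable[[str, str], float],
) -> float:
    """Score(x in C_i) = P(x in C_i | x in C) * sim(x, C_i): one similarity call."""
    return cond_prob * sim(x, knowledge_points[category])


def word_overlap(a: str, b: str) -> float:
    """Toy similarity (Jaccard overlap of word sets); a stand-in for the trained model."""
    wa, wb = set(a.split()), set(b.split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


calls: List[str] = []


def counting_sim(a: str, b: str) -> float:
    calls.append(b)  # record which knowledge point was compared
    return word_overlap(a, b)


kb = {"credit card application": "apply for credit card", "loan repayment": "repay my loan"}
score = confidence_score_for_category(
    "apply for credit card", "credit card application", 0.8, kb, counting_sim
)
```

After this runs, `calls` holds a single entry regardless of how many knowledge points `kb` contains, whereas the maximum-similarity variant would compare against every entry.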
The text classification model classifies the input information (which may be a sentence, a phrase, or the like) and outputs a classification category and the corresponding score. Preferably, the text classification model includes an input layer, an embedding layer, a convolutional layer, a pooling layer, a normalization layer, and an output layer. The input information is fed to the input layer; the embedding layer converts it into a word vector matrix; the convolutional layer performs convolution; the pooling layer performs pooling; the normalization layer normalizes the scores of the input information for each category; and the output layer outputs the classification category corresponding to the input information and the score that the input information belongs to that category. The score of the input information for each category is obtained and divided by the total score to give the probability that the input information belongs to that category, as shown in the following formula:
Figure PCTCN2019118278-appb-000011
where x is the input information, C_i is the i-th category of knowledge point information in the knowledge base, s is a score, P(x∈C_i) is the probability that x belongs to the i-th category of knowledge point information in the knowledge base, s(x∈C_i) is the score that x belongs to the i-th category, j is the index over categories of knowledge point information in the knowledge base, and n is the total number of categories of knowledge point information in the knowledge base.
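The normalization described in words above (each category score divided by the sum of the scores over all n categories) can be sketched as follows. The function name `scores_to_probabilities` is hypothetical, and the sketch assumes non-negative scores with a positive total, which the text does not state explicitly.

```python
from typing import Dict


def scores_to_probabilities(scores: Dict[str, float]) -> Dict[str, float]:
    """P(x in C_i) = s(x in C_i) / sum over j of s(x in C_j)."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    return {category: s / total for category, s in scores.items()}
```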
In one embodiment of the present application, the text similarity model is a network model based on a Siamese (twin) network comprising two identical neural networks in parallel. The input information and a piece of knowledge point information from the knowledge base are each fed into one of the two networks; one network converts the input information into a first vector and the other converts the knowledge point information into a second vector, and the similarity between the first and second vectors is computed and output as the similarity between the input information and the knowledge point information. The text similarity model can be used to obtain the similarity between the input information and each piece of knowledge point information in the knowledge base, or only the similarity between the input information and the knowledge point information corresponding to the classification category output by the text classification model. The similarity between the first vector and the second vector is obtained in the same way as in the rejection method in intention recognition and is not repeated here.
The text similarity model is trained using knowledge point information from the knowledge base as training samples, in substantially the same way as described for the rejection method in intention recognition, which is not repeated here.
It should be noted that the specific implementation of the rejection device for intention recognition of the present application is substantially the same as that of the rejection method in intention recognition and the electronic device described above and is not repeated here.
In other embodiments, the rejection program for intention recognition may also be divided into one or more modules, which are stored in the memory and executed by the processor to carry out the present application. A module, as the term is used in the present application, is a series of computer program instruction segments capable of performing a specific function. For example, the rejection program for intention recognition may be divided into the input information acquisition module 1, the recognition module 2, the confidence acquisition module 3, and the judgment module 4. The functions or operation steps implemented by these modules are similar to those described above and are not detailed here.
In one embodiment of the present application, the computer non-volatile readable storage medium may be any tangible medium that contains or stores programs or instructions; the programs can be executed, and the stored program instructions direct the relevant hardware to implement the corresponding functions. For example, the computer non-volatile readable storage medium may be a computer disk, a hard disk, random access memory, read-only memory, or the like. The present application is not limited thereto; the medium may be any device that stores instructions or software and any related data files or data structures in a non-transitory manner and can provide them to a processor so that the processor executes the programs or instructions. The computer non-volatile readable storage medium contains the rejection program for intention recognition, and when that program is executed by a processor, the rejection method in intention recognition described above is implemented, which is not repeated here.
The specific implementation of the computer non-volatile readable storage medium of the present application is substantially the same as that of the rejection method in intention recognition, the device, and the electronic device described above and is not repeated here.
It should be noted that, in this document, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, device, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, device, article, or method. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, device, article, or method that includes it.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments. From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application.

Claims (20)

  1. A rejection method in intention recognition, applied to an electronic device, characterized by comprising:
    acquiring input information to be recognized;
    inputting the input information into a trained intention recognition model, and obtaining, through the intention recognition model, a classification category and a confidence score corresponding to the input information;
    determining whether the confidence score exceeds a preset threshold, and if the preset threshold is exceeded, obtaining knowledge point information corresponding to the classification category from a knowledge base, and if the preset threshold is not exceeded, refusing to recognize the input information;
    wherein the intention recognition model includes a text classification model and a text similarity model, the classification category corresponding to the input information and the conditional probability that the input information belongs to the classification category are obtained through the text classification model, and the confidence score is obtained through the text similarity model and the conditional probability.
  2. The rejection method in intention recognition according to claim 1, characterized in that the step of obtaining the confidence score through the text similarity model and the conditional probability comprises:
    inputting the input information and the knowledge point information in the knowledge base into the text similarity model;
    obtaining, through the text similarity model, the similarity between the input information and each piece of knowledge point information in the knowledge base;
    selecting the maximum similarity from the obtained similarities;
    multiplying the maximum similarity by the conditional probability to obtain the confidence score.
  3. The rejection method in intention recognition according to claim 1, characterized in that the step of obtaining the confidence score through the text similarity model and the conditional probability comprises:
    inputting the input information and the knowledge point information in the knowledge base corresponding to the classification category into the text similarity model;
    obtaining, through the text similarity model, the similarity between the input information and the knowledge point information corresponding to the classification category;
    multiplying the conditional probability by the similarity obtained by the text similarity model to obtain the confidence score.
  4. The rejection method in intention recognition according to claim 1, characterized in that obtaining the knowledge point information corresponding to the classification category from the knowledge base comprises:
    setting levels for the preset threshold of the confidence score;
    obtaining one or more pieces of knowledge point information corresponding to the classification category according to the threshold level to which the confidence score belongs.
  5. The rejection method in intent recognition according to claim 4, wherein the step of obtaining the confidence score through the text similarity model and the conditional probability comprises:
    inputting the input information and the knowledge point information in the knowledge base into the text similarity model;
    obtaining, through the text similarity model, the similarity between the input information and each piece of knowledge point information in the knowledge base;
    sorting the obtained similarities in descending order, selecting a preset number of top-ranked similarities, and obtaining the corresponding preset number of confidence scores;
    and wherein the step of obtaining the knowledge point information corresponding to the classification category from the knowledge base comprises:
    determining, from the preset number of confidence scores, the one or more pieces of knowledge point information whose confidence scores exceed a preset level threshold, as the knowledge point information corresponding to the classification category.
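The top-ranked selection in claim 5 can be sketched as follows. The claim does not spell out how each selected similarity becomes a confidence score; this sketch assumes the same multiplication by the conditional probability used in claim 2. The function and parameter names are hypothetical.

```python
from typing import List, Mapping, Tuple


def select_knowledge_points(
    similarities: Mapping[str, float],
    cond_prob: float,
    top_k: int,
    level_threshold: float,
) -> List[Tuple[str, float]]:
    """Return (knowledge point, confidence score) pairs that pass the level threshold."""
    # Sort similarities in descending order and keep the preset number of top entries.
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)[:top_k]
    # Assumption: confidence score = similarity x conditional probability, as in claim 2.
    scored = [(kp, sim * cond_prob) for kp, sim in ranked]
    # Keep only the knowledge points whose confidence score exceeds the level threshold.
    return [(kp, score) for kp, score in scored if score > level_threshold]
```

With `similarities = {"a": 0.9, "b": 0.5, "c": 0.8}`, `cond_prob = 0.5`, `top_k = 2`, and `level_threshold = 0.42`, the candidates are `a` and `c`, and only `a` passes the threshold.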
  6. The rejection method in intent recognition according to claim 1, wherein the text similarity model is a network model based on a Siamese network and comprises two parallel, identical neural networks; the input information and the knowledge point information in the knowledge base are each input into one of the neural networks; the two neural networks respectively convert the input information into a first vector and the knowledge point information into a second vector; and the similarity between the input information and the knowledge point information is obtained by calculating the similarity between the first vector and the second vector, and is output.
  7. The rejection method in intent recognition according to claim 6, wherein the similarity between the first vector and the second vector is calculated by the following formula:
    Figure PCTCN2019118278-appb-100001
    where Y_1 is the first vector, Y_2 is the second vector, and sim(Y_1, Y_2) is the similarity between the first vector and the second vector.
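The formula in claim 7 is published as an image and is not reproduced in this text. The sketch below therefore assumes cosine similarity, a common choice for Siamese text models; it is not necessarily the formula in the figure. The `VOCAB` list and the bag-of-words `encode` function are hypothetical stand-ins for the two identical neural networks of claim 6, which share one set of weights.

```python
import math
from typing import List, Sequence

# Hypothetical vocabulary; a real encoder would be a trained neural network.
VOCAB = ["reset", "password", "open", "account"]


def encode(tokens: Sequence[str]) -> List[float]:
    """Shared encoder applied to both inputs (stand-in for the twin networks)."""
    return [float(list(tokens).count(word)) for word in VOCAB]


def cosine_similarity(y1: Sequence[float], y2: Sequence[float]) -> float:
    """Assumed form of sim(Y_1, Y_2): dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(y1, y2))
    norm1 = math.sqrt(sum(a * a for a in y1))
    norm2 = math.sqrt(sum(b * b for b in y2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


def siamese_similarity(input_tokens: Sequence[str], kp_tokens: Sequence[str]) -> float:
    """Encode both texts with the same encoder, then compare the two vectors."""
    first_vector = encode(input_tokens)   # Y_1
    second_vector = encode(kp_tokens)     # Y_2
    return cosine_similarity(first_vector, second_vector)
```

Using one `encode` function for both inputs mirrors the "two parallel identical neural networks" of claim 6: identical inputs always map to identical vectors.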
  8. The rejection method in intent recognition according to claim 6, wherein the neural network is one of an RNN, a CNN, and an LSTM neural network.
  9. The rejection method in intent recognition according to claim 1, wherein the step of obtaining the input information to be recognized comprises:
    obtaining voice information to be recognized;
    converting the obtained voice information into text information in a preset format; and
    processing the text information to obtain the input information to be recognized.
  10. The rejection method in intent recognition according to claim 9, wherein processing the text information comprises performing denoising and word segmentation on the text information.
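The text processing of claims 9 and 10 can be sketched as below. The claims do not define what denoising removes or which segmenter is used, so both choices here are assumptions: punctuation stripping with whitespace normalization for denoising, and whitespace splitting for segmentation. Chinese text would need a dedicated word segmenter instead of `str.split`.

```python
import re
from typing import List


def denoise(text: str) -> str:
    """Assumed denoising: drop punctuation, collapse whitespace, lowercase."""
    text = re.sub(r"[^\w\s]", " ", text)   # replace punctuation and symbols with spaces
    text = re.sub(r"\s+", " ", text)       # collapse runs of whitespace
    return text.strip().lower()


def segment(text: str) -> List[str]:
    """Assumed segmentation: split on whitespace (a stand-in for a real word segmenter)."""
    return text.split()


def preprocess(transcribed_text: str) -> List[str]:
    """Turn text converted from speech into the token list passed to the models."""
    return segment(denoise(transcribed_text))
```

The speech-to-text conversion of claim 9 is outside this sketch; `preprocess` starts from text that has already been transcribed.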
  11. The rejection method in intent recognition according to claim 1, wherein the text classification model comprises an input layer, an embedding layer, a convolutional layer, a pooling layer, a normalization layer, and an output layer; the input information is input into the input layer; the embedding layer converts the input information into a word vector matrix; the convolutional layer performs a convolution operation; the pooling layer performs a pooling operation; the normalization layer normalizes the scores of the input information belonging to each category; and the output layer outputs the classification category corresponding to the input information and the conditional probability that the input information belongs to the classification category.
  12. The rejection method in intent recognition according to claim 11, wherein the conditional probability that the input information belongs to the classification category is obtained by the following formula:
    Figure PCTCN2019118278-appb-100002
    where x is the input information, C_i is the i-th category of knowledge point information in the knowledge base, s is a score, P(x∈C_i) is the probability that the input information x belongs to the i-th category of knowledge point information in the knowledge base, s(x∈C_i) is the score of the input information x belonging to the i-th category of knowledge point information in the knowledge base, j is the index over the categories of knowledge point information in the knowledge base, and n is the total number of categories of knowledge point information in the knowledge base.
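The formula in claim 12 is published as an image and is not reproduced in this text. Its symbol list (a per-category score s, a probability P, and an index j running over n categories) is consistent with a softmax normalization, so the sketch below assumes softmax; the actual figure may differ. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Assumed normalization: P_i = exp(s_i) / sum over j of exp(s_j)."""
    peak = max(scores)                               # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores: Sequence[float]) -> Tuple[int, float]:
    """Return the index of the most probable category and its conditional probability."""
    probabilities = softmax(scores)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best, probabilities[best]
```

Here `scores` plays the role of the per-category scores produced before the normalization layer of claim 11, and the returned probability is the conditional probability that later enters the confidence score.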
  13. A rejection device in intent recognition, comprising:
    an input information acquisition module, configured to obtain input information to be recognized;
    a recognition module, configured to input the input information into a trained intent recognition model for recognition, wherein the intent recognition model comprises a text classification model and a text similarity model, the classification category corresponding to the input information and the conditional probability that the input information belongs to the classification category are obtained through the text classification model, and the confidence score is obtained through the text similarity model and the conditional probability;
    a confidence acquisition module, configured to obtain, through the intent recognition model, the classification category and the confidence score corresponding to the input information; and
    a judgment module, configured to judge whether the confidence score exceeds a preset threshold, to obtain the knowledge point information corresponding to the classification category from a knowledge base if the confidence score exceeds the preset threshold, and to refuse to recognize the input information if the confidence score does not exceed the preset threshold.
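The decision made by the judgment module of claim 13 can be sketched as a single comparison. The function name, the dictionary-based knowledge base, and the use of `None` to signal rejection are assumptions made for illustration.

```python
from typing import Mapping, Optional


def answer_or_reject(
    category: str,
    confidence: float,
    knowledge_base: Mapping[str, str],
    threshold: float,
) -> Optional[str]:
    """Return the knowledge point for `category`, or None to signal rejection."""
    # Accept only when the confidence score strictly exceeds the preset threshold.
    if confidence > threshold:
        return knowledge_base[category]
    # Otherwise the input is rejected: no knowledge point is returned.
    return None
```

A score exactly equal to the threshold is rejected here, which follows the claim wording that the knowledge point is fetched only if the score exceeds the threshold.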
  14. The rejection device in intent recognition according to claim 13, wherein the confidence acquisition module comprises: a first information input unit, configured to input the input information and the knowledge point information in the knowledge base into the text similarity model; a first similarity acquisition unit, configured to obtain, through the text similarity model, the similarity between the input information and each piece of knowledge point information in the knowledge base; a selection unit, configured to select the maximum similarity from the multiple similarities obtained; and a first confidence calculation unit, configured to multiply the maximum similarity by the conditional probability to obtain the confidence score.
  15. The rejection device in intent recognition according to claim 13, wherein the confidence acquisition module comprises: a second information input unit, configured to input the input information and the knowledge point information in the knowledge base that corresponds to the classification category into the text similarity model; a second similarity acquisition unit, configured to obtain, through the text similarity model, the similarity between the input information and the knowledge point information corresponding to the classification category; and a second confidence calculation unit, configured to multiply the conditional probability by the similarity obtained by the text similarity model to obtain the confidence score.
  16. The rejection device in intent recognition according to claim 13, wherein the text similarity model is a network model based on a Siamese network and comprises two parallel, identical neural networks; the input information and the knowledge point information in the knowledge base are each input into one of the neural networks; the two neural networks respectively convert the input information into a first vector and the knowledge point information into a second vector; and the similarity between the input information and the knowledge point information is obtained by calculating the similarity between the first vector and the second vector, and is output.
  17. The rejection device in intent recognition according to claim 13, wherein the text classification model comprises an input layer, an embedding layer, a convolutional layer, a pooling layer, a normalization layer, and an output layer; the input information is input into the input layer; the embedding layer converts the input information into a word vector matrix; the convolutional layer performs a convolution operation; the pooling layer performs a pooling operation; the normalization layer normalizes the scores of the input information belonging to each category; and the output layer outputs the classification category corresponding to the input information and the conditional probability that the input information belongs to the classification category.
  18. The rejection method in intent recognition according to claim 17, wherein the conditional probability that the input information belongs to the classification category is obtained by the following formula:
    Figure PCTCN2019118278-appb-100003
    where x is the input information, C_i is the i-th category of knowledge point information in the knowledge base, s is a score, P(x∈C_i) is the probability that the input information x belongs to the i-th category of knowledge point information in the knowledge base, s(x∈C_i) is the score of the input information x belonging to the i-th category of knowledge point information in the knowledge base, j is the index over the categories of knowledge point information in the knowledge base, and n is the total number of categories of knowledge point information in the knowledge base.
  19. An electronic device, comprising a processor and a memory, wherein the memory stores a rejection program for intent recognition, and the rejection program, when executed by the processor, implements the rejection method in intent recognition according to any one of claims 1 to 12.
  20. A non-volatile computer-readable storage medium, wherein the storage medium stores a rejection program for intent recognition, and the rejection program, when executed by a processor, implements the rejection method in intent recognition according to any one of claims 1 to 12.
PCT/CN2019/118278 2019-01-31 2019-11-14 Method, device and apparatus for identification rejection in intention identification, and storage medium WO2020155766A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910100204.5 2019-01-31
CN201910100204.5A CN109871446B (en) 2019-01-31 2019-01-31 Refusing method in intention recognition, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2020155766A1 true WO2020155766A1 (en) 2020-08-06

Family

ID=66918490

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118278 WO2020155766A1 (en) 2019-01-31 2019-11-14 Method, device and apparatus for identification rejection in intention identification, and storage medium

Country Status (2)

Country Link
CN (1) CN109871446B (en)
WO (1) WO2020155766A1 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871446B (en) * 2019-01-31 2023-06-06 平安科技(深圳)有限公司 Refusing method in intention recognition, electronic device and storage medium
CN110472007A (en) * 2019-07-04 2019-11-19 深圳追一科技有限公司 Information-pushing method, device, equipment and storage medium
CN110414005B (en) * 2019-07-31 2023-10-10 达闼机器人股份有限公司 Intention recognition method, electronic device and storage medium
CN110619035B (en) * 2019-08-01 2023-07-25 平安科技(深圳)有限公司 Method, device, equipment and storage medium for identifying keywords in interview video
CN112347776A (en) * 2019-08-09 2021-02-09 金色熊猫有限公司 Medical data processing method and device, storage medium and electronic equipment
CN110503143B (en) * 2019-08-14 2024-03-19 平安科技(深圳)有限公司 Threshold selection method, device, storage medium and device based on intention recognition
CN112699909B (en) * 2019-10-23 2024-03-19 中移物联网有限公司 Information identification method, information identification device, electronic equipment and computer readable storage medium
CN112733869A (en) * 2019-10-28 2021-04-30 中移信息技术有限公司 Method, device and equipment for training text recognition model and storage medium
CN111078846A (en) * 2019-11-25 2020-04-28 青牛智胜(深圳)科技有限公司 Multi-turn dialog system construction method and system based on business scene
CN111310441A (en) * 2020-01-20 2020-06-19 上海眼控科技股份有限公司 Text correction method, device, terminal and medium based on BERT (binary offset transcription) voice recognition
CN111400473A (en) * 2020-03-18 2020-07-10 北京三快在线科技有限公司 Method and device for training intention recognition model, storage medium and electronic equipment
CN111611366B (en) 2020-05-20 2023-08-11 北京百度网讯科技有限公司 Method, device, equipment and storage medium for optimizing intention recognition
CN111625636B (en) * 2020-05-28 2023-08-04 深圳追一科技有限公司 Method, device, equipment and medium for rejecting man-machine conversation
CN111859986B (en) * 2020-07-27 2023-06-20 中国平安人寿保险股份有限公司 Semantic matching method, device, equipment and medium based on multi-task twin network
CN112232269B (en) * 2020-10-29 2024-02-09 南京莱斯网信技术研究院有限公司 Ship identity intelligent recognition method and system based on twin network

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8515736B1 (en) * 2010-09-30 2013-08-20 Nuance Communications, Inc. Training call routing applications by reusing semantically-labeled data collected for prior applications
JP5733158B2 (en) * 2011-11-02 2015-06-10 富士通株式会社 Recognition support device, recognition support method, and program
US9966065B2 (en) * 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
CN105957516B (en) * 2016-06-16 2019-03-08 百度在线网络技术(北京)有限公司 More voice identification model switching method and device
CN106547742B (en) * 2016-11-30 2019-05-03 百度在线网络技术(北京)有限公司 Semantic parsing result treating method and apparatus based on artificial intelligence
CN107657284A (en) * 2017-10-11 2018-02-02 宁波爱信诺航天信息有限公司 A kind of trade name sorting technique and system based on Semantic Similarity extension

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106095845A (en) * 2016-06-02 2016-11-09 腾讯科技(深圳)有限公司 File classification method and device
CN106326984A (en) * 2016-08-09 2017-01-11 北京京东尚科信息技术有限公司 User intention identification method and device and automatic answering system
CN108509463A (en) * 2017-02-28 2018-09-07 华为技术有限公司 A kind of answer method and device of problem
CN108334891A (en) * 2017-12-15 2018-07-27 北京奇艺世纪科技有限公司 A kind of Task intent classifier method and device
CN109241255A (en) * 2018-08-20 2019-01-18 华中师范大学 A kind of intension recognizing method based on deep learning
CN109871446A (en) * 2019-01-31 2019-06-11 平安科技(深圳)有限公司 Rejection method for identifying, electronic device and storage medium in intention assessment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112015921A (en) * 2020-09-15 2020-12-01 重庆广播电视大学重庆工商职业学院 Natural language processing method based on learning-assisted knowledge graph
CN112015921B (en) * 2020-09-15 2024-04-16 重庆广播电视大学重庆工商职业学院 Natural language processing method based on learning auxiliary knowledge graph
CN112396111A (en) * 2020-11-20 2021-02-23 平安普惠企业管理有限公司 Text intention classification method and device, computer equipment and storage medium
CN112667076A (en) * 2020-12-23 2021-04-16 广州橙行智动汽车科技有限公司 Voice interaction data processing method and device
WO2022135496A1 (en) * 2020-12-23 2022-06-30 广州橙行智动汽车科技有限公司 Voice interaction data processing method and device

Also Published As

Publication number Publication date
CN109871446A (en) 2019-06-11
CN109871446B (en) 2023-06-06

Similar Documents

Publication Publication Date Title
WO2020155766A1 (en) Method, device and apparatus for identification rejection in intention identification, and storage medium
WO2020143844A1 (en) Intent analysis method and apparatus, display terminal, and computer readable storage medium
WO2020207431A1 (en) Document classification method, apparatus and device, and storage medium
CN109933785B (en) Method, apparatus, device and medium for entity association
WO2020220539A1 (en) Data increment method and device, computer device and storage medium
WO2020199591A1 (en) Text categorization model training method, apparatus, computer device, and storage medium
WO2020077895A1 (en) Signing intention determining method and apparatus, computer device, and storage medium
WO2021174717A1 (en) Text intent recognition method and apparatus, computer device and storage medium
WO2020147395A1 (en) Emotion-based text classification method and device, and computer apparatus
WO2020119031A1 (en) Deep learning-based question and answer feedback method, device, apparatus, and storage medium
US20190197119A1 (en) Language-agnostic understanding
CN111160017A (en) Keyword extraction method, phonetics scoring method and phonetics recommendation method
WO2020237856A1 (en) Smart question and answer method and apparatus based on knowledge graph, and computer storage medium
US20200143575A1 (en) Method and device for displaying explanation of reference numeral in patent drawing image using artificial intelligence technology based machine learning
WO2019084810A1 (en) Information processing method and terminal, and computer storage medium
US11526663B2 (en) Methods, apparatuses, devices, and computer-readable storage media for determining category of entity
WO2022227165A1 (en) Question and answer method and apparatus for machine reading comprehension, computer device, and storage medium
WO2022227162A1 (en) Question and answer data processing method and apparatus, and computer device and storage medium
WO2021114810A1 (en) Graph structure-based official document recommendation method, apparatus, computer device, and medium
CN111274371B (en) Intelligent man-machine conversation method and equipment based on knowledge graph
CN111090719B (en) Text classification method, apparatus, computer device and storage medium
CN110502610A (en) Intelligent sound endorsement method, device and medium based on text semantic similarity
WO2020114100A1 (en) Information processing method and apparatus, and computer storage medium
CN112699923A (en) Document classification prediction method and device, computer equipment and storage medium
CN112632261A (en) Intelligent question and answer method, device, equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 19914035
    Country of ref document: EP
    Kind code of ref document: A1

NENP Non-entry into the national phase
    Ref country code: DE

122 Ep: pct application non-entry in european phase
    Ref document number: 19914035
    Country of ref document: EP
    Kind code of ref document: A1