WO2020087774A1 - Concept tree-based intent recognition method, device, and computer device - Google Patents

Concept tree-based intent recognition method, device, and computer device

Info

Publication number
WO2020087774A1
Authority
WO
WIPO (PCT)
Prior art keywords
intent
word
intention
keyword
target
Prior art date
Application number
PCT/CN2019/070295
Other languages
English (en)
French (fr)
Inventor
严海锐
周宝
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020087774A1

Links

Images

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 — Handling natural language data
    • G06F 40/20 — Natural language analysis
    • G06F 40/279 — Recognition of textual entities
    • G06F 40/289 — Phrasal analysis, e.g. finite state techniques or chunking
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/004 — Artificial life, i.e. computing arrangements simulating life
    • G06N 3/008 — Artificial life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present application relates to the technical field of data analysis, and in particular to a concept tree-based intent recognition method, device, and computer device.
  • In the field of service robots, robots can handle business consulting in specific domains.
  • Whether a robot can correctly identify the user's intent is an important factor in whether it can effectively answer the user's business questions; research on intent recognition is therefore developing continuously.
  • The purpose of the present application is to provide a concept tree-based intent recognition method, device, and computer device that solve the problems existing in the prior art.
  • To this end, the present application provides a concept tree-based intent recognition method, including the following steps:
  • Step 01: Obtain a target sentence that requires intent recognition.
  • Step 02: Perform word segmentation on the target sentence to obtain at least one traversal word.
  • Step 03: For each current traversal word, traverse the keywords corresponding to each intent in a pre-constructed concept tree, and calculate the word-vector similarity between the current traversal word and each traversed keyword. The concept tree includes at least one layer, each layer includes at least one intent, each intent corresponds to at least one keyword, and each keyword is assigned a corresponding weight; the (n+1)-th layer of the concept tree contains the child intents of the n-th layer, the n-th layer contains the parent intents of the (n+1)-th layer, and n is a positive integer.
  • Step 04: Calculate the intent score of the target sentence for each intent according to the word-vector similarity between the current traversal word and each traversed keyword, and the weight of each traversed keyword.
  • Step 05: Determine the intent of the target sentence according to its intent score for each intent and the preset intent threshold of each layer.
  • The present application also provides a concept tree-based intent recognition device, including:
  • a target sentence acquisition module, used to obtain a target sentence that requires intent recognition;
  • a word segmentation module, used to perform word segmentation on the target sentence to obtain at least one traversal word;
  • a keyword traversal module, used to traverse, for each current traversal word, the keywords corresponding to each intent in a pre-constructed concept tree, where the concept tree includes at least one layer, each layer includes at least one intent, each intent corresponds to at least one keyword, and each keyword is assigned a corresponding weight; the (n+1)-th layer of the concept tree contains the child intents of the n-th layer, the n-th layer contains the parent intents of the (n+1)-th layer, and n is a positive integer;
  • a word-vector similarity calculation module, used to calculate the word-vector similarity between the current traversal word and each traversed keyword;
  • an intent score calculation module, used to calculate the intent score of the target sentence for each intent according to the word-vector similarity between the current traversal word and each traversed keyword, and the weight of each traversed keyword;
  • an intent determination module, used to determine the intent of the target sentence according to its intent score for each intent and the preset intent threshold of each layer.
  • The present application also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the following steps of the concept tree-based intent recognition method are implemented:
  • Step 01: Obtain a target sentence that requires intent recognition.
  • Step 02: Perform word segmentation on the target sentence to obtain at least one traversal word.
  • Step 03: For each current traversal word, traverse the keywords corresponding to each intent in a pre-constructed concept tree, and calculate the word-vector similarity between the current traversal word and each traversed keyword. The concept tree includes at least one layer, each layer includes at least one intent, each intent corresponds to at least one keyword, and each keyword is assigned a corresponding weight; the (n+1)-th layer of the concept tree contains the child intents of the n-th layer, the n-th layer contains the parent intents of the (n+1)-th layer, and n is a positive integer.
  • Step 04: Calculate the intent score of the target sentence for each intent according to the word-vector similarity between the current traversal word and each traversed keyword, and the weight of each traversed keyword.
  • Step 05: Determine the intent of the target sentence according to its intent score for each intent and the preset intent threshold of each layer.
  • the present application also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the following steps of the concept tree-based intent recognition method are implemented:
  • Step 01: Obtain a target sentence that requires intent recognition.
  • Step 02: Perform word segmentation on the target sentence to obtain at least one traversal word.
  • Step 03: For each current traversal word, traverse the keywords corresponding to each intent in a pre-constructed concept tree, and calculate the word-vector similarity between the current traversal word and each traversed keyword. The concept tree includes at least one layer, each layer includes at least one intent, each intent corresponds to at least one keyword, and each keyword is assigned a corresponding weight; the (n+1)-th layer of the concept tree contains the child intents of the n-th layer, the n-th layer contains the parent intents of the (n+1)-th layer, and n is a positive integer.
  • Step 04: Calculate the intent score of the target sentence for each intent according to the word-vector similarity between the current traversal word and each traversed keyword, and the weight of each traversed keyword.
  • Step 05: Determine the intent of the target sentence according to its intent score for each intent and the preset intent threshold of each layer.
  • With the concept tree-based intent recognition method, device, and computer device provided by the present application, a concept tree is constructed; the concept tree includes at least one layer, each layer includes at least one intent, each intent corresponds to at least one keyword, and each keyword is assigned a corresponding weight.
  • The (n+1)-th layer of the concept tree contains the child intents of the n-th layer.
  • The n-th layer of the concept tree contains the parent intents of the (n+1)-th layer.
  • FIG. 1 is a flowchart of Embodiment 1 of an intent recognition method based on a concept tree in this application;
  • FIG. 2 is a simple example diagram of a concept tree according to Embodiment 1 of the present application.
  • FIG. 3 is a simple example diagram of another concept tree according to Embodiment 1 of the present application.
  • FIG. 4 is a schematic diagram of a program module of Embodiment 1 of an intention recognition device based on a concept tree of the present application;
  • FIG. 5 is a schematic diagram of the hardware structure of Embodiment 1 of an intention recognition device based on a concept tree in this application.
  • The concept tree-based intent recognition method, device, and computer device provided in this application are applicable to the field of data analysis technology and are used to identify user intent.
  • This application constructs a concept tree.
  • The concept tree includes at least one layer, and each layer includes at least one intent.
  • Each intent corresponds to at least one keyword, and each keyword is assigned a corresponding weight.
  • The (n+1)-th layer of the concept tree contains the child intents of the n-th layer.
  • The n-th layer of the concept tree contains the parent intents of the (n+1)-th layer.
  • Word segmentation is performed on the target sentence that requires intent recognition to obtain at least one traversal word; for each current traversal word, the keywords corresponding to each intent in the concept tree are traversed and the word-vector similarity between the current traversal word and each traversed keyword is calculated; the intent score of the target sentence for each intent is then calculated from the word-vector similarities and the keyword weights, and the intent of the target sentence is determined from the intent scores and the intent threshold of each layer.
  • This application does not require a large number of training samples in a specific field and can accurately identify the intent of the target sentence.
  • a concept tree-based intent recognition method in this embodiment includes the following steps:
  • Step 00 Construct the concept tree in advance.
  • the concept tree includes at least one layer, each layer includes at least one intent, each intent corresponds to at least one keyword, and each keyword is set with a corresponding weight;
  • The (n+1)-th layer of the concept tree contains the child intents of the n-th layer, the n-th layer of the concept tree contains the parent intents of the (n+1)-th layer, and n is a positive integer.
  • the concept tree is constructed as follows:
  • Step 001 Determine each intention for constructing the nth level of the concept tree.
  • n is a positive integer.
  • the concept tree may use the root node as a starting point, and the first layer is various intents in various fields, wherein the intents of the first layer are all connected to the root node.
  • the intent of the first layer can be connected to the sub-intent of the next layer, and the sub-intent can also be connected to the sub-intent of the next layer.
  • Step 002 Obtain data samples corresponding to each intention.
  • the actual intent of a data sample corresponding to an intent is that intent.
  • Each intent data sample is composed of sentence text.
  • The data samples of each intent can be obtained from a sample library.
  • The sample library is accumulated during operation.
  • the data sample size corresponding to each intention may be preset, for example, the data sample size corresponding to each intention is 100.
  • Step 003: For each intent, perform word segmentation on the corresponding data samples to obtain at least one candidate word corresponding to the intent, and select the keywords corresponding to the intent from the at least one candidate word.
  • The word segmentation method includes the Stanford segmentation method or the jieba segmentation method.
  • Not all of the candidate words obtained by segmenting the data samples are necessarily used as keywords.
  • The keywords that best indicate the intent may be selected.
  • The number of selected keywords may be one or more.
  • The keywords corresponding to the intent may be selected from the at least one candidate word in at least the following ways:
  • (a) Perform a TF-IDF calculation for each candidate word and select keywords according to the TF-IDF values; the TF-IDF value can be calculated by the following formula (1): TF-IDF_W = TF_W × IDF_W.
  • TF-IDF_W denotes the TF-IDF value of term W.
  • TF_W denotes the number of times term W appears in the intent; to prevent this parameter from being biased toward long documents, it is usually normalized.
  • IDF_W denotes the inverse document frequency of term W.
  • The main idea of IDF is that the fewer intents contain term W, the larger the IDF, which indicates that the term has a good ability to distinguish between categories.
  • TF_W and IDF_W can be calculated by formula (2) and formula (3): TF_W = n_W / Σ_k n_k, the count of W divided by the total word count of the intent's samples (2); IDF_W = log(|D| / |{d : W ∈ d}|), the logarithm of the total number of intents divided by the number of intents whose samples contain W (3).
  • When selecting keywords by TF-IDF value, candidate words whose TF-IDF value is greater than a first threshold may be selected as the intent's keywords; for example, the fixed threshold is 0.12.
  • Alternatively, the candidate words may be sorted by TF-IDF value from largest to smallest, and a first preset number of top-ranked candidate words may be selected as the intent's keywords.
  • (b) Count the word frequency (TF value) of each candidate word, and select keywords according to the word frequencies of the candidate words.
  • The TF value of each candidate word is calculated according to formula (2) above.
  • Candidate words whose TF value exceeds a set count may be selected as keywords of the intent.
  • (c) Perform a chi-square test on the candidate words obtained after segmentation, and determine the intent's keywords according to the chi-square values.
  • A chi-square test can be performed on each candidate word obtained after segmentation to obtain the chi-square value corresponding to each candidate word.
  • The chi-square values are sorted from largest to smallest, and a second preset number of top-ranked candidate words are taken as keywords of the intent; alternatively, candidate words whose chi-square value is greater than a second threshold are used as keywords of the intent.
  • Two or all three of the above ways can also be combined: for example, the keywords selected by every one of the above methods can be used as the intent's keywords; or a weight can be set for each selection method, the keywords selected by each method can be combined with the weight of the corresponding method, and words whose resulting value is greater than a third threshold, or a third preset number of words with the highest values, can be used as the intent's keywords.
  • This further improves the accuracy and reliability of the selected keywords, which helps improve the accuracy of intent recognition.
  • After the data samples are segmented and before keywords are selected, some words with no substantive meaning can be deleted first, for example stop words such as the Chinese function words "的", "地", and "得".
  • Step 004 Determine the weight corresponding to each keyword, and configure the determined weight to the corresponding keyword.
  • The calculated TF-IDF value can be used directly as the keyword's weight, or the TF-IDF values can be normalized and used as the weights (so that all the weights sum to 1); alternatively, the keyword weights can be modified according to user needs, or required keywords can be added manually and assigned weights according to user needs.
  • Step 005: Determine whether each current intent includes sub-intents; if it includes sub-intents, determine the sub-intents used to construct the (n+1)-th layer of the concept tree and perform steps 002-005 for each sub-intent; if it does not include sub-intents, the construction of the concept tree is completed.
  • A current intent may include sub-intents; for example, under the "ticket" intent there are a "view" sub-intent, a "book" sub-intent, and a "cancel" sub-intent.
  • The "view", "book", and "cancel" sub-intents under the "ticket" intent belong to the second layer of the concept tree.
  • For the sub-intents, steps 002-005 can be used to further determine keywords and their corresponding weights.
  • FIG. 2 is a simple example diagram of a concept tree.
  • the first layer of intents connected under the root node include: “ticket” intent, “entertainment” intent, “stock” intent, “food” intent, and “credit card” intent.
  • The concept tree also includes a second layer of intents: the "ticket" intent is connected to the sub-intents "view", "book", and "cancel", and the "credit card" intent is connected to the sub-intents "apply", "close account", "view", and "repay".
  • The keywords of the "ticket" intent may include, for example, "ticket", "flight", "route", "airport", "weather", "temperature", "air temperature", "travel", "delay insurance", "accident insurance", ..., "boarding"; the keywords of the "stock" intent may include, for example, "stock", "market quotes", "broad market", "recommendation", "index", "gain", "K-line", "stock trading", "stock market", "holding", "making money", "analysis", ..., "long-term".
  • Step 01: Obtain a target sentence that requires intent recognition.
  • The object that requires intent recognition may be speech, text, pictures, video, and so on; sentence text is extracted from the speech, text, pictures, or video and used as the target sentence that requires intent recognition.
  • Step 02: Perform word segmentation on the target sentence to obtain at least one traversal word.
  • The word segmentation method includes the Stanford segmentation method or the jieba segmentation method.
  • After the sentence has been segmented into at least one traversal word, to further reduce the cost of traversing the concept tree, the traversal words that appear in the word list can be deleted from the at least one traversal word.
  • The subsequent steps are performed on the remaining traversal words.
  • For example, the obtained traversal words are traversal word 1 and traversal word 2.
  • Step 03 For each current traversal word, traverse the keyword corresponding to each intention in the pre-constructed concept tree, and calculate the word vector similarity between the current traversal word and each keyword traversed.
  • Starting from the root node, the keywords corresponding to the intents of the first layer are traversed. Taking the concept tree in FIG. 2 as an example, traversal is first performed for traversal word 1, and the keywords corresponding to the "ticket" intent are traversed.
  • Each time a keyword is traversed, the word-vector similarity between traversal word 1 and that keyword is calculated; in this embodiment, the word-vector similarity can be computed using word2vec word vectors.
  • Step 04 Calculate the intent score corresponding to each intent of the target sentence according to the similarity between the word vector of the current traversed word and each keyword traversed, and the weight value corresponding to each traversed keyword.
  • In this embodiment, the intent score of the target sentence for each intent can be calculated by the following formula (4) and formula (5): S = Σ_{i=1}^{m} S_i (4); S_i = Σ_{j=1}^{n} P_ij · Q_j (5).
  • S denotes the intent score of the target sentence for the current intent; m denotes the total number of traversal words; S_i denotes the intent score of the i-th traversal word for the current intent; n denotes the total number of keywords corresponding to the current intent; P_ij denotes the word-vector similarity between the i-th traversal word and the j-th keyword of the current intent; Q_j denotes the weight of the j-th keyword of the current intent.
  • For example, the current intent includes keyword 1 and keyword 2,
  • and the target sentence includes traversal word 1 and traversal word 2.
  • The intent score S1 of traversal word 1 for the current intent is the product of the word-vector similarity between traversal word 1 and keyword 1 and the weight of keyword 1, plus the product of the word-vector similarity between traversal word 1 and keyword 2 and the weight of keyword 2.
  • The intent score S2 of traversal word 2 for the current intent is the product of the word-vector similarity between traversal word 2 and keyword 1 and the weight of keyword 1, plus the product of the word-vector similarity between traversal word 2 and keyword 2 and the weight of keyword 2.
  • The intent score S of the target sentence for the current intent is the sum of S1 and S2.
  • Step 05 Determine the intent corresponding to the target sentence according to the intent score corresponding to the intent of the target sentence and the intent threshold corresponding to each layer set in advance.
  • the intent corresponding to the target sentence can be determined as follows:
  • Step 051 For the current layer, determine the target intention with the highest intention score on the current layer.
  • Taking the first layer as the current layer as an example: among the intent scores of the intents in the first layer, the target intent with the highest intent score in the layer is selected; for example, if the "ticket" intent has the highest intent score, the target intent is the "ticket" intent.
  • Step 052 Determine whether the intent score of the target intent is greater than the intent threshold corresponding to the layer to which it belongs. If yes, go to step 053; if not, go to step 056.
  • each level of the concept tree can be set with an intent threshold, and the intent thresholds of each layer can be the same or different.
  • Step 053 determine whether the target intent includes sub-intents, if the target intent includes sub-intents, perform step 054; if the target intent does not include sub-intents, perform step 055.
  • Step 054 Determine the target sub-intent with the highest intention score among the sub-intents included in the target intent, and perform step 052 with the target sub-intent as the target intent.
  • When the target intent includes sub-intents, the keywords corresponding to each sub-intent are further traversed for the traversal words, the intent score of each sub-intent is calculated, and each sub-intent is taken as the target intent for step 052.
  • Step 055 Determine the target intention as the intention corresponding to the target sentence, and end;
  • Step 056 determine whether the target intent includes the parent intent, if so, perform step 057; if not, perform step 058;
  • Step 057 Determine the parent intent corresponding to the target intent as the intent corresponding to the target sentence, and end;
  • Step 058 Determine that the target sentence has no intention, and end.
  • Taking a concept tree that contains two intents as an example (see FIG. 3), the root node connects the "weather" intent and the "ticket" intent; the "weather" intent includes the keywords "weather" (weight 0.3) and "today" (weight 0.2), and the "ticket" intent includes the keywords "weather" (weight 0.05) and "flight" (weight 0.4).
  • the target sentence that requires intent recognition is: "How was the temperature yesterday?".
  • three traversal words: "yesterday”, "temperature” and "how” are obtained.
  • the preset similarity threshold is 0.8. When the word vector similarity is less than 0.8, the product of the word vector similarity and the keyword weight is 0.
  • For the traversal word "yesterday", the word-vector similarities with the keywords "weather" and "today" of the "weather" intent are 0.001 and 0.89, and with the keywords "weather" and "flight" of the "ticket" intent are 0.001 and 0.002; by formula (4) and formula (5), its score on the "weather" intent is 0 + 0.89 × 0.2 and its score on the "ticket" intent is 0.
  • For the traversal word "temperature", the similarities with "weather" and "today" are 0.9 and 0.001, and with "weather" and "flight" are 0.9 and 0.001, so its score on the "weather" intent is 0.9 × 0.3 + 0 and its score on the "ticket" intent is 0.9 × 0.05.
  • For the traversal word "how", all of the similarities are 0.001, so its scores on the "weather" and "ticket" intents are both 0.
  • In summary, the intent score of the target sentence on the "weather" intent is 0.2 × 0.89 + 0.3 × 0.9 + 0 = 0.448, and its intent score on the "ticket" intent is 0 + 0.05 × 0.9 + 0 = 0.045.
  • For the first layer of the concept tree, the target sentence has the highest intent score on the "weather" intent. It is then determined whether this intent score, 0.448, exceeds the intent threshold of the layer (assume the layer's intent threshold is set to 0.3). Since the score of the "weather" intent exceeds the set threshold, the keywords corresponding to the sub-intents of the "weather" intent are traversed in the same way; if the "weather" intent has no sub-intents, the intent of the target sentence is determined to belong to the "weather" intent.
  • If the "weather" intent has a sub-intent (say an "ask about the weather" intent) and the intent score of that sub-intent exceeds the intent threshold set for the sub-intent layer, the intent of the target sentence is determined to be "ask about the weather"; if the sub-intent's score does not exceed the set threshold, the intent of the target sentence is determined to be the parent intent, that is, the "weather" intent.
  • After the multi-layer concept tree has been constructed, the weight of each keyword can also be adjusted; for example, according to how frequently a keyword of an intent is output over a period of time (such as a week or a month), the keyword's weight can be changed: the more frequently the keyword is output, the more its weight is increased, and otherwise its weight is decreased.
  • In addition, output sentences that have no intent can be collected and analyzed to derive new intent keywords, which are added to the multi-layer concept tree to improve the success rate and reliability of intent recognition.
  • Referring to FIG. 4, the concept tree-based intent recognition device 10 may include, or be divided into, one or more program modules.
  • The one or more program modules are stored in a storage medium and executed by one or more processors to complete the present application and implement the concept tree-based intent recognition method described above.
  • A program module in this application refers to a series of computer program instruction segments capable of performing specific functions; it is better suited than the program itself to describing the execution of the concept tree-based intent recognition device 10 in the storage medium. The following description specifically introduces the functions of the program modules of this embodiment:
  • the target sentence acquisition module 11 is used to obtain a target sentence that requires intent recognition;
  • the word segmentation module 12 is used to perform word segmentation on the target sentence to obtain at least one traversal word;
  • the keyword traversal module 13 is used to traverse, for each current traversal word, the keywords corresponding to each intent in the pre-constructed concept tree, where the concept tree includes at least one layer, each layer includes at least one intent, each intent corresponds to at least one keyword, and each keyword is assigned a corresponding weight; the (n+1)-th layer of the concept tree contains the child intents of the n-th layer, the n-th layer contains the parent intents of the (n+1)-th layer, and n is a positive integer;
  • the word-vector similarity calculation module 14 is used to calculate the word-vector similarity between the current traversal word and each traversed keyword;
  • the intent score calculation module 15 is used to calculate the intent score of the target sentence for each intent according to the word-vector similarity between the current traversal word and each traversed keyword, and the weight of each traversed keyword;
  • the intent determination module 16 is used to determine the intent of the target sentence according to its intent score for each intent and the preset intent threshold of each layer.
  • This embodiment also provides a computer device capable of executing programs, such as a smartphone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of multiple servers).
  • the computer device 20 of this embodiment includes at least but not limited to: a memory 21 and a processor 22 that can be communicatively connected to each other through a system bus, as shown in FIG. 5. It should be noted that FIG. 5 only shows the computer device 20 having components 21-22, but it should be understood that it is not required to implement all the components shown, and that more or fewer components may be implemented instead.
  • the memory 21 (ie, readable storage medium) includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), Read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), magnetic memory, magnetic disk, optical disk, etc.
  • the memory 21 may be an internal storage unit of the computer device 20, such as a hard disk or memory of the computer device 20.
  • the memory 21 may also be an external storage device of the computer device 20, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device 20.
  • the memory 21 may also include both the internal storage unit of the computer device 20 and its external storage device.
  • the memory 21 is generally used to store the operating system and various application software installed in the computer device 20, such as the program code of the concept tree-based intention recognition device 10 of the first embodiment.
  • the memory 21 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 22 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments.
  • the processor 22 is generally used to control the overall operation of the computer device 20.
  • the processor 22 is used to run the program code or process data stored in the memory 21, for example, to run the concept tree-based intention recognition device 10, so as to implement the concept tree-based intention recognition method of the first embodiment.
  • This embodiment also provides a computer-readable storage medium, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), only Read memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, server, App store, etc., on which computer programs are stored, When the program is executed by the processor, the corresponding function is realized.
  • the computer-readable storage medium of this embodiment is used to store the concept tree-based intent recognition device 10, and when executed by a processor, implements the concept tree-based intent recognition method of Embodiment 1.

Abstract

A concept tree-based intent recognition method, device, and computer device, relating to the technical field of data analysis. A concept tree is constructed that includes at least one layer; each layer includes at least one intent, each intent corresponds to at least one keyword, and each keyword is assigned a corresponding weight. A target sentence is segmented into at least one traversal word, and for each current traversal word the keywords corresponding to each intent in the concept tree are traversed and the word-vector similarity between the current traversal word and each traversed keyword is calculated. The intent score of the target sentence for each intent is calculated from the word-vector similarities and the keyword weights, and the intent of the target sentence is determined from the intent scores and the intent threshold of each layer. The method does not require a large number of training samples in a specific field; it performs relation-network analysis through the concept tree and can therefore accurately estimate the intent of the target sentence.

Description

Concept tree-based intent recognition method, device, and computer device
This application claims priority to the Chinese patent application No. CN 2018112855371, filed on October 31, 2018 and entitled "Concept tree-based intent recognition method, device, and computer device", the entire content of which is incorporated herein by reference.
Technical Field
This application relates to the technical field of data analysis, and in particular to a concept tree-based intent recognition method, device, and computer device.
Background Art
In the field of service robots, a robot can handle business consulting in specific domains. In the process of handling consulting, whether the robot can correctly identify the user's intent is an important factor in whether it can effectively provide the user with answers to business questions. Therefore, research based on intent recognition is constantly developing.
At present, in intent recognition for a specific domain, most approaches use machine learning or deep learning to train a model on data and then classify intents. However, because the application scenario is intent recognition in a specific domain, relatively few training samples are available, so with machine-learning- or deep-learning-based methods the trained model may perform poorly, or may not be usable at all, due to the small number of training samples in that domain.
Therefore, a method capable of accurately identifying intent is needed.
Summary of the Invention
The purpose of this application is to provide a concept tree-based intent recognition method, device, and computer device for solving the problems existing in the prior art.
To achieve the above purpose, this application provides a concept tree-based intent recognition method, comprising the following steps:
Step 01, obtaining a target sentence requiring intent recognition;
Step 02, performing word segmentation on the target sentence to obtain at least one traversal word;
Step 03, for each current traversal word, traversing the keywords corresponding to each intent in a pre-constructed concept tree, and calculating the word-vector similarity between the current traversal word and each traversed keyword; wherein the concept tree includes at least one layer, each layer includes at least one intent, each intent corresponds to at least one keyword, and each keyword is assigned a corresponding weight; the (n+1)-th layer of the concept tree contains the child intents of the n-th layer, the n-th layer of the concept tree contains the parent intents of the (n+1)-th layer, and n is a positive integer;
Step 04, calculating the intent score of the target sentence for each intent according to the word-vector similarity between the current traversal word and each traversed keyword and the weight of each traversed keyword;
Step 05, determining the intent of the target sentence according to the intent score of the target sentence for each intent and a preset intent threshold for each layer.
To achieve the above purpose, this application also provides a concept tree-based intent recognition device, comprising:
a target sentence acquisition module, configured to obtain a target sentence requiring intent recognition;
a word segmentation module, configured to perform word segmentation on the target sentence to obtain at least one traversal word;
a keyword traversal module, configured to traverse, for each current traversal word, the keywords corresponding to each intent in a pre-constructed concept tree; wherein the concept tree includes at least one layer, each layer includes at least one intent, each intent corresponds to at least one keyword, and each keyword is assigned a corresponding weight; the (n+1)-th layer of the concept tree contains the child intents of the n-th layer, the n-th layer of the concept tree contains the parent intents of the (n+1)-th layer, and n is a positive integer;
a word-vector similarity calculation module, configured to calculate the word-vector similarity between the current traversal word and each traversed keyword;
an intent score calculation module, configured to calculate the intent score of the target sentence for each intent according to the word-vector similarity between the current traversal word and each traversed keyword and the weight of each traversed keyword;
an intent determination module, configured to determine the intent of the target sentence according to the intent score of the target sentence for each intent and a preset intent threshold for each layer.
To achieve the above purpose, this application also provides a computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, the following steps of the concept tree-based intent recognition method are implemented:
Step 01, obtaining a target sentence requiring intent recognition;
Step 02, performing word segmentation on the target sentence to obtain at least one traversal word;
Step 03, for each current traversal word, traversing the keywords corresponding to each intent in a pre-constructed concept tree, and calculating the word-vector similarity between the current traversal word and each traversed keyword; wherein the concept tree includes at least one layer, each layer includes at least one intent, each intent corresponds to at least one keyword, and each keyword is assigned a corresponding weight; the (n+1)-th layer of the concept tree contains the child intents of the n-th layer, the n-th layer of the concept tree contains the parent intents of the (n+1)-th layer, and n is a positive integer;
Step 04, calculating the intent score of the target sentence for each intent according to the word-vector similarity between the current traversal word and each traversed keyword and the weight of each traversed keyword;
Step 05, determining the intent of the target sentence according to the intent score of the target sentence for each intent and a preset intent threshold for each layer.
To achieve the above purpose, this application also provides a computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the following steps of the concept tree-based intent recognition method are implemented:
Step 01, obtaining a target sentence requiring intent recognition;
Step 02, performing word segmentation on the target sentence to obtain at least one traversal word;
Step 03, for each current traversal word, traversing the keywords corresponding to each intent in a pre-constructed concept tree, and calculating the word-vector similarity between the current traversal word and each traversed keyword; wherein the concept tree includes at least one layer, each layer includes at least one intent, each intent corresponds to at least one keyword, and each keyword is assigned a corresponding weight; the (n+1)-th layer of the concept tree contains the child intents of the n-th layer, the n-th layer of the concept tree contains the parent intents of the (n+1)-th layer, and n is a positive integer;
Step 04, calculating the intent score of the target sentence for each intent according to the word-vector similarity between the current traversal word and each traversed keyword and the weight of each traversed keyword;
Step 05, determining the intent of the target sentence according to the intent score of the target sentence for each intent and a preset intent threshold for each layer.
With the concept tree-based intent recognition method, device, and computer device provided by this application, a concept tree is constructed that includes at least one layer, each layer includes at least one intent, each intent corresponds to at least one keyword, and each keyword is assigned a corresponding weight; the (n+1)-th layer of the concept tree contains the child intents of the n-th layer, and the n-th layer contains the parent intents of the (n+1)-th layer. Word segmentation is performed on a target sentence requiring intent recognition to obtain at least one traversal word; for each current traversal word, the keywords corresponding to each intent in the concept tree are traversed and the word-vector similarity between the current traversal word and each traversed keyword is calculated; the intent score of the target sentence for each intent is calculated from the word-vector similarities and the keyword weights, and the intent of the target sentence is determined from the intent scores and the intent threshold of each layer. This application does not require a large number of training samples in a specific field and can accurately identify the intent of the target sentence.
Brief Description of the Drawings
FIG. 1 is a flowchart of Embodiment 1 of the concept tree-based intent recognition method of this application;
FIG. 2 is a simple example diagram of a concept tree according to Embodiment 1 of this application;
FIG. 3 is a simple example diagram of another concept tree according to Embodiment 1 of this application;
FIG. 4 is a schematic diagram of the program modules of Embodiment 1 of the concept tree-based intent recognition device of this application;
FIG. 5 is a schematic diagram of the hardware structure of Embodiment 1 of the concept tree-based intent recognition device of this application.
Detailed Description of the Embodiments
To make the purpose, technical solutions, and advantages of this application clearer, this application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it. Based on the embodiments of this application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of this application.
The concept tree-based intent recognition method, device, and computer device provided by this application are applicable to the field of data analysis technology and are a method for identifying user intent. This application constructs a concept tree that includes at least one layer, each layer includes at least one intent, each intent corresponds to at least one keyword, and each keyword is assigned a corresponding weight; the (n+1)-th layer of the concept tree contains the child intents of the n-th layer, and the n-th layer contains the parent intents of the (n+1)-th layer. Word segmentation is performed on a target sentence requiring intent recognition to obtain at least one traversal word; for each current traversal word, the keywords corresponding to each intent in the concept tree are traversed and the word-vector similarity between the current traversal word and each traversed keyword is calculated; the intent score of the target sentence for each intent is calculated from the word-vector similarities and the keyword weights, and the intent of the target sentence is determined from the intent scores and the intent threshold of each layer. This application does not require a large number of training samples in a specific field and can accurately identify the intent of the target sentence.
Embodiment 1
Referring to FIG. 1, the concept tree-based intent recognition method of this embodiment includes the following steps:
Step 00, constructing the concept tree in advance.
The concept tree includes at least one layer, each layer includes at least one intent, each intent corresponds to at least one keyword, and each keyword is assigned a corresponding weight; the (n+1)-th layer of the concept tree contains the child intents of the n-th layer, the n-th layer of the concept tree contains the parent intents of the (n+1)-th layer, and n is a positive integer.
In this embodiment, the concept tree is constructed as follows:
Step 001, determining the intents used to construct the n-th layer of the concept tree.
Within a given domain, different sentences or words correspond to different intents. To accurately identify the intent of a user's sentence, the intents of the first layer need to be determined first, where the intents of the first layer correspond to the various domains, for example a "ticket" intent, a "stock" intent, and so on. Here, n is a positive integer.
In this embodiment, the concept tree can take the root node as its starting point, and the first layer contains the intents of the various domains, where every intent of the first layer is connected to the root node. An intent of the first layer can be connected to sub-intents in the next layer, and a sub-intent can likewise be connected to sub-intents in the layer below it.
Step 002, obtaining the data samples corresponding to each intent.
In this embodiment, the actual intent of a data sample corresponding to an intent is that intent, and each intent's data samples consist of sentence text. The data samples of each intent can be obtained from a sample library, which is accumulated during operation.
The amount of data samples corresponding to each intent can be preset; for example, each intent corresponds to 100 data samples.
Step 003, for each intent, performing word segmentation on the corresponding data samples to obtain at least one candidate word corresponding to the intent, and selecting the keywords corresponding to the intent from the at least one candidate word.
The word segmentation method includes the Stanford segmentation method or the jieba segmentation method.
Not all of the candidate words obtained by segmenting the data samples are necessarily used as keywords; the keywords that best indicate the intent can be selected from the at least one candidate word, and the number of selected keywords may be one or more.
In this embodiment, the keywords corresponding to the intent can be selected from the at least one candidate word in at least the following ways:
(a) Performing a TF-IDF calculation for each candidate word, and selecting keywords according to the TF-IDF values of the candidate words.
In this embodiment, the TF-IDF value can be calculated by the following formula (1):
TF-IDF_W = TF_W × IDF_W    (1)
Here, TF-IDF_W denotes the TF-IDF value of term W; TF_W denotes the number of times term W appears in the intent (to prevent this parameter from being biased toward long documents, it is usually normalized); IDF_W denotes the inverse document frequency of term W. The main idea of IDF is that the fewer intents contain term W, the larger the IDF, which indicates that the term has a good ability to distinguish between categories.
TF_W and IDF_W can be calculated by formulas (2) and (3) below.
TF_W = n_W / Σ_k n_k    (2)
IDF_W = log(|D| / |{d : W ∈ d}|)    (3)
where n_W is the number of occurrences of term W in the intent's data samples, Σ_k n_k is the total number of words in those samples, |D| is the total number of intents, and |{d : W ∈ d}| is the number of intents whose data samples contain term W.
In this embodiment, when selecting keywords according to the TF-IDF values of the candidate words, the words whose TF-IDF value is greater than a first threshold can be selected as the keywords of the intent; for example, the fixed threshold is 0.12. Alternatively, the candidate words can be sorted by TF-IDF value from largest to smallest, and a first preset number of top-ranked candidate words can be selected as the keywords of the intent.
(b) Counting the word frequency (TF value) of each candidate word, and selecting keywords according to the word frequencies of the candidate words.
The TF value of each candidate word is calculated according to formula (2) above.
In this embodiment, the candidate words whose TF value exceeds a set count can be selected as the keywords of the intent.
(c) Performing a chi-square test on the candidate words obtained after segmentation, and determining the keywords of the intent according to the chi-square values.
In this embodiment, a chi-square test can be performed on each candidate word obtained after segmentation to obtain the chi-square value corresponding to each candidate word; the chi-square values of the candidate words are sorted from largest to smallest, and a second preset number of top-ranked candidate words are taken as the keywords of the intent, or the candidate words whose chi-square value is greater than a second threshold are taken as the keywords of the intent.
Further, when selecting the keywords of an intent, two or all three of the above ways can be combined. For example, the keywords selected by every one of the above methods can be used as the keywords of the intent; or a weight can be set for each selection method, the keywords selected under each method can be combined with the weight of the corresponding method, and the words whose resulting value is greater than a third threshold, or a third preset number of words with the highest values, can be used as the keywords of the intent. In this way, the accuracy and reliability of the selected keywords can be further improved, which helps improve the accuracy of intent recognition.
In an embodiment of this application, after the data samples are segmented and before the keywords are selected, some words with no substantive meaning can be deleted first, for example stop words such as "的", "地", and "得". A word list can be configured to store these words with no substantive meaning, and the candidate words that appear in the word list are deleted from the segmented candidate words by word matching. By deleting these words and then selecting keywords from the remaining candidate words, the cost of keyword determination can be reduced.
Step 004, determining the weight corresponding to each keyword, and assigning the determined weight to the corresponding keyword.
In this embodiment, the calculated TF-IDF value can be used directly as the keyword's weight, or the TF-IDF values can be normalized and used as the weights (so that all the weights sum to 1); alternatively, the keyword weights can be modified according to user needs, or required keywords can be added manually and assigned weights according to user needs.
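As an illustration of steps 003-004, the following is a minimal sketch of TF-IDF-based keyword selection and weight normalization, assuming each intent's data samples have already been segmented into word lists; the function names, the exact IDF form, and the 0.12 default threshold are illustrative assumptions rather than the exact computation used by this application.

```python
import math
from collections import Counter

def select_keywords(samples_by_intent, tfidf_threshold=0.12):
    """Select keywords and normalized weights per intent (steps 003-004, sketch).

    samples_by_intent: dict mapping intent name -> list of samples, where each
    sample is a list of already-segmented words.
    Returns a dict mapping intent name -> {keyword: weight}, with weights summing to 1.
    """
    # Pool each intent's samples into one bag of words.
    bags = {intent: Counter(w for sample in samples for w in sample)
            for intent, samples in samples_by_intent.items()}
    n_intents = len(bags)

    keywords = {}
    for intent, bag in bags.items():
        total = sum(bag.values())
        scored = {}
        for word, count in bag.items():
            tf = count / total                               # formula (2): normalized term frequency
            df = sum(1 for b in bags.values() if word in b)  # number of intents containing the word
            idf = math.log(n_intents / df)                   # formula (3): inverse document frequency (assumed form)
            tfidf = tf * idf                                 # formula (1)
            if tfidf > tfidf_threshold:                      # keep candidate words above the first threshold
                scored[word] = tfidf
        # Step 004: normalize the retained TF-IDF values so the weights sum to 1.
        norm = sum(scored.values()) or 1.0
        keywords[intent] = {w: v / norm for w, v in scored.items()}
    return keywords
```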
Step 005, judging whether each current intent includes sub-intents; if it includes sub-intents, determining the sub-intents used to construct the (n+1)-th layer of the concept tree and performing steps 002-005 for each sub-intent; if it does not include sub-intents, the construction of the concept tree is completed.
When a current intent includes sub-intents, for example, the "ticket" intent has a "view" sub-intent, a "book" sub-intent, and a "cancel" sub-intent: the "ticket" intent belongs to the first layer of the concept tree, and the "view", "book", and "cancel" sub-intents under the "ticket" intent belong to the second layer of the concept tree. For the sub-intents, steps 002-005 can be used to further determine keywords and their corresponding weights.
Referring to FIG. 2, which is a simple example diagram of a concept tree, the first-layer intents connected under the root node include a "ticket" intent, an "entertainment" intent, a "stock" intent, a "food" intent, and a "credit card" intent. The concept tree also includes a second layer of intents: the "ticket" intent is connected to the sub-intents "view", "book", and "cancel", and the "credit card" intent is connected to the sub-intents "apply", "close account", "view", and "repay".
The keywords of the "ticket" intent may include, for example, "ticket", "flight", "route", "airport", "weather", "temperature", "air temperature", "travel", "delay insurance", "accident insurance", ..., "boarding"; the keywords of the "stock" intent may include, for example, "stock", "market quotes", "broad market", "recommendation", "index", "gain", "K-line", "stock trading", "stock market", "holding", "making money", "analysis", ..., "long-term".
Step 01, obtaining a target sentence requiring intent recognition.
In this embodiment, the object requiring intent recognition may be speech, text, pictures, video, and so on; that is, sentence text is extracted from the speech, text, pictures, or video and used as the target sentence requiring intent recognition.
Step 02, performing word segmentation on the target sentence to obtain at least one traversal word.
The word segmentation method includes the Stanford segmentation method or the jieba segmentation method.
Further, after the sentence has been segmented into at least one traversal word, in order to further reduce the cost of traversing the concept tree, the traversal words that appear in the word list configured in step 00 can be deleted from the at least one traversal word, and the subsequent steps are performed on the remaining traversal words.
For example, the obtained traversal words are traversal word 1 and traversal word 2.
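As a small illustration of steps 01-02, the sketch below segments a target sentence with the jieba segmenter mentioned above and drops the words found in a word list; the stop-word list shown here is only an example.

```python
import jieba  # the "结巴" (jieba) segmenter mentioned above; the Stanford segmenter could be used instead

# Illustrative word list of words with no substantive meaning (configured in step 00).
WORD_LIST = {"的", "地", "得", "了", "吗", "？", "?", "。"}

def get_traversal_words(target_sentence):
    """Step 02: segment the target sentence and delete words found in the word list."""
    words = jieba.lcut(target_sentence)
    return [w for w in words if w.strip() and w not in WORD_LIST]

# For example, "昨天的气温怎么样？" yields roughly ["昨天", "气温", "怎么样"].
print(get_traversal_words("昨天的气温怎么样？"))
```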
Step 03, for each current traversal word, traversing the keywords corresponding to each intent in the pre-constructed concept tree, and calculating the word-vector similarity between the current traversal word and each traversed keyword.
Starting from the root node, the keywords corresponding to each intent of the first layer are traversed. Taking the concept tree of FIG. 2 as an example, traversal is first performed for traversal word 1: the keywords corresponding to the "ticket" intent can be traversed, and each time a keyword is traversed, the word-vector similarity between traversal word 1 and that keyword is calculated. In this embodiment, the word-vector similarity can be computed using word2vec word vectors.
Step 04, calculating the intent score of the target sentence for each intent according to the word-vector similarity between the current traversal word and each traversed keyword and the weight of each traversed keyword.
In this embodiment, the intent score of the target sentence for each intent can be calculated by formulas (4) and (5) below.
S = Σ_{i=1}^{m} S_i    (4)
S_i = Σ_{j=1}^{n} P_ij · Q_j    (5)
Here, S denotes the intent score of the target sentence for the current intent; m denotes the total number of traversal words; S_i denotes the intent score of the i-th traversal word for the current intent; n denotes the total number of keywords corresponding to the current intent; P_ij denotes the word-vector similarity between the i-th traversal word and the j-th keyword of the current intent; and Q_j denotes the weight of the j-th keyword of the current intent.
For example, the current intent includes keyword 1 and keyword 2, and the target sentence includes traversal word 1 and traversal word 2. The intent score S1 of traversal word 1 for the current intent is the product of the word-vector similarity between traversal word 1 and keyword 1 and the weight of keyword 1, plus the product of the word-vector similarity between traversal word 1 and keyword 2 and the weight of keyword 2. The intent score S2 of traversal word 2 for the current intent is the product of the word-vector similarity between traversal word 2 and keyword 1 and the weight of keyword 1, plus the product of the word-vector similarity between traversal word 2 and keyword 2 and the weight of keyword 2. The intent score S of the target sentence for the current intent is the sum of S1 and S2.
In an embodiment of this application, a similarity threshold can be set in advance. When the word-vector similarity between a traversal word and a keyword is smaller than the similarity threshold, the product of that similarity and the keyword's weight is set to 0; that is, when the value of P_ij is smaller than the set similarity threshold, P_ij · Q_j = 0.
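The following is a minimal sketch of formulas (4) and (5) together with the similarity-threshold rule; the similarity function is left abstract (for example, cosine similarity between word2vec vectors), and the names are illustrative.

```python
def intent_score(traversal_words, keywords, similarity, sim_threshold=0.8):
    """Score one intent for a target sentence (formulas (4) and (5), sketch).

    traversal_words: list of traversal words from the target sentence.
    keywords: dict {keyword: weight} for the current intent.
    similarity: callable (word, keyword) -> word-vector similarity, e.g. cosine
                similarity between word2vec vectors.
    """
    score = 0.0                                   # S in formula (4)
    for word in traversal_words:                  # i = 1 .. m
        s_i = 0.0                                 # S_i in formula (5)
        for keyword, weight in keywords.items():  # j = 1 .. n
            p_ij = similarity(word, keyword)      # P_ij
            if p_ij < sim_threshold:              # below the similarity threshold, P_ij * Q_j is taken as 0
                continue
            s_i += p_ij * weight                  # P_ij * Q_j
        score += s_i
    return score
```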
Step 05, determining the intent of the target sentence according to the intent score of the target sentence for each intent and the preset intent threshold of each layer.
In this embodiment, the intent of the target sentence can be determined as follows:
Step 051: for the current layer, determining the target intent with the highest intent score in the current layer.
Taking the first layer as the current layer as an example: among the intent scores of the intents in the first layer, the target intent with the highest intent score in the layer is selected; for example, if the "ticket" intent has the highest intent score, the target intent is the "ticket" intent.
Step 052: judging whether the intent score of the target intent is greater than the intent threshold of the layer to which it belongs; if yes, performing step 053; if not, performing step 056.
In this embodiment, each layer of the concept tree can be assigned an intent threshold, and the intent thresholds of the layers may be the same or different.
In this step, it is necessary to judge whether the intent score of the "ticket" intent is greater than the intent threshold of the first layer.
Step 053: judging whether the target intent includes sub-intents; if the target intent includes sub-intents, performing step 054; if the target intent does not include sub-intents, performing step 055.
Step 054: determining, among the sub-intents of the target intent, the target sub-intent with the highest intent score, and performing step 052 with the target sub-intent as the target intent.
When it is determined that the target intent includes sub-intents, the keywords corresponding to each sub-intent are further traversed for the traversal words, the intent score of each sub-intent is calculated, and the process continues with step 052 taking the sub-intents as target intents.
Step 055: determining the target intent as the intent of the target sentence, and ending.
Step 056: judging whether the target intent has a parent intent; if yes, performing step 057; if not, performing step 058.
Step 057: determining the parent intent of the target intent as the intent of the target sentence, and ending.
Step 058: determining that the target sentence has no intent, and ending.
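The sketch below walks the concept tree according to steps 051-058, reusing the intent_score function from the previous sketch; the IntentNode structure, the parent links, and the per-layer threshold list are illustrative assumptions about how the tree might be represented.

```python
from dataclasses import dataclass, field

@dataclass
class IntentNode:
    """One intent of the concept tree (illustrative representation)."""
    name: str
    keywords: dict                                # {keyword: weight}
    children: list = field(default_factory=list)  # sub-intents of the next layer
    parent: "IntentNode | None" = None            # parent intent of the previous layer

def determine_intent(traversal_words, layer, layer_thresholds, similarity, depth=0):
    """Steps 051-058: descend the concept tree layer by layer.

    layer: list of IntentNode objects in the current layer.
    layer_thresholds: per-layer intent thresholds.
    Returns the determined IntentNode, or None if the target sentence has no intent.
    """
    # Step 051: the target intent is the highest-scoring intent of the current layer.
    target = max(layer, key=lambda n: intent_score(traversal_words, n.keywords, similarity))
    score = intent_score(traversal_words, target.keywords, similarity)
    # Step 052: compare the target intent's score against this layer's threshold.
    if score > layer_thresholds[depth]:
        if target.children:                       # Steps 053-054: recurse into the sub-intents.
            sub = determine_intent(traversal_words, target.children,
                                   layer_thresholds, similarity, depth + 1)
            return sub if sub is not None else target
        return target                             # Step 055: no sub-intents, this is the intent.
    # Steps 056-058: below the threshold, fall back to the parent intent if there is one;
    # None at the first layer means the sentence has no intent.
    return target.parent
```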
The following takes a concept tree containing two intents as an example; referring to FIG. 3, the root node is connected to a "weather" intent and a "ticket" intent, where the "weather" intent includes the keywords "weather" (weight 0.3) and "today" (weight 0.2), and the "ticket" intent includes the keywords "weather" (weight 0.05) and "flight" (weight 0.4).
For example, the target sentence requiring intent recognition is "昨天的气温怎么样？" ("How was the temperature yesterday?"). After word segmentation, three traversal words are obtained: "昨天" (yesterday), "气温" (temperature), and "怎么样" (how). The similarity threshold is preset to 0.8; when a word-vector similarity is smaller than 0.8, the product of that similarity and the keyword's weight is 0.
For the traversal word "昨天" (yesterday): (A) on the "weather" intent, its word-vector similarities with the keywords "weather" and "today" are 0.001 and 0.89 respectively, so by formulas (4) and (5) its intent score on the "weather" intent is 0 + 0.89 × 0.2; (B) on the "ticket" intent, its word-vector similarities with the keywords "weather" and "flight" are 0.001 and 0.002 respectively, so its intent score on the "ticket" intent is 0.
For the traversal word "气温" (temperature): (A) on the "weather" intent, its word-vector similarities with the keywords "weather" and "today" are 0.9 and 0.001 respectively, so its intent score on the "weather" intent is 0.9 × 0.3 + 0; (B) on the "ticket" intent, its word-vector similarities with the keywords "weather" and "flight" are 0.9 and 0.001 respectively, so its intent score on the "ticket" intent is 0.9 × 0.05.
For the traversal word "怎么样" (how): (A) on the "weather" intent, its word-vector similarities with the keywords "weather" and "today" are 0.001 and 0.001 respectively, so its intent score on the "weather" intent is 0; (B) on the "ticket" intent, its word-vector similarities with the keywords "weather" and "flight" are 0.001 and 0.001 respectively, so its intent score on the "ticket" intent is 0.
In summary, the intent score of the target sentence on the "weather" intent is 0.2 × 0.89 + 0.3 × 0.9 + 0 = 0.448, and its intent score on the "ticket" intent is 0 + 0.05 × 0.9 + 0 = 0.045.
For the first layer of the concept tree, the target sentence has the highest intent score on the "weather" intent. It is then judged whether this intent score, 0.448, exceeds the intent threshold of the layer (assume the layer's intent threshold is set to 0.3). Since the score of the "weather" intent exceeds the set threshold, the keywords corresponding to the sub-intents of the "weather" intent are traversed in the same way. If the "weather" intent has no sub-intents, the intent of the target sentence is determined to belong to the "weather" intent. If the "weather" intent has a sub-intent (say an "ask about the weather" intent) and the intent score of that sub-intent exceeds the intent threshold set for the sub-intent layer, the intent of the target sentence is determined to be "ask about the weather"; if the sub-intent's score does not exceed the set threshold, the intent of the target sentence is determined to be the parent intent, i.e., the "weather" intent.
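Using the intent_score sketch above, this worked example can be reproduced as follows; the similarity values are simply hard-coded from the figures given in the example.

```python
# Word-vector similarities taken from the example above (illustrative lookup table).
SIMS = {
    ("昨天", "天气"): 0.001, ("昨天", "今天"): 0.89,  ("昨天", "航班"): 0.002,
    ("气温", "天气"): 0.9,   ("气温", "今天"): 0.001, ("气温", "航班"): 0.001,
    ("怎么样", "天气"): 0.001, ("怎么样", "今天"): 0.001, ("怎么样", "航班"): 0.001,
}
similarity = lambda word, keyword: SIMS.get((word, keyword), 0.0)

weather_keywords = {"天气": 0.3, "今天": 0.2}   # "weather" intent: weather / today
ticket_keywords  = {"天气": 0.05, "航班": 0.4}  # "ticket" intent: weather / flight
traversal_words = ["昨天", "气温", "怎么样"]     # yesterday / temperature / how

print(intent_score(traversal_words, weather_keywords, similarity))  # 0.89*0.2 + 0.9*0.3 = 0.448
print(intent_score(traversal_words, ticket_keywords, similarity))   # 0.9*0.05 = 0.045
```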
In this embodiment, after the multi-layer concept tree has been constructed, the weight of each keyword can also be adjusted. For example, according to how frequently a keyword of an intent is output over a period of time (such as a week or a month), the keyword's weight can be changed: the more frequently the keyword is output, the more its weight is increased; otherwise, its weight is decreased. In addition, output sentences that have no intent can be collected and analyzed through training to derive new intent keywords, and the new intent keywords can be added to the multi-layer concept tree, so as to improve the success rate and reliability of intent recognition.
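A minimal sketch of the frequency-based weight adjustment described in the previous paragraph is shown below; the adjustment step, the comparison against the average output count, and the renormalization are illustrative assumptions.

```python
def adjust_weights(keywords, output_counts, step=0.05):
    """Nudge keyword weights by how often each keyword was output for this intent
    over the observation window (for example, one week or one month).

    keywords: dict {keyword: weight}; output_counts: dict {keyword: times output}.
    """
    average = sum(output_counts.values()) / max(len(keywords), 1)
    adjusted = {}
    for kw, weight in keywords.items():
        if output_counts.get(kw, 0) > average:   # output more often than average: increase the weight
            adjusted[kw] = weight + step
        else:                                    # otherwise decrease it, but keep it non-negative
            adjusted[kw] = max(weight - step, 0.0)
    norm = sum(adjusted.values()) or 1.0         # keep the weights summing to 1
    return {kw: w / norm for kw, w in adjusted.items()}
```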
Referring further to FIG. 4, a concept tree-based intent recognition device is shown. In this embodiment, the concept tree-based intent recognition device 10 may include, or be divided into, one or more program modules, which are stored in a storage medium and executed by one or more processors, so as to complete this application and implement the concept tree-based intent recognition method described above. A program module in this application refers to a series of computer program instruction segments capable of performing specific functions; it is better suited than the program itself to describing the execution of the concept tree-based intent recognition device 10 in the storage medium. The following description specifically introduces the functions of the program modules of this embodiment:
The target sentence acquisition module 11 is configured to obtain a target sentence requiring intent recognition.
The word segmentation module 12 is configured to perform word segmentation on the target sentence to obtain at least one traversal word.
The keyword traversal module 13 is configured to traverse, for each current traversal word, the keywords corresponding to each intent in the pre-constructed concept tree; wherein the concept tree includes at least one layer, each layer includes at least one intent, each intent corresponds to at least one keyword, and each keyword is assigned a corresponding weight; the (n+1)-th layer of the concept tree contains the child intents of the n-th layer, the n-th layer of the concept tree contains the parent intents of the (n+1)-th layer, and n is a positive integer.
The word-vector similarity calculation module 14 is configured to calculate the word-vector similarity between the current traversal word and each traversed keyword.
The intent score calculation module 15 is configured to calculate the intent score of the target sentence for each intent according to the word-vector similarity between the current traversal word and each traversed keyword and the weight of each traversed keyword.
The intent determination module 16 is configured to determine the intent of the target sentence according to the intent score of the target sentence for each intent and the preset intent threshold of each layer.
This embodiment also provides a computer device capable of executing programs, such as a smartphone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of multiple servers). As shown in FIG. 5, the computer device 20 of this embodiment includes at least, but is not limited to, a memory 21 and a processor 22 that can be communicatively connected to each other through a system bus. It should be noted that FIG. 5 only shows the computer device 20 with components 21-22, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
In this embodiment, the memory 21 (i.e., a readable storage medium) includes flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device 20, such as the hard disk or memory of the computer device 20. In other embodiments, the memory 21 may also be an external storage device of the computer device 20, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device 20. Of course, the memory 21 may also include both the internal storage unit of the computer device 20 and its external storage device. In this embodiment, the memory 21 is generally used to store the operating system and the various application software installed on the computer device 20, such as the program code of the concept tree-based intent recognition device 10 of Embodiment 1. In addition, the memory 21 can also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 22 is generally used to control the overall operation of the computer device 20. In this embodiment, the processor 22 is used to run the program code or process the data stored in the memory 21, for example to run the concept tree-based intent recognition device 10, so as to implement the concept tree-based intent recognition method of Embodiment 1.
This embodiment also provides a computer-readable storage medium, such as flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disk, a server, or an app store, on which a computer program is stored; when the program is executed by a processor, the corresponding functions are implemented. The computer-readable storage medium of this embodiment is used to store the concept tree-based intent recognition device 10, and when executed by a processor, implements the concept tree-based intent recognition method of Embodiment 1.

Claims (20)

  1. A concept tree-based intent recognition method, characterized by comprising the following steps:
    Step 01, obtaining a target sentence requiring intent recognition;
    Step 02, performing word segmentation on the target sentence to obtain at least one traversal word;
    Step 03, for each current traversal word, traversing the keywords corresponding to each intent in a pre-constructed concept tree, and calculating the word-vector similarity between the current traversal word and each traversed keyword; wherein the concept tree includes at least one layer, each layer includes at least one intent, each intent corresponds to at least one keyword, and each keyword is assigned a corresponding weight; the (n+1)-th layer of the concept tree contains the child intents of the n-th layer, the n-th layer of the concept tree contains the parent intents of the (n+1)-th layer, and n is a positive integer;
    Step 04, calculating the intent score of the target sentence for each intent according to the word-vector similarity between the current traversal word and each traversed keyword and the weight of each traversed keyword;
    Step 05, determining the intent of the target sentence according to the intent score of the target sentence for each intent and a preset intent threshold for each layer.
  2. The concept tree-based intent recognition method according to claim 1, wherein the concept tree is constructed as follows:
    Step 001, determining the intents used to construct the n-th layer of the concept tree;
    Step 002, obtaining the data samples corresponding to each intent;
    Step 003, for each intent, performing word segmentation on the corresponding data samples to obtain at least one candidate word corresponding to the intent, and selecting the keywords corresponding to the intent from the at least one candidate word;
    Step 004, determining the weight corresponding to each keyword, and assigning the determined weight to the corresponding keyword;
    Step 005, judging whether each current intent includes sub-intents; if it includes sub-intents, determining the sub-intents used to construct the (n+1)-th layer of the concept tree and performing steps 002-005 for each sub-intent; if it does not include sub-intents, the construction of the concept tree is completed.
  3. The concept tree-based intent recognition method according to claim 2, wherein selecting the keywords corresponding to the intent from the at least one candidate word comprises: performing a TF-IDF calculation for each candidate word, and selecting the keywords according to the TF-IDF values of the candidate words.
  4. The concept tree-based intent recognition method according to claim 1, wherein
    the method further comprises: configuring a word list in advance, the word list including a number of words with no substantive meaning;
    before step 03, the method further comprises: deleting, from the at least one obtained traversal word, the traversal words that appear in the word list, and performing step 03 on the at least one traversal word remaining after the deletion.
  5. The concept tree-based intent recognition method according to claim 1, wherein the intent score of the target sentence for each intent in step 04 is calculated by the following formulas:
    S = Σ_{i=1}^{m} S_i    (4)
    S_i = Σ_{j=1}^{n} P_ij · Q_j    (5)
    wherein S denotes the intent score of the target sentence for the current intent; m denotes the total number of traversal words; S_i denotes the intent score of the i-th traversal word for the current intent; n denotes the total number of keywords corresponding to the current intent; P_ij denotes the word-vector similarity between the i-th traversal word and the j-th keyword of the current intent; and Q_j denotes the weight of the j-th keyword of the current intent.
  6. The concept tree-based intent recognition method according to claim 5, wherein when the value of P_ij is smaller than a set similarity threshold, P_ij · Q_j = 0.
  7. The concept tree-based intent recognition method according to any one of claims 1 to 6, wherein step 05 comprises:
    Step 051: for the current layer, determining the target intent with the highest intent score in the current layer;
    Step 052: judging whether the intent score of the target intent is greater than the intent threshold of the layer to which it belongs; if yes, performing step 053; if not, performing step 056;
    Step 053: judging whether the target intent includes sub-intents; if the target intent includes sub-intents, performing step 054; if the target intent does not include sub-intents, performing step 055;
    Step 054: determining, among the sub-intents of the target intent, the target sub-intent with the highest intent score, and performing step 052 with the target sub-intent as the target intent;
    Step 055: determining the target intent as the intent of the target sentence, and ending;
    Step 056: judging whether the target intent has a parent intent; if yes, performing step 057; if not, performing step 058;
    Step 057: determining the parent intent of the target intent as the intent of the target sentence, and ending;
    Step 058: determining that the target sentence has no intent, and ending.
  8. A concept tree-based intent recognition device, characterized by comprising:
    a target sentence acquisition module, configured to obtain a target sentence requiring intent recognition;
    a word segmentation module, configured to perform word segmentation on the target sentence to obtain at least one traversal word;
    a keyword traversal module, configured to traverse, for each current traversal word, the keywords corresponding to each intent in a pre-constructed concept tree; wherein the concept tree includes at least one layer, each layer includes at least one intent, each intent corresponds to at least one keyword, and each keyword is assigned a corresponding weight; the (n+1)-th layer of the concept tree contains the child intents of the n-th layer, the n-th layer of the concept tree contains the parent intents of the (n+1)-th layer, and n is a positive integer;
    a word-vector similarity calculation module, configured to calculate the word-vector similarity between the current traversal word and each traversed keyword;
    an intent score calculation module, configured to calculate the intent score of the target sentence for each intent according to the word-vector similarity between the current traversal word and each traversed keyword and the weight of each traversed keyword;
    an intent determination module, configured to determine the intent of the target sentence according to the intent score of the target sentence for each intent and a preset intent threshold for each layer.
  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, the following steps of a concept tree-based intent recognition method are implemented:
    Step 01, obtaining a target sentence requiring intent recognition;
    Step 02, performing word segmentation on the target sentence to obtain at least one traversal word;
    Step 03, for each current traversal word, traversing the keywords corresponding to each intent in a pre-constructed concept tree, and calculating the word-vector similarity between the current traversal word and each traversed keyword; wherein the concept tree includes at least one layer, each layer includes at least one intent, each intent corresponds to at least one keyword, and each keyword is assigned a corresponding weight; the (n+1)-th layer of the concept tree contains the child intents of the n-th layer, the n-th layer of the concept tree contains the parent intents of the (n+1)-th layer, and n is a positive integer;
    Step 04, calculating the intent score of the target sentence for each intent according to the word-vector similarity between the current traversal word and each traversed keyword and the weight of each traversed keyword;
    Step 05, determining the intent of the target sentence according to the intent score of the target sentence for each intent and a preset intent threshold for each layer.
  10. The computer device according to claim 9, wherein the concept tree is constructed as follows:
    Step 001, determining the intents used to construct the n-th layer of the concept tree;
    Step 002, obtaining the data samples corresponding to each intent;
    Step 003, for each intent, performing word segmentation on the corresponding data samples to obtain at least one candidate word corresponding to the intent, and selecting the keywords corresponding to the intent from the at least one candidate word;
    Step 004, determining the weight corresponding to each keyword, and assigning the determined weight to the corresponding keyword;
    Step 005, judging whether each current intent includes sub-intents; if it includes sub-intents, determining the sub-intents used to construct the (n+1)-th layer of the concept tree and performing steps 002-005 for each sub-intent; if it does not include sub-intents, the construction of the concept tree is completed.
  11. The computer device according to claim 9, wherein
    the steps further comprise: configuring a word list in advance, the word list including a number of words with no substantive meaning;
    before step 03, the steps further comprise: deleting, from the at least one obtained traversal word, the traversal words that appear in the word list, and performing step 03 on the at least one traversal word remaining after the deletion.
  12. The computer device according to claim 9, wherein the intent score of the target sentence for each intent in step 04 is calculated by the following formulas:
    S = Σ_{i=1}^{m} S_i    (4)
    S_i = Σ_{j=1}^{n} P_ij · Q_j    (5)
    wherein S denotes the intent score of the target sentence for the current intent; m denotes the total number of traversal words; S_i denotes the intent score of the i-th traversal word for the current intent; n denotes the total number of keywords corresponding to the current intent; P_ij denotes the word-vector similarity between the i-th traversal word and the j-th keyword of the current intent; and Q_j denotes the weight of the j-th keyword of the current intent.
  13. The computer device according to claim 12, wherein when the value of P_ij is smaller than a set similarity threshold, P_ij · Q_j = 0.
  14. The computer device according to any one of claims 9 to 13, wherein step 05 comprises:
    Step 051: for the current layer, determining the target intent with the highest intent score in the current layer;
    Step 052: judging whether the intent score of the target intent is greater than the intent threshold of the layer to which it belongs; if yes, performing step 053; if not, performing step 056;
    Step 053: judging whether the target intent includes sub-intents; if the target intent includes sub-intents, performing step 054; if the target intent does not include sub-intents, performing step 055;
    Step 054: determining, among the sub-intents of the target intent, the target sub-intent with the highest intent score, and performing step 052 with the target sub-intent as the target intent;
    Step 055: determining the target intent as the intent of the target sentence, and ending;
    Step 056: judging whether the target intent has a parent intent; if yes, performing step 057; if not, performing step 058;
    Step 057: determining the parent intent of the target intent as the intent of the target sentence, and ending;
    Step 058: determining that the target sentence has no intent, and ending.
  15. A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the following steps of a concept tree-based intent recognition method are implemented:
    Step 01, obtaining a target sentence requiring intent recognition;
    Step 02, performing word segmentation on the target sentence to obtain at least one traversal word;
    Step 03, for each current traversal word, traversing the keywords corresponding to each intent in a pre-constructed concept tree, and calculating the word-vector similarity between the current traversal word and each traversed keyword; wherein the concept tree includes at least one layer, each layer includes at least one intent, each intent corresponds to at least one keyword, and each keyword is assigned a corresponding weight; the (n+1)-th layer of the concept tree contains the child intents of the n-th layer, the n-th layer of the concept tree contains the parent intents of the (n+1)-th layer, and n is a positive integer;
    Step 04, calculating the intent score of the target sentence for each intent according to the word-vector similarity between the current traversal word and each traversed keyword and the weight of each traversed keyword;
    Step 05, determining the intent of the target sentence according to the intent score of the target sentence for each intent and a preset intent threshold for each layer.
  16. The computer-readable storage medium according to claim 15, wherein the concept tree is constructed as follows:
    Step 001, determining the intents used to construct the n-th layer of the concept tree;
    Step 002, obtaining the data samples corresponding to each intent;
    Step 003, for each intent, performing word segmentation on the corresponding data samples to obtain at least one candidate word corresponding to the intent, and selecting the keywords corresponding to the intent from the at least one candidate word;
    Step 004, determining the weight corresponding to each keyword, and assigning the determined weight to the corresponding keyword;
    Step 005, judging whether each current intent includes sub-intents; if it includes sub-intents, determining the sub-intents used to construct the (n+1)-th layer of the concept tree and performing steps 002-005 for each sub-intent; if it does not include sub-intents, the construction of the concept tree is completed.
  17. The computer-readable storage medium according to claim 15, wherein
    the steps further comprise: configuring a word list in advance, the word list including a number of words with no substantive meaning;
    before step 03, the steps further comprise: deleting, from the at least one obtained traversal word, the traversal words that appear in the word list, and performing step 03 on the at least one traversal word remaining after the deletion.
  18. The computer-readable storage medium according to claim 15, wherein the intent score of the target sentence for each intent in step 04 is calculated by the following formulas:
    S = Σ_{i=1}^{m} S_i    (4)
    S_i = Σ_{j=1}^{n} P_ij · Q_j    (5)
    wherein S denotes the intent score of the target sentence for the current intent; m denotes the total number of traversal words; S_i denotes the intent score of the i-th traversal word for the current intent; n denotes the total number of keywords corresponding to the current intent; P_ij denotes the word-vector similarity between the i-th traversal word and the j-th keyword of the current intent; and Q_j denotes the weight of the j-th keyword of the current intent.
  19. The computer-readable storage medium according to claim 18, wherein when the value of P_ij is smaller than a set similarity threshold, P_ij · Q_j = 0.
  20. The computer-readable storage medium according to any one of claims 15 to 19, wherein step 05 comprises:
    Step 051: for the current layer, determining the target intent with the highest intent score in the current layer;
    Step 052: judging whether the intent score of the target intent is greater than the intent threshold of the layer to which it belongs; if yes, performing step 053; if not, performing step 056;
    Step 053: judging whether the target intent includes sub-intents; if the target intent includes sub-intents, performing step 054; if the target intent does not include sub-intents, performing step 055;
    Step 054: determining, among the sub-intents of the target intent, the target sub-intent with the highest intent score, and performing step 052 with the target sub-intent as the target intent;
    Step 055: determining the target intent as the intent of the target sentence, and ending;
    Step 056: judging whether the target intent has a parent intent; if yes, performing step 057; if not, performing step 058;
    Step 057: determining the parent intent of the target intent as the intent of the target sentence, and ending;
    Step 058: determining that the target sentence has no intent, and ending.
PCT/CN2019/070295 2018-10-31 2019-01-03 基于概念树的意图识别方法、装置及计算机设备 WO2020087774A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811285537.1A CN109492222B (zh) 2018-10-31 2018-10-31 基于概念树的意图识别方法、装置及计算机设备
CN201811285537.1 2018-10-31

Publications (1)

Publication Number Publication Date
WO2020087774A1 true WO2020087774A1 (zh) 2020-05-07

Family

ID=65693411

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/070295 WO2020087774A1 (zh) 2018-10-31 2019-01-03 基于概念树的意图识别方法、装置及计算机设备

Country Status (2)

Country Link
CN (1) CN109492222B (zh)
WO (1) WO2020087774A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111708873A (zh) * 2020-06-15 2020-09-25 腾讯科技(深圳)有限公司 智能问答方法、装置、计算机设备和存储介质
CN111814481A (zh) * 2020-08-24 2020-10-23 深圳市欢太科技有限公司 购物意图识别方法、装置、终端设备及存储介质

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492222B (zh) * 2018-10-31 2023-04-07 平安科技(深圳)有限公司 基于概念树的意图识别方法、装置及计算机设备
CN109815314B (zh) * 2019-01-04 2023-08-08 平安科技(深圳)有限公司 一种意图识别方法、识别设备及计算机可读存储介质
CN112699909B (zh) * 2019-10-23 2024-03-19 中移物联网有限公司 信息识别方法、装置、电子设备及计算机可读存储介质
CN111832305B (zh) * 2020-07-03 2023-08-25 北京小鹏汽车有限公司 一种用户意图识别方法、装置、服务器和介质
CN112016296B (zh) * 2020-09-07 2023-08-25 平安科技(深圳)有限公司 句子向量生成方法、装置、设备及存储介质
CN112199958A (zh) * 2020-09-30 2021-01-08 平安科技(深圳)有限公司 概念词序列生成方法、装置、计算机设备及存储介质
CN112948550A (zh) * 2021-02-04 2021-06-11 维沃移动通信有限公司 日程创建方法、装置和电子设备
CN113887224A (zh) * 2021-10-19 2022-01-04 京东科技信息技术有限公司 语句意图识别方法、语句应答方法、装置和电子设备
CN115080786A (zh) * 2022-08-22 2022-09-20 科大讯飞股份有限公司 基于图片作诗的方法、装置、设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103970864A (zh) * 2014-05-08 2014-08-06 清华大学 基于微博文本的情绪分类和情绪成分分析方法及系统
CN107766426A (zh) * 2017-09-14 2018-03-06 北京百分点信息科技有限公司 一种文本分类方法、装置及电子设备
CN107844559A (zh) * 2017-10-31 2018-03-27 国信优易数据有限公司 一种文件分类方法、装置及电子设备
US20180181613A1 (en) * 2016-12-22 2018-06-28 Sap Se Natural language query generation
CN109492222A (zh) * 2018-10-31 2019-03-19 平安科技(深圳)有限公司 基于概念树的意图识别方法、装置及计算机设备

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5620349B2 (ja) * 2011-07-22 2014-11-05 株式会社東芝 対話装置、対話方法および対話プログラム
CN104598445B (zh) * 2013-11-01 2019-05-10 腾讯科技(深圳)有限公司 自动问答系统和方法
CN105868366B (zh) * 2016-03-30 2019-02-01 浙江工业大学 基于概念关联的概念空间导航方法
CN107146610B (zh) * 2017-04-10 2021-06-15 易视星空科技无锡有限公司 一种用户意图的确定方法及装置
CN108595619A (zh) * 2018-04-23 2018-09-28 海信集团有限公司 一种问答方法及设备

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103970864A (zh) * 2014-05-08 2014-08-06 清华大学 基于微博文本的情绪分类和情绪成分分析方法及系统
US20180181613A1 (en) * 2016-12-22 2018-06-28 Sap Se Natural language query generation
CN107766426A (zh) * 2017-09-14 2018-03-06 北京百分点信息科技有限公司 一种文本分类方法、装置及电子设备
CN107844559A (zh) * 2017-10-31 2018-03-27 国信优易数据有限公司 一种文件分类方法、装置及电子设备
CN109492222A (zh) * 2018-10-31 2019-03-19 平安科技(深圳)有限公司 基于概念树的意图识别方法、装置及计算机设备

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111708873A (zh) * 2020-06-15 2020-09-25 腾讯科技(深圳)有限公司 智能问答方法、装置、计算机设备和存储介质
CN111708873B (zh) * 2020-06-15 2023-11-24 腾讯科技(深圳)有限公司 智能问答方法、装置、计算机设备和存储介质
CN111814481A (zh) * 2020-08-24 2020-10-23 深圳市欢太科技有限公司 购物意图识别方法、装置、终端设备及存储介质
CN111814481B (zh) * 2020-08-24 2023-11-14 深圳市欢太科技有限公司 购物意图识别方法、装置、终端设备及存储介质

Also Published As

Publication number Publication date
CN109492222B (zh) 2023-04-07
CN109492222A (zh) 2019-03-19

Similar Documents

Publication Publication Date Title
WO2020087774A1 (zh) 基于概念树的意图识别方法、装置及计算机设备
JP6643554B2 (ja) エンティティ推薦方法及び装置
US20220188521A1 (en) Artificial intelligence-based named entity recognition method and apparatus, and electronic device
CN108073568B (zh) 关键词提取方法和装置
WO2021159613A1 (zh) 文本语义相似度的分析方法、装置及计算机设备
WO2019120115A1 (zh) 人脸识别的方法、装置及计算机装置
WO2020140373A1 (zh) 一种意图识别方法、识别设备及计算机可读存储介质
WO2020253503A1 (zh) 人才画像的生成方法、装置、设备及存储介质
WO2017097231A1 (zh) 话题处理方法及装置
WO2020164276A1 (zh) 网页数据爬取方法、装置、系统及计算机可读存储介质
US20210042366A1 (en) Machine-learning system for servicing queries for digital content
US20150356091A1 (en) Method and system for identifying microblog user identity
CN110263854B (zh) 直播标签确定方法、装置及存储介质
CN110569289B (zh) 基于大数据的列数据处理方法、设备及介质
US20200394448A1 (en) Methods for more effectively moderating one or more images and devices thereof
WO2018171295A1 (zh) 一种给文章标注标签的方法、装置、终端及计算机可读存储介质
WO2019041528A1 (zh) 新闻情感方向判断方法、电子设备及计算机可读存储介质
WO2020107864A1 (zh) 信息处理方法、装置、服务设备及计算机可读存储介质
CN111667817A (zh) 一种语音识别方法、装置、计算机系统及可读存储介质
WO2021135104A1 (zh) 基于多源数据的对象推送方法、装置、设备及存储介质
CN109344232B (zh) 一种舆情信息检索方法及终端设备
WO2022116444A1 (zh) 文本分类方法、装置、计算机设备和介质
CN110750619A (zh) 聊天记录关键词的提取方法、装置、计算机设备及存储介质
CN112579781B (zh) 文本归类方法、装置、电子设备及介质
WO2019085118A1 (zh) 基于主题模型的关联词分析方法、电子装置及存储介质

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19878196

Country of ref document: EP

Kind code of ref document: A1