WO2022211737A1 - Automatic intention detection of natural language input text - Google Patents

Automatic intention detection of natural language input text

Info

Publication number
WO2022211737A1
Authority
WO
WIPO (PCT)
Prior art keywords
intention
hypothesis
platform
constituent components
noun
Prior art date
Application number
PCT/SG2022/050183
Other languages
English (en)
Inventor
Junyu CHOY
Original Assignee
Emo Technologies Pte. Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Emo Technologies Pte. Ltd.
Publication of WO2022211737A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis

Definitions

  • the present invention relates to processing of natural language input text to extract its intention through the use of, for example, semantic rules.
  • Chatbots are taking hold of the online world. Markets & Markets forecast chatbots to become a US$10bn industry, fuelled by demand for automation both in the COVID and post-COVID world.
  • Banks use chatbots to facilitate banking transactions, customer service requests and product fulfilment.
  • E-commerce businesses use them to offer 24/7 customer service, catering to shoppers who browse websites at night. Beauty companies rely on them to advise customers and generate sales leads.
  • chatbots are not easy to implement. Users need bots to understand what they want (the intention) using natural language expressions.
  • NLP: Natural Language Processing
  • bots still struggle to understand human intent in the myriad ways that users express themselves. They need to be extensively trained and most organisations do not have the resources to fully support them. The result is that, for example, some chatbots employed by the Government fail to interpret human intent.
  • a platform for processing natural language input text configured to parse an input string into constituent components; categorise each of the constituent components through context with respect to other constituent components; identify, from the categorised constituent components, components that are activity related; validate a first hypothesis that determines which of the activity related components provides an intention of the input string from analysing the constituent components used to support the hypothesis, together with their placement relative to the activity-related component; identify, from the categorised constituent components, noun components; validate a second hypothesis that determines which of the noun components provides a target of the intention from analysing the constituent components used to support the second hypothesis, together with their placement relative to the noun component; and extract the activity related component that provides the intention of the input string and the noun component that provides the target of the intention, so as to construct the intention of the input string.
  • Figure 1 is a block diagram of a text input type determination module present in a platform in accordance with an embodiment of the present invention.
  • Figure 2 is a block diagram of a parsing decision determination module present in a platform in accordance with an embodiment of the present invention.
  • Figure 3 is a block diagram of an intent determination module present in a platform in accordance with an embodiment of the present invention.
  • Figure 4 is a block diagram of an intent verification determination module present in a platform in accordance with an embodiment of the present invention.
  • the present invention relates to intention identification of textual information input, determining a user's intention so that the appropriate actions can be taken.
  • the intention refers to what an input text seeks to achieve, i.e. the objective of the input text, whereby the intention of the input text comprises an activity related component (in most instances, but not limited to, a verb) and a noun component (which is the object of the activity related component).
  • Activity related component: in most instances, but not limited to, a verb
  • noun component: the object of the activity related component.
  • “Components” when used in the context of input text refer to a word or a phrase extracted from the input text.
  • Semantical rules are defined as rules that identify the specific nature of words, giving them meaning in the context of a particular sentence.
  • the semantical rules are derived from the syntactical rules of the language of the input text. Syntactical rules define the nature and order of words used by a language for sentence construction, whereby detecting the presence of these rules allows for the sentence to be parsed and for its intention to be extracted.
  • the intention of an input text can be an action, idea or request.
  • the identification of the action, idea, or request depends on identifying specific elements in the sentence. As with syntactic/semantic rules, there are other structures present in the sentence that identify the sentence's nature. For example, in the sentence "Could you please assist me?", the addition of "please" and "could" creates new semantic features that change the intention of the S-V-O structure "You assist me", which is an action, to a request. To accomplish this step of identification, the system uses multiple decision rules to guide the parsing approach and to apply different semantic rules to, ultimately, identify the intention accurately. Identifying the intention accurately provides a means to positively engage with a provider of the input text.
  • DETAILED DESCRIPTION
  • module can refer to software, algorithm, hardware, a combination of hardware and software, process, or process in execution.
  • One or more modules may reside within a process or hardware.
  • the extraction is particularly advantageous for search, chatbot or similar applications that are unable to parse the natural language input in its original format.
  • the extraction is done by an algorithm that analyses the natural language input to ascertain its semantic structure, identifies its intention and a target of the intention, i.e. the underlying objective of the input text and what a user seeks to achieve.
  • Digital users have different needs under different contexts, resulting in them requesting various information with various intentions. Providing the right answer to the user can dramatically increase customer satisfaction.
  • This extraction algorithm may be hosted in a platform required to have natural language input text processing capability.
  • the platform parses an input string into constituent components, whereby the input is separated into words or phrases that include noun(s), verb(s) and connector word(s) like "is" and "are".
  • the platform categorises each of the constituent components through context with respect to other constituent components. In one approach, placement of one constituent component with respect to another constituent component is used as basis for the categorisation.
  • Each word in an input text is categorised as a noun, verb or connector by analysing its position and order compared to other words in a sentence. For example, the word "vote" in the sentence "I went to vote" is a verb. However, "vote" in the sentence "Every vote counts in an election" is a noun.
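The position-based categorisation of "vote" can be illustrated with a minimal sketch. The cue lists and the function name `categorise` are hypothetical and are not taken from the patent, which does not disclose its actual rule set; the sketch only looks at the word placed immediately before the target.

```python
# Toy, hypothetical cue lists; the patent does not specify its actual rules.
DETERMINERS = {"a", "an", "the", "every", "each", "my", "your"}
VERB_CUES = {"to", "i", "you", "we", "they"}
CONNECTORS = {"is", "are"}


def categorise(sentence: str, target: str) -> str:
    """Categorise `target` from the word placed immediately before it."""
    words = [w.strip('.,!?"').lower() for w in sentence.split()]
    words = [w for w in words if w]
    if target in CONNECTORS:
        return "connector"
    idx = words.index(target)
    previous = words[idx - 1] if idx > 0 else ""
    if previous in DETERMINERS:
        return "noun"
    if previous in VERB_CUES:
        return "verb"
    return "unknown"


print(categorise("I went to vote", "vote"))                    # verb
print(categorise("Every vote counts in an election", "vote"))  # noun
```

A single preceding word is a deliberately small context window; a fuller rule set would presumably weigh several neighbouring components.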
  • the platform then identifies, from the categorised constituent components, components that are activity related, such as components that convey the performance of an action, like verbs and adverbs.
  • the platform validates a first hypothesis that determines which of the activity related components provides an intention of the input string from analysing the constituent components used to support the first hypothesis, together with their placement relative to the activity related component. This validation is required because each activity related component could possibly provide the intention of the input text.
  • a selected activity related component is tested to determine whether it conveys the intention of the input string, the testing involving the use of other constituent components (i.e. excluding the selected activity related component), factoring in their location compared to the selected activity related component. If the selected activity related component does not provide the intention, another activity related component is then tested.
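The test-then-fall-back behaviour described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions: `select_intention` and `followed_by_possessive` are hypothetical names, and the support test is purely illustrative rather than a rule disclosed in the patent.

```python
from typing import Callable, Optional, Sequence


def select_intention(
    components: Sequence[str],
    activity_indices: Sequence[int],
    is_supported: Callable[[Sequence[str], int], bool],
) -> Optional[str]:
    """Test each activity related component in turn and return the first one
    whose hypothesis is supported by the surrounding components."""
    for index in activity_indices:
        if is_supported(components, index):
            return components[index]
    return None  # no candidate was validated


def followed_by_possessive(components: Sequence[str], index: int) -> bool:
    # Purely illustrative support test: it looks only at the component
    # placed immediately after the candidate.
    following = components[index + 1] if index + 1 < len(components) else ""
    return following in {"my", "your", "our"}


words = ["i", "want", "to", "cancel", "my", "plan"]
print(select_intention(words, [1, 3], followed_by_possessive))  # cancel
```

Passing the support test in as a function keeps the loop independent of whichever rules are actually used to validate the hypothesis.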
  • the platform also identifies, from the categorised constituent components, noun components. Noun components are relevant for whether they provide a subject or an object in the input text and, when the noun component is an object, for whether it is the target of the intention of the input text.
  • the platform validates a second hypothesis that determines which of the noun components provides a target of the intention from analysing the constituent components used to support the second hypothesis, together with their placement relative to the noun component. In one approach, a selected noun component is tested to determine whether it is the target of the intention (resulting from the validation of the first hypothesis), the testing involving the use of other constituent components (i.e. excluding the selected noun component), factoring in their location compared to the selected noun component.
  • the platform may decide whether the input text follows an S-O-V (Subject-Object-Verb) or S-V-O (Subject-Verb-Object) syntactic structure, so as to identify the subject and object of any sentence for the already identified verb (i.e. the activity related component identified from the first hypothesis).
  • the activity related component that provides the intention of the input string and the noun component that provides the target of the intention can both be extracted, for example, for use as input for external applications.
  • the platform for processing natural language input text to extract its intention and the target of the intention is described in greater detail below, with reference to Figures 1 to 4.
  • Figures 1 to 3 show block diagrams, each representative of a module present in a platform 150 for processing natural language input text, in accordance with an embodiment of the present invention.
  • Each module may be a programmed module implemented by one or more processors executing instructions to perform its designated function, to allow the platform 150 to achieve its objective of extracting the intention of an input text and a target of the intention.
  • the platform 150 may be part of a network of servers, not shown for the sake of simplicity, that are working in conjunction to achieve this objective and/or utilise the extracted intention and the target of the intention.
  • Figure 1 shows that the platform 150 comprises a text input type determination module 101 that receives an input string as text input 100 and outputs text input type 102.
  • the input string may be a phrase preferably with a predefined minimum number of words.
  • the text input type determination module 101 classifies the text input 100 into one of three types of textual inputs.
  • the three types of textual inputs are identified as action, question and comments, so that the text input type determination module 101 essentially classifies an input string as either a question or an utterance.
  • the classification process involves the active application of specialised syntactic/semantic rules developed to identify the textual inputs. These rules are provided by the user or pre-determined by the platform 150 or a system to which the platform 150 belongs.
  • Figure 2 shows that the platform 150 further comprises a parsing decision determination module 201 that analyses the text input type 102 (i.e. the output of the text input type determination module 101 providing a classification of the text input 100 into one of three types) to decide how to parse the input string into constituent components.
  • the parsing decision determination module 201 determines the most appropriate parsing approach 202 that can facilitate the extraction of the intention of the input text 100.
  • the determination of the most appropriate parsing approach 202, which also results in categorising each of the parsed constituent components through context with respect to other constituent components in the input string, is based on the placement of the core semantic feature of the language and the peripheral features of the language. The ordering and placement of the parsed constituent components in the input string will determine the core intention.
  • the rules used by the parsing decision determination module 201 to decide on a parsing approach and the subsequent categorisation of the parsed constituent components are provided by the user or pre-determined by the platform 150 or a system to which the platform 150 belongs.
  • the parsing decision determination module 201 uses a universal dependency treebank, which categorises based on interdependency of the constituent components, to determine how to categorise an input string into its constituent components.
  • Figure 3 shows that the platform 150 further comprises an intention determination module 304, which includes a parsed output intention identification module 302.
  • the parsing approach 202, chosen as explained with respect to Figure 2, is provided to the parsed output intention identification module 302.
  • the parsed output intention identification module 302 then parses the input string, received as the text input 100 by the intention determination module 304, using the chosen parsing approach 202. This results in the input string being parsed into its constituent components, with each of the constituent components being categorised through context with respect to other constituent components.
  • the parsed output intention identification module 302 applies a specific set of rules to parse the text input to produce initial intentions, i.e. activity related components and noun components are identified from the categorised constituent components.
  • the rules are provided by the user or predetermined by the platform 150 or a system to which the platform 150 belongs.
  • the intention determination module 304 provides parsed intentions 303, being the identified activity related components and the identified noun components from the input string in the text input type 102.
  • Figure 4 shows that the platform 150 further comprises an intention verification determination module 404.
  • the intention verification determination module 404 comprises an intention and object identification module 401 that receives parsed intentions 303 (i.e. the output from the intention determination module 304 of Figure 3); and an intention and object coherence determination module 402.
  • the intention and object coherence determination module 402 utilises a specific set of rules to identify the intention and object relationship, from the activity related components and the identified noun components present in the parsed intentions 303, to determine the correct intention and a target of the correct intention.
  • the rules are provided by the user or predetermined by the platform 150 or a system to which the platform 150 belongs.
  • the parsed intentions 303 have multiple possible intentions, especially in the case where the input from which the parsed intentions 303 are derived is a complex statement having several activity related components (such as verbs and verb extensions, like gerunds) and several noun components (which can serve as an object to the activity related components).
  • the intention verification determination module 404 tests these activity related components (i.e. the multiple intentions) by validating a first hypothesis that determines which of them provides an intention of the input string, from analysing the constituent components used to support the first hypothesis together with their placement relative to the activity related component that is being tested.
  • the intention verification determination module 404 also validates a second hypothesis that determines which of the noun components provides a target of the intention from analysing the constituent components used to support the second hypothesis, together with their placement relative to the noun component.
  • a sequential approach may be adopted during the validation of the two hypotheses. If the validation of the first hypothesis returns a definitive outcome, the selected activity related component that was tested, along with the constituent components tested together with it, form the intention of the input string. There is then no requirement to validate the second hypothesis. If both hypotheses require validation, the first hypothesis is validated first, followed by the second hypothesis.
  • the intention verification determination module 404 performs the validation of the first hypothesis and the second hypothesis by checking, for each activity related component, its coherence with a noun component in the parsed intentions 303. The checking of coherence is achieved through use of the order and placement of semantic features. Identifying the location of a specific feature and its position relative to other features identifies the right intention from the several possible intentions.
  • the intention determination module 304 performs the validation of the first hypothesis and the second hypothesis by utilising multiple rules to test the most appropriate purpose for each such phrase. The correctly identified purposes are then tested to verify which provides the correct intention for the input string. The rules test various hypotheses of the sentence structure to evaluate the right intention.
  • the intention and object coherence determination module 402 outputs coherent intentions 403, being the activity related component that provides the intention of the input string and the noun component that provides the target of the intention.
  • the coherent intentions 403 may be exported, such as to map the intention and the target of the intention to a matching input for an external application, which the external application can recognise and respond to.
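One way to picture this export step is a lookup from an (intention, target) pair to an input the external application understands. The table contents, the command names and the function `to_external_input` are all hypothetical; the patent does not describe the format of the exported data.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup from (intention, target) to an external command name.
EXTERNAL_COMMANDS: Dict[Tuple[str, str], str] = {
    ("cancel", "plan"): "subscription.cancel",
    ("check", "balance"): "account.balance",
}


def to_external_input(intention: str, target: str) -> Optional[str]:
    """Return the matching external command, or None if nothing matches."""
    key = (intention.strip().lower(), target.strip().lower())
    return EXTERNAL_COMMANDS.get(key)


print(to_external_input("Cancel", "Plan"))  # subscription.cancel
```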
  • the validation of the first hypothesis and the validation of the second hypothesis involve analysing results of an application of rules that test the first hypothesis and the second hypothesis.
  • the rules used to test the first hypothesis may include any one or more of determining whether there is a noun after the intention that is being validated (e.g. the noun "polls" after the intention "habit to vote early in the primaries"); whether the intention that is being validated is subordinate to another intention (e.g. the relationship of the intention "polls open on weekend" to the intention "habit to vote early in the primaries"); whether the input string has complementary intentions to the intention that is being validated (e.g.
  • the rules used to test the second hypothesis may include any one or more of determining whether there is an intention before the noun that is being validated (e.g. “vote” before “primaries” in the intention “habit to vote early in the primaries”); and whether the noun that is being validated has been referenced by the constituent components in the input string (e.g. “election day”).
  • the rules used for the validation of the first hypothesis and the validation of the second hypothesis may be categorised into two groups: those whose application results in an affirmative outcome and those whose application results in a negative outcome. To successfully validate either or both of the first and second hypotheses, the rules belonging to the two groups may have to be applied in the affirmative or negative accordingly.
  • the rule of whether there is a noun component after an activity related component has to be positive; the rule of whether the intention that is being validated is subordinate to another intention has to be negative; the rule of whether there are complementary intentions to the intention that is being validated has to be positive; the rule of whether conjunctions follow the intention that is being validated has to be positive; and the rule of whether the intention that is being validated is a first occurring verb in the input string has to be positive.
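The affirmative and negative rule groups listed above can be sketched as a table of expected outcomes. This is a minimal sketch, assuming that every applied rule must match its expected polarity for the hypothesis to be validated; the rule names and `validate_first_hypothesis` are hypothetical, and the rule outcomes are passed in as data because the patent does not specify how each rule is evaluated.

```python
from typing import Dict

# Expected outcome of each first-hypothesis rule, as listed in the text above.
EXPECTED_POLARITY: Dict[str, bool] = {
    "noun_after_activity_component": True,
    "subordinate_to_another_intention": False,
    "has_complementary_intentions": True,
    "conjunction_follows_intention": True,
    "is_first_occurring_verb": True,
}


def validate_first_hypothesis(rule_results: Dict[str, bool]) -> bool:
    """Return True only if every applied rule matches its expected polarity.

    Rules missing from `rule_results` are treated as not applied, since the
    text says "any one or more" of the rules may be used.
    """
    applied = {n: r for n, r in rule_results.items() if n in EXPECTED_POLARITY}
    if not applied:
        return False  # nothing was tested, so nothing is validated
    return all(EXPECTED_POLARITY[name] == result for name, result in applied.items())


print(validate_first_hypothesis({
    "noun_after_activity_component": True,
    "subordinate_to_another_intention": False,
}))  # True
```

Requiring all applied rules to agree is the strictest reading; a weighted or majority scheme would be an equally plausible interpretation of the text.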
  • the input string may comprise text tokenised into individual sentences found between two punctuation marks.
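Tokenising text into sentences found between punctuation marks might look like the following sketch. Treating only ".", "!" and "?" as delimiters is an assumption, and `split_sentences` is a hypothetical name.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split text into the sentences found between punctuation marks."""
    # Assumption: only '.', '!' and '?' delimit sentences.
    parts = re.split(r"[.!?]+", text)
    return [part.strip() for part in parts if part.strip()]


print(split_sentences("I want to cancel my plan. Can you help me? Thanks!"))
# ['I want to cancel my plan', 'Can you help me', 'Thanks']
```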
  • the platform 150 may be further configured to identify the language of the input string before parsing it into its constituent components.
  • the working of the platform 150 is described below with reference to a simple input phrase, along with an illustration of a selection of the rules used in determining the intention of the input phrase.
  • the input phrase is "I want to cancel my plan."
  • the phrase is received by the platform 150 as text input 100.
  • Text input 100 is processed by the text input type determination module 101 to determine the classification of the input phrase.
  • the text input type determination module 101 may apply the most straightforward rule to identify the position of the verbs.
  • the first rule of the S-V-O structure is applied to identify the placement of subject, verb and object. From the sentence parsing, the verbs "want" and "cancel" are found to be between the subject "I" and the object "plan".
  • the text input type determination module 101 outputs the text input type 102, indicating that the input phrase is classified as a request.
  • the text input type 102, specifying the classification of the input phrase, is provided to the parsing decision determination module 201.
  • the parsing decision determination module 201 will then select a parsing rule which is based on V-O.
  • the intention of the input phrase is determined through the verb - object rule.
  • the selected parsing rule is output as parsing approach output 202.
  • the input phrase and the selected parsing rule are respectively received as text input 100 and parsing approach output 202 to the intent determination module 304.
  • the selected parsing rule from the parsing approach output 202 is applied to extract the intention from the text input 100.
  • Using the V-O rule, two possible intentions are identified and output as parsed intentions 303.
  • the first possible intention is "Want - Plan" and the second possible intention is "Cancel - Plan".
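The way the V-O rule yields these two candidates can be sketched as follows. The sketch assumes the components have already been categorised; the hand-assigned category labels and the function `verb_object_candidates` are hypothetical.

```python
from typing import List, Tuple


def verb_object_candidates(tagged: List[Tuple[str, str]]) -> List[str]:
    """Pair each verb with the first noun that follows it (the V-O rule)."""
    candidates = []
    for i, (word, category) in enumerate(tagged):
        if category != "verb":
            continue
        for later_word, later_category in tagged[i + 1:]:
            if later_category == "noun":
                candidates.append(f"{word.capitalize()} - {later_word.capitalize()}")
                break
    return candidates


# Categories are assigned by hand here; in the platform they would come
# from the parsing and categorisation steps.
tagged_phrase = [
    ("I", "subject"),
    ("want", "verb"),
    ("to", "connector"),
    ("cancel", "verb"),
    ("my", "determiner"),
    ("plan", "noun"),
]
print(verb_object_candidates(tagged_phrase))  # ['Want - Plan', 'Cancel - Plan']
```

Both candidates are kept at this stage; choosing between them is left to the coherence determination.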
  • the parsed intentions 303 form the input to the intention verification determination module 404.
  • the intention and object coherence determination module 402 uses coherence determination rules and hypotheses to determine which of the possible intentions in the parsed intentions 303 is the actual intention of the input phrase.
  • the coherence determination uses several hypotheses to verify the intention.
  • V-C-V: verb - conjunction - verb

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

According to one aspect, the invention relates to a platform for processing natural language input text, the platform being configured to parse an input string into constituent components; categorise each of the constituent components through context with respect to other constituent components; identify, from the categorised constituent components, components that are activity related; validate a first hypothesis that determines which of the activity related components provides an intention of the input string from analysing the constituent components used to support the hypothesis, together with their placement relative to the activity related component; identify, from the categorised constituent components, noun components; validate a second hypothesis that determines which of the noun components provides a target of the intention from analysing the constituent components used to support the second hypothesis, together with their placement relative to the noun component; and extract the activity related component that provides the intention of the input string and the noun component that provides the target of the intention, so as to construct the intention of the input string.
PCT/SG2022/050183 2021-03-31 2022-03-30 Automatic intention detection of natural language input text WO2022211737A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SG10202103337T 2021-03-31
SG10202103337T 2021-03-31

Publications (1)

Publication Number Publication Date
WO2022211737A1 (fr) 2022-10-06

Family

ID=83459994

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2022/050183 WO2022211737A1 (fr) 2021-03-31 2022-03-30 Automatic intention detection of natural language input text

Country Status (1)

Country Link
WO (1) WO2022211737A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102789464A (zh) * 2011-05-20 2012-11-21 陈伯妤 Natural language processing method, device and system based on semantic recognition
US20200012906A1 (en) * 2017-02-14 2020-01-09 Microsoft Technology Licensing, Llc Intelligent assistant
US10600419B1 (en) * 2017-09-22 2020-03-24 Amazon Technologies, Inc. System command processing
US20200184307A1 (en) * 2018-12-11 2020-06-11 Adobe Inc. Utilizing recurrent neural networks to recognize and extract open intent from text inputs
CN111984778A (zh) * 2020-09-08 2020-11-24 Sichuan Changhong Electric Co., Ltd. Multi-round semantic analysis method based on dependency parsing and Chinese grammar

Similar Documents

Publication Publication Date Title
US10832006B2 (en) Responding to an indirect utterance by a conversational system
US11954613B2 (en) Establishing a logical connection between an indirect utterance and a transaction
US9904675B2 (en) Automatic question generation from natural text
JP5936698B2 (ja) Word semantic relation extraction device
US10671929B2 (en) Question correction and evaluation mechanism for a question answering system
US8452772B1 (en) Methods, systems, and articles of manufacture for addressing popular topics in a socials sphere
US10460028B1 (en) Syntactic graph traversal for recognition of inferred clauses within natural language inputs
US10642928B2 (en) Annotation collision detection in a question and answer system
US9772996B2 (en) Method and system for applying role based association to entities in textual documents
US9632998B2 (en) Claim polarity identification
Karamibekr et al. Sentence subjectivity analysis in social domains
Fashwan et al. SHAKKIL: an automatic diacritization system for modern standard Arabic texts
Roth et al. Parsing software requirements with an ontology-based semantic role labeler
KR101851786B1 (ko) Apparatus and method for generating virtual labels for labeling a chatbot training set
Schraagen et al. Extraction of semantic relations in noisy user-generated law enforcement data
GB2572320A (en) Hate speech detection system for online media content
Kolomiyets et al. KUL: recognition and normalization of temporal expressions
KR101851791B1 (ko) Apparatus and method for computing domain diversity using domain-specific terms and high-frequency general terms
US11625536B2 (en) System and method for identification and profiling adverse events
WO2016037167A1 (fr) Identifying mathematical operators in natural language text for knowledge-based matching
WO2022211737A1 (fr) Automatic intention detection of natural language input text
US11017172B2 (en) Proposition identification in natural language and usage thereof for search and retrieval
KR101851792B1 (ko) Apparatus and method for generating virtual labels for a question data set
Nishy Reshmi et al. Textual entailment classification using syntactic structures and semantic relations
Mishra et al. Identifying and Analyzing Reduplication Multiword Expressions in Hindi Text Using Machine Learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22781773

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22781773

Country of ref document: EP

Kind code of ref document: A1