WO2017000777A1 - Spoken language semantic analysis system and method (一种口语语义解析系统及方法) - Google Patents

Spoken language semantic analysis system and method

Info

Publication number
WO2017000777A1
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
semantic
candidate
sentences
spoken
Prior art date
Application number
PCT/CN2016/085763
Other languages
English (en)
French (fr)
Inventor
陈见耸
Original Assignee
芋头科技(杭州)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 芋头科技(杭州)有限公司
Priority to US15/739,351 (published as US20180190270A1)
Priority to EP16817141.1A (published as EP3318978A4)
Priority to JP2017567752A (published as JP6596517B2)
Publication of WO2017000777A1

Classifications

    • G10L 15/1815 Speech recognition using natural language modelling; semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G06F 16/2228 Indexing structures
    • G06F 16/2264 Multidimensional index structures
    • G06F 16/2468 Fuzzy queries
    • G06F 40/30 Handling natural language data; semantic analysis
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 25/33 Speech or voice analysis techniques characterised by the analysis technique using fuzzy logic
    • G10L 2015/088 Word spotting

Definitions

  • The invention relates to the field of spoken natural language understanding, and in particular to a highly robust spoken language semantic analysis system and method.
  • Spoken speech recognition involves multiple disciplines such as phonetics, linguistics, mathematical signal processing, and pattern recognition. With the popularity of smart devices, how people and smart devices can interact more directly and naturally has become an important issue. Because spoken natural language is naturally friendly and convenient for users, human-computer interaction based on spoken natural language has become a trend and is receiving increasing attention from industry.
  • The key technology of spoken natural language interaction is spoken language understanding, that is, parsing the user's spoken sentence to obtain the intention the user wants to express and the corresponding keywords.
  • In general, spoken language understanding is implemented by manually collecting or writing semantic sentence patterns and then matching the sentence to be parsed against these patterns to obtain a parsing result. Existing methods are mostly based on matching against a grammar such as a regular grammar or a context-free grammar, which requires the spoken sentence to agree exactly with a semantic sentence pattern before parsing can succeed; this forces the builders of the system to spend a great deal of time collecting sentence patterns, makes parsing fail when the front-end speech recognizer makes errors, and leads to long parsing times because the sentence must be matched against a large number of patterns.
  • In view of these problems, the aim is a spoken semantic analysis system and method that can quickly and accurately find, in a large-scale library of semantic sentence patterns, the patterns similar to the spoken sentence to be parsed, and output an accurate result.
  • A spoken semantic analysis system for parsing spoken language semantics in a preset domain comprises:
  • a storage unit configured to store semantic sentence patterns of the preset domain, each semantic sentence pattern corresponding to an address; a semantic sentence pattern consists of characters and keywords, and each keyword corresponds to a label;
  • a word table is preset in the storage unit to store the address of the semantic sentence pattern in which each character appears and/or the address of the semantic sentence pattern in which each label appears;
  • an obtaining unit configured to obtain the spoken sentence to be parsed;
  • an index unit, connected to the storage unit and the obtaining unit respectively, configured to retrieve the semantic sentence patterns in the storage unit according to the spoken sentence to be parsed, and to obtain the candidate semantic sentence patterns matching the spoken sentence to be parsed together with the corresponding candidate order;
  • a parsing unit, connected to the index unit, configured to parse the spoken sentence to be parsed according to the sorted candidate semantic sentence patterns and obtain the parsing result.
  • The index unit comprises:
  • an extraction module configured to extract the keywords in the spoken sentence to be parsed that also appear in the storage unit, and to obtain the labels corresponding to those keywords;
  • a replacement module, connected to the extraction module, configured to replace the keywords in the spoken sentence to be parsed with their corresponding labels to form a substituted spoken sentence;
  • an indexing module, connected to the replacement module, configured to search the word table in the storage unit according to the characters and labels in the substituted spoken sentence, and to obtain the addresses of the semantic sentence patterns matching the characters and/or the addresses of the semantic sentence patterns matching the labels;
  • a sorting module, connected to the indexing module, configured to sort the semantic sentence patterns matching the characters and/or labels in the substituted spoken sentence by comparing their similarity to the substituted spoken sentence, and to obtain the sorted candidate semantic sentence patterns.
  • The sorting module uses a score formula to obtain a score for the similarity between a candidate semantic sentence pattern and the substituted spoken sentence;
  • the score formula is:
  • S = (S1 + S2) / 2,
  • where S represents the score of the similarity between the candidate semantic sentence pattern and the substituted spoken sentence, S1 represents the proportion of the substituted spoken sentence covered by the characters and/or labels of the candidate semantic sentence pattern, and S2 represents the proportion of the candidate semantic sentence pattern covered by those characters and/or labels.
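As a concrete illustration, the following Python sketch computes this score for one candidate pattern, treating S1 and S2 as the fraction of tokens (characters and labels) of the substituted spoken sentence that appear in the pattern and vice versa. The tokenization and the way repeated tokens are counted are assumptions, since the text only defines the two ratios.

```python
def similarity_score(substituted_tokens, pattern_tokens):
    """Score S = (S1 + S2) / 2 between a substituted spoken sentence and a
    candidate semantic sentence pattern, both given as lists of characters/labels.

    S1: share of the substituted sentence's tokens found in the pattern.
    S2: share of the pattern's tokens found in the substituted sentence.
    (How repeated tokens are counted is an assumption, not specified in the text.)
    """
    pattern_set = set(pattern_tokens)
    substituted_set = set(substituted_tokens)

    matched_in_sentence = sum(1 for t in substituted_tokens if t in pattern_set)
    matched_in_pattern = sum(1 for t in pattern_tokens if t in substituted_set)

    s1 = matched_in_sentence / len(substituted_tokens) if substituted_tokens else 0.0
    s2 = matched_in_pattern / len(pattern_tokens) if pattern_tokens else 0.0
    return (s1 + s2) / 2


# Example: pattern "打电话给$name" vs. the substituted sentence "帮我打电话给$name"
pattern = list("打电话给") + ["$name"]
sentence = list("帮我打电话给") + ["$name"]
print(round(similarity_score(sentence, pattern), 3))  # 0.857 = ((5/7) + (5/5)) / 2
```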
  • The specific process by which the parsing unit parses the spoken sentence to be parsed using a fuzzy matching algorithm according to the sorted candidate semantic sentence patterns is:
  • a finite state automaton network is built for each candidate semantic sentence pattern, the spoken sentence to be parsed is scored against each network, the scores are compared, and the highest-scoring result is taken as the parsing result of the spoken sentence to be parsed.
  • The word table is represented by a hash table.
  • A spoken semantic analysis method, applied to the spoken semantic analysis system, comprises the following steps: S1, obtaining the spoken sentence to be parsed; S2, retrieving the semantic sentence patterns in the storage unit according to the spoken sentence to be parsed, and obtaining the candidate semantic sentence patterns matching the spoken sentence to be parsed together with the corresponding candidate order; S3, parsing the spoken sentence to be parsed according to the sorted candidate semantic sentence patterns using a fuzzy matching algorithm, and obtaining the parsing result.
  • The specific process of step S2 is: S21, extracting the keywords in the spoken sentence to be parsed that also appear in the storage unit, and obtaining the corresponding labels; S22, replacing those keywords with their corresponding labels to form a substituted spoken sentence; S23, searching the word table in the storage unit according to the characters and labels in the substituted spoken sentence to obtain the addresses of the matching semantic sentence patterns; S24, sorting the matching semantic sentence patterns by their similarity to the substituted spoken sentence to obtain the sorted candidate semantic sentence patterns.
  • Step S24 uses a score formula to obtain a score for the similarity between a candidate semantic sentence pattern and the substituted spoken sentence;
  • the score formula is:
  • S = (S1 + S2) / 2,
  • where S represents the score of the similarity between the candidate semantic sentence pattern and the substituted spoken sentence, S1 represents the proportion of the substituted spoken sentence covered by the characters and/or labels of the candidate semantic sentence pattern, and S2 represents the proportion of the candidate semantic sentence pattern covered by those characters and/or labels.
  • The specific process of step S3 is: S31, building a finite state automaton network for each candidate semantic sentence pattern; S32, scoring the spoken sentence to be parsed against each finite state automaton network; S33, comparing the scores and taking the highest-scoring result as the parsing result of the spoken sentence to be parsed.
  • The word table is represented by a hash table.
  • In this technical solution, the index unit can quickly retrieve the sentence patterns related to the spoken sentence to be parsed, which improves matching efficiency; the fuzzy matching algorithm, when parsing the spoken sentence to be parsed, allows inconsistencies between the spoken sentence and the candidate semantic sentence patterns, giving a degree of fault tolerance and thereby improving the robustness of the system.
  • In the spoken semantic analysis method, the sentence patterns related to the spoken sentence to be parsed can likewise be retrieved quickly to improve matching efficiency, so that a sentence pattern similar to the spoken sentence to be parsed can be found quickly and accurately in a large-scale library of semantic sentence patterns and an accurate result can be output.
  • FIG. 1 is a block diagram of an embodiment of a spoken semantic analysis system according to the present invention;
  • FIG. 2 is a flowchart of an embodiment of a spoken semantic analysis method according to the present invention;
  • FIG. 3 is a flowchart of the method for retrieving the semantic sentence patterns in the storage unit according to the present invention;
  • FIG. 4 is a flowchart of the method for parsing the spoken sentence to be parsed according to the present invention;
  • FIG. 5 is a schematic diagram of the sentence-pattern inverted index of the present invention;
  • FIG. 6 is a schematic diagram of a finite state automaton corresponding to a sentence pattern of the present invention.
  • As shown in FIG. 1, a spoken semantic analysis system for parsing the spoken language semantics of a preset domain comprises:
  • a storage unit 1 for storing semantic sentence patterns of the preset domain, each semantic sentence pattern corresponding to an address; a semantic sentence pattern consists of characters and keywords, each keyword corresponds to a label, and a word table is preset in the storage unit 1 to store the address of the semantic sentence pattern in which each character appears and/or the address of the semantic sentence pattern in which each label appears;
  • an obtaining unit 2 for obtaining the spoken sentence to be parsed;
  • an index unit 3, connected to the storage unit 1 and the obtaining unit 2 respectively, for retrieving the semantic sentence patterns in the storage unit 1 according to the spoken sentence to be parsed and obtaining the candidate semantic sentence patterns matching the spoken sentence to be parsed together with the corresponding candidate order;
  • a parsing unit 4, connected to the index unit 3, for parsing the spoken sentence to be parsed according to the sorted candidate semantic sentence patterns using a fuzzy matching algorithm and obtaining the parsing result.
  • In this embodiment, the index unit 3 can quickly retrieve the sentence patterns related to the spoken sentence to be parsed, which improves matching efficiency; the fuzzy matching algorithm, when parsing the spoken sentence to be parsed, allows inconsistencies between the spoken sentence and the candidate semantic sentence patterns, so the builders of the system do not need to write large numbers of nearly identical sentence patterns, and errors from the speech recognition front end are tolerated to some degree, improving the robustness of the system.
  • In a preferred embodiment, the index unit 3 comprises:
  • an extraction module 31 for extracting the keywords in the spoken sentence to be parsed that also appear in the storage unit 1, and obtaining the labels corresponding to those keywords;
  • a replacement module 32, connected to the extraction module 31, for replacing the keywords in the spoken sentence to be parsed with their corresponding labels to form a substituted spoken sentence;
  • an indexing module 34, connected to the replacement module 32, for searching the word table in the storage unit 1 according to the characters and labels in the substituted spoken sentence, and obtaining the addresses of the semantic sentence patterns matching the characters and/or the addresses of the semantic sentence patterns matching the labels;
  • a sorting module 33, connected to the indexing module 34, for sorting the semantic sentence patterns matching the characters and/or labels in the substituted spoken sentence by comparing their similarity to the substituted spoken sentence, and obtaining the sorted candidate semantic sentence patterns.
  • The index unit 3 is used to quickly retrieve, given a spoken sentence to be parsed, the candidate semantic sentence patterns similar to it according to the index.
  • Specifically, after the spoken sentence to be parsed is obtained, its keywords are extracted. Detection can be done with the word table, by traversing all possible words in the sentence and checking whether each word or character exists in the word table and, if so, recording its position in the sentence; detection can also be done with a statistical model, for example a conditional random field (CRF) model trained for this purpose. The keywords are then replaced with their corresponding labels, and the labels and the characters that were not replaced are looked up in the index: each character or label retrieved from the word table yields the addresses (IDs) of the semantic sentence patterns in which it appears, and the number of characters or labels matched between each semantic sentence pattern and the sentence being retrieved can be recorded.
  • The retrieval results are sorted according to the similarity scores, and the patterns with the highest scores are taken as the candidate semantic sentence patterns.
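A minimal sketch of this online retrieval step follows; it assumes the inverted index is already available as a Python dict mapping each character or label to the list of pattern IDs in which it appears, and the function and variable names are illustrative, not taken from the patent.

```python
from collections import Counter

def retrieve_candidates(substituted_tokens, inverted_index, patterns, top_k=5):
    """Look up every character/label of the substituted sentence in the inverted
    index, count how many distinct tokens each semantic sentence pattern shares
    with the sentence, score the hits with S = (S1 + S2) / 2, and return the
    best-scoring pattern IDs.

    inverted_index: dict token -> list of pattern IDs
    patterns:       dict pattern ID -> list of tokens of that pattern
    """
    hits = Counter()
    for token in set(substituted_tokens):
        for pattern_id in inverted_index.get(token, []):
            hits[pattern_id] += 1  # number of distinct shared tokens

    scored = []
    for pattern_id, shared in hits.items():
        s1 = shared / len(set(substituted_tokens))    # coverage of the sentence
        s2 = shared / len(set(patterns[pattern_id]))  # coverage of the pattern
        scored.append(((s1 + s2) / 2, pattern_id))

    scored.sort(reverse=True)
    return [pattern_id for _, pattern_id in scored[:top_k]]


index = {"打": [0], "电": [0], "话": [0], "给": [0, 1], "$name": [0, 1]}
patterns = {0: list("打电话给") + ["$name"], 1: ["给", "$name"] + list("发短信")}
sentence = list("帮我打电话给") + ["$name"]
print(retrieve_candidates(sentence, index, patterns))  # [0, 1]
```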
  • In a preferred embodiment, the sorting module 33 uses a score formula to obtain the score of the similarity between a candidate semantic sentence pattern and the substituted spoken sentence;
  • the score formula is:
  • S = (S1 + S2) / 2,
  • where S represents the score of the similarity between the candidate semantic sentence pattern and the substituted spoken sentence, S1 represents the proportion of the substituted spoken sentence covered by the characters and/or labels of the candidate semantic sentence pattern, and S2 represents the proportion of the candidate semantic sentence pattern covered by those characters and/or labels.
  • The specific process by which the parsing unit 4 parses the spoken sentence to be parsed using a fuzzy matching algorithm according to the sorted candidate semantic sentence patterns is:
  • a finite state automaton network is built for each candidate semantic sentence pattern, the spoken sentence to be parsed is scored against each network, the scores are compared, and the highest-scoring result is taken as the parsing result of the spoken sentence to be parsed.
  • In this embodiment, the parsing unit 4 may build a finite state automaton network for each candidate semantic sentence pattern, in which each character or label acts as an arc of the automaton.
  • FIG. 6 shows a schematic diagram of the finite state automaton network corresponding to one sentence pattern. The spoken sentence to be parsed is parsed and scored against the finite state automaton networks as follows: according to the keyword detection results, the keywords in the sentence are replaced with their corresponding labels; assuming there are n keyword detection results, there are 2^n possible label combinations. After removing combinations whose labels conflict in position, the candidate label-substituted sentences are obtained. Each substituted spoken sentence is then fuzzily matched against the finite state automaton network generated from each sentence pattern. Several matching methods exist, such as the one in "Error-tolerant Finite-state Recognition with Applications to Morphological Analysis and Spelling Correction"; since that matching method is prior art it is not described in detail here. It can quickly compute the degree of matching between two sentences using a dynamic programming algorithm, and the optimal sentence pattern and its corresponding parsing result are obtained according to the scores.
  • Further, the parsing and scoring process allows insertion and/or deletion and/or substitution operations between the spoken sentence to be parsed and the spoken semantic sentence pattern, and the number of such operations is limited by a preset threshold: when the number is below the threshold, the sentence to be parsed is considered to conform to the corresponding semantic sentence pattern, and otherwise it is not.
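The cited error-tolerant finite-state matching is not reproduced in the patent; the sketch below is only an illustrative simplification under stated assumptions, matching the substituted sentence against a linear sentence pattern (one arc per character or label) by edit distance and accepting it when the number of insertions, deletions and substitutions stays below a preset threshold. Names such as `edit_distance_match` are invented for illustration.

```python
def edit_distance_match(substituted_tokens, pattern_tokens, max_ops=2):
    """Dynamic-programming match of a substituted spoken sentence against a
    sentence pattern treated as a linear chain of arcs (characters/labels).

    Returns (matches, ops): whether the number of insert/delete/substitute
    operations needed stays below the preset threshold, and that number.
    """
    n, m = len(substituted_tokens), len(pattern_tokens)
    # dp[i][j] = minimum number of operations to align the first i sentence
    # tokens with the first j pattern tokens.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if substituted_tokens[i - 1] == pattern_tokens[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion from the sentence
                           dp[i][j - 1] + 1,         # insertion into the sentence
                           dp[i - 1][j - 1] + cost)  # substitution / exact match
    ops = dp[n][m]
    return ops < max_ops, ops


# The pattern "打电话给$name" tolerates the extra filler characters "帮我".
pattern = list("打电话给") + ["$name"]
sentence = list("帮我打电话给") + ["$name"]
print(edit_distance_match(sentence, pattern, max_ops=3))  # (True, 2)
```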
  • In a preferred embodiment, the word table is represented by a hash table.
  • As shown in FIG. 2, a spoken semantic analysis method, applied to the spoken semantic analysis system, comprises the following steps: S1, obtaining the spoken sentence to be parsed; S2, retrieving the semantic sentence patterns in the storage unit 1 according to the spoken sentence to be parsed, and obtaining the candidate semantic sentence patterns matching the spoken sentence to be parsed together with the corresponding candidate order; S3, parsing the spoken sentence to be parsed according to the sorted candidate semantic sentence patterns using a fuzzy matching algorithm, and obtaining the parsing result.
  • In this embodiment, the spoken semantic analysis method can quickly retrieve the sentence patterns related to the spoken sentence to be parsed, which improves matching efficiency, so that a sentence pattern similar to the spoken sentence to be parsed can be found quickly and accurately in a large-scale library of semantic sentence patterns and an accurate result can be output.
  • As shown in FIG. 3, the specific process of step S2 is: S21, extracting the keywords in the spoken sentence to be parsed that also appear in the storage unit 1, and obtaining the corresponding labels; S22, replacing those keywords with their corresponding labels to form a substituted spoken sentence; S23, searching the word table in the storage unit 1 according to the characters and labels in the substituted spoken sentence to obtain the addresses of the matching semantic sentence patterns; S24, sorting the matching semantic sentence patterns by their similarity to the substituted spoken sentence to obtain the sorted candidate semantic sentence patterns.
  • In this embodiment, the spoken semantic analysis method may include an offline phase and an online phase. The offline phase includes collecting and organizing the semantic sentence patterns of the corresponding domain according to the defined domain requirements.
  • A semantic sentence pattern conforms to spoken language usage, and the keywords that the pattern needs to parse are represented by labels. For example, a possible sentence in the phone-call domain is "打电话给张三" ("call Zhang San"). Since "张三" (Zhang San) is a name keyword to be parsed, keywords to be parsed are replaced with labels, e.g. "张三" is replaced by "$name", so the sentence is rewritten into the pattern "打电话给$name" ("call $name").
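A small sketch of this keyword-to-label substitution, under the assumption that keyword detection has already produced the keyword spans and their label types; the span format and the helper name are illustrative only.

```python
def substitute_keywords(sentence, detections):
    """Replace detected keywords with their labels to build the substituted
    spoken sentence (offline for pattern construction, online for the query).

    detections: list of (start, end, label) spans found by keyword detection,
                e.g. by word-table lookup or a CRF model.
    """
    out, pos = [], 0
    for start, end, label in sorted(detections):
        out.append(sentence[pos:start])
        out.append(label)
        pos = end
    out.append(sentence[pos:])
    return "".join(out)


# "打电话给张三" with "张三" detected as a $name keyword -> "打电话给$name"
print(substitute_keywords("打电话给张三", [(4, 6, "$name")]))
```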
  • An index is then built for the semantic sentence patterns of each domain: the characters and the labels in the semantic sentence patterns are indexed together, with each label indexed as if it were a single character.
  • This embodiment uses a hashed inverted index, shown schematically in FIG. 5.
  • The hash table stores all the characters and labels that appear in any semantic sentence pattern; each character or label is followed by a list in which each element stores the address (ID number) of a sentence pattern in which that character or label appears.
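The following sketch builds such a hashed inverted index with an ordinary Python dict standing in for the hash table; treating each label as a single token mirrors the description above, while the helper names and the tokenization regex are assumptions.

```python
import re
from collections import defaultdict

def tokenize_pattern(pattern):
    """Split a semantic sentence pattern into characters, keeping each $label whole."""
    return re.findall(r"\$\w+|.", pattern)

def build_inverted_index(patterns):
    """patterns: dict pattern ID -> pattern string, e.g. {0: "打电话给$name"}.
    Returns dict token -> list of pattern IDs containing that character/label."""
    index = defaultdict(list)
    for pattern_id, pattern in patterns.items():
        for token in set(tokenize_pattern(pattern)):
            index[token].append(pattern_id)
    return index


patterns = {0: "打电话给$name", 1: "给$name发短信"}
index = build_inverted_index(patterns)
print(index["$name"])  # [0, 1]
print(index["打"])     # [0]
```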
  • The online phase includes: given a spoken sentence to be parsed, the candidate semantic sentence patterns similar to it are quickly retrieved according to the index.
  • The specific steps are as follows: after the spoken sentence to be parsed is obtained, its keywords are extracted. Detection can be done with the word table, by building a hash index for every word in the word table and, for the given sentence, traversing all possible words in it and checking whether each exists in the hash table, recording its position if it does; detection can also be done with a statistical model such as a conditional random field trained for this purpose. The keywords are then replaced with their corresponding labels, consistently with the replacement performed in the offline phase, and the labels and the characters that were not replaced are looked up in the index.
  • In this embodiment, each character or label looked up in the hashed inverted index yields the addresses (IDs) of the semantic sentence patterns in which it appears, and the number of characters or labels matched between each semantic sentence pattern and the sentence being retrieved is recorded.
  • The retrieval results are sorted according to the similarity scores, and the patterns with the highest scores are taken as the candidate semantic sentence patterns.
  • In a preferred embodiment, step S24 uses a score formula to obtain the score of the similarity between a candidate semantic sentence pattern and the substituted spoken sentence;
  • the score formula is:
  • S = (S1 + S2) / 2,
  • where S represents the score of the similarity between the candidate semantic sentence pattern and the substituted spoken sentence, S1 represents the proportion of the substituted spoken sentence covered by the characters and/or labels of the candidate semantic sentence pattern, and S2 represents the proportion of the candidate semantic sentence pattern covered by those characters and/or labels.
  • As shown in FIG. 4, the specific process of step S3 is: S31, building a finite state automaton network for each candidate semantic sentence pattern; S32, scoring the spoken sentence to be parsed against each finite state automaton network; S33, comparing the scores and taking the highest-scoring result as the parsing result of the spoken sentence to be parsed.
  • In this embodiment, a finite state automaton network may be built for each candidate semantic sentence pattern, in which each character or label acts as an arc of the automaton.
  • FIG. 6 shows a schematic diagram of the finite state automaton network corresponding to one sentence pattern. The spoken sentence to be parsed is parsed and scored against the finite state automaton networks as follows: according to the keyword detection results, the keywords in the sentence are replaced with their corresponding labels; assuming there are n keyword detection results, there are 2^n possible label combinations. After removing combinations whose labels conflict in position, the candidate label-substituted sentences are obtained. Each substituted spoken sentence is then fuzzily matched against the finite state automaton network generated from each sentence pattern; several matching methods exist, such as the one in "Error-tolerant Finite-state Recognition with Applications to Morphological Analysis and Spelling Correction", which is prior art and therefore not described in detail here, and which can quickly compute the degree of matching between two sentences using a dynamic programming algorithm. The optimal sentence pattern and its corresponding parsing result are obtained according to the scores.
  • Further, the parsing and scoring process allows insertion and/or deletion and/or substitution operations between the spoken sentence to be parsed and the spoken semantic sentence pattern, and the number of such operations is limited by a preset threshold: when the number is below the threshold, the sentence to be parsed is considered to conform to the corresponding semantic sentence pattern, and otherwise it is not.
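To make the 2^n combination step concrete, the sketch below enumerates all subsets of the n keyword detections and keeps only those whose spans do not overlap, producing the candidate label-substituted sentences. It repeats the small, hypothetical substitution helper from the earlier sketch so the snippet runs on its own; the example detections are invented for illustration.

```python
from itertools import combinations

def substitute_keywords(sentence, detections):
    """Replace the given (start, end, label) spans with their labels."""
    out, pos = [], 0
    for start, end, label in sorted(detections):
        out.append(sentence[pos:start])
        out.append(label)
        pos = end
    out.append(sentence[pos:])
    return "".join(out)

def candidate_substitutions(sentence, detections):
    """Enumerate the 2^n subsets of keyword detections, drop subsets whose
    spans overlap (position conflicts), and return every resulting
    label-substituted sentence, including the original sentence itself."""
    results = []
    for k in range(len(detections) + 1):
        for subset in combinations(detections, k):
            spans = sorted((s, e) for s, e, _ in subset)
            if any(spans[i][1] > spans[i + 1][0] for i in range(len(spans) - 1)):
                continue  # overlapping labels: position conflict, discard
            results.append(substitute_keywords(sentence, list(subset)))
    return results


detections = [(4, 6, "$name"), (5, 6, "$digit")]  # "三" hypothetically also matches a digit keyword
print(candidate_substitutions("打电话给张三", detections))
# ['打电话给张三', '打电话给$name', '打电话给张$digit']
```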

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Automation & Control Theory (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A spoken language semantic analysis system and method. The spoken semantic analysis system is used to parse the spoken language semantics of a preset domain and comprises: a storage unit (1) for storing semantic sentence patterns of the preset domain, each semantic sentence pattern corresponding to an address, a semantic sentence pattern consisting of characters and keywords, each keyword corresponding to a label, a word table being preset in the storage unit (1) to store the address of the semantic sentence pattern in which each character appears and/or the address of the semantic sentence pattern in which each label appears; an obtaining unit (2) for obtaining the spoken sentence to be parsed; an index unit (3), connected to the storage unit (1) and the obtaining unit (2) respectively, for retrieving the semantic sentence patterns in the storage unit (1) according to the spoken sentence to be parsed and obtaining the candidate semantic sentence patterns matching the spoken sentence to be parsed together with the corresponding candidate order; and a parsing unit (4), connected to the index unit (3), for parsing the spoken sentence to be parsed according to the sorted candidate semantic sentence patterns using a fuzzy matching algorithm and obtaining the parsing result.

Description

Spoken language semantic analysis system and method
Technical field
The invention relates to the field of spoken natural language understanding, and in particular to a highly robust spoken language semantic analysis system and method.
Background
Spoken speech recognition involves multiple disciplines such as phonetics, linguistics, mathematical signal processing, and pattern recognition. With the popularity of smart devices, how people and smart devices can interact more directly and naturally has become an important issue. Because spoken natural language is naturally friendly and convenient for users, human-computer interaction based on spoken natural language has become a trend and is receiving increasing attention from industry. The key technology of spoken natural language interaction is spoken language understanding, that is, parsing the user's spoken sentence to obtain the intention the user wants to express and the corresponding keywords. In general, spoken language understanding is implemented by manually collecting or writing semantic sentence patterns and then matching the sentence to be parsed against these patterns to obtain a parsing result. Most existing spoken semantic parsing methods are based on matching against some grammar, such as a regular grammar or a context-free grammar, which requires the spoken sentence to be parsed to agree exactly with a semantic sentence pattern before parsing can succeed. This forces the builders of the language understanding system to spend a great deal of time collecting sentence patterns; recognition errors in front-end modules such as speech recognition cause semantic parsing to fail; and because the sentence to be parsed has to be matched against a large number of semantic sentence patterns, parsing is slow and inefficient.
Summary of the invention
In view of the above problems of existing spoken semantic parsing methods, a spoken semantic analysis system and method are provided that aim to find, quickly and accurately in a large-scale library of semantic sentence patterns, sentences similar to the spoken sentence to be parsed and to give an accurate result.
The specific technical solution is as follows:
A spoken semantic analysis system for parsing the spoken language semantics of a preset domain comprises:
a storage unit for storing semantic sentence patterns of the preset domain, each semantic sentence pattern corresponding to an address, a semantic sentence pattern consisting of characters and keywords, each keyword corresponding to a label, a word table being preset in the storage unit to store the address of the semantic sentence pattern in which each character appears and/or the address of the semantic sentence pattern in which each label appears;
an obtaining unit for obtaining the spoken sentence to be parsed;
an index unit, connected to the storage unit and the obtaining unit respectively, for retrieving the semantic sentence patterns in the storage unit according to the spoken sentence to be parsed and obtaining the candidate semantic sentence patterns matching the spoken sentence to be parsed together with the corresponding candidate order;
a parsing unit, connected to the index unit, for parsing the spoken sentence to be parsed according to the sorted candidate semantic sentence patterns using a fuzzy matching algorithm and obtaining the parsing result.
Preferably, the index unit comprises:
an extraction module for extracting the keywords in the spoken sentence to be parsed that also appear in the storage unit and obtaining the labels corresponding to those keywords;
a replacement module, connected to the extraction module, for replacing the keywords in the spoken sentence to be parsed with their corresponding labels to form a substituted spoken sentence;
an indexing module, connected to the replacement module, for searching the word table in the storage unit according to the characters and labels in the substituted spoken sentence and obtaining the addresses of the semantic sentence patterns matching the characters and/or the addresses of the semantic sentence patterns matching the labels;
a sorting module, connected to the indexing module, for sorting the semantic sentence patterns matching the characters and/or labels in the substituted spoken sentence by comparing their similarity to the substituted spoken sentence, and obtaining the sorted candidate semantic sentence patterns.
Preferably, the sorting module uses a score formula to obtain the score of the similarity between a candidate semantic sentence pattern and the substituted spoken sentence;
the score formula is:
S = (S1 + S2) / 2,
where S represents the score of the similarity between the candidate semantic sentence pattern and the substituted spoken sentence, S1 represents the proportion of the substituted spoken sentence covered by the characters and/or labels of the candidate semantic sentence pattern, and S2 represents the proportion of the candidate semantic sentence pattern covered by those characters and/or labels.
Preferably, the specific process by which the parsing unit parses the spoken sentence to be parsed using a fuzzy matching algorithm according to the sorted candidate semantic sentence patterns is:
building a finite state automaton network for each candidate semantic sentence pattern, scoring the spoken sentence to be parsed against each finite state automaton network, comparing the scores, and taking the highest-scoring result as the parsing result of the spoken sentence to be parsed.
Preferably, the word table is represented by a hash table.
A spoken semantic analysis method, applied to the spoken semantic analysis system, comprises the following steps:
S1. obtaining the spoken sentence to be parsed;
S2. retrieving the semantic sentence patterns in the storage unit according to the spoken sentence to be parsed, and obtaining the candidate semantic sentence patterns matching the spoken sentence to be parsed together with the corresponding candidate order;
S3. parsing the spoken sentence to be parsed according to the sorted candidate semantic sentence patterns using a fuzzy matching algorithm, and obtaining the parsing result.
Preferably, the specific process of step S2 is:
S21. extracting the keywords in the spoken sentence to be parsed that also appear in the storage unit, and obtaining the labels corresponding to those keywords;
S22. replacing the keywords in the spoken sentence to be parsed with their corresponding labels to form a substituted spoken sentence;
S23. searching the word table in the storage unit according to the characters and labels in the substituted spoken sentence, and obtaining the addresses of the semantic sentence patterns matching the characters and/or the addresses of the semantic sentence patterns matching the labels;
S24. sorting the semantic sentence patterns matching the characters and/or labels in the substituted spoken sentence by comparing their similarity to the substituted spoken sentence, and obtaining the sorted candidate semantic sentence patterns.
Preferably, step S24 uses a score formula to obtain the score of the similarity between a candidate semantic sentence pattern and the substituted spoken sentence;
the score formula is:
S = (S1 + S2) / 2,
where S represents the score of the similarity between the candidate semantic sentence pattern and the substituted spoken sentence, S1 represents the proportion of the substituted spoken sentence covered by the characters and/or labels of the candidate semantic sentence pattern, and S2 represents the proportion of the candidate semantic sentence pattern covered by those characters and/or labels.
Preferably, the specific process of step S3 is:
S31. building a finite state automaton network for each candidate semantic sentence pattern;
S32. scoring the spoken sentence to be parsed against each finite state automaton network;
S33. comparing the scores of the spoken sentence to be parsed, and taking the highest-scoring result as the parsing result of the spoken sentence to be parsed.
Preferably, the word table is represented by a hash table.
The beneficial effects of the above technical solution are:
In this technical solution, the index unit of the spoken semantic analysis system can quickly retrieve the sentence patterns related to the spoken sentence to be parsed, which improves matching efficiency; the fuzzy matching algorithm, when parsing the spoken sentence to be parsed, allows inconsistencies between the spoken sentence and the candidate semantic sentence patterns, giving a degree of fault tolerance and thereby improving the robustness of the system. In the spoken semantic analysis method, the sentence patterns related to the spoken sentence to be parsed can likewise be retrieved quickly to improve matching efficiency, so that a sentence pattern similar to the spoken sentence to be parsed can be found quickly and accurately in a large-scale library of semantic sentence patterns and an accurate result can be output.
Brief description of the drawings
FIG. 1 is a block diagram of an embodiment of the spoken semantic analysis system of the present invention;
FIG. 2 is a flowchart of an embodiment of the spoken semantic analysis method of the present invention;
FIG. 3 is a flowchart of the method for retrieving the semantic sentence patterns in the storage unit according to the present invention;
FIG. 4 is a flowchart of the method for parsing the spoken sentence to be parsed according to the present invention;
FIG. 5 is a schematic diagram of the sentence-pattern inverted index of the present invention;
FIG. 6 is a schematic diagram of the finite state automaton corresponding to a sentence pattern of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
It should be noted that, in the absence of conflict, the embodiments of the present invention and the features in the embodiments may be combined with one another.
The present invention is further described below with reference to the drawings and specific embodiments, which are not intended to limit the present invention.
As shown in FIG. 1, a spoken semantic analysis system for parsing the spoken language semantics of a preset domain comprises:
a storage unit 1 for storing semantic sentence patterns of the preset domain, each semantic sentence pattern corresponding to an address, a semantic sentence pattern consisting of characters and keywords, each keyword corresponding to a label, a word table being preset in the storage unit 1 to store the address of the semantic sentence pattern in which each character appears and/or the address of the semantic sentence pattern in which each label appears;
an obtaining unit 2 for obtaining the spoken sentence to be parsed;
an index unit 3, connected to the storage unit 1 and the obtaining unit 2 respectively, for retrieving the semantic sentence patterns in the storage unit 1 according to the spoken sentence to be parsed and obtaining the candidate semantic sentence patterns matching the spoken sentence to be parsed together with the corresponding candidate order;
a parsing unit 4, connected to the index unit 3, for parsing the spoken sentence to be parsed according to the sorted candidate semantic sentence patterns using a fuzzy matching algorithm and obtaining the parsing result.
In this embodiment, the index unit 3 can quickly retrieve the sentence patterns related to the spoken sentence to be parsed, which improves matching efficiency; the fuzzy matching algorithm, when parsing the spoken sentence to be parsed, allows inconsistencies between the spoken sentence and the candidate semantic sentence patterns, so the builders of the spoken semantic analysis system do not need to write large numbers of nearly identical sentence patterns, and errors from the speech recognition front end are tolerated to some degree, thereby improving the robustness of the system.
In a preferred embodiment, the index unit 3 comprises:
an extraction module 31 for extracting the keywords in the spoken sentence to be parsed that also appear in the storage unit 1 and obtaining the labels corresponding to those keywords;
a replacement module 32, connected to the extraction module 31, for replacing the keywords in the spoken sentence to be parsed with their corresponding labels to form a substituted spoken sentence;
an indexing module 34, connected to the replacement module 32, for searching the word table in the storage unit 1 according to the characters and labels in the substituted spoken sentence and obtaining the addresses of the semantic sentence patterns matching the characters and/or the addresses of the semantic sentence patterns matching the labels;
a sorting module 33, connected to the indexing module 34, for sorting the semantic sentence patterns matching the characters and/or labels in the substituted spoken sentence by comparing their similarity to the substituted spoken sentence, and obtaining the sorted candidate semantic sentence patterns.
In this embodiment, the index unit 3 is used to quickly retrieve, given a spoken sentence to be parsed, the candidate semantic sentence patterns similar to it according to the index.
Specifically, after the spoken sentence to be parsed is obtained, its keywords are extracted. Detection can be done with the word table: all possible words in the sentence are traversed and the word table is checked for each word or character; if it is present, its position in the sentence is recorded. Detection can also be done with a statistical model; for example a conditional random field (CRF) model can be trained and used for detection. The keywords in the sentence are then replaced with their corresponding labels, and the labels and the characters that were not replaced are looked up in the index. In this embodiment, each character or label retrieved from the word table yields the addresses (IDs) of the semantic sentence patterns in which it appears, and the number of characters or labels matched between each semantic sentence pattern and the sentence being retrieved can be recorded. The retrieval results are sorted according to the similarity scores, and the patterns with the highest scores are taken as the candidate semantic sentence patterns.
In a preferred embodiment, the sorting module 33 uses a score formula to obtain the score of the similarity between a candidate semantic sentence pattern and the substituted spoken sentence;
the score formula is:
S = (S1 + S2) / 2,
where S represents the score of the similarity between the candidate semantic sentence pattern and the substituted spoken sentence, S1 represents the proportion of the substituted spoken sentence covered by the characters and/or labels of the candidate semantic sentence pattern, and S2 represents the proportion of the candidate semantic sentence pattern covered by those characters and/or labels.
In a preferred embodiment, the specific process by which the parsing unit 4 parses the spoken sentence to be parsed using a fuzzy matching algorithm according to the sorted candidate semantic sentence patterns is:
building a finite state automaton network for each candidate semantic sentence pattern, scoring the spoken sentence to be parsed against each finite state automaton network, comparing the scores, and taking the highest-scoring result as the parsing result of the spoken sentence to be parsed.
In this embodiment, the parsing unit 4 may build a finite state automaton network for each candidate semantic sentence pattern, in which each character or label acts as an arc of the automaton. FIG. 6 shows a schematic diagram of the finite state automaton network corresponding to one sentence pattern. The spoken sentence to be parsed is parsed and scored against the finite state automaton networks as follows: according to the keyword detection results, the keywords in the sentence are replaced with their corresponding labels; assuming there are n keyword detection results, there are 2^n possible label combinations. After removing combinations whose labels conflict in position, the candidate label-substituted sentences are obtained. Each substituted spoken sentence is then fuzzily matched against the finite state automaton network generated from each sentence pattern. Several matching methods exist, such as the one in "Error-tolerant Finite-state Recognition with Applications to Morphological Analysis and Spelling Correction"; since that matching method is prior art it is not described in detail here. It can quickly compute the degree of matching between two sentences using a dynamic programming algorithm, and the optimal sentence pattern and its corresponding parsing result are obtained according to the scores.
Further, the parsing and scoring process allows insertion and/or deletion and/or substitution operations between the spoken sentence to be parsed and the spoken semantic sentence pattern, and the number of such operations is limited by a preset threshold: when the number is below the preset threshold, the sentence to be parsed conforms to the corresponding semantic sentence pattern, and otherwise it does not.
In a preferred embodiment, the word table is represented by a hash table.
As shown in FIG. 2, a spoken semantic analysis method, applied to the spoken semantic analysis system, comprises the following steps:
S1. obtaining the spoken sentence to be parsed;
S2. retrieving the semantic sentence patterns in the storage unit 1 according to the spoken sentence to be parsed, and obtaining the candidate semantic sentence patterns matching the spoken sentence to be parsed together with the corresponding candidate order;
S3. parsing the spoken sentence to be parsed according to the sorted candidate semantic sentence patterns using a fuzzy matching algorithm, and obtaining the parsing result.
In this embodiment, the spoken semantic analysis method can quickly retrieve the sentence patterns related to the spoken sentence to be parsed, which improves matching efficiency, so that a sentence pattern similar to the spoken sentence to be parsed can be found quickly and accurately in a large-scale library of semantic sentence patterns and an accurate result can be output.
As shown in FIG. 3, in a preferred embodiment, the specific process of step S2 is:
S21. extracting the keywords in the spoken sentence to be parsed that also appear in the storage unit 1, and obtaining the labels corresponding to those keywords;
S22. replacing the keywords in the spoken sentence to be parsed with their corresponding labels to form a substituted spoken sentence;
S23. searching the word table in the storage unit 1 according to the characters and labels in the substituted spoken sentence, and obtaining the addresses of the semantic sentence patterns matching the characters and/or the addresses of the semantic sentence patterns matching the labels;
S24. sorting the semantic sentence patterns matching the characters and/or labels in the substituted spoken sentence by comparing their similarity to the substituted spoken sentence, and obtaining the sorted candidate semantic sentence patterns.
In this embodiment, the spoken semantic analysis method may include an offline phase and an online phase. The offline phase includes: collecting and organizing the semantic sentence patterns of the corresponding domain according to the defined domain requirements. A semantic sentence pattern conforms to spoken language usage, and the keywords that the pattern needs to parse are represented by labels. For example, a possible sentence in the phone-call domain is "打电话给张三" ("call Zhang San"); since "张三" (Zhang San) is a name keyword to be parsed, keywords to be parsed are replaced with labels, e.g. "张三" is replaced by "$name", so the sentence is rewritten into the pattern "打电话给$name" ("call $name"). An index is then built for the semantic sentence patterns of each domain: the characters and the labels in the semantic sentence patterns are indexed together, with each label indexed as a single character. This embodiment uses a hashed inverted index, shown schematically in FIG. 5. The hash table stores all the characters and labels that appear in any semantic sentence pattern; each character or label is followed by a list in which each element stores the address (ID number) of a sentence pattern in which that character or label appears.
The online phase includes: given a spoken sentence to be parsed, the candidate semantic sentence patterns similar to it are quickly retrieved according to the index. The specific steps are as follows:
After the spoken sentence to be parsed is obtained, its keywords are extracted. Detection can be done with the word table: a hash index is built for every word in the word table, and for the given spoken sentence all possible words in it are traversed and looked up in the hash table; if a word is present, its position in the sentence is recorded. Detection can also be done with a statistical model, for example a conditional random field model trained for this purpose. The keywords in the sentence are then replaced with their corresponding labels, consistently with the replacement performed in the offline phase, and the labels and the characters that were not replaced are looked up in the index. In this embodiment, each character or label looked up in the hashed inverted index yields the addresses (IDs) of the semantic sentence patterns in which it appears, and the number of characters or labels matched between each semantic sentence pattern and the sentence being retrieved is recorded. The retrieval results are sorted according to the similarity scores, and the patterns with the highest scores are taken as the candidate semantic sentence patterns.
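As an illustration of the word-table-based keyword detection just described, the following sketch scans every substring of the sentence up to a maximum keyword length and records the position and label of each hit; the keyword table, the length bound, and the function name are assumptions made for the example.

```python
def detect_keywords(sentence, keyword_table, max_len=6):
    """Traverse all possible words (substrings up to max_len characters) of the
    spoken sentence and look each one up in the keyword table.

    keyword_table: dict keyword -> label, e.g. {"张三": "$name"}.
    Returns a list of (start, end, label) detections, as consumed by the
    label substitution step.
    """
    detections = []
    for start in range(len(sentence)):
        for end in range(start + 1, min(start + max_len, len(sentence)) + 1):
            word = sentence[start:end]
            if word in keyword_table:
                detections.append((start, end, keyword_table[word]))
    return detections


keyword_table = {"张三": "$name", "李四": "$name"}
print(detect_keywords("打电话给张三", keyword_table))  # [(4, 6, '$name')]
```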
In a preferred embodiment, step S24 uses a score formula to obtain the score of the similarity between a candidate semantic sentence pattern and the substituted spoken sentence;
the score formula is:
S = (S1 + S2) / 2,
where S represents the score of the similarity between the candidate semantic sentence pattern and the substituted spoken sentence, S1 represents the proportion of the substituted spoken sentence covered by the characters and/or labels of the candidate semantic sentence pattern, and S2 represents the proportion of the candidate semantic sentence pattern covered by those characters and/or labels.
As shown in FIG. 4, in a preferred embodiment, the specific process of step S3 is:
S31. building a finite state automaton network for each candidate semantic sentence pattern;
S32. scoring the spoken sentence to be parsed against each finite state automaton network;
S33. comparing the scores of the spoken sentence to be parsed, and taking the highest-scoring result as the parsing result of the spoken sentence to be parsed.
In this embodiment, a finite state automaton network may be built for each candidate semantic sentence pattern, in which each character or label acts as an arc of the automaton. FIG. 6 shows a schematic diagram of the finite state automaton network corresponding to one sentence pattern. The spoken sentence to be parsed is parsed and scored against the finite state automaton networks as follows: according to the keyword detection results, the keywords in the sentence are replaced with their corresponding labels; assuming there are n keyword detection results, there are 2^n possible label combinations. After removing combinations whose labels conflict in position, the candidate label-substituted sentences are obtained. Each substituted spoken sentence is then fuzzily matched against the finite state automaton network generated from each sentence pattern. Several matching methods exist, such as the one in "Error-tolerant Finite-state Recognition with Applications to Morphological Analysis and Spelling Correction"; since that matching method is prior art it is not described in detail here. It can quickly compute the degree of matching between two sentences using a dynamic programming algorithm, and the optimal sentence pattern and its corresponding parsing result are obtained according to the scores.
Further, the parsing and scoring process allows insertion and/or deletion and/or substitution operations between the spoken sentence to be parsed and the spoken semantic sentence pattern, and the number of such operations is limited by a preset threshold: when the number is below the preset threshold, the sentence to be parsed conforms to the corresponding semantic sentence pattern, and otherwise it does not.
The above description is only a preferred embodiment of the present invention and does not limit its implementation or scope of protection. Those skilled in the art should appreciate that all solutions obtained by equivalent substitution or obvious variation based on the specification and drawings of the present invention fall within the scope of protection of the present invention.

Claims (10)

  1. A spoken semantic analysis system for parsing the spoken language semantics of a preset domain, characterized in that it comprises:
     a storage unit for storing semantic sentence patterns of the preset domain, each semantic sentence pattern corresponding to an address, a semantic sentence pattern consisting of characters and keywords, each keyword corresponding to a label, a word table being preset in the storage unit to store the address of the semantic sentence pattern in which each character appears and/or the address of the semantic sentence pattern in which each label appears;
     an obtaining unit for obtaining the spoken sentence to be parsed;
     an index unit, connected to the storage unit and the obtaining unit respectively, for retrieving the semantic sentence patterns in the storage unit according to the spoken sentence to be parsed and obtaining the candidate semantic sentence patterns matching the spoken sentence to be parsed together with the corresponding candidate order;
     a parsing unit, connected to the index unit, for parsing the spoken sentence to be parsed according to the sorted candidate semantic sentence patterns using a fuzzy matching algorithm and obtaining the parsing result.
  2. The spoken semantic analysis system according to claim 1, characterized in that the index unit comprises:
     an extraction module for extracting the keywords in the spoken sentence to be parsed that also appear in the storage unit and obtaining the labels corresponding to those keywords;
     a replacement module, connected to the extraction module, for replacing the keywords in the spoken sentence to be parsed with their corresponding labels to form a substituted spoken sentence;
     an indexing module, connected to the replacement module, for searching the word table in the storage unit according to the characters and labels in the substituted spoken sentence and obtaining the addresses of the semantic sentence patterns matching the characters and/or the addresses of the semantic sentence patterns matching the labels;
     a sorting module, connected to the indexing module, for sorting the semantic sentence patterns matching the characters and/or labels in the substituted spoken sentence by comparing their similarity to the substituted spoken sentence, and obtaining the sorted candidate semantic sentence patterns.
  3. The spoken semantic analysis system according to claim 2, characterized in that the sorting module uses a score formula to obtain the score of the similarity between a candidate semantic sentence pattern and the substituted spoken sentence;
     the score formula is:
     S = (S1 + S2) / 2,
     where S represents the score of the similarity between the candidate semantic sentence pattern and the substituted spoken sentence, S1 represents the proportion of the substituted spoken sentence covered by the characters and/or labels of the candidate semantic sentence pattern, and S2 represents the proportion of the candidate semantic sentence pattern covered by those characters and/or labels.
  4. The spoken semantic analysis system according to claim 1, characterized in that the specific process by which the parsing unit parses the spoken sentence to be parsed using a fuzzy matching algorithm according to the sorted candidate semantic sentence patterns is:
     building a finite state automaton network for each candidate semantic sentence pattern, scoring the spoken sentence to be parsed against each finite state automaton network, comparing the scores, and taking the highest-scoring result as the parsing result of the spoken sentence to be parsed.
  5. The spoken semantic analysis system according to claim 1, characterized in that the word table is represented by a hash table.
  6. A spoken semantic analysis method, applied to the spoken semantic analysis system according to claim 1, characterized in that it comprises the following steps:
     S1. obtaining the spoken sentence to be parsed;
     S2. retrieving the semantic sentence patterns in the storage unit according to the spoken sentence to be parsed, and obtaining the candidate semantic sentence patterns matching the spoken sentence to be parsed together with the corresponding candidate order;
     S3. parsing the spoken sentence to be parsed according to the sorted candidate semantic sentence patterns using a fuzzy matching algorithm, and obtaining the parsing result.
  7. The spoken semantic analysis method according to claim 6, characterized in that the specific process of step S2 is:
     S21. extracting the keywords in the spoken sentence to be parsed that also appear in the storage unit, and obtaining the labels corresponding to those keywords;
     S22. replacing the keywords in the spoken sentence to be parsed with their corresponding labels to form a substituted spoken sentence;
     S23. searching the word table in the storage unit according to the characters and labels in the substituted spoken sentence, and obtaining the addresses of the semantic sentence patterns matching the characters and/or the addresses of the semantic sentence patterns matching the labels;
     S24. sorting the semantic sentence patterns matching the characters and/or labels in the substituted spoken sentence by comparing their similarity to the substituted spoken sentence, and obtaining the sorted candidate semantic sentence patterns.
  8. The spoken semantic analysis method according to claim 7, characterized in that step S24 uses a score formula to obtain the score of the similarity between a candidate semantic sentence pattern and the substituted spoken sentence;
     the score formula is:
     S = (S1 + S2) / 2,
     where S represents the score of the similarity between the candidate semantic sentence pattern and the substituted spoken sentence, S1 represents the proportion of the substituted spoken sentence covered by the characters and/or labels of the candidate semantic sentence pattern, and S2 represents the proportion of the candidate semantic sentence pattern covered by those characters and/or labels.
  9. The spoken semantic analysis method according to claim 6, characterized in that the specific process of step S3 is:
     S31. building a finite state automaton network for each candidate semantic sentence pattern;
     S32. scoring the spoken sentence to be parsed against each finite state automaton network;
     S33. comparing the scores of the spoken sentence to be parsed, and taking the highest-scoring result as the parsing result of the spoken sentence to be parsed.
  10. The spoken semantic analysis method according to claim 7, characterized in that the word table is represented by a hash table.
PCT/CN2016/085763 2015-06-30 2016-06-14 Spoken language semantic analysis system and method WO2017000777A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US15/739,351 US20180190270A1 (en) 2015-06-30 2016-06-14 System and method for semantic analysis of speech
EP16817141.1A EP3318978A4 (en) 2015-06-30 2016-06-14 SYSTEM AND METHOD FOR THE SEMANTIC ANALYSIS OF LANGUAGE
JP2017567752A JP6596517B2 (ja) 2015-06-30 2016-06-14 口語語義解析システム及び方法

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510385309.1A CN106326303B (zh) 2015-06-30 2015-06-30 Spoken language semantic analysis system and method
CN201510385309.1 2015-06-30

Publications (1)

Publication Number Publication Date
WO2017000777A1 true WO2017000777A1 (zh) 2017-01-05

Family

ID=57607842

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/085763 WO2017000777A1 (zh) 2015-06-30 2016-06-14 一种口语语义解析系统及方法

Country Status (7)

Country Link
US (1) US20180190270A1 (zh)
EP (1) EP3318978A4 (zh)
JP (1) JP6596517B2 (zh)
CN (1) CN106326303B (zh)
HK (1) HK1231591A1 (zh)
TW (1) TWI601129B (zh)
WO (1) WO2017000777A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435182A (zh) * 2021-07-21 2021-09-24 唯品会(广州)软件有限公司 自然语言处理中分类标注的冲突检测方法、装置和设备

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106782560B (zh) * 2017-03-06 2020-06-16 海信集团有限公司 确定目标识别文本的方法及装置
CN109716326A (zh) * 2017-06-21 2019-05-03 微软技术许可有限责任公司 在自动聊天中提供个性化歌曲
DE102017211120A1 (de) * 2017-06-30 2019-01-03 Siemens Aktiengesellschaft Verfahren zur Erzeugung eines Abbildes eines Streckennetzes, Verwendung des Verfahrens, Computerprogramm und computerlesbares Speichermedium
CN108091321B (zh) * 2017-11-06 2021-07-16 芋头科技(杭州)有限公司 一种语音合成方法
CN109947264B (zh) * 2017-12-21 2023-03-14 北京搜狗科技发展有限公司 一种信息展现方法、装置及电子设备
US10861463B2 (en) * 2018-01-09 2020-12-08 Sennheiser Electronic Gmbh & Co. Kg Method for speech processing and speech processing device
CN108021559B (zh) * 2018-02-05 2022-05-03 威盛电子股份有限公司 自然语言理解系统以及语意分析方法
CN109065020B (zh) * 2018-07-28 2020-11-20 重庆柚瓣家科技有限公司 多语言类别的识别库匹配方法及系统
CN109783821B (zh) * 2019-01-18 2023-06-27 广东小天才科技有限公司 一种特定内容的视频的搜索方法及系统
CN109949799B (zh) * 2019-03-12 2021-02-19 广东小天才科技有限公司 一种语义解析方法及系统
CN110232921A (zh) * 2019-06-21 2019-09-13 深圳市酷开网络科技有限公司 基于生活服务的语音操作方法、装置、智能电视及系统
CN110378704B (zh) * 2019-07-23 2021-10-22 珠海格力电器股份有限公司 基于模糊识别的意见反馈的方法、存储介质和终端设备
CN111090411A (zh) * 2019-12-10 2020-05-01 重庆锐云科技有限公司 一种基于用户语音输入的共享产品智能推荐系统及方法
CN113569565B (zh) * 2020-04-29 2023-04-11 抖音视界有限公司 一种语义理解方法、装置、设备和存储介质
CN111680129B (zh) * 2020-06-16 2022-07-12 思必驰科技股份有限公司 语义理解系统的训练方法及系统
CN112489643B (zh) * 2020-10-27 2024-07-12 广东美的白色家电技术创新中心有限公司 转换方法、转换表的生成方法、装置及计算机存储介质
CN114238667B (zh) * 2021-11-04 2024-04-02 北京建筑大学 一种地址管理的方法、装置、电子设备和存储介质

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08249334A (ja) * 1995-03-10 1996-09-27 Csk Corp 自然言語の意味解析処理装置
CN1949211A (zh) * 2005-10-13 2007-04-18 中国科学院自动化研究所 一种新的汉语口语解析方法及装置
US20100235164A1 (en) * 2009-03-13 2010-09-16 Invention Machine Corporation Question-answering system and method based on semantic labeling of text documents and user questions
CN102681982A (zh) * 2012-03-15 2012-09-19 上海云叟网络科技有限公司 可让计算机理解的自然语言句子的自动语义识别的方法
CN102968409A (zh) * 2012-11-23 2013-03-13 海信集团有限公司 智能人机交互语义分析方法及交互系统
CN103268313A (zh) * 2013-05-21 2013-08-28 北京云知声信息技术有限公司 一种自然语言的语义解析方法及装置
CN103309846A (zh) * 2013-06-26 2013-09-18 北京云知声信息技术有限公司 一种自然语言信息的处理方法及装置
CN104360994A (zh) * 2014-12-04 2015-02-18 科大讯飞股份有限公司 自然语言理解方法及系统

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI224771B (en) * 2003-04-10 2004-12-01 Delta Electronics Inc Speech recognition device and method using di-phone model to realize the mixed-multi-lingual global phoneme
JP3766406B2 (ja) * 2003-07-24 2006-04-12 株式会社東芝 機械翻訳装置
US8165877B2 (en) * 2007-08-03 2012-04-24 Microsoft Corporation Confidence measure generation for speech related searching
GB2458461A (en) * 2008-03-17 2009-09-23 Kai Yu Spoken language learning system
KR101253104B1 (ko) * 2009-09-01 2013-04-10 한국전자통신연구원 패턴 데이터베이스화 장치 및 그 방법, 이를 이용한 음성 이해 장치 및 그 방법
TWI441163B (zh) * 2011-05-10 2014-06-11 Univ Nat Chiao Tung 中文語音辨識裝置及其辨識方法
US10019994B2 (en) * 2012-06-08 2018-07-10 Apple Inc. Systems and methods for recognizing textual identifiers within a plurality of words
CN103631772A (zh) * 2012-08-29 2014-03-12 阿里巴巴集团控股有限公司 机器翻译方法及装置
US9646604B2 (en) * 2012-09-15 2017-05-09 Avaya Inc. System and method for dynamic ASR based on social media
CN103020230A (zh) * 2012-12-14 2013-04-03 中国科学院声学研究所 一种语义模糊匹配方法
US9123335B2 (en) * 2013-02-20 2015-09-01 Jinni Media Limited System apparatus circuit method and associated computer executable code for natural language understanding and semantic content discovery
US9432325B2 (en) * 2013-04-08 2016-08-30 Avaya Inc. Automatic negative question handling
US9318113B2 (en) * 2013-07-01 2016-04-19 Timestream Llc Method and apparatus for conducting synthesized, semi-scripted, improvisational conversations
US20150106091A1 (en) * 2013-10-14 2015-04-16 Spence Wetjen Conference transcription system and method
CN103578471B (zh) * 2013-10-18 2017-03-01 威盛电子股份有限公司 语音辨识方法及其电子装置
US9984067B2 (en) * 2014-04-18 2018-05-29 Thomas A. Visel Automated comprehension of natural language via constraint-based processing
US10073673B2 (en) * 2014-07-14 2018-09-11 Samsung Electronics Co., Ltd. Method and system for robust tagging of named entities in the presence of source or translation errors

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08249334A (ja) * 1995-03-10 1996-09-27 Csk Corp 自然言語の意味解析処理装置
CN1949211A (zh) * 2005-10-13 2007-04-18 中国科学院自动化研究所 一种新的汉语口语解析方法及装置
US20100235164A1 (en) * 2009-03-13 2010-09-16 Invention Machine Corporation Question-answering system and method based on semantic labeling of text documents and user questions
CN102681982A (zh) * 2012-03-15 2012-09-19 上海云叟网络科技有限公司 可让计算机理解的自然语言句子的自动语义识别的方法
CN102968409A (zh) * 2012-11-23 2013-03-13 海信集团有限公司 智能人机交互语义分析方法及交互系统
CN103268313A (zh) * 2013-05-21 2013-08-28 北京云知声信息技术有限公司 一种自然语言的语义解析方法及装置
CN103309846A (zh) * 2013-06-26 2013-09-18 北京云知声信息技术有限公司 一种自然语言信息的处理方法及装置
CN104360994A (zh) * 2014-12-04 2015-02-18 科大讯飞股份有限公司 自然语言理解方法及系统

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3318978A4 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435182A (zh) * 2021-07-21 2021-09-24 唯品会(广州)软件有限公司 自然语言处理中分类标注的冲突检测方法、装置和设备

Also Published As

Publication number Publication date
TWI601129B (zh) 2017-10-01
US20180190270A1 (en) 2018-07-05
CN106326303A (zh) 2017-01-11
EP3318978A1 (en) 2018-05-09
HK1231591A1 (zh) 2017-12-22
CN106326303B (zh) 2019-09-13
TW201701269A (zh) 2017-01-01
JP6596517B2 (ja) 2019-10-23
JP2018524725A (ja) 2018-08-30
EP3318978A4 (en) 2019-02-20

Similar Documents

Publication Publication Date Title
TWI601129B (zh) 一種口語語義解析系統及方法
KR102417045B1 (ko) 명칭을 강인하게 태깅하는 방법 및 시스템
CN112069298B (zh) 基于语义网和意图识别的人机交互方法、设备及介质
US20190102373A1 (en) Model-based automatic correction of typographical errors
US8606559B2 (en) Method and apparatus for detecting errors in machine translation using parallel corpus
CN112035730B (zh) 一种语义检索方法、装置及电子设备
Pettersson et al. A multilingual evaluation of three spelling normalisation methods for historical text
WO2017181834A1 (zh) 一种智能问答方法及装置
WO2014209810A2 (en) Methods and apparatuses for mining synonymous phrases, and for searching related content
WO2014117549A1 (en) Method and device for error correction model training and text error correction
CN107943786B (zh) 一种中文命名实体识别方法及系统
WO2012159558A1 (zh) 基于语意识别的自然语言处理方法、装置和系统
CN103440252A (zh) 一种中文句子中并列信息提取方法及装置
WO2017166626A1 (zh) 归一化方法、装置和电子设备
CN109522396B (zh) 一种面向国防科技领域的知识处理方法及系统
Jayan et al. A hybrid statistical approach for named entity recognition for malayalam language
CN109213998A (zh) 中文错字检测方法及系统
CN104572619A (zh) 智能机器人交互系统在投融资领域的应用
Wang et al. Semi-supervised chinese open entity relation extraction
CN109408828A (zh) 用于电视领域语义分析的分词系统
CN112328811A (zh) 一种基于同类型词组的词谱聚类智能生成方法
CN116226362B (zh) 一种提升搜索医院名称准确度的分词方法
Sanabila et al. Automatic Wayang Ontology Construction using Relation Extraction from Free Text
Chen et al. Detecting OOV names in Arabic handwritten data
Thenmozhi et al. An open information extraction for question answering system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16817141

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2017567752

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2016817141

Country of ref document: EP