CN103020230A - Semantic fuzzy matching method - Google Patents

Publication number
CN103020230A
CN103020230A CN2012105438390A CN201210543839A CN103020230A CN 103020230 A CN103020230 A CN 103020230A CN 2012105438390 A CN2012105438390 A CN 2012105438390A CN 201210543839 A CN201210543839 A CN 201210543839A CN 103020230 A CN103020230 A CN 103020230A
Authority
CN
China
Prior art keywords
semantic
similarity
keyword
fuzzy matching
semantic category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012105438390A
Other languages
Chinese (zh)
Inventor
张艳
李艳玲
徐为群
颜永红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Original Assignee
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2012-12-14
Filing date
2012-12-14
Publication date
2013-04-03
Application filed by Institute of Acoustics CAS, Beijing Kexin Technology Co Ltd
Priority to CN2012105438390A
Publication of CN103020230A
Legal status: Pending

Landscapes

  • Machine Translation (AREA)

Abstract

An embodiment of the invention provides a semantic fuzzy matching method. The method comprises: extracting features from the text produced by speech recognition to obtain feature data; performing named entity recognition on the feature data with a conditional random field (CRF) model to find the key semantic categories in the sentence; and performing exact matching on the key semantic categories and, when exact matching fails, performing fuzzy matching by calculating the similarity between each key semantic category and the keywords in a dictionary, selecting the keyword with the largest similarity to replace the key semantic category, and labeling its category. In this method the CRF performs sequence labeling, so the key semantic categories in the query statement are first labeled and located, which narrows the scope of fuzzy matching. Similarity is then calculated against the domain dictionary, and the dictionary entry with the largest similarity replaces the erroneous key semantic category in the user query. This reduces the amount of computation and improves recognition speed.

Description

A semantic fuzzy matching method
Technical field
The application relates to the field of speech recognition and, specifically, to a semantic fuzzy matching method.
Background
In a human-machine interaction system, the user issues a query request in spoken language and the system provides an information service. A typical human-machine interaction system has four components: automatic speech recognition, speech understanding, dialogue management, and speech synthesis. The speech understanding component converts the query statement produced by speech recognition into a corresponding semantic representation. However, speech understanding often runs into the following problem: the user's query statement may contain pronunciation variation, recognition errors introduced by speech recognition, and incomplete key semantic concepts. Obtaining a correct understanding result when only part of the key information is available requires fuzzy matching to improve the robustness of the system. Human-machine interaction services are usually restricted to specific domains, and the data of the relevant domain is stored in a database. Traditional fuzzy matching algorithms mainly find the starting position of a substring in a given text string that matches a pattern, and most of them use edit distance as the similarity function. In such methods every Chinese character in the user's query statement takes part in the computation, so processing slows down considerably when the sentence is long.
Summary of the invention
In view of the problems in the prior art, the purpose of the embodiments of the invention is to provide a semantic fuzzy matching method. The method comprises: performing feature extraction on the text produced by speech recognition to obtain feature data; performing named entity recognition on the feature data with a conditional random field (CRF) model to find the key semantic category in the sentence; and performing exact matching on the key semantic category and, when exact matching fails, performing fuzzy matching by calculating the similarity between the key semantic category and the keywords in a dictionary, selecting a keyword with larger similarity to replace the key semantic category, and labeling its category.
Preferably, calculating the similarity between the key semantic category and the keywords in the dictionary specifically comprises: dividing twice the number of Chinese characters shared by the word of the key semantic category and the keyword by the total number of Chinese characters in the word of the key semantic category and the keyword. The larger the quotient, the higher the similarity.
Preferably, the CRF model is obtained by the following steps: constructing training data for the domain, the training data covering common spoken expressions as far as possible; labeling the training data, that is, marking the category of each entity noun in the training data; performing feature extraction on the training data to extract the entity nouns; and training on the extracted entity nouns with a CRF to obtain the CRF model.
Preferably, the method further comprises: performing semantic understanding on the key semantic category that has been labeled, and giving a semantic representation.
Preferably, the keyword with larger similarity is the keyword with the largest similarity.
Preferably, the keyword is a dictionary entry.
The embodiments of the invention use a statistical method, namely a CRF (conditional random field), to perform sequence labeling. The key semantic categories in the query statement are first labeled and located, which narrows the scope of fuzzy matching. Similarity is then calculated against the domain dictionary, and the dictionary entry with the largest similarity replaces the erroneous key semantic category in the user query. This reduces the amount of computation and improves recognition speed.
Brief description of the drawings
Fig. 1 is a schematic diagram of the speech understanding system of an embodiment of the invention;
Fig. 2 is a flow chart of the semantic fuzzy matching method of an embodiment of the invention.
Detailed description
The invention is described below in detail, clearly, and completely with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the invention without creative work fall within the scope of protection of the invention.
Fig. 1 is a schematic diagram of the speech understanding system of an embodiment of the invention. In Fig. 1, the semantic matching and understanding system comprises a speech recognition system, a semantic category labeling part, and a semantic understanding part. The semantic category labeling part in turn comprises three units: a feature extraction unit, an exact matching unit, and a fuzzy matching unit. The feature extraction unit works together with the CRF model.
Specifically, the semantic category labeling part performs feature extraction on the text produced by speech recognition and then performs named entity recognition with a trained CRF model to find the key semantic concepts in the sentence. These are sent to the exact matching unit for category labeling. If exact matching fails, fuzzy matching is performed: the similarity between each labeled entity noun and the keywords in the dictionary is calculated, the best word is selected as the correction, and its category is labeled. The result is then sent to the semantic understanding part, which gives the semantic representation of the sentence; the database is queried and the result is fed back to the user. It should be noted that the speech here may be human speech or natural speech, and is not specifically limited here.
A chain-structured CRF graph model is adopted here. Denote the observation sequence as W = (w_1, w_2, ..., w_n) and the label (state) sequence as Y = (y_1, y_2, ..., y_n). The model is defined as follows:
\[ p_\lambda(Y \mid W) = \frac{1}{Z(W)} \exp\left( \sum_{t \in T} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, W, t) \right) \tag{1} \]
where f_k is a feature function, \lambda_k is the weight of the corresponding feature function, t is the position index, and Z(W) is the normalization factor, which keeps the above probability within (0, 1).
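A small numeric check can make formula (1) concrete. The sketch below is only an illustration under stated assumptions: the tag set, the two feature types (a tag transition and a character/tag pair), the example characters, and the weights are all hypothetical and do not come from the patent. It computes Z(W) by brute-force enumeration, which is only feasible for toy inputs.

```python
import itertools
import math

LABELS = ["O", "B", "I"]  # hypothetical tag set, not specified in the source


def features(y_prev, y_cur, W, t):
    """Names of the binary features f_k(y_{t-1}, y_t, W, t) that fire at position t."""
    return [f"trans:{y_prev}>{y_cur}", f"emit:{W[t]}>{y_cur}"]


def score(Y, W, weights):
    """Sum over positions t and features k of lambda_k * f_k(y_{t-1}, y_t, W, t)."""
    total = 0.0
    y_prev = "<s>"  # artificial start label for t = 0
    for t, y_cur in enumerate(Y):
        for name in features(y_prev, y_cur, W, t):
            total += weights.get(name, 0.0)
        y_prev = y_cur
    return total


def probability(Y, W, weights):
    """p(Y|W) = exp(score(Y)) / Z(W); Z(W) sums over every possible label sequence."""
    Z = sum(
        math.exp(score(cand, W, weights))
        for cand in itertools.product(LABELS, repeat=len(W))
    )
    return math.exp(score(Y, W, weights)) / Z


# Hypothetical weights lambda_k and a three-character observation sequence.
weights = {"emit:听>O": 1.0, "emit:刘>B": 2.0, "emit:德>I": 1.5, "trans:B>I": 1.0}
W = ["听", "刘", "德"]
p = probability(("O", "B", "I"), W, weights)
```

Because Z(W) sums the exponentiated scores of all label sequences, the probabilities of all sequences add up to 1, which is what the normalization factor in (1) guarantees.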
The model parameters of the CRF are usually estimated with the L-BFGS algorithm. Decoding with the CRF is the process of finding the unknown label sequence, that is, searching for the label sequence with the maximum probability given the observation sequence:
\[ Y^{*} = \arg\max_{Y} \, p_\lambda(Y \mid W) \tag{2} \]
On a linear-chain CRF, this computation can be carried out with the Viterbi algorithm.
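The decoding step in (2) can be sketched as a standard Viterbi search. This is a generic textbook version, not the patent's implementation: the local score function stands in for the weighted feature sum of formula (1), and the weights, tags, and example characters are hypothetical.

```python
def viterbi(W, labels, score_fn):
    """Return the label sequence that maximizes the sum of local scores.

    score_fn(y_prev, y_cur, W, t) plays the role of sum_k lambda_k * f_k(...).
    """
    if not W:
        return []
    # best[t][y]: best score of any label prefix that ends with label y at position t
    best = [{y: score_fn("<s>", y, W, 0) for y in labels}]
    back = [{}]
    for t in range(1, len(W)):
        best.append({})
        back.append({})
        for y in labels:
            prev, s = max(
                ((yp, best[t - 1][yp] + score_fn(yp, y, W, t)) for yp in labels),
                key=lambda item: item[1],
            )
            best[t][y] = s
            back[t][y] = prev
    # Trace back from the best final label.
    y = max(best[-1], key=best[-1].get)
    path = [y]
    for t in range(len(W) - 1, 0, -1):
        y = back[t][y]
        path.append(y)
    return path[::-1]


# Hypothetical weights for illustration only.
WEIGHTS = {
    ("emit", "听", "O"): 1.0,
    ("emit", "刘", "B"): 2.0,
    ("emit", "德", "I"): 1.5,
    ("emit", "华", "I"): 1.5,
    ("trans", "B", "I"): 1.0,
    ("trans", "I", "I"): 0.5,
}


def toy_score(y_prev, y_cur, W, t):
    """Local score: one character/tag weight plus one tag-transition weight."""
    return WEIGHTS.get(("emit", W[t], y_cur), 0.0) + WEIGHTS.get(
        ("trans", y_prev, y_cur), 0.0
    )


path = viterbi(list("听刘德华"), ["O", "B", "I"], toy_score)
```

Since Z(W) does not depend on Y, maximizing the unnormalized score is enough to find the most probable sequence, so the search never needs to compute the normalization factor.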
The CRF training data is constructed for the domain. The data should cover common spoken expressions as far as possible and should include every domain used in the system.
The training data is then labeled, that is, the category of each entity noun in each query statement is marked.
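As an illustration of the labeling step, the sketch below turns annotated entity spans into one tag per character. The B-/I-/O tag scheme, the category name PERSON, and the example sentence are assumptions made for the example; the patent only states that the category of each entity noun is marked.

```python
def tag_characters(sentence, entities):
    """Convert (start, end, category) spans into one tag per character.

    Spans use Python slice conventions: start is inclusive, end is exclusive.
    """
    tags = ["O"] * len(sentence)
    for start, end, category in entities:
        tags[start] = f"B-{category}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{category}"          # remaining characters of the entity
    return list(zip(sentence, tags))


# Hypothetical query with one personal name at characters 3..5.
sentence = "我想听刘德华的歌"
labeled = tag_characters(sentence, [(3, 6, "PERSON")])
```

Each character ends up paired with exactly one tag, which is the form a sequence labeler such as a CRF is trained on.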
Feature extraction. To better extract the various entity nouns involved (including personal names and other nouns), and following the way Chinese personal names are formed, we built a dictionary of common surname characters and given-name characters for Chinese personal names, which is used to construct the feature templates. In addition, to extract personal names and film and television titles more accurately, the single characters and two-character words that appear immediately before and after personal names and film and television titles were counted over a large amount of data, and left and right boundary word dictionaries for personal names and domain names were built for feature extraction. In short, a left or right boundary word is a word that appears immediately to the left or right of a personal name or domain name. For example, in "I want to listen to Liu Dehua's songs", Liu Dehua is a personal name; the left boundary word of Liu Dehua is "listen" and the right boundary word is the possessive particle "的" that follows the name.
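The boundary word statistics described above can be sketched as a simple counting pass over labeled sentences. The function name, the data layout, and the two sample sentences are hypothetical; the only parts taken from the text are that single characters and two-character strings next to each name are counted on the left and on the right.

```python
from collections import Counter


def boundary_counts(samples):
    """Count 1- and 2-character strings directly left and right of each entity span.

    samples: iterable of (sentence, [(start, end), ...]) with end exclusive.
    """
    left, right = Counter(), Counter()
    for sentence, spans in samples:
        for start, end in spans:
            for n in (1, 2):
                if start - n >= 0:
                    left[sentence[start - n:start]] += 1
                if end + n <= len(sentence):
                    right[sentence[end:end + n]] += 1
    return left, right


# Two hypothetical labeled queries containing the same personal name.
samples = [
    ("我想听刘德华的歌", [(3, 6)]),
    ("播放刘德华的电影", [(2, 5)]),
]
left, right = boundary_counts(samples)
```

Frequent entries in `left` and `right` would then be candidates for the left and right boundary word dictionaries; how the patent thresholds or filters these counts is not stated.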
The training data with extracted features is then trained with a CRF to obtain a CRF model. Note that the conditional random field is trained with the open-source tool CRF++. The training procedure is roughly as follows. Features are extracted in the format required for the training text. Because the input is spoken language, using words as the unit of study could introduce word segmentation errors, so single characters are used as the unit for feature extraction. Which features are used depends not only on the training text with extracted features but also on the tool's template file; that is, besides single-character features, combinations of features are also used. Training produces a model file. For testing, a test file is prepared; features must be extracted in the same way and the format must match that of the training text. The trained model is then applied to the test file to obtain a labeling result for each character.
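A rough sketch of what the CRF++ input could look like is given below. CRF++ reads one token per line with whitespace-separated columns, the last column being the tag, and a blank line between sentences; the template uses its `%x[row,col]` macros. The specific columns, the surname set, and the template lines are assumptions for illustration and are not the patent's actual feature set.

```python
SURNAMES = {"刘", "张", "李"}  # hypothetical stand-in for the surname dictionary


def to_crfpp_rows(tagged_sentences):
    """Serialize [(char, tag), ...] sentences into CRF++ column format.

    Columns: character, surname-dictionary flag, tag (the tag must be last).
    """
    lines = []
    for sentence in tagged_sentences:
        for char, tag in sentence:
            flag = "Y" if char in SURNAMES else "N"
            lines.append(f"{char}\t{flag}\t{tag}")
        lines.append("")  # a blank line ends each sentence
    return "\n".join(lines) + "\n"


# Hypothetical template: single-character features plus a combined feature.
TEMPLATE = """\
# Unigram features: characters in a window around the current position
U00:%x[-1,0]
U01:%x[0,0]
U02:%x[1,0]
# Combined feature: previous and current character together
U03:%x[-1,0]/%x[0,0]
# Surname-dictionary flag of the current character (column 1)
U04:%x[0,1]
# Bigram of output tags
B
"""

data = to_crfpp_rows([[("听", "O"), ("刘", "B-PERSON")]])
# With the files written to disk, training and testing would typically be:
#   crf_learn template_file train_file model_file
#   crf_test -m model_file test_file
```

The test file must use the same columns as the training file, which matches the requirement in the text that the two formats be identical.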
For a query statement entered by the user, features are extracted with the method above and entity recognition is performed with the trained CRF model, which preliminarily locates the key semantic categories in the sentence.
A located key semantic category may or may not contain errors. Exact matching is therefore performed first: it is checked whether the semantic category recognized by the CRF exists in the domain dictionary, and if it does not, fuzzy matching is performed.
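The exact matching check amounts to a dictionary lookup. The sketch below assumes the domain dictionary maps each entry to its category; the entries and category names are hypothetical.

```python
# Hypothetical domain dictionary: entry -> category.
DOMAIN_DICT = {"刘德华": "PERSON", "无间道": "MOVIE"}


def exact_match(candidate, dictionary):
    """Return (entry, category) if the candidate is a dictionary entry, else None."""
    if candidate in dictionary:
        return candidate, dictionary[candidate]
    return None  # the caller falls back to fuzzy matching


hit = exact_match("刘德华", DOMAIN_DICT)
miss = exact_match("刘的华", DOMAIN_DICT)  # misrecognized name, no exact entry
```

A `None` result is the signal to run fuzzy matching on that candidate only, so the rest of the sentence never enters the similarity computation.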
The Dice similarity is used to calculate the similarity between the semantic category recognized by the CRF and the entries in the domain dictionary. The Dice similarity is calculated as follows:
\[ \mathrm{Sim}_{\mathrm{Dice}}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|} \tag{3} \]
That is, twice the number of Chinese characters shared by the two words is divided by the sum of the lengths of the two words. The entry with the largest similarity is then used to replace the erroneous part of the original sentence, which completes the fuzzy matching of the semantic category.
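Formula (3) can be sketched in a few lines. One assumption is made explicit here: shared characters are counted as a multiset intersection, so a repeated character counts as many times as it appears in both words. The text does not say whether repeats are counted, and the dictionary entries below are hypothetical.

```python
from collections import Counter


def dice_similarity(a, b):
    """Dice similarity over characters: 2 * |A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    # Multiset intersection (an assumption): min count of each shared character.
    overlap = sum((Counter(a) & Counter(b)).values())
    return 2 * overlap / (len(a) + len(b))


def best_entry(candidate, entries):
    """Return the dictionary entry with the largest Dice similarity, or None if empty."""
    return max(entries, key=lambda e: dice_similarity(candidate, e), default=None)


# "刘的华" stands for a misrecognized name; it shares 2 of 3 characters with "刘德华".
sim = dice_similarity("刘的华", "刘德华")          # 2 * 2 / (3 + 3)
best = best_entry("刘的华", ["张学友", "刘德华", "无间道"])
```

Because only the located candidate is compared with dictionary entries, the cost depends on the candidate length and dictionary size, not on the length of the whole query.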
Fig. 2 is a flow chart of the semantic fuzzy matching method of an embodiment of the invention. As shown in Fig. 2, the method comprises the following steps. Step 200, extract feature data: features are extracted from the text produced by speech recognition to obtain feature data. Step 202, obtain the key semantic category: named entity recognition is performed on the feature data with the conditional random field (CRF) model to find the key semantic category. Step 204, exact matching: exact matching is performed on the key semantic category; when exact matching succeeds, the key semantic category is labeled with its category and the flow proceeds to step 208. Step 206, fuzzy matching: when exact matching fails in step 204, fuzzy matching is performed by calculating the similarity between the key semantic category and the keywords in the dictionary, selecting a keyword with larger similarity to replace the key semantic category, and labeling its category; the flow then proceeds to step 208. Step 208, semantic understanding: semantic understanding is performed on the labeled key semantic category and a semantic representation is given.
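The flow of steps 200 to 208 can be sketched end to end as below. This is a minimal illustration, not the patented implementation: the CRF is replaced by a hypothetical `find_spans` callable that returns sorted, non-overlapping candidate spans, the dictionary contents are made up, the dictionary is assumed to be non-empty, and the Dice similarity counts shared characters as a multiset.

```python
from collections import Counter


def dice(a, b):
    """Dice similarity over characters (multiset intersection is an assumption)."""
    overlap = sum((Counter(a) & Counter(b)).values())
    return 2 * overlap / (len(a) + len(b)) if (a or b) else 0.0


def label_query(text, find_spans, dictionary):
    """Locate candidates, try exact matching, fall back to fuzzy matching.

    find_spans(text) stands in for feature extraction plus CRF tagging and must
    return sorted, non-overlapping (start, end) spans. dictionary maps
    entry -> category and is assumed to be non-empty.
    """
    pieces, labels, cursor = [], [], 0
    for start, end in find_spans(text):            # steps 200 and 202
        candidate = text[start:end]
        if candidate in dictionary:                # step 204: exact matching
            entry = candidate
        else:                                      # step 206: fuzzy matching
            entry = max(dictionary, key=lambda e: dice(candidate, e))
        pieces.append(text[cursor:start] + entry)  # replace the candidate in place
        labels.append((entry, dictionary[entry]))  # category labeling
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), labels                 # input to step 208


# Hypothetical dictionary and a stand-in tagger that always returns one span.
dictionary = {"刘德华": "PERSON", "无间道": "MOVIE"}
corrected, labels = label_query("我想听刘的华的歌", lambda text: [(3, 6)], dictionary)
```

The corrected text and the labeled entries are what the semantic understanding part (step 208) would consume; that part is outside the scope of this sketch.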
Preferably, calculating the similarity between the key semantic category and the keywords in the dictionary specifically comprises: dividing twice the number of Chinese characters shared by the word of the key semantic category and the keyword by the total number of Chinese characters in the word of the key semantic category and the keyword. The larger the quotient, the higher the similarity.
Preferably, the CRF model is obtained by the following steps: constructing training data for the domain, the training data covering common spoken expressions as far as possible; labeling the training data, that is, marking the category of each entity noun in the training data; performing feature extraction on the training data to extract the entity nouns; and training on the extracted entity nouns with a CRF to obtain the CRF model.
Preferably, the keyword with larger similarity is the keyword with the largest similarity.
Preferably, the keyword is a dictionary entry.
The embodiments of the invention use a statistical method, namely a CRF (conditional random field), to perform sequence labeling. The key semantic categories in the query statement are first labeled and located, which narrows the scope of fuzzy matching. Similarity is then calculated against the domain dictionary, and the dictionary entry with the largest similarity replaces the erroneous key semantic category in the user query. This reduces the amount of computation and improves recognition speed.
Those skilled in the art should further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms of their function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the application.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the invention. It should be understood that the above are only specific embodiments of the invention and are not intended to limit the scope of protection of the application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the application shall fall within the scope of protection of the application.

Claims (6)

1. A semantic fuzzy matching method, characterized in that the method comprises:
performing feature extraction on the text produced by speech recognition to obtain feature data;
performing named entity recognition on the feature data with a conditional random field (CRF) model to find the key semantic category;
performing exact matching on the key semantic category and, when exact matching fails, performing fuzzy matching: calculating the similarity between the key semantic category and the keywords in a dictionary, selecting a keyword with larger similarity to replace the key semantic category, and labeling its category.
2. The semantic fuzzy matching method according to claim 1, characterized in that calculating the similarity between the key semantic category and the keywords in the dictionary specifically comprises: dividing twice the number of Chinese characters shared by the word of the key semantic category and the keyword by the total number of Chinese characters in the word of the key semantic category and the keyword, wherein the larger the quotient, the higher the similarity.
3. The semantic fuzzy matching method according to claim 1, characterized in that the CRF model is obtained by the following steps:
constructing training data for the domain, the training data covering common spoken expressions as far as possible;
labeling the training data, that is, marking the category of each entity noun in the training data;
performing feature extraction on the training data to extract the entity nouns;
training on the extracted entity nouns with a CRF to obtain the CRF model.
4. The semantic fuzzy matching method according to any one of claims 1 to 3, characterized in that the method further comprises: performing semantic understanding on the key semantic category that has been labeled, and giving a semantic representation.
5. The semantic fuzzy matching method according to any one of claims 1 to 3, characterized in that the keyword with larger similarity is the keyword with the largest similarity.
6. The semantic fuzzy matching method according to any one of claims 1 to 3, characterized in that the keyword is a dictionary entry.
CN2012105438390A 2012-12-14 2012-12-14 Semantic fuzzy matching method Pending CN103020230A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN2012105438390A (CN103020230A, en) | 2012-12-14 | 2012-12-14 | Semantic fuzzy matching method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN2012105438390A (CN103020230A, en) | 2012-12-14 | 2012-12-14 | Semantic fuzzy matching method

Publications (1)

Publication Number | Publication Date
CN103020230A (en) | 2013-04-03

Family

ID=47968834

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN2012105438390A (CN103020230A, en), Pending | Semantic fuzzy matching method | 2012-12-14 | 2012-12-14

Country Status (1)

Country | Link
CN (1) | CN103020230A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101645064A (en) * 2008-12-16 2010-02-10 中国科学院声学研究所 Superficial natural spoken language understanding system and method thereof
CN101770453A (en) * 2008-12-31 2010-07-07 华建机器翻译有限公司 Chinese text coreference resolution method based on domain ontology through being combined with machine learning model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20130403