CN103020230A - Semantic fuzzy matching method - Google Patents
- Publication number
- CN103020230A
- Authority
- CN
- China
- Prior art keywords
- semantic
- similarity
- keyword
- fuzzy matching
- semantic category
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
- Machine Translation (AREA)
Abstract
An embodiment of the invention provides a semantic fuzzy matching method. The method comprises: extracting features from text produced by speech recognition to obtain feature data; performing named entity recognition on the feature data with a conditional random field (CRF) model to find the key semantic classes in the sentence; and performing exact matching on the key semantic classes and, when exact matching fails, performing fuzzy matching by calculating the similarity between the key semantic class and the keywords in a dictionary, selecting the keyword with the greatest similarity to replace the key semantic class, and labeling its category. The method uses a CRF for sequence labeling to preliminarily label and locate the key semantic classes in the query sentence, which narrows the scope of fuzzy matching. Similarity is then calculated against a domain dictionary, and the dictionary entry with the greatest similarity replaces the erroneous key semantic class in the user query. This reduces the amount of computation and improves recognition speed.
Description
Technical field
The application relates to the field of speech recognition, and in particular to a semantic fuzzy matching method.
Background technology
In a human-machine interaction system, the user poses query requests in spoken language and the system provides an information service in response. A typical human-machine interaction system comprises four components: automatic speech recognition, speech understanding, dialogue management, and speech synthesis. The speech understanding component converts the query sentence produced by speech recognition into a corresponding semantic representation. Speech understanding, however, often runs into the following problem: the user's query sentence contains pronunciation variation, recognition errors introduced by speech recognition, and incomplete key semantic concepts. Obtaining a correct understanding result when only part of the key information is available requires fuzzy matching to improve the robustness of the system. Typical human-machine interaction services are restricted to specific domains, and the data of the relevant domain is stored in a database. Traditional fuzzy matching algorithms mainly find the starting position, in a given text string, of a substring that matches a pattern, and most of them use edit distance as the similarity function. In such methods every Chinese character in the user's query sentence takes part in the computation, so processing speed drops sharply when the sentence is long.
Summary of the invention
In view of the problems in the prior art, the purpose of the embodiment of the invention is to provide a semantic fuzzy matching method, the method comprising: performing feature extraction on the text produced by speech recognition to obtain feature data; performing named entity recognition on the feature data with a conditional random field (CRF) model to find the key semantic classes in the sentence; and performing exact matching on the key semantic class and, when exact matching fails, performing fuzzy matching by calculating the similarity between the key semantic class and the keywords in a dictionary, selecting a keyword with greater similarity to replace the key semantic class, and labeling its category.
Preferably, calculating the similarity between the key semantic class and a keyword in the dictionary specifically comprises: dividing twice the number of Chinese characters shared by the word of the key semantic class and the keyword by the total number of Chinese characters in the word of the key semantic class and the keyword; the larger the quotient, the higher the similarity.
Preferably, the CRF model is obtained by the following steps: constructing training data according to the domain, the training data covering common spoken expressions as far as possible; annotating the training data, i.e. marking the category of each entity noun in the training data; performing feature extraction on the training data to extract the entity nouns; and training on the extracted entity nouns with a CRF to obtain the CRF model.
Preferably, the method further comprises: performing semantic understanding on the key semantic class that has been labeled with its category, and producing a semantic representation.
Preferably, the keyword with greater similarity is the keyword with the greatest similarity.
Preferably, the keyword is a dictionary entry.
The embodiment of the invention uses a statistical method, namely a CRF (conditional random field), for sequence labeling. The key semantic classes in the query sentence are preliminarily labeled and located, which narrows the scope of fuzzy matching. Similarity is then calculated against the domain dictionary, and the dictionary entry with the greatest similarity replaces the erroneous key semantic class in the user's query. This reduces the amount of computation and improves recognition speed.
Description of drawings
Fig. 1 is a schematic diagram of the speech understanding system of the embodiment of the invention;
Fig. 2 is a flowchart of the semantic fuzzy matching method of the embodiment of the invention.
Embodiment
The invention is described below in detail, clearly, and completely with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the invention without creative effort fall within the scope of protection of the invention.
Fig. 1 is a schematic diagram of the speech understanding system of the embodiment of the invention. In Fig. 1, the semantic matching and understanding system comprises a speech recognition system, a semantic class labeling part, and a semantic understanding part. The semantic class labeling part in turn comprises three units: a feature extraction unit, an exact matching unit, and a fuzzy matching unit. The feature extraction unit works together with the CRF model.
Specifically, the semantic class labeling part performs feature extraction on the text produced by speech recognition and then performs named entity recognition with a trained CRF model to find the key semantic concepts in the sentence. These are sent to the exact matching unit for category labeling. If exact matching fails, fuzzy matching is performed: the similarity between the labeled entity noun and the keywords in the dictionary is calculated, the best-matching word is selected as the correction, and its category is labeled. The result is then sent to the semantic understanding part, which produces the semantic representation of the sentence, and the answer obtained by querying the database is fed back to the user. Note that the speech here may be human speech or natural speech; no particular limitation is imposed.
Here a chain-structured CRF graphical model is used. Denote the observation sequence as W = (w1, w2, ..., wn) and the label (state) sequence as Y = (y1, y2, ..., yn). The model is defined as follows:
$$P(Y \mid W) = \frac{1}{Z(W)} \exp\Big(\sum_{t} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, W, t)\Big) \qquad (1)$$
where f_k is a feature function, λ_k is the weight of the corresponding feature function, t is the position of the label in the sequence, and Z(W) is the normalization factor that keeps the above probability distribution within (0, 1).
The parameters of the CRF model are usually estimated with the L-BFGS algorithm. Decoding a CRF means solving for the labels of an unlabeled sequence, which requires searching for the label sequence with the maximum probability over that sequence, i.e.:
$$Y^{*} = \arg\max_{Y} P(Y \mid W) \qquad (2)$$
For a linear-chain CRF, this computation can be carried out with the Viterbi algorithm.
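The decoding step in equation (2) can be illustrated with a minimal Viterbi sketch. This is not the patent's implementation: the function names, the B/I/O label set, and the hand-written `toy_score` function (which stands in for the learned sum of weighted feature functions) are all assumptions made for illustration.

```python
def viterbi(obs, labels, score):
    """Return the highest-scoring label sequence for a linear-chain model.

    score(prev_label, label, obs, t) plays the role of
    sum_k lambda_k * f_k(y_{t-1}, y_t, W, t); prev_label is None at t = 0.
    """
    n = len(obs)
    if n == 0:
        return []
    # best[t][y]: best score of any path ending in label y at position t
    best = [{y: score(None, y, obs, 0) for y in labels}]
    back = [{}]
    for t in range(1, n):
        best.append({})
        back.append({})
        for y in labels:
            prev, val = max(
                ((p, best[t - 1][p] + score(p, y, obs, t)) for p in labels),
                key=lambda pv: pv[1],
            )
            best[t][y] = val
            back[t][y] = prev
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[n - 1][y])]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical scoring function: hand-set weights, not trained parameters.
NAME_CHARS = set("刘德华")

def toy_score(prev, cur, obs, t):
    s = 0.0
    in_name = obs[t] in NAME_CHARS
    if in_name and cur in ("B", "I"):
        s += 2.0          # name characters prefer entity labels
    if not in_name and cur == "O":
        s += 2.0          # other characters prefer the outside label
    if cur == "I" and prev not in ("B", "I"):
        s -= 5.0          # "I" must continue an entity
    if cur == "B" and prev in ("B", "I"):
        s -= 1.0          # discourage starting a new entity inside one
    return s


tags = viterbi("听刘德华的歌", ["B", "I", "O"], toy_score)
```

Because the normalization factor Z(W) is the same for every candidate sequence, the sketch maximizes the unnormalized score only, which yields the same argmax as equation (2).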
The training data for the CRF is constructed according to the domain. The data should cover common spoken expressions as far as possible and should include all of the domains used in the system.
The training data is annotated, i.e. the category of each entity noun in each query sentence is marked.
Feature extraction: to better extract the various entity nouns involved (including personal names and other nouns), and in view of how Chinese personal names are formed, we built dictionaries of the characters commonly used as surnames and as given names in Chinese personal names, which are used to construct the feature templates. In addition, to extract personal names and film and television titles more accurately, we counted over a large amount of data the single characters and two-character strings that appear immediately before and after personal names and film and television titles, built left and right indicator-word dictionaries for personal names and domain names, and used them in feature extraction. In short, the left and right indicator words are the words that appear immediately to the left and right of a personal name or domain name. For example, in the sentence "I want to listen to Liu Dehua's songs", Liu Dehua is a personal name; the left boundary word is "listen" (听) and the right boundary word is the possessive particle (的). The left and right indicator words may also be called left and right boundary words.
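The counting of boundary words described above might be sketched as follows. The function name, the `(text, spans)` input format, and the example sentence are assumptions for illustration; the patent does not specify how the statistics were gathered beyond counting single characters and two-character strings next to annotated names.

```python
from collections import Counter


def collect_boundary_words(samples, max_len=2):
    """Count the 1- and 2-character strings adjacent to annotated entities.

    samples: iterable of (text, spans), where spans is a list of
    (start, end) character offsets of entities (end exclusive).
    Returns (left_counts, right_counts).
    """
    left, right = Counter(), Counter()
    for text, spans in samples:
        for start, end in spans:
            for n in range(1, max_len + 1):
                if start - n >= 0:
                    left[text[start - n:start]] += 1
                if end + n <= len(text):
                    right[text[end:end + n]] += 1
    return left, right


# Hypothetical annotated sentence: the name occupies offsets 3..6.
left_bw, right_bw = collect_boundary_words([("我想听刘德华的歌", [(3, 6)])])
```

In practice the most frequent entries of each counter would be kept as the left and right boundary-word dictionaries; the cutoff is a design choice the source does not state.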
The training data from which features have been extracted is trained with a CRF to obtain a CRF model. Note that the conditional random field is trained with the open-source tool CRF++. The training procedure is roughly as follows. Features are extracted according to the format of the training text. Because the input is spoken language, using words as the unit of analysis could introduce word segmentation errors, so single characters are used as the unit for feature extraction. Which features are used depends not only on the training text from which features have been extracted but also on the template file of the tool; that is, besides single-character features, combinations of features are also used. Training produces a model file. For testing, a test file is prepared; its features must be extracted in the same way and its format must match that of the training text. The trained model is then applied to the test file to obtain a labeling result for each character.
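As an illustration of character-level training data, the sketch below writes one row per character with feature columns followed by the label, which is the column-per-token layout that CRF++ reads (sentences are separated by a blank line, and the last column is the answer tag). The particular feature columns, the BIO tagging scheme, the `PER` label, and the function name are assumptions; the patent does not list its exact columns.

```python
def to_crfpp_rows(text, spans, surnames, left_bw, right_bw):
    """Build tab-separated rows: char, surname?, left-bw?, right-bw?, tag.

    spans: list of (start, end, label) character offsets, end exclusive.
    surnames, left_bw, right_bw: sets of single characters (hypothetical
    stand-ins for the surname and boundary-word dictionaries).
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = "B-" + label            # first character of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label            # remaining characters
    rows = []
    for i, ch in enumerate(text):
        cols = [
            ch,
            "Y" if ch in surnames else "N",
            "Y" if ch in left_bw else "N",
            "Y" if ch in right_bw else "N",
            tags[i],
        ]
        rows.append("\t".join(cols))
    return rows


rows = to_crfpp_rows("听刘德华的歌", [(1, 4, "PER")], {"刘"}, {"听"}, {"的"})
training_block = "\n".join(rows) + "\n\n"     # blank line ends the sentence
```

The combined features mentioned above would then be declared in the tool's template file, which refers to these columns by row and column offset.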
For a query sentence input by the user, features are extracted with the method described above and entity recognition is performed with the trained CRF model, which preliminarily locates the key semantic classes in the sentence.
A key semantic class located in this way may or may not contain errors. Exact matching is therefore performed first: it is determined whether the semantic class identified by the CRF exists in the domain dictionary. If it does not, fuzzy matching is performed.
The semantic class identified by the CRF is compared with the entries in the domain dictionary using Dice similarity, which is calculated as follows:
$$\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}$$
where A and B are the two words, |A ∩ B| is the number of Chinese characters they share, and |A| and |B| are their lengths. That is, twice the number of Chinese characters shared by the two words is divided by the sum of the lengths of the two words. The entry with the greatest similarity is then found and used to replace the error in the original sentence, which completes the fuzzy matching of the semantic class.
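A minimal sketch of the Dice calculation and the selection of the most similar dictionary entry follows. The function names and the sample dictionary are hypothetical. One point the text leaves open is whether shared characters are counted as a set or with multiplicity; the sketch assumes multiplicity (a multiset intersection), so a character repeated in both words counts each time.

```python
from collections import Counter


def dice_similarity(a, b):
    """Dice similarity over characters: 2 * |shared| / (len(a) + len(b))."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    # Assumption: multiset intersection, so repeated characters count.
    shared = sum((Counter(a) & Counter(b)).values())
    return 2.0 * shared / total


def best_entry(candidate, dictionary):
    """Return the dictionary entry most similar to the candidate."""
    return max(dictionary, key=lambda entry: dice_similarity(candidate, entry))


# Hypothetical misrecognized name and domain dictionary.
entries = ["刘德华", "张学友", "周华健"]
match = best_entry("刘得华", entries)
```

Only the short span located by the CRF is compared against the dictionary, rather than every character of the query sentence, which is where the reduction in computation described in the source comes from.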
Fig. 2 is a flowchart of the semantic fuzzy matching method of the embodiment of the invention. As shown in Fig. 2, the method comprises:
Step 200, extracting feature data: feature extraction is performed on the text produced by speech recognition to obtain feature data;
Step 202, obtaining the key semantic class: named entity recognition is performed on the feature data with the conditional random field (CRF) model to find the key semantic class;
Step 204, exact matching: exact matching is performed on the key semantic class; when exact matching succeeds, the key semantic class is labeled with its category and the method proceeds to step 208;
Step 206, fuzzy matching: when the exact matching of step 204 fails, fuzzy matching is performed by calculating the similarity between the key semantic class and the keywords in the dictionary, selecting a keyword with greater similarity to replace the key semantic class, and labeling its category, after which the method proceeds to step 208;
Step 208, semantic understanding: semantic understanding is performed on the key semantic class that has been labeled with its category, and a semantic representation is produced.
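The branch between exact and fuzzy matching in steps 204 and 206 can be sketched as below. The sketch assumes the span of the key semantic class has already been located by the CRF (steps 200 and 202) and is passed in as character offsets; the function names, the dictionary layout (entry mapped to category), and the category names are hypothetical.

```python
from collections import Counter


def dice(a, b):
    """Character-level Dice similarity (multiset intersection assumed)."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    return 2.0 * sum((Counter(a) & Counter(b)).values()) / total


def label_semantic_class(candidate, domain_dict):
    """Steps 204/206: exact match first, fuzzy match only on failure."""
    if candidate in domain_dict:                       # step 204
        return candidate, domain_dict[candidate], "exact"
    best = max(domain_dict, key=lambda e: dice(candidate, e))  # step 206
    return best, domain_dict[best], "fuzzy"


def correct_query(query, span, domain_dict):
    """Replace the located span with the matched entry and return its category."""
    start, end = span
    entry, category, mode = label_semantic_class(query[start:end], domain_dict)
    return query[:start] + entry + query[end:], category, mode


# Hypothetical domain dictionary: entry -> category.
domain = {"刘德华": "singer", "无间道": "movie"}
fixed = correct_query("我想听刘得华的歌", (3, 6), domain)
exact = correct_query("我想听刘德华的歌", (3, 6), domain)
```

The corrected sentence and category would then be handed to the semantic understanding step (step 208), which is outside the scope of this sketch.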
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms according to function. Whether these functions are carried out in hardware or software depends on the particular application and the design constraints of the technical solution. Those skilled in the art may implement the described functions in different ways for each particular application, but such implementations should not be regarded as going beyond the scope of the application.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The embodiments described above further explain the purpose, technical solution, and beneficial effects of the invention. It should be understood that the above are only specific embodiments of the invention and are not intended to limit the scope of protection of the application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the application shall fall within the scope of protection of the application.
Claims (6)
1. A semantic fuzzy matching method, characterized in that the method comprises:
performing feature extraction on text produced by speech recognition to obtain feature data;
performing named entity recognition on the feature data with a conditional random field (CRF) model to find a key semantic class;
performing exact matching on the key semantic class and, when exact matching fails, performing fuzzy matching by calculating the similarity between the key semantic class and the keywords in a dictionary, selecting a keyword with greater similarity to replace the key semantic class, and labeling its category.
2. The semantic fuzzy matching method according to claim 1, characterized in that calculating the similarity between the key semantic class and a keyword in the dictionary specifically comprises: dividing twice the number of Chinese characters shared by the word of the key semantic class and the keyword by the total number of Chinese characters in the word of the key semantic class and the keyword; the larger the quotient, the higher the similarity.
3. The semantic fuzzy matching method according to claim 1, characterized in that the CRF model is obtained by the following steps:
constructing training data according to the domain, the training data covering common spoken expressions as far as possible;
annotating the training data, i.e. marking the category of each entity noun in the training data;
performing feature extraction on the training data to extract the entity nouns;
training on the extracted entity nouns with a CRF to obtain the CRF model.
4. The semantic fuzzy matching method according to any one of claims 1-3, characterized in that the method further comprises: performing semantic understanding on the key semantic class that has been labeled with its category, and producing a semantic representation.
5. The semantic fuzzy matching method according to any one of claims 1-3, characterized in that the keyword with greater similarity is the keyword with the greatest similarity.
6. The semantic fuzzy matching method according to any one of claims 1-3, characterized in that the keyword is a dictionary entry.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012105438390A CN103020230A (en) | 2012-12-14 | 2012-12-14 | Semantic fuzzy matching method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012105438390A CN103020230A (en) | 2012-12-14 | 2012-12-14 | Semantic fuzzy matching method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103020230A (en) | 2013-04-03 |
Family
ID=47968834
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2012105438390A Pending CN103020230A (en) | 2012-12-14 | 2012-12-14 | Semantic fuzzy matching method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103020230A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101645064A (en) * | 2008-12-16 | 2010-02-10 | 中国科学院声学研究所 | Superficial natural spoken language understanding system and method thereof |
CN101770453A (en) * | 2008-12-31 | 2010-07-07 | 华建机器翻译有限公司 | Chinese text coreference resolution method based on domain ontology through being combined with machine learning model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20130403 |