CN104778171A - Character string matching system and method - Google Patents

Info

Publication number
CN104778171A
Authority
CN
China
Prior art keywords
character strings
module
phrase
character
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410011078.3A
Other languages
Chinese (zh)
Inventor
叶亚明
王威振
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Ctrip Business Co Ltd
Original Assignee
Ctrip Computer Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ctrip Computer Technology Shanghai Co Ltd filed Critical Ctrip Computer Technology Shanghai Co Ltd
Priority to CN201410011078.3A priority Critical patent/CN104778171A/en
Publication of CN104778171A publication Critical patent/CN104778171A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides a character string matching system and method. The system stores a number of key dimensions and a number of non-key dimensions, each of which has a corresponding weight. The system comprises an input module, a word segmentation module, a labeling module, a comparison module, a computation module and an output module. The input module receives two input character strings; the word segmentation module segments the two character strings into phrases; the labeling module labels each phrase with its corresponding key or non-key dimension; the comparison module compares the phrases in the two character strings. If the two phrases on any key dimension differ, the output module is called to output a string-mismatch message; otherwise the computation module is called to calculate the matching degree between the two character strings by a formula, and the output module is called to output that matching degree. The system can calculate the matching degree between character strings quickly, flexibly and accurately.

Description

Character string matching system and method
Technical field
The present invention relates to a character string matching system and a character string matching method.
Background art
Because natural language is flexible and naming conventions vary, the same thing can be described in different ways; to a computer, these descriptions are simply different character strings. How to determine quickly whether two character strings describe the same thing is therefore a technical problem of practical significance.
Existing methods for computing the degree of association between character strings either measure the relationship between the strings rather mechanically or become mired in the complex computation of semantic analysis. Neither approach can compute the similarity between character strings quickly, flexibly and accurately.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the defect that the prior art cannot compute the similarity between character strings quickly, flexibly and accurately, by providing a character string matching system and method that can do so.
The present invention solves the above technical problem through the following technical solutions:
The invention provides a character string matching system, characterized in that it stores a number of key dimensions and a number of non-key dimensions, each key dimension and each non-key dimension having a corresponding weight, and in that the system comprises an input module, a word segmentation module, a labeling module, a comparison module, a computation module and an output module;
The input module is used to receive two input character strings;
The word segmentation module is used to segment the two character strings into phrases;
The labeling module is used to label each phrase with its corresponding key dimension or non-key dimension;
The comparison module is used to compare the phrases in the two character strings. If the two phrases on any key dimension are not identical, it calls the output module to output a string-mismatch message; otherwise (that is, the two phrases on every key dimension are identical, or the two phrases on every key dimension present in both strings are identical while one string lacks a phrase on one or more key dimensions) it calls the computation module. Here "two phrases are identical" means that the two phrases express the same meaning and does not require every character of the two phrases to agree; likewise, "two phrases are not identical" means that the two phrases express different meanings;
The computation module is used to calculate the matching degree between the two character strings by the formula $P = \frac{\sum_{i=1}^{n} a_i}{B}$ and to call the output module to output this matching degree; where P is the matching degree between the two character strings, n is the number of identical phrases in the two character strings, $a_i$ is twice the weight of the i-th identical phrase in the two character strings, and B is the sum of the weights of all phrases in the two character strings.
Preferably, the system further comprises a processing module, which is used to remove stop words from the two character strings, correct wrongly written characters in them, and replace pinyin in them with Chinese characters.
Preferably, the system stores a dictionary containing multiple words, and the word segmentation module comprises a division module and a matching module;
The division module is used to divide the two character strings;
The matching module is used to match each divided word against all words in the dictionary and, if the match succeeds, to take the divided word as a phrase.
Preferably, the key dimensions and non-key dimensions are user-defined according to the field of application.
The present invention also provides a character string matching method, characterized in that a number of key dimensions and a number of non-key dimensions are stored, each key dimension and each non-key dimension having a corresponding weight, and in that the method comprises the following steps:
S1, receive two input character strings;
S2, segment the two character strings into phrases;
S3, label each phrase with its corresponding key dimension or non-key dimension;
S4, compare the phrases in the two character strings; if the two phrases on any key dimension are not identical, go to step S5, otherwise go to step S6;
S5, output a string-mismatch message and end the process;
S6, calculate the matching degree between the two character strings by the formula $P = \frac{\sum_{i=1}^{n} a_i}{B}$, output this matching degree and end the process; where P is the matching degree, n is the number of identical phrases in the two character strings, $a_i$ is twice the weight of the i-th identical phrase in the two character strings, and B is the sum of the weights of all phrases in the two character strings.
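The decision flow of steps S4 to S6 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `DIMENSIONS` table, the `match_strings` name, the sample weights, the use of exact text equality for "identical", and the assumption of at most one phrase per dimension in each string are all hypothetical simplifications.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical configuration: dimension name -> (weight, is_key_dimension).
DIMENSIONS: Dict[str, Tuple[float, bool]] = {
    "brand": (10, True),
    "region": (5, True),
    "descriptor": (1, False),
}

Phrase = Tuple[str, str]  # (phrase text, dimension label)


def match_strings(a: List[Phrase], b: List[Phrase]) -> Optional[float]:
    """Return the matching degree P, or None for a mismatch (step S5)."""
    # Simplification: at most one phrase per dimension in each string.
    dims_a = {dim: text for text, dim in a}
    dims_b = {dim: text for text, dim in b}
    # Step S4: a differing phrase on any shared key dimension rejects the pair.
    for dim, (_, is_key) in DIMENSIONS.items():
        if is_key and dim in dims_a and dim in dims_b and dims_a[dim] != dims_b[dim]:
            return None
    # Step S6: P = sum(a_i) / B, with a_i twice the weight of a shared phrase.
    shared = [d for d in dims_a if d in dims_b and dims_a[d] == dims_b[d]]
    numerator = sum(2 * DIMENSIONS[d][0] for d in shared)
    total = sum(DIMENSIONS[d][0] for _, d in a) + sum(DIMENSIONS[d][0] for _, d in b)
    return numerator / total if total else 0.0
```

Returning `None` for the mismatch case keeps the early rejection of step S5 distinct from a computed matching degree of zero.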
Preferably, the following step is included between step S1 and step S2:
Remove the stop words from the two character strings, correct wrongly written characters in them, and replace pinyin in them with Chinese characters.
Preferably, the character string matching method stores a dictionary containing multiple words, and step S2 comprises the following steps:
S21, divide the two character strings;
S22, match each divided word against all words in the dictionary and, if the match succeeds, take the divided word as a phrase.
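Steps S21 and S22 do not fix a particular division strategy. One common dictionary-based choice is forward maximum matching, sketched below as an assumption; the `segment` function and the romanized sample dictionary are illustrative only and are not taken from the patent.

```python
from typing import List, Set


def segment(text: str, dictionary: Set[str]) -> List[str]:
    """Split text into dictionary words using forward maximum matching."""
    max_len = max((len(word) for word in dictionary), default=1)
    phrases: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it (S21: divide).
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:  # S22: match against the dictionary.
                phrases.append(candidate)
                i += size
                break
        else:
            # No dictionary word starts here; skip one character.
            i += 1
    return phrases


words = {"shanghai", "xujiahui", "rujia", "expresshotel"}
print(segment("shanghaixujiahuirujiaexpresshotel", words))
```

Characters that start no dictionary word are skipped in this sketch; a production system would more likely keep them for later review, in line with the manual dictionary maintenance described in the embodiment.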
Preferably, the key dimensions and non-key dimensions are user-defined according to the field of application.
The above preferred conditions may be combined in any way consistent with common knowledge in the art to obtain preferred embodiments of the invention.
The positive effects of the present invention are as follows:
The invention provides a character string matching system and method that label each segmented phrase, compare the phrases on the key dimensions, and apply a "difference rejects first" rule: a string-mismatch message is output as soon as the two phrases on any key dimension are not identical, and only otherwise is the matching degree between the two character strings actually calculated. The invention can therefore calculate the matching degree between two character strings quickly, flexibly and accurately.
Brief description of the drawings
Fig. 1 is a structural block diagram of the character string matching system of a preferred embodiment of the present invention.
Fig. 2 is a flow chart of the character string matching method of a preferred embodiment of the present invention.
Detailed description of the embodiments
The present invention is further illustrated below by way of an embodiment, which does not limit the invention to the scope of that embodiment.
As shown in Fig. 1, this embodiment provides a character string matching system that stores a number of key dimensions and a number of non-key dimensions. The key dimensions and non-key dimensions can be user-defined according to the field of application, and each has a corresponding weight. The system comprises an input module 1, a processing module 2, a word segmentation module 3, a labeling module 4, a comparison module 5, a computation module 6 and an output module 7.
The components of the system are listed above; the function of each component is described below:
The input module 1 is used to receive two input character strings;
The processing module 2 is used to remove stop words from the two character strings, correct wrongly written characters in them, and replace pinyin in them with Chinese characters;
The word segmentation module 3 is used to segment the two character strings into phrases;
The labeling module 4 is used to label each phrase with its corresponding key dimension or non-key dimension;
The comparison module 5 is used to compare the phrases in the two character strings; if the two phrases on any key dimension are not identical, it calls the output module 7 to output a string-mismatch message, otherwise it calls the computation module 6;
The computation module 6 is used to calculate the matching degree between the two character strings by the formula $P = \frac{\sum_{i=1}^{n} a_i}{B}$ and to call the output module 7 to output this matching degree; where P is the matching degree between the two character strings, n is the number of identical phrases in the two character strings, $a_i$ is twice the weight of the i-th identical phrase in the two character strings, and B is the sum of the weights of all phrases in the two character strings.
Further, the word segmentation module 3 comprises a division module 31 and a matching module 32, and the system stores a dictionary containing multiple words. The division module 31 is used to divide the two character strings; the matching module 32 is used to match each divided word against all words in the dictionary and, if the match succeeds, to take the divided word as a phrase.
As shown in Fig. 2, this embodiment also provides a character string matching method in which a number of key dimensions and a number of non-key dimensions are stored, each key dimension and each non-key dimension having a corresponding weight. The method comprises the following steps:
Step 101, receive two input character strings;
Step 102, remove the stop words from the two character strings, correct wrongly written characters in them, and replace pinyin in them with Chinese characters;
Step 103, segment the two character strings into phrases. This step further comprises two sub-steps: divide the two character strings; then match each divided word against all words in the dictionary and, if the match succeeds, take the divided word as a phrase;
Step 104, label each phrase with its corresponding key dimension or non-key dimension;
Step 105, compare the phrases in the two character strings; if the two phrases on any key dimension are not identical, go to step 106, otherwise go to step 107;
Step 106, output a string-mismatch message and end the process;
Step 107, calculate the matching degree between the two character strings by the formula $P = \frac{\sum_{i=1}^{n} a_i}{B}$, output this matching degree and end the process; where P is the matching degree, n is the number of identical phrases in the two character strings, $a_i$ is twice the weight of the i-th identical phrase in the two character strings, and B is the sum of the weights of all phrases in the two character strings.
The character string matching system and method are described below with a concrete example, namely calculating the matching degree between two input hotel names, so that those skilled in the art can better understand the invention. The invention is not limited to calculating the matching degree between hotel names, however; it can be applied to calculating the matching degree between two character strings in any field.
Different fields and different application scenarios call for different dimensions, and the key dimensions extracted from them differ as well. In this example, the possible dimensions for the hotel field include "city", "hotel brand", "sub-brand", "hotel name descriptor", "region" and "meaningless word". The key dimensions are "city", "hotel brand", "sub-brand" and "region"; the non-key dimensions are "hotel name descriptor" and "meaningless word". Among the key dimensions, the weight of "city" is 5, the weight of "region" is 5, the weight of "hotel brand" is 10, and the weight of "sub-brand" is 8. Among the non-key dimensions, the weight of "hotel name descriptor" is 1 and the weight of "meaningless word" is 0.
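The dimension table of this example can be written down as a small configuration structure. The sketch below uses the weights and key/non-key assignments stated above; the `Dimension` class and the `HOTEL_DIMENSIONS` name are illustrative assumptions about how such a table might be stored.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Dimension:
    weight: int    # weight of a phrase labeled with this dimension
    is_key: bool   # True for key dimensions, False for non-key dimensions


# Weights and key/non-key flags as listed for the hotel example.
HOTEL_DIMENSIONS: Dict[str, Dimension] = {
    "city": Dimension(5, True),
    "region": Dimension(5, True),
    "hotel brand": Dimension(10, True),
    "sub-brand": Dimension(8, True),
    "hotel name descriptor": Dimension(1, False),
    "meaningless word": Dimension(0, False),
}

key_dimensions = sorted(name for name, d in HOTEL_DIMENSIONS.items() if d.is_key)
print(key_dimensions)
```

Keeping the table as data rather than code fits the stated goal that dimensions and weights be user-defined per field and adjustable after review.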
The dictionary comprises a general dictionary and special dictionaries. The general dictionary is the broadest and most common kind: it does not distinguish between industries and is shared by all of them, and it includes, for example, an administrative-region dictionary and a natural-language dictionary. Special dictionaries are a series of smaller, more specialized dictionaries organized for a specific industry. They hold far less data than the general dictionary, but within their specific field they are more authoritative than the general dictionary and are more likely to be adopted. The hotel field of this example uses a special dictionary: by looking up the special dictionary and applying a standard segmentation algorithm, a set of words carrying semantic labels is obtained.
The input module 1 receives two character strings. The first, in translation, is "Shanghai Xujiahui ru-jia express hotel", in which the brand name 如家 (Rujia) is typed partly in pinyin as "ru"; the second is "IBIs Xujiahui China store". The processing module 2 performs routine preprocessing: it removes the stop word from the first character string and replaces the pinyin "ru" in the first character string with the Chinese character "如".
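The preprocessing performed by the processing module can be sketched as below. The lookup tables, the `preprocess` name and the Chinese sample string are assumptions: the sample is a guess at the untranslated input, and correction of wrongly written characters is omitted because the text does not describe how it is done.

```python
import re
from typing import Dict, Iterable

# Hypothetical lookup tables; a real system would load curated data.
STOP_WORDS = ("的",)
PINYIN_TO_HANZI: Dict[str, str] = {"ru": "如"}


def preprocess(text: str,
               stop_words: Iterable[str] = STOP_WORDS,
               pinyin_map: Dict[str, str] = PINYIN_TO_HANZI) -> str:
    """Remove stop words, then replace standalone pinyin with Chinese characters."""
    for word in stop_words:
        text = text.replace(word, "")
    for pinyin, hanzi in pinyin_map.items():
        # Replace the pinyin only when it is not part of a longer Latin word.
        pattern = rf"(?<![A-Za-z]){re.escape(pinyin)}(?![A-Za-z])"
        text = re.sub(pattern, hanzi, text)
    return text


print(preprocess("上海徐家汇的ru家快捷酒店"))
```

The lookarounds are used instead of `\b` because Chinese characters count as word characters in Python regular expressions, so `\b` would not detect a boundary between a Chinese character and the pinyin that follows it.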
The division module 31 divides the two character strings: the first character string is divided into "Shanghai", "Xujiahui", "Rujia" and "express hotel", and the second character string is divided into "IBIs", "Xujiahui" and "China". The matching module 32 matches the divided words "Shanghai", "Xujiahui", "Rujia", "express hotel", "IBIs" and "China" against all words in the special dictionary described above; once the match succeeds, these divided words are taken as phrases.
The labeling module 4 labels each phrase with its corresponding key dimension or non-key dimension. The phrases in the first character string are labeled "Shanghai (city)", "Xujiahui (region)", "Rujia (hotel brand)" and "express hotel (hotel name descriptor)"; the phrases in the second character string are labeled "IBIs (hotel brand)", "Xujiahui (region)" and "China (meaningless word)".
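Labeling amounts to a lookup from phrase to dimension. The sketch below assumes the special dictionary stores a dimension label with each word; the `SPECIAL_DICTIONARY` contents, the `label` function and the fallback to "meaningless word" for unknown phrases are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical special dictionary: phrase -> dimension label.
SPECIAL_DICTIONARY: Dict[str, str] = {
    "Shanghai": "city",
    "Xujiahui": "region",
    "Rujia": "hotel brand",
    "IBIs": "hotel brand",
    "express hotel": "hotel name descriptor",
    "China": "meaningless word",
}


def label(phrases: List[str]) -> List[Tuple[str, str]]:
    """Attach a dimension to each phrase; unknown phrases count as meaningless."""
    return [(p, SPECIAL_DICTIONARY.get(p, "meaningless word")) for p in phrases]


print(label(["IBIs", "Xujiahui", "China"]))
```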
The comparison module 5 compares the phrases in the two character strings. On the key dimension "region", the phrase "Xujiahui" in the first character string is identical to the phrase "Xujiahui" in the second. On the key dimension "hotel brand", the phrase "Rujia" in the first character string is identical to the phrase "IBIs" in the second ("identical" here means that the two names denote the same commercial brand in the hotel field, that is, "Rujia" and "IBIs" are the same commercial brand). The first character string has a phrase on the key dimension "city" while the second character string has none, so the key dimension "city" is not compared. The comparison therefore finds that the two phrases on every key dimension present in both strings are identical and that the second character string merely lacks a phrase on the key dimension "city", so the computation module 6 goes on to calculate the matching degree between the two character strings.
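The comparison just described can be sketched as a check over the key dimensions that both strings share. The alias table that treats "IBIs" and "Rujia" as one brand is a hypothetical stand-in for whatever semantic equivalence the dictionary provides, and the sketch assumes at most one phrase per dimension in each string.

```python
from typing import Dict, List, Tuple

KEY_DIMENSIONS = {"city", "region", "hotel brand", "sub-brand"}

# Hypothetical alias table mapping a phrase to its canonical form.
CANONICAL: Dict[str, str] = {"IBIs": "Rujia"}

Phrase = Tuple[str, str]  # (phrase text, dimension label)


def canonical(phrase: str) -> str:
    return CANONICAL.get(phrase, phrase)


def key_dimensions_agree(a: List[Phrase], b: List[Phrase]) -> bool:
    """True unless some key dimension present in both strings differs."""
    by_dim_a = {dim: canonical(text) for text, dim in a if dim in KEY_DIMENSIONS}
    by_dim_b = {dim: canonical(text) for text, dim in b if dim in KEY_DIMENSIONS}
    # A key dimension missing from either string is simply not compared.
    shared = by_dim_a.keys() & by_dim_b.keys()
    return all(by_dim_a[dim] == by_dim_b[dim] for dim in shared)


first = [("Shanghai", "city"), ("Xujiahui", "region"),
         ("Rujia", "hotel brand"), ("express hotel", "hotel name descriptor")]
second = [("IBIs", "hotel brand"), ("Xujiahui", "region"),
          ("China", "meaningless word")]
print(key_dimensions_agree(first, second))
```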
The computation module 6 calculates the matching degree between the two character strings by the formula $P = \frac{\sum_{i=1}^{n} a_i}{B}$ as follows:
The number of identical phrases in the two character strings is n = 2. $a_1$ is the sum of the weight 5 of the phrase "Xujiahui" in the first character string and the weight 5 of the phrase "Xujiahui" in the second character string, that is, 10. $a_2$ is the sum of the weight 10 of the phrase "Rujia" in the first character string and the weight 10 of the phrase "IBIs" in the second character string, that is, 20. B is the sum of the weights of all phrases in the two character strings: the weight 5 of "Shanghai", the weight 5 of "Xujiahui", the weight 10 of "Rujia" and the weight 1 of "express hotel" in the first character string, plus the weight 10 of "IBIs", the weight 5 of "Xujiahui" and the weight 0 of "China" in the second character string.
The matching degree between the two character strings is therefore P = (10 + 20)/(5 + 5 + 10 + 1 + 10 + 5 + 0) = 30/36 ≈ 83.33%, and the output module 7 is called to output this matching degree of 83.33%.
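The arithmetic of this worked example can be checked with a few lines of code. The weights and labels come from the example; the variable names and data layout are illustrative assumptions.

```python
# Dimension weights from the hotel example.
WEIGHTS = {
    "city": 5, "region": 5, "hotel brand": 10, "sub-brand": 8,
    "hotel name descriptor": 1, "meaningless word": 0,
}

first = [("Shanghai", "city"), ("Xujiahui", "region"),
         ("Rujia", "hotel brand"), ("express hotel", "hotel name descriptor")]
second = [("IBIs", "hotel brand"), ("Xujiahui", "region"),
          ("China", "meaningless word")]

# The two identical phrases lie on "region" and "hotel brand" (n = 2).
shared_dimensions = ["region", "hotel brand"]
a = [2 * WEIGHTS[dim] for dim in shared_dimensions]   # a_1 = 10, a_2 = 20
B = sum(WEIGHTS[dim] for _, dim in first + second)    # 21 + 15 = 36
P = sum(a) / B

print(f"{P:.2%}")  # 83.33%
```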
Every matching result of the system is recorded and reviewed manually. Reviewers check whether each matching result is correct and feed the review results back to the system, which compiles statistics on the number and types of matching errors and displays them. In most cases an error arises because the dictionary lacks some special phrase, so the segmentation produces incorrect phrases and the matching result is incorrect as well. Reviewers can therefore supplement and refine the dictionary manually to further improve the accuracy of the matching results. If the number of errors of one type accumulates to a certain threshold, or if the output matching degree is judged unreasonable, reviewers can adjust the weight allocation manually, for example by changing the weight of a particular key or non-key dimension.
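The error statistics fed back from manual review can be sketched as a simple tally per error type. The record layout, the error-type names and the `error_types_over_threshold` function are hypothetical; the text states only that errors are counted by type and that reaching a threshold prompts a manual adjustment.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def error_types_over_threshold(audits: Iterable[Tuple[bool, str]],
                               threshold: int) -> List[str]:
    """Return the error types whose count has reached the threshold.

    Each audit record is (is_correct, error_type); error_type is ignored
    for records that the reviewer marked as correct.
    """
    counts = Counter(kind for is_correct, kind in audits if not is_correct)
    return sorted(kind for kind, n in counts.items() if n >= threshold)


audits = [
    (True, ""),
    (False, "missing dictionary phrase"),
    (False, "missing dictionary phrase"),
    (False, "unreasonable weight"),
]
print(error_types_over_threshold(audits, threshold=2))
```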
For the hotel-name matching of this embodiment, manual verification of a large number of real cases shows that the accuracy of the system's matching results is nearly 92% in the initial state and rises to about 97% after a period of manual review and adjustment, whereas a common comparison algorithm (for example, a text comparison algorithm built on the shortest edit distance) achieves an accuracy of about 75%. The accuracy of the matching results of the present invention is thus far higher than that of the common comparison algorithm.
Likewise, when the system is applied to matching room-type names, which are shorter strings and harder to match, manual verification of a large number of real cases shows that the accuracy of the system's matching results is nearly 88.3% in the initial state and rises to about 94.4% after a period of manual review and adjustment, whereas the common comparison algorithm (for example, a text comparison algorithm built on the shortest edit distance) achieves an accuracy of about 70%. Here too, the accuracy of the matching results of the present invention is far higher than that of the common comparison algorithm.
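For reference, the kind of baseline mentioned above can be sketched with the standard Levenshtein edit distance. The text does not say how the baseline turned a distance into a match decision, so the normalization in `edit_similarity` is an assumption made for illustration.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance by dynamic programming over two rows."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(s: str, t: str) -> float:
    """Assumed normalization: 1 minus distance over the longer length."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1 - edit_distance(s, t) / longest


print(edit_distance("kitten", "sitting"))
```

Such a character-level measure knows nothing about dimensions, so two names of the same hotel written with different brand aliases can score low while two different hotels in the same region can score high.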
This embodiment labels each segmented phrase, compares the phrases on the key dimensions, and applies a "difference rejects first" rule: a string-mismatch message is output as soon as the two phrases on any key dimension are not identical, and only otherwise is the matching degree between the two character strings actually calculated. The invention can therefore calculate the matching degree between two character strings quickly, flexibly and accurately.
Each functional module of the present invention can be implemented with existing software programming techniques on existing hardware, so the specific implementations are not described further here.
Although specific embodiments of the present invention have been described above, those skilled in the art will understand that they are given only by way of example and that the scope of protection of the invention is defined by the appended claims. Those skilled in the art may make various changes or modifications to these embodiments without departing from the principle and substance of the invention, and all such changes and modifications fall within the scope of protection of the invention.

Claims (8)

1. A character string matching system, characterized in that it stores a number of key dimensions and a number of non-key dimensions, each key dimension and each non-key dimension having a corresponding weight, and in that the system comprises an input module, a word segmentation module, a labeling module, a comparison module, a computation module and an output module;
The input module is used to receive two input character strings;
The word segmentation module is used to segment the two character strings into phrases;
The labeling module is used to label each phrase with its corresponding key dimension or non-key dimension;
The comparison module is used to compare the phrases in the two character strings; if the two phrases on any key dimension are not identical, it calls the output module to output a string-mismatch message, otherwise it calls the computation module;
The computation module is used to calculate the matching degree between the two character strings by the formula $P = \frac{\sum_{i=1}^{n} a_i}{B}$ and to call the output module to output this matching degree; where n is the number of identical phrases in the two character strings, $a_i$ is twice the weight of the i-th identical phrase in the two character strings, and B is the sum of the weights of all phrases in the two character strings.
2. The character string matching system of claim 1, characterized in that the system further comprises a processing module, which is used to remove stop words from the two character strings, correct wrongly written characters in them, and replace pinyin in them with Chinese characters.
3. The character string matching system of claim 1, characterized in that the system stores a dictionary containing multiple words, and the word segmentation module comprises a division module and a matching module;
The division module is used to divide the two character strings;
The matching module is used to match each divided word against all words in the dictionary and, if the match succeeds, to take the divided word as a phrase.
4. The character string matching system of any one of claims 1-3, characterized in that the key dimensions and non-key dimensions are user-defined according to the field of application.
5. A character string matching method, characterized in that a number of key dimensions and a number of non-key dimensions are stored, each key dimension and each non-key dimension having a corresponding weight, and in that the method comprises the following steps:
S1, receive two input character strings;
S2, segment the two character strings into phrases;
S3, label each phrase with its corresponding key dimension or non-key dimension;
S4, compare the phrases in the two character strings; if the two phrases on any key dimension are not identical, go to step S5, otherwise go to step S6;
S5, output a string-mismatch message and end the process;
S6, calculate the matching degree between the two character strings by the formula $P = \frac{\sum_{i=1}^{n} a_i}{B}$, output this matching degree and end the process; where n is the number of identical phrases in the two character strings, $a_i$ is twice the weight of the i-th identical phrase in the two character strings, and B is the sum of the weights of all phrases in the two character strings.
6. The character string matching method of claim 5, characterized in that the following step is included between step S1 and step S2:
Remove the stop words from the two character strings, correct wrongly written characters in them, and replace pinyin in them with Chinese characters.
7. The character string matching method of claim 5, characterized in that the method stores a dictionary containing multiple words, and step S2 comprises the following steps:
S21, divide the two character strings;
S22, match each divided word against all words in the dictionary and, if the match succeeds, take the divided word as a phrase.
8. The character string matching method of any one of claims 5-7, characterized in that the key dimensions and non-key dimensions are user-defined according to the field of application.
CN201410011078.3A 2014-01-10 2014-01-10 Character string matching system and method Pending CN104778171A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410011078.3A CN104778171A (en) 2014-01-10 2014-01-10 Character string matching system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410011078.3A CN104778171A (en) 2014-01-10 2014-01-10 Character string matching system and method

Publications (1)

Publication Number Publication Date
CN104778171A true CN104778171A (en) 2015-07-15

Family

ID=53619642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410011078.3A Pending CN104778171A (en) 2014-01-10 2014-01-10 Character string matching system and method

Country Status (1)

Country Link
CN (1) CN104778171A (en)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
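Wang Hongjun (王洪俊) et al.: "Cross-language similar document retrieval" (跨语言相似文档检索), Journal of Chinese Information Processing (《中文信息学报》) *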

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106815197A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Text similarity determination method and device
CN106815197B (en) * 2015-11-27 2020-07-31 北京国双科技有限公司 Text similarity determination method and device
CN106650803A (en) * 2016-12-09 2017-05-10 北京锐安科技有限公司 Method and device for calculating similarity between strings
CN108733665A (en) * 2017-04-13 2018-11-02 艺龙网信息技术(北京)有限公司 Scenic spot information matching method and device based on features and semantics
CN109165326A (en) * 2018-08-16 2019-01-08 蜜小蜂智慧(北京)科技有限公司 Character string matching method and device
CN111340580A (en) * 2020-02-05 2020-06-26 深圳市道旅旅游科技股份有限公司 Method and device for determining house type, computer equipment and storage medium
CN111897958A (en) * 2020-07-16 2020-11-06 邓桦 Ancient poetry classification method based on natural language processing
CN111897958B (en) * 2020-07-16 2024-03-12 邓桦 Ancient poetry classification method based on natural language processing
CN112052424A (en) * 2020-10-12 2020-12-08 腾讯科技(深圳)有限公司 Content auditing method and device
CN112052424B (en) * 2020-10-12 2024-05-28 腾讯科技(深圳)有限公司 Content auditing method and device

Similar Documents

Publication Publication Date Title
CN104778171A (en) Character string matching system and method
CN109388795B (en) Named entity recognition method, language recognition method and system
CN107220235B (en) Speech recognition error correction method and device based on artificial intelligence and storage medium
Elhamifar et al. Unsupervised procedure learning via joint dynamic summarization
CN106815197A (en) Text similarity determination method and device
WO2019228466A1 (en) Named entity recognition method, device and apparatus, and storage medium
CN105808530B (en) Translation method and device in statistical machine translation
CN105138507A (en) Pattern self-learning based Chinese open relationship extraction method
US20140032207A1 (en) Information Classification Based on Product Recognition
CN106326303A (en) Spoken language semantic analysis system and method
CN111583905B (en) Voice recognition conversion method and system
CN112905736B (en) Quantum theory-based unsupervised text emotion analysis method
CN109190099B (en) Sentence pattern extraction method and device
CN106980620A (en) Method and device for matching Chinese character strings
CN106708798A (en) String segmentation method and device
CN110929510A (en) Chinese unknown word recognition method based on dictionary tree
CN106610937A (en) Information theory-based Chinese automatic word segmentation method
CN115034218A (en) Chinese grammar error diagnosis method based on multi-stage training and editing level voting
CN112633012A (en) Entity type matching-based unknown word replacing method
CN110222338A (en) Organization name entity recognition method
CN109214445A (en) Multi-label classification method based on artificial intelligence
Namysl et al. NAT: Noise-aware training for robust neural sequence labeling
CN106610949A (en) Text feature extraction method based on semantic analysis
CN110110326B (en) Text cutting method based on subject information
CN103744837A (en) Multi-text comparison method based on keyword extraction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20160302

Address after: 10th Floor, Building 16, No. 968 Jinzhong Road, Changning District, Shanghai 200335

Applicant after: Shanghai Ctrip Business Co., Ltd.

Address before: Ctrip Network Technology Building, No. 99 Fuquan Road, Changning District, Shanghai 200335

Applicant before: Ctrip Computer Technology (Shanghai) Co., Ltd.

C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150715
