CN106951437A - Identifying processing method and device suitable for the sensitive words and phrases of multiple Chinese - Google Patents
- Publication number
- CN106951437A (application number CN201710072161.5A)
- Authority
- CN
- China
- Prior art keywords
- phrases
- sensitive words
- character
- chinese
- identified
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
Abstract
The present invention provides a recognition processing method and device for multiple Chinese sensitive words and phrases. The method includes: obtaining multiple preset sensitive words and phrases; building a suffix tree from the sensitive words and phrases; obtaining Chinese text to be identified; matching the Chinese text to be identified against the suffix tree; and, if the match succeeds, obtaining the sensitive words and phrases in the Chinese text to be identified and outputting them for display. By exploiting the characteristics of Chinese, the method improves the matching time of a pattern string on the suffix tree from […] to […], which saves time and increases the matching speed of pattern strings on the suffix tree. The method is suitable for Chinese string matching with multiple sensitive words and phrases.
Description
Technical field
The present invention relates to the field of computer processing technology, and in particular to a recognition processing method and device for multiple Chinese sensitive words and phrases.
Background art
Identifying sensitive words and phrases means using a program to scan information text for specified keywords and to check whether the text violates a specified policy; it is the basis of sensitive-word filtering. Finding sensitive vocabulary quickly and accurately requires pattern matching algorithms.
Pattern matching algorithms for pattern strings include the Aho-Corasick (AC) algorithm, the BM (Boyer-Moore) algorithm, and the ACBM algorithm. The AC algorithm preprocesses multiple pattern strings into a tree-shaped deterministic finite state automaton (DFSA) and completes all pattern matching in a single scan of the text string, with a matching time complexity of O(n+m). The BM algorithm has a time complexity of […], but it cannot handle multi-pattern matching. The ACBM algorithm combines the ideas of the AC and BM algorithms; its average-case efficiency is better than that of the AC algorithm, with a time complexity of […]. Although the ACBM algorithm performs well in practice, it works poorly for Chinese and does not make full use of the characteristics of the pattern strings and of Chinese text, so matching is slow.
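The single-scan behaviour attributed to the AC algorithm above can be illustrated with a compact textbook sketch. It is background only and not part of the method of the invention; the function names `build_automaton` and `search` and the sample strings are illustrative assumptions.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure and output tables for a set of patterns."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns recognised when a state is reached
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    # Breadth-first pass: a state's failure link points to the longest
    # proper suffix of its path that is also a path from the root.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out


def search(text, patterns):
    """Return (start_index, pattern) for every occurrence, in one scan."""
    goto, fail, out = build_automaton(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits
```

Each state stores its transitions in a dictionary here, which sidesteps the cost of a dense transition table over a very large character set.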
The inefficiency arises because the basic structural unit of English is the word, whereas the basic structural unit of Chinese is the character, which makes sensitive-word detection very different in the two languages. For English, detection matches against 26 letters in turn, whereas for Chinese it matches against tens of thousands of Chinese characters. Consequently, when the alphabet of a string matching algorithm grows from 26 English letters to tens of thousands of Chinese characters, the algorithm no longer achieves its expected performance in either time or space. In addition, Chinese characters are multi-byte symbols and have attributes that English letters lack, such as pinyin, which existing algorithms do not fully exploit.
Summary of the invention
The present invention provides a recognition processing method and device for multiple Chinese sensitive words and phrases, to solve the problem in the prior art that matching Chinese sensitive words and phrases is slow.
In a first aspect, the present invention provides a recognition processing method for multiple Chinese sensitive words and phrases, including:
obtaining multiple preset sensitive words and phrases;
building a suffix tree from the sensitive words and phrases;
obtaining Chinese text to be identified;
matching the Chinese text to be identified against the suffix tree;
if the match succeeds, obtaining the sensitive words and phrases in the Chinese text to be identified and outputting them for display.
Optionally, building the suffix tree from the sensitive words and phrases includes:
S21: building a pattern string set P(P1, P2, P3, P4, P5 ... Pn) from the multiple preset sensitive words and phrases;
S22: setting a root node whose attribute value is a first preset value, the first preset value being the ordinal value of any pinyin letter;
S23: selecting any sensitive word or phrase Pi from the pattern string set, the string length of Pi being m;
S24: obtaining the m-th character of Pi, parsing the m-th character to obtain the initial letter of its pinyin, and obtaining the ordinal value of the initial letter from a preset correspondence between pinyin letters and ordinal values;
S25: judging whether the ordinal value of the initial letter is less than the first preset value; if so, placing the node for the m-th character on the left side of the root node; otherwise, placing it on the right side of the root node;
S26: obtaining the (m-1)-th, (m-2)-th, ..., 2nd, and 1st characters of Pi in turn and repeating steps S24-S25, so that the nodes for the (m-1)-th, (m-2)-th, ..., 2nd, and 1st characters are placed as child nodes of the nodes for the m-th, (m-1)-th, ..., 2nd characters, respectively.
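The construction steps can be sketched as follows. This is one reading of the steps, not a definitive implementation: it assumes that each node keeps two groups of children (left for smaller ordinal values, right otherwise), that a non-root node's attribute value is the ordinal value of its own character, and that the root value is 13. `PINYIN_INITIAL` is a hypothetical lookup table covering only the sample characters; a real system would need a full character-to-pinyin dictionary.

```python
from dataclasses import dataclass, field

# Hypothetical table: pinyin initial of each sample character.
PINYIN_INITIAL = {"笨": "b", "色": "s", "情": "q", "你": "n", "爱": "a",
                  "法": "f", "国": "g", "他": "t", "讨": "t", "厌": "y"}


def ordinal(ch: str) -> int:
    """Ordinal value of the pinyin initial: a -> 1, b -> 2, ..., z -> 26."""
    return ord(PINYIN_INITIAL[ch]) - ord("a") + 1


@dataclass
class Node:
    value: int                                  # value that children are compared against
    char: str = ""                              # empty for the root
    left: dict = field(default_factory=dict)    # children with ordinal < value
    right: dict = field(default_factory=dict)   # children with ordinal >= value
    is_end: bool = False                        # True where a whole pattern is complete


def build_tree(patterns, root_value=13):
    root = Node(value=root_value)               # root node with the first preset value
    for p in patterns:                          # take each pattern string in turn
        node = root
        for ch in reversed(p):                  # last character first, then m-1, ..., 1
            v = ordinal(ch)
            side = node.left if v < node.value else node.right
            node = side.setdefault(ch, Node(value=v, char=ch))
        node.is_end = True                      # reached the pattern's first character
    return root
```

Because characters are inserted from the last to the first, patterns that share an ending share a path from the root, and a lookup at any node only has to search one of the two child groups.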
Optionally, matching the Chinese text to be identified against the suffix tree includes: matching the Chinese text to be identified using the BM algorithm according to the suffix tree.
Optionally, the sensitive words and phrases include single characters, phrases, and sentences.
Optionally, the method further includes: sending a prompt message if the match fails.
In a second aspect, the present invention provides a recognition processing device for multiple Chinese sensitive words and phrases, including:
a first acquisition module, for obtaining multiple preset sensitive words and phrases;
a processing module, for building a suffix tree from the sensitive words and phrases;
a second acquisition module, for obtaining Chinese text to be identified;
a matching module, for matching the Chinese text to be identified against the suffix tree;
a display module, for obtaining the sensitive words and phrases in the Chinese text to be identified and outputting them for display after the match succeeds.
Optionally, the processing module is specifically configured for:
S21: building a pattern string set P(P1, P2, P3, P4, P5 ... Pn) from the multiple preset sensitive words and phrases;
S22: setting a root node whose attribute value is a first preset value, the first preset value being the ordinal value of any pinyin letter;
S23: selecting any sensitive word or phrase Pi from the pattern string set, the string length of Pi being m;
S24: obtaining the m-th character of Pi, parsing the m-th character to obtain the initial letter of its pinyin, and obtaining the ordinal value of the initial letter from a preset correspondence between pinyin letters and ordinal values;
S25: judging whether the ordinal value of the initial letter is less than the first preset value; if so, placing the node for the m-th character on the left side of the root node; otherwise, placing it on the right side of the root node;
S26: obtaining the (m-1)-th, (m-2)-th, ..., 2nd, and 1st characters of Pi in turn and repeating steps S24-S25, so that the nodes for the (m-1)-th, (m-2)-th, ..., 2nd, and 1st characters are placed as child nodes of the nodes for the m-th, (m-1)-th, ..., 2nd characters, respectively.
Optionally, the matching module is specifically configured for: matching the Chinese text to be identified using the BM algorithm according to the suffix tree.
Optionally, the sensitive words and phrases include single characters, phrases, and sentences.
Optionally, the display module is further configured for: sending a prompt message if the match fails.
As can be seen from the above technical solutions, the recognition processing method and device of the invention for multiple Chinese sensitive words and phrases parse the obtained preset sensitive words and phrases and build a suffix tree using the ordinal values of pinyin letters. After the Chinese text to be identified is obtained, it is matched against the suffix tree, branching according to the ordinal value of each character's pinyin letter; when the match succeeds, the sensitive words and phrases in the Chinese text to be identified are obtained and output for display. By exploiting the characteristics of Chinese, the matching time of a pattern string on the suffix tree is improved from […] to […], which saves time and increases the matching speed of pattern strings on the suffix tree. The approach is suitable for Chinese string matching with multiple sensitive words and phrases.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the recognition processing method for multiple Chinese sensitive words and phrases provided by Embodiment 1 of the invention;
Fig. 2 is a structural diagram of the suffix tree provided by an embodiment of the invention;
Fig. 3 is a schematic structural diagram of the recognition processing device for multiple Chinese sensitive words and phrases provided by Embodiment 2 of the invention.
Detailed description of the embodiments
Specific embodiments of the invention are described in further detail below with reference to the drawings and examples. The following embodiments illustrate the invention but do not limit its scope.
Fig. 1 shows a recognition processing method for multiple Chinese sensitive words and phrases provided by Embodiment 1 of the invention, including:
S11: obtaining multiple preset sensitive words and phrases.
In this step, it should be noted that, in the embodiments of the invention, the sensitive words and phrases are preset in advance. They may typically include single characters, phrases, and sentences. Single characters are, for example, characters meaning "stupid"; phrases are, for example, "wretch" and "violence"; a sentence is, for example, "I hate China".
S12: building a suffix tree from the sensitive words and phrases.
In this step, it should be noted that, in the embodiments of the invention, a suffix tree is built so that sensitive words and phrases can later be matched in text information, as follows:
S21: building a pattern string set P(P1, P2, P3, P4, P5 ... Pn) from the multiple preset sensitive words and phrases;
S22: setting a root node whose attribute value is a first preset value, the first preset value being the ordinal value of any pinyin letter;
S23: selecting any sensitive word or phrase Pi from the pattern string set, the string length of Pi being m;
S24: obtaining the m-th character of Pi, parsing the m-th character to obtain the initial letter of its pinyin, and obtaining the ordinal value of the initial letter from a preset correspondence between pinyin letters and ordinal values;
S25: judging whether the ordinal value of the initial letter is less than the first preset value; if so, placing the node for the m-th character on the left side of the root node; otherwise, placing it on the right side of the root node;
S26: obtaining the (m-1)-th, (m-2)-th, ..., 2nd, and 1st characters of Pi in turn and repeating steps S24-S25, so that the nodes for the (m-1)-th, (m-2)-th, ..., 2nd, and 1st characters are placed under the nodes for the m-th, (m-1)-th, ..., 2nd characters, respectively.
The above steps are explained with a specific example:
As shown in Fig. 2, assume the pattern string set is P(P1, P2, P3, P4), where P1 is 笨 ("stupid"), P2 is 色情 ("pornographic"), P3 is 你爱法国 ("you love France"), and P4 is 他讨厌法国 ("he hates France").
A root node is set, and the attribute value of the root node is 13.
Obtain the sensitive word P1, whose string length is 1. Parsing the character 笨 gives the pinyin initial letter "b", and the preset correspondence between pinyin letters and ordinal values gives the ordinal value 2 for this letter. Since 2 is less than the root node's attribute value 13, the node for 笨 is placed on the left side of the root node as a child node of the root node.
Obtain the sensitive word P2, whose string length is 2. Parsing the 2nd character 情 gives the pinyin initial letter "q", whose ordinal value is 17 according to the preset correspondence. Since 17 is greater than the root node's attribute value 13, the node for 情 is placed on the right side of the root node as a child node of the root node. Parsing the 1st character 色 gives the pinyin initial letter "s", whose ordinal value is 19. Since 19 is greater than the attribute value 17 of the 情 node, the node for 色 is placed on the right side of the 情 node as a child node of the 情 node.
Obtain the sensitive phrase P3, whose string length is 4. Parsing the 4th character 国 gives the pinyin initial letter "g", whose ordinal value is 7 according to the preset correspondence. Since 7 is less than the root node's attribute value 13, the node for 国 is placed on the left side of the root node as a child node of the root node. The characters 法, 爱, and 你 are then processed in the same way in turn, which is not repeated here; see Fig. 2.
Obtain the sensitive phrase P4, whose string length is 5. Parsing the 5th character 国 gives the pinyin initial letter "g", whose ordinal value is 7 according to the preset correspondence. Since 7 is less than the root node's attribute value 13, the node for 国 is placed on the left side of the root node as a child node of the root node. The characters 法, 厌, 讨, and 他 are then processed in the same way in turn, which is not repeated here; see Fig. 2.
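The placements in this example can be traced with a few lines of code. This is a minimal sketch that assumes the example patterns are the Chinese strings 笨, 色情, 你爱法国, and 他讨厌法国, that ordinal values follow alphabet position (a = 1 ... z = 26), and that each node's attribute value is the ordinal value of its own character. The table `PINYIN_INITIAL` and the function `placements` are hypothetical names introduced for illustration.

```python
# Hypothetical table: pinyin initial of each character in the example.
PINYIN_INITIAL = {"笨": "b", "色": "s", "情": "q", "你": "n", "爱": "a",
                  "法": "f", "国": "g", "他": "t", "讨": "t", "厌": "y"}


def ordinal(ch):
    """Alphabet position of the character's pinyin initial (a -> 1 ... z -> 26)."""
    return ord(PINYIN_INITIAL[ch]) - ord("a") + 1


def placements(pattern, root_value=13):
    """List (character, ordinal, side) from the last character to the first.

    The side is decided by comparing the character's ordinal value with the
    attribute value of its parent node (the root value for the last character).
    """
    parent_value = root_value
    steps = []
    for ch in reversed(pattern):
        v = ordinal(ch)
        steps.append((ch, v, "left" if v < parent_value else "right"))
        parent_value = v
    return steps


if __name__ == "__main__":
    for p in ["笨", "色情", "你爱法国", "他讨厌法国"]:
        print(p, placements(p))
```

For 色情 this gives 情 with ordinal value 17 on the right of the root, followed by 色 with ordinal value 19 on the right of 情, in line with the walkthrough.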
S13: obtaining Chinese text to be identified.
In this step, it should be noted that, in the embodiments of the invention, the Chinese text to be identified may be a published article, a comment, a message, or the like.
S14: matching the Chinese text to be identified against the suffix tree.
In this step, it should be noted that when the length of a pattern string Pi is greater than the character length of the text, Pi cannot be found in the text. The character length of the Chinese text must therefore be greater than the character length of the pattern strings, i.e. len(T) > maxlen(Pi).
Matching the Chinese text to be identified using the BM algorithm according to the suffix tree may specifically include:
(1) Based on the length minlen(Pi) of the shortest pattern string, selecting position minlen(Pi) of the target string as the starting match position, and performing BM matching using the tree.
(2) If a character comparison mismatches, applying the two heuristic rules, namely the bad character rule and the good suffix rule.
(3) If a character comparison matches, first comparing the character immediately to the left with the matched character, where the comparison is based on the values assigned to the characters' pinyin. If the character to the left is less than the matched character, searching the left child nodes; otherwise, searching the right child nodes.
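The matching steps are described only briefly, so the following is one possible reading rather than the procedure of the invention. It is a simplified set-Horspool-style matcher: it walks the tree from the window end towards the left, choosing the left or right child group by pinyin ordinal value, and after each attempt shifts by a bad-character distance capped at the shortest pattern length. The good suffix rule is omitted. The table `PINYIN_INITIAL`, the root value 13, and all function names are illustrative assumptions.

```python
# Hypothetical pinyin-initial table covering only the sample characters.
PINYIN_INITIAL = {"笨": "b", "色": "s", "情": "q", "你": "n", "爱": "a",
                  "法": "f", "国": "g", "他": "t", "讨": "t", "厌": "y"}


def ordinal(ch):
    """1..26 for a known character's pinyin initial, 0 for anything else."""
    letter = PINYIN_INITIAL.get(ch)
    return ord(letter) - ord("a") + 1 if letter else 0


class Node:
    def __init__(self, value):
        self.value = value    # ordinal value that children are compared against
        self.left = {}        # children whose ordinal value is smaller
        self.right = {}       # children whose ordinal value is equal or larger
        self.pattern = None   # the full pattern, if one is complete at this node


def side(node, ch):
    """Pick the child group to search, as in step (3)."""
    return node.left if ordinal(ch) < node.value else node.right


def build_tree(patterns, root_value=13):
    root = Node(root_value)
    for p in patterns:
        node = root
        for ch in reversed(p):
            node = side(node, ch).setdefault(ch, Node(ordinal(ch)))
        node.pattern = p
    return root


def find_sensitive(text, patterns):
    """Return (start_index, pattern) for each occurrence found."""
    root = build_tree(patterns)
    lmin = min(len(p) for p in patterns)
    # Bad-character shift: distance from a character's rightmost non-final
    # occurrence to the pattern end, never more than the shortest length.
    shift = {}
    for p in patterns:
        for j, ch in enumerate(p[:-1]):
            shift[ch] = min(shift.get(ch, lmin), len(p) - 1 - j)
    hits = []
    pos = lmin - 1                      # window end starts at position minlen
    while pos < len(text):
        node, i = root, pos
        while i >= 0:                   # compare right to left along the tree
            node = side(node, text[i]).get(text[i])
            if node is None:            # mismatch: stop this attempt
                break
            if node.pattern is not None:
                hits.append((i, node.pattern))
            i -= 1
        pos += shift.get(text[pos], lmin)
    return hits
```

Capping the shift at the shortest pattern length keeps the scan from jumping over an occurrence of a short pattern, at the price of smaller jumps when the pattern lengths differ widely.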
S15: if the match succeeds, obtaining the sensitive words and phrases in the Chinese text to be identified and outputting them for display.
In addition, if the match fails, a prompt message may be sent to indicate that the Chinese text can be published.
The recognition processing method for multiple Chinese sensitive words and phrases provided by Embodiment 1 of the invention parses the obtained preset sensitive words and phrases and builds a suffix tree using the ordinal values of pinyin letters. After the Chinese text to be identified is obtained, it is matched against the suffix tree, branching according to the ordinal value of each character's pinyin letter; when the match succeeds, the sensitive words and phrases in the Chinese text to be identified are obtained and output for display. By exploiting the characteristics of Chinese, the matching time of a pattern string on the suffix tree is improved from […] to […], which saves time and increases the matching speed of pattern strings on the suffix tree. The approach is suitable for Chinese string matching with multiple sensitive words and phrases.
Fig. 3 shows a recognition processing device for multiple Chinese sensitive words and phrases provided by Embodiment 2 of the invention, including a first acquisition module 21, a processing module 22, a second acquisition module 23, a matching module 24, and a display module 25, wherein:
the first acquisition module 21 is for obtaining multiple preset sensitive words and phrases;
the processing module 22 is for building a suffix tree from the sensitive words and phrases;
the second acquisition module 23 is for obtaining Chinese text to be identified;
the matching module 24 is for matching the Chinese text to be identified against the suffix tree;
the display module 25 is for obtaining the sensitive words and phrases in the Chinese text to be identified and outputting them for display after the match succeeds.
The processing module is specifically configured for:
S21: building a pattern string set P(P1, P2, P3, P4, P5 ... Pn) from the multiple preset sensitive words and phrases;
S22: setting a root node whose attribute value is a first preset value, the first preset value being the ordinal value of any pinyin letter;
S23: selecting any sensitive word or phrase Pi from the pattern string set, the string length of Pi being m;
S24: obtaining the m-th character of Pi, parsing the m-th character to obtain the initial letter of its pinyin, and obtaining the ordinal value of the initial letter from a preset correspondence between pinyin letters and ordinal values;
S25: judging whether the ordinal value of the initial letter is less than the first preset value; if so, placing the node for the m-th character on the left side of the root node; otherwise, placing it on the right side of the root node;
S26: obtaining the (m-1)-th, (m-2)-th, ..., 2nd, and 1st characters of Pi in turn and repeating steps S24-S25, so that the nodes for the (m-1)-th, (m-2)-th, ..., 2nd, and 1st characters are placed as child nodes of the nodes for the m-th, (m-1)-th, ..., 2nd characters, respectively.
Because the device of Embodiment 2 of the invention works on the same principle as the method of the above embodiment, a more detailed explanation is not repeated here.
It should be noted that, in the embodiments of the invention, the relevant functional modules may be implemented by a hardware processor.
The recognition processing device for multiple Chinese sensitive words and phrases provided by Embodiment 2 of the invention parses the obtained preset sensitive words and phrases and builds a suffix tree using the ordinal values of pinyin letters. After the Chinese text to be identified is obtained, it is matched against the suffix tree, branching according to the ordinal value of each character's pinyin letter; when the match succeeds, the sensitive words and phrases in the Chinese text to be identified are obtained and output for display. By exploiting the characteristics of Chinese, the matching time of a pattern string on the suffix tree is improved from […] to […], which saves time and increases the matching speed of pattern strings on the suffix tree. The approach is suitable for Chinese string matching with multiple sensitive words and phrases.
In addition, those skilled in the art will appreciate that, although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices can be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words can be interpreted as names.
Those of ordinary skill in the art will understand that the above embodiments merely illustrate the technical solutions of the invention and do not limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope defined by the claims of the invention.
Claims (10)
1. A recognition processing method for multiple Chinese sensitive words and phrases, characterized by including:
obtaining multiple preset sensitive words and phrases;
building a suffix tree from the sensitive words and phrases;
obtaining Chinese text to be identified;
matching the Chinese text to be identified against the suffix tree;
if the match succeeds, obtaining the sensitive words and phrases in the Chinese text to be identified and outputting them for display.
2. The method according to claim 1, characterized in that building the suffix tree from the sensitive words and phrases includes:
S21: building a pattern string set P(P1, P2, P3, P4, P5 ... Pn) from the multiple preset sensitive words and phrases;
S22: setting a root node whose attribute value is a first preset value, the first preset value being the ordinal value of any pinyin letter;
S23: selecting any sensitive word or phrase Pi from the pattern string set, the string length of Pi being m;
S24: obtaining the m-th character of Pi, parsing the m-th character to obtain the initial letter of its pinyin, and obtaining the ordinal value of the initial letter from a preset correspondence between pinyin letters and ordinal values;
S25: judging whether the ordinal value of the initial letter is less than the first preset value; if so, placing the node for the m-th character on the left side of the root node; otherwise, placing it on the right side of the root node;
S26: obtaining the (m-1)-th, (m-2)-th, ..., 2nd, and 1st characters of Pi in turn and repeating steps S24-S25, so that the nodes for the (m-1)-th, (m-2)-th, ..., 2nd, and 1st characters are placed as child nodes of the nodes for the m-th, (m-1)-th, ..., 2nd characters, respectively.
3. The method according to claim 1, characterized in that matching the Chinese text to be identified against the suffix tree includes: matching the Chinese text to be identified using the BM algorithm according to the suffix tree.
4. The method according to claim 1, characterized in that the sensitive words and phrases include single characters, phrases, and sentences.
5. The method according to claim 1, characterized in that a prompt message is sent if the match fails.
6. A recognition processing device for multiple Chinese sensitive words and phrases, characterized by including:
a first acquisition module, for obtaining multiple preset sensitive words and phrases;
a processing module, for building a suffix tree from the sensitive words and phrases;
a second acquisition module, for obtaining Chinese text to be identified;
a matching module, for matching the Chinese text to be identified against the suffix tree;
a display module, for obtaining the sensitive words and phrases in the Chinese text to be identified and outputting them for display after the match succeeds.
7. The device according to claim 6, characterized in that the processing module is specifically configured for:
S21: building a pattern string set P(P1, P2, P3, P4, P5 ... Pn) from the multiple preset sensitive words and phrases;
S22: setting a root node whose attribute value is a first preset value, the first preset value being the ordinal value of any pinyin letter;
S23: selecting any sensitive word or phrase Pi from the pattern string set, the string length of Pi being m;
S24: obtaining the m-th character of Pi, parsing the m-th character to obtain the initial letter of its pinyin, and obtaining the ordinal value of the initial letter from a preset correspondence between pinyin letters and ordinal values;
S25: judging whether the ordinal value of the initial letter is less than the first preset value; if so, placing the node for the m-th character on the left side of the root node; otherwise, placing it on the right side of the root node;
S26: obtaining the (m-1)-th, (m-2)-th, ..., 2nd, and 1st characters of Pi in turn and repeating steps S24-S25, so that the nodes for the (m-1)-th, (m-2)-th, ..., 2nd, and 1st characters are placed as child nodes of the nodes for the m-th, (m-1)-th, ..., 2nd characters, respectively.
8. The device according to claim 6, characterized in that the matching module is specifically configured for: matching the Chinese text to be identified using the BM algorithm according to the suffix tree.
9. The device according to claim 6, characterized in that the sensitive words and phrases include single characters, phrases, and sentences.
10. The device according to claim 6, characterized in that the display module is further configured for: sending a prompt message if the match fails.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710072161.5A CN106951437B (en) | 2017-02-08 | 2017-02-08 | Identifying processing method and device suitable for the sensitive words and phrases of multiple Chinese |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106951437A true CN106951437A (en) | 2017-07-14 |
CN106951437B CN106951437B (en) | 2019-11-01 |
Family
ID=59465486
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710072161.5A Active CN106951437B (en) | 2017-02-08 | 2017-02-08 | Identifying processing method and device suitable for the sensitive words and phrases of multiple Chinese |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106951437B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111062199A (en) * | 2019-11-05 | 2020-04-24 | 北京中科微澜科技有限公司 | Bad information identification method and device |
CN111159990A (en) * | 2019-12-06 | 2020-05-15 | 国家计算机网络与信息安全管理中心 | Method and system for recognizing general special words based on mode expansion |
CN111831785A (en) * | 2020-07-16 | 2020-10-27 | 平安科技(深圳)有限公司 | Sensitive word detection method and device, computer equipment and storage medium |
CN113157904A (en) * | 2021-03-30 | 2021-07-23 | 北京优医达智慧健康科技有限公司 | Sensitive word filtering method and system based on DFA algorithm |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103514238A (en) * | 2012-06-30 | 2014-01-15 | 重庆新媒农信科技有限公司 | Sensitive word recognition processing method based on classification searching |
US20150100304A1 (en) * | 2013-10-07 | 2015-04-09 | Xerox Corporation | Incremental computation of repeats |
CN105843950A (en) * | 2016-04-12 | 2016-08-10 | 乐视控股(北京)有限公司 | Sensitive word filtering method and device |
Non-Patent Citations (1)
Title |
---|
LJSSPACE: "Text matching algorithm using a suffix tree (后缀树(Suffix Tree)的文本匹配算法)", 《HTTPS://BLOG.CSDN.NET/LJSSPACE/ARTICLE/DETAILS/6571467》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111062199A (en) * | 2019-11-05 | 2020-04-24 | 北京中科微澜科技有限公司 | Bad information identification method and device |
CN111062199B (en) * | 2019-11-05 | 2023-12-22 | 北京中科微澜科技有限公司 | Bad information identification method and device |
CN111159990A (en) * | 2019-12-06 | 2020-05-15 | 国家计算机网络与信息安全管理中心 | Method and system for recognizing general special words based on mode expansion |
CN111159990B (en) * | 2019-12-06 | 2022-09-30 | 国家计算机网络与信息安全管理中心 | Method and system for identifying general special words based on pattern expansion |
CN111831785A (en) * | 2020-07-16 | 2020-10-27 | 平安科技(深圳)有限公司 | Sensitive word detection method and device, computer equipment and storage medium |
CN113157904A (en) * | 2021-03-30 | 2021-07-23 | 北京优医达智慧健康科技有限公司 | Sensitive word filtering method and system based on DFA algorithm |
CN113157904B (en) * | 2021-03-30 | 2024-02-09 | 北京优医达智慧健康科技有限公司 | Sensitive word filtering method and system based on DFA algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN106951437B (en) | 2019-11-01 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110321432B (en) | Text event information extraction method, electronic device and nonvolatile storage medium | |
Li et al. | Text-level discourse dependency parsing | |
Nguyen et al. | Relation extraction: Perspective from convolutional neural networks | |
CN106951437A (en) | Identifying processing method and device suitable for the sensitive words and phrases of multiple Chinese | |
Fonseca et al. | Mac-morpho revisited: Towards robust part-of-speech tagging | |
Bartoli et al. | Automatic synthesis of regular expressions from examples | |
WO2017084267A1 (en) | Method and device for keyphrase extraction | |
CN105095204B (en) | The acquisition methods and device of synonym | |
US9558299B2 (en) | Submatch extraction | |
Filice et al. | Kelp: a kernel-based learning platform for natural language processing | |
US9460196B2 (en) | Conditional string search | |
CN104252484B (en) | A kind of phonetic error correction method and system | |
US20140214401A1 (en) | Method and device for error correction model training and text error correction | |
CN111159363A (en) | Knowledge base-based question answer determination method and device | |
CN111444330A (en) | Method, device and equipment for extracting short text keywords and storage medium | |
CN111339268B (en) | Entity word recognition method and device | |
CN105593845B (en) | Generating means and its method based on the arrangement corpus for learning by oneself arrangement, destructive expression morpheme analysis device and its morpheme analysis method using arrangement corpus | |
WO2014117549A1 (en) | Method and device for error correction model training and text error correction | |
WO2022222300A1 (en) | Open relationship extraction method and apparatus, electronic device, and storage medium | |
CN103761225B (en) | A kind of Chinese word semantic similarity calculation method of data-driven | |
Keraghel et al. | Data augmentation process to improve deep learning-based ner task in the automotive industry field | |
JP6558852B2 (en) | Clause identification apparatus, method, and program | |
US20160196303A1 (en) | String search device, string search method, and string search program | |
KR101663038B1 (en) | Entity boundary detection apparatus in text by usage-learning on the entity's surface string candidates and method thereof | |
Celebi et al. | Segmenting hashtags using automatically created training data | |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |