CN1979482A - Specific text information processing method based on key tree and system therefor - Google Patents
- Publication number
- CN1979482A (application CN 200610114356)
- Authority
- CN
- China
- Prior art keywords
- character
- particular text
- character string
- node
- blacklist
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method and system for processing specific text information based on a key tree. The method comprises: storing the specific text information; generating a key tree that contains the specific text information; using the key tree to search whether a given character string is contained in the specific text information held in the tree; and outputting the search result. The invention increases the speed of processing specific text information, reduces false positives, and improves the efficiency of the overall business process.
Description
Technical field
The present invention relates to text information processing technology, in particular to fast matching, searching and checking of text information based on a key tree, and specifically to a method and system for processing specific text information based on a key tree.
Background art
In banking, business checks against specific text information are performed frequently. A typical case is the blacklist check in foreign-currency remittance: if an organization or individual is on the blacklist, the corresponding transaction is stopped. For each such transaction the bank must therefore query the text involved against an information library to determine whether specific text information is involved, and then take appropriate measures. However, because the text library to be checked is very large, a character-matching method is usually used to perform fuzzy queries. This method is slow and prone to false positives, which require manual judgment by operators, so overall business processing efficiency is poor.
Nevertheless, business checks against specific text information are widely used in the prior art. For example, the United Nations imposes sanctions on certain countries, organizations, companies and individuals; the United States, Japan and other countries also publish lists of sanctioned countries, organizations, companies and individuals. These lists are collectively called blacklists, and a blacklist is one kind of specific text information. When a remittance passes through a bank in one of the countries concerned, the banking system checks whether the remitter, payee and so on appear in the blacklist; if text such as the remitter or payee corresponds to an entry recorded in the blacklist, the banking system freezes the funds. To protect customers' funds, a banking system may also check sensitive information itself, for example to prevent a customer from remitting money to a blacklisted overseas account, thereby reducing the risk that the customer's funds are frozen by another bank through an oversight.
Fig. 1 shows a prior-art network system for monitoring a blacklist. When handling a foreign-currency remittance, a bank teller sends names such as the remitter and payee from terminal 1 through gateway 2 to blacklist processing device 3 for checking. If the remitter, payee or other text contains blacklist information, the result is fed back to terminal 1 and the teller can stop the remittance. Blacklist processing device 3 uses fuzzy query by direct character matching: it must take each blacklist record in turn and match it against the substring starting at each character of the string being checked. In the worst case, where a string of length n matches none of 10,000 records (average length m), the number of comparisons is n × (m × 10000). This is inefficient, and no measure is provided to reduce false positives. Because the blacklist is large, the character matching and fuzzy query currently used by blacklist processing device 3 easily produce false positives that require manual judgment by staff, which reduces processing efficiency.
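The worst-case count above can be illustrated with a brute-force matcher. The following Python sketch is not taken from the patent; the function name `naive_contains` and its return shape are assumptions made for illustration. It tries every record at every starting offset of the checked string and counts character comparisons, so the count can never exceed n × m × (number of records).

```python
def naive_contains(text, records):
    """Brute-force direct character matching (illustrative only).

    Tries each record at each starting offset of `text` and returns
    (matched, comparisons), where `comparisons` counts character tests.
    """
    comparisons = 0
    for record in records:
        for start in range(len(text)):
            k = 0
            while k < len(record) and start + k < len(text):
                comparisons += 1
                if text[start + k] != record[k]:
                    break
                k += 1
            if k == len(record):
                # Every character of the record matched at this offset.
                return True, comparisons
    return False, comparisons


matched, count = naive_contains("abc", ["bd", "bc"])
```

Because every record is rescanned for every query, the cost grows with the size of the library, which is the weakness the background section describes.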
Summary of the invention
The invention provides a method and system for processing specific text information based on a key tree, in order to increase the speed of processing specific text information, reduce false positives, and improve overall business processing efficiency.
One object of the invention is to provide a method for processing specific text information based on a key tree, comprising the following steps: storing the specific text information; generating a key tree that contains the specific text information; and, according to the key tree, searching whether a given character string is contained in the specific text information held in the key tree, then outputting the search result.
The specific text information consists of specific text character strings, and the key tree consists of a plurality of nodes connected in a tree. Each character of a specific text character string corresponds to one node in the key tree. Each node has a key value, a lower-layer pointer, a same-layer pointer and the layer number of the node. The key value holds one character of a specific text character string. A node whose lower-layer pointer is null is a leaf node, and a node whose layer number is one is a root node. The lower-layer pointer of a node in an upper layer points to the node with the smallest key value among all its nodes in the adjacent lower layer, and the same-layer pointer of a node points to the node in the same layer with the next larger key value.
The step of generating the key tree containing the specific text information comprises:
Step 1: read the first character of the blacklist character string to be added.
Step 2: determine whether a key value in the first layer of the key tree matches this character; if so, go to step 7, otherwise go to step 3.
Step 3: insert a root node by adding a record to the blacklist record table, with the key value set to this character, the layer number set to one, the child pointer null and the right pointer null.
Step 4: read the next character of the string.
Step 5: determine whether it is the null character; if so, the addition ends, otherwise go to step 6.
Step 6: insert a child node by adding a record to the blacklist record table, with the key value set to this character, the layer number set to the layer number of the previous character's record plus one, the child pointer null and the right pointer null; also update the child pointer of the previous character's record to point to the position of the new record; go to step 4.
Step 7: read the next character of the string.
Step 8: determine whether it is the null character; if so, go to step 9, otherwise go to step 11.
Step 9: determine whether the previous character is a leaf node; if so, the addition ends, otherwise go to step 10.
Step 10: delete all leaf nodes below the previous character, together with the corresponding records in the blacklist record table.
Step 11: determine whether a key value in the next layer of the key tree matches this character; if so, go to step 12, otherwise go to step 13.
Step 12: determine whether this character is a leaf node; if so, the addition ends, otherwise go to step 7.
Step 13: insert a child node and update the related records in the blacklist record table; go to step 7.
The method of the invention further comprises a step of searching for a specific text character string, which comprises:
Step 1': read the first character of the specific text character string to be checked.
Step 2': determine whether a key value in the first layer of the key tree matches this character; if so, go to step 3'; otherwise end the search and output the result that the string is not in the stored specific text information.
Step 3': read the next character of the string to be checked.
Step 4': determine whether the current character is the null character; if so, go to step 5', otherwise go to step 6'. (The null character is the terminating character of the specific text character string. For example, in the string "lei" the character following "i" is the null character, which also serves as the terminator in the logic.)
Step 5': end the search and determine whether the previous character is a leaf node; if so, output the result that the string belongs to the stored specific text information; otherwise output the result that it does not.
Step 6': determine whether a key value in the next layer of the key tree matches this character; if so, go to step 3'; otherwise end the search and output the result that the string is not in the stored specific text information.
The method of the invention further comprises a step of processing false-positive text information, in which text that is easily misjudged is checked first. If the result is a match, the step of searching for the given character string need not be entered; otherwise the given character string is searched.
The specific text information is a blacklist and/or a whitelist.
A further object of the invention is to provide a system for processing specific text information based on a key tree, comprising a terminal, a gateway and a specific text information processing device, the terminal being connected to the specific text information processing device through the gateway. The specific text information processing device further comprises: a data storage unit for storing the specific text information; a key tree generation unit for generating a key tree that contains the specific text information; and a character string search unit for searching, according to the key tree, whether a given character string received through the terminal is contained in the specific text information held in the key tree, and then outputting the search result.
The specific text information consists of specific text character strings, and the key tree consists of a plurality of nodes connected in a tree. Each character of a specific text character string corresponds to one node in the key tree. Each node has a key value, a lower-layer pointer, a same-layer pointer and the layer number of the node. The key value holds one character of a specific text character string. A node whose same-layer pointer is null is a root node, and a node whose lower-layer pointer is null is a leaf node. The lower-layer pointer of a node in an upper layer points to the node with the smallest key value among all its nodes in the adjacent lower layer, and the same-layer pointer of a node points to the node in the same layer with the next larger key value.
The key tree generation unit comprises a specific text character string adding part, which inserts a given character string into the nodes of the key tree character by character, according to the key tree.
The character string search unit comprises a specific text character string search part, which determines, according to the key tree, whether a character string input through the terminal is present in the stored specific text information. If so, it outputs the result that the given character string belongs to the stored specific text information; otherwise it outputs the result that the given character string is not in the stored specific text information.
The specific text information processing device further comprises a false-positive text information unit, which checks text that is easily misjudged. If the result is a match, the given character string need not be passed to the specific text character string search unit; otherwise the specific text character string search unit searches for the given character string.
The specific text information is a blacklist and/or a whitelist.
The method and system for processing specific text information based on a key tree provided by the invention are an optimized text matching method and system. The text library is organized as a key tree, which speeds up searching, and names that are clearly false positives are added to a false-positive text library, which improves the accuracy of checking.
The beneficial effects of the invention are:
First, the checking speed of the invention is much higher than that of direct character matching. In general there are 26 key trees, each rooted at an initial letter. A character string is searched once in the key tree from its first character to its last, regardless of the number of records in the text library; the theoretical number of comparisons per check is n × m. The speed-up is evident: the time taken by direct character matching depends on the number of records in the text library, essentially linearly, whereas the key tree comparison is independent of the number of records. The improvement in matching speed therefore becomes more pronounced as the text library grows.
Second, the invention greatly reduces the false-positive rate caused by fuzzy queries, mainly by querying the false-positive text library, which minimizes the chance of a false positive and ultimately improves overall efficiency.
Brief description of the drawings
Fig. 1 is a network structure diagram in which the blacklist processing device is used;
Fig. 2a and Fig. 2b are functional block diagrams of the blacklist processing device;
Fig. 3 is a flow chart of adding a blacklist entry;
Fig. 4 is a flow chart of a blacklist query;
Fig. 5 is a schematic diagram of a concrete example of a key tree.
Detailed description of the embodiments
The invention mainly improves on the direct character matching query method currently in use: it organizes the text information library as a key tree and performs string matching with a key tree query method. In addition, the invention combines the text information library with a false-positive text information library, which reduces the chance of false positives and further improves the accuracy and efficiency of queries. Specific embodiments of the invention are described below with reference to the drawings.
The invention provides a method for processing specific text information based on a key tree, comprising the following steps: storing the specific text information; generating a key tree that contains the specific text information; and, according to the key tree, searching whether a given character string is contained in the specific text information held in the key tree, then outputting the search result.
The specific text information consists of specific text character strings, and the key tree consists of a plurality of nodes connected in a tree. Each character of a specific text character string corresponds to one node in the key tree. Each node has a key value, a lower-layer pointer, a same-layer pointer and the layer number of the node. The key value holds one character of a specific text character string. A node whose same-layer pointer is null is a root node, and a node whose lower-layer pointer is null is a leaf node. The lower-layer pointer of a node in an upper layer points to the node with the smallest key value among all its nodes in the adjacent lower layer, and the same-layer pointer of a node points to the node in the same layer with the next larger key value.
Embodiment 1
The specific text information may be a "blacklist". In this technical solution the key tree of the text information library consists of a plurality of nodes connected in a tree. Each character of a character string corresponds to one node in the key tree, and each node has four attributes: "key value", "child pointer" (the lower-layer pointer), "right pointer" (the same-layer pointer) and "layer of this node". The "key value" holds one character of the string. The root node is in the first layer, its child nodes are in the second layer, and so on. The "child pointer" of a node points to the child with the smallest key value among all its children in the next layer. The "right pointer" of the node with the smallest key value in a layer points to the node with the next larger key value; likewise, the "right pointer" of that node points to the node with the next larger key value, and so on up to the node with the largest key value, whose right pointer is null. The "right pointer" of the root node is null. A node whose child pointer is null is a "leaf node". The false-positive text information library has the same structure. It stores text that does not belong to the text information library but is easily confused with information in it, and it is generated and queried in the same way as the blacklist.
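The four node attributes can be pictured with a small data structure. The following Python sketch is an illustration under assumptions: the class name `Node`, the field names and the helper `same_layer` are hypothetical, and the tree is built by hand for the two strings "lei" and "li" rather than by the adding procedure.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Node:
    key: str                          # "key value": one character
    layer: int                        # "layer of this node", starting at 1
    child: Optional["Node"] = None    # "child pointer": smallest-key node one layer down
    right: Optional["Node"] = None    # "right pointer": next larger key in the same layer

    @property
    def is_leaf(self) -> bool:
        # A node whose child pointer is null is a leaf node.
        return self.child is None


def same_layer(first: Optional[Node]) -> Iterator[Node]:
    """Follow right pointers from the smallest-key node of a layer."""
    node = first
    while node is not None:
        yield node
        node = node.right


# Hand-built key tree holding "lei" and "li".
l1 = Node("l", 1)
e2 = Node("e", 2)
i2 = Node("i", 2)
i3 = Node("i", 3)
l1.child = e2      # "e" is the smallest key under "l"
e2.right = i2      # "i" is the next larger key in layer 2
e2.child = i3      # "lei" ends at a leaf in layer 3

layer2_keys = [n.key for n in same_layer(l1.child)]
```

This is the usual first-child, next-sibling layout: each node needs only two pointers regardless of how many children it has.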
The key tree is initially empty. The text library key tree is built by reading each character string in the information library and repeatedly performing the adding operation below. The steps of the adding operation are shown in Fig. 3 and are as follows:
(1) read the first character of the character string to be added;
(2) determine whether a key value in the first layer of the key tree matches this character; if so, go to step (7), otherwise go to step (3);
(3) insert a root node;
(4) read the next character of the string;
(5) determine whether it is the null character; if so, the addition ends, otherwise go to step (6);
(6) insert a child node and go to step (4);
(7) read the next character of the string;
(8) determine whether it is the null character; if so, go to step (9), otherwise go to step (11);
(9) determine whether the previous character is a leaf node; if so, the addition ends, otherwise go to step (10);
(10) delete all leaf nodes below the previous character;
(11) determine whether a key value in the next layer of the key tree matches this character; if so, go to step (12), otherwise go to step (13);
(12) determine whether this character is a leaf node; if so, the addition ends, otherwise go to step (7);
(13) insert a child node and go to step (7).
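The adding steps above can be sketched in Python as follows. This is a minimal in-memory illustration, not the patented implementation: the names `Node`, `KeyTree`, `add` and `entries` are hypothetical, nodes are objects rather than table records, step (10) is modelled by clearing the child pointer, and the leaf test of step (12) is applied to every matched node, including the first layer, which is a simplifying assumption.

```python
class Node:
    def __init__(self, key, layer):
        self.key, self.layer = key, layer
        self.child = None   # smallest-key node in the next layer
        self.right = None   # next larger key in the same layer


class KeyTree:
    def __init__(self):
        self.first_root = None  # smallest-key node of layer 1

    def _find_or_insert(self, parent, key, layer):
        """Return (node, created) for `key` below `parent`, keeping keys ascending."""
        head = self.first_root if parent is None else parent.child
        prev, cur = None, head
        while cur is not None and cur.key < key:
            prev, cur = cur, cur.right
        if cur is not None and cur.key == key:
            return cur, False
        node = Node(key, layer)
        node.right = cur
        if prev is not None:
            prev.right = node
        elif parent is None:
            self.first_root = node
        else:
            parent.child = node
        return node, True

    def add(self, s):
        parent = None
        for layer, ch in enumerate(s, start=1):
            node, created = self._find_or_insert(parent, ch, layer)
            if not created and node.child is None:
                return  # an existing entry already ends here (step 12)
            parent = node
        if parent is not None:
            parent.child = None  # the string ends here: drop what hangs below (step 10)

    def entries(self):
        """List every root-to-leaf path, in ascending key order."""
        out = []

        def walk(node, prefix):
            while node is not None:
                word = prefix + node.key
                if node.child is None:
                    out.append(word)
                else:
                    walk(node.child, word)
                node = node.right

        walk(self.first_root, "")
        return out


tree = KeyTree()
for name in ["lei", "li", "abc"]:
    tree.add(name)
```

One consequence of marking the end of an entry with a leaf is that the stored set stays prefix-free: adding "leix" after "lei" changes nothing, while adding "le" afterwards replaces "lei".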
Next, the invention provides a text information checking method based on key tree search. Using the text library key tree, it searches whether a given character string is contained in the text information library held in the key tree. The steps (shown in Fig. 4) are as follows:
(1) read the first character of the name string to be checked;
(2) determine whether a key value in the first layer of the key tree matches this character; if so, go to step (3); otherwise end the check: the name is not in the text library;
(3) read the next character of the string;
(4) determine whether it is the null character; if so, go to step (5), otherwise go to step (6);
(5) end the check and determine whether the previous character is a leaf node; if so, the name belongs to the text library, otherwise it does not;
(6) determine whether a key value in the next layer of the key tree matches this character; if so, go to step (3); otherwise end the check: the name is not in the text library.
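The six checking steps can be sketched as a short loop. The Python below is a hedged illustration: `Node` and `contains` are hypothetical names, the end of the Python string stands in for the null character, and the tree for "lei" and "li" is built by hand so the block is self-contained.

```python
class Node:
    def __init__(self, key, child=None, right=None):
        self.key = key
        self.child = child   # smallest-key node in the next layer
        self.right = right   # next larger key in the same layer


def contains(first_root, s):
    """Return True if `s` is stored as a complete root-to-leaf path."""
    layer_head, last = first_root, None
    for ch in s:
        node = layer_head
        while node is not None and node.key != ch:
            node = node.right        # scan the current layer (steps 2 and 6)
        if node is None:
            return False             # no key in this layer matches
        last, layer_head = node, node.child
    # End of string reached (step 4): the last matched node must be a leaf (step 5).
    return last is not None and last.child is None


# Key tree holding "lei" and "li".
i3 = Node("i")
i2 = Node("i")
e2 = Node("e", child=i3, right=i2)
root = Node("l", child=e2)

results = {s: contains(root, s) for s in ["lei", "li", "le", "leix", "x", ""]}
```

The number of comparisons depends on the length of the checked string and on how many siblings each layer holds, not on how many entries the library contains.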
In addition, the check against the false-positive text library is used together with the text library check. Before the checking method begins, the easily misjudged text is checked first; if the result is a match, the text library check need not be entered, otherwise the text library check is performed. Depending on the circumstances, the text library check may instead be performed first; if the result is a match, the false-positive text library is then checked. If that result also matches, the earlier check was a false positive; otherwise the name does belong to the text library.
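The two orderings described above can be compared in a few lines. In this Python sketch the function `check_name`, its return labels and the flag `false_positive_first` are assumptions made for illustration, and plain sets stand in for the two key trees.

```python
def check_name(name, in_text_library, in_false_positive_library,
               false_positive_first=True):
    """Combine the text library check with the false-positive library check.

    Both lookups are passed in as callables so that any search routine
    (for example a key tree search) can be plugged in.
    """
    if false_positive_first:
        # Ordering 1: a false-positive match skips the text library check.
        if in_false_positive_library(name):
            return "false_positive"
        return "listed" if in_text_library(name) else "not_listed"
    # Ordering 2: only text library hits are re-checked.
    if not in_text_library(name):
        return "not_listed"
    return "false_positive" if in_false_positive_library(name) else "listed"


# Stand-in libraries; the names are made up.
text_library = {"lei", "li"}
false_positive_library = {"li"}

outcomes = [
    check_name(n, text_library.__contains__,
               false_positive_library.__contains__, first)
    for n in ["lei", "li", "wang"]
    for first in (True, False)
]
```

With these stand-in sets both orderings give the same answer; the second ordering simply avoids the extra lookup for names that never hit the text library.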
In the specific embodiments of the invention, the null character is the terminating character of the specific text character string. For example, in the string "lei" the character following "i" is the null character, which also serves as the terminator in the logic.
The invention provides a system for processing specific text information based on a key tree, comprising a terminal, a gateway and a specific text information processing device (shown in Fig. 2a), the terminal being connected to the specific text information processing device through the gateway. The specific text information processing device further comprises: a data storage unit for storing the specific text information; a key tree generation unit for generating a key tree that contains the specific text information; and a character string search unit for searching, according to the key tree, whether a given character string received through the terminal is contained in the specific text information held in the key tree, and then outputting the search result. The specific text information processing device also comprises a specific text character string adding unit, which inserts a given character string into the nodes of the key tree character by character, according to the key tree. It further comprises a false-positive text information unit (shown in Fig. 2a), which checks text that is easily misjudged; if the result is a match, the given character string need not be passed to the specific text character string search unit, otherwise the specific text character string search unit searches for the given character string.
The system of the invention is now described using a blacklist as an example. A blacklist processing device comprises a data processing device and a data storage device. The data processing device (shown in Fig. 2b) further comprises a blacklist file processing module, a blacklist adding module and a blacklist deleting module.
The blacklist file processing module imports the received blacklist file into a temporary worksheet, so that the blacklist adding module can read the blacklist character strings from the worksheet and build the key tree from them. Because blacklist files change only at long intervals, when a new blacklist file is received the blacklist deleting module first deletes all key tree records, and the blacklist adding module then rebuilds the key tree.
Names that are clearly false positives can be listed in a whitelist file, and the data processing device may further comprise a whitelist file processing module, a whitelist adding module and a whitelist deleting module, which work in the same way as the corresponding blacklist modules. The blacklist processing device can then automatically filter out the false positives recorded in the whitelist, greatly reducing manual intervention.
Adding a blacklist entry according to the invention comprises:
A) read the first character of the blacklist character string to be added;
B) determine whether a key value in the first layer of the key tree matches this character; if so, go to step G), otherwise go to step C);
C) insert a root node;
D) read the next character of the string;
E) determine whether it is the null character; if so, the addition ends, otherwise go to step F);
F) insert a child node and go to step D);
G) read the next character of the string;
H) determine whether it is the null character; if so, go to step I), otherwise go to step K);
I) determine whether the previous character is a leaf node; if so, the addition ends, otherwise go to step J);
J) delete all leaf nodes below the previous character;
K) determine whether a key value in the next layer of the key tree matches this character; if so, go to step L), otherwise go to step M);
L) determine whether this character is a leaf node; if so, the addition ends, otherwise go to step G);
M) insert a child node and go to step G).
The blacklist check according to the invention comprises:
a) read the first character of the name string to be checked;
b) determine whether a key value in the first layer of the key tree matches this character; if so, go to step c); otherwise end the check: the name is not on the blacklist;
c) read the next character of the string;
d) determine whether it is the null character; if so, go to step e), otherwise go to step f);
e) end the check and determine whether the previous character is a leaf node; if so, the name is on the blacklist, otherwise it is not;
f) determine whether a key value in the next layer of the key tree matches this character; if so, go to step c); otherwise end the check: the name is not on the blacklist.
As a first improvement to the blacklist check, a whitelist check is performed before the checking method begins, using the same checking method as for the blacklist. If the name is on the whitelist, the blacklist check need not be entered; otherwise the blacklist check is performed.
As a second improvement to the blacklist check, if the blacklist check finds that the name is on the blacklist, a whitelist check is then performed, using the same checking method as for the blacklist.
Each character of a blacklist character string corresponds to one node in the key tree, and each node has four attributes: "key value", "child pointer", "right pointer" and "layer of this node". The "key value" holds one character of the blacklist character string. The root node is in the first layer, its child nodes are in the second layer, and so on. The "child pointer" of a node points to the child with the smallest key value among all its children in the next layer. The "right pointer" of the node with the smallest key value in a layer points to the node with the next larger key value; likewise, the "right pointer" of that node points to the node with the next larger key value, and so on up to the node with the largest key value, whose right pointer is null. The "right pointer" of the root node is null. A node whose child pointer is null is a "leaf node".
As the above principle shows, adding blacklist entries is the process of building this key tree. The invention is described in detail below with reference to the drawings.
Fig. 1 is a network structure diagram in which the blacklist processing device is used. The network consists of terminal 1, gateway 2 and blacklist processing device 3.
Terminal 1 may be a PC or a dedicated terminal. Through this terminal the teller submits the blacklist character string to be checked, which gateway 2 forwards to blacklist processing device 3, and receives the blacklist check result that is fed back. The teller can also use the terminal to send add or delete instructions for the blacklist or whitelist to blacklist processing device 3, which then adds or deletes blacklist or whitelist entries.
Blacklist processing device 3 imports the blacklist and/or whitelist file into a temporary worksheet, uses the key tree method to add, delete and query blacklist or whitelist entries, and returns the result.
As shown in Fig. 2b, blacklist processing device 3 consists of data processing device 31 and data storage device 32. Data processing device 31 comprises blacklist file processing module 311, whitelist file processing module 312 (that is, the false-positive text information unit), blacklist adding module 313, blacklist deleting module 314, whitelist adding module 315 and whitelist deleting module 316.
Blacklist file processing module 311 temporarily imports the blacklist file into the blacklist worksheet of data storage device 32. Using the separator between character strings in the file as the marker, it places each blacklist character string in its own worksheet record, so that blacklist adding module 313 can add each blacklist entry.
Whitelist file processing module 312 temporarily imports the whitelist file into the whitelist worksheet of data storage device 32. Using the separator between character strings in the file as the marker, it places each whitelist character string in its own worksheet record, so that whitelist adding module 315 can add each whitelist entry.
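The import step performed by the file processing modules can be sketched as a simple split. The Python below is an assumption-laden illustration: the function name `load_worksheet`, the default newline separator and the trimming of blank entries are not specified in the source, which only states that a separator marks the boundary between character strings.

```python
def load_worksheet(file_text, separator="\n"):
    """Split a list file into worksheet records, one string per record.

    The separator is configurable because the source does not say which
    character the files use; empty pieces are dropped.
    """
    return [piece.strip() for piece in file_text.split(separator) if piece.strip()]


blacklist_worksheet = load_worksheet("lei\nli\n\n abc \n")
whitelist_worksheet = load_worksheet("wang;zhao", separator=";")
```

Each resulting record would then be handed to the corresponding adding module, one string at a time.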
Blacklist adding module 313 builds the key tree for each blacklist character string and stores the four attribute values of each node of the key tree as one record in the blacklist record table of data storage device 32.
Blacklist deleting module 314 directly empties the blacklist record table of data storage device 32.
Whitelist adding module 315 builds the key tree for each whitelist character string and stores the four attribute values of each node of the key tree as one record in the whitelist record table of data storage device 32.
Whitelist deleting module 316 directly empties the whitelist record table of data storage device 32.
Data storage device 32 holds four tables: the blacklist worksheet, the whitelist worksheet, the blacklist record table and the whitelist record table.
Because the adding flow is the same for the blacklist and the white list, only the blacklist adding flow is described below with reference to Fig. 3.
Step 100: read the first character of the blacklist string to be added.
Step 101: check whether the key value of this character is matched in the first layer of the key tree; if so, go to step 106, otherwise go to step 102.
Step 102: insert a root node by adding a record to the blacklist record table; the key value is this character, the layer number is "1", the child pointer is null and the right pointer is null.
Step 103: read the next character of the string.
Step 104: check whether it is the null character; if so, the addition ends, otherwise go to step 105.
Step 105: insert a child node by adding a record to the blacklist record table; the key value is this character, the layer number is the layer number of the previous character's record plus 1, the child pointer is null and the right pointer is null. At the same time, the child pointer of the previous character's record must be updated to point to the position of the new record. Go to step 103.
Step 106: read the next character of the string.
Step 107: check whether it is empty (the null character); if so, go to step 108, otherwise go to step 110.
Step 108: check whether the previous character is a leaf node; if so, the addition ends, otherwise go to step 109.
Step 109: delete all leaf nodes below the previous character, together with the corresponding records in the blacklist record table.
Step 110: check whether the key value of this character is matched in the next layer of the key tree; if so, go to step 111, otherwise go to step 112.
Step 111: check whether this character is a leaf node; if so, the addition ends, otherwise go to step 106.
Step 112: insert a child node and update the related records in the blacklist record table at the same time; go to step 106.
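The adding flow can be sketched in Python under a few loudly stated assumptions. The names `Record`, `KeyTree` and `first_root` are hypothetical, list indices stand in for record positions, and siblings are kept in ascending key order. The sketch applies the leaf check of step 111 to every matched node (the flow itself skips it for a first-layer match), and it reads step 109 as cutting off everything below the last character. It is an illustration, not the patented implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    key: str                     # key value: one character
    layer: int                   # layer number, 1 for a root node
    child: Optional[int] = None  # child pointer: first node one layer down
    right: Optional[int] = None  # right pointer: next node in the same layer


class KeyTree:
    def __init__(self) -> None:
        self.table: List[Record] = []          # stands in for the record table
        self.first_root: Optional[int] = None  # assumed entry point to layer 1

    def _head(self, parent: Optional[int]) -> Optional[int]:
        return self.first_root if parent is None else self.table[parent].child

    def find(self, parent: Optional[int], ch: str) -> Optional[int]:
        """Follow right pointers below `parent` looking for key `ch`."""
        i = self._head(parent)
        while i is not None and self.table[i].key != ch:
            i = self.table[i].right
        return i

    def _insert(self, parent: Optional[int], ch: str) -> int:
        """Steps 102/105/112: add a record, keeping siblings in key order."""
        layer = 1 if parent is None else self.table[parent].layer + 1
        self.table.append(Record(ch, layer))
        new = len(self.table) - 1
        prev, cur = None, self._head(parent)
        while cur is not None and self.table[cur].key < ch:
            prev, cur = cur, self.table[cur].right
        self.table[new].right = cur
        if prev is not None:
            self.table[prev].right = new
        elif parent is None:
            self.first_root = new
        else:
            self.table[parent].child = new
        return new

    def add(self, s: str) -> None:
        parent: Optional[int] = None
        for ch in s:
            hit = self.find(parent, ch)
            if hit is None:
                hit = self._insert(parent, ch)
            elif self.table[hit].child is None:
                return  # step 111: the matched node is already a leaf
            parent = hit
        # Steps 108-109: the string ended on an inner node, so unlink what
        # hangs below it (orphaned records stay in the table in this sketch).
        if parent is not None:
            self.table[parent].child = None


tree = KeyTree()
for entry in ("LEI", "LO", "LAODA"):
    tree.add(entry)
```

After these three calls the child pointer of L refers to the new node A, and the right pointer of A refers to E, because A sorts before E among the children of L.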
Fig. 4 shows the flow of a blacklist query, described in detail as follows:
Step 200: read the first character of the string to be checked.
Step 201: check whether the key value of this character is matched in the first layer of the key tree; if so, go to step 202, otherwise end the check: the string is not on the blacklist.
Step 202: read the next character of the string.
Step 203: check whether it is the null character; if so, go to step 204, otherwise go to step 205.
Step 204: end the check and determine whether the previous character is a leaf node; if so, the string is on the blacklist, otherwise it is not.
Step 205: check whether the key value of this character is matched in the next layer of the key tree; if so, go to step 202, otherwise end the check: the string is not on the blacklist.
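The query flow can be sketched in Python as follows. The hand-written `TABLE` (a tree holding LEI and LO), the tuple layout, `FIRST_ROOT` and the treatment of an empty string as "not blacklisted" are assumptions for illustration; pointers are list indices.

```python
from typing import List, Optional, Tuple

# (key value, layer number, child pointer, right pointer)
Row = Tuple[str, int, Optional[int], Optional[int]]
TABLE: List[Row] = [
    ("L", 1, 1, None),     # 0: root node
    ("E", 2, 3, 2),        # 1: below L, right pointer to O
    ("O", 2, None, None),  # 2: leaf, ends LO
    ("I", 3, None, None),  # 3: leaf, ends LEI
]
FIRST_ROOT: Optional[int] = 0


def find_in_layer(head: Optional[int], ch: str) -> Optional[int]:
    """Follow right pointers from `head` until the key value matches."""
    i = head
    while i is not None and TABLE[i][0] != ch:
        i = TABLE[i][3]
    return i


def is_blacklisted(s: str) -> bool:
    if not s:
        return False                              # assumption: empty input
    node = find_in_layer(FIRST_ROOT, s[0])        # steps 200-201
    if node is None:
        return False
    for ch in s[1:]:                              # steps 202-203
        node = find_in_layer(TABLE[node][2], ch)  # step 205
        if node is None:
            return False
    return TABLE[node][2] is None                 # step 204: leaf check
```

With this table, `is_blacklisted("LEI")` holds, while `is_blacklisted("LEIZI")` fails at "Z" and `is_blacklisted("LE")` fails the leaf check.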
Fig. 5 is an example of a blacklist key tree in which LEI and LO are blacklist entries.
For node 1, the "key value" is "L", the "child pointer" points to node 2, the "right pointer" is null and the "layer of this node" is layer "1"; it is a root node.
For node 4, the "key value" is "I", the "child pointer" is null, the "right pointer" is null and the "layer of this node" is layer "3"; it is a leaf node.
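The four node attributes can be written out as a small Python table. Only nodes 1 and 4 are described in the text; the entries for nodes 2 and 3 (E and O) and the helper names `Node`, `kind` and `entries` are assumptions made to complete the illustration.

```python
from typing import Dict, List, NamedTuple, Optional


class Node(NamedTuple):
    key: str              # key value
    child: Optional[int]  # child pointer (node number)
    right: Optional[int]  # right pointer (node number)
    layer: int            # layer of this node


nodes: Dict[int, Node] = {
    1: Node("L", child=2, right=None, layer=1),     # as described in the text
    2: Node("E", child=4, right=3, layer=2),        # assumed
    3: Node("O", child=None, right=None, layer=2),  # assumed
    4: Node("I", child=None, right=None, layer=3),  # as described in the text
}


def kind(n: Node) -> str:
    """Classify a node the way the text does for nodes 1 and 4."""
    if n.layer == 1:
        return "root"
    return "leaf" if n.child is None else "inner"


def entries(start: Optional[int], prefix: str = "") -> List[str]:
    """List the stored strings by walking child and right pointers."""
    found: List[str] = []
    i = start
    while i is not None:
        n = nodes[i]
        word = prefix + n.key
        if n.child is None:
            found.append(word)                   # reached a leaf
        else:
            found.extend(entries(n.child, word))
        i = n.right
    return found
```

Calling `entries(1)` walks from the root to each leaf and recovers the two blacklist entries, LEI and LO.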
The construction of the key tree is illustrated below with an example. Starting from the key tree of Fig. 5, the blacklist entry LAODA is added.
1) Step 100: read the first character "L" of the blacklist string to be added.
2) Step 101: the key value of this character is matched in the first layer of the key tree; go to step 106.
3) Step 106: read the next character "A" of the string.
4) Step 107: it is not the null character; go to step 110.
5) Step 110: the key value "A" of this character is not matched in the next layer of the key tree; go to step 112.
6) Step 112: insert child node A and update the related records in the blacklist record table, so that the child pointer of L points to the new node A and the right pointer of the new node points to E; go to step 106.
7) Step 106: read the next character "O" of the string.
8) Step 107: it is not the null character; go to step 110.
9) Step 110: the key value "O" of this character is not matched in the next layer of the key tree; go to step 112.
10) Step 112: insert child node O and update the related records in the blacklist record table, so that the child pointer of A points to the new node O and the right pointer of the new node is null; go to step 106.
11) Step 106: read the next character "D" of the string.
12) Step 107: it is not the null character; go to step 110.
13) Step 110: the key value "D" of this character is not matched in the next layer of the key tree; go to step 112.
14) Step 112: insert child node D and update the related records in the blacklist record table, so that the child pointer of O points to the new node D and the right pointer of the new node is null; go to step 106.
15) Step 106: read the next character "A" of the string.
16) Step 107: it is not the null character; go to step 110.
17) Step 110: the key value "A" of this character is not matched in the next layer of the key tree; go to step 112.
18) Step 112: insert child node A and update the related records in the blacklist record table, so that the child pointer of D points to the new node A and the right pointer of the new node is null; go to step 106.
19) Step 106: read the next character of the string, which is empty.
20) Step 107: it is the null character; go to step 108.
21) Step 108: the previous node A is a leaf node; the addition ends.
The blacklist query is illustrated below with an example in which the string to be checked is LEIZI.
1) Step 200: read the first character "L" of the string to be checked.
2) Step 201: the key value of this character is matched in the first layer of the key tree; go to step 202.
3) Step 202: read the next character "E" of the string.
4) Step 203: it is not the null character; go to step 205.
5) Step 205: the key value "E" of this character is matched in the next layer of the key tree; go to step 202.
6) Step 202: read the next character "I" of the string.
7) Step 203: it is not the null character; go to step 205.
8) Step 205: the key value "I" of this character is matched in the next layer of the key tree; go to step 202.
9) Step 202: read the next character "Z" of the string.
10) Step 203: it is not the null character; go to step 205.
11) Step 205: the key value "Z" of this character is not matched in the next layer of the key tree, so the string is not on the blacklist; the check ends.
Compared with the prior art, the advantages of the present invention are:
First, the invention is far more efficient than direct character matching.
With the key tree method, walking from a root along a subtree down to a leaf yields exactly one list entry. There are 26 trees, each rooted at an initial letter. A string is matched from beginning to end with one retrieval in the key tree per character, independent of the number of records held in the key tree; the theoretical number of comparisons is n*m. The speed gain of this scheme is evident.
The test results are shown in the table below:
Message length (characters) | Blacklist records | Number of non-matches | Direct matching time (s) | Key tree time (s) | Speed-up (multiple) |
---|---|---|---|---|---|
160 | 30000 | 1 | 0.190 | <0.001 | >190 |
160 | 30000 | 10 | 1.920 | 0.010 | 192 |
160 | 30000 | 100 | 19.170 | 0.060 | 319 |
160 | 150000 | 1 | 0.960 | <0.001 | >960 |
160 | 150000 | 10 | 9.570 | 0.010 | 957 |
160 | 150000 | 100 | 96.090 | 0.100 | 961 |
The time taken by direct comparison depends on the number of records and grows roughly linearly with it (150,000 records take about 5 times as long as 30,000 records), whereas the key tree comparison is independent of the number of records. In matching efficiency, the key tree comparison is hundreds of times faster than direct comparison: about 200 times for 30,000 records and about 1,000 times for 150,000 records. The gain becomes more pronounced as the number of records grows, because only direct comparison depends on the record count.
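The multiples can be rechecked from the timings with a few lines of Python. The figures are copied from the test results, converted to milliseconds to keep the arithmetic exact; the rows reported as "<0.001 second" are left out because they give only a bound. The variable names are illustrative.

```python
# (records, non-matches, direct ms, key tree ms, reported multiple)
rows = [
    (30000, 10, 1920, 10, 192),
    (30000, 100, 19170, 60, 319),
    (150000, 10, 9570, 10, 957),
    (150000, 100, 96090, 100, 961),
]

# Speed-up of the key tree over direct matching for each row.
ratios = [direct_ms / tree_ms for _, _, direct_ms, tree_ms, _ in rows]

# Growth of the direct matching time when the list grows from 30,000
# to 150,000 records (a factor of 5 in record count).
growth = [rows[2][2] / rows[0][2], rows[3][2] / rows[1][2]]

for (records, misses, _, _, reported), ratio in zip(rows, ratios):
    print(records, misses, round(ratio, 1), reported)
print([round(g, 2) for g in growth])
```

The computed ratios are 192, 319.5, 957 and 960.9, which agree with the reported multiples to within rounding, and both growth factors are close to 5, consistent with the linear dependence on record count.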
Second, the invention reduces the false alarm rate. For example, suppose SHAN is on the blacklist and the information to be checked contains the pinyin of Shandong. Every blacklist check would then raise a blacklist alert, although Shandong is certainly not a blacklisted name. If SHANDONG is added to the white list, then whenever a blacklist suspicion is found the white list is checked as well, and finding SHANDONG there avoids the false alarm.
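The SHAN/SHANDONG case can be sketched as follows. The source does not say how a message is scanned, so checking every start position, requiring the white list entry to begin at the same position as the suspicious match, and using plain lists with `str.startswith` in place of the two key trees are all assumptions; `check` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def check(message: str,
          blacklist: Iterable[str],
          whitelist: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (position, blacklist entry) alerts not cleared by the white list."""
    black, white = list(blacklist), list(whitelist)
    alerts = []
    for pos in range(len(message)):
        for entry in black:
            if message.startswith(entry, pos):
                # Blacklist suspicion found: consult the white list as well.
                cleared = any(message.startswith(w, pos) for w in white)
                if not cleared:
                    alerts.append((pos, entry))
    return alerts


with_white = check("PAY TO SHANDONG BANK", ["SHAN"], ["SHANDONG"])
without_white = check("PAY TO SHANDONG BANK", ["SHAN"], [])
```

Here `without_white` contains an alert for SHAN, while `with_white` is empty because SHANDONG is found on the white list at the same position.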
In the above embodiments the specific text information is illustrated using the "blacklist" as an example; in fact, the method and system of the present invention can obviously be applied to the processing of any specific text information (for example, item names). The above embodiments therefore serve only to illustrate the present invention and not to limit it.
Claims (13)
1. A specific text information processing method based on a key tree, characterized by comprising the following steps:
Storing specific text information;
Generating a key tree containing the specific text information;
Searching, according to the key tree, whether a given character string is contained in the specific text information held in the key tree, and then outputting the search result.
2. The method according to claim 1, characterized in that the specific text information consists of specific text character strings, and the key tree consists of a plurality of nodes connected in a tree structure; wherein
Each character of a specific text character string corresponds to one node in the key tree;
Each node has a key value, a lower-layer pointer, a same-layer pointer and the layer number of the node;
The key value stores one character of a specific text character string; a node whose lower-layer pointer is null is a leaf node, and a node whose layer number is one is a root node; wherein
The lower-layer pointer of an upper-layer node points to the node with the smallest key value among all its nodes in the adjacent lower layer, and among the nodes of the same layer the same-layer pointer of a node with a smaller key value points to the node whose key value comes next after its own.
3. The method according to claim 2, characterized in that generating the key tree containing the specific text information further comprises a step of adding a specific text character string, which specifically comprises:
Step 1: read the first character of the blacklist string to be added;
Step 2: check whether the key value of this character is matched in the first layer of the key tree; if so, go to step 7, otherwise go to step 3;
Step 3: insert a root node by adding a record to the blacklist record table; the key value is this character, the layer number is one, the child pointer is null and the right pointer is null;
Step 4: read the next character of the string;
Step 5: check whether it is the null character indicating the end of the string; if so, the addition ends, otherwise go to step 6;
Step 6: insert a child node by adding a record to the blacklist record table; the key value is this character, the layer number is the layer number of the previous character's record plus 1, the child pointer is null and the right pointer is null; at the same time, the child pointer of the previous character's record must be updated to point to the position of the new record; go to step 4;
Step 7: read the next character of the string;
Step 8: check whether it is the null character; if so, go to step 9, otherwise go to step 11;
Step 9: check whether the previous character is a leaf node; if so, the addition ends, otherwise go to step 10;
Step 10: delete all leaf nodes below the previous character, together with the corresponding records in the blacklist record table;
Step 11: check whether the key value of this character is matched in the next layer of the key tree; if so, go to step 12, otherwise go to step 13;
Step 12: check whether this character is a leaf node; if so, the addition ends, otherwise go to step 7;
Step 13: insert a child node and update the related records in the blacklist record table at the same time; go to step 7.
4. The method according to claim 2 or 3, characterized by further comprising a step of searching for a specific text character string, which specifically comprises:
Step 1': read the first character of the specific text character string to be checked;
Step 2': check whether the key value of this character is matched in the first layer of the key tree; if so, go to step 3'; otherwise the search ends and the result that this character string is not in the stored specific text information is output;
Step 3': read the next character of the specific text character string to be checked;
Step 4': check whether the current character is the null character; if so, go to step 5', otherwise go to step 6';
Step 5': end the search and check whether the previous character is a leaf node; if so, output the result that this character string belongs to the stored specific text information, otherwise output the result that it is not in the stored specific text information;
Step 6': check whether the key value of this character is matched in the next layer of the key tree; if so, go to step 3'; otherwise the search ends and the result that this character string is not in the stored specific text information is output.
5. The method according to claim 1, characterized by further comprising a step of processing misjudged text information, in which text information that is easily misjudged is checked; if the result is a match, the step of searching for the given character string is not entered; otherwise the step of searching for the given character string is performed.
6. The method according to claim 1, characterized in that the specific text information is a blacklist and/or a white list.
7. A specific text information processing system based on a key tree, comprising a terminal and a gateway, characterized by comprising a specific text information processing device; the terminal is connected to the specific text information processing device through the gateway; wherein
The specific text information processing device further comprises:
A data storage unit, for storing specific text information;
A key tree generation unit, for generating a key tree containing the specific text information;
A character string search unit, for searching, according to the key tree, whether a character string given through the terminal is contained in the specific text information held in the key tree, and then outputting the search result.
8. The system according to claim 7, characterized in that the specific text information consists of specific text character strings, and the key tree consists of a plurality of nodes connected in a tree structure; wherein
Each character of a specific text character string corresponds to one node in the key tree;
Each node has a key value, a lower-layer pointer, a same-layer pointer and the layer number of the node;
The key value stores one character of a specific text character string; a node whose same-layer pointer is null is a root node, and a node whose lower-layer pointer is null is a leaf node; wherein
The lower-layer pointer of an upper-layer node points to the node with the smallest key value among all its nodes in the adjacent lower layer, and among the nodes of the same layer the same-layer pointer of a node with a smaller key value points to the node whose key value comes next after its own.
9. The system according to claim 8, characterized in that the key tree generation unit comprises a specific text character string adding part, for inserting a given character string, character by character, into the nodes of the key tree according to the key tree.
10. The system according to claim 8 or 9, characterized in that the character string search unit comprises a specific text character string search part, for determining, according to the key tree, whether a character string input through the terminal is present in the stored specific text information; if so, the result that the given character string belongs to the stored specific text information is output; otherwise the result that the given character string is not in the stored specific text information is output.
11. The system according to claim 10, characterized in that the specific text information processing device further comprises a misjudged-text information unit, for checking text information that is easily misjudged; if the result is a match, the specific text character string search unit does not search for the given character string; otherwise the specific text character string search unit searches for the given character string.
12. The system according to claim 7, characterized in that the specific text information is a blacklist and/or a white list.
13. The system according to claim 12, characterized in that the specific text information processing device further comprises: a blacklist file processing unit, a white list file processing unit, a blacklist adding unit, a blacklist deletion unit, a white list adding unit and a white list deletion unit.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2006101143563A CN100468409C (en) | 2006-11-08 | 2006-11-08 | Specific text infor mation processing method based on key tree and system therefor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1979482A (en) | 2007-06-13 |
CN100468409C (en) | 2009-03-11 |
Family
ID=38130651
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2006101143563A Active CN100468409C (en) | 2006-11-08 | 2006-11-08 | Specific text infor mation processing method based on key tree and system therefor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100468409C (en) |
Also Published As
Publication number | Publication date |
---|---|
CN100468409C (en) | 2009-03-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |