CN1979482A - Specific text information processing method based on key tree and system therefor - Google Patents

Specific text information processing method based on key tree and system therefor

Info

Publication number
CN1979482A
Authority
CN
China
Prior art keywords
character
particular text
character string
node
blacklist
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 200610114356
Other languages
Chinese (zh)
Other versions
CN100468409C (en)
Inventor
周鹏伟
李小雍
胡锐明
张学星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CNB2006101143563A priority Critical patent/CN100468409C/en
Publication of CN1979482A publication Critical patent/CN1979482A/en
Application granted granted Critical
Publication of CN100468409C publication Critical patent/CN100468409C/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a key tree-based method and system for processing specific text information, comprising the steps of: storing specific text information; generating a key tree that contains the specific text information; searching, according to the key tree, whether a given character string is included in the specific text information contained in the key tree; and then outputting the search result. The invention raises the processing speed for specific text information, reduces false positives, and improves the efficiency of the whole business process.

Description

A specific text information processing method and system based on a key tree
Technical field
The present invention relates to text information processing technology, in particular to fast matching, searching and checking of text information based on a key tree, and specifically to a specific text information processing method and system based on a key tree.
Background art
In banking, business checks against specific text information are frequently required. A typical case is the blacklist check in foreign-currency remittance: when an organization or individual is on the blacklist, the corresponding business is stopped. For this purpose, a bank often has to query an information base for the text involved in each transaction, to determine whether it relates to specific text information and then take the appropriate measures. However, because the text information base to be checked is very large, fuzzy queries based on character matching are usually used. This method is slow, prone to false positives, and requires manual judgment by operators, so overall business processing efficiency is poor.
Techniques for checking business against specific text information are nevertheless widely used in the prior art. For example, the United Nations imposes sanctions on certain countries, organizations, companies and individuals; the United States, Japan and other countries also publish lists of sanctioned countries, organizations, companies and individuals. These lists are collectively called blacklists, and a blacklist is one kind of specific text information. When a remittance passes through a bank in a country concerned, the banking system checks whether the remitter, payee and so on are included in the blacklist; if text information such as the remitter or payee corresponds to information recorded in the blacklist, the banking system freezes the funds accordingly. To protect customers' funds, a banking system may also check sensitive information, for example to prevent a customer from remitting money to an overseas "blacklisted" account, so that fewer customer funds are frozen by other banks through oversight.
Fig. 1 shows a prior-art network system for monitoring a blacklist. When handling a foreign-currency remittance, a bank teller sends the names of the remitter, payee and so on from terminal 1 through gateway 2 to blacklist processing device 3 for checking. If the text information such as the remitter or payee contains blacklist information, the check result is fed back to terminal 1 and the teller can stop the remittance. Blacklist processing device 3 uses a fuzzy query based on direct character matching: every blacklist record must be fetched and matched against the substrings beginning at each character of the string being checked. In the worst case, where a string of length n is matched against 10,000 records (average length m) and none matches, the number of comparisons is n × (m × 10000), which is very inefficient, and no measure for reducing false positives is provided. Because the blacklist is large, the character matching and fuzzy query generally used by blacklist processing device 3 are prone to false positives, which require manual judgment by staff and reduce business processing efficiency.
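For illustration, the direct character matching described above can be sketched in Python as follows. This is a minimal sketch and not the code of any actual prior-art device; the function name `naive_blacklist_hits` and the sample names are hypothetical.

```python
def naive_blacklist_hits(text, records):
    """Direct character matching: try every record at every start position."""
    hits = []
    attempts = 0  # number of (record, start position) pairs tried
    for record in records:
        for start in range(len(text)):
            attempts += 1
            # Each attempt compares up to len(record) characters.
            if text.startswith(record, start):
                hits.append(record)
                break
    return hits, attempts


hits, attempts = naive_blacklist_hits("wang lei co", ["lei", "zhou"])
print(hits, attempts)
```

Each attempt costs up to m character comparisons, so with no match the work grows as n × m × (number of records), which is the linear dependence on the record count criticized above.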
Summary of the invention
The invention provides a specific text information processing method and system based on a key tree, in order to raise the processing speed for specific text information, reduce false positives, and improve overall business processing efficiency.
One object of the present invention is to provide a specific text information processing method based on a key tree, comprising the following steps: storing specific text information; generating a key tree that contains the specific text information; searching, according to the key tree, whether a given character string is included in the specific text information contained in the key tree; and then outputting the search result.
The specific text information consists of specific text strings, and the key tree consists of a plurality of nodes connected in a tree structure, where each character of a specific text string corresponds to one node in the key tree. Each node has a key value, a lower-layer pointer, a same-layer pointer and the layer number of the node. The key value stores one character of a specific text string. A node whose lower-layer pointer is null is a leaf node, and a node whose layer number is one is a root node. The lower-layer pointer of a node in the upper layer points to the node with the smallest key value among all its nodes in the adjacent lower layer, and the same-layer pointer of a node points to the node in the same layer with the next larger key value.
The step of generating the key tree containing the specific text information specifically comprises: step 1, read the first character of the blacklist string to be added; step 2, determine whether the first layer of the key tree contains a key value matching this character; if so go to step 7, otherwise go to step 3; step 3, insert a root node by adding a record to the blacklist record table whose key value is this character, whose layer number is one, and whose child pointer and right pointer are null; step 4, read the next character of the string; step 5, determine whether it is the null character; if so the addition ends, otherwise go to step 6; step 6, insert a child node by adding a record to the blacklist record table whose key value is this character, whose layer number is the layer number of the previous character's record plus 1, and whose child pointer and right pointer are null, and at the same time modify the child pointer of the previous character's record to point to the position of the new record, then go to step 4; step 7, read the next character of the string; step 8, determine whether it is the null character; if so go to step 9, otherwise go to step 11; step 9, determine whether the previous character is a leaf node; if so the addition ends, otherwise go to step 10; step 10, delete all leaf nodes of the previous character and the corresponding records in the blacklist record table; step 11, determine whether the next layer of the key tree contains a key value matching this character; if so go to step 12, otherwise go to step 13; step 12, determine whether this character is a leaf node; if so the addition ends, otherwise go to step 7; step 13, insert a child node, modify the related records in the blacklist record table accordingly, and go to step 7.
The method of the present invention further comprises a step of searching for a specific text string, which specifically comprises: step 1', read the first character of the specific text string to be checked; step 2', determine whether the first layer of the key tree contains a key value matching this character; if so go to step 3', otherwise end the search and output the result that the string is not in the stored specific text information; step 3', read the next character of the string to be checked; step 4', determine whether the current character is the null character (the null character is the terminating character of the specific text string; for example, in the string "lei", the character following "i" is the null character, which also serves as the terminator in the logical test); if so go to step 5', otherwise go to step 6'; step 5', end the search and determine whether the previous character is a leaf node; if so output the result that the string belongs to the stored specific text information, otherwise output the result that it does not; step 6', determine whether the next layer of the key tree contains a key value matching this character; if so go to step 3', otherwise end the search and output the result that the string is not in the stored specific text information.
The method of the present invention further comprises a step of processing misjudgment text information, in which text information that is easily misjudged is checked first; if the result is a match, the step of searching for the given character string is skipped, otherwise the step of searching for the given character string is carried out.
The specific text information is a blacklist and/or a white list.
Another object of the present invention is to provide a specific text information processing system based on a key tree, comprising a terminal, a gateway and a specific text information processing device, the terminal being connected to the specific text information processing device through the gateway. The specific text information processing device further comprises: a data storage unit, configured to store specific text information; a key tree generation unit, configured to generate a key tree that contains the specific text information; and a character string search unit, configured to search, according to the key tree, whether a character string given through the terminal is included in the specific text information contained in the key tree, and then to output the search result.
The specific text information consists of specific text strings, and the key tree consists of a plurality of nodes connected in a tree structure, where each character of a specific text string corresponds to one node in the key tree. Each node has a key value, a lower-layer pointer, a same-layer pointer and the layer number of the node. The key value stores one character of a specific text string. A node whose same-layer pointer is null is a root node, and a node whose lower-layer pointer is null is a leaf node. The lower-layer pointer of a node in the upper layer points to the node with the smallest key value among all its nodes in the adjacent lower layer, and the same-layer pointer of a node points to the node in the same layer with the next larger key value.
The key tree generation unit comprises a specific text string adding part, configured to insert a given character string, character by character, into the nodes of the key tree.
The character string search unit comprises a specific text string search part, configured to determine, according to the key tree, whether a character string input through the terminal is present in the stored specific text information; if so, it outputs the result that the given string belongs to the stored specific text information, otherwise it outputs the result that the given string is not in the stored specific text information.
The specific text information processing device further comprises a misjudgment text information unit, configured to check text information that is easily misjudged; if the result is a match, the specific text string search unit does not need to search for the given character string, otherwise the specific text string search unit searches for the given character string.
The specific text information is a blacklist and/or a white list.
The specific text information processing method and system based on a key tree provided by the invention constitute an optimized text matching method and system: the text library is organized as a key tree, which speeds up searching, and names that are clearly false positives are added to a misjudgment text library, which improves the accuracy of the check.
The beneficial effects of the present invention are:
First, the checking speed of the invention is much higher than that of direct character matching. In general there are 26 key trees, each with an initial letter as its root node. A character string is searched once in the key tree from beginning to end, independently of the number of records in the text library, and the theoretical number of comparisons is n × m each time. The speed improvement is therefore evident. The time taken by direct character matching depends on the number of records in the text library, in an essentially linear relation, whereas the key tree comparison is independent of the number of records. Consequently, the more records the text library holds, the more pronounced the gain in matching speed.
Second, the invention greatly reduces the false positive rate caused by fuzzy queries, mainly by querying the misjudgment text library, which minimizes the possibility of false positives and ultimately improves overall efficiency.
Description of drawings
Fig. 1 is a diagram of the network structure in which the blacklist processing device is used;
Fig. 2a and Fig. 2b are functional block diagrams of the blacklist processing device;
Fig. 3 is a flowchart of adding a blacklist entry;
Fig. 4 is a flowchart of a blacklist query;
Fig. 5 is a schematic diagram of a concrete example of the key tree.
Detailed description of the embodiments
The present invention mainly improves on the currently used query method of direct character matching: a key tree is used to organize the text information library, and string matching is carried out by a key tree query method. In addition, the invention combines the text information library with a misjudgment text information library, which reduces the possibility of false positives and further improves the accuracy and efficiency of queries. Specific embodiments of the invention are described below with reference to the drawings.
The invention provides a specific text information processing method based on a key tree, comprising the following steps: storing specific text information; generating a key tree that contains the specific text information; searching, according to the key tree, whether a given character string is included in the specific text information contained in the key tree; and then outputting the search result.
The specific text information consists of specific text strings, and the key tree consists of a plurality of nodes connected in a tree structure, where each character of a specific text string corresponds to one node in the key tree. Each node has a key value, a lower-layer pointer, a same-layer pointer and the layer number of the node. The key value stores one character of a specific text string. A node whose same-layer pointer is null is a root node, and a node whose lower-layer pointer is null is a leaf node. The lower-layer pointer of a node in the upper layer points to the node with the smallest key value among all its nodes in the adjacent lower layer, and the same-layer pointer of a node points to the node in the same layer with the next larger key value.
Embodiment 1
The specific text information may be a "blacklist". In this technical solution, the key tree of the text information library consists of a plurality of nodes connected in a tree structure, where each character of a string corresponds to one node in the key tree. Each node has four attributes: "key value", "child pointer" (i.e. the lower-layer pointer), "right pointer" (i.e. the same-layer pointer) and "layer of the node". The "key value" stores one character of the string. The layer of a root node is the first layer, its child nodes are in the second layer, and so on. The "child pointer" of a node in the upper layer points to the one with the smallest key value among all its child nodes in the next layer, and the "right pointer" of the node with the smallest key value in a layer points to the node with the next larger key value. Likewise, the "right pointer" of that node points to the node with the next larger key value after its own, and so on up to the node with the largest key value, whose right pointer is null. The "right pointer" of a root node is null. When the child pointer of a node is null, the node is a "leaf node". The misjudgment text information library has the same structure; it stores text that does not belong to the text information library but is easily confused with information in it, and its generation and query methods are the same as those of the blacklist.
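The four node attributes can be pictured with a small Python sketch. The class name `KeyNode`, the helper `siblings` and the hand-built example for the strings "lei" and "li" are illustrative assumptions, not part of the original disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyNode:
    key: str                           # "key value": one character of a string
    layer: int                         # layer of the node; a root is in layer 1
    child: Optional["KeyNode"] = None  # "child pointer": smallest-key node one layer down
    right: Optional["KeyNode"] = None  # "right pointer": sibling with the next larger key

    def is_leaf(self) -> bool:
        # A node whose child pointer is null is a leaf node.
        return self.child is None


def siblings(first):
    """Walk the right pointers starting from the smallest-key node of a layer."""
    node = first
    while node is not None:
        yield node
        node = node.right


# Hand-built tree for "lei" and "li": root 'l', children 'e' < 'i', then 'i' under 'e'.
third_i = KeyNode("i", 3)
second_i = KeyNode("i", 2)
e = KeyNode("e", 2, child=third_i, right=second_i)
root_l = KeyNode("l", 1, child=e)

print([n.key for n in siblings(root_l.child)])
```

In this sketch the child pointer of 'l' reaches 'e' (the smallest key in layer 2), and the right pointer of 'e' reaches 'i', whose own right pointer is null because it holds the largest key in that layer.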
The key tree is initially empty. By reading each string in the information library and repeatedly carrying out the following add operation, the key tree of the text library is built. The steps of the add operation, shown in Fig. 3, are as follows:
(1) read the first character of the string to be added;
(2) determine whether the first layer of the key tree contains a key value matching this character; if so go to step (7), otherwise go to step (3);
(3) insert a root node;
(4) read the next character of the string;
(5) determine whether it is the null character; if so the addition ends, otherwise go to step (6);
(6) insert a child node and go to step (4);
(7) read the next character of the string;
(8) determine whether it is the null character; if so go to step (9), otherwise go to step (11);
(9) determine whether the previous character is a leaf node; if so the addition ends, otherwise go to step (10);
(10) delete all leaf nodes of the previous character;
(11) determine whether the next layer of the key tree contains a key value matching this character; if so go to step (12), otherwise go to step (13);
(12) determine whether this character is a leaf node; if so the addition ends, otherwise go to step (7);
(13) insert a child node and go to step (7).
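The add operation above can be sketched in Python as follows. This is one possible reading under stated assumptions, not the patented implementation: the names `Node`, `find`, `insert_child`, `add` and `keys` are hypothetical, root nodes are kept in a dictionary keyed by initial character (so each root's right pointer stays null), the end of the Python string stands in for the null character, and step (10) is interpreted as removing everything below the previous character's node.

```python
class Node:
    def __init__(self, key, layer):
        self.key = key      # one character
        self.layer = layer  # 1 for a root node
        self.child = None   # smallest-key node of the next layer
        self.right = None   # sibling with the next larger key


def find(first, ch):
    """Scan an ascending sibling list for key ch; return the node or None."""
    node = first
    while node is not None and node.key < ch:
        node = node.right
    return node if node is not None and node.key == ch else None


def insert_child(parent, ch):
    """Insert a new child of parent, keeping siblings in ascending key order."""
    new = Node(ch, parent.layer + 1)
    first = parent.child
    if first is None or ch < first.key:
        new.right = first
        parent.child = new
        return new
    prev = first
    while prev.right is not None and prev.right.key < ch:
        prev = prev.right
    new.right = prev.right
    prev.right = new
    return new


def add(roots, s):
    if not s:
        return
    node = roots.get(s[0])               # steps (1)-(2)
    if node is None:                     # steps (3)-(6): build a fresh branch
        node = roots[s[0]] = Node(s[0], 1)
        for ch in s[1:]:
            node = insert_child(node, ch)
        return
    for ch in s[1:]:                     # steps (7), (8), (11)
        nxt = find(node.child, ch)
        if nxt is None:
            node = insert_child(node, ch)  # step (13)
        elif nxt.child is None:
            return                         # step (12): matched a leaf, stop
        else:
            node = nxt
    node.child = None                    # steps (9)-(10): no effect on a leaf


def keys(first):
    out = []
    while first is not None:
        out.append(first.key)
        first = first.right
    return out


roots = {}
for name in ["lei", "li", "lao"]:
    add(roots, name)
print(keys(roots["l"].child))
```

Under this reading, adding a string that is a prefix of a stored one (for example "le" after "lei") prunes the longer branch, and adding a string that extends a stored leaf entry is ignored at step (12).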
Secondly, the invention provides a text information checking method based on key tree search, which searches the key tree of the text library to determine whether a given character string is included in the text information library contained in the key tree. The steps (shown in Fig. 4) are as follows:
(1) read the first character of the string to be checked;
(2) determine whether the first layer of the key tree contains a key value matching this character; if so go to step (3), otherwise end the check: the name is not in the text library;
(3) read the next character of the string;
(4) determine whether it is the null character; if so go to step (5), otherwise go to step (6);
(5) end the check and determine whether the previous character is a leaf node; if so the name belongs to the text library, otherwise it is not in the library;
(6) determine whether the next layer of the key tree contains a key value matching this character; if so go to step (3), otherwise end the check: the name is not in the text library.
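A minimal Python sketch of these checking steps is given below. The names `Node`, `find`, `build` and `contains` are hypothetical, and `build` is a deliberately simplified builder (an assumption made only to obtain a tree to search): it inserts sorted strings without the pruning steps of the add operation, which still yields ascending sibling lists.

```python
class Node:
    def __init__(self, key, layer):
        self.key = key
        self.layer = layer
        self.child = None   # first node of the next layer
        self.right = None   # next sibling in the same layer


def find(first, ch):
    """Follow right pointers from the first sibling until key ch is found."""
    node = first
    while node is not None:
        if node.key == ch:
            return node
        node = node.right
    return None


def build(words):
    """Simplified builder: roots in a dict, children appended in sorted order."""
    roots = {}
    for word in sorted(words):
        if not word:
            continue
        if word[0] not in roots:
            roots[word[0]] = Node(word[0], 1)
        parent = roots[word[0]]
        for ch in word[1:]:
            node = find(parent.child, ch)
            if node is None:
                node = Node(ch, parent.layer + 1)
                if parent.child is None:
                    parent.child = node
                else:
                    last = parent.child
                    while last.right is not None:
                        last = last.right
                    last.right = node
            parent = node
    return roots


def contains(roots, s):
    if not s:
        return False
    node = roots.get(s[0])            # steps (1)-(2)
    if node is None:
        return False
    for ch in s[1:]:                  # steps (3), (4), (6)
        node = find(node.child, ch)
        if node is None:
            return False
    return node.child is None         # step (5): the last character must be a leaf


library = build(["lei", "li"])
for name in ["lei", "li", "le", "leix", "wang"]:
    print(name, contains(library, name))
```

The end of the Python string plays the role of the null character here. The number of node visits depends on the length of the checked string and on the sibling lists it passes through, not on how many strings the library holds.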
In addition, a check against the misjudgment text library is used together with the text check. Before the checking method starts, the check for easily misjudged text is carried out first; if the result is a match, the text library check is skipped, otherwise the text library check is carried out. Depending on the actual situation, the text library check may instead be carried out first; if the result is a match, the misjudgment text library is then checked, and if that also matches, the earlier result was a false positive; otherwise the string does belong to the text library.
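The two orderings of the checks can be sketched as follows. In this illustration the two library lookups are passed in as plain functions; the names `check_misjudgment_first` and `check_library_first` and the sample sets are hypothetical stand-ins for lookups in the two key trees.

```python
from typing import Callable

Lookup = Callable[[str], bool]


def check_misjudgment_first(name: str, in_misjudgment: Lookup, in_library: Lookup) -> bool:
    """Check the misjudgment library first; a match there skips the library check."""
    if in_misjudgment(name):
        return False
    return in_library(name)


def check_library_first(name: str, in_misjudgment: Lookup, in_library: Lookup) -> bool:
    """Check the text library first; a hit is confirmed against the misjudgment library."""
    if not in_library(name):
        return False
    return not in_misjudgment(name)


# Sets stand in for the two key trees in this sketch.
library = {"lei", "li"}
misjudgment = {"li"}

for name in ["lei", "li", "zhang"]:
    print(name,
          check_misjudgment_first(name, misjudgment.__contains__, library.__contains__),
          check_library_first(name, misjudgment.__contains__, library.__contains__))
```

Both orderings give the same verdict; they differ only in which lookup runs first, so the choice can follow whichever library is expected to settle most queries.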
In the specific embodiments of the present invention, the null character is the terminating character of the specific text string. For example, in the string "lei", the character following "i" is the null character, which also serves as the terminator in the logical test.
Embodiment 2
The invention provides a specific text information processing system based on a key tree, comprising a terminal and a gateway, and further comprising a specific text information processing device (shown in Fig. 2a). The terminal is connected to the specific text information processing device through the gateway. The specific text information processing device further comprises: a data storage unit, configured to store specific text information; a key tree generation unit, configured to generate a key tree that contains the specific text information; and a character string search unit, configured to search, according to the key tree, whether a character string given through the terminal is included in the specific text information contained in the key tree, and then to output the search result. The specific text information processing device also comprises a specific text string adding unit, configured to insert a given character string, character by character, into the nodes of the key tree, and a misjudgment text information unit (shown in Fig. 2a), configured to check text information that is easily misjudged; if the result is a match, the specific text string search unit does not need to search for the given character string, otherwise the specific text string search unit searches for the given character string.
The system of the invention is now described taking the blacklist as an example. A blacklist processing device comprises a data processing device and a data storage device; the data processing device (shown in Fig. 2b) further comprises a blacklist file processing module, a blacklist adding module and a blacklist deleting module.
The blacklist file processing module imports a received blacklist file into a temporary worksheet, so that the blacklist adding module can read the blacklist strings in the worksheet and build the key tree from them. Since the blacklist file changes only at long intervals, when a new blacklist file is received the blacklist deleting module first deletes all key tree records, and the blacklist adding module then rebuilds the key tree.
Meanwhile, names that are clearly false positives can be listed in a white list file, and the data processing device may further comprise a white list file processing module, a white list adding module and a white list deleting module, which work on the same principle as the corresponding blacklist modules. In this way the blacklist processing device automatically screens out false positives covered by the white list and greatly reduces the need for manual intervention.
Adding a blacklist entry according to the invention comprises: A) read the first character of the blacklist string to be added; B) determine whether the first layer of the key tree contains a key value matching this character; if so go to step G), otherwise go to step C); C) insert a root node; D) read the next character of the string; E) determine whether it is the null character; if so the addition ends, otherwise go to step F); F) insert a child node and go to step D); G) read the next character of the string; H) determine whether it is the null character; if so go to step I), otherwise go to step K); I) determine whether the previous character is a leaf node; if so the addition ends, otherwise go to step J); J) delete all leaf nodes of the previous character; K) determine whether the next layer of the key tree contains a key value matching this character; if so go to step L), otherwise go to step M); L) determine whether this character is a leaf node; if so the addition ends, otherwise go to step G); M) insert a child node and go to step G).
The blacklist check according to the invention comprises: a) read the first character of the string to be checked; b) determine whether the first layer of the key tree contains a key value matching this character; if so go to step c), otherwise end the check: the name is not on the blacklist; c) read the next character of the string; d) determine whether it is the null character; if so go to step e), otherwise go to step f); e) end the check and determine whether the previous character is a leaf node; if so the name is on the blacklist, otherwise it is not; f) determine whether the next layer of the key tree contains a key value matching this character; if so go to step c), otherwise end the check: the name is not on the blacklist.
As a first improvement of the blacklist check, a white list check is carried out before the checking method starts, using the same method as the blacklist check; if the name is on the white list, the blacklist check is skipped, otherwise the blacklist check is carried out.
As a second improvement of the blacklist check, after the blacklist check has finished, if the result is that the name is on the blacklist, a white list check is then carried out, using the same method as the blacklist check.
Each character of a blacklist string corresponds to one node in the key tree, and each node has four attributes: "key value", "child pointer", "right pointer" and "layer of the node". The "key value" stores one character of the blacklist string. The layer of a root node is the first layer, its child nodes are in the second layer, and so on. The "child pointer" of a node in the upper layer points to the one with the smallest key value among all its child nodes in the next layer, and the "right pointer" of the node with the smallest key value in a layer points to the node with the next larger key value. Likewise, the "right pointer" of that node points to the node with the next larger key value after its own, and so on up to the node with the largest key value, whose right pointer is null. The "right pointer" of a root node is null. When the child pointer of a node is null, the node is a "leaf node".
As the above principle shows, adding blacklist entries is the process of building the key tree. The invention is described in detail below with reference to the drawings.
Fig. 1 shows the network structure in which the blacklist processing device is used. It consists of terminal 1, gateway 2 and blacklist processing device 3.
Terminal 1 may be a PC or a dedicated terminal. Through this terminal the teller submits the blacklist string to be checked, which is forwarded by gateway 2 to blacklist processing device 3, and receives the blacklist check result that is fed back. Through the same terminal the teller can also send add or delete instructions for the blacklist or white list to blacklist processing device 3, to add or delete blacklist or white list entries.
Gateway 2 is a communication server responsible for communication between terminal 1 and blacklist processing device 3.
Blacklist processing device 3 is responsible for importing the blacklist and/or white list file into a temporary worksheet, for adding, deleting and querying blacklist or white list entries using the key tree method, and for returning the result information.
As shown in Fig. 2b, blacklist processing device 3 consists of data processing device 31 and data storage device 32. Data processing device 31 comprises blacklist file processing module 311, white list file processing module 312 (i.e. the misjudgment text information unit), blacklist adding module 313, blacklist deleting module 314, white list adding module 315 and white list deleting module 316.
The function of blacklist file processing module 311 is to import the blacklist file temporarily into the blacklist worksheet of data storage device 32. Using the separator between the strings in the file as the delimiter, it places each blacklist string in its own worksheet record, so that blacklist adding module 313 can add each blacklist entry.
The function of white list document processing module 312 is in the white list worksheet with the interim importing data storage device 32 of white list file, with the separator between each character string in the file is sign, each white list character string is put into a record of worksheet respectively, be convenient to the newly-increased module 315 of white list and carry out the newly-increased of each white list.
Blacklist adding module 313 builds the key tree for each blacklist character string and stores the four attribute values of each node of the key tree in a record of the blacklist record table in data storage device 32.
Blacklist deleting module 314 directly empties the blacklist record table in data storage device 32.
White list adding module 315 builds the key tree for each white list character string and stores the four attribute values of each node of the key tree in a record of the white list record table in data storage device 32.
White list deleting module 316 directly empties the white list record table in data storage device 32.
Data storage device 32 holds four tables: the blacklist worksheet, the white list worksheet, the blacklist record table, and the white list record table.
Because the adding flows for the blacklist and the white list are identical, only the blacklist adding flow is described below, with reference to Fig. 3.
Step 100: Read the first character of the blacklist character string to be added.
Step 101: Check whether the first layer of the key tree contains a key value matching this character. If so, go to step 106; otherwise go to step 102.
Step 102: Insert a root node: add a record to the blacklist record table with this character as the key value, "1" as the layer, a null child pointer, and a null right pointer.
Step 103: Read the next character of the character string.
Step 104: Check whether it is the null character. If so, the addition ends; otherwise go to step 105.
Step 105: Insert a child node: add a record to the blacklist record table with this character as the key value, a layer equal to the layer of the previous character's record plus 1, a null child pointer, and a null right pointer. At the same time, modify the child pointer of the previous character's record to point to the position of the newly added record. Go to step 103.
Step 106: Read the next character of the character string.
Step 107: Check whether it is the null character. If so, go to step 108; otherwise go to step 110.
Step 108: Check whether the previous character is a leaf node. If so, the addition ends; otherwise go to step 109.
Step 109: Delete all leaf nodes of the previous character and the corresponding records in the blacklist record table.
Step 110: Check whether the next layer of the key tree contains a key value matching this character. If so, go to step 111; otherwise go to step 112.
Step 111: Check whether this character is a leaf node. If so, the addition ends; otherwise go to step 106.
Step 112: Insert a child node, modify the related records in the blacklist record table accordingly, and go to step 106.
Fig. 4 shows the flow of a blacklist query, described as follows:
Step 200: Read the first character of the name character string to be checked.
Step 201: Check whether the first layer of the key tree contains a key value matching this character. If so, go to step 202; otherwise end the check: the name is not on the blacklist.
Step 202: Read the next character of the character string.
Step 203: Check whether it is the null character. If so, go to step 204; otherwise go to step 205.
Step 204: End the check and determine whether the previous character is a leaf node. If so, the name is on the blacklist; otherwise it is not.
Step 205: Check whether the next layer of the key tree contains a key value matching this character. If so, go to step 202; otherwise end the check: the name is not on the blacklist.
Fig. 5 is an example of a blacklist key tree in which LEI and LO are blacklist entries.
Node 1: the key value is "L", the child pointer points to node 2, the right pointer is null, and the layer is 1; it is the root node.
Node 2: the key value is "E", the child pointer points to node 4, the right pointer points to node 3, and the layer is 2.
Node 3: the key value is "O", the child pointer is null, the right pointer is null, and the layer is 2; it is a leaf node.
Node 4: the key value is "I", the child pointer is null, the right pointer is null, and the layer is 3; it is a leaf node.
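The four nodes of Fig. 5 can be written down as records of four attribute values each, in the spirit of the blacklist record table. The dictionary layout and the helper `entries` below are illustrative assumptions; the patent does not specify the record format.

```python
from typing import Dict, List, Optional, Tuple

# record id -> (key value, child pointer, right pointer, layer); None stands for a null pointer
Record = Tuple[str, Optional[int], Optional[int], int]

records: Dict[int, Record] = {
    1: ("L", 2, None, 1),
    2: ("E", 4, 3, 2),
    3: ("O", None, None, 2),
    4: ("I", None, None, 3),
}


def entries(table: Dict[int, Record], node_id: Optional[int], prefix: str = "") -> List[str]:
    """List every string spelled from node_id and its right siblings down to a leaf."""
    found: List[str] = []
    while node_id is not None:
        key, child, right, _layer = table[node_id]
        if child is None:
            found.append(prefix + key)                        # leaf: one complete entry
        else:
            found.extend(entries(table, child, prefix + key))
        node_id = right                                       # move along the same layer
    return found


print(entries(records, 1))  # ['LEI', 'LO']
```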
The building of the key tree is illustrated below with an example: given the key tree of Fig. 5, the blacklist entry LAODA is added.
1) Step 100: read the first character "L" of the blacklist character string to be added.
2) Step 101: the first layer of the key tree contains a key value matching this character; go to step 106.
3) Step 106: read the next character "A" of the character string.
4) Step 107: it is not the null character; go to step 110.
5) Step 110: the next layer of the key tree contains no key value matching this character "A"; go to step 112.
6) Step 112: insert child node A and modify the related records in the blacklist record table, so that the child pointer of L points to the new node A and the right pointer of the new node points to E; go to step 106.
7) Step 106: read the next character "O" of the character string.
8) Step 107: it is not the null character; go to step 110.
9) Step 110: the next layer of the key tree contains no key value matching this character "O"; go to step 112.
10) Step 112: insert child node O and modify the related records in the blacklist record table, so that the child pointer of A points to the new node O and the right pointer of the new node is null; go to step 106.
11) Step 106: read the next character "D" of the character string.
12) Step 107: it is not the null character; go to step 110.
13) Step 110: the next layer of the key tree contains no key value matching this character "D"; go to step 112.
14) Step 112: insert child node D and modify the related records in the blacklist record table, so that the child pointer of O points to the new node D and the right pointer of the new node is null; go to step 106.
15) Step 106: read the next character "A" of the character string.
16) Step 107: it is not the null character; go to step 110.
17) Step 110: the next layer of the key tree contains no key value matching this character "A"; go to step 112.
18) Step 112: insert child node A and modify the related records in the blacklist record table, so that the child pointer of D points to the new node A and the right pointer of the new node is null; go to step 106.
19) Step 106: the next character read from the character string is null.
20) Step 107: it is the null character; go to step 108.
21) Step 108: the previous node A is a leaf node; the addition ends.
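The adding flow of steps 100 to 112 and the LAODA walkthrough can be sketched in memory as follows. Several points are assumptions made for illustration: the names `Node`, `find_child`, `insert_child`, and `add_entry`; the dictionary `roots` that holds one root node per first character; and the reading of step 109 as removing everything below the previous node. Writing the nodes to the blacklist record table is left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    key: str                        # "key value"
    layer: int                      # root nodes are on layer 1
    child: Optional["Node"] = None  # "child pointer"
    right: Optional["Node"] = None  # "right pointer"


def find_child(parent: Node, ch: str) -> Optional[Node]:
    """Step 110: look for ch among the children of parent (the next layer)."""
    cur = parent.child
    while cur is not None and cur.key != ch:
        cur = cur.right
    return cur


def insert_child(parent: Node, ch: str) -> Node:
    """Steps 105 and 112: link a new node into the sibling chain in ascending key order."""
    new = Node(ch, parent.layer + 1)
    if parent.child is None or ch < parent.child.key:
        new.right, parent.child = parent.child, new
        return new
    prev = parent.child
    while prev.right is not None and prev.right.key < ch:
        prev = prev.right
    new.right, prev.right = prev.right, new
    return new


def add_entry(roots: Dict[str, Node], text: str) -> None:
    if not text:
        return
    node = roots.get(text[0])                      # steps 100-101
    if node is None:
        node = roots[text[0]] = Node(text[0], 1)   # step 102
        for ch in text[1:]:                        # steps 103-105
            node = insert_child(node, ch)
        return
    for ch in text[1:]:                            # steps 106-107
        match = find_child(node, ch)               # step 110
        if match is None:
            node = insert_child(node, ch)          # step 112
        elif match.child is None:
            return                                 # step 111: matched node is a leaf
        else:
            node = match
    node.child = None                              # steps 108-109 (no effect on a leaf)


roots: Dict[str, Node] = {}
for entry in ("LEI", "LO", "LAODA"):               # Fig. 5, then the new entry
    add_entry(roots, entry)

second_layer = []
cur = roots["L"].child
while cur is not None:
    second_layer.append(cur.key)
    cur = cur.right
print(second_layer)  # ['A', 'E', 'O']
```

After the run, the child pointer of L leads to A and the right pointer of A leads to E, as in item 6 of the walkthrough.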
The blacklist query is illustrated below with an example: the character string to be checked is LEIZI.
1) Step 200: read the first character "L" of the name character string to be checked.
2) Step 201: the first layer of the key tree contains a key value matching this character; go to step 202.
3) Step 202: read the next character "E" of the character string.
4) Step 203: it is not the null character; go to step 205.
5) Step 205: the next layer of the key tree contains a key value matching this character "E"; go to step 202.
6) Step 202: read the next character "I" of the character string.
7) Step 203: it is not the null character; go to step 205.
8) Step 205: the next layer of the key tree contains a key value matching this character "I"; go to step 202.
9) Step 202: read the next character "Z" of the character string.
10) Step 203: it is not the null character; go to step 205.
11) Step 205: the next layer of the key tree contains no key value matching this character "Z", so the name is not on the blacklist; the check ends.
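The query flow of steps 200 to 205 can be sketched against the key tree of Fig. 5. The names `Node`, `find_child`, and `is_blacklisted`, and the dictionary `roots` keyed by first character, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    key: str
    layer: int
    child: Optional["Node"] = None
    right: Optional["Node"] = None


def find_child(parent: Node, ch: str) -> Optional[Node]:
    """Step 205: look for ch among the children of parent (the next layer)."""
    cur = parent.child
    while cur is not None and cur.key != ch:
        cur = cur.right
    return cur


def is_blacklisted(roots: Dict[str, Node], text: str) -> bool:
    if not text:
        return False
    node = roots.get(text[0])           # steps 200-201
    if node is None:
        return False
    for ch in text[1:]:                 # steps 202-203
        node = find_child(node, ch)     # step 205
        if node is None:
            return False                # no match on the next layer
    return node.child is None           # step 204: the last character must be a leaf


# Key tree of Fig. 5: LEI and LO are blacklist entries.
i = Node("I", 3)
o = Node("O", 2)
e = Node("E", 2, child=i, right=o)
roots = {"L": Node("L", 1, child=e)}

for name in ("LEIZI", "LEI", "LO", "LE"):
    print(name, is_blacklisted(roots, name))
```

In this sketch LEIZI fails at "Z", as in the walkthrough, and LE fails at step 204 because E is not a leaf node.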
Compared with the prior art, the advantages of the present invention are as follows:
First, the invention is far more efficient than direct character matching.
With the key tree method, walking from the root along a subtree down to a leaf yields the data record of one name-list entry. There are 26 trees, each with an initial letter as its root node. The character string is processed from beginning to end with one key tree retrieval each time, regardless of the number of records in the key tree; the theoretical number of comparisons is n*m. The speed improvement of this scheme is evident.
The test results are shown in the table below:
Message characters  Blacklist records  Non-matching checks  Direct matching time  Key tree method time  Improvement factor
160                 30000              1                    0.190 s               <0.001 s              >190
160                 30000              10                   1.920 s               0.010 s               192
160                 30000              100                  19.170 s              0.060 s               319
160                 150000             1                    0.960 s               <0.001 s              >960
160                 150000             10                   9.570 s               0.010 s               957
160                 150000             100                  96.090 s              0.100 s               961
The time taken by direct comparison depends on the number of records, in a roughly linear relationship (150,000 records take about 5 times as long as 30,000 records), whereas the time taken by key tree comparison is independent of the number of records. In matching efficiency, key tree comparison is hundreds of times faster than direct comparison: about 200 times for 30,000 records and about 1,000 times for 150,000 records. The improvement becomes more pronounced as the number of records grows, because direct comparison depends on the number of records while key tree comparison does not.
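The figures quoted above can be rechecked from the table with a few lines of arithmetic. The times are entered in milliseconds, and the two rows reported only as "<0.001 second" are left out because no exact ratio can be formed from them; the variable names are illustrative.

```python
# (blacklist records, non-matching checks, direct time in ms, key tree time in ms)
rows = [
    (30000, 10, 1920, 10),
    (30000, 100, 19170, 60),
    (150000, 10, 9570, 10),
    (150000, 100, 96090, 100),
]

# Improvement factor = direct matching time / key tree method time.
speedups = {(rec, runs): direct / tree for rec, runs, direct, tree in rows}

# How the direct matching time grows when the records grow from 30,000 to 150,000.
direct_ms = {(rec, runs): direct for rec, runs, direct, _tree in rows}
scaling = {runs: direct_ms[(150000, runs)] / direct_ms[(30000, runs)] for runs in (10, 100)}

for key, value in speedups.items():
    print(key, round(value, 1))
for runs, value in scaling.items():
    print(runs, round(value, 2))
```

The ratios for the direct method come out close to 5, in line with the linear relationship described above.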
Second, the invention reduces the false alarm rate. For example, suppose SHAN is on the blacklist and the information to be checked contains SHANDONG, the pinyin of Shandong. Every blacklist check would then raise a blacklist alarm, although Shandong is certainly not a blacklist entry. If SHANDONG is added to the white list, then whenever a blacklist suspicion is found, the white list is checked as well; SHANDONG is found there, and the false alarm is avoided.
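The SHAN and SHANDONG example can be sketched as follows. This is only an illustration under stated assumptions: the message is scanned position by position, a nested dictionary stands in for the key tree with its child and right pointers, and the names `build`, `match_at`, and `check` are hypothetical. The patent does not spell out this scanning procedure.

```python
from typing import Dict, List, Optional, Tuple

END = ""  # marker key for "an entry ends here"; never equal to a single character


def build(entries: List[str]) -> Dict:
    """Build a nested-dictionary stand-in for a key tree."""
    root: Dict = {}
    for entry in entries:
        node = root
        for ch in entry:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def match_at(tree: Dict, text: str, start: int) -> Optional[str]:
    """Return the longest entry of tree that begins at text[start], or None."""
    node = tree
    best = None
    for i in range(start, len(text)):
        node = node.get(text[i])
        if node is None:
            break
        if END in node:
            best = text[start:i + 1]
    return best


def check(message: str, black: Dict, white: Dict) -> List[Tuple[int, str]]:
    """Report blacklist hits, skipping positions where a white list entry also matches."""
    alarms = []
    for pos in range(len(message)):
        hit = match_at(black, message, pos)
        if hit is None:
            continue
        if match_at(white, message, pos) is not None:
            continue  # blacklist suspicion cleared by the white list
        alarms.append((pos, hit))
    return alarms


black = build(["SHAN"])
white = build(["SHANDONG"])
print(check("PAY TO SHANDONG BRANCH", black, white))  # []
print(check("PAY TO SHANXI BRANCH", black, white))    # [(7, 'SHAN')]
```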
In the above embodiments, the specific text information is illustrated using a "blacklist" as the example. In fact, the method and system of the invention can obviously be applied to the processing of any specific text information (for example, item names). The above embodiments therefore serve only to illustrate the invention, not to limit it.

Claims (13)

1. A specific text information processing method based on a key tree, characterized by comprising the following steps:
Storing specific text information;
Generating a key tree containing the specific text information;
Searching, according to the key tree, whether a given character string is included in the specific text information contained in the key tree, and then outputting the search result.
2. The method according to claim 1, characterized in that the specific text information consists of specific text character strings, and the key tree consists of a plurality of nodes connected in a tree structure; wherein
Each character of a specific text character string corresponds to one node in the key tree;
Each node has a key value, a lower-layer pointer, a same-layer pointer, and the layer number of the node;
The key value stores one character of a specific text character string; a node whose lower-layer pointer value is null is a leaf node, and a node whose layer number is one is a root node; wherein
The lower-layer pointer of an upper-layer node points to the node with the smallest key value among all its nodes in the adjacent lower layer, and the same-layer pointer of a node with a smaller key value points to the node in the same layer with the next-smallest key value.
3. The method according to claim 2, characterized in that generating the key tree containing the specific text information further comprises a step of adding a specific text character string, which specifically comprises:
Step 1, reading the first character of the blacklist character string to be added;
Step 2, checking whether the first layer of the key tree contains a key value matching this character; if so, going to step 7; otherwise going to step 3;
Step 3, inserting a root node: adding a record to the blacklist record table with this character as the key value, one as the layer number, a null child pointer, and a null right pointer;
Step 4, reading the next character of the character string;
Step 5, checking whether it is the null character that indicates the end of the character string; if so, the addition ends; otherwise going to step 6;
Step 6, inserting a child node: adding a record to the blacklist record table with this character as the key value, a layer number equal to the layer number of the previous character's record plus 1, a null child pointer, and a null right pointer; at the same time, modifying the child pointer of the previous character's record to point to the position of the newly added record; going to step 4;
Step 7, reading the next character of the character string;
Step 8, checking whether it is the null character; if so, going to step 9; otherwise going to step 11;
Step 9, checking whether the previous character is a leaf node; if so, the addition ends; otherwise going to step 10;
Step 10, deleting all leaf nodes of the previous character and the corresponding records in the blacklist record table;
Step 11, checking whether the next layer of the key tree contains a key value matching this character; if so, going to step 12; otherwise going to step 13;
Step 12, checking whether this character is a leaf node; if so, the addition ends; otherwise going to step 7;
Step 13, inserting a child node, modifying the related records in the blacklist record table accordingly, and going to step 7.
4. The method according to claim 2 or 3, characterized by further comprising a step of searching for a specific text character string, which specifically comprises:
Step 1', reading the first character of the specific text character string to be checked;
Step 2', checking whether the first layer of the key tree contains a key value matching this character; if so, going to step 3'; otherwise ending the search and outputting the result that the specific text character string is not in the stored specific text information;
Step 3', reading the next character of the specific text character string to be checked;
Step 4', checking whether the current character is the null character; if so, going to step 5'; otherwise going to step 6';
Step 5', ending the search and checking whether the previous character is a leaf node; if so, outputting the result that the specific text character string belongs to the stored specific text information; otherwise outputting the result that the specific text character string is not in the stored specific text information;
Step 6', checking whether the next layer of the key tree contains a key value matching this character; if so, going to step 3'; otherwise ending the search and outputting the result that the specific text character string is not in the stored specific text information.
5. The method according to claim 1, characterized by further comprising a step of processing misjudged text information, used to check text information that is easily misjudged: if the result is a match, the step of searching for the given character string need not be entered; otherwise the step of searching for the given character string is performed.
6. The method according to claim 1, characterized in that the specific text information is a blacklist and/or a white list.
7. A specific text information processing system based on a key tree, comprising a terminal and a gateway, characterized by comprising a specific text information processing device, the terminal being connected to the specific text information processing device through the gateway; wherein
The specific text information processing device further comprises:
A data storage unit for storing specific text information;
A key tree generation unit for generating a key tree containing the specific text information;
A character string search unit for searching, according to the key tree, whether a character string given through the terminal is included in the specific text information contained in the key tree, and then outputting the search result.
8. The system according to claim 7, characterized in that the specific text information consists of specific text character strings, and the key tree consists of a plurality of nodes connected in a tree structure; wherein
Each character of a specific text character string corresponds to one node in the key tree;
Each node has a key value, a lower-layer pointer, a same-layer pointer, and the layer number of the node;
The key value stores one character of a specific text character string; a node whose same-layer pointer value is null is a root node, and a node whose lower-layer pointer value is null is a leaf node; wherein
The lower-layer pointer of an upper-layer node points to the node with the smallest key value among all its nodes in the adjacent lower layer, and the same-layer pointer of a node with a smaller key value points to the node in the same layer with the next-smallest key value.
9. The system according to claim 8, characterized in that the key tree generation unit comprises a specific text character string adding part, used to insert a given character string, character by character, into the nodes of the key tree according to the key tree.
10. The system according to claim 8 or 9, characterized in that the character string search unit comprises a specific text character string search part, used to determine, according to the key tree, whether a character string input through the terminal is present in the stored specific text information; if so, it outputs the result that the given character string belongs to the stored specific text information; otherwise it outputs the result that the given character string is not in the stored specific text information.
11. The system according to claim 10, characterized in that the specific text information processing device further comprises a misjudged text information unit, used to check text information that is easily misjudged: if the result is a match, the specific text character string search unit need not search for the given character string; otherwise the specific text character string search unit searches for the given character string.
12. The system according to claim 7, characterized in that the specific text information is a blacklist and/or a white list.
13. The system according to claim 12, characterized in that the specific text information processing device further comprises: a blacklist file processing unit, a white list file processing unit, a blacklist adding unit, a blacklist deleting unit, a white list adding unit, and a white list deleting unit.
CNB2006101143563A 2006-11-08 2006-11-08 Specific text information processing method based on key tree and system therefor Active CN100468409C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006101143563A CN100468409C (en) 2006-11-08 2006-11-08 Specific text information processing method based on key tree and system therefor

Publications (2)

Publication Number Publication Date
CN1979482A true CN1979482A (en) 2007-06-13
CN100468409C CN100468409C (en) 2009-03-11

Family

ID=38130651

Country Status (1)

Country Link
CN (1) CN100468409C (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100527134C (en) * 2007-12-04 2009-08-12 威盛电子股份有限公司 Multiple modes search method and system
CN102682017A (en) * 2011-03-15 2012-09-19 阿里巴巴集团控股有限公司 Information retrieval method and system
CN102682017B (en) * 2011-03-15 2014-04-23 阿里巴巴集团控股有限公司 Information retrieval method and system
CN102882987A (en) * 2011-07-12 2013-01-16 阿里巴巴集团控股有限公司 Domain filter list storing and matching method and device
CN102882987B (en) * 2011-07-12 2015-08-26 阿里巴巴集团控股有限公司 Domain filter list storage, matching process and device
CN102737105A (en) * 2012-03-31 2012-10-17 北京小米科技有限责任公司 Dict-tree generation method and searching method
CN104537107A (en) * 2015-01-15 2015-04-22 中国联合网络通信集团有限公司 URL storage matching method and device
CN108241695A (en) * 2016-12-26 2018-07-03 北京国双科技有限公司 Information processing method and device
CN108241695B (en) * 2016-12-26 2021-11-02 北京国双科技有限公司 Information processing method and device
CN114090570A (en) * 2021-09-29 2022-02-25 北京信息科技大学 Data storage method and device based on combination of radix tree and hash table

Also Published As

Publication number Publication date
CN100468409C (en) 2009-03-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant