CN101442731B - Method and apparatus for removing call ticket repeat - Google Patents

Method and apparatus for removing call ticket repeat


Publication number
CN101442731B
Authority
CN
China
Prior art keywords
ticket
feature string
binary tree
balanced binary
hash table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2008101832739A
Other languages
Chinese (zh)
Other versions
CN101442731A (en)
Inventor
刘喜
姜晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Group Anhui Co Ltd
Original Assignee
China Mobile Group Anhui Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Group Anhui Co Ltd
Priority to CN2008101832739A
Publication of CN101442731A
Application granted
Publication of CN101442731B
Expired - Fee Related
Anticipated expiration


Abstract

The invention discloses a method and device for removing duplicate call tickets, mainly applied in the telecommunications billing field. The method comprises the following steps: extracting key field information from a ticket; computing over the key field information with the MD5 algorithm to generate an MD5 feature string; and comparing the MD5 feature string with the MD5 feature strings of normal tickets stored in an index file. If an identical MD5 feature string is found, the ticket is a duplicate and is rejected; otherwise the MD5 feature string of the ticket is saved to the index file and the ticket is confirmed as a normal ticket. The MD5 feature strings of the index file are stored in a hash table whose collisions are resolved by a linked list or a balanced binary tree. The invention combines the advantages of the hash table, the balanced binary tree and the MD5 algorithm to process tickets of any service and any type effectively. It produces a unique deduplication index whose entries all have the same fixed length, which improves deduplication efficiency and extensibility and greatly saves storage space.

Description

A method and device for removing duplicate call tickets
Technical field
The present invention relates to the field of telecommunications billing, and in particular to a method and device for removing duplicate call tickets (ticket deduplication) that can be widely used in billing and settlement systems in the communications field.
Background art
Before a billing system rates a ticket, every ticket collected from each network element or service platform must first undergo deduplication. For a billing system that has to process a massive number of tickets every day, the performance of the deduplication system therefore directly affects how promptly tickets are processed.
Ticket deduplication is generally carried out in memory. The basic principle is as follows: first, a combination of key information is extracted from the ticket (for example mobile phone number, call time, peer number and SP code; different classes of tickets use different key combinations); then memory is searched for an existing entry with the same combined field. If the key information combination of the ticket already exists, the ticket is judged to be a duplicate; otherwise the key information of the ticket is saved in memory. Memory can hold only a limited amount of key information, so expired data is written to disk and kept as files. When a delayed ticket is processed, the corresponding data must first be reloaded into memory. How this key information is managed and stored therefore directly determines the storage space the system needs and the efficiency of deduplication.
Existing duplicate-checking techniques generally extract the key information combination from the ticket, store it directly in memory pages, and use a hash linked list or a balanced binary tree to search the pages holding the key combinations and judge whether the information is a duplicate.
Chinese patent application No. 03145603.0 discloses an in-memory ticket deduplication method, comprising: extracting the key information from the ticket; storing it in memory pages after simple combination; and using a hash linked list or a balanced binary tree to search the pages holding the key combinations and judge whether the information is a duplicate.
Chinese patent application No. 200610036536.4 discloses a system and method for eliminating redundant files in a file storage system. The method comprises: obtaining information about each file by scanning the storage server module, including the file size, the file reference count and the file ID; computing the MD5 value of the file content; performing a hash operation on the MD5 value of the file; and locating the corresponding hash table through an in-memory hash mapping table. If the reference count of the file exceeds a threshold, redundant files in the storage system are found through the hash table according to the MD5 value and size of the file, and are deleted.
The shortcomings of existing deduplication techniques can be summarized in the following two points:
1. The key information differs between classes of tickets, and both the field lengths and the length of the combined key differ, which makes the keys hard to manage and poorly extensible. The combined key field is also generally long, so it occupies a large amount of storage and, to some extent, reduces the efficiency of the comparisons made during deduplication.
2. A simple hash linked list or balanced binary tree is not efficient enough when handling massive data (daily peaks can reach tens of millions to over a hundred million tickets per hour) and cannot satisfy growing communication service demand.
Summary of the invention
The purpose of the present invention is to provide a method and device for removing duplicate call tickets, so as to solve the problems that the key information of existing ticket classes differs, is hard to manage and is poorly extensible, and that massive data is not processed efficiently enough.
To achieve the above object, the invention provides a ticket deduplication method comprising the following steps:
Step a: extract key information from the ticket;
Step b: compute over this key information with the MD5 algorithm to generate the MD5 feature string of the ticket;
Step c: compare the MD5 feature string with the MD5 feature strings of normal tickets stored in the index file. If an identical MD5 feature string is found, the ticket is a duplicate and is rejected; otherwise the MD5 feature string is saved to the index file and the ticket corresponding to it is confirmed as a normal ticket.
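Steps a to c can be sketched as follows. This is a minimal illustration rather than the patented implementation: the field names in KEY_FIELDS, the "|" separator and the use of a Python set as a stand-in for the index file are all assumptions made for the example.

```python
import hashlib

# Hypothetical key fields; the source only says they differ per ticket class.
KEY_FIELDS = ["calling", "called", "start_time"]

def feature_string(ticket, key_fields):
    """Steps a and b: join the key fields and return the 16-byte MD5 digest."""
    key_info = "|".join(str(ticket[name]) for name in key_fields)
    return hashlib.md5(key_info.encode("utf-8")).digest()

def deduplicate(tickets, key_fields):
    """Step c: a ticket is normal only if its feature string is not yet indexed."""
    index = set()  # stand-in for the index file (assumption)
    normal, duplicates = [], []
    for ticket in tickets:
        fs = feature_string(ticket, key_fields)
        if fs in index:
            duplicates.append(ticket)   # duplicate ticket: reject
        else:
            index.add(fs)               # save feature string, keep ticket
            normal.append(ticket)
    return normal, duplicates

tickets = [
    {"calling": "13800000001", "called": "13900000002", "start_time": "20081201120000"},
    {"calling": "13800000001", "called": "13900000002", "start_time": "20081201120000"},
    {"calling": "13800000003", "called": "13900000002", "start_time": "20081201120500"},
]
normal, duplicates = deduplicate(tickets, KEY_FIELDS)
print(len(normal), len(duplicates))
```

The second ticket repeats the key fields of the first, so it is the one reported as a duplicate.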
In the above technical solution, the MD5 feature strings of normal tickets are stored in the index file in the form of a hash table, and step c specifically comprises:
Step c1: perform a hash operation on the MD5 feature string of the ticket according to a preset hash function;
Step c2: find the storage node in the hash table according to the function value obtained by the hash operation;
Step c3: if an MD5 feature string identical to that of the ticket is found on this storage node, the ticket is a duplicate and is rejected; otherwise the MD5 feature string of the ticket is inserted into this storage node and the ticket is confirmed as a normal ticket.
In the above technical solution, the MD5 feature strings in a storage node of the hash table are stored as a linked list or as a balanced binary tree.
In the above technical solution, when the MD5 feature strings in a storage node of the hash table are stored as a balanced binary tree, the step of inserting the MD5 feature string of the ticket into the storage node comprises: inserting the MD5 feature string into the balanced binary tree on this storage node and, if the insertion causes the balanced binary tree to lose balance, adjusting it by rotation.
Preferably, when the ticket is a normal ticket, the ticket is output together with the updated incremental index file.
Preferably, the index file and the incremental index file are stored in memory or on disk. If memory usage exceeds a specified value, the incremental index file is merged, the older part of the index file is stored on disk, and memory is released automatically. If an index file on disk needs to be brought back into memory, it is reloaded.
Preferably, the key information includes but is not limited to any one of, or a combination of several of, the fields calling number, called number, call time and SP code.
To achieve the above object, the present invention also provides a ticket deduplication device, comprising: an extraction module, configured to extract key information from a ticket; a feature string generation module, configured to compute over the key information with the MD5 algorithm and generate an MD5 feature string; and a comparison and deduplication module, configured to compare the MD5 feature string with the MD5 feature strings of normal tickets stored in the index file. If an identical MD5 feature string is found, the ticket is a duplicate and is rejected; otherwise the MD5 feature string of the ticket is saved to the index file and the ticket is confirmed as a normal ticket.
Preferably, the comparison and deduplication module comprises: a computing and locating unit, configured to perform a hash operation on the MD5 feature string according to a preset hash function and to find the storage node in the hash table according to the resulting hash function value; a search unit, configured to search the storage node for a feature string identical to the MD5 feature string; and a deduplication unit, configured to determine from the search result whether the ticket is a duplicate. If an MD5 feature string identical to the feature string is found on the storage node of the hash table, the ticket is a duplicate and is rejected; otherwise the ticket is a normal ticket.
Preferably, each storage node of the hash table is a linked list or a balanced binary tree.
Preferably, when a storage node of the hash table is a balanced binary tree, the comparison and deduplication module further comprises: an insertion unit, configured to insert the MD5 feature string of the normal ticket into the balanced binary tree and, at the same time, to output the normal ticket and its incremental index file; and a rotation unit, configured to adjust the balanced binary tree by rotation if inserting the MD5 feature string causes it to lose balance.
Preferably, the ticket deduplication device further comprises a loading module, configured to reload the index file from disk into memory as required.
The ticket deduplication method and device provided by the invention combine the strengths of the hash table, the balanced binary tree and the MD5 algorithm to process tickets of any service and any type effectively. Processing produces a unique deduplication index whose entries all have the same fixed length, which improves the operating efficiency and extensibility of deduplication and greatly saves storage space.
Description of the drawings
Fig. 1 is a flow chart of embodiment one of the ticket deduplication method of the present invention;
Fig. 2 is a schematic diagram of using balanced binary trees to resolve hash table collisions;
Fig. 3 is a schematic diagram of an unbalanced binary tree;
Fig. 4 is a schematic diagram of a balanced binary tree;
Fig. 5 is a flow chart of embodiment two of the ticket deduplication method of the present invention;
Fig. 6 is a schematic diagram of a balanced binary tree rotation;
Fig. 7 is a schematic diagram of searching for an MD5 feature string in a balanced binary tree;
Fig. 8 is a structural diagram of embodiment one of the ticket deduplication device of the present invention;
Fig. 9 is a structural diagram of embodiment two of the ticket deduplication device of the present invention.
Detailed description of the embodiments
The specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Method embodiments
Fig. 1 shows the flow chart of embodiment one of the ticket deduplication method. The technical terms used in this embodiment are explained first:
MD5 (Message-Digest Algorithm 5) is a one-way hash algorithm used in producing digital signatures. Through an irreversible string mapping algorithm it transforms a byte string of arbitrary length into a string of 16 bytes. The MD5 algorithm can be used to normalize the key information combination of tickets of any service and any type by compressing it into a unique 16-byte string. The key information of any ticket can then be managed and stored in a uniform way. Because the key information of a typical ticket is far longer than 16 bytes, converting it into a 16-byte string for management and storage markedly improves the efficiency of searching and comparing and also greatly reduces memory cost. In addition, the 16-byte strings produced by the MD5 algorithm are uniformly distributed, which makes it easy to construct a simple and effective hash function for locating entries and thus improves processing efficiency.
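The fixed-length property can be checked with Python's standard hashlib module. The two key strings below are hypothetical examples of key information combinations for different ticket classes; only their differing lengths matter.

```python
import hashlib

# Hypothetical key information combinations of different lengths.
voice_key = "13800000001|13900000002|20081201120000"
sp_key = "13800000001|20081201120000|SP901234|SERVICE-CODE-0001"

digests = []
for key_info in (voice_key, sp_key):
    digest = hashlib.md5(key_info.encode("utf-8")).digest()
    digests.append(digest)
    # Input length varies; the feature string is always 16 bytes.
    print(len(key_info), "->", len(digest), digest.hex())
```

Whatever the length of the combined key, the value that has to be stored and compared is 16 bytes.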
A hash function is a many-to-one mapping. Writing it as y = F(x), several x values may correspond to the same y value; this is called a collision in the hash table. When a hash table is used, a method for resolving such collisions must therefore be designed at the same time, that is, a way of storing in memory the several x values that correspond to the same y value. The present invention can use a linked list to store these data, and also uses a balanced binary tree to resolve hash table collisions. For the data of Fig. 2, the hash function maps the four values 1, 9, 17 and 25 to function value 1, maps 11 to function value 3, and so on. At the address with function value 1, the four values 1, 9, 17 and 25 are in turn stored in a balanced binary tree.
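The mapping quoted for Fig. 2 can be reproduced with a toy hash function. The source does not name the function used in the figure; x mod 8 is an assumption chosen only because it is consistent with the quoted values. Plain lists stand in for the per-address balanced binary trees.

```python
from collections import defaultdict

def toy_hash(x):
    # Assumption: x mod 8 matches the values quoted for Fig. 2.
    return x % 8

buckets = defaultdict(list)  # hash address -> colliding values
for value in (1, 9, 17, 25, 11):
    buckets[toy_hash(value)].append(value)

print(dict(buckets))  # {1: [1, 9, 17, 25], 3: [11]}
```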
A hash table uses a chosen hash function so that the key of every element corresponds to a hash function value, which serves as the storage location of that key in the hash table, i.e. its hash address. This eliminates the time spent on comparison by traversal. The crux lies in designing a good hash function and a method for handling collisions. A balanced binary tree is either an empty tree or a binary tree with the following property: its left and right subtrees are both balanced binary trees, and the absolute difference between the depths of the left and right subtrees does not exceed 1. Fig. 3 shows an unbalanced binary tree and Fig. 4 a balanced binary tree. The search time complexity of a balanced binary tree is O(log n) in both the average and the worst case, so lookups are very efficient. After a node is inserted or deleted the tree may lose balance; it must then be adjusted by one or more rotations to restore balance.
Embodiment one of the ticket deduplication method of the present invention comprises the following steps:
Step S102: extract key information from the ticket, for example any one of, or a combination of several of, the fields calling number, called number, call time and SP code;
Step S104: compute over the key information with the MD5 algorithm to generate an MD5 feature string;
Step S106: perform a hash operation on the MD5 feature string generated in step S104 according to a preset hash function, and use the resulting hash function value to locate the balanced binary tree that serves as the corresponding hash table storage node;
Step S108: search the located balanced binary tree for the MD5 feature string generated in step S104; if an identical MD5 feature string is found in the tree, go to step S110, otherwise go to step S112;
Step S110: an identical MD5 feature string has been found in the balanced binary tree, so the ticket is determined to be a duplicate and is rejected;
Step S112: no identical MD5 feature string has been found in the balanced binary tree, so the ticket is determined not to be a duplicate, and the procedure ends.
Embodiment one of the ticket deduplication method of the present invention combines the strengths of the hash table, the balanced binary tree and the MD5 algorithm to process tickets of any service and any type effectively. After processing by the deduplication system, a unique deduplication index is formed whose entries all have the same fixed length, which improves the operating efficiency and extensibility of deduplication and greatly saves storage space.
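The flow of steps S104 to S112 can be sketched as a hash table of 256 storage nodes addressed by the first digest byte. Note the substitution: each storage node here is a sorted Python list searched with bisect, used as a simple stand-in for the balanced binary tree of the embodiment (both give logarithmic lookup, but list insertion is not logarithmic). The class and method names are hypothetical, and the sketch also inserts the feature string of a non-duplicate ticket, as step c of the summary describes.

```python
import bisect
import hashlib

class FeatureIndex:
    """Hash table with 256 storage nodes; each node is a sorted list
    (stand-in for a balanced binary tree)."""

    def __init__(self):
        self.nodes = [[] for _ in range(256)]

    def check_and_insert(self, key_info):
        """Return True for a duplicate; otherwise index the ticket and return False."""
        fs = hashlib.md5(key_info.encode("utf-8")).digest()  # S104
        node = self.nodes[fs[0]]                             # S106: first byte as hash value
        pos = bisect.bisect_left(node, fs)                   # S108: ordered search
        if pos < len(node) and node[pos] == fs:
            return True                                      # S110: duplicate, reject
        node.insert(pos, fs)                                 # S112: not a duplicate, index it
        return False

index = FeatureIndex()
results = [index.check_and_insert(k) for k in ("A|B|1200", "A|C|1201", "A|B|1200")]
print(results)  # [False, False, True]
```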
Fig. 5 shows the flow chart of embodiment two of the ticket deduplication method, which refines the scheme of embodiment one and comprises the following steps:
Step S202: read the ticket into the system; ticket deduplication begins;
Step S204: extract key information from the ticket, for example any one of, or a combination of several of, the fields calling number, called number, call time and SP code. The MD5 feature strings of normal tickets are stored in the index file in the form of a hash table; the index file is stored in memory or on disk, and collisions in the hash table are resolved by balanced binary trees;
Step S206: judge whether the index file of normal tickets is held in memory; if so, go to step S210, otherwise go to step S208;
Step S208: if the index file of processed tickets is not held in memory, reload the index file from disk into memory and continue with step S210;
Step S210: compute over the key information with the MD5 algorithm to generate the MD5 feature string. The MD5 feature string is a unique 16-byte string in which every byte is an 8-bit binary value, for example: 00000000 00000001 00000010 00000011 00000100 00000101 00000110 00000111 00001000 00001001 00001010 00001011 00001100 00001101 00001110 00010000;
Step S212: because the MD5 feature strings generated in step S210 are uniformly distributed, the hash function can directly take the value of the first byte, or of the first two bytes, of the MD5 feature string to locate the balanced binary tree serving as the corresponding hash table storage node. A single hash operation thus narrows the search to 1/256 or 1/65536 of the original data volume. The reason is as follows: the MD5 algorithm generates a 16-byte feature string, written here as 0123456789ABCDEF, in which each symbol stands for one 8-bit byte, i.e. 8 binary digits such as 01010101. Each binary digit has two possible values, so each byte has 2^8 = 256 possible values. Therefore, if the first byte of the 16-byte string is used directly as the hash function value for addressing, the search only has to compare the remaining 15 bytes and, for uniformly distributed data, covers only 1/256 of the stored 16-byte strings; if the first two bytes are used as the hash function value, the data to be searched is only 1/65536 of the stored 16-byte strings.
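The 1/256 and 1/65536 figures can be illustrated empirically. The sketch below hashes synthetic key strings (the "ticket-i" naming is purely illustrative) and counts how many feature strings land at each first-byte address; with uniformly distributed digests each address should hold roughly N/256 entries.

```python
import hashlib
from collections import Counter

N = 100_000
per_first_byte = Counter()   # address = first byte (256 possible values)
two_byte_addresses = set()   # address = first two bytes (65536 possible values)

for i in range(N):
    digest = hashlib.md5(f"ticket-{i}".encode("utf-8")).digest()
    per_first_byte[digest[0]] += 1
    two_byte_addresses.add(int.from_bytes(digest[:2], "big"))

# Expected entries per address under a uniform distribution.
print("expected per 1-byte address:", N / 256)
print("observed min/max:", min(per_first_byte.values()), max(per_first_byte.values()))
print("distinct 2-byte addresses used:", len(two_byte_addresses))
```

A lookup then only searches the entries at one address instead of all N stored feature strings.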
Step S214: the node located by the hash function is a balanced binary tree, and the MD5 feature string is searched for in this tree. For any node of the balanced binary tree, the left subtree is empty or holds values less than the node value, and the right subtree is empty or holds values greater than the node value. If the root node value of the balanced binary tree is empty, go to step S216; if the root node value equals the MD5 feature string, the ticket is a duplicate, go to step S226; if the root node value is not equal to the MD5 feature string, go to step S224;
Step S216: if the root node value of the balanced binary tree is empty, the ticket is not a duplicate; insert the MD5 feature string into the balanced binary tree and perform step S218 at the same time;
Step S218: if the ticket is not a duplicate, then while the MD5 feature string is inserted into the balanced binary tree, output the correct ticket and the incremental index file, and go to step S220;
Step S220: judge whether inserting the MD5 feature string has caused the balanced binary tree to lose balance; if so, go to step S222, otherwise go to step S228;
Step S222: if the balanced binary tree has lost balance, it is adjusted by rotation. In a balanced binary tree, the value of any node is greater than the values in its left subtree and less than the values in its right subtree, and the heights of its left and right subtrees differ by no more than 1. Inserting a value may make the height difference between the left and right subtrees exceed 1, so that the tree is no longer a balanced binary tree; a rotation is then required to restore balance. Take, for example, the simple balanced binary tree below node 6 in Fig. 3, shown in Fig. 6: the left subtree of root node 14 has height 0 and the right subtree has height 1. When node 17 is inserted, the left subtree of root node 14 has height 0 and the right subtree has height 2, so the tree is no longer a balanced binary tree. Node 14 must be rotated to become the left subtree of 17, with 17 as the root node; after the rotation the tree is balanced again. An unbalanced binary tree needs more comparisons per search than a balanced one. In the example above, finding 22 takes 3 comparisons in the unbalanced tree before rotation and 2 in the balanced tree after rotation; the larger the height difference between the left and right subtrees, the more pronounced the gap;
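The rotation can be sketched with a small AVL-style insertion routine. The insertion order is an assumption: the keys 14, 17 and 22 are inserted in ascending order, which reproduces the quoted comparison counts (3 before rotation, 2 after) and ends with 17 as the root; the exact tree of Fig. 6 is not reproduced here.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 1

def height(n):
    return n.height if n else 0

def update(n):
    n.height = 1 + max(height(n.left), height(n.right))

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x   # x becomes the left child of y
    update(x); update(y)
    return y

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y  # y becomes the right child of x
    update(y); update(x)
    return x

def insert(node, key):
    """Insert key and rebalance by rotation where the height difference exceeds 1."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node               # key already present
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:
        if key > node.left.key:   # left-right case
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:
        if key < node.right.key:  # right-left case
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node

def comparisons(node, key):
    count = 0
    while node:
        count += 1
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    return count

# Unbalanced chain 14 -> 17 -> 22, built by hand without rebalancing.
chain = Node(14); chain.right = Node(17); chain.right.right = Node(22)
root = None
for k in (14, 17, 22):
    root = insert(root, k)
print(comparisons(chain, 22), comparisons(root, 22), root.key)  # 3 2 17
```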
Step S224: if the MD5 feature string is less than the root node value of the balanced binary tree, continue searching in the left subtree of the root node; if it is greater than the root node value, continue searching in the right subtree, until an identical node value is found or a node has no corresponding left or right subtree. If an identical node value is found, the ticket is a duplicate, go to step S226; if there is no corresponding left or right subtree, the ticket is not a duplicate, go to step S218. For example, suppose the MD5 feature string generated for a ticket is 1122000000000003. As shown in Fig. 7, the balanced binary tree at hash address 11 is located first, and the MD5 feature string is compared with the value of its root node. If the root node A0 is 1123000000000000, the MD5 feature string is less than the root node, so the root node A01 of its left subtree is examined next. If A01 is 1122000000000000, the search continues with the root node A012 of the right subtree of A01. If the value of A012 is empty, the ticket is not a duplicate, go to step S218; if the value of A012 is 1122000000000003, the ticket is a duplicate, go to step S226; otherwise continue searching its left or right subtree;
Step S226: if the ticket is a duplicate, reject it;
Step S228: once the duplicate check has been carried out, processing of the ticket ends. When the memory used by the ticket deduplication process exceeds a specified value, the system automatically releases memory and merges the incremental index file; when the next CDR file is processed, the index file is reloaded as required.
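The search of step S224 on the Fig. 7 example can be sketched as an ordinary ordered-tree descent. Only the three nodes named in the text (A0, A01, A012) are built, so the fragment is not itself a complete balanced tree; feature strings are compared as the 16-character strings used in the example.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def find(root, feature):
    """Return (found, path); path lists the node values compared on the way down."""
    path, node = [], root
    while node is not None:
        path.append(node.value)
        if feature == node.value:
            return True, path          # identical value: duplicate ticket
        # Smaller goes left, larger goes right.
        node = node.left if feature < node.value else node.right
    return False, path                 # ran out of subtree: not a duplicate

# Fragment of the tree at hash address 11, using the node names of Fig. 7.
a012 = Node("1122000000000003")
a01 = Node("1122000000000000", right=a012)
a0 = Node("1123000000000000", left=a01)

found, path = find(a0, "1122000000000003")
print(found, path)
missing, _ = find(a0, "1122000000000001")
print(missing)
```

The first lookup visits A0, A01 and A012 in turn and reports a duplicate; the second runs out of subtree below A012 and reports a new ticket.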
Embodiment two of the ticket deduplication method of the present invention applies the MD5 algorithm to the ticket key information to obtain a unique MD5 feature string of uniform length. The MD5 feature strings of normal tickets are stored in a hash table whose collisions are resolved by balanced binary trees, and the MD5 feature string is compared with the node values of the balanced binary tree to judge whether the ticket is a duplicate. This achieves deduplication for tickets of any service and any type, improves the operating efficiency and extensibility of the deduplication system, and greatly saves storage space; the storage saved can exceed 50%. Storing the processed tickets through the combination of hash table and balanced binary tree is highly efficient.
In addition to the above embodiments, the ticket deduplication method of the present invention can apply the MD5 algorithm to the ticket key information to obtain an MD5 feature string, store the MD5 feature string in a hash table, and resolve the collisions of the hash table by linked lists instead. The MD5 feature string is compared with the node values of the linked list to judge whether the ticket is a duplicate. With a procedure similar to the preceding embodiments, this also achieves ticket deduplication.
Device embodiments
Fig. 8 shows the structural diagram of embodiment one of the ticket deduplication device of the present invention. Embodiment one of the device comprises an extraction module 1, a feature string generation module 2 and a comparison and deduplication module 3. The extraction module 1 is configured to extract key information from a ticket. The feature string generation module 2 is configured to compute over the key information with the MD5 algorithm and generate an MD5 feature string. The comparison and deduplication module 3 is configured to compare the MD5 feature string with the MD5 feature strings of normal tickets stored in the index file; if an identical MD5 feature string is found, the ticket is a duplicate and is rejected; otherwise the MD5 feature string of the ticket is saved to the index file and the ticket is confirmed as a normal ticket.
The comparison and deduplication module 3 comprises: a computing and locating unit 4, configured to perform a hash operation on the MD5 feature string according to a preset hash function and to find the storage node in the hash table according to the resulting function value; a search unit 5, configured to search the storage node for a feature string identical to the MD5 feature string; and a deduplication unit 6, configured to determine from the search result whether the ticket is a duplicate. If an MD5 feature string identical to the feature string is found on the storage node of the hash table, the ticket is a duplicate and is rejected; otherwise the ticket is a normal ticket.
The workflow of embodiment one of the ticket deduplication device is described below:
The extraction module 1 extracts key information from the ticket. The feature string generation module 2 computes over the key information produced by the extraction module 1 with the MD5 algorithm and generates an MD5 feature string. The computing and locating unit 4 in the comparison and deduplication module 3 performs a hash operation, according to a preset hash function, on the MD5 feature string generated by the feature string generation module 2, and uses the resulting hash function value to locate the balanced binary tree that stores the index file. The search unit 5 searches the located balanced binary tree for the MD5 feature string. If the search unit 5 finds an identical MD5 feature string in the balanced binary tree, the deduplication unit 6 determines that the ticket is a duplicate and rejects it; if the search unit 5 does not find an identical MD5 feature string in the balanced binary tree, the deduplication unit 6 determines that the ticket is not a duplicate.
Embodiment one of the ticket deduplication device of the present invention combines the advantages of the hash table, the balanced binary tree and the MD5 algorithm. It processes tickets of different services and different types effectively, generates a unique MD5 feature string of uniform length, improves the efficiency of ticket deduplication, and greatly saves storage space.
Fig. 9 shows the structural diagram of embodiment two of the ticket deduplication device of the present invention. Embodiment two comprises all the modules of embodiment one. In addition, the comparison and deduplication module further comprises an insertion unit 7 and a rotation unit 8, and the device further comprises a loading module 9. The insertion unit 7 is configured, if the deduplication unit 6 determines that the ticket is not a duplicate, to insert the MD5 feature string into the balanced binary tree and at the same time to output the correct ticket and the incremental index file. The rotation unit 8 is configured to adjust the balanced binary tree by rotation if the MD5 feature string inserted by the insertion unit 7 causes the tree to lose balance. The loading module 9 is configured to reload the index file from disk into memory as required.
The workflow of embodiment two of the ticket deduplication device is described below:
Extraction module 1 extracts key information from the ticket, for example any one or a combination of several of the fields calling number, called number, call time, and SP code. The MD5 feature strings corresponding to normal tickets are stored in the index file in the form of a hash table; the index file is stored in memory or on disk, and hash table collisions are resolved by balanced binary trees. Feature string generation module 2 applies the MD5 algorithm to the key information extracted by extraction module 1 and generates an MD5 feature string. In comparison and deduplication module 3, calculation and locating unit 4 performs a hash operation on the MD5 feature string generated by feature string generation module 2 according to the configured hash function and, according to the resulting hash value, locates the balanced binary tree that serves as the corresponding storage node of the hash table. Search unit 5 searches the located balanced binary tree for the MD5 feature string. If search unit 5 finds an identical MD5 feature string on the balanced binary tree, deduplication unit 6 determines that the ticket is a duplicate and rejects it; if search unit 5 finds no identical MD5 feature string on the balanced binary tree, deduplication unit 6 determines that the ticket is not a duplicate. The procedure by which search unit 5 searches the balanced binary tree is as described in steps S214, S216, and S224 of the method embodiment. If deduplication unit 6 determines that the ticket is not a duplicate, insertion unit 7 inserts the MD5 feature string into the balanced binary tree and at the same time outputs the correct ticket and the index delta file. If the MD5 feature string inserted by insertion unit 7 unbalances the balanced binary tree, rotation unit 8 restores its balance by rotation.
Once duplicate removal has been carried out, processing of the ticket is complete. When the memory occupied by the deduplication process exceeds a specified value, the ticket deduplication apparatus automatically releases memory and merges the index delta file. When processing continues with the next CDR file, loading module 9 reloads the index file as required.
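The interplay of units 4 to 8 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes an AVL tree as the balanced binary tree, a table of 1024 buckets, a `|` field separator, and a hash function that takes the first 8 hexadecimal digits of the MD5 feature string modulo the table size. The names `Node`, `insert`, `is_duplicate`, and `NUM_BUCKETS` are hypothetical.

```python
import hashlib

class Node:
    """One AVL-tree node holding a single MD5 feature string."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1

def height(node):
    return node.height if node else 0

def update(node):
    node.height = 1 + max(height(node.left), height(node.right))

def rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    update(y)
    update(x)
    return x

def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y

def insert(node, key):
    """Search for key and insert it if absent.

    Returns (new_root, inserted); inserted is False when key already exists.
    """
    if node is None:
        return Node(key), True
    if key == node.key:
        return node, False            # identical feature string found
    if key < node.key:
        node.left, inserted = insert(node.left, key)
    else:
        node.right, inserted = insert(node.right, key)
    if not inserted:
        return node, False
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:                   # left side too tall
        if key > node.left.key:       # left-right case
            node.left = rotate_left(node.left)
        return rotate_right(node), True
    if balance < -1:                  # right side too tall
        if key < node.right.key:      # right-left case
            node.right = rotate_right(node.right)
        return rotate_left(node), True
    return node, True

NUM_BUCKETS = 1024                    # assumed table size
buckets = [None] * NUM_BUCKETS        # each slot is the root of one AVL tree

def is_duplicate(key_fields):
    """Return True if a ticket with these key fields was seen before."""
    digest = hashlib.md5("|".join(key_fields).encode("utf-8")).hexdigest()
    slot = int(digest[:8], 16) % NUM_BUCKETS   # assumed hash function
    buckets[slot], inserted = insert(buckets[slot], digest)
    return not inserted
```

Searching and inserting are combined in one descent here, so a ticket that is not a duplicate is indexed in the same call that checks it.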
Embodiment two of the ticket deduplication apparatus of the present invention applies the MD5 algorithm to the key information of a ticket to obtain an MD5 feature string of uniform length that uniquely identifies the ticket. The MD5 feature strings corresponding to normal tickets are stored in a hash table whose collisions are resolved by balanced binary trees, and the MD5 feature string of a ticket is compared with the node values of the balanced binary tree to judge whether the ticket is a duplicate. This enables deduplication of tickets of any service and any type, improves the operating efficiency and extensibility of the deduplication system, and greatly saves storage space.
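The uniform-length property can be illustrated with a short sketch. The function name `feature_string`, the sample field values, and the use of the ASCII unit separator between fields are assumptions made for this example; the patent does not specify how the fields are joined.

```python
import hashlib

def feature_string(fields):
    """Join the key fields and return their MD5 digest as 32 hex characters."""
    # A separator that cannot occur in the fields keeps ("ab", "c") and
    # ("a", "bc") from producing the same input string.
    joined = "\x1f".join(fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

# Tickets from different services may carry different key fields,
# yet the resulting index entries all have the same length.
voice = feature_string(["13800000000", "13900000000", "20081212093000"])
sms = feature_string(["13800000000", "SP0001"])
print(len(voice), len(sms))
```

Because every entry is a 32-character string, the index structure does not need to know which service or ticket type produced it.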
In addition to the above embodiments, in the ticket deduplication apparatus of the present invention the calculation and locating unit of the comparison and deduplication module may also use a linked list as the storage node of the hash table. The MD5 feature string is then compared with the node values of the linked list to judge whether the ticket is a duplicate. With a structure similar to that of the foregoing embodiment, this also achieves ticket deduplication.
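A minimal sketch of the linked-list variant follows. The class names `ChainNode` and `ChainedIndex`, the default table size, and the hash function (first 8 hexadecimal digits of the feature string modulo the table size) are assumptions for illustration only.

```python
class ChainNode:
    """One linked-list node holding a single MD5 feature string."""
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node

class ChainedIndex:
    """Hash table whose storage nodes are singly linked lists."""
    def __init__(self, num_buckets=1024):
        self.heads = [None] * num_buckets

    def check_and_insert(self, digest):
        """Return True if digest is already stored; otherwise store it."""
        slot = int(digest[:8], 16) % len(self.heads)   # assumed hash function
        node = self.heads[slot]
        while node is not None:          # walk the chain, comparing node values
            if node.key == digest:
                return True              # duplicate ticket
            node = node.next
        # Not found: prepend the new feature string to the chain.
        self.heads[slot] = ChainNode(digest, self.heads[slot])
        return False
```

Lookup in a chain is linear in the chain length, whereas a balanced binary tree keeps it logarithmic, which is presumably why the main embodiment prefers the tree when buckets grow large.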
The above discloses only several specific embodiments of the present invention. The present invention is not limited thereto, however, and any variation that a person skilled in the art can conceive of shall fall within the protection scope of the present invention.

Claims (9)

1. A call ticket deduplication method, characterized in that the method comprises the following steps:
Step a: extracting key information from a ticket;
Step b: calculating on the key information using the MD5 algorithm to generate the MD5 feature string corresponding to the ticket;
Step c: comparing the MD5 feature string with the MD5 feature strings, stored in an index file, that correspond to normal tickets;
if an identical MD5 feature string is found, the ticket is a duplicate and is rejected; otherwise, the MD5 feature string corresponding to the ticket is saved in the index file and the ticket is confirmed as a normal ticket;
the MD5 feature strings corresponding to normal tickets are stored in the index file in the form of a hash table; step c specifically comprises:
Step c1: performing a hash operation on the MD5 feature string corresponding to the ticket according to a configured hash function;
Step c2: finding the storage node in the hash table according to the function value obtained by the hash operation;
Step c3: if an MD5 feature string identical to the MD5 feature string corresponding to the ticket is found on the storage node, the ticket is a duplicate and is rejected; otherwise, the MD5 feature string corresponding to the ticket is inserted into the storage node and the ticket is confirmed as a normal ticket;
the MD5 feature strings in the storage nodes of the hash table are stored as a linked list or as a balanced binary tree.
2. The call ticket deduplication method according to claim 1, characterized in that, when the MD5 feature strings in the storage nodes of the hash table are stored as a balanced binary tree, the step in step c3 of inserting the MD5 feature string corresponding to the ticket into the storage node specifically comprises:
inserting the MD5 feature string into the balanced binary tree on the storage node and, if the inserted MD5 feature string unbalances the balanced binary tree, adjusting the tree by rotation.
3. removing call ticket repeat method according to claim 1 and 2 is characterized in that, when described ticket is normal ticket, then exports this ticket, and the index delta file of output after upgrading.
4. removing call ticket repeat method according to claim 3 is characterized in that, described index file and index delta file are stored in internal memory or the disk.
5. removing call ticket repeat method according to claim 4 is characterized in that, specifies numerical value if the EMS memory occupation amount surpasses, and then merges the index delta file, the part early of time in the index file is stored in the disk, and discharge internal memory automatically; If in the time of the index file in the disk need being re-loaded to internal memory, then reload.
6. want 1 or 2 described removing call ticket repeat methods according to right, it is characterized in that, described key information comprises: the combination of any one or a few in the combined field of being made up of calling number, called number, air time, SP code.
7. A ticket deduplication apparatus, characterized by comprising:
an extraction module, configured to extract key information from a ticket;
a feature string generation module, configured to calculate on the key information using the MD5 algorithm to generate an MD5 feature string;
a comparison and deduplication module, configured to compare the MD5 feature string with the MD5 feature strings, stored in an index file, that correspond to normal tickets;
if an identical MD5 feature string is found, the ticket is a duplicate and is rejected; otherwise, the MD5 feature string corresponding to the ticket is saved in the index file and the ticket is confirmed as a normal ticket; wherein the comparison and deduplication module comprises:
a calculation and locating unit, configured to perform a hash operation on the MD5 feature string according to a configured hash function and to find the storage node in the hash table according to the function value obtained by the operation;
a search unit, configured to search the storage node for a feature string identical to the MD5 feature string;
a deduplication unit, configured to determine from the search result whether the ticket is a duplicate: if an MD5 feature string identical to the MD5 feature string is found on the storage node of the hash table, the ticket is a duplicate and is rejected; otherwise, the ticket is a normal ticket;
the storage node of the hash table is a linked list or a balanced binary tree.
8. The ticket deduplication apparatus according to claim 7, characterized in that, when the storage node of the hash table is a balanced binary tree, the comparison and deduplication module further comprises:
an insertion unit, configured to insert the MD5 feature string corresponding to the normal ticket into the balanced binary tree and at the same time output the normal ticket and its index delta file;
a rotation unit, configured to adjust the balanced binary tree by rotation if inserting the MD5 feature string unbalances the balanced binary tree.
9. The ticket deduplication apparatus according to claim 7, characterized by further comprising a loading module, configured to reload the index file on disk into memory as required.
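The memory handling recited in claims 3 to 5 and claim 9 can be sketched roughly as follows. This is only an illustration under several assumptions that the claims leave open: entry count stands in for memory usage, half of the limit is kept in memory after a spill, the spilled part is a plain text file with one feature string per line, and "reloading as required" is reduced to scanning that file on a memory miss. The class `SpillingIndex` and all of its attribute names are hypothetical.

```python
import os
import tempfile

class SpillingIndex:
    """In-memory index with a delta part and an on-disk part (sketch)."""

    def __init__(self, spill_path, max_in_memory):
        self.spill_path = spill_path
        self.max_in_memory = max_in_memory   # stand-in for a memory limit
        self.index = {}   # merged feature strings, oldest first
        self.delta = {}   # feature strings added since the last merge

    def seen(self, digest):
        if digest in self.index or digest in self.delta:
            return True
        return self._on_disk(digest)         # consult the spilled part

    def add(self, digest):
        self.delta[digest] = None
        if len(self.index) + len(self.delta) > self.max_in_memory:
            self._merge_and_spill()

    def _merge_and_spill(self):
        # Merge the delta into the main index, then move the oldest
        # entries to disk and drop them from memory.
        self.index.update(self.delta)
        self.delta.clear()
        keep = self.max_in_memory // 2
        oldest = list(self.index)[:len(self.index) - keep]
        with open(self.spill_path, "a", encoding="ascii") as f:
            for digest in oldest:
                f.write(digest + "\n")
                del self.index[digest]

    def _on_disk(self, digest):
        if not os.path.exists(self.spill_path):
            return False
        with open(self.spill_path, encoding="ascii") as f:
            return any(line.rstrip("\n") == digest for line in f)

# Usage: a tiny limit forces a spill after the fifth entry.
workdir = tempfile.mkdtemp()
demo = SpillingIndex(os.path.join(workdir, "index.spill"), max_in_memory=4)
for i in range(6):
    demo.add("%032x" % i)
```

A production system would more likely keep the spilled part in an indexed on-disk structure rather than scanning a flat file; the flat file is used here only to keep the sketch short.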
CN2008101832739A 2008-12-12 2008-12-12 Method and apparatus for removing call ticket repeat Expired - Fee Related CN101442731B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008101832739A CN101442731B (en) 2008-12-12 2008-12-12 Method and apparatus for removing call ticket repeat

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008101832739A CN101442731B (en) 2008-12-12 2008-12-12 Method and apparatus for removing call ticket repeat

Publications (2)

Publication Number Publication Date
CN101442731A CN101442731A (en) 2009-05-27
CN101442731B true CN101442731B (en) 2010-07-14

Family

ID=40726941

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008101832739A Expired - Fee Related CN101442731B (en) 2008-12-12 2008-12-12 Method and apparatus for removing call ticket repeat

Country Status (1)

Country Link
CN (1) CN101442731B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102065192B (en) * 2009-11-18 2013-12-04 中国移动通信集团安徽有限公司 Call ticket de-duplication method and device
CN102024046B (en) * 2010-12-14 2013-04-24 华为数字技术(成都)有限公司 Data repeatability checking method and device as well as system
US9110936B2 (en) 2010-12-28 2015-08-18 Microsoft Technology Licensing, Llc Using index partitioning and reconciliation for data deduplication
CN102541918A (en) * 2010-12-30 2012-07-04 阿里巴巴集团控股有限公司 Method and equipment for identifying repeated information
CN102156744A (en) * 2011-04-18 2011-08-17 北京神州数码思特奇信息技术股份有限公司 Method for eliminating repetition of memory dialog list
CN102298633B (en) * 2011-09-08 2013-05-29 厦门市美亚柏科信息股份有限公司 Method and system for investigating repeated data in distributed mass data
CN102591792B (en) * 2012-01-13 2015-04-29 从兴技术有限公司 Storage method for memory data
CN102591855A (en) * 2012-01-13 2012-07-18 广州从兴电子开发有限公司 Data identification method and data identification system
CN103207878B (en) * 2012-01-17 2016-05-04 阿里巴巴集团控股有限公司 The inspection method releasing news and device
CN103037344B (en) * 2012-12-06 2016-04-20 亚信科技(中国)有限公司 A kind of ticket De-weight method and device
CN105930396B (en) * 2016-04-15 2019-04-09 北京思特奇信息技术股份有限公司 A kind of repetition removing method and system based on database
CN108990001B (en) * 2017-06-05 2021-04-20 中兴通讯股份有限公司 Ticket repetition eliminating method, device, storage medium and computer equipment
CN107357862B (en) * 2017-06-30 2020-03-13 中国联合网络通信集团有限公司 Method and device for arranging repeated voice messages
CN108650429B (en) * 2018-04-08 2020-10-23 国网辽宁省电力有限公司信息通信分公司 Call bill repetition eliminating method and system
CN109976896B (en) * 2019-04-09 2021-06-29 中国联合网络通信集团有限公司 Service re-ranking processing method and device
CN111209272B (en) * 2019-12-26 2023-04-18 杭州亚信云信息科技有限公司 Method, device and system for checking call ticket
CN112069510B (en) * 2020-07-24 2024-01-30 北京思特奇信息技术股份有限公司 Data encryption and duplication elimination method
CN114245330B (en) * 2021-11-17 2024-04-02 中国联合网络通信集团有限公司 Method, apparatus, device, computer readable storage medium and product for bill merging

Also Published As

Publication number Publication date
CN101442731A (en) 2009-05-27

Similar Documents

Publication Publication Date Title
CN101442731B (en) Method and apparatus for removing call ticket repeat
CN100462979C (en) Distributed indesx file searching method, searching system and searching server
CN102906751A (en) Method and device for data storage and data query
CN104462141B (en) Method, system and the storage engines device of a kind of data storage and inquiry
CN101594319B (en) Entry lookup method and entry lookup device
CN100571317C (en) A kind of calling list rearrangement method and device
CN105099729B (en) A kind of method and apparatus of identification User Identity
CN106161633B (en) Transmission method and system for packed files based on cloud computing environment
CN106534164B (en) Effective virtual identity depicting method based on cyberspace user identifier
WO2014059808A1 (en) Method for managing mass historical data of network management
CN101694672A (en) Distributed safe retrieval system
Xiao et al. Using parallel bloom filters for multiattribute representation on network services
CN102855238A (en) Method and system for downloading resource data
WO2016043757A1 (en) Data to be backed up in a backup system
CN102546299A (en) Method for detecting deep packet under large flow
CN107798106A (en) A kind of URL De-weight methods in distributed reptile system
CN105868218A (en) Data processing method and electronic device
CN103746851A (en) Method and device for realizing counting of independent user number
CN103049445B (en) A kind of method, system and picture state server for inquiring about pictorial information
CN108762979B (en) Terminal information backup method and backup device based on matching tree
CN106776617B (en) Log file saving method and device
CN108228763A (en) It is a kind of based on intelligent terminal from the method for Media Redundancy data processing
CN101483844A (en) Method and system for indexing mobile telephone number
CN104657383A (en) Repeated video detection method and system based on correlation properties
CN105357177A (en) Method for processing data packet filtering rule set and data packet matching method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100714

Termination date: 20211212