Summary of the invention
The technical problem to be solved by the present invention is to provide a new CDR (call detail record) deduplication method that deduplicates CDRs faster while using relatively little memory, and that makes deduplication performance independent of the CDR data volume.
The technical solution of the present invention to the above problem is as follows: an in-memory CDR deduplication method comprising the following steps:
Step 1: read a CDR file into memory;
Step 2: read one CDR record from the CDR file;
Step 3: according to the key information in the CDR record, find the index table in memory that corresponds to the CDR record;
Step 4: combine the field contents of the CDR record into a single character string and compute its MD5 value as the index of the CDR record;
Step 5: insert the MD5 value into the index table; if the insertion succeeds, write the CDR record to the normal CDR file; if the insertion fails, write the CDR record to the duplicate-record file;
Step 6: repeat steps 2 to 5 until all CDR records in the CDR file have been traversed.
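The six steps can be illustrated with a minimal Python sketch. It is not the implementation of the invention: a Python set stands in for the T-tree index table, records are assumed to be tuples of strings, and the names record_digest, dedup and key_of are hypothetical.

```python
import hashlib
from collections import defaultdict


def record_digest(fields):
    """Step 4: join all field contents and return the 32-character hex MD5."""
    joined = "".join(fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def dedup(records, key_of):
    """Split records into (normal, duplicates), one index table per key."""
    index_tables = defaultdict(set)   # key information -> set of MD5 values
    normal, duplicates = [], []
    for fields in records:                      # Step 2: next CDR record
        table = index_tables[key_of(fields)]    # Step 3: find the index table
        digest = record_digest(fields)          # Step 4: MD5 of the record
        if digest in table:                     # Step 5: insertion fails
            duplicates.append(fields)
        else:                                   # Step 5: insertion succeeds
            table.add(digest)
            normal.append(fields)
    return normal, duplicates


def key_of(fields):
    """Assumed key information: first 7 digits of the number plus the date."""
    return (fields[0][:7], fields[1][:8])


records = [
    ("13552270001", "20100720010000", "13900000000"),
    ("13552270001", "20100720010000", "13900000000"),  # exact duplicate
    ("13552270002", "20100720010500", "13900000000"),
]
normal, duplicates = dedup(records, key_of)
print(len(normal), len(duplicates))
```

Because each lookup touches only the table selected by the key information, the cost per record does not depend on how many other tables exist.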
The beneficial effects of the present invention are as follows. Deduplication is performed in memory using index tables and MD5 values, which is faster than the traditional database and file-system approaches. Compared with the traditional pure in-memory approach, using an MD5 value avoids comparing the fields of a CDR record one by one, which saves time. Inserting the MD5 value directly into the index table is faster than querying and then comparing. The method only operates on the index table in memory that corresponds to the current CDR record, so not all index tables need to be loaded, and the memory requirement is lower than that of traditional approaches. The invention thus deduplicates CDRs faster with relatively little memory, and makes deduplication performance independent of the CDR data volume.
On the basis of the above technical solution, the present invention may be further improved as follows.
Further, the index table uses a T-tree index.
The beneficial effect of this further scheme is that traversal and lookup are fast; the T-tree is also the principal index structure used in in-memory databases.
Further, the key information in the CDR record is the number segment and/or the date.
The beneficial effect of this further scheme is that using the number segment and/or the date as key information classifies CDRs well. For example, all CDR records of a given number segment for a given hour of a given day are placed in one table, so that the number segment has 24 tables per day and 24 corresponding index tables, which simplifies record management, lookup and deduplication.
Further, the index table contains only one field, of type char(32), which stores the MD5 values of all non-duplicate records in the CDR file.
The beneficial effect of this further scheme is that the index table occupies little space, is fast to query, fast to load and unload, and flexible to operate.
Further, in step 3, if no index table corresponding to the CDR record exists in memory, the index table is loaded into memory from the storage in which it is saved; if no corresponding index table exists in either memory or the storage, a new index table is created in memory as the index table corresponding to the CDR record.
The beneficial effect of this further scheme is that index tables are loaded on demand, which gives better control over memory usage; it is unnecessary to load all index tables at once for deduplication, which avoids wasting memory.
Further, the method sets a limit on the number of CDR records processed or on the CDR processing time; when processing reaches the limit, infrequently used index tables in memory are unloaded and saved to a designated storage.
The beneficial effect of this further scheme is as follows. After a batch of CDR records has been processed, a number of index tables will have been loaded into memory. Unloading periodically according to the preset condition removes index tables that are no longer used or are rarely used and keeps only the index tables for hot data (the frequently used index tables). An unloaded index table is reloaded into memory when it is needed again, which keeps memory usage sustainable and flexible.
Embodiment
The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given only serve to explain the invention and are not intended to limit its scope.
The in-memory CDR deduplication method of the present invention comprises the following steps:
Step 1: read a CDR file into memory;
Step 2: read one CDR record from the CDR file;
Step 3: according to the key information in the CDR record, find the index table in memory that corresponds to the CDR record;
Step 4: combine the field contents of the CDR record into a single character string and compute its MD5 value as the index of the CDR record;
Step 5: insert the MD5 value into the index table; if the insertion succeeds, write the CDR record to the normal CDR file; if the insertion fails, write the CDR record to the duplicate-record file;
Step 6: repeat steps 2 to 5 until all CDR records in the CDR file have been traversed.
The index table uses a T-tree index and contains only one field, of type char(32), which stores the MD5 values of all non-duplicate records in the CDR file. The key information in the CDR record may combine the number segment and the date, so that CDRs generated by different number segments on different dates and at different times are classified for easy lookup; other CDR content may of course be used as key information as required. The method sets a limit on the number of CDR records processed or on the CDR processing time; when processing reaches the limit, infrequently used index tables in memory are unloaded and saved to a designated storage.
In step 3, if no index table corresponding to the CDR record exists in memory, the index table is loaded into memory from the storage in which it is saved; if no corresponding index table exists in either memory or the storage, a new index table is created in memory as the index table corresponding to the CDR record.
As described above, the index tables in the present invention are managed as shown in Fig. 1. In Fig. 1, a loading area is set up in memory and an unloading area is set up on other storage, such as a hard disk. Index tables not in use are placed in the unloading area; when one is needed, it is loaded into the loading area in memory. Because each index table uses a T-tree index, holds only one char(32) field storing MD5 values, and is partitioned by number segment and date, each index table is small and therefore easy to load and unload. Only the index table corresponding to the current CDR record is loaded into the loading area; the others that are not in use are unloaded from memory and saved in the unloading area of the storage, which keeps the memory footprint of the index tables small and saves memory.
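The loading area and unloading area can be sketched in Python as follows. This is only an illustration under stated assumptions: the loading area is a dict of sets (not a T-tree), the unloading area is a directory with one text file per table, and the class name IndexTableStore and the .idx suffix are hypothetical.

```python
import os
import tempfile


class IndexTableStore:
    """Index tables in a loading area (dict) and an unloading area (directory)."""

    def __init__(self, unload_dir):
        self.unload_dir = unload_dir   # unloading area on disk
        self.loaded = {}               # loading area: table name -> set of MD5s

    def _path(self, name):
        return os.path.join(self.unload_dir, name + ".idx")

    def get(self, name):
        """Return the table, loading it from disk or creating it if absent."""
        if name not in self.loaded:
            table = set()
            path = self._path(name)
            if os.path.exists(path):
                with open(path, encoding="ascii") as fh:
                    table = {line.strip() for line in fh if line.strip()}
            self.loaded[name] = table
        return self.loaded[name]

    def unload(self, name):
        """Write the table to the unloading area and drop it from memory."""
        table = self.loaded.pop(name)
        with open(self._path(name), "w", encoding="ascii") as fh:
            fh.writelines(digest + "\n" for digest in sorted(table))


with tempfile.TemporaryDirectory() as unload_dir:
    store = IndexTableStore(unload_dir)
    store.get("t_dup_1355227_20100720").add("0" * 32)
    store.unload("t_dup_1355227_20100720")
    in_memory_after_unload = "t_dup_1355227_20100720" in store.loaded
    reloaded = store.get("t_dup_1355227_20100720")
    print(in_memory_after_unload, len(reloaded))
```

The get method covers the three cases of step 3: the table is already loaded, it is loaded from the unloading area, or it is newly created.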
Fig. 2 shows a specific flow using the method of the present invention, and Fig. 3 is a diagram of the system used by the flow of Fig. 2. The flow comprises:
1. The program loads the deduplication configuration information from the CDR file and the database.
2. After step 1 completes, check whether a CDR file from the previous run was left partially processed, i.e. whether a breakpoint file exists. If so, obtain from the breakpoint file the name of the file being processed and the number of lines already processed, and resume from the recorded line number; if there is no breakpoint file, continue directly to the following steps.
3. Obtain a CDR file from the program input and determine the file's deduplication type from the configuration information:
(1) Append index (only index entries are added to the index table; no output is generated)
(2) No deduplication (the file is moved directly from the program input to the program output; no index is recorded in the index table)
(3) Normal deduplication: take each CDR record in the CDR file and generate the index table name from the configuration information, for example by number segment and date, then find the index table (if the table is not in memory, it is loaded from disk). If memory runs out during insertion, the program uses an LRU algorithm to unload unneeded index tables to disk and writes the breakpoint file at the same time.
4. Combine all fields of the CDR record into one long character string, compute its MD5 value and insert it into the in-memory index table. If the insertion succeeds, the CDR record is not a duplicate; if it fails with a duplicate error, the record is a duplicate. Non-duplicate CDRs are written to the output and duplicates are written to the duplicate-record file.
5. After a file has been processed, write a processing log recording how all records in the file were handled; if an exception occurred, write the detailed error information to the run log.
6. At each batch interval, i.e. the configured time, move the logs from the temporary directory to the formal directory and unload unneeded index tables from memory.
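The LRU unloading mentioned in step 3 can be sketched with Python's collections.OrderedDict. The sketch assumes a table-count limit rather than a real out-of-memory signal; the class name LruTables, its parameters and the on_unload callback are hypothetical.

```python
from collections import OrderedDict


class LruTables:
    """Keeps at most max_tables index tables; unloads the least recently used."""

    def __init__(self, max_tables, unload_count, on_unload):
        self.max_tables = max_tables
        self.unload_count = unload_count
        self.on_unload = on_unload        # e.g. writes the table to disk
        self.tables = OrderedDict()       # oldest use first, newest use last

    def get(self, name):
        if name in self.tables:
            self.tables.move_to_end(name)  # mark as most recently used
            return self.tables[name]
        if len(self.tables) >= self.max_tables:
            # Unload a batch of the least recently used tables.
            for _ in range(min(self.unload_count, len(self.tables))):
                old_name, old_table = self.tables.popitem(last=False)
                self.on_unload(old_name, old_table)
        self.tables[name] = set()
        return self.tables[name]


evicted = []
cache = LruTables(max_tables=3, unload_count=2,
                  on_unload=lambda name, table: evicted.append(name))
for name in ["a", "b", "c"]:
    cache.get(name)
cache.get("a")   # "a" becomes the most recently used table
cache.get("d")   # limit reached: the two least recently used are unloaded
print(evicted)   # ['b', 'c']
```

Unloading several tables at once, rather than one per miss, reduces how often the unload path runs when the limit is reached.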
In the above process, index management creates a number of index tables (possibly one to two thousand) by number segment and date (the partitioning is configurable), using a T-tree index. Each index table has only one field, char(32), and each row of the table holds the MD5 string that is the index value of one CDR. The deduplication index fields are read from the CDR according to the set of deduplication fields in the configuration information and combined into one character string; the MD5 value is computed and inserted into the corresponding in-memory index table. If the insertion succeeds, the CDR is normal; if it violates the primary key constraint, the CDR is a duplicate. The flow of Fig. 2 also retains the append-index function and the skip-deduplication function, and supports processing multiple directories, breakpoint recovery after abnormal termination, customizable index fields and index conditions, and deduplication of CDRs in several formats (ascii, binary, split). A fast memory replacement algorithm unloads some in-memory dynamic index tables and keeps those holding hot data.
Memory control
(1) To avoid using too much memory, the number of tables in memory must be controlled dynamically. Generally it is sufficient to load the current day's data into memory. If the table for a newly arrived CDR is not in memory, that table is loaded; if the number of tables in memory has reached the maximum, some less frequently used tables are unloaded.
(2) Because CDRs in production are continuous in time, tables rarely need to be loaded and unloaded frequently, so the effect on efficiency is small.
(3) The in-memory database must be adapted accordingly by adding two functions, "load table" and "unload table".
Unload table: similar to drop, except that the full data file under the corresponding directory is not deleted;
Load table: in addition to creating the table, load the full data file under the corresponding directory into memory and rebuild the T-tree index.
For example, take one day of CDRs (11 October 2009) from one host in Sichuan: there are 1240 number segments, of which 1038 have records; the largest number of records in a single segment is 281519 and the smallest is 1; the CDRs total 100 million; the estimated storage is 5.5G and the estimated memory usage is 8G.
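As a rough check of these figures, the short Python calculation below computes the raw size of the MD5 strings alone. It assumes that "5.5G" means 5.5 × 10^9 bytes; the per-record figure it derives is only an inference from the stated totals, not a value given in the text.

```python
records = 100_000_000          # total CDRs in the example
md5_hex_bytes = 32             # one char(32) MD5 string per record

raw_bytes = records * md5_hex_bytes
print(raw_bytes / 1e9)         # raw MD5 data in units of 10^9 bytes

# Implied bytes per record if the 5.5G estimate covers all 100 million records.
implied_per_record = round(5.5e9 / records)
print(implied_per_record)
```

The raw MD5 strings account for 3.2 × 10^9 bytes, so under this assumption the remainder of the estimate would be per-record and per-table overhead.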
Efficiency
(1) The in-memory database provides a dedicated insert function for deduplication that bypasses SQL parsing and inserts directly into memory. In addition, transactions run in no-logging mode. Insertion throughput for a whole file, including generating the MD5 codes, is about 30,000 to 50,000 records per second.
(2) The load/unload switching strategy for memory tables needs to be optimized so that the probability of switching stays low.
Data security
Because transactions run in no-logging mode during insertion, data will be lost after a restart if the in-memory database or the host fails. The following safeguards are therefore needed:
(1) After each file is deduplicated, the formal CDRs are output, the transaction is committed and the input CDR file is moved into directory A;
(2) At each batch interval (or after a batch of files has been processed), the deduplication program runs the exp command to persist the in-memory data fully to disk, and moves the CDR files in directory A on disk into directory B;
(3) When the deduplication program starts, if files are found in directory A, they are inserted into memory management again without producing any output and are moved into directory B afterwards; the purpose is mainly to record the indexes;
(4) If index tables are unloaded because memory runs out before a file has been fully processed, a breakpoint file is generated that records the line number reached, the directory, the number of valid records, the number of erroneous records and the number of duplicate records. When the file is processed again after a restart, processing jumps directly to the recorded line, and the earlier erroneous and duplicate records are kept.
(5) Several processes may access the same index table, so memory management enforces mutual exclusion. Data access is safe and user-friendly, and the state of the index tables can be observed.
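The breakpoint recovery of item (4) can be sketched in Python as follows. The text does not specify the breakpoint file format, so the JSON layout, the field names and the function names here are assumptions made for illustration.

```python
import json
import os
import tempfile


def save_breakpoint(path, filename, line_no, duplicates):
    """Record how far processing of `filename` has progressed."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"file": filename, "line": line_no,
                   "duplicates": duplicates}, fh)


def load_breakpoint(path):
    """Return the saved breakpoint, or None if no breakpoint file exists."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def remaining_lines(lines, bp):
    """Skip the lines that were already processed before the interruption."""
    start = bp["line"] if bp else 0
    return lines[start:]


lines = ["rec1", "rec2", "rec3", "rec4"]
with tempfile.TemporaryDirectory() as work_dir:
    bp_path = os.path.join(work_dir, "breakpoint.json")
    fresh = remaining_lines(lines, load_breakpoint(bp_path))    # no breakpoint
    save_breakpoint(bp_path, "A0001.chk", 2, duplicates=1)      # interrupted
    resumed = remaining_lines(lines, load_breakpoint(bp_path))  # after restart
print(len(fresh), resumed)
```

Keeping the counters in the breakpoint file lets the restarted run continue its statistics instead of recounting the lines it skips.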
The present invention deduplicates CDRs in memory based on MD5, shared memory, dynamic T-tree index tables and a fast memory replacement algorithm, and achieves both low storage usage and high processing efficiency. It deduplicates with in-memory dynamic tables, unloads some in-memory dynamic index tables through the fast memory replacement algorithm, and keeps the dynamic tables holding hot data in memory. By combining these techniques the method achieves high processing performance that is independent of the data volume. The storage structure of the deduplication information in memory supports sharing, concurrency and time windows.
An embodiment of the deduplication index: borrowing a concept from databases, memory tables can be created by the first 7 digits of the mobile phone number plus the date and hour of the call time, and the hour of the call time is used to partition the tables across table spaces as in a database, which balances the data load. For example, the table name T_VC_1355227_2010072001 means that all CDRs of number segment 1355227 from hour 01 on 2010-7-20 are placed in this table, and the table is placed in the file-system table space numbered 01. With 1000 number segments, the tables of these 1000 segments are thus split into 24 parts and spread evenly over 24 configured file systems. The deduplication process is:
(1) read a CDR file;
(2) read one CDR record from the CDR file;
(3) locate the index table in memory management according to the phone number and time of the record;
(4) if the index table is not present, load the required index table from the unloading area automatically; if the unloading area does not have it either, create a new index table;
(5) compute the MD5 value and insert it into the index; if the insertion succeeds, the CDR is not a duplicate; if it fails, the CDR is a duplicate and is written to the duplicate-record file;
(6) when the number of processed records reaches the configured upper limit, for example 1,000,000 records, the program unloads index tables according to the optimization algorithm; alternatively, when the configured time is reached, unloading is performed automatically; hot data is kept in memory.
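The table naming used in step (3) can be sketched in Python as follows. The sketch assumes the call time is a string in YYYYMMDDHHMMSS form; the function names are hypothetical, and the prefix T_VC is taken from the example table name.

```python
def index_table_name(prefix, msisdn, start_datetime):
    """Build a table name from the number segment, date and hour of a CDR."""
    segment = msisdn[:7]              # first 7 digits of the phone number
    date_hour = start_datetime[:10]   # YYYYMMDDHH
    return f"{prefix}_{segment}_{date_hour}"


def table_space_for(start_datetime):
    """Pick the table space from the hour of the call time ('00' to '23')."""
    return start_datetime[8:10]


name = index_table_name("T_VC", "13552270001", "20100720013000")
space = table_space_for("20100720013000")
print(name, space)   # T_VC_1355227_2010072001 01
```

Since the hour takes 24 values, this choice of table space spreads the tables of every number segment over 24 locations.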
The following is the configuration file used with the method of the present invention. It configures the basic information used while the process runs and is in ini format.
[SYS_INFO]
# Log directory
LOGDIR = /tpt/mmdb/work/zhuoch/wbin/data/log
[CONFIG]
IDXCTRL = /tpt/mmdb/work/zhuoch/wbin/cfg/IndexFields.cfg
#### {optional} Used as needed when processing fedx CDRs; requires adding -D__USE_FEDX__ in the makefile
# FEDX configuration file
FEDXCTRL = ${FEDX_CONFIG_PATH}/config.xml
[CONTROL]
# {optional} Program run flag, default weed_dup
CTRLFLAG = weed_dup
# {optional} Program start/stop control table, default sys_proc
CNTTAB = sys_proc
# {optional} Deduplication table prefix, default t_dup
TABPRE = t_bill1
# {optional} Maximum number of tables present in the in-memory database at the same time, default 1000
MAXTAB = 300
# {optional} Number of tables unloaded each time the maximum table count is reached, default 200
DELTAB = 50
# {optional} Log batch interval (seconds), default 900 seconds
LOGBATCH = 900
# {optional} Number of files processed per pass over one directory, default 100; 0 means keep processing one directory
FILECOUNT = 0
[DATABASE]
# In-memory database encryption file
LOGIN = ${DCI_HOME}/cfg/login.db
# In-memory database instance name
SERVER = imdb
# DIR sections must start from 01; if a section is repeated, the earlier one is used by default
[DIR01]
# Input file pattern
INDIR = /tpt/mmdb/work/zhuoch/wbin/data/in/A*.chk
# Output file pattern
OUTDIR = /tpt/mmdb/work/zhuoch/wbin/data/out/P%s
# Output temporary directory
OUTTMP = /tpt/mmdb/work/zhuoch/wbin/data/outtmp
# Backup directory
BAKDIR = /tpt/mmdb/work/zhuoch/wbin/data/bak
# Backup temporary directory
BAKTMP = /tpt/mmdb/work/zhuoch/wbin/data/baktmp
## Cutover date between the fixed-length and variable-length formats
CHANGEDATE = 20090701
# {optional} Input files that are not deduplicated and are written directly to the output; file pattern, default *.nodup
IGNOREFILE = y. .nodup
# {optional} Input files that are not deduplicated and not written to the output, only appended to the index; file pattern, default *.addidx
ADDIDXFILE = y. .addidx
# {optional} Input files that are not deduplicated and not written to the output, only used to delete index entries from the table; file pattern, default *.delidx
DELIDXFILE = y. .delidx
# {optional} Separator character (only effective when processing delimited records). A space is written as ^~ and a tab as ^^; default is a space
SPLIT =
# split: delimited format; ascii: fixed-length format; fedx: FEDX format
# In RECTYPE, the current system time is compared with CHANGEDATE: if it is later than CHANGEDATE, new is used; if earlier, old is used (rule: new before the semicolon, old after it).
# A trailing [VC/VC2] as below means file-level deduplication, i.e. one file holds one format; without [...] it means record-level deduplication, which is decided by a field condition in the corresponding index configuration
RECTYPE = new:ascii[A];old:split
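A file in this format can be read with Python's standard configparser module, as in the sketch below. It is not the program's actual loader: the embedded sample is a shortened copy of two sections, and the fallback values are the defaults stated in the comments above. Interpolation is disabled because values such as P%s contain a literal percent sign.

```python
import configparser

SAMPLE = """
[CONTROL]
# {optional} maximum number of tables loaded at once, default 1000
MAXTAB = 300
# {optional} tables unloaded each time the maximum is reached, default 200
DELTAB = 50

[DIR01]
OUTDIR = /tpt/mmdb/work/zhuoch/wbin/data/out/P%s
"""

# interpolation=None keeps "%s" literal; optionxform=str keeps key case.
parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str
parser.read_string(SAMPLE)

control = parser["CONTROL"]
max_tab = control.getint("MAXTAB", fallback=1000)
del_tab = control.getint("DELTAB", fallback=200)
log_batch = control.getint("LOGBATCH", fallback=900)   # absent: default used
out_dir = parser["DIR01"]["OUTDIR"]
print(max_tab, del_tab, log_batch, out_dir)
```

Values such as ${DCI_HOME} would be returned verbatim by this sketch; expanding them from the environment is left out.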
The following is the index condition file. The index file can be written in either of two file types; in practice only one of them is used:
(1) xml mode
(2) cfg mode
Note: to use xml mode, the macro -D__USE_XML__ must be added in the makefile, and the libxml parsing library must be available. IDXCTRL in the [CONFIG] section of the configuration file must be changed to the corresponding xml file.
The xml mode file and the cfg mode file are shown below.
XML file:
<DUP> <!-- deduplication configuration -->
  <REC_CFG> <!-- one deduplication configuration -->
    <REC_FIELD>[A]</REC_FIELD> <!-- file-level deduplication configuration -->
    <FILE_HEAD>y, a</FILE_HEAD> <!-- file name header configuration -->
    <CON_FIELD>[substr(svcName,1,2)=00] | [svcName=01]</CON_FIELD> <!-- deduplication record condition configuration -->
    <MDB_FIELD> <!-- deduplication in-memory database configuration -->
      <TABLE>substr(msisdn,1,7),substr(start_datetime,1,8)</TABLE> <!-- deduplication table configuration -->
      <TABLESPACE>substr(start_datetime,7,2)</TABLESPACE> <!-- deduplication table space configuration -->
    </MDB_FIELD>
    <IND_FIELD>svcName, msisdn, other_party, start_datetime</IND_FIELD> <!-- deduplication index field configuration -->
    <KEY_FIELD> <!-- deduplication field configuration -->
      <FIELD seq='0' fieldName='svcName' startPos='0' length='2' type='0' desc='subsystem code'/>
      <FIELD seq='1' fieldName='msisdn' startPos='40' length='15' type='0' desc='subsystem code'/>
      <FIELD seq='2' fieldName='start_datetime' startPos='55' length='14' type='0' desc='subsystem code'/>
      <FIELD seq='3' fieldName='other_party' startPos='117' length='24' type='0' desc='subsystem code'/>
    </KEY_FIELD>
  </REC_CFG>
  <REC_CFG> <!-- another deduplication configuration -->
    …
  </REC_CFG>
</DUP>
Cfg file:
## In every substr, the second parameter (the start position) counts from 1
## In [KEY_FIELD], all field start positions count from 0; in split (delimited) format the first field is also numbered 0
## Optimize the configuration to speed up deduplication configuration lookup
## 1. When several REC_FIELD entries are otherwise the same, place the deduplication configuration with the more complex CON_FIELD condition as late as possible
# Configure as follows for record-level deduplication; for file-level deduplication change this to [REC_FIELD] [VC]
[REC_FIELD] [A]
# {optional}
#[FILE_HEAD] y, a
# {optional}
#[CON_FIELD] [ substr(billing_type,1,2) = 00 ] | [ billing_type = 01 ]
# The table name is assembled from fields as {TABPRE}_{$1}_{$2}; the table space must be part of a field used in the table name, taken with substr (counting from 1); if the table space is not configured, the default table space is used
[MDB_FIELD] TABLE:substr(msisdn,1,7),substr(start_datetime,1,8) TABLESPACE:substr(start_datetime,7,2)
# Field format
[IND_FIELD] svcName,msisdn, other_party, start_datetime
[KEY_FIELD]
# Field name, field position in delimited format (from 0), field start position (from 0), field length, field type (0:ascii 1:int 2:float), remarks
# Note: the field start position is not used in fedx format. To take a middle part of a field value, first extract the value and then apply substr
0  svcName         0    2   0  aaaa
1  msisdn          40   15  0  aaaa
2  start_datetime  55   14  0  aaaa
3  other_party     117  24  0  aaaa
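How such a [KEY_FIELD] table might drive field extraction and the MD5 index value is sketched below in Python. It is an illustration only: the sample record is made up to match the listed positions, start_datetime is assumed to be in YYYYMMDDHHMMSS form, and the function names are hypothetical.

```python
import hashlib

# (sequence, field name, start position from 0, length), as in [KEY_FIELD].
KEY_FIELDS = [
    (0, "svcName", 0, 2),
    (1, "msisdn", 40, 15),
    (2, "start_datetime", 55, 14),
    (3, "other_party", 117, 24),
]
IND_FIELDS = ["svcName", "msisdn", "other_party", "start_datetime"]


def extract_fields(line):
    """Slice a fixed-length record into named fields, stripping padding."""
    return {name: line[start:start + length].strip()
            for _, name, start, length in KEY_FIELDS}


def substr(value, start, length):
    """substr with a start position counted from 1, as the comments state."""
    return value[start - 1:start - 1 + length]


def index_value(fields):
    """MD5 over the index fields, joined in IND_FIELD order."""
    joined = "".join(fields[name] for name in IND_FIELDS)
    return hashlib.md5(joined.encode("ascii")).hexdigest()


# Hypothetical 141-character record laid out to match the positions above.
line = ("01".ljust(40) + "13552270001".ljust(15) + "20100720013000"
        + " " * 48 + "13900000000".ljust(24))
fields = extract_fields(line)
table_parts = (substr(fields["msisdn"], 1, 7),
               substr(fields["start_datetime"], 1, 8))
table_space = substr(fields["start_datetime"], 7, 2)
print(table_parts, table_space, len(index_value(fields)))
```

For this sample record the TABLE expressions give 1355227 and 20100720, and the TABLESPACE expression as configured gives 20, i.e. characters 7 and 8 of the assumed timestamp.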
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.