CN102024046A

CN102024046A - Data repeatability checking method and device as well as system

Info

Publication number: CN102024046A
Application number: CN201010588219XA
Authority: CN
Inventors: 刘洋
Original assignee: Huawei Symantec Technologies Co Ltd
Current assignee: Huawei Digital Technologies Chengdu Co Ltd
Priority date: 2010-12-14
Filing date: 2010-12-14
Publication date: 2011-04-20
Anticipated expiration: 2030-12-14
Also published as: CN102024046B; WO2012079460A1

Abstract

An embodiment of the invention discloses a data repeatability checking method, device, and system. The method comprises: matching, in a parallel index tree, the parameters of each character of the data against the parameters of the leaf nodes, where each leaf node of the parallel index tree corresponds to one character and the parameters of a leaf node include at least the string length of the data containing the character and the position of the character in that string; and judging, from the matching result of each character, whether the data repeats data already stored and, if not, storing the parameters of each character of the data in the parallel index tree as leaf node parameters. Because the parameter values of all characters of the data are matched in parallel in the parallel index tree, and the scheme does not depend on the database that stores the data, the index is small and the efficiency of the data repeatability check improves significantly.

Description

Data repeatability checking method, device, and system

Technical field

Embodiments of the invention relate to data processing technology, and in particular to a data repeatability checking method, device, and system.

Background technology

In many fields of data operation, for example in certain software systems, a data item must often be guaranteed to be unique, so newly added data has to be checked for repetition against the existing data for that item. For example, when a new user registers in a Web forum application, the new user name is checked against the existing user names; if it repeats one of them, the user is asked to enter a different user name.

Existing data repeatability checks are implemented in two main ways: the repetition judgment is made either before the new data is inserted or after it is inserted. Both ways depend on the database to compare the data item by item. As the amount of data grows, the speed and efficiency of a database-dependent check drop significantly.

Summary of the invention

Embodiments of the invention provide a data repeatability checking method, device, and system to improve the efficiency of the data repeatability check.

An embodiment of the invention provides a data repeatability checking method, comprising:

Matching, in a parallel index tree, the parameters of each character of the data against the parameters of the leaf nodes, wherein each leaf node of the parallel index tree corresponds to one character, and the parameters of a leaf node comprise at least the string length of the data containing the character and the position of the character in that string;

Judging, according to the matching result of each character, whether the data repeats data already stored, and if not, storing the parameters of each character of the data in the parallel index tree as leaf node parameters.

An embodiment of the invention provides a data repeatability checking device, comprising:

A parallel index tree storage module, configured to store the parameters of each leaf node of a parallel index tree, wherein each leaf node of the parallel index tree corresponds to one character, and the parameters of a leaf node comprise at least the string length of the data containing the character and the position of the character in that string;

A parameter matching module, configured to match, in the parallel index tree, the parameters of each character of the data against the parameters of the leaf nodes;

A parallel repeatability judgment module, configured to judge, according to the matching result of each character, whether the data repeats data already stored, and if not, to store the parameters of each character of the data in the parallel index tree as leaf node parameters.

An embodiment of the invention also provides a data application system, comprising:

An application server, configured to receive data entered by a user and to provide the data to a verification server for the repeatability check;

A verification server, configured to match, in a parallel index tree, the parameters of each character of the received data against the parameters of the leaf nodes, wherein each leaf node of the parallel index tree corresponds to one character, and the parameters of a leaf node comprise at least the string length of the data containing the character and the position of the character in that string; and to judge, according to the matching result of each character, whether the data repeats data already stored, and if not, to store the parameters of each character of the data in the parallel index tree as leaf node parameters while providing the data to a database server for storage;

A database server, configured to store the data.
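The division of work among the three servers can be sketched as follows. This is a minimal in-process sketch, not the system of the invention: the class and method names (`ApplicationServer.submit`, `VerificationServer.verify`, `DatabaseServer.store`) are hypothetical, the servers are plain objects rather than networked machines, and the parallel index tree is reduced to a set of (character, string length, position) triples.

```python
class DatabaseServer:
    """Hypothetical stand-in for the database server: it only stores data."""

    def __init__(self):
        self.rows = []

    def store(self, data):
        self.rows.append(data)


class VerificationServer:
    """Hypothetical stand-in for the verification server."""

    def __init__(self, database):
        self.database = database
        # Assumption: leaf node parameters kept as (character, length, position).
        self.leaves = set()

    def verify(self, data):
        params = {(ch, len(data), pos) for pos, ch in enumerate(data, start=1)}
        if params <= self.leaves:
            return False              # every character matched: treat as a repeat
        self.leaves |= params         # store the new leaf node parameters
        self.database.store(data)     # and hand the data to the database server
        return True


class ApplicationServer:
    """Hypothetical stand-in for the application server."""

    def __init__(self, verifier):
        self.verifier = verifier

    def submit(self, data):
        return "accepted" if self.verifier.verify(data) else "rejected"


database = DatabaseServer()
app = ApplicationServer(VerificationServer(database))
print(app.submit("admin"))   # first submission
print(app.submit("admin"))   # same name submitted again
```

In this sketch the database is written to only after the check has passed, so the check itself never queries the database.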

In the data repeatability checking method, device, and system provided by the embodiments of the invention, the parameter values of the characters in the data are matched in parallel by means of a parallel index tree. The scheme does not depend on the database that stores the data, so the index is small and the efficiency of the data repeatability check can be improved significantly.

Description of drawings

Fig. 1 is a flowchart of the data repeatability checking method provided by embodiment one of the invention;

Fig. 2 is a flowchart of the data repeatability checking method provided by embodiment two of the invention;

Fig. 3 is a schematic diagram of the tree structure of the stored data in an embodiment of the invention;

Fig. 4 is a flowchart of the data repeatability checking method provided by embodiment three of the invention;

Fig. 5 is a schematic structural diagram of the data repeatability checking device provided by embodiment six of the invention;

Fig. 6 is a schematic structural diagram of the data repeatability checking device provided by embodiment seven of the invention;

Fig. 7 is a schematic structural diagram of the parameter matching module in the data repeatability checking device provided by embodiment eight of the invention;

Fig. 8 is a schematic structural diagram of the data repeatability checking device provided by embodiment nine of the invention;

Fig. 9 is a schematic structural diagram of the data application system provided by embodiment ten of the invention.

Detailed description of the embodiments

To make the purpose, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

Embodiment one

Fig. 1 is a flowchart of the data repeatability checking method provided by embodiment one of the invention. The method applies to any situation in which data must be checked for repetition, for example whether a newly added user name repeats a stored user name, or whether the file name of a new file in a file system repeats a file name already in use. The method in this embodiment can be executed by a data operation system, which can be a device implemented by a combination of software and hardware. The method comprises the following steps:

Step 110: match, in the parallel index tree, the parameters of each character of the data against the parameters of the leaf nodes. Each leaf node of the parallel index tree corresponds to one character, and the parameters of a leaf node include at least the string length of the data containing the character and the position of the character in that string.

Step 120: judge, according to the matching result of each character, whether the data repeats data already stored. If not, that is, the data does not repeat any stored data, execute step 130; if so, the data is regarded as repeating stored data, and step 140 is executed.

Step 130: store the parameters of each character of the data in the parallel index tree as leaf node parameters. The data can at the same time be accepted and persisted, that is, stored in the database. The flow then ends.

Step 140: the data can be rejected or discarded directly; if a file path is repeated, the file storage path can be rewritten, and so on.

In step 120, because the parameters of each character of every stored item have been stored in the parallel index tree as leaf node parameters, whether newly added data is a repeat can be judged by matching the parameters of its characters in the parallel index tree.

The technical solution of this embodiment checks data for repetition by indexing each character of the data in parallel. The parallel index tree can take the form of a B+ tree and can comprise multiple leaf nodes arranged in layers, each leaf node corresponding to one character. The parameters of each leaf node include at least the string length of the data containing the corresponding character and the position of that character in the data string. When character parameters are matched against leaf nodes, all characters of the data can be matched against their corresponding leaf nodes simultaneously, which speeds up matching. If any one character finds no corresponding leaf node, the data has not been stored and does not repeat existing data. If every character finds a corresponding leaf node, the data has probably been stored already. There are several ways to handle this case. One is to treat the data as a repeat by default and discard it directly, which is fairly accurate when the data string is short. The other is to continue with a more precise repeatability check, which becomes more necessary as the data string gets longer. Whichever way is used, the parallel check of each character already settles at least part of the repeatability checks, so the efficiency of the check improves to some extent. In current repeatability checks, strings such as user names and file names are usually not very long, so the technical solution of this embodiment improves efficiency in most repeatability checks while maintaining a certain accuracy.
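Steps 110 to 140 can be sketched in a few lines. This is a simplified sketch under stated assumptions, not the implementation of the invention: the parallel index tree is modelled as a flat set of (character, string length, position) triples rather than a B+ tree, the characters are checked in a loop rather than truly in parallel, and the function names `leaf_params` and `check_and_store` are hypothetical.

```python
def leaf_params(data):
    """Return one (character, string length, position) triple per character."""
    length = len(data)
    return [(ch, length, pos) for pos, ch in enumerate(data, start=1)]


def check_and_store(index, data):
    """Return True if `data` is a probable repeat; otherwise index it."""
    params = leaf_params(data)
    # Steps 110/120: the data is a probable repeat only if every character
    # finds a leaf node with the same string length and position.
    if all(p in index for p in params):
        return True          # step 140: reject, or verify further
    index.update(params)     # step 130: store the parameters as leaf nodes
    return False


index = set()
for name in ["a", "ad", "an", "adm", "adn", "and"]:
    check_and_store(index, name)

print(check_and_store(index, "adi"))  # "i" has no leaf node at length 3, position 3
print(check_and_store(index, "adn"))  # every character finds a leaf node
```

With this index, the string "ann" would also be reported as a probable repeat although it was never stored, because "a", "n", and "n" each find a leaf node left by "adm", "and", and "adn". This illustrates why the text treats a full match as "probably stored" and allows a more precise check to follow.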

Each character in the data corresponds to a leaf node. The character can correspond to the leaf node directly, but to increase computing speed and make the computation more general, each character of the data can first be converted into a data identifier, for example in byte form, before the parameters of the characters are matched in the parallel index tree against the parameters of the stored leaf nodes; each leaf node then corresponds to its character through the character's data identifier. For example, each character can be identified by a numeric byte string, such as "0001" for "on" and "0002" for "z"; it is enough that the data identifier identifies the character uniquely. A character here can be a single digit, punctuation mark, English letter, or Chinese character, or a few bytes of the data; it can also be a meaningful combination of these elements, for example the file name suffix ".PDF" can be defined as one character.
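The conversion of characters to data identifiers can be sketched as below. The text only requires that an identifier be unique per character, so everything else here is an assumption made for illustration: identifiers are assigned in order of first appearance and formatted as four digits, and the names `tokenize` and `IdentifierTable` are hypothetical.

```python
def tokenize(data, combos=(".PDF",)):
    """Split data into 'characters', treating each listed combination as one."""
    tokens, i = [], 0
    while i < len(data):
        # Prefer a multi-character combination if one starts at position i.
        combo = next((c for c in combos if data.startswith(c, i)), None)
        token = combo if combo else data[i]
        tokens.append(token)
        i += len(token)
    return tokens


class IdentifierTable:
    """Assigns each distinct character a unique numeric identifier."""

    def __init__(self):
        self._ids = {}

    def identify(self, token):
        # Assumption: the next free number is assigned on first sight.
        number = self._ids.setdefault(token, len(self._ids) + 1)
        return f"{number:04d}"


table = IdentifierTable()
tokens = tokenize("report.PDF")
print(tokens)
print([table.identify(t) for t in tokens])
```

Here ".PDF" is handled as a single character with its own identifier, and the second "r" receives the same identifier as the first.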

Embodiment two

Fig. 2 is a flowchart of the data repeatability checking method provided by embodiment two of the invention. Building on embodiment one, this embodiment adds serial indexing on top of parallel indexing. In this embodiment, after the data is judged from the matching result of each character to repeat stored data, the following operations are also performed:

Step 210: match the index of the data in a secondary index table. The secondary index table contains the indexes of the stored data; the index of a stored item can be the data string itself or a numeric index string made up of data identifiers.

The matching in step 210 searches and matches the data indexes one by one. Any data matching scheme of the prior art can be used; it is a means of exact lookup based on the data itself.

Step 220: judge, according to the matching result of the data in the secondary index table, whether the data repeats stored data. If so, execute step 230; if not, execute step 240.

Step 230: produce a data-repeated result, that is, the data repeats stored data. The flow ends.

Step 240: the data does not repeat stored data, so store the parameters of each character of the data in the parallel index tree as leaf node parameters, and store the index of the data in the secondary index table for indexing and matching when other data is added later.

The technical solution of this embodiment raises the speed of the data repeatability check through parallel indexing and guarantees the uniqueness and accuracy of the check through exact serial indexing. Although the exact index check still depends on the index of the data itself, parallel indexing first rules out many candidates, so far less data needs the exact check and the efficiency of the repeatability check improves to some extent. This advantage is especially clear when the data strings are short, and strings such as user names and file names are usually short strings of no more than 20 characters.
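The two-stage check of steps 210 to 240 can be sketched as follows. This is a sketch under assumptions, not the method as claimed: the class name `TwoStageChecker` is hypothetical, Unicode code points stand in for the data identifiers, the "/"-separated index string follows the form used in a later example of the text, and a Python set lookup replaces the one-by-one search of the secondary index table. Storing the index string when the parallel stage already rules out a repeat is also an assumption; it keeps the two structures consistent.

```python
class TwoStageChecker:
    def __init__(self):
        self.leaves = set()      # parallel index: (identifier, length, position)
        self.secondary = set()   # secondary index table: numeric index strings

    def check(self, data):
        """Return True if `data` repeats stored data; otherwise store it."""
        params = {(ord(ch), len(data), pos) for pos, ch in enumerate(data, start=1)}
        key = "/" + "/".join(str(ord(ch)) for ch in data)

        # Stage 1 (parallel index): a single missing leaf node rules out a repeat.
        # Stage 2 (steps 210/220): an exact lookup confirms the repeat.
        if params <= self.leaves and key in self.secondary:
            return True                      # step 230
        self.leaves |= params                # step 240 (or step 130)
        self.secondary.add(key)
        return False


checker = TwoStageChecker()
for name in ["a", "ad", "an", "adm", "adn", "and"]:
    checker.check(name)

print(checker.check("ann"))  # passes stage 1 but is not in the secondary table
print(checker.check("ann"))  # now genuinely stored
```

The first call for "ann" shows the purpose of the second stage: every character of "ann" finds a leaf node, but the exact lookup shows that the string itself was never stored.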

In this embodiment, the number of characters to be indexed in parallel can preferably be set. That is, before the parameters of each character of the data are matched in the parallel index tree against the parameters of the leaf nodes, the method further comprises: intercepting a set number of characters from the data as the characters to be matched in the parallel index tree. Parallel indexing is then performed only on the intercepted characters. The number to intercept can be set according to the amount of data to be stored and the number of characters.

For example, if the number of characters for parallel indexing is set to 7, at most 7 characters are intercepted from the data each time indexing is performed, and the remaining characters are not indexed in parallel. Note that although fewer characters are indexed, the length of the data string used during parallel indexing is not reduced by the interception. Parallel indexing has a clear advantage and higher reliability when the number of characters is small, so intercepting a limited number of characters keeps the efficiency and accuracy advantages of parallel indexing while avoiding unnecessary parallel indexing when there are too many characters.
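The interception can be sketched as below, assuming the same (character, string length, position) triples as leaf node parameters; the function name `intercepted_params` and the default limit are hypothetical. The point of the sketch is that only the first characters are indexed while the recorded string length stays that of the full string.

```python
def intercepted_params(data, limit=7):
    """Leaf node parameters for at most `limit` leading characters of `data`."""
    full_length = len(data)  # the length is NOT reduced by the interception
    return [(ch, full_length, pos)
            for pos, ch in enumerate(data[:limit], start=1)]


params = intercepted_params("administrator")
print(len(params))   # number of characters indexed in parallel
print(params[0])     # first triple, carrying the full string length
```

A string shorter than the limit is indexed in full, since slicing simply returns the whole string.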

Embodiment three

The parallel index tree can be organized in several ways. For example, multiple parallel index trees can be organized with the initial character of the data as the root node, or with another parameter of the data as a whole, such as the string length, the user name type, or the file name type, as the root node. Matching works similarly whichever organization is used; embodiment three takes the initial character of the data as the root node of the parallel index tree as its example, and uses a simplified example for clarity. Suppose the database already stores the data "a", "ad", "an", "adm", "adn", and "and"; Fig. 3 is a schematic diagram of the tree structure of this stored data. The data to be added is "adi". In the parallel index tree built for this data, the initial "a" is the root node and each character is a leaf node. A character may appear in different data and therefore at different layers and positions, where the layer number is the string length of the data containing the character. Each pair of layer number and position can be taken as the parameters of one leaf node for that character, in which case one character may correspond to several leaf nodes. Alternatively, all layer-and-position pairs of the character can be combined and stored as the parameters of a single leaf node, in which case one character corresponds to one leaf node. The parameter values of each character's leaf node are recorded in the form "data identifier of the character" {layer 1[position 1], layer 2[position 2], ...}. The leaf node parameters are attached under each character. With the data identifiers of the characters set to a=97, d=100, m=109, i=105, and n=110, the leaf node parameters of the parallel index tree in this example can be expressed in the matrix form of Table 1:

Table 1

Data identifier	Leaf node parameters
97	{1[1], 2[1], 3[1]}
100	{2[2], 3[2], 3[3]}
109	{3[3]}
110	{2[2], 3[2], 3[3]}
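The entries of Table 1 can be derived mechanically from the six stored strings. The sketch below does so under two assumptions: Unicode code points serve as the data identifiers (they coincide with a=97, d=100, m=109, n=110 used in the example), and the tree is held as a dictionary from identifier to a set of (layer, position) pairs. The function names `build_tree` and `fmt` are hypothetical.

```python
from collections import defaultdict


def build_tree(stored):
    """Map each character identifier to its set of (layer, position) pairs."""
    tree = defaultdict(set)
    for s in stored:
        for pos, ch in enumerate(s, start=1):
            # The layer is the string length of the data containing the character.
            tree[ord(ch)].add((len(s), pos))
    return tree


def fmt(entries):
    """Render leaf node parameters in the layer[position] notation of Table 1."""
    return "{" + ", ".join(f"{layer}[{pos}]" for layer, pos in sorted(entries)) + "}"


tree = build_tree(["a", "ad", "an", "adm", "adn", "and"])
for identifier in sorted(tree):
    print(identifier, fmt(tree[identifier]))
# 97 {1[1], 2[1], 3[1]}
# 100 {2[2], 3[2], 3[3]}
# 109 {3[3]}
# 110 {2[2], 3[2], 3[3]}
```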

Fig. 4 is a flowchart of the data repeatability checking method provided by embodiment three of the invention. This embodiment builds on the embodiments above. Specifically, there are multiple parallel index trees, and the root node of each parallel index tree corresponds to the initial character of the data string. The operation of matching the parameters of each character of the data in the parallel index tree against the parameters of the leaf nodes then comprises:

Step 410: select the corresponding parallel index tree according to the initial character of the data string. For the new data "adi", the parallel index tree shown in Table 1 is selected according to the initial "a".

Step 420: perform the following lookup-and-match operation for each character of the data in parallel. Here "a", "d", and "i" are looked up and matched simultaneously. Taking the match of "d" as an example, the operations are as follows:

Step 421: look up the layer containing the leaf node in the selected parallel index tree according to the length of the data string. The string length of "adi" is 3, so the layer-3 parameters among the leaf nodes corresponding to "d" are looked up, which yields two values: 3[2] and 3[3].

Step 422: according to the position of the character in the data string, look up the matching leaf node in the layer found, and produce the character lookup-and-match result. Since "d" is at the second character position of "adi", it matches 3[2].

The character "i" is looked up in the same way, but the result is negative: there is no matching leaf node.

Step 430: when the character lookup-and-match result of any one character is negative, produce the data lookup-and-match result and stop the lookup-and-match operations of the other characters. If selecting a parallel index tree by the initial yields nothing, this is likewise equivalent to a negative character lookup-and-match result.

In the steps above, the lookup of the "d" character can be stopped as soon as it is found that the "i" character has no leaf node; any one character failing to match means the data is not a repeat.

The technical solution of this embodiment gives one preferred way of indexing with the parallel index tree. The repeatability check is performed on all characters concurrently, and the data can be judged not to repeat as soon as one character fails to match, at which point the lookups of the other characters are stopped. The check is therefore highly efficient.
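Steps 410 to 430 can be sketched with a thread pool. This is only an illustration under assumptions: the function names are hypothetical, Unicode code points stand in for the data identifiers, the data is assumed to be non-empty, and Python threads are used merely to show the concurrent structure (each lookup is far too small for threads to pay off here). Cancelling pending futures is used as a stand-in for "stopping the lookups of the other characters"; a lookup that is already running is not interrupted.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed


def build_forest(stored):
    """One tree per initial character: {initial: {identifier: {(layer, pos)}}}."""
    forest = defaultdict(lambda: defaultdict(set))
    for s in stored:
        for pos, ch in enumerate(s, start=1):
            forest[s[0]][ord(ch)].add((len(s), pos))
    return forest


def match_char(tree, ch, length, pos):
    # Step 421: keep only the entries in the layer equal to the string length.
    positions = {p for (layer, p) in tree.get(ord(ch), ()) if layer == length}
    # Step 422: look for the character's position within that layer.
    return pos in positions


def probably_stored(forest, data):
    tree = forest.get(data[0])               # step 410: select tree by initial
    if tree is None:
        return False                         # no tree counts as a negative match
    with ThreadPoolExecutor() as pool:       # step 420: one lookup per character
        futures = [pool.submit(match_char, tree, ch, len(data), pos)
                   for pos, ch in enumerate(data, start=1)]
        for done in as_completed(futures):
            if not done.result():            # step 430: first negative result
                for other in futures:
                    other.cancel()           # drop lookups that have not started
                return False
    return True


forest = build_forest(["a", "ad", "an", "adm", "adn", "and"])
print(probably_stored(forest, "adi"))  # "i" has no leaf node in layer 3
print(probably_stored(forest, "adn"))  # all three characters match
```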

Embodiment four

This embodiment builds on the preceding embodiments. Preferably, the parameters of each leaf node of the parallel index tree also include a front coordinate and/or a back coordinate of the character. The front coordinate is a parameter of the characters before this character, for example the data identifier of the previous character or the data identifiers of the previous two characters; correspondingly, the back coordinate is a parameter of the characters after this character. When the leaf nodes include front and/or back coordinates, then after the matching leaf node has been looked up in the layer found according to the position of the character in the data string, and before the character lookup-and-match result is produced, the following operation is also performed:

Check the front character and/or back character of the character for consistency with the front coordinate and/or back coordinate in the parameters of the matched leaf node.

A leaf node can include only the front coordinate or only the back coordinate, or both; the choice can balance the execution speed and the accuracy of the matching operation.

With front and back coordinates, the parameter form of a leaf node is {layer 1[position 1:{|back coordinate 1, |back coordinate 2, ...}], layer 2[position 2:{front coordinate 1|back coordinate 1, front coordinate 2|back coordinate 2}], layer 3[position 3:{front coordinate 1|, front coordinate 2|}]}. The leaf node values of the parallel index tree for the data shown in Fig. 3 are then as shown in Table 2:

Table 2

Data identifier	Leaf node parameters
97	{1[1], 2[1:{|100, |110}], 3[1:{|100, |110}]}
100	{2[2:{97|}], 3[2:{97|109, 97|110}], 3[3:{110|}]}
109	{3[3:{100|}]}
110	{2[2:{97|}], 3[2:{97|100}], 3[3:{100|}]}

Take again the match of the "d" character in step 422 of embodiment three. After the matching leaf node has been looked up in the layer found, and before the character lookup-and-match result is produced, the front and back characters of "d" are also checked for consistency with the front and back coordinates in the parameters of the matched leaf node. The front character of "d" is "a" (97) and the back character is "i" (105). Matching shows that 3[2:{97|109, 97|110}] contains no 97|105, so the matching result for the "d" character is also negative.

Matching the front and back coordinates further improves the accuracy of parallel matching in the repeatability check, reduces the cases that need serial matching, and improves the efficiency of the check.
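The coordinate check can be sketched as follows. The data layout is an assumption made for illustration: each leaf node, keyed by (layer, position), holds a set of (front, back) pairs of neighbouring identifiers, with `None` marking a missing neighbour at either end of the string; Unicode code points again stand in for the data identifiers, and the function names are hypothetical.

```python
from collections import defaultdict


def neighbours(s, i):
    """(front, back) identifiers around the character at 0-based index i."""
    front = ord(s[i - 1]) if i > 0 else None
    back = ord(s[i + 1]) if i + 1 < len(s) else None
    return (front, back)


def build_tree(stored):
    """{identifier: {(layer, position): {(front, back), ...}}}"""
    tree = defaultdict(lambda: defaultdict(set))
    for s in stored:
        for i, ch in enumerate(s):
            tree[ord(ch)][(len(s), i + 1)].add(neighbours(s, i))
    return tree


def match_char(tree, s, i):
    """Match layer and position first, then the front and back coordinates."""
    coords = tree.get(ord(s[i]), {}).get((len(s), i + 1))
    return coords is not None and neighbours(s, i) in coords


tree = build_tree(["a", "ad", "an", "adm", "adn", "and"])
print(sorted(tree[100][(3, 2)]))    # coordinates of "d" at 3[2]
print(match_char(tree, "adi", 1))   # "d" in "adi": 97|105 is not recorded
```

With coordinates in place, the "d" of "adi" is rejected on its own, without waiting for the lookup of "i". The string "ann", which passes a check on layer and position alone, is rejected as well, because no stored "n" at 3[3] has "n" as its front character.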

In the example above, suppose the newly added string is "admin is drinking". After "admin" is intercepted and indexed in parallel, "admin***" is found to exist, so the lookup continues in the secondary index table. The data index of the string "admin is drinking", converted to data identifier form, is "/97/100/109/105/110/410,510,610", where "410,510,610" corresponds to "drinking".

Embodiment five

The data repeatability checking method provided by embodiment five of the invention can build on the embodiments above; here the parameters of a leaf node of the parallel index tree also include the number of occurrences of the character. Continuing the previous example, the number of times a character occurs at the same position of the same layer is added to the leaf node. For example, the "a" character occurs twice at the first character position of the second layer, so "2" is recorded as a parameter of the leaf node. The parameter form of a leaf node is {layer 1[position 1:count], layer 2[position 2:count], ...}. The leaf node parameters, including occurrence counts, of the parallel index tree for the data shown in Fig. 3 are as shown in Table 3:

Table 3

Data identifier	Leaf node parameters
97	{1[1:1], 2[1:2], 3[1:3]}
100	{2[2:1], 3[2:2], 3[3:1]}
109	{3[3:1]}
110	{2[2:1], 3[2:1], 3[3:1]}

On the basis of the preceding embodiments, after the parameters of each character of the data are stored in the parallel index tree as leaf node parameters, the method further comprises:

Incrementing by one the character occurrence count in the parameters of the corresponding leaf node;

When data is deleted, looking up the corresponding leaf node in the parallel index tree for each character of the deleted data, and decrementing by one the character occurrence count in the parameters of the leaf node found.

The steps for adding and deleting data have no particular order relative to each other. This technical solution meets the needs of situations in which data is deleted. When data has to be deleted from the database, decrementing the character occurrence count avoids leaving the leaf nodes of deleted data in the parallel index tree, and also ensures that deleting data does not delete leaf nodes that still identify the same character in other data.

For example, the "d" character occurs twice at the second position of the third layer, recorded as 3[2:2]. When "adn" is deleted, 3[2:2] is changed to 3[2:1], which still represents the "d" character in "adm" while reflecting the removal of "adn".
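The occurrence counts behave like reference counts on leaf nodes, which can be sketched with `collections.Counter`. The class name `CountedIndex` and its methods are hypothetical, Unicode code points stand in for the data identifiers, and removing a leaf node once its count reaches zero is an assumption consistent with the stated goal of not leaving leaf nodes behind for deleted data.

```python
from collections import Counter


class CountedIndex:
    def __init__(self):
        # Key: (identifier, layer, position); value: occurrence count.
        self.counts = Counter()

    @staticmethod
    def _keys(data):
        return [(ord(ch), len(data), pos) for pos, ch in enumerate(data, start=1)]

    def add(self, data):
        self.counts.update(self._keys(data))       # count + 1 per character

    def remove(self, data):
        for key in self._keys(data):
            self.counts[key] -= 1                  # count - 1 per character
            if self.counts[key] <= 0:
                del self.counts[key]               # no data uses this leaf node

    def has_leaf(self, ch, layer, pos):
        return self.counts[(ord(ch), layer, pos)] > 0


index = CountedIndex()
for name in ["a", "ad", "an", "adm", "adn", "and"]:
    index.add(name)

print(index.counts[(ord("d"), 3, 2)])   # 2, from "adm" and "adn"
index.remove("adn")
print(index.counts[(ord("d"), 3, 2)])   # 1, "adm" still uses this leaf node
print(index.has_leaf("n", 3, 3))        # False, only "adn" had "n" at 3[3]
```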

After the lookup-and-match operation carried out according to the technical solution above determines that "adi" is not a repeat, the data "adi" is persisted in the database, and the leaf node index table of the parallel index tree is modified as shown in Table 4:

Table 4

Data identifier	Leaf node parameters
97	{1[1:1], 2[1:2{|100, |110}], 3[1:4{|100, |110}]}
100	{2[2:1{97|}], 3[2:3{97|109, 97|110, 97|105}], 3[3:1{110|}]}
109	{3[3:1{100|}]}
110	{2[2:1{97|}], 3[2:1{97|100}], 3[3:1{100|}]}
105	{3[3:1{100|}]}

Here the occurrence counts, front coordinates, and back coordinates of the characters "a" and "d" have been modified, and an index entry for the character "i" has been added.
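The update that produces Table 4 can be reproduced with a small sketch combining occurrence counts and coordinates. The layout is an assumption for illustration: each leaf node, keyed by (layer, position) under its character identifier, is a dictionary with a `count` and a set of (front, back) `coords`, with `None` for a missing neighbour; Unicode code points stand in for the data identifiers, and the function name `add` is hypothetical.

```python
from collections import defaultdict


def add(tree, s):
    """Index string `s`: bump counts and record front/back coordinates."""
    for i, ch in enumerate(s):
        front = ord(s[i - 1]) if i > 0 else None
        back = ord(s[i + 1]) if i + 1 < len(s) else None
        leaf = tree[ord(ch)].setdefault((len(s), i + 1),
                                        {"count": 0, "coords": set()})
        leaf["count"] += 1
        leaf["coords"].add((front, back))


tree = defaultdict(dict)   # {identifier: {(layer, position): leaf}}
for name in ["a", "ad", "an", "adm", "adn", "and"]:
    add(tree, name)
add(tree, "adi")           # the new, non-repeating data

print(tree[97][(3, 1)]["count"])            # 4, as in 3[1:4...]
print(sorted(tree[100][(3, 2)]["coords"]))  # now includes (97, 105)
print(tree[105][(3, 3)])                    # new entry for "i"
```

Only the leaf nodes touched by "adi" change: "a" at 3[1], "d" at 3[2], and the new "i" at 3[3]; the entries for "m" and "n" stay as they were.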

The advantages of the technical solutions of the embodiments of the invention are especially marked as the amount of data grows. When the amount of data becomes massive, for example when the registered user names of a forum or the file names in a file system grow to a massive scale, the speed of the prior-art duplicate check that depends solely on the database drops significantly. If the database has to be split into multiple databases, the check can no longer rely on the database, or can do so only at an unacceptably high cost. Moreover, when the prior art depends solely on the database and concurrent access pressure is high, many repetitions cause the database to report a great many exceptions, which affects the performance and stability of the database itself.

The technical solutions of the embodiments of the invention overcome these defects of the prior art: the uniqueness check of an individual data item is achieved without depending on the database. The efficiency of the check is not affected by the amount of data to be stored; even a massive amount of data does not affect the check logic or efficiency. Because the parallel index tree takes up little storage space, the solutions accommodate the splitting of databases at massive data volumes; they are widely applicable and work for conventional and distributed systems alike. However many databases there are, and whether or not the database is distributed, the parallel index tree and the secondary index table take up little storage space and can be stored centrally, so the repeatability check is unaffected by the form of the database and needs no extra work to adapt to changes in it.

The technical solutions of the embodiments of the invention reduce the amount of storage needed for index matching. For example, the characters in use, such as all Chinese characters, English letters, and digits, number roughly 6000. In the technical solutions of the embodiments, the division into parallel index trees is a virtual structure; what actually has to be stored physically is the parameters of each leaf node of the parallel index trees and the secondary index table. The index storage taken up by these characters can be held entirely in memory, which helps raise the matching speed further.

Embodiment six

Fig. 5 is a schematic structural diagram of the data repeatability checking device provided by embodiment six of the invention. The device comprises a parallel index tree storage module 510, a parameter matching module 520, and a parallel repeatability judgment module 530. The parallel index tree storage module 510 is configured to store the parameters of each leaf node of the parallel index tree; each leaf node of the parallel index tree corresponds to one character, and the parameters of a leaf node include at least the string length of the data containing the character and the position of the character in that string. The parameter matching module 520 is configured to match, in the parallel index tree, the parameters of each character of the data against the parameters of the leaf nodes. The parallel repeatability judgment module 530 is configured to judge, according to the matching result of each character, whether the data repeats data already stored, and if not, to store the parameters of each character of the data in the parallel index tree as leaf node parameters.

The technical solution of this embodiment checks data for repetition by indexing each character of the data in parallel, which can improve the efficiency of the repeatability check.

Embodiment seven

The structural representation of the data repeatability calibration equipment that Fig. 6 provides for the embodiment of the invention seven, present embodiment also comprises based on embodiment six: the repeated judge module 560 of secondary index table memory module 540, index matching module 550 and serial.Wherein, secondary index table memory module 540 is used to store the secondary index table, comprises the index of storing data in this secondary index table; Index matching module 550 is used for judging according to the matching result of each character after data repeat with the data of having stored when parallel repeated judge module 530, and the index of data is mated in the secondary index table; Whether serial repeatability judge module 560 is used for repeating with the data of having stored according to the matching result judgment data of data at the secondary index table, if, then produce the data reproducible results, if not, then the parallel repeated judge module 530 of indication stores the parameter of each character of the data parameter as leaf node in the parallel index tree into, and with the index stores of data in the secondary index table.

The technical solution of this embodiment improves the speed of data repeatability checking through parallel indexing, and ensures the uniqueness and accuracy of the check through precise serial indexing.
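
The two-stage check of this embodiment might look as follows in Python. This is a sketch under stated assumptions: `TwoStageChecker` and `is_repeat` are hypothetical names, both structures are plain sets, and the data string itself stands in for the index kept in the secondary index table.

```python
class TwoStageChecker:
    """Parallel per-character match first, secondary index lookup second."""

    def __init__(self):
        self.leaves = set()     # parallel index: (length, position, character)
        self.secondary = set()  # secondary index table (assumed: the string itself)

    def is_repeat(self, data):
        params = [(len(data), i, c) for i, c in enumerate(data)]
        # Stage 1: every character must match a stored leaf.
        # Stage 2: only then is the secondary index table consulted.
        if all(p in self.leaves for p in params) and data in self.secondary:
            return True
        # Not a repeat: store the character parameters and the index.
        self.leaves.update(params)
        self.secondary.add(data)
        return False
```

With "ab" and "cd" stored, "ad" passes the first stage but is correctly reported as new by the second stage; most new data is expected to be rejected in the first stage without touching the secondary index table.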

On the basis of the above technical solution, the device may further comprise a character interception module 570, connected to the parameter matching module 520 and configured to intercept, before the parameters of each character of the data are matched against the parameters of the leaf nodes in the parallel index tree, a set number of characters from the data as the characters to be matched against the parameters of the leaf nodes in the parallel index tree. Intercepting a set number of characters keeps the workload of parallel indexing under control.

Preferably, the device further comprises a data conversion module 580, connected to the parameter matching module 520 and configured to convert each character of the data into a data identifier before the parameters of each character are matched against the parameters of the stored leaf nodes in the parallel index tree. Each leaf node corresponds to its character through the data identifier of that character, and the index of the data is a numeric index string composed of the data identifiers.

Representing data by data identifiers reduces the computation required for matching and indexing, and also makes the data index independent of database storage.
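
Character interception and data conversion can be sketched together in a few lines of Python. The text does not say how a data identifier is derived or how the numeric index string is delimited, so the Unicode code point (`ord`) and the "-" separator below are assumptions, as are the function names and the `limit` parameter.

```python
def to_identifiers(data, limit=None):
    """Convert characters to numeric data identifiers.

    `limit` models the character interception module: only the first
    `limit` characters take part in matching (assumed: a prefix).
    """
    chars = data if limit is None else data[:limit]
    return [ord(ch) for ch in chars]


def numeric_index(data):
    """Build the numeric index string used as the index of the data."""
    return "-".join(str(i) for i in to_identifiers(data))
```

For example, `to_identifiers("tom")` yields `[116, 111, 109]` and `numeric_index("tom")` yields `"116-111-109"`.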

Embodiment eight

Fig. 7 is a schematic structural diagram of the parameter matching module in the data repeatability checking device provided by Embodiment eight of the present invention. This embodiment may be based on Embodiment six or seven above. In this embodiment there are multiple parallel index trees, and the root node of each parallel index tree corresponds to the first character of a data string. The parameter matching module 520 specifically comprises an index tree selection unit 521, one or more lookup matching units 522, and a result generation unit 523. The index tree selection unit 521 is configured to select the corresponding parallel index tree according to the first character of the data string. The lookup matching units 522 are configured to perform the lookup matching operation in parallel for each character of the data; each lookup matching unit 522 comprises a layer selection subunit 5221 and a node matching subunit 5222. The layer selection subunit 5221 is configured to find, in the selected parallel index tree, the layer containing the leaf nodes according to the length of the data string. The node matching subunit 5222 is configured to search the found layer for a matching leaf node according to the position of the character in the data string, and to produce a character lookup matching result. The result generation unit 523 is configured to produce the data lookup matching result, and to stop the lookup matching operations for the other characters, when it recognizes that the character lookup matching result of any one character is negative.

On the basis of the above solution, the parameters of each leaf node of the parallel index tree may further comprise a preceding coordinate and/or a following coordinate of the character. In that case each lookup matching unit 522 further comprises a coordinate matching subunit 5223, configured to check, after the matching leaf node has been found in the layer according to the position of the character in the data string and before the character lookup matching result is produced, the preceding and/or following character of the character for consistency with the preceding and/or following coordinate in the parameters of the matched leaf node.
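
The lookup path of this embodiment (tree by first character, layer by string length, node by position, then the coordinate check) can be sketched in Python with nested dictionaries. The class `IndexForest` and its layout are hypothetical, and treating the preceding and following coordinates as the neighbouring characters themselves is an interpretation, not something the text states.

```python
class IndexForest:
    """Several parallel index trees, one per first character (sketch)."""

    def __init__(self):
        # first char -> length (layer) -> position -> char -> {(prev, next)}
        self.trees = {}

    @staticmethod
    def _neighbours(data, pos):
        # Assumed form of the preceding/following coordinates.
        prev_ch = data[pos - 1] if pos > 0 else None
        next_ch = data[pos + 1] if pos + 1 < len(data) else None
        return prev_ch, next_ch

    def matches(self, data):
        # Select the tree by first character, then the layer by length.
        layer = self.trees.get(data[:1], {}).get(len(data), {})
        for pos, ch in enumerate(data):
            coords = layer.get(pos, {}).get(ch)
            if coords is None or self._neighbours(data, pos) not in coords:
                return False  # first miss stops all remaining lookups
        return bool(data)

    def store(self, data):
        layer = self.trees.setdefault(data[0], {}).setdefault(len(data), {})
        for pos, ch in enumerate(data):
            node = layer.setdefault(pos, {}).setdefault(ch, set())
            node.add(self._neighbours(data, pos))
```

With "abcd" and "axyd" stored, the string "abyd" matches a leaf at every position, but the coordinate check fails at position 1, where "b" was only ever stored with "c" as its following character, so the mismatch is caught without consulting a secondary index.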

Embodiment nine

Fig. 8 is a schematic structural diagram of the data repeatability checking device provided by Embodiment nine of the present invention. This embodiment may be based on any of the above device embodiments. In this embodiment the parameters of a leaf node of the parallel index tree may further comprise a character occurrence count, and the device may further comprise a count increase module 590 and a count decrease module 5100. The count increase module 590 is configured to increment by one the character occurrence count in the parameters of the corresponding leaf node after the parameters of each character of the data have been stored in the parallel index tree as the parameters of leaf nodes. The count decrease module 5100 is connected to the parallel index tree storage module 510 and is configured, when data is deleted, to find the corresponding leaf nodes in the parallel index tree according to each character of the deleted data and to decrement by one the character occurrence count in the parameters of each leaf node found.
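
The occurrence count behaves like a reference count on each leaf, which a short Python sketch can make concrete. `CountedIndex` and its methods are hypothetical names; removing a leaf once its count reaches zero is an assumption, since the text only describes incrementing and decrementing the count.

```python
from collections import Counter


class CountedIndex:
    """Leaf parameters with a character occurrence count (sketch)."""

    def __init__(self):
        # (length, position, character) -> number of stored data items using it
        self.counts = Counter()

    def add(self, data):
        # Count increase module: +1 for each leaf the data uses.
        for pos, ch in enumerate(data):
            self.counts[(len(data), pos, ch)] += 1

    def delete(self, data):
        # Count decrease module: -1 for each leaf of the deleted data.
        for pos, ch in enumerate(data):
            key = (len(data), pos, ch)
            self.counts[key] -= 1
            if self.counts[key] <= 0:
                del self.counts[key]  # assumption: unused leaves are removed

    def has_leaf(self, length, pos, ch):
        return self.counts[(length, pos, ch)] > 0
```

After adding "tom" and "tim" and then deleting "tom", the shared leaf for "t" at position 0 remains, while the leaf for "o" at position 1 is gone, so a deleted name no longer blocks future registrations.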

The data repeatability checking device provided by the embodiments of the present invention can carry out the technical solution of any embodiment of the data repeatability checking method of the present invention, comprises the corresponding functional modules, and effectively improves the efficiency of repeatability checking.

Embodiment ten

Fig. 9 is a schematic structural diagram of the data application system provided by Embodiment ten of the present invention. The system comprises an application server 910, a check server 920, and a database server 930. The application server 910 is configured to receive data input by a user and to provide the data to the check server 920 for repeatability checking. The check server 920 is configured to match the parameters of each character of the received data against the parameters of the leaf nodes in the parallel index tree, where each leaf node of the parallel index tree corresponds to one character and the parameters of a leaf node comprise at least the string length of the data containing the character and the position of the character in that string; to judge, according to the matching result of each character, whether the data repeats stored data; and, if not, to store the parameters of each character of the data in the parallel index tree as the parameters of leaf nodes and, at the same time, to provide the data to the database server 930 for storage. The database server 930 is configured to store the data. The term database server 930 should be understood broadly: it may be a database built on a storage medium, or a file system, such as a Content Management System (CMS).

The check server in the data application system provided by the embodiment of the present invention may adopt the data repeatability checking device provided by the embodiments of the present invention. The check server may be deployed independently of the application server or integrated into it. The application server may be a server with any application service function, for example a Web page publishing server for a forum that handles user login, registration, and forum access. Besides providing the data that requires repeatability checking to the check server, the application server also has the functions needed to respond to its other specific services.
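
The division of labour between the three servers can be modelled in-process with a small Python sketch. All class and method names are hypothetical, network transport is omitted, and the check server uses only the per-character stage of Embodiment six, so it shares that stage's limitation of possible false matches.

```python
class DatabaseServer:
    """Stands in for database server 930: simply keeps the data."""

    def __init__(self):
        self.rows = []

    def store(self, data):
        self.rows.append(data)


class CheckServer:
    """Stands in for check server 920, independent of the database."""

    def __init__(self, database):
        self.database = database
        self.leaves = set()  # (length, position, character)

    def verify(self, data):
        params = {(len(data), i, c) for i, c in enumerate(data)}
        if params and params <= self.leaves:
            return True  # every character matched a stored leaf
        self.leaves |= params
        self.database.store(data)  # new data is passed on for storage
        return False


class ApplicationServer:
    """Stands in for application server 910, e.g. a forum registration page."""

    def __init__(self, checker):
        self.checker = checker

    def register(self, username):
        return "name taken" if self.checker.verify(username) else "registered"
```

The check never queries the database server; the database only receives data that has already passed the check, which is what lets the check server be deployed separately or integrated into the application server.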

The technical solutions of the embodiments of the present invention improve the speed of repeatability checking through the depth-first approach of parallel indexing, and ensure the uniqueness of the check through a parallel-then-serial, depth-first-then-breadth-first approach. In prior-art implementations of repeatability checking, checking efficiency is strongly affected as the data volume grows. The technical solutions of the embodiments do not depend on persisted application data but relate directly to the characters used: checking efficiency relates directly to the number of characters making up the data. The secondary index relates only indirectly to the data volume, the relevant data indexes can be loaded on demand, and the "depth first, then breadth first" strategy greatly improves efficiency, so the effect on checking efficiency is small even when the data volume becomes massive.

When the system being supported needs to evolve, for example from an ordinary system into a distributed system, the data volume is huge and the data must be split across multiple databases. In that case existing checking approaches cannot meet the requirements, or cannot be used at all, whereas the technical solutions of the embodiments relate directly only to the characters used, so a centralized check can be used whether or not the system is distributed. However the data volume changes, the data are always strings composed of characters, and the checking approach need not change. Because the technical solutions of the embodiments do not depend on hardware, they are highly cost-effective to implement, and this advantage is especially pronounced when the data volume is huge.

A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium capable of storing program code, such as ROM, RAM, a magnetic disk, or an optical disc.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A data repeatability checking method, characterized by comprising:

2. The data repeatability checking method according to claim 1, characterized in that, after it is judged according to the matching result of each character that the data repeats stored data, the method further comprises:

matching the index of the data in a secondary index table, the secondary index table containing the indexes of the stored data; and

judging, according to the matching result of the data in the secondary index table, whether the data repeats stored data; if so, producing a data-repeated result; if not, storing the parameters of each character of the data in the parallel index tree as the parameters of leaf nodes, and storing the index of the data in the secondary index table.

3. The data repeatability checking method according to claim 2, characterized in that, before the parameters of each character of the data are matched against the parameters of the leaf nodes in the parallel index tree, the method further comprises:

intercepting a set number of characters from the data as the characters to be matched against the parameters of the leaf nodes in the parallel index tree.

4. The data repeatability checking method according to claim 2, characterized in that, before the parameters of each character of the data are matched against the parameters of the stored leaf nodes in the parallel index tree, the method further comprises: converting each character of the data into a data identifier, wherein each leaf node corresponds to its character through the data identifier of that character, and the index of the data is a numeric index string composed of the data identifiers.

5. The data repeatability checking method according to any one of claims 1 to 4, characterized in that there are multiple parallel index trees, the root node of each parallel index tree corresponds to the first character of a data string, and matching the parameters of each character of the data against the parameters of the leaf nodes in the parallel index tree comprises:

selecting the corresponding parallel index tree according to the first character of the data string;

performing the following lookup matching operation for each character of the data:

finding, in the selected parallel index tree, the layer containing the leaf nodes according to the length of the data string;

searching the found layer for a matching leaf node according to the position of the character in the data string, and producing a character lookup matching result; and

when the character lookup matching result of any one character is recognized as negative, producing the data lookup matching result and stopping the lookup matching operations for the other characters.

6. The data repeatability checking method according to claim 5, characterized in that the parameters of each leaf node of the parallel index tree further comprise a preceding coordinate and/or a following coordinate of the character, and, after the matching leaf node has been found in the layer according to the position of the character in the data string and before the character lookup matching result is produced, the step further comprises:

7. The data repeatability checking method according to any one of claims 1 to 4, characterized in that the parameters of a leaf node further comprise a character occurrence count, and, after the parameters of each character of the data are stored in the parallel index tree as the parameters of leaf nodes, the method further comprises:

8. A data repeatability checking device, characterized by comprising:

9. The data repeatability checking device according to claim 8, characterized by further comprising:

a secondary index table storage module, configured to store a secondary index table, the secondary index table containing the indexes of the stored data;

an index matching module, configured to match the index of the data in the secondary index table after the parallel repeatability judging module judges, according to the matching result of each character, that the data repeats stored data;

a serial repeatability judging module, configured to judge, according to the matching result of the data in the secondary index table, whether the data repeats stored data; if so, to produce a data-repeated result; if not, to instruct the parallel repeatability judging module to store the parameters of each character of the data in the parallel index tree as the parameters of leaf nodes, and to store the index of the data in the secondary index table;

a character interception module, configured to intercept, before the parameters of each character of the data are matched against the parameters of the leaf nodes in the parallel index tree, a set number of characters from the data as the characters to be matched against the parameters of the leaf nodes in the parallel index tree; and

a data conversion module, configured to convert each character of the data into a data identifier before the parameters of each character of the data are matched against the parameters of the stored leaf nodes in the parallel index tree, wherein each leaf node corresponds to its character through the data identifier of that character, and the index of the data is a numeric index string composed of the data identifiers.

10. The data repeatability checking device according to claim 8, characterized in that there are multiple parallel index trees, the root node of each parallel index tree corresponds to the first character of a data string, and the parameter matching module comprises:

an index tree selection unit, configured to select the corresponding parallel index tree according to the first character of the data string;

one or more lookup matching units, configured to perform the lookup matching operation in parallel for each character of the data, each lookup matching unit comprising:

a layer selection subunit, configured to find, in the selected parallel index tree, the layer containing the leaf nodes according to the length of the data string; and

a node matching subunit, configured to search the found layer for a matching leaf node according to the position of the character in the data string, and to produce a character lookup matching result; and

a result generation unit, configured to produce the data lookup matching result, and to stop the lookup matching operations for the other characters, when the character lookup matching result of any one character is recognized as negative.

11. The data repeatability checking device according to claim 10, characterized in that the parameters of each leaf node further comprise a preceding coordinate and/or a following coordinate of the character, and each lookup matching unit further comprises:

a coordinate matching subunit, configured to check, after the matching leaf node has been found in the layer according to the position of the character in the data string and before the character lookup matching result is produced, the preceding and/or following character of the character for consistency with the preceding and/or following coordinate in the parameters of the matched leaf node.

12. The data repeatability checking device according to claim 8, characterized in that the parameters of a leaf node of the parallel index tree further comprise a character occurrence count, and the device further comprises:

a count increase module, configured to increment by one the character occurrence count in the parameters of the corresponding leaf node after the parameters of each character of the data are stored in the parallel index tree as the parameters of leaf nodes; and

a count decrease module, configured, when data is deleted, to find the corresponding leaf nodes in the parallel index tree according to each character of the deleted data, and to decrement by one the character occurrence count in the parameters of each leaf node found.

13. A data application system, characterized by comprising:

a database server, configured to store the data.