CN104991963B - Document handling method and device - Google Patents

Document handling method and device

Info

Publication number
CN104991963B
Authority
CN
China
Prior art keywords
filename
expression formula
documents
lists
pair
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510437027.1A
Other languages
Chinese (zh)
Other versions
CN104991963A (en)
Inventor
鲁莽
孙艳
林子涯
韩方明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN201510437027.1A
Publication of CN104991963A
Application granted
Publication of CN104991963B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/122 File system administration, e.g. details of archiving or snapshots using management policies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems

Abstract

The present invention provides a document handling method and device. The method includes: obtaining the file lists of two groups of files that need to be parsed and compared; pairing the wildcard-containing filenames in the two file lists two by two; performing an intersection operation based on directed topological sequences on each filename pair to obtain the candidate topological sequences of the intersection; restoring the candidate topological sequences to wildcard-containing filename expressions and determining the intersection relationship between the two filenames of the pair that generated each expression; summarizing and merging the intersection relationships determined for all filename pairs to determine the intersection relationship between the file lists of the two groups of files; and processing the two groups of files, in response to an operation instruction input by the user, according to the determined intersection relationship between the file lists. The invention parses wildcard-containing file lists quickly and accurately and thereby effectively improves the efficiency and reliability of file management.

Description

Document handling method and device
Technical field
The present invention relates to the technical field of data processing, and in particular to a document handling method and device.
Background
File backup is one of the important means of host data backup. It is mainly carried out by executing dedicated backup jobs (which can be understood as programs or tasks), and the range of files covered by each backup job is mainly determined by the file backup list that the user sets in the backup policy. Owing to the characteristics of the host file system, however, a backup policy supports not only specific filenames but also wildcards of several levels and types in the file list, which denote sets of files whose names share common features. In addition, the file list in a backup policy is divided into an "include" part and an "exclude" part; the host determines the final range of files to back up by performing set operations on these two parts.
Support for multiple wildcards simplifies the expression of host file lists and effectively reduces the number of entries in a backup policy. But the abstractness and complexity of wildcards also make file lists harder to parse, so that splitting, combining, and changing backup policies become difficult. At present the industry has no effective solution for parsing host file lists that contain wildcards. Either professionals analyze them manually from experience, or the system expands the wildcard-containing filenames and enumerates a detailed file list before further processing. However, the number of files in a host file system is huge, the naming is intricate, and file lists change frequently as business processing requires. The first approach needs manual participation throughout, cannot be automated, and lacks reliability. The second approach operates on concrete file lists, so it consumes a large amount of system resources, and its output contains so many entries that it is inconvenient in practice.
At present, wildcard-containing file lists are very widely used in host file management. For example, batch file backup, recovery, and deletion are mainly based on such lists. The speed at which wildcard-containing file lists are parsed and processed therefore bears directly on the efficiency of host file management, and the reliability of the parsing result directly affects the security and integrity of host data.
No effective solution has yet been proposed for parsing wildcard-containing file lists quickly and accurately.
Summary of the invention
An embodiment of the present invention provides a document handling method that parses wildcard-containing file lists quickly and accurately and thereby effectively improves the efficiency and reliability of file management. The method includes:
obtaining the file lists of two groups of files that need to be parsed and compared;
pairing the wildcard-containing filenames in the file lists of the two groups of files two by two;
performing an intersection operation based on directed topological sequences on the paired filenames to obtain the candidate topological sequences of the intersection;
restoring the candidate topological sequences to wildcard-containing filename expressions and, according to the restoration result, determining the intersection relationship between the two filenames of the pair that generated each filename expression;
summarizing and merging the intersection relationships determined for all filename pairs to determine the intersection relationship between the file lists of the two groups of files;
processing the two groups of files, in response to an operation instruction input by the user, according to the determined intersection relationship between the file lists of the two groups of files.
In one embodiment, determining, according to the restoration result, the intersection relationship between the two filenames of the pair that generated the filename expression includes:
performing a validity check on the restored filename expression;
according to the result of the validity check, determining the intersection relationship between the two filenames of the pair that generated the filename expression by the following rules:
if the restored filename expression fails the validity check, the intersection of the filename pair that generated the expression is determined to be empty;
if the restored filename expression passes the validity check and is identical to one filename of the pair that generated it, the two filenames of the pair are determined to be in a containing/contained relationship;
if the restored filename expression passes the validity check and differs from both filenames of the pair that generated it, the two filenames of the pair are determined to be in an intersecting relationship.
In one embodiment, performing a validity check on the restored filename expression includes:
comparing the number of filename segments, the length with the wildcard parts removed, and the length of each filename segment of the restored filename expression with predetermined restriction rules;
if all of them satisfy the restriction rules, determining that the validity check is passed.
In one embodiment, performing the intersection operation based on directed topological sequences on the paired filenames to obtain the candidate topological sequences of the intersection includes:
performing the following operations on each filename pair:
building an equivalent topological directed graph for each of the two filenames of the pair;
merging the equivalent topological directed graphs of the two filenames, obtaining all topological sequences of the merged directed graph, adding weights, and screening out the legal sequences;
merging adjacent nodes of the screened topological sequences until no further merge is possible, to obtain one or more candidate topological sequences of the intersection.
An embodiment of the present invention further provides a document handling device that parses wildcard-containing file lists quickly and accurately and thereby effectively improves the efficiency and reliability of file management. The device includes:
an acquisition module, for obtaining the file lists of two groups of files that need to be parsed and compared;
a matching module, for pairing the wildcard-containing filenames in the file lists of the two groups of files two by two;
a topological computing module, for performing an intersection operation based on directed topological sequences on the paired filenames to obtain the candidate topological sequences of the intersection;
an intersection relationship determining module, for restoring the candidate topological sequences to wildcard-containing filename expressions and, according to the restoration result, determining the intersection relationship between the two filenames of the pair that generated each filename expression;
a merging module, for summarizing and merging the intersection relationships determined for all filename pairs to determine the intersection relationship between the file lists of the two groups of files;
a processing module, for processing the two groups of files, in response to an operation instruction input by the user, according to the determined intersection relationship between the file lists of the two groups of files.
In one embodiment, the intersection relationship determining module includes:
a validity checking unit, for performing a validity check on the restored filename expression;
an intersection judging unit, for determining, according to the result of the validity check, the intersection relationship between the two filenames of the pair that generated the filename expression by the following rules:
if the restored filename expression fails the validity check, the intersection of the filename pair that generated the expression is determined to be empty;
if the restored filename expression passes the validity check and is identical to one filename of the pair that generated it, the two filenames of the pair are determined to be in a containing/contained relationship;
if the restored filename expression passes the validity check and differs from both filenames of the pair that generated it, the two filenames of the pair are determined to be in an intersecting relationship.
In one embodiment, the validity checking unit includes:
a comparing subunit, for comparing the number of filename segments, the length with the wildcard parts removed, and the length of each filename segment of the restored filename expression with predetermined restriction rules;
a determining subunit, for determining that the validity check is passed when the number of filename segments, the length with the wildcard parts removed, and the length of each filename segment of the restored filename expression all satisfy the restriction rules.
In one embodiment, the topological computing module is specifically configured to perform the following operations on each filename pair:
building an equivalent topological directed graph for each of the two filenames of the pair;
merging the equivalent topological directed graphs of the two filenames, obtaining all topological sequences of the merged directed graph, adding weights, and screening out the legal sequences;
merging adjacent nodes of the screened topological sequences until no further merge is possible, to obtain one or more candidate topological sequences of the intersection.
In the above embodiments, the file lists of two groups of files are extracted, the wildcard-containing entries in them are paired, and the intersection of each pair is computed; the results of the pairwise intersection are then summarized and collated to obtain the mutual inclusion relationship between the file lists of the two groups of files and the content of their intersection. Furthermore, a string intersection algorithm based on topological directed graphs is introduced, which enables accurate analysis of the various wildcards in a file list. This solves the technical problem in the prior art that wildcard-containing file lists are difficult to parse quickly and accurately, and achieves the technical effect of effectively improving the efficiency and reliability of file management.
Brief description of the drawings
The drawings described here provide a further understanding of the present invention and form part of this application; they do not limit the invention. In the drawings:
Fig. 1 is a flowchart of a document handling method according to an embodiment of the present invention;
Fig. 2 is another flowchart of the document handling method according to an embodiment of the present invention;
Fig. 3 is yet another flowchart of the document handling method according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of a document handling device according to an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments and the drawings. The exemplary embodiments and their descriptions serve to explain the invention and do not limit it.
This example provides a document handling method to solve the problem that, in host file management operations in the prior art, wildcard-containing file lists are difficult to parse accurately. As shown in Fig. 1, the method includes:
Step 101: Obtain the file lists of two groups of files that need to be parsed and compared;
The input to be parsed and compared may be two or more groups of jobs or files, entered by the user, that contain wildcard-bearing file lists. In a specific implementation, when the input is a job, the file list can be read from the part of the job that defines the file operations involved; when the input is a file that contains only a file list, it can be read directly. Once the file lists have been read, parsing can begin. In practice two groups of files are usually selected for the subsequent operations, that is, the comparison can proceed two groups at a time.
Specifically, a complete file list extraction mechanism can be preset. During execution, the type of the files and jobs input by the user can then be determined automatically, the operation corresponding to the determined input type is taken, and the wildcard-containing portion of the file list is correctly extracted from the input files and jobs.
Step 102: Pair the wildcard-containing filenames in the file lists of the two groups of files two by two;
That is, the wildcard-containing filenames can first be obtained from each of the two file lists to be compared, and the wildcard-containing filenames of the two groups are then paired two by two; for example, they can be paired according to similarity.
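A minimal Python sketch of this pairing step follows. The patent does not fix a pairing strategy beyond mentioning similarity as one option, so the sketch simply forms every cross-list pair. The function names `has_wildcard` and `pair_wildcard_names`, and the use of `*` as the wildcard marker, are assumptions made for illustration only.

```python
from itertools import product


def has_wildcard(name: str) -> bool:
    # Assumption: any "*" in the name marks a wildcard entry.
    return "*" in name


def pair_wildcard_names(list_a, list_b):
    """Pair every wildcard filename in list_a with every one in list_b."""
    wild_a = [n for n in list_a if has_wildcard(n)]
    wild_b = [n for n in list_b if has_wildcard(n)]
    return list(product(wild_a, wild_b))


# Hypothetical file lists; only the wildcard entries are paired.
pairs = pair_wildcard_names(
    ["AB.**.CD.**", "XY.Z"],
    ["AB.CD.**.EF", "PQ.**"],
)
```

A similarity-based pairing, as the text suggests, could prune this cross product before the more expensive intersection step.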
Step 103: Perform the intersection operation based on directed topological sequences on the paired filenames to obtain the candidate topological sequences of the intersection;
Specifically, an equivalent topological directed graph is built for each of the filenames to be compared, and the graphs are merged. All topological sequences of the merged directed graph are found, weights are added, and the legal sequences are screened out. Adjacent nodes of the screened topological sequences are then merged until no further merge is possible, which yields one or more candidate topological sequences.
Step 104: Restore the candidate topological sequences to wildcard-containing filename expressions and, according to the restoration result, determine the intersection relationship between the two filenames of the pair that generated each filename expression;
Specifically, after the candidate topological sequences have been restored to wildcard-containing filename expressions, and before the intersection relationship between the filenames of the pair is determined from the restoration result, the restored expressions can first be screened for legitimacy and duplicate expressions eliminated; the mutual intersection relationship is determined afterwards.
The principle of this step can be as follows. The topological sequences obtained after node merging are restored to expressions. The expressions are then screened for legitimacy according to the naming rules for host files, and duplicate expressions are eliminated. Once this is done, the relationship between the file sets that the two filenames denote can be obtained, that is, it is determined whether the sets are in a containing, contained, unrelated, or intersecting relationship, and the expression relating the file sets is obtained.
For example: if no candidate topological sequence exists after the computation, or the filename expressions restored from the candidate topological sequences fail the legitimacy screening, the intersection of the filenames from the two file lists can be determined to be empty. If a restored filename expression is identical to one of the two input filenames, the two filenames can be determined to be in a containing/contained relationship. If, after screening and elimination of duplicates, an expression is obtained that differs from both input filenames, that expression can be taken as the intersection; it corresponds to the overlap between the file ranges that the two filenames denote.
Step 105: Summarize and merge the intersection relationships determined for all filename pairs to determine the intersection relationship between the file lists of the two groups of files;
If no pair of filenames from the two file lists has an intersection, the two lists do not overlap. If every expression in file list A is a subset of file list B, file list A can be determined to be a subset of file list B.
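The list-level summary can be sketched as follows, assuming the pairwise relationships have already been computed. The function name `summarize_lists`, the string labels, and the reading of "a subset of list B" as "contained in at least one entry of B" are all assumptions for illustration; the patent does not prescribe a data structure.

```python
def summarize_lists(list_a, list_b, pair_relation):
    """Derive the relation between two file lists from pairwise relations.

    pair_relation maps (a, b) to one of the hypothetical labels
    "disjoint", "a_in_b", "b_in_a", or "overlap".
    """
    rel = {(a, b): pair_relation[(a, b)] for a in list_a for b in list_b}
    # No pair intersects: the two lists do not overlap.
    if all(r == "disjoint" for r in rel.values()):
        return "disjoint"
    # Every entry of A is contained in some entry of B: A is a subset of B.
    if all(any(rel[(a, b)] == "a_in_b" for b in list_b) for a in list_a):
        return "a_subset_of_b"
    # Symmetric case: B is a subset of A.
    if all(any(rel[(a, b)] == "b_in_a" for a in list_a) for b in list_b):
        return "b_subset_of_a"
    return "overlap"


# Hypothetical pairwise results for two small lists.
relations = {
    ("A.**.X", "A.**"): "a_in_b",
    ("A.**.X", "B.**"): "disjoint",
}
result = summarize_lists(["A.**.X"], ["A.**", "B.**"], relations)
```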
Step 106: Process the two groups of files, in response to an operation instruction input by the user, according to the determined intersection relationship between the file lists of the two groups of files.
The operation selected by the user is performed on the file lists. If the user selects a merge operation, the file lists are merged according to the intersection result; if the user selects a split operation, the file lists are split according to the intersection result; if the user selects a compare operation, the intersection result of the file lists is output directly. That is, on the basis of the intersection analysis of the file lists, the operation the user needs (for example merging, splitting, or viewing the comparison) is performed on the file lists, and the processing result is output afterwards.
In the above embodiments, the file lists of two groups of files are extracted, the wildcard-containing entries in them are paired, and the intersection of each pair is computed; the results of the pairwise intersection are then summarized and collated to obtain the mutual inclusion relationship between the file lists of the two groups of files and the content of their intersection. Furthermore, a string intersection algorithm based on topological directed graphs is introduced, which enables accurate analysis of the various wildcards in a file list. This solves the technical problem in the prior art that wildcard-containing file lists are difficult to parse quickly and accurately, and achieves the technical effect of effectively improving the efficiency and reliability of file management.
The document handling method is illustrated below with a specific embodiment. As shown in Fig. 2, it includes the following steps:
Step 201: The user selects an operation type and specifies two or more groups of jobs to be parsed and compared, or input files that contain file lists;
Step 202: After the user's input is received, the operation type selected by the user is recorded, and each group of jobs or files input by the user is parsed in turn. The content of the file lists is extracted and checked for wildcards. If a file list contains wildcard entries, step 203 is executed; otherwise step 208 is executed.
Step 203: From the file lists extracted from the jobs, the filenames in different groups of file lists are paired two by two to form filename pairs.
Step 204: The intersection operation based on directed topological sequences is performed on the generated filename pairs to obtain the candidate topological sequences of the intersection.
Take the host filenames AB.**.CD.** and AB.CD.**.EF as an example, where "." is the separator and ** is a wildcard that stands for any number of segments and any number of characters. To compute the intersection, each filename is first converted into a directed graph with edge weights, as follows:
Sequence 1: head node -(0)- AB -(1)- CD -(1)- tail node
Sequence 2: head node -(0)- AB.CD -(1)- EF -(0)- tail node
Here the number in parentheses on each edge is the edge weight; where a wildcard is present, the edge weight is 1.
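The conversion from a filename to such a weighted sequence can be sketched as below. This is an illustration inferred from the two example sequences only: consecutive literal segments are grouped into one node, and an edge gets weight 1 when a `**` wildcard lies on it. The function name `to_weighted_sequence` and the return layout (a node list plus a weight list that is one element longer) are assumptions, not part of the patent.

```python
def to_weighted_sequence(name, wildcard="**", sep="."):
    """Convert a filename into (nodes, weights).

    weights[i] is the weight of the edge entering nodes[i]; the last
    element is the weight of the edge into the end node.
    """
    nodes = []
    weights = [0]   # edge leaving the start node
    current = []    # literal segments collected for the node being built
    for segment in name.split(sep):
        if segment == wildcard:
            if current:
                # Close the literal node and open the edge that follows it.
                nodes.append(sep.join(current))
                current = []
                weights.append(0)
            weights[-1] = 1  # a wildcard sits on the current edge
        else:
            current.append(segment)
    if current:
        nodes.append(sep.join(current))
        weights.append(0)
    return nodes, weights


seq1 = to_weighted_sequence("AB.**.CD.**")
seq2 = to_weighted_sequence("AB.CD.**.EF")
```

With these inputs, `seq1` and `seq2` carry the same nodes and edge weights as Sequence 1 and Sequence 2 above.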
The two sequences are then merged, which yields a topological sequence without edge weights:
head node --- AB.CD --- AB --- CD --- EF --- tail node
Specifically, adjacent nodes that belong to different sequences are merged and edge weights are added. The AB.CD node and the AB node merge into AB.CD, and the edge weight after merging remains 1. CD and EF have no intersection and cannot be merged, so their original edge weights are retained.
The intersection sequence finally obtained is:
head node -(0)- AB.CD -(1)- CD -(1)- EF -(0)- tail node
Step 205: The computed candidate topological sequences of the intersection are restored to wildcard-containing filename expressions. For example, the intersection sequence above can be restored to the expression AB.CD.**.CD.**.EF.
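The restoration in step 205 can be sketched as the inverse of the weighted-sequence notation: a `**` is emitted wherever an edge has weight 1. The function name `to_expression` and the (nodes, weights) layout, in which `weights` has one more element than `nodes`, are assumptions made for this illustration.

```python
def to_expression(nodes, weights, wildcard="**", sep="."):
    """Restore a weighted sequence to a wildcard filename expression.

    weights[i] is the weight of the edge entering nodes[i]; the last
    element is the weight of the edge into the end node.
    """
    parts = []
    for i, node in enumerate(nodes):
        if weights[i] == 1:
            parts.append(wildcard)  # wildcard on the edge before this node
        parts.append(node)
    if weights[len(nodes)] == 1:
        parts.append(wildcard)      # trailing wildcard before the end node
    return sep.join(parts)


# The intersection sequence from the example in the text.
expression = to_expression(["AB.CD", "CD", "EF"], [0, 1, 1, 0])
```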
Step 206: The restored expressions are checked and screened for validity, duplicate expressions are removed, and the expression of the filename intersection is obtained. The relationship between the two filenames of each input pair is then judged from the intersection expression.
Specifically, the validity screening can be based on the naming restrictions of the file system, for example: whether the length of the intersection filename expression with the wildcard parts removed exceeds the limit, whether the number of filename segments exceeds the allowed number, and whether the length of any segment exceeds the file system limit.
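A sketch of such a validity check is shown below. The patent names the three kinds of limit but gives no values, so the defaults here (`max_total=44`, `max_segments=22`, `max_segment_len=8`) are placeholder assumptions, as are the function name `is_legal_expression` and the choice to count only non-wildcard segments.

```python
def is_legal_expression(expr, max_total=44, max_segments=22,
                        max_segment_len=8, wildcard="**", sep="."):
    """Check a restored expression against naming limits.

    The default limits are illustrative placeholders; a real system
    would take them from its own file naming rules.
    """
    # Assumption: wildcard segments do not count toward any limit.
    literal = [s for s in expr.split(sep) if s != wildcard]
    if len(literal) > max_segments:
        return False                      # too many segments
    if any(len(s) > max_segment_len for s in literal):
        return False                      # a segment is too long
    if len(sep.join(literal)) > max_total:
        return False                      # name too long without wildcards
    return True


ok = is_legal_expression("AB.CD.**.CD.**.EF")
too_long = is_legal_expression("TOOLONGSEGMENT.**")
```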
If the intersection expression is empty, the files denoted by the two input filename expressions do not overlap. If the intersection expression is identical to one of the two original filename expressions, the files denoted by that expression are a subset of those denoted by the other expression. Otherwise there is no inclusion relationship between the two filenames.
Step 207: The intersection relationships obtained for all filename pairs are summarized and merged to obtain the intersection relationship between the groups of file lists. If no pair of filenames from the lists has an intersection, the two lists do not overlap; if every expression in file list A is a subset of file list B, file list A is a subset of file list B.
Step 208: Because the file lists contain no wildcards, the filenames in the file lists of the groups only need to be compared one by one; the identical filenames form the intersection of the lists.
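For the wildcard-free case of step 208, the comparison reduces to an ordinary set intersection, as in the following sketch. The function name `exact_intersection` and the sample filenames are hypothetical.

```python
def exact_intersection(list_a, list_b):
    """Return the filenames that appear in both lists, in list_a order."""
    in_b = set(list_b)  # set lookup avoids a quadratic scan
    return [name for name in list_a if name in in_b]


common = exact_intersection(
    ["AB.CD.EF", "AB.XY", "PQ.RS"],
    ["PQ.RS", "AB.CD.EF", "ZZ.TOP"],
)
```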
Step 209: The operation selected by the user is performed on the file lists: if the user selects the merge operation, step 210 is executed; if the user selects the split operation, step 211 is executed; if the user selects the compare operation, step 212 is executed.
Step 210: The file lists are merged according to their intersection result.
Step 211: The file lists are split according to their intersection result.
Step 212: The intersection result of the file lists is output.
Fig. 3 shows steps 204 to 207 above in more detail. That is, the intersection operation based on directed topological sequences may include the following steps:
Step 301: Build an equivalent topological directed graph for each of the input filenames NAMEA and NAMEB, and merge the two graphs;
Step 302: Process the merged topological directed graph to generate all possible topological sequences, and add edge weights to the topological sequences;
Step 303: Repeatedly merge the nodes of each topological sequence until no further merge is possible;
Step 304: Screen the merged topological sequences to obtain the candidate intersection topological sequences;
Step 305: Restore the candidate intersection topological sequences to wildcard-containing filename expressions;
Step 306: Eliminate duplicate filename expressions and screen for legitimacy to obtain the expression EXP that corresponds to the intersection;
Step 307: Judge the relationship between NAMEA and NAMEB from EXP. If EXP is empty, the file ranges represented by NAMEA and NAMEB do not overlap and there is no intersection. If EXP is identical to NAMEA or NAMEB, NAMEA and NAMEB are in a containing/contained relationship. If EXP differs from both NAMEA and NAMEB, NAMEA and NAMEB partly overlap, and their intersection is given by the expression EXP.
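The decision rule of step 307 can be written down directly, as in this sketch. The enum `Relation` and the function `classify` are illustrative names; the rule itself follows the three cases stated in the step, with an empty or missing EXP treated as "no intersection".

```python
from enum import Enum
from typing import Optional


class Relation(Enum):
    DISJOINT = "disjoint"        # EXP is empty
    CONTAINMENT = "containment"  # EXP equals NAMEA or NAMEB
    OVERLAP = "overlap"          # EXP differs from both


def classify(exp: Optional[str], name_a: str, name_b: str) -> Relation:
    """Judge the relation between two filenames from their intersection EXP."""
    if not exp:
        return Relation.DISJOINT
    if exp == name_a or exp == name_b:
        return Relation.CONTAINMENT
    return Relation.OVERLAP


# Using the example filenames and the restored expression from the text.
relation = classify("AB.CD.**.CD.**.EF", "AB.**.CD.**", "AB.CD.**.EF")
```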
The document handling method provided by the above embodiments solves the problem of parsing wildcard-containing file lists. It provides a scheme for quickly parsing the wildcard-containing file lists used in file backup, recovery, copy, and delete operations, and it allows such file lists to be compared, merged, and split without manual intervention. First, it parses wildcard-containing file lists accurately. Because such lists are abstract, manual parsing can only estimate from experience and cannot be fully accurate; the intersection algorithm based on directed topological sequences solves this problem. Second, it increases the parsing speed for wildcard-containing file lists and reduces resource overhead: compared with earlier manual processing, the parsing time is shortened by more than 80%. Furthermore, the parsing result is highly general and can be used directly, without further processing, in host file list management operations. Because the intersection of the file lists is given in the form of expressions, wildcard-containing file lists can conveniently be merged, split, and compared on the basis of this result.
Based on the same inventive concept, an embodiment of the present invention further provides a document handling device, as described in the following embodiments. Because the device solves the problem on a principle similar to that of the document handling method, its implementation may refer to the implementation of the method, and the overlapping parts are not repeated. As used below, the term "unit" or "module" denotes a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware or in a combination of software and hardware is also possible and contemplated. Fig. 4 is a structural block diagram of a document handling device according to an embodiment of the present invention. As shown in Fig. 4, it includes an acquisition module 401, a matching module 402, a topological computing module 403, an intersection relationship determining module 404, a merging module 405, and a processing module 406. The structure is described below.
Acquisition module 401, configured to obtain the file lists of two groups of files that need to be parsed and compared;
Matching module 402, configured to pair, two by two, the wildcard-containing filenames in the file lists of the two groups of files;
Topology computing module 403, configured to perform an intersection operation based on directed topological sequences on each pair of filenames, to obtain candidate topological sequences of the intersection;
Intersection relation determining module 404, configured to restore the candidate topological sequences to a wildcard-containing filename expression and, according to the restoration result, determine the intersection relation between the pair of filenames that generated the filename expression;
Merging module 405, configured to summarize and merge the determined intersection relations of all filename pairs, to determine the intersection relation between the file lists of the two groups of files;
Processing module 406, configured to process the two groups of files in response to an operation instruction input by a user, according to the determined intersection relation between the file lists of the two groups of files.
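The pairing performed by matching module 402 can be illustrated with a minimal Python sketch. It assumes that pairing two by two means combining every wildcard-containing entry of one file list with every such entry of the other, and that `*` is the only wildcard. The function names and sample filenames are hypothetical and do not come from the embodiment.

```python
from itertools import product


def has_wildcard(name: str) -> bool:
    # Assumption: "*" is the only wildcard character in a file list entry.
    return "*" in name


def pair_wildcard_names(list_a, list_b):
    """Pair every wildcard entry of list_a with every wildcard entry of list_b."""
    wild_a = [name for name in list_a if has_wildcard(name)]
    wild_b = [name for name in list_b if has_wildcard(name)]
    # Cartesian product: each pair is later fed to the intersection operation.
    return list(product(wild_a, wild_b))


# Hypothetical file lists for two groups of files.
pairs = pair_wildcard_names(["A.*.LOG", "B.DATA"], ["A.X*.LOG", "*.TMP"])
```

Entries without a wildcard are left out of the pairing in this sketch; how the embodiment treats them is not stated in this passage.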
In one embodiment, the intersection relation determining module 404 may include: a validity checking unit, configured to perform a validity check on the restored filename expression; and an intersection judging unit, configured to determine, according to the result of the validity check and the following rules, the intersection relation between the pair of filenames that generated the filename expression:
1) if the restored filename expression fails the validity check, the intersection of the pair of filenames that generated the filename expression is determined to be empty;
2) if the restored filename expression passes the validity check and is identical to one filename of the pair that generated it, the relation between the pair of filenames is determined to be a containment relation (one filename contains the other);
3) if the restored filename expression passes the validity check and differs from both filenames of the pair that generated it, the relation between the pair of filenames is determined to be an overlapping relation.
In one embodiment, the validity checking unit may include: a comparing subunit, configured to compare the number of filename segments, the length excluding the wildcard part, and the length of each filename segment in the restored filename expression with predetermined restriction rules; and a determining subunit, configured to determine that the validity check is passed when the number of filename segments, the length excluding the wildcard part, and the length of each filename segment in the restored filename expression all satisfy the restriction rules.
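A minimal Python sketch of such a validity check follows. The restriction rules themselves are not given in this passage, so the separator `.` and the limits used as defaults (22 segments, 44 non-wildcard characters, 8 characters per segment) are hypothetical placeholders, as is the function name. The sketch also assumes that `*` is the only wildcard and that separators count toward the non-wildcard length.

```python
def is_valid_expression(expr: str, sep: str = ".", max_segments: int = 22,
                        max_fixed_len: int = 44, max_segment_len: int = 8) -> bool:
    """Check a restored filename expression against restriction rules.

    All limits are hypothetical defaults, not values from the source.
    """
    segments = expr.split(sep)
    # Check 1: number of filename segments.
    if len(segments) > max_segments:
        return False
    # Check 2: total length with the wildcard characters removed
    # (separators are counted here, which is an assumption).
    if len(expr.replace("*", "")) > max_fixed_len:
        return False
    # Check 3: non-wildcard length of every individual segment.
    return all(len(seg.replace("*", "")) <= max_segment_len for seg in segments)
```

Passing the limits as parameters keeps the sketch independent of any particular file system's naming rules.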
In one embodiment, the topology computing module 403 may be configured to perform the following operations on each pair of filenames obtained by the pairing: build the equivalent topological directed graph of each of the two filenames in the pair; merge the two equivalent topological directed graphs, obtain all topological sequences of the merged directed graph, add weights, and select the valid sequences; and merge adjacent nodes of the selected topological sequences until no further merging is possible, to obtain one or more merged candidate topological sequences of the intersection.
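The topological-sequence procedure above is not specified here in enough detail to reproduce. As a point of comparison only, the following Python sketch uses a different, standard technique: a memoized dynamic-programming search that decides whether two wildcard filenames share at least one matching name. It is not the algorithm of the embodiment, it does not produce the intersection expression, and it assumes that `*` is the only wildcard and matches any run of characters, including the empty run and the `.` separator. The function name is hypothetical.

```python
from functools import lru_cache


def patterns_intersect(p: str, q: str) -> bool:
    """Return True if some string is matched by both wildcard patterns."""

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> bool:
        # Both patterns fully consumed: a common string exists.
        if i == len(p) and j == len(q):
            return True
        # "*" in p either stops matching, or absorbs the next token of q.
        if i < len(p) and p[i] == "*":
            return go(i + 1, j) or (j < len(q) and go(i, j + 1))
        # Symmetric case for "*" in q.
        if j < len(q) and q[j] == "*":
            return go(i, j + 1) or (i < len(p) and go(i + 1, j))
        # Two literal characters must be equal to continue.
        if i < len(p) and j < len(q) and p[i] == q[j]:
            return go(i + 1, j + 1)
        return False

    return go(0, 0)
```

Such a check can serve as an independent cross-check of the empty or non-empty outcome for a pair of filenames, for example when testing an implementation of the topological approach.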
In another embodiment, software is further provided, which is used to execute the technical solutions described in the above embodiments and preferred implementations.
In another embodiment, a storage medium is further provided, in which the above software is stored. The storage medium includes, but is not limited to, an optical disc, a floppy disk, a hard disk, a rewritable memory and the like.
It can be seen from the above description that the embodiments of the present invention achieve the following technical effects. The file lists of two groups of files are extracted, the wildcard-containing entries in them are paired two by two, and an intersection operation is performed on each pair; the intersection results of the entries are then summarized and organized to obtain the mutual containment relation between the file lists of the two groups of files and the content of their intersection. Further, a string intersection algorithm based on topological directed graphs is introduced to analyze the various wildcards in the file lists accurately. This solves the technical problem in the prior art that file lists containing wildcards are difficult to parse quickly and accurately, and achieves the technical effect of effectively improving the efficiency and reliability of file management.
Obviously, those skilled in the art should understand that the modules or steps of the above embodiments of the present invention may be implemented by a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases the steps shown or described may be performed in an order different from that given here; alternatively, they may each be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the embodiments of the present invention are not limited to any specific combination of hardware and software.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention. For those skilled in the art, the embodiments of the present invention may have various modifications and variations. Any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A document handling method, characterized by comprising:
obtaining the file lists of two groups of files that need to be parsed and compared;
pairing, two by two, the wildcard-containing filenames in the file lists of the two groups of files;
performing an intersection operation based on directed topological sequences on each pair of filenames, to obtain candidate topological sequences of the intersection;
restoring the candidate topological sequences to a wildcard-containing filename expression and, according to the restoration result, determining the intersection relation between the pair of filenames that generated the filename expression;
summarizing and merging the determined intersection relations of all filename pairs, to determine the intersection relation between the file lists of the two groups of files;
processing the two groups of files in response to an operation instruction input by a user, according to the determined intersection relation between the file lists of the two groups of files.
2. The method according to claim 1, characterized in that determining, according to the restoration result, the intersection relation between the pair of filenames that generated the filename expression comprises:
performing a validity check on the restored filename expression;
determining, according to the result of the validity check and the following rules, the intersection relation between the pair of filenames that generated the filename expression:
if the restored filename expression fails the validity check, determining that the intersection of the pair of filenames that generated the filename expression is empty;
if the restored filename expression passes the validity check and is identical to one filename of the pair that generated it, determining that the relation between the pair of filenames is a containment relation;
if the restored filename expression passes the validity check and differs from both filenames of the pair that generated it, determining that the relation between the pair of filenames is an overlapping relation.
3. The method according to claim 2, characterized in that performing a validity check on the restored filename expression comprises:
comparing the number of filename segments, the length excluding the wildcard part, and the length of each filename segment in the restored filename expression with predetermined restriction rules;
if all of them satisfy the restriction rules, determining that the validity check is passed.
4. The method according to any one of claims 1 to 3, characterized in that performing an intersection operation based on directed topological sequences on each pair of filenames, to obtain candidate topological sequences of the intersection, comprises:
performing the following operations on each pair of filenames obtained by the pairing:
building the equivalent topological directed graph of each of the two filenames in the pair;
merging the equivalent topological directed graphs of the two filenames, obtaining all topological sequences of the merged directed graph, adding weights, and selecting the valid sequences;
merging adjacent nodes of the selected topological sequences until no further merging is possible, to obtain one or more merged candidate topological sequences of the intersection.
5. A document handling apparatus, characterized by comprising:
an acquisition module, configured to obtain the file lists of two groups of files that need to be parsed and compared;
a matching module, configured to pair, two by two, the wildcard-containing filenames in the file lists of the two groups of files;
a topology computing module, configured to perform an intersection operation based on directed topological sequences on each pair of filenames, to obtain candidate topological sequences of the intersection;
an intersection relation determining module, configured to restore the candidate topological sequences to a wildcard-containing filename expression and, according to the restoration result, determine the intersection relation between the pair of filenames that generated the filename expression;
a merging module, configured to summarize and merge the determined intersection relations of all filename pairs, to determine the intersection relation between the file lists of the two groups of files;
a processing module, configured to process the two groups of files in response to an operation instruction input by a user, according to the determined intersection relation between the file lists of the two groups of files.
6. The apparatus according to claim 5, characterized in that the intersection relation determining module comprises:
a validity checking unit, configured to perform a validity check on the restored filename expression;
an intersection judging unit, configured to determine, according to the result of the validity check and the following rules, the intersection relation between the pair of filenames that generated the filename expression:
if the restored filename expression fails the validity check, determining that the intersection of the pair of filenames that generated the filename expression is empty;
if the restored filename expression passes the validity check and is identical to one filename of the pair that generated it, determining that the relation between the pair of filenames is a containment relation;
if the restored filename expression passes the validity check and differs from both filenames of the pair that generated it, determining that the relation between the pair of filenames is an overlapping relation.
7. The apparatus according to claim 6, characterized in that the validity checking unit comprises:
a comparing subunit, configured to compare the number of filename segments, the length excluding the wildcard part, and the length of each filename segment in the restored filename expression with predetermined restriction rules;
a determining subunit, configured to determine that the validity check is passed when the number of filename segments, the length excluding the wildcard part, and the length of each filename segment in the restored filename expression all satisfy the restriction rules.
8. The apparatus according to any one of claims 5 to 7, characterized in that the topology computing module is specifically configured to perform the following operations on each pair of filenames obtained by the pairing:
building the equivalent topological directed graph of each of the two filenames in the pair;
merging the equivalent topological directed graphs of the two filenames, obtaining all topological sequences of the merged directed graph, adding weights, and selecting the valid sequences;
merging adjacent nodes of the selected topological sequences until no further merging is possible, to obtain one or more merged candidate topological sequences of the intersection.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510437027.1A CN104991963B (en) 2015-07-23 2015-07-23 Document handling method and device

Publications (2)

Publication Number Publication Date
CN104991963A CN104991963A (en) 2015-10-21
CN104991963B true CN104991963B (en) 2018-09-25

Family

ID=54303778

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510437027.1A Active CN104991963B (en) 2015-07-23 2015-07-23 Document handling method and device

Country Status (1)

Country Link
CN (1) CN104991963B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110109672B (en) * 2019-04-17 2023-01-10 奇安信科技集团股份有限公司 Analysis processing method and device for expression
JP7073320B2 (en) * 2019-09-18 2022-05-23 本田技研工業株式会社 Document contrast system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102117324A (en) * 2011-02-24 2011-07-06 上海北大方正科技电脑系统有限公司 File management method and management system applying fuzzy matrice
CN102693302A (en) * 2012-05-21 2012-09-26 浙江省公众信息产业有限公司 Quick file comparison method, system and client side

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5689361B2 (en) * 2011-05-20 2015-03-25 インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation Method, program, and system for converting a part of graph data into a data structure that is an image of a homomorphic map

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant