CN104991963B - Document handling method and device - Google Patents
Document handling method and device
- Publication number: CN104991963B (CN201510437027.1A)
- Authority: CN (China)
- Prior art keywords: filename, expression, file, list, pair
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/11—File system administration, e.g. details of archiving or snapshots
- G06F16/122—File system administration, e.g. details of archiving or snapshots using management policies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
Abstract
The present invention provides a document handling method and device. The method includes: obtaining the file lists of two groups of files that need to be parsed and compared; pairing, two by two, the wildcard-containing filenames in the file lists of the two groups of files; performing an intersection operation based on directed topological sequences on each filename pair to obtain candidate topological sequences of the intersection; restoring the candidate topological sequences to wildcard-containing filename expressions and determining the intersection relationship between the filename pair that generated each filename expression; summarizing and merging the intersection relationships determined for each filename pair to determine the intersection relationship between the file lists of the two groups of files; and, according to the determined intersection relationship between the file lists, processing the two groups of files in response to an operation instruction input by the user. The invention parses wildcard-containing file lists quickly and accurately, thereby effectively improving the efficiency and reliability of file management.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a document handling method and device.
Background
File backup is one of the important means of backing up host data. It is mainly carried out by executing dedicated backup jobs (which can be understood as programs or tasks), where the range of files covered by each backup job is mainly determined by the file backup list that the user sets in the backup policy. However, owing to the characteristics of the host file system, a backup policy supports not only specific filenames but also wildcards of several levels and types in the file list, which denote sets of files that share a naming pattern. In addition, the file list in a backup policy is divided into an "include" part and an "exclude" part; the host performs a set calculation on these two parts to determine the final range of files to back up.
Support for multiple wildcards in host file lists simplifies the expression of a file list and effectively reduces the number of entries in a backup policy. However, the abstractness and complexity of wildcards also make file lists harder to parse, so that splitting, combining, and changing backup policies become difficult. At present the industry has no effective solution for parsing host file lists that contain wildcards. Either professionals analyze the lists manually, relying on experience, or the system expands the wildcard-containing filenames and enumerates them into a detailed file list before further processing. However, a host file system holds a huge number of files with intricate names, and file lists change frequently as business processing requires. The first method needs human participation throughout, cannot be automated, and lacks reliability. The second method operates on the concrete file list, so it consumes a large amount of system resources, and its output has so many entries that it is inconvenient in practice.
At present, wildcard-containing file lists are used very widely in host file management: batch backup, recovery, and deletion of files, for example, are all mainly based on such lists. The speed with which wildcard-containing file lists are parsed and processed therefore bears directly on the efficiency of host file management, and the reliability of the parsing result directly affects the safety and integrity of host data.
No effective solution has yet been proposed for parsing wildcard-containing file lists quickly and accurately.
Summary of the invention
An embodiment of the present invention provides a document handling method that parses wildcard-containing file lists quickly and accurately, thereby effectively improving the efficiency and reliability of file management. The method includes:
Obtaining the file lists of two groups of files that need to be parsed and compared;
Pairing, two by two, the wildcard-containing filenames in the file lists of the two groups of files;
Performing an intersection operation based on directed topological sequences on each filename pair to obtain candidate topological sequences of the intersection;
Restoring the candidate topological sequences to wildcard-containing filename expressions and, according to the restoration result, determining the intersection relationship between the filename pair that generated each filename expression;
Summarizing and merging the intersection relationships determined for each filename pair to determine the intersection relationship between the file lists of the two groups of files;
According to the determined intersection relationship between the file lists of the two groups of files, processing the two groups of files in response to an operation instruction input by the user.
In one embodiment, determining, according to the restoration result, the intersection relationship between the filename pair that generated the filename expression includes:
Performing a validity check on the restored filename expression;
According to the result of the validity check, determining the intersection relationship between the filename pair that generated the filename expression by the following rules:
If the restored filename expression fails the validity check, the intersection of the filename pair that generated it is determined to be empty;
If the restored filename expression passes the validity check and is identical to one filename of the pair that generated it, the relationship between the two filenames of the pair is determined to be one of containing and being contained;
If the restored filename expression passes the validity check and differs from both filenames of the pair that generated it, the relationship between the two filenames of the pair is determined to be an intersection (partial overlap).
In one embodiment, performing a validity check on the restored filename expression includes:
Comparing the number of filename segments in the restored filename expression, its length with the wildcard parts removed, and the length of each filename segment against predetermined restriction rules;
If all of these satisfy the restriction rules, the expression is determined to pass the validity check.
In one embodiment, performing the intersection operation based on directed topological sequences on the paired filenames to obtain the candidate topological sequences of the intersection includes:
Performing the following operations on each filename pair:
Building an equivalent topological directed graph for each of the two filenames in the pair;
Merging the equivalent topological directed graphs of the two filenames, obtaining all topological sequences of the merged directed graph, adding weights, and selecting the valid sequences;
Merging adjacent nodes in the selected topological sequences until no further merging is possible, to obtain one or more candidate topological sequences of the intersection.
An embodiment of the present invention also provides a document handling device that parses wildcard-containing file lists quickly and accurately, thereby effectively improving the efficiency and reliability of file management. The device includes:
An acquisition module, for obtaining the file lists of two groups of files that need to be parsed and compared;
A matching module, for pairing, two by two, the wildcard-containing filenames in the file lists of the two groups of files;
A topology computing module, for performing an intersection operation based on directed topological sequences on each filename pair to obtain candidate topological sequences of the intersection;
An intersection relationship determining module, for restoring the candidate topological sequences to wildcard-containing filename expressions and, according to the restoration result, determining the intersection relationship between the filename pair that generated each filename expression;
A merging module, for summarizing and merging the intersection relationships determined for each filename pair to determine the intersection relationship between the file lists of the two groups of files;
A processing module, for processing the two groups of files, according to the determined intersection relationship between their file lists, in response to an operation instruction input by the user.
In one embodiment, the intersection relationship determining module includes:
A validity check unit, for performing a validity check on the restored filename expression;
An intersection judging unit, for determining, according to the result of the validity check, the intersection relationship between the filename pair that generated the filename expression by the following rules:
If the restored filename expression fails the validity check, the intersection of the filename pair that generated it is determined to be empty;
If the restored filename expression passes the validity check and is identical to one filename of the pair that generated it, the relationship between the two filenames of the pair is determined to be one of containing and being contained;
If the restored filename expression passes the validity check and differs from both filenames of the pair that generated it, the relationship between the two filenames of the pair is determined to be an intersection (partial overlap).
In one embodiment, the validity check unit includes:
A comparing subunit, for comparing the number of filename segments in the restored filename expression, its length with the wildcard parts removed, and the length of each filename segment against predetermined restriction rules;
A determining subunit, for determining that the validity check is passed when the number of filename segments in the restored filename expression, its length with the wildcard parts removed, and the length of each filename segment all satisfy the restriction rules.
In one embodiment, the topology computing module is specifically configured to perform the following operations on each filename pair:
Building an equivalent topological directed graph for each of the two filenames in the pair;
Merging the equivalent topological directed graphs of the two filenames, obtaining all topological sequences of the merged directed graph, adding weights, and selecting the valid sequences;
Merging adjacent nodes in the selected topological sequences until no further merging is possible, to obtain one or more candidate topological sequences of the intersection.
In the above embodiments, the file lists of two groups of files are extracted, the wildcard-containing entries are paired, and an intersection operation is performed on each pair; the results of these entry-level intersection calculations are then summarized and consolidated to obtain the mutual inclusion relationship between the file lists of the two groups of files and the content of their intersection. Further, a string intersection algorithm based on topological directed graphs is introduced, which analyzes the various wildcards in a file list precisely. This solves the technical problem in the prior art that wildcard-containing file lists are difficult to parse quickly and accurately, and achieves the technical effect of effectively improving the efficiency and reliability of file management.
Description of the drawings
The drawings described here provide a further understanding of the present invention and form part of this application; they do not limit the invention. In the drawings:
Fig. 1 is a flowchart of a document handling method according to an embodiment of the present invention;
Fig. 2 is another flowchart of the document handling method according to an embodiment of the present invention;
Fig. 3 is a further flowchart of the document handling method according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of a document handling device according to an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments and the drawings. The exemplary embodiments and their descriptions serve to explain the present invention and do not limit it.
This example provides a document handling method to solve the prior-art problem that wildcard-containing file lists are difficult to parse accurately in host file management operations. As shown in Fig. 1, the method includes:
Step 101: Obtain the file lists of two groups of files that need to be parsed and compared;
The input to be parsed and compared may be two or more groups of jobs or files, supplied by the user, that contain wildcard file lists. In a specific implementation, when the input is a job, the file list can be read from the part of the job that defines its file operations; when the input is a file that contains only a file list, it can be read directly. Once the file lists have been read, parsing can begin. In practice, two groups are usually selected for the subsequent operations; that is, the groups can be processed two at a time.
Specifically, a complete file list extraction mechanism can be set up in advance. During execution, the type of the file or job input by the user is then determined automatically, the operation matching that input type is applied, and the wildcard-containing part of the file list is correctly extracted from the input file or job.
Step 102: Pair, two by two, the wildcard-containing filenames in the file lists of the two groups of files;
That is, the wildcard-containing filenames are first obtained from each list of the two groups of files to be compared, and the wildcard-containing filenames of the two groups are then paired two by two, for example according to similarity.
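The pairing step can be sketched as follows. This is a minimal illustration, not the patented procedure: it assumes that an entry counts as wildcard-containing whenever it includes the character `*`, and it pairs every such entry of one list with every such entry of the other instead of pairing by similarity. The function name `pair_wildcard_names` is hypothetical.

```python
from itertools import product


def pair_wildcard_names(list_a, list_b):
    """Pair every wildcard entry of list_a with every wildcard entry of list_b.

    Assumption: an entry is "wildcard-containing" if it includes '*'.
    """
    wild_a = [name for name in list_a if "*" in name]
    wild_b = [name for name in list_b if "*" in name]
    # Exhaustive cross pairing; a similarity-based pairing would prune this set.
    return list(product(wild_a, wild_b))


pairs = pair_wildcard_names(["AB.**", "X.Y"], ["AB.CD.**", "Q.**"])
```

Exhaustive pairing is the simplest choice that cannot miss an overlapping pair; its cost grows with the product of the two list sizes.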
Step 103: Perform an intersection operation based on directed topological sequences on each filename pair to obtain candidate topological sequences of the intersection;
Specifically, an equivalent topological directed graph is built for each of the filenames to be compared, and the graphs are merged. All topological sequences of the merged directed graph are found, weights are added, and the valid sequences are selected. Adjacent nodes in the selected topological sequences are then merged until no further merging is possible, which yields one or more candidate topological sequences.
Step 104: Restore the candidate topological sequences to wildcard-containing filename expressions and, according to the restoration result, determine the intersection relationship between the filename pair that generated each filename expression;
Specifically, after the candidate topological sequences have been restored to wildcard-containing filename expressions, and before the intersection relationship between the generating filename pair is determined, the restored expressions can first be screened for validity and duplicate expressions eliminated; the mutual intersection relationship is determined afterwards.
The principle of this step can be as follows. The topological sequences obtained after node merging are restored to expressions; the expressions are screened for validity according to the naming rules of host files, and duplicate expressions are eliminated. Once this is done, the relationship between the file sets referred to by the two filenames is known. That is, the relationship between the two sets is determined to be one of: contains, is contained, unrelated, or intersects, and the expression describing the relationship between the file sets is obtained.
For example: if the calculation yields no candidate topological sequence, or the filename expression restored from the candidate topological sequences fails the validity screening, the intersection of the two filenames can be determined to be empty. If the restored filename expression is identical to one of the two input filenames, the relationship between the two filenames can be determined to be one of containing and being contained. If, after screening and duplicate elimination, an expression is obtained that differs from both input filenames, that expression can be taken as the intersection; it corresponds to the overlapping part of the file ranges referred to by the two filenames.
Step 105: Summarize and merge the intersection relationships determined for each filename pair to determine the intersection relationship between the file lists of the two groups of files;
If no pair of filenames from the two file lists has an intersection, the two lists do not overlap. If every expression in file list A is a subset of file list B, file list A can be determined to be a subset of file list B.
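The aggregation rule can be sketched as below. This is only an illustration under stated assumptions: the pairwise relationship is supplied by a caller-provided function `relation(a, b)` returning one of the hypothetical labels `"disjoint"`, `"subset"`, `"superset"`, `"equal"`, or `"overlap"` (describing `a` relative to `b`), and an entry of list A is treated as covered by list B only when it is a subset of a single entry of B, which is a sufficient but not a necessary condition.

```python
def summarize_lists(list_a, list_b, relation):
    """Combine pairwise relationships into one relationship between two lists."""
    results = {(a, b): relation(a, b) for a in list_a for b in list_b}

    # No pair overlaps at all: the two lists are disjoint.
    if all(r == "disjoint" for r in results.values()):
        return "disjoint"

    # Every entry of A lies inside some entry of B (and vice versa).
    a_in_b = all(
        any(results[(a, b)] in ("subset", "equal") for b in list_b) for a in list_a
    )
    b_in_a = all(
        any(results[(a, b)] in ("superset", "equal") for a in list_a) for b in list_b
    )

    if a_in_b and b_in_a:
        return "equal"
    if a_in_b:
        return "A is a subset of B"
    if b_in_a:
        return "B is a subset of A"
    return "overlap"


# Toy pairwise relation backed by a lookup table (illustrative data only).
table = {("AB.CD.**", "AB.**"): "subset", ("AB.EF", "AB.**"): "subset"}
toy_relation = lambda a, b: table.get((a, b), "disjoint")
summary = summarize_lists(["AB.CD.**", "AB.EF"], ["AB.**"], toy_relation)
```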
Step 106: According to the determined intersection relationship between the file lists of the two groups of files, process the two groups of files in response to an operation instruction input by the user.
The operation selected by the user is applied to the file lists. If the user selects a merge operation, the file lists are merged according to the result of the intersection calculation. If the user selects a split operation, the file lists are split according to the result of the intersection calculation. If the user selects a compare operation, the result of the intersection calculation is output directly. That is, on the basis of the intersection analysis of the file lists, the operation the user requires (for example merge, split, or compare) is applied to the file lists, and the processing result is output afterwards.
In the above embodiments, the file lists of two groups of files are extracted, the wildcard-containing entries are paired, and an intersection operation is performed on each pair; the results of these entry-level intersection calculations are then summarized and consolidated to obtain the mutual inclusion relationship between the file lists of the two groups of files and the content of their intersection. Further, a string intersection algorithm based on topological directed graphs is introduced, which analyzes the various wildcards in a file list precisely. This solves the technical problem in the prior art that wildcard-containing file lists are difficult to parse quickly and accurately, and achieves the technical effect of effectively improving the efficiency and reliability of file management.
The document handling method is illustrated below with a specific embodiment. As shown in Fig. 2, it includes the following steps:
Step 201: The user selects an operation type and specifies two or more groups of jobs to be parsed and compared, or input files that contain file lists;
Step 202: After the user's input is received, the selected operation type is recorded, and each group of jobs or files input by the user is parsed in turn: the content of the file list is extracted and checked for wildcards. If the file list contains wildcard content, step 203 is executed; otherwise step 208 is executed.
Step 203: According to the file lists extracted from the jobs, the filenames in the different groups' file lists are paired two by two to form filename pairs.
Step 204: An intersection operation based on directed topological sequences is performed on each generated filename pair to obtain candidate topological sequences of the intersection.
Take the host filenames AB.**.CD.** and AB.CD.**.EF as an example, where "." is the separator and ** is a wildcard that stands for any number of segments containing any number of characters. To calculate the intersection, each filename is first converted into a directed graph with edge weights, as follows:
Sequence 1: first node-(0)-AB-(1)-CD-(1)-end node
Sequence 2: first node-(0)-AB.CD-(1)-EF-(0)-end node
Here the number in parentheses on each edge is the edge weight; where a wildcard is present, the edge weight is 1.
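The conversion of a filename into such a weighted sequence can be sketched as follows. This is a minimal reading of the two example sequences, not the patent's full graph construction: it assumes that `.` is the separator, that `**` is the only wildcard, and that consecutive literal segments collapse into one node. The function name `to_weighted_sequence` and the `(nodes, weights)` representation are hypothetical.

```python
def to_weighted_sequence(name, separator=".", wildcard="**"):
    """Convert a filename into (nodes, weights).

    nodes   -- literal runs, e.g. ["AB.CD", "EF"]
    weights -- weights[i] is the edge entering nodes[i]; the last element is
               the edge entering the end node. A weight of 1 marks a wildcard.
    """
    nodes, weights = [], []
    run = []   # literal segments collected for the current node
    edge = 0   # weight of the edge entering the current node
    for segment in name.split(separator):
        if segment == wildcard:
            if run:
                # Close the current literal run as a node.
                nodes.append(separator.join(run))
                weights.append(edge)
                run = []
            edge = 1  # the next edge crosses a wildcard
        else:
            run.append(segment)
    if run:
        nodes.append(separator.join(run))
        weights.append(edge)
        edge = 0  # no trailing wildcard before the end node
    weights.append(edge)
    return nodes, weights


sequence_1 = to_weighted_sequence("AB.**.CD.**")
sequence_2 = to_weighted_sequence("AB.CD.**.EF")
```

With these inputs the function reproduces Sequence 1 and Sequence 2 above: nodes `AB`, `CD` with weights 0, 1, 1, and nodes `AB.CD`, `EF` with weights 0, 1, 0.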
Then the two sequences are merged, giving a topological sequence without edge weights:
first node --- AB.CD --- AB --- CD --- EF --- end node
Specifically, adjacent nodes that belong to different sequences are merged and edge weights are added. The AB.CD node and the AB node merge into AB.CD, and the edge weight after merging is still 1. CD and EF have no intersection and cannot be merged, so their original edge weights are retained.
The intersection sequence finally calculated is:
first node-(0)-AB.CD-(1)-CD-(1)-EF-(0)-end node
Step 205: The calculated candidate topological sequences of the intersection are restored to wildcard-containing filename expressions; for example, the intersection sequence above is restored to the expression AB.CD.**.CD.**.EF.
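Restoring a weighted sequence to an expression can be sketched as below. The representation is an assumption made for illustration: a list of node strings plus a list of edge weights that has one more element than the node list, where a weight of 1 stands for a `**` wildcard on that edge. The function name `to_expression` is hypothetical.

```python
def to_expression(nodes, weights, separator=".", wildcard="**"):
    """Rebuild a wildcard filename expression from nodes and edge weights."""
    if len(weights) != len(nodes) + 1:
        raise ValueError("expected exactly one more edge weight than nodes")
    parts = []
    for node, weight in zip(nodes, weights):
        if weight:
            parts.append(wildcard)  # edge entering this node crosses a wildcard
        parts.append(node)
    if weights[-1]:
        parts.append(wildcard)      # edge entering the end node
    return separator.join(parts)


# first node-(0)-AB.CD-(1)-CD-(1)-EF-(0)-end node
expression = to_expression(["AB.CD", "CD", "EF"], [0, 1, 1, 0])
```

For the intersection sequence of the example, this yields the expression AB.CD.**.CD.**.EF given in step 205.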
Step 206: The restored expressions are checked and screened for validity and duplicate expressions are removed, giving the expression of the filename intersection; the relationship between each input filename pair is then judged from the intersection expression.
Specifically, the validity screening can be based on the naming restrictions of the file system, for example: whether the length of the intersection filename expression with the wildcard parts removed exceeds the limit, whether the number of filename segments exceeds the permitted number, and whether the length of any segment exceeds the file system limit.
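A validity check of this kind might look like the sketch below. The three limits are caller-supplied parameters because the source does not state concrete values; the numbers in the usage lines are placeholders, not limits of any particular file system. The sketch also assumes that only literal (non-wildcard) segments are counted and that the length "with wildcards removed" includes the separators between the remaining literal segments. The function name `is_valid_expression` is hypothetical.

```python
def is_valid_expression(expr, max_segments, max_literal_length,
                        max_segment_length, separator=".", wildcard="**"):
    """Check a restored expression against naming limits."""
    segments = expr.split(separator)
    if any(segment == "" for segment in segments):
        return False  # empty expression or empty segment
    literals = [segment for segment in segments if segment != wildcard]
    if len(literals) > max_segments:
        return False  # too many filename segments
    if len(separator.join(literals)) > max_literal_length:
        return False  # too long once the wildcard parts are removed
    # Every literal segment must respect the per-segment limit.
    return all(len(segment) <= max_segment_length for segment in literals)


# Placeholder limits chosen only for illustration.
ok = is_valid_expression("AB.CD.**.CD.**.EF", max_segments=10,
                         max_literal_length=40, max_segment_length=8)
too_long = is_valid_expression("ABCDEFGHI.**", max_segments=10,
                               max_literal_length=40, max_segment_length=8)
```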
If the intersection expression is empty, the files referred to by the two input filename expressions do not overlap. If the intersection expression is identical to one of the two original filename expressions, the files referred to by that expression are a subset of those referred to by the other expression. Otherwise, there is no inclusion relationship between the two filenames.
Step 207: The intersection relationships obtained for each filename pair are summarized and merged to obtain the intersection relationship between the groups' file lists. If no pair of filenames from the lists has an intersection, the two lists do not overlap; if every expression in file list A is a subset of file list B, file list A is also a subset of file list B.
Step 208: Because the file lists contain no wildcards, the filenames in each group's file list only need to be compared one by one; the identical filenames form the intersection of the lists.
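For the wildcard-free case, the comparison reduces to an ordinary set intersection. A minimal sketch, with the hypothetical helper name `plain_intersection`, that keeps the order of the first list and drops repeated entries:

```python
def plain_intersection(list_a, list_b):
    """Return the filenames present in both lists, in list_a order, without repeats."""
    names_b = set(list_b)
    seen = set()
    common = []
    for name in list_a:
        if name in names_b and name not in seen:
            seen.add(name)
            common.append(name)
    return common


common_names = plain_intersection(["AB.CD", "AB.EF", "AB.CD"],
                                  ["AB.EF", "AB.CD", "X.Y"])
```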
Step 209: The operation selected by the user is applied to the file lists: if the user selects a merge operation, step 210 is executed; if the user selects a split operation, step 211 is executed; if the user selects a compare operation, step 212 is executed.
Step 210: According to the result of the intersection calculation, the file lists are merged.
Step 211: According to the result of the intersection calculation, the file lists are split.
Step 212: The result of the intersection calculation for the file lists is output.
Fig. 3 shows steps 204 to 207 in more detail. That is, the intersection operation based on directed topological sequences may include the following steps:
Step 301: Build an equivalent topological directed graph for each of the input filenames NAMEA and NAMEB, and merge the graphs;
Step 302: Compute the merged topological directed graph, generate all possible topological sequences, and add edge weights to the topological sequences;
Step 303: Repeatedly merge nodes in each topological sequence until no further merging is possible;
Step 304: Screen the merged topological sequences to obtain the candidate intersection topological sequences;
Step 305: Restore the candidate intersection topological sequences to wildcard-containing filename expressions;
Step 306: Eliminate duplicate filename expressions and screen for validity to obtain the expression EXP that corresponds to the intersection;
Step 307: Judge the relationship between NAMEA and NAMEB according to EXP. If EXP is empty, the file ranges represented by NAMEA and NAMEB do not overlap and there is no intersection. If EXP is identical to NAMEA or NAMEB, the relationship between NAMEA and NAMEB is one of containing and being contained. If EXP differs from both NAMEA and NAMEB, NAMEA and NAMEB partly overlap, and their intersection is the expression EXP.
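The decision in step 307 can be sketched as a small function. The returned labels are hypothetical, and the `"equal"` case (EXP identical to both names) is an addition for completeness that the text does not spell out. When EXP equals one name, the sketch reports that name as the contained one, since its file set then coincides with the intersection.

```python
def classify_pair(exp, name_a, name_b):
    """Classify NAMEA against NAMEB from the intersection expression EXP."""
    if not exp:
        return "disjoint"      # EXP is empty: no common files
    if exp == name_a and exp == name_b:
        return "equal"         # assumption: identical names, not covered in the text
    if exp == name_a:
        return "A within B"    # the intersection is all of NAMEA
    if exp == name_b:
        return "B within A"    # the intersection is all of NAMEB
    return "overlap"           # partial overlap, described by EXP


relation = classify_pair("AB.CD.**.CD.**.EF", "AB.**.CD.**", "AB.CD.**.EF")
```

For the worked example, EXP differs from both input names, so the pair is classified as a partial overlap.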
The document handling method of the above embodiments solves the problem of parsing wildcard-containing file lists. It provides a scheme for quickly parsing the wildcard-containing file lists used in file backup, recovery, copy, and delete operations, and it enables such lists to be compared, merged, and split without manual intervention. First, it parses wildcard-containing file lists accurately: because such lists are abstract, manual parsing can only estimate from experience and cannot be fully accurate, whereas the introduced intersection algorithm based on directed topological sequences solves the problem of parsing them precisely. Second, it increases the parsing speed for wildcard-containing file lists and reduces resource overhead; compared with the earlier manual processing, parsing time is shortened by 80% or more. Further, the parsing result is highly general and can be used directly, without further processing, in host file list management operations. Because the file list intersection produced by the scheme is given as expressions, wildcard-containing file lists can be merged, split, and compared very conveniently on the basis of this result.
Based on the same inventive concept, an embodiment of the present invention also provides a document handling device, as described in the following embodiments. Because the principle by which the device solves the problem is similar to that of the document handling method, the implementation of the device may refer to the implementation of the method, and repeated details are omitted. As used below, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated. Fig. 4 is a structural block diagram of a document handling device according to an embodiment of the present invention. As shown in Fig. 4, it includes an acquisition module 401, a matching module 402, a topology computing module 403, an intersection relationship determining module 404, a merging module 405, and a processing module 406. The structure is described below.
Acquisition module 401, configured to obtain the file lists of the two groups of files to be parsed and compared;
Matching module 402, configured to pair, two by two, the filenames containing wildcards in the file lists of the two groups of files;
Topology computing module 403, configured to perform an intersection operation based on directed topological sequences on the paired filenames, to obtain candidate topological sequences of the intersection;
Intersection relationship determining module 404, configured to restore the candidate topological sequences to a filename expression containing wildcards and, according to the restoration result, determine the intersection relationship between the pair of filenames that generated the filename expression;
Merging module 405, configured to summarize and merge the intersection relationships determined for each pair of filenames, so as to determine the intersection relationship between the file lists of the two groups of files;
Processing module 406, configured to process the two groups of files in response to an operation instruction input by a user, according to the determined intersection relationship between the file lists of the two groups of files.
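As an illustration of the pairing performed by the matching module, the following is a minimal sketch only. It assumes that "pairing two by two" means combining every wildcard-containing entry of the first list with every wildcard-containing entry of the second, and that the wildcard characters are `*` and `?`; neither detail is stated in the text above, and the function names are hypothetical.

```python
from itertools import product

# Assumption: '*' and '?' are the wildcard characters; the text does not enumerate them.
WILDCARDS = "*?"


def has_wildcard(name: str) -> bool:
    """Return True if the filename contains at least one wildcard character."""
    return any(ch in WILDCARDS for ch in name)


def pair_wildcard_entries(list_a, list_b):
    """Pair each wildcard entry of list_a with each wildcard entry of list_b.

    Assumption: an all-pairs (Cartesian) pairing; entries without wildcards
    are left out because only wildcard filenames are paired.
    """
    wild_a = [name for name in list_a if has_wildcard(name)]
    wild_b = [name for name in list_b if has_wildcard(name)]
    return list(product(wild_a, wild_b))


pairs = pair_wildcard_entries(["a*.txt", "b.log"], ["*.txt", "a?.txt"])
```

Each resulting pair would then be handed to the intersection step, and the per-pair results merged into a relationship between the two lists.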
In one embodiment, the intersection relationship determining module 404 may include: a validity checking unit, configured to perform a validity check on the restored filename expression; and an intersection judging unit, configured to determine, according to the result of the validity check, the intersection relationship between the pair of filenames that generated the filename expression, according to the following rules:
1) if the restored filename expression fails the validity check, the intersection of the pair of filenames that generated the filename expression is determined to be empty;
2) if the restored filename expression passes the validity check and is identical to one filename of the pair that generated it, the pair of filenames that generated the filename expression is determined to be in an inclusion relationship (one contains the other);
3) if the restored filename expression passes the validity check and differs from both filenames of the pair that generated it, the pair of filenames that generated the filename expression is determined to be in an overlapping relationship.
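The three rules above can be sketched as a small classification function. This is an illustrative sketch, not the patented implementation; the names `Relation` and `classify_pair` are hypothetical, and the validity check result is assumed to be supplied by the caller as a boolean.

```python
from enum import Enum


class Relation(Enum):
    EMPTY = "empty intersection"      # rule 1
    INCLUSION = "inclusion"           # rule 2
    OVERLAP = "overlap"               # rule 3


def classify_pair(restored: str, pair: tuple, is_valid: bool) -> Relation:
    """Apply rules 1) to 3) to one restored filename expression.

    restored: the filename expression restored from the candidate sequences
    pair:     the two filenames that generated the expression
    is_valid: outcome of the validity check on the restored expression
    """
    if not is_valid:
        return Relation.EMPTY
    if restored in pair:
        # The restored expression equals one member of the pair,
        # so that member is contained in the other.
        return Relation.INCLUSION
    return Relation.OVERLAP


example = classify_pair("a*.txt", ("a*.txt", "*.txt"), is_valid=True)
```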
In one embodiment, the validity checking unit may include: a comparing subunit, configured to compare the number of filename segments in the restored filename expression, the length excluding the wildcard portion, and the length of each filename segment with a predetermined restriction rule; and a determining subunit, configured to determine that the validity check is passed when the number of filename segments in the restored filename expression, the length excluding the wildcard portion, and the length of each filename segment all satisfy the restriction rule.
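The validity check described above might look like the following sketch. Several details are assumptions not given in the text: segments are taken to be separated by `.`, the limit values in `Limits` are placeholders, and segment lengths are counted without wildcard characters. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Hypothetical restriction rule; the actual limits are not given in the text."""
    max_segments: int = 2          # e.g. "name.ext"
    max_literal_length: int = 64   # total length excluding wildcards
    max_segment_length: int = 32   # per-segment length excluding wildcards


def passes_validity_check(expr: str, limits: Limits = Limits(),
                          wildcards: str = "*?") -> bool:
    """Compare segment count, non-wildcard length, and per-segment length with the limits."""
    segments = expr.split(".")  # assumption: '.' separates filename segments
    # Count only literal (non-wildcard) characters in each segment.
    literal_lengths = [sum(ch not in wildcards for ch in seg) for seg in segments]
    return (
        len(segments) <= limits.max_segments
        and sum(literal_lengths) <= limits.max_literal_length
        and all(n <= limits.max_segment_length for n in literal_lengths)
    )


ok = passes_validity_check("report*.txt")
too_many_segments = passes_validity_check("a.b.c")
```

An expression passes only if all three quantities are within the limits, mirroring the "all satisfy the restriction rule" condition.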
In one embodiment, the topology computing module 403 may be configured to perform the following operations on each pair of filenames among the paired filenames: construct the equivalent topological directed graph of each of the two filenames in the pair; merge the equivalent topological directed graphs of the two filenames, obtain all topological sequences of the merged directed graph, add weights, and select the valid sequences; and merge adjacent nodes in the selected topological sequences until no further merging is possible, to obtain one or more merged candidate topological sequences of the intersection.
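The graph construction, weighting, and node-merging steps are not specified in enough detail here to reproduce. As a point of comparison only, the sketch below uses a different and simpler technique, a memoized recursion over the two patterns, to decide whether two wildcard filenames have any filename in common. It is not the topological-digraph algorithm of this embodiment. It assumes `*` matches any run of characters (including none) and `?` matches exactly one character; the function name is hypothetical.

```python
from functools import lru_cache


def patterns_intersect(p: str, q: str) -> bool:
    """Return True if at least one filename matches both wildcard patterns.

    Assumption: '*' matches zero or more characters, '?' exactly one.
    """

    @lru_cache(maxsize=None)
    def common(i: int, j: int) -> bool:
        # Both patterns fully consumed: the common filename is complete.
        if i == len(p) and j == len(q):
            return True
        # '*' in p either matches nothing, or absorbs what q[j] produces.
        if i < len(p) and p[i] == "*":
            return common(i + 1, j) or (j < len(q) and common(i, j + 1))
        # Symmetric case for '*' in q.
        if j < len(q) and q[j] == "*":
            return common(i, j + 1) or (i < len(p) and common(i + 1, j))
        # One pattern is exhausted while the other still needs a character.
        if i == len(p) or j == len(q):
            return False
        # Two single-character tokens must be compatible.
        if p[i] == "?" or q[j] == "?" or p[i] == q[j]:
            return common(i + 1, j + 1)
        return False

    return common(0, 0)


overlapping = patterns_intersect("a*.txt", "*b.txt")   # e.g. "ab.txt" matches both
disjoint = patterns_intersect("a*.txt", "*.log")
```

Unlike the embodiment, this check only answers whether the intersection is empty; it does not produce a wildcard expression describing the intersection.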
In another embodiment, software is further provided, which is used to execute the technical solutions described in the above embodiments and preferred implementations.
In another embodiment, a storage medium is further provided, in which the above software is stored. The storage medium includes, but is not limited to, an optical disc, a floppy disk, a hard disk, rewritable memory, and the like.
It can be seen from the above description that the embodiments of the present invention achieve the following technical effects. In the above embodiments, the file lists of two groups of files are extracted, the entries containing wildcards are paired two by two and their intersections are computed, and the per-entry intersection results are then summarized and consolidated, to obtain the mutual inclusion relationship between the file lists of the two groups of files and the content of their intersection. Furthermore, a string intersection algorithm based on topological directed graphs is introduced, which enables accurate analysis of the various wildcards in file lists. This solves the technical problem in the prior art that file lists containing wildcards are difficult to parse quickly and accurately, and achieves the technical effect of effectively improving the efficiency and reliability of file management.
Obviously, those skilled in the art should understand that the modules or steps of the above embodiments of the present invention may be implemented by a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that given here. Alternatively, they may each be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the embodiments of the present invention are not limited to any specific combination of hardware and software.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the present invention. For those skilled in the art, the embodiments of the present invention may have various modifications and variations. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (8)
1. A document handling method, characterized by comprising:
obtaining the file lists of two groups of files to be parsed and compared;
pairing, two by two, the filenames containing wildcards in the file lists of the two groups of files;
performing an intersection operation based on directed topological sequences on the paired filenames, to obtain candidate topological sequences of the intersection;
restoring the candidate topological sequences to a filename expression containing wildcards and, according to the restoration result, determining the intersection relationship between the pair of filenames that generated the filename expression;
summarizing and merging the intersection relationships determined for each pair of filenames, to determine the intersection relationship between the file lists of the two groups of files;
processing the two groups of files in response to an operation instruction input by a user, according to the determined intersection relationship between the file lists of the two groups of files.
2. The method according to claim 1, characterized in that determining, according to the restoration result, the intersection relationship between the pair of filenames that generated the filename expression comprises:
performing a validity check on the restored filename expression;
determining, according to the result of the validity check, the intersection relationship between the pair of filenames that generated the filename expression, according to the following rules:
if the restored filename expression fails the validity check, determining that the intersection of the pair of filenames that generated the filename expression is empty;
if the restored filename expression passes the validity check and is identical to one filename of the pair that generated it, determining that the pair of filenames that generated the filename expression is in an inclusion relationship (one contains the other);
if the restored filename expression passes the validity check and differs from both filenames of the pair that generated it, determining that the pair of filenames that generated the filename expression is in an overlapping relationship.
3. The method according to claim 2, characterized in that performing a validity check on the restored filename expression comprises:
comparing the number of filename segments in the restored filename expression, the length excluding the wildcard portion, and the length of each filename segment with a predetermined restriction rule;
if all of them satisfy the restriction rule, determining that the validity check is passed.
4. The method according to any one of claims 1 to 3, characterized in that performing an intersection operation based on directed topological sequences on the paired filenames, to obtain candidate topological sequences of the intersection, comprises:
performing the following operations on each pair of filenames among the paired filenames:
constructing the equivalent topological directed graph of each of the two filenames in the pair;
merging the equivalent topological directed graphs of the two filenames, obtaining all topological sequences of the merged directed graph, adding weights, and selecting the valid sequences;
merging adjacent nodes in the selected topological sequences until no further merging is possible, to obtain one or more merged candidate topological sequences of the intersection.
5. A document handling apparatus, characterized by comprising:
an acquisition module, configured to obtain the file lists of two groups of files to be parsed and compared;
a matching module, configured to pair, two by two, the filenames containing wildcards in the file lists of the two groups of files;
a topology computing module, configured to perform an intersection operation based on directed topological sequences on the paired filenames, to obtain candidate topological sequences of the intersection;
an intersection relationship determining module, configured to restore the candidate topological sequences to a filename expression containing wildcards and, according to the restoration result, determine the intersection relationship between the pair of filenames that generated the filename expression;
a merging module, configured to summarize and merge the intersection relationships determined for each pair of filenames, to determine the intersection relationship between the file lists of the two groups of files;
a processing module, configured to process the two groups of files in response to an operation instruction input by a user, according to the determined intersection relationship between the file lists of the two groups of files.
6. The apparatus according to claim 5, characterized in that the intersection relationship determining module comprises:
a validity checking unit, configured to perform a validity check on the restored filename expression;
an intersection judging unit, configured to determine, according to the result of the validity check, the intersection relationship between the pair of filenames that generated the filename expression, according to the following rules:
if the restored filename expression fails the validity check, determining that the intersection of the pair of filenames that generated the filename expression is empty;
if the restored filename expression passes the validity check and is identical to one filename of the pair that generated it, determining that the pair of filenames that generated the filename expression is in an inclusion relationship (one contains the other);
if the restored filename expression passes the validity check and differs from both filenames of the pair that generated it, determining that the pair of filenames that generated the filename expression is in an overlapping relationship.
7. The apparatus according to claim 6, characterized in that the validity checking unit comprises:
a comparing subunit, configured to compare the number of filename segments in the restored filename expression, the length excluding the wildcard portion, and the length of each filename segment with a predetermined restriction rule;
a determining subunit, configured to determine that the validity check is passed when the number of filename segments in the restored filename expression, the length excluding the wildcard portion, and the length of each filename segment all satisfy the restriction rule.
8. The apparatus according to any one of claims 5 to 7, characterized in that the topology computing module is specifically configured to perform the following operations on each pair of filenames among the paired filenames:
constructing the equivalent topological directed graph of each of the two filenames in the pair;
merging the equivalent topological directed graphs of the two filenames, obtaining all topological sequences of the merged directed graph, adding weights, and selecting the valid sequences;
merging adjacent nodes in the selected topological sequences until no further merging is possible, to obtain one or more merged candidate topological sequences of the intersection.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510437027.1A CN104991963B (en) | 2015-07-23 | 2015-07-23 | Document handling method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104991963A CN104991963A (en) | 2015-10-21 |
CN104991963B true CN104991963B (en) | 2018-09-25 |
Family
ID=54303778
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510437027.1A Active CN104991963B (en) | 2015-07-23 | 2015-07-23 | Document handling method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104991963B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110109672B (en) * | 2019-04-17 | 2023-01-10 | 奇安信科技集团股份有限公司 | Analysis processing method and device for expression |
JP7073320B2 (en) * | 2019-09-18 | 2022-05-23 | 本田技研工業株式会社 | Document contrast system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102117324A (en) * | 2011-02-24 | 2011-07-06 | 上海北大方正科技电脑系统有限公司 | File management method and management system applying fuzzy matrice |
CN102693302A (en) * | 2012-05-21 | 2012-09-26 | 浙江省公众信息产业有限公司 | Quick file comparison method, system and client side |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5689361B2 (en) * | 2011-05-20 | 2015-03-25 | インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation | Method, program, and system for converting a part of graph data into a data structure that is an image of a homomorphic map |
Also Published As
Publication number | Publication date |
---|---|
CN104991963A (en) | 2015-10-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||