CN104991963A - File processing method and file processing apparatus - Google Patents

File processing method and file processing apparatus

Publication number
CN104991963A
Authority
CN
China
Prior art keywords
filename
lists
expression formula
documents
files
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510437027.1A
Other languages
Chinese (zh)
Other versions
CN104991963B (en)
Inventor
鲁莽
孙艳
林子涯
韩方明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN201510437027.1A
Publication of CN104991963A
Application granted
Publication of CN104991963B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/11: File system administration, e.g. details of archiving or snapshots
    • G06F16/122: File system administration, e.g. details of archiving or snapshots using management policies
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/16: File or folder operations, e.g. details of user interfaces specifically adapted to file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a file processing method and a file processing apparatus. The method comprises: acquiring the file lists of two sets of files that need to be parsed and compared; pairing the wildcard-containing file names in the two file lists; performing an intersection operation based on directed topological sequences on each file name pair to obtain candidate topological sequences of the intersection; restoring each candidate topological sequence to a wildcard-containing file name expression and, according to the restoration result, determining the intersection relationship between the two file names of the pair that produced the expression; summarizing and merging the intersection relationships determined for each file name pair to determine the intersection relationship between the two file lists; and, according to the determined intersection relationship between the two file lists, processing the two sets of files in response to an operation instruction input by a user. By parsing wildcard-containing file lists quickly and accurately, the method improves the efficiency and reliability of file management.

Description

File processing method and apparatus
Technical field
The present invention relates to the technical field of data processing, and in particular to a file processing method and apparatus.
Background art
File backup is one of the important means of backing up host data. It is carried out mainly by executing dedicated backup jobs (which can be understood as programs or tasks), and the range of files covered by each backup job is determined mainly by the file backup list that the user sets in the backup policy. Owing to the characteristics of the host file system, a backup policy supports not only concrete file names but also wildcards of several levels and types in the file list, which denote sets of files whose names share a common pattern. In addition, the file list in a backup policy is divided into an "include" part and an "exclude" part; the host performs set calculations on the two parts to determine the range of files that finally need to be backed up.
Support for multiple wildcards in host file lists simplifies the expression of those lists and effectively reduces the number of entries in a backup policy. However, because wildcards are abstract and complex, they also make file lists harder to parse, so that splitting, combining and changing backup policies becomes difficult. At present the industry has no effective solution for parsing host file lists that contain wildcards. Either a specialist analyzes the list manually on the basis of experience, or the system expands the wildcard-containing file names into a detailed, enumerated file list that is then processed further. However, a host file system holds a very large number of files with intricate names, and file lists change fairly frequently as business processing requires. The first approach needs manual participation throughout, cannot be automated and lacks reliability. The second approach operates on concrete file lists, so it consumes a large amount of system resources, and its output has so many entries that it is inconvenient in practice.
Wildcard-containing file lists are now used very widely in host file management; for example, batch backup, recovery and deletion of files are mostly based on such lists. The speed at which wildcard-containing file lists are parsed and processed therefore bears directly on the efficiency of host file management, and the reliability of the parsing result directly affects the security and integrity of host data.
No effective solution has yet been proposed for parsing wildcard-containing file lists quickly and accurately.
Summary of the invention
An embodiment of the present invention provides a file processing method that parses wildcard-containing file lists quickly and accurately and thereby effectively improves the efficiency and reliability of file management. The method comprises:
Acquiring the file lists of two sets of files that need to be parsed and compared;
Pairing the wildcard-containing file names in the file lists of the two sets of files;
Performing an intersection operation based on directed topological sequences on each file name pair to obtain candidate topological sequences of the intersection;
Restoring each candidate topological sequence to a wildcard-containing file name expression and, according to the restoration result, determining the intersection relationship between the two file names of the pair that produced the expression;
Summarizing and merging the intersection relationships determined for each file name pair to determine the intersection relationship between the file lists of the two sets of files;
Processing the two sets of files, in response to an operation instruction input by the user, according to the determined intersection relationship between the two file lists.
In one embodiment, determining, according to the restoration result, the intersection relationship between the two file names of the pair that produced the expression comprises:
Performing a validity check on the restored file name expression;
Determining, according to the result of the validity check, the intersection relationship between the two file names of the pair by the following rules:
If the restored file name expression fails the validity check, the intersection of the file name pair that produced the expression is empty;
If the restored file name expression passes the validity check and is identical to one of the file names of the pair that produced it, the two file names of the pair are in a containing/contained relationship;
If the restored file name expression passes the validity check and differs from both file names of the pair that produced it, the two file names of the pair overlap.
In one embodiment, performing the validity check on the restored file name expression comprises:
Comparing the number of segments in the restored file name expression, its length excluding the wildcard parts, and the length of each segment against predetermined restriction rules;
If all of them satisfy the restriction rules, determining that the expression passes the validity check.
In one embodiment, performing the intersection operation based on directed topological sequences on each file name pair to obtain candidate topological sequences of the intersection comprises:
Performing the following operations on each file name pair:
Building an equivalent topological directed graph for each of the two file names of the pair;
Merging the equivalent topological directed graphs of the two file names, obtaining all topological sequences of the merged directed graph, adding weights and selecting the valid sequences;
Merging adjacent nodes in the selected topological sequences until no further merging is possible, to obtain one or more merged candidate topological sequences of the intersection.
An embodiment of the present invention also provides a file processing apparatus that parses wildcard-containing file lists quickly and accurately and thereby effectively improves the efficiency and reliability of file management. The apparatus comprises:
An acquisition module, configured to acquire the file lists of two sets of files that need to be parsed and compared;
A matching module, configured to pair the wildcard-containing file names in the file lists of the two sets of files;
A topology calculation module, configured to perform an intersection operation based on directed topological sequences on each file name pair to obtain candidate topological sequences of the intersection;
An intersection relationship determination module, configured to restore each candidate topological sequence to a wildcard-containing file name expression and, according to the restoration result, determine the intersection relationship between the two file names of the pair that produced the expression;
A merging module, configured to summarize and merge the intersection relationships determined for each file name pair to determine the intersection relationship between the file lists of the two sets of files;
A processing module, configured to process the two sets of files, in response to an operation instruction input by the user, according to the determined intersection relationship between the two file lists.
In one embodiment, the intersection relationship determination module comprises:
A validity check unit, configured to perform a validity check on the restored file name expression;
An intersection judgment unit, configured to determine, according to the result of the validity check, the intersection relationship between the two file names of the pair by the following rules:
If the restored file name expression fails the validity check, the intersection of the file name pair that produced the expression is empty;
If the restored file name expression passes the validity check and is identical to one of the file names of the pair that produced it, the two file names of the pair are in a containing/contained relationship;
If the restored file name expression passes the validity check and differs from both file names of the pair that produced it, the two file names of the pair overlap.
In one embodiment, the validity check unit comprises:
A comparison subunit, configured to compare the number of segments in the restored file name expression, its length excluding the wildcard parts, and the length of each segment against predetermined restriction rules;
A determination subunit, configured to determine that the expression passes the validity check when the number of segments, the length excluding the wildcard parts, and the length of each segment all satisfy the restriction rules.
In one embodiment, the topology calculation module is specifically configured to perform the following operations on each file name pair:
Building an equivalent topological directed graph for each of the two file names of the pair;
Merging the equivalent topological directed graphs of the two file names, obtaining all topological sequences of the merged directed graph, adding weights and selecting the valid sequences;
Merging adjacent nodes in the selected topological sequences until no further merging is possible, to obtain one or more merged candidate topological sequences of the intersection.
In the above embodiments, the file lists of two sets of files are extracted, the wildcard-containing entries are paired and intersected pair by pair, and the per-entry intersection results are then summarized and consolidated, yielding the mutual containment relationships between the two file lists and the content of their intersection. Furthermore, a string intersection algorithm based on topological directed graphs is introduced, which allows the various kinds of wildcards in file lists to be analyzed accurately. This solves the technical problem in the prior art that wildcard-containing file lists are difficult to parse quickly and accurately, and achieves the technical effect of effectively improving the efficiency and reliability of file management.
Brief description of the drawings
The drawings described here are provided for further understanding of the present invention and form part of this application; they do not limit the invention. In the drawings:
Fig. 1 is a flowchart of a file processing method according to an embodiment of the present invention;
Fig. 2 is another flowchart of the file processing method according to an embodiment of the present invention;
Fig. 3 is a further flowchart of the file processing method according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of a file processing apparatus according to an embodiment of the present invention.
Detailed description
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments and the drawings. The exemplary embodiments and their descriptions serve to explain the invention and do not limit it.
This example provides a file processing method that addresses the difficulty, in prior-art host file management, of accurately parsing wildcard-containing file lists. As shown in Fig. 1, the method comprises:
Step 101: acquire the file lists of two sets of files that need to be parsed and compared;
The input to be parsed and compared can be two or more groups of jobs, or of files containing file lists, entered by the user, where the file lists contain wildcards. In a specific implementation, when the input is a job, the file list can be read from the part of the job definition that concerns file operations; when the input is a file that contains only a file list, it can be read directly. Once the file lists have been read, parsing can begin. In practice, two groups are generally selected for the subsequent operations; that is, the groups can be processed two at a time.
Specifically, a complete file list extraction mechanism can be preset. During actual execution the type of each file or job entered by the user is then determined automatically, the operation matching the determined input type is applied, and the wildcard-containing file list content is correctly extracted from the input files and jobs.
Step 102: pair the wildcard-containing file names in the file lists of the two sets of files;
That is, the wildcard-containing file names are first obtained from each of the two groups to be compared, and the wildcard-containing file names of the two groups are then paired with each other; for example, they can be matched and paired according to similarity.
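A minimal Python sketch of the pairing step, under stated assumptions: the function names are hypothetical, "*" is assumed to be the only wildcard character, and every wildcard name of one list is paired with every wildcard name of the other (the similarity-based matching mentioned above is not attempted).

```python
from itertools import product


def has_wildcard(name: str) -> bool:
    # Assumption: a name is a wildcard name if it contains "*".
    return "*" in name


def pair_wildcard_names(list_a, list_b):
    """Pair every wildcard name in list_a with every wildcard name in list_b."""
    wild_a = [n for n in list_a if has_wildcard(n)]
    wild_b = [n for n in list_b if has_wildcard(n)]
    return list(product(wild_a, wild_b))
```

Pairing exhaustively keeps the sketch simple; a similarity filter could be applied to the resulting pairs to reduce the number of intersection calculations.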
Step 103: perform an intersection operation based on directed topological sequences on each file name pair to obtain candidate topological sequences of the intersection;
Specifically, an equivalent topological directed graph is built for each of the file names to be compared and the graphs are merged. All topological sequences of the merged directed graph are obtained, weights are added and the valid sequences are selected. Adjacent nodes in the selected topological sequences are then merged until no further merging is possible, which yields one or more candidate topological sequences.
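As a rough illustration of "all topological sequences of the merged directed graph": if the merged graph is assumed to be simply the two per-name chains sharing a first node and an end node (the text does not spell out how the graphs are merged), its topological orderings are exactly the interleavings of the two chains. The Python sketch below enumerates them; the function name and the (source, label) node tagging are hypothetical, and the weighting and selection that the step also performs are not shown.

```python
def interleavings(chain_a, chain_b):
    """Yield every ordering of the combined nodes that keeps each chain's own order."""
    if not chain_a:
        yield list(chain_b)
        return
    if not chain_b:
        yield list(chain_a)
        return
    # Either the next node comes from chain_a ...
    for rest in interleavings(chain_a[1:], chain_b):
        yield [chain_a[0]] + rest
    # ... or it comes from chain_b.
    for rest in interleavings(chain_a, chain_b[1:]):
        yield [chain_b[0]] + rest


# Nodes are tagged with their source chain so equal labels stay distinguishable.
chain_a = [("A", "AB"), ("A", "CD")]
chain_b = [("B", "AB.CD"), ("B", "EF")]
orders = list(interleavings(chain_a, chain_b))
```

The number of interleavings grows combinatorially with chain length, which is presumably why the method selects valid sequences before merging nodes.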
Step 104: restore each candidate topological sequence to a wildcard-containing file name expression and, according to the restoration result, determine the intersection relationship between the two file names of the pair that produced the expression;
Specifically, after the candidate topological sequences have been restored to wildcard-containing file name expressions, and before the intersection relationship of the file name pair is determined from the restoration result, the restored expressions can first be screened for validity and duplicate expressions eliminated; the relationship between the two names is determined afterwards.
This step can work as follows: the topological sequences obtained after node merging are restored to expressions; the expressions are screened for validity according to the naming rules for host files, and duplicates are eliminated. After that, the relationship between the file sets denoted by the two file names is known, namely one of containing, contained, unrelated, or intersecting, together with the expression that relates the file sets.
For example: if no candidate topological sequence remains after the calculation, or none of the file name expressions restored from the candidate topological sequences passes the validity screening, the intersection of the two file names (one from each file list) is empty. If a restored file name expression is identical to one of the two input file names, the two file names are in a containing/contained relationship. If, after screening and elimination of duplicates, the resulting expression differs from both input file names, that expression can be taken as the intersection; it denotes the overlapping part of the file ranges that the two file names refer to.
Step 105: summarize and merge the intersection relationships determined for each file name pair to determine the intersection relationship between the file lists of the two sets of files;
If no pair of file names from the two file lists has an intersection, the two lists do not overlap. If every expression in file list A is a subset of file list B, then file list A is a subset of file list B.
Step 106: process the two sets of files, in response to an operation instruction input by the user, according to the determined intersection relationship between the two file lists.
The operation chosen by the user is applied to the file lists: if the user chooses a merge operation, the file lists are merged according to the computed intersection; if the user chooses a split operation, the file lists are split according to the computed intersection; if the user chooses a compare operation, the computed intersection of the file lists is output directly. In other words, on the basis of the intersection analysis of the file lists, the operation the user requires (for example merging, splitting or comparing) is performed on the file lists and the result is output.
In the above embodiments, the file lists of two sets of files are extracted, the wildcard-containing entries are paired and intersected pair by pair, and the per-entry intersection results are then summarized and consolidated, yielding the mutual containment relationships between the two file lists and the content of their intersection. Furthermore, a string intersection algorithm based on topological directed graphs is introduced, which allows the various kinds of wildcards in file lists to be analyzed accurately. This solves the technical problem in the prior art that wildcard-containing file lists are difficult to parse quickly and accurately, and achieves the technical effect of effectively improving the efficiency and reliability of file management.
The file processing method is described below with reference to a specific embodiment. As shown in Fig. 2, it comprises the following steps:
Step 201: the user selects an operation type and specifies two or more groups of jobs, or input files containing file lists, to be parsed and compared;
Step 202: after the user's input is received, the selected operation type is recorded, and each group of jobs or files entered by the user is parsed in turn to extract the file list content and to check whether it contains wildcards. If the file lists contain wildcard entries, step 203 is performed; otherwise step 208 is performed.
Step 203: from the file lists extracted from the jobs, the file names in different groups of file lists are paired to form file name pairs.
Step 204: an intersection operation based on directed topological sequences is performed on each generated file name pair to obtain candidate topological sequences of the intersection.
Take the host file names AB.**.CD.** and AB.CD.**.EF as an example, where "." is the separator and ** is a wildcard that stands for any number of segments with any number of characters. To compute the intersection, each file name is first converted into a directed graph with edge weights, as follows:
Sequence 1: first node-(0)-AB-(1)-CD-(1)-end node
Sequence 2: first node-(0)-AB.CD-(1)-EF-(0)-end node
Here the number in parentheses on each edge is the edge weight; an edge that carries a wildcard has weight 1.
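The conversion shown in Sequence 1 and Sequence 2 can be sketched in Python as follows. This is an illustration under assumptions, not the patented implementation: the function name is hypothetical, only the ** wildcard is handled, and a sequence is represented as a list of node labels plus a list of edge weights (one more weight than nodes, covering the edges from the first node and to the end node).

```python
WILDCARD = "**"  # only this wildcard is handled in the sketch


def to_weighted_sequence(name: str):
    """Split a file name into literal nodes and 0/1 edge weights.

    weights[i] is the weight of the edge entering nodes[i];
    the last weight belongs to the edge entering the end node.
    """
    nodes, weights = [], []
    run = []      # literal segments collected for the current node
    pending = 0   # 1 if a wildcard was seen since the last node was closed
    for segment in name.split("."):
        if segment == WILDCARD:
            if run:  # close the node that precedes the wildcard
                nodes.append(".".join(run))
                weights.append(pending)
                run = []
            pending = 1
        else:
            run.append(segment)
    if run:  # close a trailing literal node
        nodes.append(".".join(run))
        weights.append(pending)
        pending = 0
    weights.append(pending)  # edge into the end node
    return nodes, weights
```

With this representation, AB.**.CD.** gives nodes AB and CD with weights 0, 1, 1, and AB.CD.**.EF gives nodes AB.CD and EF with weights 0, 1, 0, matching the two sequences above.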
The two sequences are then merged to obtain a topological sequence without edge weights:
First node---AB.CD---AB---CD---EF---end node
Specifically, adjacent nodes that belong to different sequences are merged and edge weights are added. Merging the AB.CD node with the AB node gives AB.CD, and the edge weight after merging remains 1. CD and EF have no intersection and cannot be merged, so their original edge weights are kept.
The intersection sequence finally obtained is:
First node-(0)-AB.CD-(1)-CD-(1)-EF-(0)-end node
Step 205: the computed candidate intersection topological sequence is restored to a wildcard-containing file name expression; for example, the intersection sequence above is restored to the expression AB.CD.**.CD.**.EF.
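Restoring an expression from a weighted sequence can be sketched as below. The function name and the list-based representation (node labels plus one more edge weight than there are nodes) are assumptions made for illustration; a weight of 1 is written out as a ** segment, as in the example of Step 205.

```python
def restore_expression(nodes, weights):
    """Rebuild a wildcard file name from node labels and 0/1 edge weights."""
    if len(weights) != len(nodes) + 1:
        raise ValueError("expected exactly one more weight than nodes")
    parts = []
    for node, weight in zip(nodes, weights):
        if weight:            # a wildcard sits on the edge entering this node
            parts.append("**")
        parts.append(node)
    if weights[-1]:           # a wildcard sits on the edge entering the end node
        parts.append("**")
    return ".".join(parts)


expression = restore_expression(["AB.CD", "CD", "EF"], [0, 1, 1, 0])
```

For the intersection sequence above, `expression` is "AB.CD.**.CD.**.EF".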
Step 206: the restored expressions are checked and screened for validity and duplicate expressions are removed, which gives the intersection expression of the file names; the relationship between each input file name pair is then judged from the intersection expression.
Specifically, the validity screening can be based on the naming restrictions of the file system, for example: whether the length of the intersection expression excluding its wildcard parts exceeds the limit, whether the number of segments in the file name exceeds the permitted number, and whether the length of any segment exceeds the file system limit.
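A validity check of this kind might look like the following Python sketch. The class and function names are hypothetical, and the default limits are placeholder values chosen for illustration only; the text does not state the actual limits of the host file system. Segments that contain a wildcard are excluded entirely before measuring, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamingLimits:
    # Placeholder limits, not taken from the source document.
    max_total_length: int = 44
    max_segments: int = 22
    max_segment_length: int = 8


def is_valid_expression(expr: str, limits: NamingLimits = NamingLimits()) -> bool:
    """Check segment count, per-segment length and non-wildcard length."""
    literal = [s for s in expr.split(".") if "*" not in s]
    if len(literal) > limits.max_segments:
        return False
    if any(len(s) > limits.max_segment_length for s in literal):
        return False
    # Length of the expression with the wildcard segments removed.
    return len(".".join(literal)) <= limits.max_total_length
```

Passing the limits as a parameter keeps the rules separate from the check, so the same function can be reused for file systems with different naming restrictions.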
If the intersection expression is empty, the files denoted by the two input file name expressions do not overlap. If the intersection expression is identical to one of the two original file name expressions, the files denoted by that expression are a subset of those denoted by the other. Otherwise, neither file name contains the other.
Step 207: the intersection relationships obtained for each file name pair are summarized and merged to obtain the intersection relationship between the groups of file lists. If no pair of file names from the two lists has an intersection, the two lists do not overlap; if every expression in file list A is a subset of file list B, then file list A is a subset of file list B.
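The list-level summary in Step 207 can be sketched as follows, assuming the pairwise results are already available in a dictionary. The function name and the relation labels ("disjoint", "a_in_b", "b_in_a", "equal", "overlap") are hypothetical. The sketch only recognises a subset when each entry is covered by a single entry of the other list; an entry covered only by the union of several entries is reported as an overlap.

```python
def summarize(names_a, names_b, relation):
    """Combine pairwise relations into one relation between two file lists.

    relation maps (a, b) to 'disjoint', 'a_in_b', 'b_in_a', 'equal' or 'overlap'.
    """
    pairs = [(a, b) for a in names_a for b in names_b]
    if all(relation[p] == "disjoint" for p in pairs):
        return "disjoint"
    # Every entry of A is contained in (or equal to) some entry of B.
    if all(any(relation[(a, b)] in ("a_in_b", "equal") for b in names_b)
           for a in names_a):
        return "a_subset_of_b"
    # Every entry of B is contained in (or equal to) some entry of A.
    if all(any(relation[(a, b)] in ("b_in_a", "equal") for a in names_a)
           for b in names_b):
        return "b_subset_of_a"
    return "overlap"
```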
Step 208: because the file lists contain no wildcards, the file names in the file lists of the groups only need to be compared one by one; identical file names form the intersection of the lists.
Step 209: the operation selected by the user is applied to the file lists: if the user selects the merge operation, step 210 is performed; if the user selects the split operation, step 211 is performed; if the user selects the compare operation, step 212 is performed.
Step 210: the file lists are merged according to the computed intersection of the file lists.
Step 211: the file lists are split according to the computed intersection of the file lists.
Step 212: the computed intersection of the file lists is output.
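For the wildcard-free case of Step 208, the comparison reduces to an ordinary set intersection. A short Python sketch (the function name is hypothetical) that keeps the order of the first list and drops duplicates:

```python
def plain_intersection(list_a, list_b):
    """Return the names that appear in both lists, in list_a order, without duplicates."""
    in_b = set(list_b)
    seen = set()
    common = []
    for name in list_a:
        if name in in_b and name not in seen:
            seen.add(name)
            common.append(name)
    return common
```

Using a set for the second list avoids comparing every name against every other name, which matters when the lists are long.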
Fig. 3 details steps 204 to 207 above; that is, the intersection operation based on directed topological sequences can comprise the following steps:
Step 301: build an equivalent topological directed graph for each of the input file names NAMEA and NAMEB, and merge the graphs;
Step 302: process the merged topological directed graph to generate all possible topological sequences, and add edge weights to the topological sequences;
Step 303: repeatedly merge nodes in each topological sequence until no further merging is possible;
Step 304: screen the merged topological sequences to obtain the candidate intersection topological sequences;
Step 305: restore the candidate intersection topological sequences to wildcard-containing file name expressions;
Step 306: eliminate duplicate file name expressions and screen for validity to obtain the expression EXP that corresponds to the intersection;
Step 307: judge the relationship between NAMEA and NAMEB from EXP: if EXP is empty, the file ranges denoted by NAMEA and NAMEB do not overlap and there is no intersection; if EXP is identical to NAMEA or NAMEB, NAMEA and NAMEB are in a containing/contained relationship; if EXP is identical to neither NAMEA nor NAMEB, NAMEA and NAMEB partially overlap and their intersection is given by the expression EXP.
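The decision in Step 307 can be sketched as a small Python function. The function name and the returned labels are hypothetical. The direction of containment is inferred from set reasoning rather than stated in the text: if the intersection expression equals NAMEA, every file matched by NAMEA is also matched by NAMEB.

```python
def classify(name_a: str, name_b: str, exp: str) -> str:
    """Classify the relation between two file names from their intersection expression."""
    if not exp:
        return "disjoint"   # no intersection
    if exp == name_a and exp == name_b:
        return "equal"
    if exp == name_a:
        return "a_in_b"     # NAMEA is contained in NAMEB
    if exp == name_b:
        return "b_in_a"     # NAMEB is contained in NAMEA
    return "overlap"        # partial overlap; exp describes the intersection
```

For the worked example, `classify("AB.**.CD.**", "AB.CD.**.EF", "AB.CD.**.CD.**.EF")` returns "overlap", since the intersection expression differs from both inputs.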
The file processing method of the above embodiments solves the problem of parsing wildcard-containing file lists. It provides a scheme for quickly parsing the wildcard-containing file lists used in file backup, recovery, copy and delete operations, and allows such lists to be compared, merged and split without manual intervention. First, wildcard-containing file lists are parsed accurately: because such lists are abstract, manual parsing can only estimate from experience and cannot be fully accurate, whereas the intersection algorithm based on directed topological sequences parses them exactly. Second, parsing is faster and uses fewer resources: compared with the earlier manual approach, parsing time is shortened by more than 80%. Furthermore, the parsing result is highly general and can be used directly in host file list management without further processing. Because the intersection of the file lists is given in the form of expressions, wildcard-containing file lists can easily be merged, split and compared on the basis of this result.
Based on the same inventive concept, an embodiment of the present invention further provides a file processing apparatus, as described in the following embodiments. Because the apparatus solves the problem on a principle similar to that of the file processing method, its implementation may refer to the implementation of the method, and repeated details are omitted. As used below, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated. Fig. 4 is a structural block diagram of a file processing apparatus according to an embodiment of the present invention. As shown in Fig. 4, the apparatus comprises an acquisition module 401, a matching module 402, a topology calculation module 403, an intersection relationship determination module 404, a merging module 405, and a processing module 406. The structure is described below.
Acquisition module 401, configured to obtain the file lists of two groups of files that need to be parsed and compared;
Matching module 402, configured to pair, two by two, the filenames containing wildcards in the file lists of the two groups of files;
Topology calculation module 403, configured to perform an intersection operation based on directed topological sequences on the paired filenames, to obtain candidate topological sequences of the intersection;
Intersection relationship determination module 404, configured to restore the candidate topological sequences to a filename expression containing wildcards and, according to the restored result, determine the intersection relationship between the pair of filenames that generated the filename expression;
Merging module 405, configured to summarize and merge the intersection relationships determined for each pair of filenames, so as to determine the intersection relationship between the file lists of the two groups of files;
Processing module 406, configured to process the two groups of files in response to an operation instruction input by a user, according to the determined intersection relationship between the file lists of the two groups of files.
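The pairing performed by the matching module can be sketched as below. This is a hedged illustration, not the patented implementation: it assumes that "two by two" means every wildcard filename of one list is paired with every wildcard filename of the other, and that the wildcard characters are `*` and `?`. The names `has_wildcard` and `pair_wildcard_names` are hypothetical.

```python
from itertools import product

# Assumption: these are the wildcard characters; the text does not enumerate them.
WILDCARD_CHARS = "*?"


def has_wildcard(name: str) -> bool:
    """Return True if the filename contains at least one wildcard character."""
    return any(ch in WILDCARD_CHARS for ch in name)


def pair_wildcard_names(list_a: list[str], list_b: list[str]) -> list[tuple[str, str]]:
    """Pair each wildcard filename in list_a with each wildcard filename in list_b."""
    wild_a = [name for name in list_a if has_wildcard(name)]
    wild_b = [name for name in list_b if has_wildcard(name)]
    # Cross product: every (a, b) combination is a candidate for the intersection step.
    return list(product(wild_a, wild_b))
```

Each resulting pair would then be handed to the topology calculation module; filenames without wildcards are left out of the pairing in this sketch.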
In one embodiment, the intersection relationship determination module 404 may comprise: a validity checking unit, configured to perform a validity check on the restored filename expression; and an intersection judging unit, configured to determine, according to the result of the validity check, the intersection relationship between the pair of filenames that generated the filename expression, according to the following rules:
1) if the restored filename expression fails the validity check, the intersection of the pair of filenames that generated the filename expression is determined to be empty;
2) if the restored filename expression passes the validity check and is identical to one filename of the pair that generated it, the relationship between the pair of filenames is determined to be one of containing and being contained;
3) if the restored filename expression passes the validity check and differs from both filenames of the pair that generated it, the relationship between the pair of filenames is determined to be an overlapping relationship.
In one embodiment, the validity checking unit may comprise: a comparing subunit, configured to compare the number of filename segments in the restored filename expression, its length excluding the wildcard part, and the length of each filename segment against predetermined restriction rules; and a determining subunit, configured to determine that the validity check is passed when the number of filename segments, the length excluding the wildcard part, and the length of each filename segment all satisfy the restriction rules.
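A validity check of this kind might look like the sketch below. Everything numeric here is a placeholder: the limits in `NameLimits`, the `.` separator, and the wildcard set `*?` are assumptions for illustration and are not taken from the patent, which only says the limits are predetermined. Segment lengths are measured without wildcard characters, which is also an assumption.

```python
from dataclasses import dataclass

WILDCARDS = "*?"  # assumed wildcard characters


@dataclass(frozen=True)
class NameLimits:
    """Hypothetical restriction rules; real values depend on the target file system."""
    max_segments: int = 5
    max_total_length: int = 44
    max_segment_length: int = 8


def literal_length(text: str) -> int:
    """Length of the text with wildcard characters removed."""
    return sum(1 for ch in text if ch not in WILDCARDS)


def passes_validity_check(expression: str,
                          limits: NameLimits = NameLimits(),
                          separator: str = ".") -> bool:
    """Compare segment count, wildcard-free length, and per-segment length to the limits."""
    if not expression:
        return False
    segments = expression.split(separator)
    if len(segments) > limits.max_segments:
        return False
    # Separators are counted in the total length here; that choice is an assumption.
    if literal_length(expression) > limits.max_total_length:
        return False
    return all(literal_length(seg) <= limits.max_segment_length for seg in segments)
```

An expression that fails this check would be treated as an empty intersection under rule 1 above.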
In one embodiment, the topology calculation module 403 may be configured to perform the following operations on each pair of filenames among the paired filenames: build the equivalent topological directed graph of each of the two filenames in the pair; merge the equivalent topological directed graphs of the two filenames, obtain all topological sequences of the merged directed graph, apply weights, and filter out the valid sequences; and merge adjacent nodes of the filtered topological sequences until no further merging is possible, to obtain one or more candidate topological sequences of the intersection after merging.
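The topological-sequence algorithm is not described here in enough detail to reproduce. As a rough cross-check only, the sketch below uses a different and simpler technique, memoized recursion over the two patterns, to test whether two wildcard filenames share at least one matching name. It assumes `*` matches any run of characters, including none, and `?` matches exactly one character. The function name `patterns_intersect` is hypothetical. Unlike the module described above, it only answers yes or no and does not produce the intersection expression.

```python
from functools import lru_cache


def patterns_intersect(a: str, b: str) -> bool:
    """Return True if some concrete name matches both wildcard patterns a and b."""

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> bool:
        # Both patterns fully consumed: a common name exists.
        if i == len(a) and j == len(b):
            return True
        if i < len(a) and a[i] == "*":
            # The star in a matches nothing, or absorbs one symbol of b.
            return go(i + 1, j) or (j < len(b) and go(i, j + 1))
        if j < len(b) and b[j] == "*":
            # The star in b matches nothing, or absorbs one symbol of a.
            return go(i, j + 1) or (i < len(a) and go(i + 1, j))
        if i == len(a) or j == len(b):
            # One pattern ended while the other still needs a character.
            return False
        if a[i] == "?" or b[j] == "?" or a[i] == b[j]:
            return go(i + 1, j + 1)
        return False

    return go(0, 0)
```

A check like this could serve as a test oracle for an implementation of the module: whenever the module reports an empty intersection for a pair, `patterns_intersect` should return `False` for the same pair, provided the validity limits are not the reason for the empty result.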
In another embodiment, software is further provided, which is used to carry out the technical solutions described in the above embodiments and preferred implementations.
In another embodiment, a storage medium is further provided, in which the above software is stored. The storage medium includes, but is not limited to, an optical disc, a floppy disk, a hard disk, an erasable memory, and the like.
As can be seen from the above description, the embodiments of the present invention achieve the following technical effects. The file lists of two groups of files are extracted, the entries containing wildcards are paired two by two and their intersections are calculated, and the results of these entry-level intersection calculations are then summarized and sorted, yielding the mutual containment relationship between the file lists of the two groups of files and the content of their intersection. Furthermore, a string intersection algorithm based on topological directed graphs is introduced, which enables accurate analysis of the various wildcards in a file list. This solves the technical problem in the prior art that file lists containing wildcards are difficult to parse quickly and accurately, and it achieves the technical effect of effectively improving the efficiency and reliability of file management.
Obviously, those skilled in the art should understand that the modules or steps of the above embodiments of the present invention may be implemented by a general-purpose computing device. They may be concentrated on a single computing device or distributed across a network formed by multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from the one given here, or the modules or steps may each be made into separate integrated circuit modules, or several of them may be made into a single integrated circuit module. Thus, the embodiments of the present invention are not limited to any specific combination of hardware and software.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. For those skilled in the art, the embodiments of the present invention may be modified and varied in various ways. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (8)

1. A file processing method, characterized by comprising:
obtaining the file lists of two groups of files that need to be parsed and compared;
pairing, two by two, the filenames containing wildcards in the file lists of the two groups of files;
performing an intersection operation based on directed topological sequences on the paired filenames, to obtain candidate topological sequences of the intersection;
restoring the candidate topological sequences to a filename expression containing wildcards and, according to the restored result, determining the intersection relationship between the pair of filenames that generated the filename expression;
summarizing and merging the intersection relationships determined for each pair of filenames, so as to determine the intersection relationship between the file lists of the two groups of files;
processing the two groups of files in response to an operation instruction input by a user, according to the determined intersection relationship between the file lists of the two groups of files.
2. the method for claim 1, is characterized in that, according to reduction result, the filename determining to generate this filename between converging relation, comprising:
Validity checking is carried out to the filename expression formula restored;
According to the result of validity checking, the filename determining to generate this filename according to following rule between converging relation:
If the filename expression formula restored is not by validity checking, then determines to generate this table of file name and reach the right common factor of the filename of formula for empty;
If the filename expression formula restored is by validity checking, and the filename expression formula restored is identical with the filename of filename centering generating this filename, then determine to generate filename that this table of file name reaches formula between for comprising and involved relation;
If the filename expression formula restored is by validity checking, and to reach any one filename of the filename centering of formula all different from generating this table of file name, then determine to generate filename that this table of file name reaches formula between be overlapping relation.
3. the method for claim 1, is characterized in that, carries out validity checking, comprising the filename expression formula restored:
Filename hop count in the filename expression formula restored, the length of removing asterisk wildcard part, the length of every segment file name and predetermined restriction rule are compared;
If all meet restriction rule, then determine to pass through validity checking.
4. The method of any one of claims 1 to 3, characterized in that performing an intersection operation based on directed topological sequences on the paired filenames, to obtain candidate topological sequences of the intersection, comprises:
performing the following operations on each pair of filenames among the paired filenames:
building the equivalent topological directed graph of each of the two filenames in the pair;
merging the equivalent topological directed graphs of the two filenames, obtaining all topological sequences of the merged directed graph, applying weights, and filtering out the valid sequences;
merging adjacent nodes of the filtered topological sequences until no further merging is possible, to obtain one or more candidate topological sequences of the intersection after merging.
5. A file processing apparatus, characterized by comprising:
an acquisition module, configured to obtain the file lists of two groups of files that need to be parsed and compared;
a matching module, configured to pair, two by two, the filenames containing wildcards in the file lists of the two groups of files;
a topology calculation module, configured to perform an intersection operation based on directed topological sequences on the paired filenames, to obtain candidate topological sequences of the intersection;
an intersection relationship determination module, configured to restore the candidate topological sequences to a filename expression containing wildcards and, according to the restored result, determine the intersection relationship between the pair of filenames that generated the filename expression;
a merging module, configured to summarize and merge the intersection relationships determined for each pair of filenames, so as to determine the intersection relationship between the file lists of the two groups of files;
a processing module, configured to process the two groups of files in response to an operation instruction input by a user, according to the determined intersection relationship between the file lists of the two groups of files.
6. The apparatus of claim 5, characterized in that the intersection relationship determination module comprises:
a validity checking unit, configured to perform a validity check on the restored filename expression;
an intersection judging unit, configured to determine, according to the result of the validity check, the intersection relationship between the pair of filenames that generated the filename expression, according to the following rules:
if the restored filename expression fails the validity check, determining that the intersection of the pair of filenames that generated the filename expression is empty;
if the restored filename expression passes the validity check and is identical to one filename of the pair that generated it, determining that the relationship between the pair of filenames is one of containing and being contained;
if the restored filename expression passes the validity check and differs from both filenames of the pair that generated it, determining that the relationship between the pair of filenames is an overlapping relationship.
7. The apparatus of claim 5, characterized in that the validity checking unit comprises:
a comparing subunit, configured to compare the number of filename segments in the restored filename expression, its length excluding the wildcard part, and the length of each filename segment against predetermined restriction rules;
a determining subunit, configured to determine that the validity check is passed when the number of filename segments in the restored filename expression, its length excluding the wildcard part, and the length of each filename segment all satisfy the restriction rules.
8. The apparatus of any one of claims 5 to 7, characterized in that the topology calculation module is specifically configured to perform the following operations on each pair of filenames among the paired filenames:
building the equivalent topological directed graph of each of the two filenames in the pair;
merging the equivalent topological directed graphs of the two filenames, obtaining all topological sequences of the merged directed graph, applying weights, and filtering out the valid sequences;
merging adjacent nodes of the filtered topological sequences until no further merging is possible, to obtain one or more candidate topological sequences of the intersection after merging.
CN201510437027.1A 2015-07-23 2015-07-23 Document handling method and device Active CN104991963B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510437027.1A CN104991963B (en) 2015-07-23 2015-07-23 Document handling method and device

Publications (2)

Publication Number Publication Date
CN104991963A true CN104991963A (en) 2015-10-21
CN104991963B CN104991963B (en) 2018-09-25

Family

ID=54303778

Country Status (1)

Country Link
CN (1) CN104991963B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant