CN117852521B - Data calculation result file comparison method, system and comparison configuration system - Google Patents

Info

Publication number
CN117852521B
Authority
CN
China
Prior art keywords
comparison
rule
preset
data
calculation result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410261953.7A
Other languages
Chinese (zh)
Other versions
CN117852521A (en
Inventor
易真
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Zhongke Hexun Technology Co ltd
Original Assignee
Chengdu Zhongke Hexun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Zhongke Hexun Technology Co ltd
Priority to CN202410261953.7A
Publication of CN117852521A
Application granted
Publication of CN117852521B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a data calculation result file comparison method, a comparison system and a comparison configuration system. Two data calculation result files to be compared are obtained, at least one group of data with a preset number of comparison lines is compared, and each group of data is compared line by line, which comprises: dividing the characters of each line into character units at whitespace characters; if a comparison rule is preset, comparing according to the preset comparison rule; otherwise judging whether an English comma or an English period exists and, if so, ignoring the English comma or English period and then comparing the numerical values. The comparison rules comprise one or more of a first-class comparison rule and a second-class comparison rule; the first-class comparison rule comprises any one or more of a first ignore rule and a second ignore rule; the second-class comparison rule comprises a first numerical error judgment rule or a second numerical error judgment rule. The scheme of the application reduces redundant information in the comparison result and improves auditing and checking efficiency.

Description

Data calculation result file comparison method, system and comparison configuration system
Technical Field
The invention relates to the technical field of computer data processing, in particular to a data calculation result file comparison method, a data calculation result file comparison system and a comparison configuration system.
Background
Comparing numerical calculation result files means comparing the result files produced when different algorithms (such as a calculation program and a verification program) process the same use case file. In software research and development, the efficiency of debugging and testing strongly affects the reliability of the software's calculation and analysis. Developing a tool for comparing and analysing calculation files can effectively speed up development and testing and improve the ability to debug and analyse the software's calculation results. At present, most debugging and analysis of calculation results is checked manually, which takes a great deal of time and is inefficient.
Existing browser/server (B/S) and client/server (C/S) comparison tools apply a single comparison rule, and their comparison results contain a large amount of redundant information, so auditing and checking efficiency is low.
Disclosure of Invention
The technical problem to be solved by the application is to provide a data calculation result file comparison method, a data calculation result file comparison system and a data calculation result file comparison configuration system, which have the characteristics of reducing redundant information in comparison results and improving auditing and checking efficiency.
In a first aspect, an embodiment provides a data calculation result file comparing method, including:
Obtaining two data calculation result files to be compared, completing the comparison of at least one group of data with a preset number of comparison lines, and comparing each group of data line by line, which comprises:
dividing the characters of each line into character units at whitespace characters;
judging whether a comparison rule is preset; if so, comparing according to the preset comparison rule; if not, judging whether an English comma or an English period exists and, if so, ignoring the English comma or English period and then comparing the numerical values; for any numerical comparison, if the compared values are the same, the currently segmented character units are considered the same, and if the compared values differ, the currently segmented character units are considered different;
the comparison rules comprise one or more of a first-class comparison rule and a second-class comparison rule;
the first-class comparison rule comprises any one or more of a first ignore rule and a second ignore rule; the first ignore rule includes: judging whether a first ignore marker exists in the character units and, if so, whether a second ignore marker exists in the subsequent character units; if both exist, considering the content of the line containing the first ignore marker, the line containing the second ignore marker, and the lines between them as the same; the second ignore rule includes: judging whether a third ignore marker exists in the character units, and if so, considering the content of the line containing the third ignore marker as the same;
The second-class comparison rule comprises a first numerical error judgment rule or a second numerical error judgment rule; the first numerical error judgment rule includes: judging whether the difference between the compared numerical values is within a preset percentage threshold range; if so, the currently segmented character units are considered the same, and if not, they are considered different; the second numerical error judgment rule includes: judging whether the difference between the compared numerical values is within a preset difference threshold range; if so, the currently segmented character units are considered the same, and if not, they are considered different.
In one embodiment, obtaining two data calculation result files to be compared and completing the comparison of at least one group of data with a preset number of comparison lines includes: if, in the next group of data, the data of at least one file cannot meet the preset comparison line number and the two files have different numbers of data lines, the second-class comparison rule is not executed for that comparison.
In an embodiment, obtaining two data calculation result files to be compared, completing the comparison of at least one group of data with a preset number of comparison lines, and comparing each group of data line by line further includes: judging whether the compared character units are non-numeric; if they are non-numeric and are also not numbers ending with an English comma or English period, the currently segmented character units are considered the same.
In one embodiment, a method step of determining whether the character units to be compared are non-numeric is performed first, and then a method step of determining whether a comparison rule is preset is performed.
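The default path described above (split at whitespace, check for non-numeric units first, then compare values while ignoring a trailing comma or period) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names are hypothetical, and the handling of a number paired with a non-number, and of `nan`/`inf`, is an assumption the text does not settle.

```python
import math
from itertools import zip_longest
from typing import List, Optional, Tuple


def split_units(line: str) -> List[str]:
    """Split one line into character units at whitespace."""
    return line.split()


def parse_number(unit: str) -> Optional[float]:
    """Return the numeric value of a unit, or None if it is not numeric.

    One trailing ',' or '.' is ignored, so '123.' and '123,' parse as 123.
    """
    core = unit[:-1] if unit.endswith((",", ".")) else unit
    try:
        value = float(core)
    except ValueError:
        return None
    # Assumption: 'nan' and 'inf' are treated as text, not as numbers.
    return value if math.isfinite(value) else None


def units_same(unit_a: str, unit_b: str) -> bool:
    """Default comparison of two character units (no preset rule)."""
    value_a, value_b = parse_number(unit_a), parse_number(unit_b)
    if value_a is None and value_b is None:
        return True  # non-numeric units are not compared
    if value_a is None or value_b is None:
        return False  # assumption: number vs. non-number counts as different
    return value_a == value_b


def differing_units(line_a: str, line_b: str) -> List[Tuple[str, str]]:
    """Return the unit pairs of two lines that the default comparison flags."""
    pairs = zip_longest(split_units(line_a), split_units(line_b), fillvalue="")
    return [(a, b) for a, b in pairs if not units_same(a, b)]
```

With this sketch, `units_same("1.0000e-06", "1.0000e-006")` and `units_same("abcd", "abc")` both hold, while `differing_units("x 1, 2", "y 1 3")` reports only the pair `("2", "3")`.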
In one embodiment, the comparison rules further include a third-class comparison rule, which includes:
judging whether the number of character units in the same line is the same; if not, judging whether the character unit is a number or a number ending with an English comma or English period; if it is, the currently segmented character units are considered different.
In a second aspect, the present application provides a data calculation result file comparison configuration system, configured to configure the comparison rules for the data calculation result files to be compared, including any one or more of a first-class comparison rule configuration module and a second-class comparison rule configuration module;
the first-class comparison rule configuration module comprises any one or more of a first ignore rule configuration unit and a second ignore rule configuration unit;
The first ignore rule configuration unit is configured to preset a first ignore marker and a second ignore marker matched with it, and to judge whether the first ignore marker exists in the character units; if it does, and the second ignore marker exists in the subsequent character units, the content of the line containing the first ignore marker, the line containing the second ignore marker, and the lines between them is considered the same; the second ignore rule configuration unit is configured to preset a third ignore marker and to judge whether the third ignore marker exists in the character units; if so, the content of the line containing the third ignore marker is considered the same;
The second-class comparison rule configuration module comprises a first numerical error judgment rule configuration unit and/or a second numerical error judgment rule configuration unit. It is configured to configure the second-class comparison rule based on whichever one of the two units is configured or, when both units are configured, based on the unit that is preset to have the highest priority;
The first numerical error judgment rule configuration unit is configured to be used for presetting a percentage threshold range, judging whether the difference of the compared numerical values is in the preset percentage threshold range or not based on the preset percentage threshold range, if so, considering that the character units which are currently segmented are the same, and if not, considering that the character units which are currently segmented are different; the second numerical error judgment rule configuration unit is configured to be used for presetting a difference threshold range, judging whether the difference of the compared numerical values is in the preset difference threshold range or not based on the preset difference threshold range, if so, considering that the character units which are currently segmented are the same, and if not, considering that the character units which are currently segmented are different.
In one embodiment, the first numerical error determination rule configuration unit includes a plurality of percentage threshold range configuration subunits configured to determine whether a difference between the compared numerical values is within a preset percentage threshold range based on a selected percentage threshold range configuration subunit or a highest priority percentage threshold range configuration subunit of the plurality of percentage threshold range configuration subunits.
In a third aspect, the present application provides a data calculation result file comparison system, including:
The data calculation result file acquisition interface is configured to acquire two data calculation result files needing to be compared;
The data calculation result file comparison module is configured to complete comparison of at least one group of data based on a preset comparison line number, and perform line-by-line comparison on any group of data, and comprises the following steps:
A character unit dividing unit configured to divide each line of characters into respective character units according to a blank;
The comparison rule judging unit is configured to judge whether a comparison rule is preset; if so, the comparison follows the comparison rule configured by the data calculation result file comparison configuration system of any of the above embodiments; if not, it judges whether an English comma or English period exists and, if so, ignores the English comma or English period and then compares the numerical values; for any numerical comparison, if the compared values are the same, the currently segmented character units are considered the same, and if they differ, the currently segmented character units are considered different.
In an embodiment, the data calculation result file comparison module is further configured to determine a number of lines of the data to be compared in the next group, and if the data of at least one file cannot meet the preset comparison line number requirement and the number of lines of the data of the two files are inconsistent in the next group of data, the second-class comparison rule is not executed in the data comparison.
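The group-wise reading and the line-count check can be sketched as below. This is a hedged illustration only: `read_groups` and the `group_size` parameter are hypothetical names, and the sketch assumes that both files are consumed in step, one group of the preset size at a time.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def read_groups(
    lines_a: Iterable[str], lines_b: Iterable[str], group_size: int
) -> Iterator[Tuple[List[str], List[str], bool]]:
    """Yield (group_a, group_b, second_class_enabled) for each group of lines.

    second_class_enabled is False when at least one file cannot fill the
    group and the two groups have different line counts.
    """
    it_a, it_b = iter(lines_a), iter(lines_b)
    while True:
        group_a = list(islice(it_a, group_size))
        group_b = list(islice(it_b, group_size))
        if not group_a and not group_b:
            return  # both files are exhausted
        short = len(group_a) < group_size or len(group_b) < group_size
        mismatched = len(group_a) != len(group_b)
        yield group_a, group_b, not (short and mismatched)
```

Because only one group per file is held in memory at a time, the same loop works on open file objects as well as on lists of strings.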
The beneficial effects of the invention are as follows:
The scheme better meets users' actual comparison requirements. When data calculation result files are compared, the comparison follows the preset comparison rules: the ignore rules skip the content they designate, and the numerical error judgment rules treat values whose error does not affect the result as the same. The comparison scheme of the application therefore reduces redundant information in the comparison result, improves auditing and checking efficiency, raises the work efficiency of enterprises and lowers work cost.
Drawings
FIG. 1 is a block diagram of a data calculation result file comparison configuration system according to an embodiment of the present application;
FIG. 2 is a directory hierarchy diagram of a folder-based acquisition comparison file according to one embodiment of the present application;
FIG. 3 is a flow chart of a comparison method of data calculation result files according to an embodiment of the application;
FIG. 4 is a flow chart of a method according to an embodiment of the step S20 in FIG. 3;
FIG. 5 is a flow chart of another embodiment of the step S20 in FIG. 3 according to the present application;
FIG. 6 is a flow chart of a comparison method of data calculation result files according to another embodiment of the application;
FIG. 7 is a block diagram of a data calculation result file comparison system according to an embodiment of the present application.
In the drawings: 01 represents a first-class comparison rule configuration module, 011 represents a first ignore rule configuration unit, 012 represents a second ignore rule configuration unit, 02 represents a second-class comparison rule configuration module, 021 represents a first numerical error judgment rule configuration unit, 022 represents a second numerical error judgment rule configuration unit, 11 represents a data calculation result file acquisition interface, 12 represents a data calculation result file comparison module, 1201 represents a character unit segmentation unit, and 1202 represents a comparison rule judgment unit.
Detailed Description
The application will be described in further detail below with reference to the drawings by means of specific embodiments, in which like elements in different embodiments are given associated like numbers. In the following embodiments, numerous specific details are set forth in order to provide a better understanding of the present application. However, one skilled in the art will readily recognize that some of the features may be omitted, or replaced by other elements, materials, or methods, in different situations. In some instances, operations related to the application are not shown or described in the specification, in order to avoid obscuring its core portions; a detailed description of these operations is unnecessary for persons skilled in the art, who can fully understand them from the description and from general knowledge in the art.
Furthermore, the described features, operations, or characteristics of the description may be combined in any suitable manner in various embodiments. Also, various steps or acts in the method descriptions may be interchanged or modified in a manner apparent to those of ordinary skill in the art. Thus, the various orders in the description and drawings are for clarity of description of only certain embodiments, and are not meant to be required orders unless otherwise indicated.
The numbering of the components itself, e.g. "first", "second", etc., is used herein merely to distinguish between the described objects and does not have any sequential or technical meaning.
For convenience of explanation of the inventive concept of the present application, a brief explanation of the comparison technique of the numerical calculation result file is described below.
The comparison of the numerical calculation result files is to compare the result files after the calculation of the same use case file by different algorithms (such as a calculation program and a verification program).
Existing B/S and C/S comparison tools essentially compare by fixed rules and cannot filter out non-numerical data: whatever the data type (character strings, times, numerical values and so on), everything is compared, and any difference is flagged as an error. In addition, existing comparison tools compare character by character and mark every differing character, which makes them poorly suited to numerical comparison.
The applicant found in the course of research that existing comparison tools have the following problems:
(1) Data takes part in the comparison whether or not it is numerical (abcd and abc are marked as different at "d", although the data is not numerical and should not be compared);
(2) Values that are numerically the same are still marked wherever their characters differ (when "123." is compared with "123", the "." is marked as a difference);
(3) A line the user does not care about is still compared (the user does not care about the line containing a keyword, but the line still takes part in the comparison and any differences in it are marked);
(4) An interval the user does not care about is still compared (the user does not care about the lines between a start marker and an end marker, but the interval still takes part in the comparison);
(5) Whether the data is numerical is not determined (1.0000e-06 and 1.0000e-006 are numerically equal, but because they are written differently they are marked as different);
(6) For numerical values, an absolute error within a given range should count as the same, but common comparison tools mark it as different (the absolute error between 100 and 99 is 1; even when that is within the allowed error range, the values are marked as different);
(7) For numerical values, an error within a given percentage should count as the same, but common comparison tools mark it as different (taking 100 as the base, 100 and 90 differ by 10%; even when that is within the allowed error range, the values are marked as different).
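Problems (2) and (5) can be reproduced with a character-level matcher. The sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in for a character-based comparison tool (an assumption; the source does not name any tool) and contrasts it with a plain numeric comparison.

```python
from difflib import SequenceMatcher


def char_level_differs(a: str, b: str) -> bool:
    """True if a character-by-character matcher finds any non-equal span."""
    opcodes = SequenceMatcher(None, a, b).get_opcodes()
    return any(tag != "equal" for tag, *_ in opcodes)


def numerically_equal(a: str, b: str) -> bool:
    """True if both strings parse to the same floating-point value."""
    return float(a) == float(b)


# Two notations of the same value, and a value with a trailing period.
for left, right in [("1.0000e-06", "1.0000e-006"), ("123.", "123")]:
    print(left, right, char_level_differs(left, right), numerically_equal(left, right))
```

In both pairs the character-level check reports a difference although the parsed values are equal, which is the redundant output the application aims to suppress.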
When the compared data volume is very large, the comparison tool at the B/S end also often causes the problem of web page blocking, and even causes web page breakdown or computer crash due to the problem of overlarge data.
In view of the above problems, the application provides a data calculation result file comparison method, a data calculation result file comparison system and a data calculation result file comparison configuration system, which can better meet the requirements of actual comparison rules of users, reduce redundant information in comparison results and improve auditing and checking efficiency.
In order to clearly describe the scheme of the present application, a comparison configuration system for data calculation result files configured to configure comparison rules of data calculation result files to be compared will be described.
In one embodiment of the present application, referring to fig. 1, a data calculation result file comparison configuration system includes one or more of a first-class comparison rule configuration module 01 and a second-class comparison rule configuration module 02: it may include only the first-class comparison rule configuration module 01, only the second-class comparison rule configuration module 02, or both (as in the embodiment shown in fig. 1). The first-class comparison rule configuration module 01 includes any one or more of a first ignore rule configuration unit 011 and a second ignore rule configuration unit 012: it may include only the first ignore rule configuration unit 011, only the second ignore rule configuration unit 012, or both (as in the embodiment shown in fig. 1).
The first ignore rule configuration unit 011 is configured to preset a first ignore marker and a second ignore marker matching the first ignore marker, and to judge whether the first ignore marker exists in the character units; if it does, and the second ignore marker exists in the subsequent character units, the content of the line containing the first ignore marker, the line containing the second ignore marker, and the lines between them is considered the same.
For example, a first ignore marker "start" and a matching second ignore marker "end" may be preset through the first ignore rule configuration unit 011. It is then judged whether "start" exists in the character units; if it does, and "end" exists in a subsequent character unit, the lines containing "start" and "end" and all lines between them are considered the same. In the calculation file A1 and the verification file B1 below, the first line takes part in the comparison, while the second to fourth lines are ignored and considered the same.
Computing file A1: wasd 1.0.0, 0.1..0.1, 0.2.. 0.3 1.0.5 start 21 32333 end 123
Verification file B1: wasd 1.0 22 22 33 44
In one embodiment of the present application, the first ignore rule configuration unit 011 may preset multiple sets of matched first ignore tags and second ignore tags.
Based on the first neglect rule configuration unit 011, a user can set neglect comparison on data of a certain section which is not concerned, so that the obtained comparison result can reduce a lot of redundant information, improve auditing and checking efficiency, and simultaneously reduce occupation of computer resources.
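A minimal sketch of the first ignore rule follows. It assumes that the markers are looked up in the whitespace-separated units of one file's lines and that the returned line indices are then skipped in both files; the function name and the sample lines are hypothetical.

```python
from typing import List, Set


def ignored_block_lines(
    lines: List[str], start_marker: str = "start", end_marker: str = "end"
) -> Set[int]:
    """Return the indices of lines covered by a start..end ignore block.

    A start marker without a later end marker does not open a block.
    """
    ignored: Set[int] = set()
    i = 0
    while i < len(lines):
        if start_marker in lines[i].split():
            # Look for the matching end marker on this line or a later one.
            for j in range(i, len(lines)):
                if end_marker in lines[j].split():
                    ignored.update(range(i, j + 1))
                    i = j  # continue scanning after the block
                    break
        i += 1
    return ignored


# Hypothetical sample: lines 1 to 3 (0-based) form the ignored block.
sample = ["wasd 1.0", "start", "21 32 333", "end 123", "5 6"]
print(sorted(ignored_block_lines(sample)))
```

Supporting several marker pairs, as the embodiment above allows, would amount to calling the function once per pair and taking the union of the results.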
The second ignore rule configuration unit 012 is configured to preset a third ignore marker and to judge whether the third ignore marker exists in the character units; if so, the content of the line containing the third ignore marker is considered the same.
For example, a third ignore marker "cpu" may be preset through the second ignore rule configuration unit 012. It is then judged whether "cpu" exists in the character units; if so, the content of the line containing "cpu" is considered the same. In the calculation file A2 and the verification file B2 below, the first and second lines are ignored and considered the same.
Computing file A2:
cpu 123 234 345
123 234 345 456
abc 100 101
Verification file B2:
11 222 333 444
cpu 234 345 456
abc 100 101
In one embodiment of the present application, the second ignore rule configuration unit 012 may preset a plurality of third ignore markers.
Based on the second neglect rule configuration unit 012, the user can set neglect comparison on data of a certain line which is not concerned, so that the obtained comparison result can reduce a lot of redundant information, improve auditing and checking efficiency, and simultaneously reduce occupation of computer resources.
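The second ignore rule reduces to a per-line membership test. The sketch below assumes that a marker found in either file's line is enough to skip that line pair; the function name and the line breaks in the sample data are assumptions.

```python
from typing import Iterable, List


def line_is_ignored(line_a: str, line_b: str, markers: Iterable[str] = ("cpu",)) -> bool:
    """True if any ignore marker appears as a character unit in either line."""
    units = set(line_a.split()) | set(line_b.split())
    return any(marker in units for marker in markers)


# Hypothetical sample data, three lines per file.
file_a: List[str] = ["cpu 123 234 345", "123 234 345 456", "abc 100 101"]
file_b: List[str] = ["11 222 333 444", "cpu 234 345 456", "abc 100 101"]

flags = [line_is_ignored(a, b) for a, b in zip(file_a, file_b)]
print(flags)
```

Matching on whole character units rather than substrings means that a unit such as "cpu0" would not trigger the rule; whether the original system matches substrings is not stated in the text.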
The second-class comparison rule configuration module 02 includes a first numerical error judgment rule configuration unit 021 and/or a second numerical error judgment rule configuration unit 022. It is configured to configure the second-class comparison rule based on whichever one of the two units is configured or, when both units are configured, based on the unit that is preset to have the highest priority.
The second-class comparison rule configuration module 02 may include only the first numerical error judgment rule configuration unit 021 or only the second numerical error judgment rule configuration unit 022. When it includes both (as in the embodiment shown in fig. 1), in one embodiment the user may be allowed to set only one of the configuration units, or may be allowed to set both. If both are set, the comparison rule is executed according to the configuration unit with the highest priority. The highest-priority unit may be preset as the first numerical error judgment rule configuration unit 021 or as the second numerical error judgment rule configuration unit 022, or it may be determined by the user's setting behavior, for example by taking the unit the user set first, or the unit the user set last, as the highest-priority unit.
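The priority resolution between the two numerical error judgment rules can be sketched as a small selection function. The enum, the policy names ("preset", "first", "last") and the default values are hypothetical labels for the options the paragraph lists, not identifiers from the source.

```python
from enum import Enum
from typing import List, Optional


class ErrorRule(Enum):
    PERCENTAGE = "percentage"  # first numerical error judgment rule
    DIFFERENCE = "difference"  # second numerical error judgment rule


def active_rule(
    configured: List[ErrorRule],
    policy: str = "preset",
    preset: ErrorRule = ErrorRule.PERCENTAGE,
) -> Optional[ErrorRule]:
    """Pick the rule to execute; `configured` is in the order the user set it."""
    if not configured:
        return None
    if len(set(configured)) == 1:
        return configured[0]  # only one kind of rule was set
    if policy == "preset":
        return preset
    if policy == "first":
        return configured[0]
    if policy == "last":
        return configured[-1]
    raise ValueError(f"unknown policy: {policy}")
```

For instance, `active_rule([ErrorRule.DIFFERENCE, ErrorRule.PERCENTAGE], policy="first")` selects the difference rule because it was set first.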
The first numerical error judgment rule configuration unit 021 is configured to preset a percentage threshold range, judge whether the difference between the compared numerical values is within the preset percentage threshold range based on the preset percentage threshold range, if so, consider that the currently divided character units are the same, and if not, consider that the currently divided character units are different. The percentage threshold range may be an absolute percentage threshold range (e.g., absolute percentage threshold less than or equal to 5%) or a positive and negative percentage threshold range (e.g., -5% to +5%).
For a percentage threshold range, the error may be required to lie within a given percentage (for example, if the calculated value b is 100 and the rule value is set to 10%, then a and b are considered the same when b - b × 10% <= a <= b + b × 10%, that is 90 <= a <= 110, and different otherwise).
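The percentage check can be written as a one-line predicate. This is a sketch under two assumptions: the calculated value b is always the base of the percentage, and `abs(b)` is used so that the range stays well formed for a negative b (the source only gives a positive example).

```python
def within_percentage(a: float, b: float, percent: float) -> bool:
    """True if a lies within +/- percent of the base value b."""
    tolerance = abs(b) * percent / 100.0  # assumption: abs() for negative bases
    return b - tolerance <= a <= b + tolerance
```

With b = 100 and a rule value of 10%, `within_percentage(90, 100, 10)` and `within_percentage(110, 100, 10)` hold, while `within_percentage(89.9, 100, 10)` does not. The check is not symmetric: `within_percentage(100, 90, 10)` is false because the base is then 90.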
In one embodiment of the application, the user may set the percentage threshold range directly.
In one embodiment of the present application, the first numerical error determination rule configuration unit 021 includes a plurality of percentage threshold range configuration subunits configured to determine whether the difference between the compared numerical values is within a preset percentage threshold range based on the selected percentage threshold range configuration subunit or the highest priority percentage threshold range configuration subunit of the plurality of percentage threshold range configuration subunits. The user may select a desired percentage threshold range based on the provided percentage threshold range. For example, in one embodiment, the first numerical error determination rule configuration unit 021 includes a 3% percent threshold range configuration subunit, a 5% percent threshold range configuration subunit, and a 10% percent threshold range configuration subunit, and the user can select a desired percent threshold range configuration subunit based on the three percent threshold range configuration subunits. In one embodiment, the first numerical error determination rule configuration unit 021 is configured to allow only one percentage threshold range configuration subunit to be selected. In one embodiment, the first numerical error determination rule configuration unit 021 is configured to select a plurality of percentage threshold range configuration subunits, but determine whether the difference between the compared numerical values is within a preset percentage threshold range based on the percentage threshold range configuration subunit with the highest priority. 
The highest-priority percentage threshold range configuration subunit may be preset as a particular percentage threshold range configuration subunit, for example the subunit with the largest percentage range, the subunit with the smallest percentage range, or another subunit. It may also be determined according to the user's setting behavior: for example, the highest-priority subunit may be the percentage threshold range configuration subunit that the user selected first, or the one that the user selected last.
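The priority options listed above can be illustrated with a short sketch. The function name pick_threshold and the policy strings are hypothetical labels introduced only for this example; the list is assumed to hold the selected percentages in the order in which the user selected them.

```python
def pick_threshold(selected: list[float], policy: str = "largest") -> float:
    """Choose the effective percentage threshold among the selected ones.

    `selected` is assumed to be in selection order (first selected first).
    """
    if not selected:
        raise ValueError("no percentage threshold selected")
    if policy == "largest":
        return max(selected)
    if policy == "smallest":
        return min(selected)
    if policy == "first_selected":
        return selected[0]
    if policy == "last_selected":
        return selected[-1]
    raise ValueError(f"unknown policy: {policy}")


# The user selected 5%, then 3%, then 10%.
print(pick_threshold([5, 3, 10], "first_selected"))  # 5
print(pick_threshold([5, 3, 10], "smallest"))        # 3
```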
Based on the first numerical error judgment rule configuration unit 021, on the one hand, whether the data is numerical can be judged directly; on the other hand, the percentage error can be set according to the user's requirement. If the error is within the preset allowable percentage range, the values are identified as the same, rather than being identified as different whenever any gap exists. The resulting comparison output therefore contains much less redundant information, which improves auditing and checking efficiency.
The second numerical error judgment rule configuration unit 022 is configured to preset a difference threshold range, judge whether the difference of the compared numerical values is within the preset difference threshold range based on the preset difference threshold range, if so, consider that the character units currently segmented are the same, and if not, consider that the character units currently segmented are different. The difference range may be an absolute difference range (e.g., absolute difference less than or equal to 1) or a positive and negative difference range (e.g., -1 to +1).
For a difference threshold range, a limit on the difference may be specified (e.g., if the calculated value b is 100 and the rule value is set to 10, then a value a satisfying b - 10 <= a <= b + 10, i.e. 90 <= a <= 110, is considered the same as b; otherwise a and b are considered different).
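The two forms of difference range mentioned above (an absolute difference limit, or a positive and negative difference range) can be sketched as follows. The function names are hypothetical and the sketch is only illustrative.

```python
def within_difference(a: float, b: float, limit: float) -> bool:
    """Absolute difference range: |a - b| <= limit."""
    return abs(a - b) <= limit


def within_signed_difference(a: float, b: float, low: float, high: float) -> bool:
    """Positive and negative difference range: low <= a - b <= high."""
    return low <= a - b <= high


# b = 100 with a rule value of 10 accepts a in [90, 110].
print(within_difference(95, 100, 10))               # True
print(within_difference(111, 100, 10))              # False
print(within_signed_difference(100.5, 100, -1, 1))  # True
```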
Based on the second numerical error judgment rule configuration unit 022, on the one hand, whether the data is numerical can be judged directly; on the other hand, the difference error can be set according to the user's requirement. If the error is within the preset allowable difference threshold range, the values are identified as the same, rather than being identified as different whenever any difference exists. The resulting comparison output therefore contains much less redundant information, which improves auditing and checking efficiency.
Based on the configured comparison rules, the data calculation result files can be compared. In one embodiment of the present application, a data calculation result file comparison method includes: acquiring two data calculation result files to be compared, completing data comparison for at least one group of a preset number of comparison lines, and comparing each group of data line by line. The preset number of comparison lines can be set according to requirements, for example 500 lines, 1000 lines, or 3000 lines.
Because only the preset number of lines is pulled for comparison at a time, the memory pressure on the database and the server is reduced.
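Pulling a preset number of lines from both files at a time can be sketched with a small generator. The name iter_batches and the default of 1000 lines are assumptions for illustration; the text does not specify how the groups are read.

```python
from itertools import islice
from typing import Iterable, Iterator


def iter_batches(
    lines_a: Iterable[str], lines_b: Iterable[str], batch_size: int = 1000
) -> Iterator[tuple[list[str], list[str]]]:
    """Yield successive groups of up to batch_size lines from both inputs."""
    it_a, it_b = iter(lines_a), iter(lines_b)
    while True:
        batch_a = list(islice(it_a, batch_size))
        batch_b = list(islice(it_b, batch_size))
        if not batch_a and not batch_b:
            return  # both inputs are exhausted
        yield batch_a, batch_b


# Three lines against two lines, in groups of two lines.
for group in iter_batches(["1", "2", "3"], ["1", "2"], batch_size=2):
    print(group)
```

Because the inputs are consumed lazily, open file objects can be passed directly and only one group per file is held in memory at a time.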
In one embodiment, a method for obtaining two data calculation result files to be compared includes: the user directly selects two data calculation result files which need to be compared.
In one embodiment, a method for obtaining the two data calculation result files to be compared includes: based on two folders selected by the user to participate in the comparison, automatically screening out the file types that support parsing (such as .th, .o, .i, .txt) and matching files according to directory level; two files match if the folders containing them have the same names at the same directory levels and the file names are the same. Referring to fig. 2, the folders test1 and test2 are selected. The test1 folder contains a js folder, a yz folder, a 1.i file and a 2.i file; the test2 folder contains a js folder, a yz folder, a 1.i file and a 3.i file. The js and yz folders of test1 each contain a 1.i file and a 2.i file, and the js and yz folders of test2 each contain a 1.i file and a 3.i file. The file matching relationship is: test1/1.i with test2/1.i, test1/js/1.i with test2/js/1.i, and test1/yz/1.i with test2/yz/1.i.
In one embodiment, unwanted files may be filtered out based on configured invalid file names (e.g., if the "restart.i" file is not of interest, the tool automatically excludes files named restart.i from comparison matching). Finally, the files to be compared are placed in one-to-one correspondence and uploaded to the server to form a comparison task.
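The folder matching and invalid-file filtering described above can be sketched as a comparison of relative paths. This is an illustrative sketch only: the function name match_files, the constant SUPPORTED_SUFFIXES and the parameter names are hypothetical, and the suffix list simply repeats the examples given in the text.

```python
from pathlib import Path

# File types that support parsing, taken from the examples in the text.
SUPPORTED_SUFFIXES = frozenset({".th", ".o", ".i", ".txt"})


def match_files(root_a, root_b, suffixes=SUPPORTED_SUFFIXES, ignored_names=frozenset()):
    """Return the relative paths that exist under both folders.

    Two files match when their directory levels, folder names and file
    names are all identical, which is the same as having equal relative paths.
    """
    def collect(root):
        root = Path(root)
        return {
            path.relative_to(root).as_posix()
            for path in root.rglob("*")
            if path.is_file()
            and path.suffix in suffixes
            and path.name not in ignored_names  # configured invalid file names
        }

    return sorted(collect(root_a) & collect(root_b))
```

For the folders of fig. 2, match_files("test1", "test2", ignored_names={"restart.i"}) would be expected to return 1.i, js/1.i and yz/1.i, since 2.i and 3.i each exist on one side only.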
In one embodiment, referring to fig. 3, the line-by-line comparison for any one group of data includes:
Step S10, judging whether a comparison rule is preset, if yes, proceeding to step S20, and if no, proceeding to step S30.
Step S20, comparing according to the preset comparison rule. The comparison rule comprises one or more of a class-one comparison rule and a class-two comparison rule.
In one embodiment, referring to fig. 4, step S20 includes:
Step S2011, judging whether a class-one comparison rule is preset; if yes, comparing based on the preset class-one comparison rule; if no, proceeding to step S2012.
The class-one comparison rule comprises any one or more of a first ignoring rule and a second ignoring rule. The first ignoring rule includes: judging whether a first ignore marker exists in the character unit; if so, judging whether a second ignore marker matched with the first ignore marker exists in the subsequent character units; and if so, considering the content from the line containing the first ignore marker through the line containing the second ignore marker, including both lines and everything between them, as the same. The second ignoring rule includes: judging whether a third ignore marker exists in the character unit, and if so, considering the content of the line containing the third ignore marker as the same.
In one embodiment, when both the first ignoring rule and the second ignoring rule are preset, the judgments for the two rules may be executed in parallel or sequentially; when executed sequentially, their order is not limited.
Step S2012, judging whether a class-two comparison rule is preset, and if so, comparing based on the preset class-two comparison rule.
In one embodiment, the class-two comparison rule includes a first numerical error judgment rule or a second numerical error judgment rule. The first numerical error judgment rule includes: judging whether the difference between the compared numerical values is within a preset percentage threshold range; if so, considering the currently segmented character units to be the same, and if not, considering them to be different. The second numerical error judgment rule includes: judging whether the difference between the compared numerical values is within a preset difference threshold range; if so, considering the currently segmented character units to be the same, and if not, considering them to be different.
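The two ignoring rules of the class-one comparison rule can be sketched as a function that marks which lines are to be treated as the same. This is a sketch under assumptions: the function name ignore_mask and the marker strings in the example are hypothetical, and a marker is assumed to match only when it equals a whole whitespace-separated character unit.

```python
def ignore_mask(lines, start_marker=None, end_marker=None, line_marker=None):
    """Return one boolean per line; True means "treat this line as the same"."""
    units = [line.split() for line in lines]
    mask = [False] * len(lines)
    i = 0
    while i < len(lines):
        # Second ignoring rule: a line containing the third marker is ignored.
        if line_marker is not None and line_marker in units[i]:
            mask[i] = True
        # First ignoring rule: applies only if a matching second marker follows.
        if start_marker is not None and end_marker is not None and start_marker in units[i]:
            after_start = units[i][units[i].index(start_marker) + 1:]
            if end_marker in after_start:
                close = i
            else:
                close = next(
                    (k for k in range(i + 1, len(lines)) if end_marker in units[k]),
                    None,
                )
            if close is not None:
                for k in range(i, close + 1):
                    mask[k] = True  # both marker lines and everything between
                i = close + 1
                continue
        i += 1
    return mask


lines = ["a 1", "#BEGIN x", "b 2", "#END", "c 3 #SKIP", "d 4"]
print(ignore_mask(lines, "#BEGIN", "#END", "#SKIP"))
# [False, True, True, True, True, False]
```

A first marker without a matching second marker leaves the lines unmarked, following the "if so" condition in the rule.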
In one embodiment, the comparison rules further include a class-three comparison rule, which includes: judging whether the number of character units in the same line is the same; if not, judging whether the character unit is a number, or a number ending with an English comma or an English period; and if so, considering the currently segmented character units to be different.
In one embodiment, referring to fig. 5, step S20 includes:
Step S2021, judging whether a class-one comparison rule is preset; if yes, comparing based on the preset class-one comparison rule and then proceeding to step S2022; if no, proceeding directly to step S2022.
Step S2022, judging whether a class-three comparison rule is preset; if yes, judging whether the number of character units in the same line is the same. If the numbers differ, comparing according to the class-three comparison rule (if the character unit is a number, or a number ending with an English comma or an English period, the currently segmented character units are considered different; otherwise they are considered the same). If the numbers are the same, proceeding to step S2023.
Step S2023, judging whether a class-two comparison rule is preset, and if so, comparing based on the preset class-two comparison rule.
In this embodiment, the class-one comparison rule is applied first, followed by the class-three comparison rule and then the class-two comparison rule. This order saves computing power during comparison, reduces the occupation of computer resources, and improves calculation efficiency.
Step S30, judging whether English commas or English periods exist, if so, ignoring the English commas or the English periods, and comparing the numbers; for any one of the numerical value comparisons, if the numerical values of the comparisons are the same, the character units currently segmented are considered to be the same, and if the numerical values of the comparisons are different, the character units currently segmented are considered to be different.
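The default comparison of step S30 can be sketched as follows. The helper names are hypothetical, and two points are assumptions not stated in the text: units that are not numbers fall back to exact string comparison, and lines with different numbers of character units are treated as different. Splitting on whitespace corresponds to dividing each line into character units.

```python
import re

# Plain decimal or scientific notation, e.g. "123", "-4.5", "1.0000E-001".
_NUMBER = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def parse_number(unit: str):
    """Drop one trailing English comma or period, then parse; None if not numeric."""
    text = unit[:-1] if unit.endswith((",", ".")) else unit
    return float(text) if _NUMBER.fullmatch(text) else None


def units_same_default(unit_a: str, unit_b: str) -> bool:
    value_a, value_b = parse_number(unit_a), parse_number(unit_b)
    if value_a is not None and value_b is not None:
        return value_a == value_b  # step S30: compare the numerical values
    return unit_a == unit_b  # assumption: non-numeric units compared as text


def line_same_default(line_a: str, line_b: str) -> bool:
    units_a, units_b = line_a.split(), line_b.split()
    if len(units_a) != len(units_b):
        return False  # assumption: differing unit counts count as different
    return all(units_same_default(a, b) for a, b in zip(units_a, units_b))


print(units_same_default("1.0000E-001,", "0.1"))  # True
print(units_same_default("100", "101"))           # False
```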
In the applicant's research, it was found that the two data calculation result files to be compared may have inconsistent numbers of lines. Therefore, in one embodiment of the present application, acquiring two data calculation result files to be compared and completing data comparison for at least one group of a preset number of comparison lines includes: if, in the next group of data, the data of at least one file cannot meet the preset comparison line number and the numbers of data lines of the two files are inconsistent, the class-two comparison rule is not executed in that data comparison.
In some embodiments, for calculation result file A and calculation result file B, if in the next group of data to be compared the data of one calculation result file cannot meet the preset number of comparison lines and the numbers of data lines of the two calculation result files are inconsistent, then, where the class-one, class-two and class-three comparison rules are all preset, only the class-one and class-three comparison rules are executed and the class-two comparison rule is not executed.
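The condition for skipping the class-two comparison rule can be expressed as a small predicate. The function name class_two_enabled is hypothetical; the sketch assumes each group is available as a list of lines.

```python
def class_two_enabled(batch_a: list, batch_b: list, batch_size: int) -> bool:
    """Return False when the class-two (numerical error) rule must be skipped."""
    # At least one file cannot supply the preset number of comparison lines...
    short_group = len(batch_a) < batch_size or len(batch_b) < batch_size
    # ...and the two files supply different numbers of lines.
    line_counts_differ = len(batch_a) != len(batch_b)
    return not (short_group and line_counts_differ)


print(class_two_enabled(["x"] * 3, ["x"] * 3, 3))  # True: full groups
print(class_two_enabled(["x"] * 2, ["x"] * 3, 3))  # False: short and unequal
print(class_two_enabled(["x"] * 2, ["x"] * 2, 3))  # True: short but equal
```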
In one embodiment, the line-by-line comparison for any one group of data further includes:
Step S40, judging whether the character unit to be compared is a non-numerical value; if so, and the character unit is not a numerical value ending with an English comma or an English period (e.g. "123," or "1.0000E-001."), the currently segmented character units are considered the same. Step S40 and step S10 may be executed in either order.
Based on the comparison rule, the high-efficiency data processing mode reduces the labor and material cost of enterprises in the aspect of data processing, and improves the economic benefit of the enterprises.
In one embodiment, referring to fig. 6, step S40 is performed first, and then step S10 is performed.
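The ordering in which step S40 precedes step S10 can be sketched for a single pair of character units. This is an illustration under assumptions: the names compare_unit and rule are hypothetical, the rule is modelled as an optional callable taking two floats, and a pair in which only one unit is numeric is treated as different, a case the text does not address.

```python
import re

_NUMBER = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def _to_float(unit: str):
    """Return the numeric value, ignoring one trailing ',' or '.'; else None."""
    text = unit[:-1] if unit.endswith((",", ".")) else unit
    return float(text) if _NUMBER.fullmatch(text) else None


def compare_unit(unit_a: str, unit_b: str, rule=None) -> bool:
    value_a, value_b = _to_float(unit_a), _to_float(unit_b)
    # Step S40: non-numeric character units are considered the same.
    if value_a is None and value_b is None:
        return True
    if value_a is None or value_b is None:
        return False  # assumption: a numeric unit against text is different
    # Step S10: is a comparison rule preset?
    if rule is not None:
        return rule(value_a, value_b)  # step S20: apply the preset rule
    return value_a == value_b          # step S30: plain numerical comparison


print(compare_unit("Pressure", "Temperature"))                       # True
print(compare_unit("100", "105"))                                    # False
print(compare_unit("100", "105", rule=lambda a, b: abs(a - b) <= 10))  # True
```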
In one embodiment, identical data and different data are marked with flag bits 0 and 1, respectively. To display the comparison result, up-and-down scrolling pagination is used: comparison result data for a preset number of lines (e.g. 50 lines) is pulled and displayed at a time, which significantly improves rendering speed and lets the user browse the data more smoothly. Different data is highlighted by color, determined by the identifier produced by the comparison module: if the identifier is 1, the data is judged different and displayed on a red background; if the identifier is 0, the data is judged the same and displayed on a transparent background. In one embodiment, the user is provided with detailed anchor information for the comparison (whether a difference exists is stored for each line, the stored primary key being the line number and the identifier). An anchor accurately marks the position of a line of data: when the user clicks an anchor, the corresponding line number is obtained and the tool automatically pulls and displays 50 lines of data starting from that line. When the user double-clicks the comparison information of a line, the detailed comparison information is displayed below the comparison interface, so that the user can quickly review the differences.
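The flag bits, anchors and paged display described above can be sketched with two small helpers. The names build_anchors and pull_page are hypothetical, and the sketch assumes the per-line flags are held in a list whose first element corresponds to line 1.

```python
def build_anchors(flags: list[int]) -> list[int]:
    """Return the 1-based line numbers whose flag is 1 (different)."""
    return [line_no for line_no, flag in enumerate(flags, start=1) if flag == 1]


def pull_page(rows: list, start_line: int, page_size: int = 50) -> list:
    """Return up to page_size rows starting at the 1-based start_line."""
    begin = start_line - 1
    return rows[begin:begin + page_size]


flags = [0, 1, 0, 1]          # 0 = same, 1 = different
print(build_anchors(flags))   # [2, 4]

rows = list(range(1, 201))    # stand-in for 200 lines of comparison results
page = pull_page(rows, start_line=120)
print(page[0], page[-1], len(page))  # 120 169 50
```

Clicking an anchor then corresponds to calling pull_page with the anchor's line number.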
Through humanized data display and anchor point information design, the data processing result is more visual and easy to understand, and more convenient operation experience is provided for the user.
An embodiment of the present application provides a data calculation result file comparison system configured to implement the data calculation result file comparison method of any one of the embodiments, referring to fig. 7, the comparison system includes:
the data calculation result file obtaining interface 11 is configured to obtain two data calculation result files to be compared.
The data calculation result file comparing module 12 is configured to complete the comparison of at least one set of data based on a preset number of comparison lines, and perform the line-by-line comparison on any set of data, and includes:
The character unit division unit 1201 is configured to divide each line of characters into individual character units by a blank symbol.
The comparison rule determining unit 1202 is configured to determine whether a comparison rule is preset, if yes, compare the comparison rule configured by the data calculation result file comparison configuration system according to any of the above embodiments, if no, determine whether an english comma or an english period exists, and if yes, ignore the english comma or the english period and then compare the values. For any one of the numerical value comparisons, if the numerical values of the comparisons are the same, the character units currently segmented are considered to be the same, and if the numerical values of the comparisons are different, the character units currently segmented are considered to be different.
In an embodiment, the data calculation result file comparison module is further configured to determine a number of lines of the data to be compared in the next group, and if the data of at least one file cannot meet the preset comparison line number requirement and the number of lines of the data of the two files are inconsistent in the next group of data, the second-class comparison rule is not executed in the data comparison.
An embodiment of the present application provides a computer readable storage medium having a program stored thereon, the stored program being loadable and executable by a processor to implement the data calculation result file comparison method of any of the above embodiments.
Those skilled in the art will appreciate that all or part of the functions of the various methods in the above embodiments may be implemented by hardware, or may be implemented by a computer program. When all or part of the functions in the above embodiments are implemented by means of a computer program, the program may be stored in a computer readable storage medium, and the storage medium may include: read-only memory, random access memory, magnetic disk, optical disk, hard disk, etc., and the program is executed by a computer to realize the above-mentioned functions. For example, the program is stored in the memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above can be realized. In addition, when all or part of the functions in the above embodiments are implemented by means of a computer program, the program may also be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and downloaded or copied into the memory of a local device, or used to update the system version of the local device; when the program in the memory is executed by a processor, all or part of the functions in the above embodiments can be realized.
The foregoing description of the invention has been presented for purposes of illustration and description, and is not intended to be limiting. Several simple deductions, modifications or substitutions may also be made by a person skilled in the art to which the invention pertains, based on the idea of the invention.

Claims (9)

1. A data calculation result file comparison method, comprising:
Obtaining two data calculation result files needing to be compared, at least completing a group of data comparison of a preset comparison line number, and performing line-by-line comparison on any group of data, wherein the method comprises the following steps:
dividing each row of characters into character units according to blank characters;
judging whether comparison rules are preset, if so, comparing according to the preset comparison rules, if not, judging whether English commas or English periods exist, and if so, ignoring the English commas or the English periods and then comparing the values; for any one of the numerical value comparisons, if the compared numerical values are the same, the currently segmented character units are considered to be the same, and if the compared numerical values are different, the currently segmented character units are considered to be different;
the comparison rules comprise one or more of class-one comparison rules and class-two comparison rules;
the class-one comparison rules comprise any one or more of a first ignoring rule and a second ignoring rule; the first ignoring rule includes: judging whether a first ignore marker exists in the character unit, and if so, judging whether a second ignore marker matched with the first ignore marker exists in the subsequent character units, and if so, considering the content from the line containing the first ignore marker through the line containing the second ignore marker, inclusive, as the same; the second ignoring rule includes: judging whether a third ignore marker exists in the character unit, and if so, considering the content of the line containing the third ignore marker as the same;
the class-two comparison rules comprise a first numerical error judgment rule or a second numerical error judgment rule; the first numerical error judgment rule includes: judging whether the difference between the compared numerical values is within a preset percentage threshold range, if so, considering the currently segmented character units to be the same, and if not, considering the currently segmented character units to be different; the second numerical error judgment rule includes: judging whether the difference between the compared numerical values is within a preset difference threshold range, if so, considering the currently segmented character units to be the same, and if not, considering the currently segmented character units to be different.
2. The method for comparing data calculation result files according to claim 1, wherein said obtaining two data calculation result files to be compared at least completes a set of data comparison of a preset number of comparison lines, comprises: if the data of at least one file in the next group of data cannot meet the preset comparison line number requirement and the data line numbers of the two files are inconsistent, the second-class comparison rule is not executed in the data comparison.
3. The method for comparing data calculation result files according to claim 1, wherein said obtaining two data calculation result files to be compared, at least completing a set of data comparison of a preset comparison line number, and performing a line-by-line comparison for any set of data, further comprises: judging whether the compared character units are non-numerical values, and if so, and the character units are not numerical values ending with English commas or English periods, considering the currently segmented character units to be the same.
4. The method for comparing data calculation result files as claimed in claim 3, wherein the method step of judging whether the compared character units are non-numerical values is performed first, and then the method step of judging whether the comparison rules are preset is performed.
5. The data calculation result file comparison method according to claim 1, wherein the comparison rules further include class-three comparison rules including:
judging whether the number of character units in the same line is the same; if not, judging whether the character unit is a number, or a number ending with an English comma or an English period; and if so, considering the currently segmented character units to be different.
6. The data calculation result file comparison configuration system is characterized by being configured to be used for configuring comparison rules of data calculation result files needing to be compared, and comprising any one or more of a class-one comparison rule configuration module (01) and a class-two comparison rule configuration module (02);
The class-one comparison rule configuration module (01) comprises any one or more of a first ignoring rule configuration unit (011) and a second ignoring rule configuration unit (012);
The first ignoring rule configuration unit (011) is configured to preset a first ignore marker and a second ignore marker matched with the first ignore marker, judge, based on the first ignore marker and the second ignore marker, whether the first ignore marker exists in the character unit, and if so, judge whether the second ignore marker exists in the subsequent character units, and if so, consider the content from the line containing the first ignore marker through the line containing the second ignore marker, inclusive, as the same; the second ignoring rule configuration unit (012) is configured to preset a third ignore marker, judge whether the third ignore marker exists in the character unit based on the third ignore marker, and if so, consider the content of the line containing the third ignore marker to be the same;
The second class comparison rule configuration module (02) comprises a first numerical error judgment rule configuration unit (021) and/or a second numerical error judgment rule configuration unit (022), and the second class comparison rule configuration module (02) is configured to configure the second class comparison rule based on the configured first numerical error judgment rule configuration unit (021) or the second numerical error judgment rule configuration unit (022), or configure the second class comparison rule based on the configured first numerical error judgment rule configuration unit (021) and the second numerical error judgment rule configuration unit (022) and the highest priority in the preset first numerical error judgment rule configuration unit (021) and second numerical error judgment rule configuration unit (022);
The first numerical error judgment rule configuration unit (021) is configured to preset a percentage threshold range and judge, based on the preset percentage threshold range, whether the difference between the compared numerical values is within the preset percentage threshold range; if so, the currently segmented character units are considered the same, and if not, the currently segmented character units are considered different; the second numerical error judgment rule configuration unit (022) is configured to preset a difference threshold range and judge, based on the preset difference threshold range, whether the difference between the compared numerical values is within the preset difference threshold range; if so, the currently segmented character units are considered the same, and if not, the currently segmented character units are considered different.
7. The data calculation result file comparison configuration system according to claim 6, wherein the first numerical error judgment rule configuration unit (021) includes a plurality of percentage threshold range configuration subunits configured to judge whether a difference of the compared numerical values is within a preset percentage threshold range based on a selected single percentage threshold range configuration subunit or a highest priority percentage threshold range configuration subunit among the plurality of percentage threshold range configuration subunits.
8. A data calculation result file comparison system, comprising:
a data calculation result file acquisition interface (11) configured to acquire two data calculation result files to be compared;
a data calculation result file comparison module (12) configured to complete comparison of at least one set of data based on a preset number of comparison lines, and perform line-by-line comparison on any set of data, including:
A character unit dividing unit (1201) configured to divide each line of characters into individual character units according to a blank character;
A comparison rule judging unit (1202) configured to judge whether a comparison rule is preset, if yes, comparing according to the comparison rule configured by the data calculation result file comparison configuration system according to claim 6 or 7, if no, judging whether an english comma or an english period exists, if yes, ignoring the english comma or the english period, and comparing the values; for any one of the numerical value comparisons, if the numerical values of the comparisons are the same, the character units currently segmented are considered to be the same, and if the numerical values of the comparisons are different, the character units currently segmented are considered to be different.
9. The data calculation result file comparison system according to claim 8, wherein the data calculation result file comparison module (12) is further configured to determine the number of lines of the data to be compared in the next set, and if the data of at least one file cannot meet the preset comparison line number requirement and the number of lines of the data of the two files are inconsistent in the next set of data, the second type comparison rule is not executed in the data comparison.
CN202410261953.7A 2024-03-07 2024-03-07 Data calculation result file comparison method, system and comparison configuration system Active CN117852521B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410261953.7A CN117852521B (en) 2024-03-07 2024-03-07 Data calculation result file comparison method, system and comparison configuration system

Publications (2)

Publication Number Publication Date
CN117852521A (en) 2024-04-09
CN117852521B (en) 2024-06-07

Family

ID=90533034

Country Status (1)

Country Link
CN (1) CN117852521B (en)

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013100572A1 (en) * 2011-12-27 2013-07-04 Macrogen Inc. Apparatus and method for managing genetic information
CN103365871A (en) * 2012-03-29 2013-10-23 北京恒安永通科技有限公司 Automatic generation method of rules
CN103970728A (en) * 2013-02-01 2014-08-06 中国银联股份有限公司 Comparison method and system for file
CN105653554A (en) * 2014-11-14 2016-06-08 卓望数码技术(深圳)有限公司 File data comparison method and system
CN105868217A (en) * 2015-01-22 2016-08-17 阿里巴巴集团控股有限公司 Automatic conversion method and system for numerical value
CN107729817A (en) * 2017-09-20 2018-02-23 成都准星云学科技有限公司 A kind of method that rule-based division identifies more candidate item confidence levels
CN109190092A (en) * 2018-08-15 2019-01-11 深圳平安综合金融服务有限公司上海分公司 The consistency checking method of separate sources file
CN109299471A (en) * 2018-11-05 2019-02-01 广州百田信息科技有限公司 A kind of method, apparatus and terminal of text matches
CN109408113A (en) * 2018-09-03 2019-03-01 平安普惠企业管理有限公司 A kind of code text processing method, system and terminal device
CN110162510A (en) * 2019-04-26 2019-08-23 平安普惠企业管理有限公司 Transcription comparison method, device, computer equipment and storage medium
CN111090788A (en) * 2019-12-03 2020-05-01 广州品唯软件有限公司 Json file comparison method and device, storage medium and computer equipment
CN111564817A (en) * 2020-06-15 2020-08-21 国网山东省电力公司电力科学研究院 Relay protection constant value comparison method and system based on file comparison tool
CN111752769A (en) * 2019-03-27 2020-10-09 北京奇虎科技有限公司 Cloud control rule testing method and device, computer storage medium and computing equipment
CN112947991A (en) * 2021-03-30 2021-06-11 建信金融科技有限责任公司 Method and device for acquiring version difference code file, computer equipment and medium
CN114489723A (en) * 2022-01-04 2022-05-13 武汉烽火技术服务有限公司 Universal configuration difference comparison method and device
CN115759055A (en) * 2022-11-16 2023-03-07 扬州大学 English place name proofreading method considering multi-dimensional character characteristics
CN115795560A (en) * 2022-11-11 2023-03-14 重庆傲雄在线信息技术有限公司 Method, device, equipment and medium for checking integrity of file across systems

Also Published As

Publication number Publication date
CN117852521A (en) 2024-04-09

Similar Documents

Publication Publication Date Title
Di Gregorio Using Nvivo for your literature review
US7823138B2 (en) Distributed testing for computing features
CN107358208B (en) PDF document structured information extraction method and device
CN111680634A (en) Document file processing method and device, computer equipment and storage medium
EP3495968A1 (en) Method and system for extraction of relevant sections from plurality of documents
CN111144079B (en) Method and device for intelligently acquiring learning resources, printer and storage medium
CN111241389A (en) Sensitive word filtering method and device based on matrix, electronic equipment and storage medium
US20080052619A1 (en) Spell Checking Documents with Marked Data Blocks
WO2020143301A1 (en) Training sample validity detection method, computer device, and computer non-volatile storage medium
CN112667802A (en) Service information input method, device, server and storage medium
CN110750984A (en) Command line character string processing method, terminal, device and readable storage medium
CN112783825B (en) Data archiving method, device, computer device and storage medium
CN111159975B (en) Display method and device
CN114861614A (en) Method and device for filling data, electronic equipment and medium
CN117114142B (en) AI-based data rule expression generation method, apparatus, device and medium
JP2004252881A (en) Text data correction method
CN117852521B (en) Data calculation result file comparison method, system and comparison configuration system
CN113177389A (en) Text processing method and device, electronic equipment and storage medium
CN105893614A (en) Information recommendation method and device and electronic equipment
CN115357689A (en) Data processing method, device and medium of distributed log and computer equipment
CN109725919A (en) Information processing method, device, equipment and storage medium
CN114925125A (en) Data processing method, device and system, electronic equipment and storage medium
WO2021117483A1 (en) Information processing device, information processing method, and program
CN112364640A (en) Entity noun linking method, device, computer equipment and storage medium
CN109189911A (en) Search method, device and terminal device for question-and-answer content

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant