CN107807972A - A test data consistency detection method - Google Patents

Info

Publication number
CN107807972A
Authority
CN
China
Prior art keywords
data
acquisition system
consistency
data acquisition
cell
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710975998.0A
Other languages
Chinese (zh)
Other versions
CN107807972B (en
Inventor
时鹏
仲华强
张明媚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology Beijing USTB
Priority to CN201710975998.0A
Publication of CN107807972A
Application granted
Publication of CN107807972B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity

Abstract

The present invention provides a test data consistency detection method that can detect inconsistent storage formats and numerical deviations in test data. The method includes: constructing a test data consistency theoretical system; obtaining the test data to be detected; quantifying the obtained test data in terms of storage-format consistency and numerical consistency, where the consistency degree is a quantity that measures the degree of consistency between two data objects, and the two data objects are two data units, a data unit in one data set and another data set, or two data sets; and determining, according to the test data consistency evaluation principles and the quantization result, the degree of consistency between two data units, between a data unit in one data set and another data set, and/or between two data sets. The present invention is applicable to the field of test data analysis and evaluation.

Description

A test data consistency detection method
Technical field
The present invention relates to the field of test data analysis and evaluation, and in particular to a test data consistency detection method.
Background
With the development of science and technology, accumulated test data has become increasingly large and heterogeneous. A main characteristic of test data is its inconsistency. Because data sources are very diverse, data types and storage formats take many different forms; and because of the nature of scientific experiments themselves, such as sample inhomogeneity, interference from external factors, and measurement-equipment error, the numerical results of tests also differ. This inconsistency of test data may have unpredictable consequences. On the one hand, inaccurate data mixed in with accurate data can distort calculations and thereby affect the evaluation of test results. On the other hand, rejecting data that lie within the normal deviation range wastes data resources and hinders the application of experimental projects. Therefore, to make better use of test data and avoid wasting test resources, consistency detection of test data has become a problem that urgently needs solving.
Most current research on data inconsistency targets distributed systems. Data consistency in distributed systems differs from the data consistency described in the present invention: it refers to the correct and complete logical relationships between related data in a relational database. For example, when users access the same database at the same time and use the same data, three situations can occur: lost updates, uncommitted dependencies, and inconsistent analysis. Because inconsistent data lead to contradictions, data consistency must be ensured; current research, however, still focuses on protocols for shared data. Deng Dong of Tsinghua University studied fault-tolerance techniques in big data processing; he summarized the objective phenomenon that numerical inconsistencies exist in some big data, and used widely applied sequence similarity functions and set similarity functions to tolerate data errors. The above work studies data inconsistency only at a macroscopic level; it neither classifies and analyzes the inconsistency phenomena nor establishes a complete data consistency detection method.
Summary of the invention
The technical problem to be solved by the present invention is to provide a test data consistency detection method that addresses the failure of the prior art to classify and analyze data inconsistency and to establish a complete data consistency detection method.
To solve the above technical problem, an embodiment of the present invention provides a test data consistency detection method, including:
constructing a test data consistency theoretical system according to the causes of test data inconsistency, where the causes of inconsistency include inconsistent storage formats and inconsistent numerical values, the test data is a data set, the data set consists of one or more data units, and the test data consistency theoretical system includes test data consistency evaluation principles;
obtaining the test data to be detected;
quantifying the obtained test data to be detected in terms of storage-format consistency and numerical consistency, where the consistency degree is a quantity that measures the degree of consistency between two data objects, and the two data objects are two data units, a data unit in one data set and another data set, or two data sets;
determining, according to the test data consistency evaluation principles and the quantization result, the degree of consistency between two data units, between a data unit in one data set and another data set, and/or between two data sets.
Further, a data unit describes, defines, or reflects an objective fact of a test using one or more atomic data;
each data unit consists of one or more data items.
Further, quantifying the degree of storage-format consistency includes:
determining the storage-format consistency quantized value of two data objects according to a predetermined correspondence of consistency degrees between storage formats.
Further, the storage formats include structured storage, semi-structured storage, and unstructured storage, where
structured storage includes SQLServer, Sybase, MySQL, and Oracle;
semi-structured storage includes XML and HTML;
unstructured storage includes text, documents, pictures, audio, and video.
Further, quantifying the degree of numerical consistency includes:
calculating the relative deviation $\delta$ between corresponding data items in two data units, where $\delta$ is expressed as:
$$\delta = \frac{|x_{ik} - x_{jk}|}{|x_{ik}|} \times 100\%$$
where $x_{ik}$ denotes a data item in the first data unit, and $x_{jk}$ denotes the data item in the second data unit that corresponds to $x_{ik}$ in the first data unit;
averaging the relative deviations calculated for all data items in the first data unit to obtain the average deviation $\bar{\delta}$ between the two data units, where $\bar{\delta}$ is expressed as:
$$\bar{\delta} = \frac{\delta_1 + \delta_2 + \cdots + \delta_n}{n}$$
where $n$ denotes the number of data items in the first data unit, and $\delta_n$ denotes the relative deviation corresponding to the $n$-th data item in the first data unit;
obtaining the numerical consistency quantized value of the two data units according to a predetermined mapping between the average deviation between data units and the numerical consistency quantized value.
Further, the test data consistency evaluation principles include: consistency evaluation principles between data units, consistency evaluation principles between a data unit and a data set, and consistency evaluation principles between data sets.
Further, the consistency evaluation principles between data units include complete consistency, strong consistency, and weak consistency between data units, where
complete consistency between data units means that the storage formats and the numerical values of the two data units are both identical;
strong consistency between data units means that the storage formats of the two data units differ and, after storage format conversion, the numerical values of the two data units are identical;
weak consistency between data units means that the storage formats of the two data units differ and, after storage format conversion, the average deviation between the two data units is within a preset first threshold range.
Further, the consistency evaluation principles between a data unit and a data set include complete consistency, strong consistency, and weak consistency between a data unit and a data set, where,
letting the data unit be data unit a and the data set be data set B,
complete consistency between the data unit and the data set means that data unit a and some data unit in data set B are identical in both storage format and numerical value;
strong consistency between the data unit and the data set means that the storage formats of data unit a and data set B differ and, after storage format conversion, data unit a and some data unit in data set B are numerically identical;
weak consistency between the data unit and the data set means that the storage formats of data unit a and data set B differ and, after storage format conversion, the average deviation between data unit a and some data unit in data set B is within a preset second threshold range.
Further, the consistency evaluation principles between data sets include complete consistency, strong consistency, and weak consistency between data sets, where,
letting the data sets be data set A and data set B, with $A = \{a_i\}$, $i = 1, 2, \ldots, m$, and $B = \{b_j\}$, $j = 1, 2, \ldots, n$, $m < n$,
complete consistency between the data sets means that every data unit $a_i$ in data set A has a corresponding completely consistent data unit in data set B;
strong consistency between the data sets means that every data unit $a_i$ in data set A has a numerically identical data unit $b_j$ in data set B;
weak consistency between the data sets means that data set A and data set B have the same storage format, and for every data unit $a_i$ in data set A a data unit $b_j$ with the smallest deviation from it can be found within the allowable numerical deviation; or that data set A and data set B have different storage formats and, after storage format conversion, for every data unit $a_i$ in data set A a data unit $b_j$ with the smallest deviation from it can be found within the allowable numerical deviation.
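The weak-consistency check between data sets can be sketched as follows. This is only an illustration under stated assumptions: the function names `average_deviation` and `match_weakly`, the representation of a data unit as a list of numbers, and the form of the relative deviation ($|x_{ik} - x_{jk}| / |x_{ik}|$) are assumptions, and both data sets are assumed to have already been converted to a common storage format.

```python
from typing import List, Optional, Sequence

Unit = Sequence[float]  # a data unit, modelled here as a list of numeric data items


def average_deviation(a: Unit, b: Unit) -> float:
    """Mean relative deviation between corresponding data items of two units.

    Assumes the relative deviation is |x - y| / |x| and that no reference
    value x in the first unit is zero.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("data units must be non-empty and of equal length")
    return sum(abs(x - y) / abs(x) for x, y in zip(a, b)) / len(a)


def match_weakly(set_a: Sequence[Unit], set_b: Sequence[Unit],
                 max_deviation: float) -> Optional[List[int]]:
    """For every unit in A, find the unit in B with the smallest deviation.

    Returns the list of matched indices into B, or None as soon as some
    unit in A has no match within the allowable deviation.
    """
    matches = []
    for a in set_a:
        deviations = [average_deviation(a, b) for b in set_b]
        best = min(range(len(set_b)), key=deviations.__getitem__)
        if deviations[best] > max_deviation:
            return None
        matches.append(best)
    return matches


set_a = [[10.0, 20.0], [30.0, 40.0]]
set_b = [[10.5, 20.0], [30.0, 41.0], [100.0, 200.0]]
print(match_weakly(set_a, set_b, max_deviation=0.10))
print(match_weakly(set_a, set_b, max_deviation=0.01))
```

With an allowable deviation of 10%, each unit of A is matched to its closest unit of B; tightening the allowance to 1% makes the first unit of A unmatched, so the sets are not weakly consistent under that setting.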
Further, determining, according to the test data consistency evaluation principles and the quantization result, the degree of consistency between two data units, between a data unit in one data set and another data set, and/or between two data sets includes:
determining the degree of consistency between two data units according to the consistency evaluation principles between data units and the quantization result, where the quantization result is a consistency vector composed of the storage-format consistency quantized value and the numerical consistency quantized value; and/or
determining the degree of consistency between a data unit in one data set and another data set according to the consistency evaluation principles between a data unit and a data set and the quantization result; and/or
determining the degree of consistency between two data sets according to the consistency evaluation principles between data sets and the quantization result.
The beneficial effects of the above technical solution of the present invention are as follows:
In the above solution, a test data consistency theoretical system is constructed according to the causes of test data inconsistency, where the causes of inconsistency include inconsistent storage formats and inconsistent numerical values, the test data is a data set, the data set consists of one or more data units, and the test data consistency theoretical system includes test data consistency evaluation principles; the test data to be detected is obtained; the obtained test data is quantified in terms of storage-format consistency and numerical consistency, where the consistency degree is a quantity that measures the degree of consistency between two data objects, and the two data objects are two data units, a data unit in one data set and another data set, or two data sets; and, according to the test data consistency evaluation principles and the quantization result, the degree of consistency is determined between two data units, between a data unit in one data set and another data set, and/or between two data sets. This provides an effective and practical detection scheme for inconsistent storage formats and numerical deviations in test data.
Brief description of the drawings
Fig. 1 is a first schematic flowchart of the test data consistency detection method provided by an embodiment of the present invention;
Fig. 2 is a second schematic flowchart of the test data consistency detection method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the predetermined correspondence of consistency degrees between storage formats provided by an embodiment of the present invention;
Fig. 4 shows the consistency-degree relationships between data units in different data sets provided by an embodiment of the present invention.
Detailed description
To make the technical problem to be solved by the present invention, its technical solution, and its advantages clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments.
To address the existing failure to classify and analyze data inconsistency and to establish a complete data consistency detection method, the present invention provides a test data consistency detection method.
As shown in Fig. 1, the test data consistency detection method provided by an embodiment of the present invention includes:
S101: constructing a test data consistency theoretical system according to the causes of test data inconsistency, where the causes of inconsistency include inconsistent storage formats and inconsistent numerical values, the test data is a data set, the data set consists of one or more data units, and the test data consistency theoretical system includes test data consistency evaluation principles;
S102: obtaining the test data to be detected;
S103: quantifying the obtained test data to be detected in terms of storage-format consistency and numerical consistency, where the consistency degree is a quantity that measures the degree of consistency between two data objects, and the two data objects are two data units, a data unit in one data set and another data set, or two data sets;
S104: determining, according to the test data consistency evaluation principles and the quantization result, the degree of consistency between two data units, between a data unit in one data set and another data set, and/or between two data sets.
In the test data consistency detection method of this embodiment, a test data consistency theoretical system is constructed according to the causes of test data inconsistency, where the causes of inconsistency include inconsistent storage formats and inconsistent numerical values, the test data is a data set, the data set consists of one or more data units, and the test data consistency theoretical system includes test data consistency evaluation principles; the test data to be detected is obtained; the obtained test data is quantified in terms of storage-format consistency and numerical consistency, where the consistency degree is a quantity that measures the degree of consistency between two data objects, and the two data objects are two data units, a data unit in one data set and another data set, or two data sets; and, according to the test data consistency evaluation principles and the quantization result, the degree of consistency is determined between two data units, between a data unit in one data set and another data set, and/or between two data sets. This provides an effective and practical detection scheme for inconsistent storage formats and numerical deviations in test data.
In this embodiment, the causes of test data inconsistency are obtained by analyzing test data, and a test data consistency theoretical system is constructed according to those causes, where the causes of inconsistency include inconsistent storage formats and inconsistent numerical values. As shown in Fig. 2, the test data consistency theoretical system includes: consistency-related concepts, a consistency vector quantization algorithm for evaluating the degree of consistency of test data, and test data consistency evaluation principles. The consistency-related concepts include atomic data, data unit, data set, consistency, and degree of consistency (consistency degree for short). The specific meaning of each concept is as follows:
Atomic data: used to independently describe one thing or phenomenon. Atomic data is the most basic valid unit of data in a data record and is the smallest identifiable unit (it cannot be divided further);
Data unit: describes, defines, or reflects an objective fact of a test using one or more atomic data. Each data unit consists of one or more data items and is indivisible in the physical sense;
Data set: a data set consists of one or more data units. Each distinct data source is a data set, as is a combination of different data extracted from one data source.
Consistency: means that correct and complete logical relationships exist between associated data. Inconsistency manifests as inconsistent storage formats and inconsistent data values.
Degree of consistency: measures the similarity of two data objects in terms of storage format and numerical value. The higher the consistency degree, the more the data have in common; the lower it is, the greater the difference between the data. The degree of consistency has three levels:
A) Complete consistency: the two data are identical in both storage format and numerical value;
B) Strong consistency: the two data are consistent in physical meaning, that is, after storage format conversion and semantic conflict resolution, the two data have no numerical deviation;
C) Weak consistency: similar to strong consistency, the two data may differ in storage format, but after storage format conversion a certain numerical deviation between the two data is allowed.
In this embodiment, the test data consistency theoretical system is built on the consistency-related concepts defined above and provides theoretical support for detecting test data inconsistency. Based on this system, the test data is quantified in terms of storage-format consistency and numerical consistency, and the degree of consistency of the test data is determined from the quantization result.
In this embodiment, according to the forms in which test data inconsistency manifests, the degree of consistency of test data can be evaluated in two respects, storage format and numerical value, where the degree of storage-format consistency represents the degree of association between the storage formats of two data objects.
In this embodiment, the consistency degree is a quantity that measures the degree of consistency of two data objects, with values in [0, 1]. The larger the consistency degree, the more the data have in common; the smaller it is, the greater the difference between the data. To assess data objects in both storage format and numerical value, a two-dimensional row vector $C = (c_v, c_f)$ represents the degree of consistency of test data, where $C$ (Consistency) is the consistency vector, $c_v$ is the quantized degree of numerical consistency, and $c_f$ is the quantized degree of storage-format consistency. To give a clear measurement standard, $c_v$ and $c_f$ in the consistency vector can each be specified to take values from 1 to 9, with larger values indicating a higher degree of consistency; the different levels of consistency are likewise defined by ranges of the consistency vector.
The consistency relationship between test data is determined by the value of the consistency vector (degree-of-consistency vector). The value of the consistency vector is determined by the consistency vector quantization algorithm, which covers two aspects:
A) Quantization of the storage-format degree of consistency: to match many different storage formats, value rules are defined for structured data, semi-structured data, and unstructured data, and the storage-format consistency quantized value of two data objects is determined according to these rules;
B) Quantization of the numerical degree of consistency: the average deviation between data units belonging to different data sets is calculated to determine the numerical consistency quantized value of two data objects (two data units, a data unit in one data set and another data set, or two data sets).
In this embodiment, $c_v$ and $c_f$ in the consistency vector are obtained as follows:
(1) Calculation steps for $c_v$:
A11: calculate the relative deviation $\delta$ between corresponding data items in two data units, where $\delta$ is expressed as:
$$\delta = \frac{|x_{ik} - x_{jk}|}{|x_{ik}|} \times 100\%$$
where $x_{ik}$ denotes a data item in the first data unit, and $x_{jk}$ denotes the data item in the second data unit that corresponds to $x_{ik}$ in the first data unit;
A12: average the relative deviations calculated for all data items in the first data unit to obtain the average deviation $\bar{\delta}$ between the two data units, where $\bar{\delta}$ is expressed as:
$$\bar{\delta} = \frac{\delta_1 + \delta_2 + \cdots + \delta_n}{n}$$
where $n$ denotes the number of data items in the first data unit, and $\delta_n$ denotes the relative deviation corresponding to the $n$-th data item in the first data unit;
A13: obtain the numerical consistency quantized value of the two data units according to the predetermined mapping between the average deviation between data units and the numerical consistency quantized value, as shown in Table 1; or determine the numerical consistency quantized value of the two data units according to a threshold set on $\bar{\delta}$.
Table 1: Mapping between the average deviation between data units and the degree of numerical consistency
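Steps A11 to A13 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented mapping: the relative deviation is assumed to be $|x_{ik} - x_{jk}| / |x_{ik}|$, and because the contents of Table 1 are not available in this text, the thresholds in `EXAMPLE_BOUNDS` are hypothetical placeholders that a user would replace with the actual table. All function and variable names are illustrative.

```python
from typing import Sequence, Tuple


def relative_deviation(x_ik: float, x_jk: float) -> float:
    """Step A11: relative deviation of x_jk with respect to x_ik (as a fraction)."""
    if x_ik == 0:
        raise ValueError("reference data item x_ik must be non-zero")
    return abs(x_ik - x_jk) / abs(x_ik)


def average_deviation(unit_i: Sequence[float], unit_j: Sequence[float]) -> float:
    """Step A12: mean of the relative deviations over all n data items."""
    if len(unit_i) != len(unit_j) or len(unit_i) == 0:
        raise ValueError("data units must be non-empty and of equal length")
    deltas = [relative_deviation(a, b) for a, b in zip(unit_i, unit_j)]
    return sum(deltas) / len(deltas)


# HYPOTHETICAL stand-in for Table 1: (upper bound on average deviation, c_v).
# These numbers are placeholders chosen only to make the example run.
EXAMPLE_BOUNDS: Sequence[Tuple[float, int]] = (
    (0.00, 9), (0.01, 8), (0.02, 7), (0.03, 6),
    (0.05, 5), (0.07, 4), (0.10, 3), (0.20, 2),
)


def value_consistency(avg_dev: float,
                      bounds: Sequence[Tuple[float, int]] = EXAMPLE_BOUNDS) -> int:
    """Step A13: map an average deviation to a c_v score between 1 and 9."""
    for upper, score in bounds:
        if avg_dev <= upper:
            return score
    return 1  # anything beyond the last bound gets the lowest score


avg = average_deviation([100.0, 200.0], [110.0, 200.0])
print(avg, value_consistency(avg))
```

Passing the mapping in as a parameter keeps the deviation calculation separate from the scoring table, so the placeholder thresholds can be swapped for real ones without touching the rest of the code.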
(2) Calculation method for $c_f$:
In this embodiment, $c_f$ is calculated from the storage format; when the storage formats are identical, $c_f$ is 9. The storage formats include, but are not limited to, structured storage, semi-structured storage, and unstructured storage, which correspond to structured data, semi-structured data, and unstructured data respectively. To match many different storage formats, value rules are defined for structured data, semi-structured data, and unstructured data.
In this embodiment, structured storage refers to storage in a relational database, including SQLServer, Sybase, MySQL, Oracle, and other mainstream databases; semi-structured storage includes XML, HTML, and other markup-based storage; unstructured storage includes text data, document data, pictures (image), audio, and video. Because unstructured data has a more complex structure, and to better express the consistency relationships between storage formats, structured storage is divided into 5 classes (SQLServer, Sybase, MySQL, Oracle, others), semi-structured storage into 3 classes (XML, HTML, others), and unstructured storage into 5 classes (text, document, image, audio, video). The 5 classes of unstructured data are further divided into 3 groups: text forms group A, document and image form group B, and audio and video form group C.
In this embodiment, the storage-format consistency quantized value of two data objects can be determined according to the predetermined correspondence of consistency degrees between storage formats, as shown in Fig. 3, for example:
when the storage formats of the 2 data units are the same element of the same group, $c_f = 9$;
when the storage formats of the 2 data units are different elements of the same group, $c_f = 8$;
when the storage format of one of the 2 data units is semi-structured data and the other belongs to group A or is structured data, $c_f = 7$;
when the storage format of one of the 2 data units is semi-structured data and the other belongs to group B, $c_f = 6$;
when the storage format of one of the 2 data units is semi-structured data and the other belongs to group C, $c_f = 5$;
when the storage formats of the 2 data units belong to different groups among the three groups of unstructured data, $c_f = 4$;
when the storage format of one of the 2 data units is structured data and the other belongs to group A, $c_f = 3$;
when the storage format of one of the 2 data units is structured data and the other belongs to group B, $c_f = 2$;
when the storage format of one of the 2 data units is structured data and the other belongs to group C, $c_f = 1$.
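The rules above can be expressed as a small lookup. The sketch below is one possible reading of them, not the actual content of Fig. 3: the group labels, the lowercase format names, and the labels `other_structured` and `other_semi` for the "others" classes are all illustrative assumptions.

```python
# Format groups as described in the text; the names are illustrative labels.
GROUPS = {
    "structured": {"sqlserver", "sybase", "mysql", "oracle", "other_structured"},
    "semi": {"xml", "html", "other_semi"},
    "A": {"text"},
    "B": {"document", "image"},
    "C": {"audio", "video"},
}

# c_f for two formats that fall into different groups (order does not matter).
PAIR_SCORES = {
    frozenset({"semi", "structured"}): 7,
    frozenset({"semi", "A"}): 7,
    frozenset({"semi", "B"}): 6,
    frozenset({"semi", "C"}): 5,
    frozenset({"A", "B"}): 4,
    frozenset({"A", "C"}): 4,
    frozenset({"B", "C"}): 4,
    frozenset({"structured", "A"}): 3,
    frozenset({"structured", "B"}): 2,
    frozenset({"structured", "C"}): 1,
}


def group_of(fmt: str) -> str:
    """Return the group label that a storage format belongs to."""
    for name, members in GROUPS.items():
        if fmt in members:
            return name
    raise ValueError(f"unknown storage format: {fmt!r}")


def storage_consistency(fmt1: str, fmt2: str) -> int:
    """Return c_f (1 to 9) for two storage formats, following the listed rules."""
    f1, f2 = fmt1.lower(), fmt2.lower()
    g1, g2 = group_of(f1), group_of(f2)
    if f1 == f2:
        return 9          # same element of the same group
    if g1 == g2:
        return 8          # different elements of the same group
    return PAIR_SCORES[frozenset({g1, g2})]


print(storage_consistency("MySQL", "Oracle"))
print(storage_consistency("XML", "video"))
```

Keying the cross-group table on a `frozenset` makes the lookup symmetric, so the score does not depend on which data unit is passed first.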
In this embodiment, analysis of the causes of test data inconsistency shows that inconsistent test data results from two factors: different data storage formats and deviations in numerical values. Because the case of atomic data is relatively simple and easy to distinguish, this embodiment mainly studies the consistency of data units and data sets.
Test data (data unit and data set) consistency means that the storage formats and numerical values of two groups of test data are exactly the same. Because data that are completely consistent in both respects are very rare in actual tests (they probably come from the same data source), degrees of data unit consistency need to be defined so that data with a suitable degree of consistency can be selected according to practical application requirements.
In this embodiment, as an optional embodiment, the test data consistency evaluation principles include: consistency evaluation principles between data units, consistency evaluation principles between a data unit and a data set, and consistency evaluation principles between data sets.
In this embodiment, the consistency between data units, between a data unit and a data set, and between data sets can each be evaluated according to the corresponding principles among the test data consistency evaluation principles, thereby determining the degree of consistency between two data units, between a data unit in one data set and another data set, and between two data sets.
In this embodiment, the degree of consistency of test data is divided, by strength, into complete consistency, strong consistency, and weak consistency.
In this embodiment, the consistency evaluation principles between data units, between a data unit and a data set, and between data sets are described in detail below:
(1) Consistency evaluation principle between data units
Complete consistency between data units: the two data units are identical, that is, both their storage format and their data values are the same. By the definition of the consistency degree, two data units are completely consistent only when $c_v = 9$ and $c_f = 9$, in which case the consistency degree vector value is $C = 99$.
Strong consistency between data units: regardless of whether the storage formats of the two data units are the same, the two units express the same physical meaning, that is, after storage format conversion their numerical values are identical. By the definition of the consistency degree, two data units are strongly consistent when $c_v = 9$ and $c_f \in [1, 9)$, in which case $C \in [91, 99)$.
Weak consistency between data units: as with strong consistency, the storage formats of the two data units may differ. After storage format conversion, the average deviation between the two data units lies within a preset first threshold range, that is, a certain numerical deviation between the two data units is permitted. Two data units are weakly consistent when $c_v \in [1, 9)$ and $c_f \in [1, 9]$, in which case $C \in [11, 91)$.
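The three unit-level rules above can be sketched as a small lookup on the two quantized values. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the range 0..9 for both values is an assumption, and reading $C$ as a two-digit number with $c_v$ as the tens digit and $c_f$ as the units digit is inferred from the stated ranges ($C = 99$, $[91, 99)$, $[11, 91)$).

```python
def consistency_vector_value(c_v: int, c_f: int) -> int:
    """Encode (c_v, c_f) as a two-digit number.

    Assumption: c_v is the tens digit and c_f the units digit, which is
    inferred from the ranges C = 99, [91, 99) and [11, 91) in the text.
    """
    return 10 * c_v + c_f


def classify_units(c_v: int, c_f: int) -> str:
    """Return the consistency level of two data units (hypothetical helper)."""
    # Assumption: both quantized values are integers in 0..9.
    if not (0 <= c_v <= 9 and 0 <= c_f <= 9):
        raise ValueError("c_v and c_f are expected to lie in 0..9")
    if c_v == 9 and c_f == 9:
        return "complete"      # same format, same values
    if c_v == 9 and 1 <= c_f < 9:
        return "strong"        # same values after format conversion
    if 1 <= c_v < 9 and 1 <= c_f <= 9:
        return "weak"          # values within the permitted deviation
    return "inconsistent"      # none of the three categories applies


if __name__ == "__main__":
    for pair in [(9, 9), (9, 5), (3, 9), (0, 9)]:
        print(pair, consistency_vector_value(*pair), classify_units(*pair))
```

Anything that matches none of the three rules is reported as inconsistent, in line with the later statement that data outside these categories is inconsistent test data.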
(2) Consistency evaluation principle between a data unit and a data set
The relationship between a data unit and a data set is defined on the basis of the relationship between data units, and the degree of consistency is likewise divided into complete consistency, strong consistency and weak consistency. For ease of formal definition, the following uses a data unit a and a data set $B = \{b_i\}$ ($i = 1, 2, \ldots, n$) as an example; the specific definitions are as follows.
Complete consistency between a data unit and a data set: data unit a is identical to some data unit in data set B in both storage format and data value, i.e., $c_v = 9$, $c_f = 9$.
Strong consistency between a data unit and a data set: the storage formats of data unit a and data set B may differ, but after storage format conversion a has the same value as some data unit in B, i.e., $c_v = 9$, $1 \le c_f < 9$.
Weak consistency between a data unit and a data set: as with strong consistency, the storage formats of data unit a and data set B may differ, but after storage format conversion the average deviation between data unit a and some data unit in data set B lies within a preset second threshold range (for example, greater than 0 and less than or equal to 10%), i.e., $1 \le c_v < 9$, $1 \le c_f \le 9$. For example, if the average deviation $\bar{\delta}$ between data unit a and some data unit in B is greater than 0 and less than or equal to 10%, the consistency relationship between data unit a and data set B is weak consistency.
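As a rough sketch of the unit-to-set rule, the code below compares one data unit against every unit of a set and classifies the best match. Several things here are assumptions rather than the patent's definitions: the relative deviation is taken as $|x - y| / |x|$ with the first unit as reference, "closest" is taken to mean the smallest mean relative deviation, the values are assumed to be already converted to a common format, and the 10% threshold is only the example value mentioned in the text. All names are hypothetical.

```python
from typing import Sequence, Tuple


def mean_relative_deviation(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean of |x_k - y_k| / |x_k| over corresponding data items (assumed formula)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("data units must have the same, non-zero number of items")
    total = 0.0
    for xk, yk in zip(x, y):
        if xk == 0:
            raise ValueError("reference data item must be non-zero")
        total += abs(xk - yk) / abs(xk)
    return total / len(x)


def match_unit_to_set(
    a: Sequence[float],
    B: Sequence[Sequence[float]],
    same_format: bool,
    threshold: float = 0.10,  # example threshold from the text, not a fixed rule
) -> Tuple[int, float, str]:
    """Return (index of the closest unit in B, its deviation, consistency level)."""
    if len(B) == 0:
        raise ValueError("data set B must not be empty")
    deviations = [mean_relative_deviation(a, b) for b in B]
    best = min(range(len(B)), key=deviations.__getitem__)
    d = deviations[best]
    if d == 0:
        level = "complete" if same_format else "strong"
    elif d <= threshold:
        level = "weak"
    else:
        level = "inconsistent"
    return best, d, level


if __name__ == "__main__":
    # Toy values chosen for illustration only.
    print(match_unit_to_set((100.0, 10.0), [(100.0, 20.0), (100.0, 10.5)], same_format=True))
```

The sketch works directly on deviations instead of the quantized value $c_v$, because the mapping from average deviation to $c_v$ is defined elsewhere in the document.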
(3) Consistency evaluation principle between data sets
The consistency relationship between two data sets is defined on the basis of the consistency relationship between a data unit and a data set, and is likewise divided into complete consistency, strong consistency and weak consistency. For ease of formal definition, the following definitions use data set $A = \{a_i\}$ ($i = 1, 2, \ldots, m$) and data set $B = \{b_j\}$ ($j = 1, 2, \ldots, n$), with $m < n$, as an example.
Complete consistency between data sets: data set A and data set B have the same storage format and $A \subseteq B$. In short, every data unit $a_i$ in data set A has a corresponding completely consistent data unit in data set B, and the consistency degree vector value is $C = 99$. Complete consistency between data sets is a strict requirement: if even one data unit in A has no counterpart in B that is identical in storage format and value, the two data sets are judged not to be completely consistent.
Strong consistency between data sets: the data units $a_i$ in data set A and the data units $b_j$ in data set B are in a strong consistency relationship. In short, data set A and data set B have different storage formats, but after format conversion every data unit $a_i$ in data set A has a numerically identical data unit $b_j$ in data set B, and the consistency degree vector value is $C \in [91, 99)$.
Weak consistency between data sets: exactly two cases are covered. In the first, data set A and data set B have the same storage format, and every data unit $a_i$ in data set A can find, within the permitted numerical deviation (the mean relative deviation $\bar{\delta}$ of the two data units is greater than 0 and within the preset threshold), the data unit $b_j$ ($b_j \in B$) with the smallest deviation from it. In the second, data set A and data set B have different storage formats, but after format conversion every data unit $a_i$ in data set A can find, within the same permitted numerical deviation, the data unit $b_j$ ($b_j \in B$) with the smallest deviation from it. In both cases the consistency degree vector value is $C \in [11, 91)$.
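One way to read the set-level definitions is that the consistency of A against B is the weakest level reached by any unit of A when matched against B. The sketch below encodes that reading; it is an interpretation offered for illustration, and the names `LEVELS` and `classify_sets` are hypothetical.

```python
from typing import Iterable

# Ordered from weakest to strongest (assumed ordering of the four outcomes).
LEVELS = ("inconsistent", "weak", "strong", "complete")


def classify_sets(unit_levels: Iterable[str]) -> str:
    """Combine per-unit levels (each a_i matched against B) into a set-level result.

    Assumption: every unit of A must reach a level for the set to reach it,
    so the set-level result is the weakest per-unit level.
    """
    levels = list(unit_levels)
    if not levels:
        raise ValueError("at least one data unit is required")
    for level in levels:
        if level not in LEVELS:
            raise ValueError(f"unknown consistency level: {level!r}")
    return min(levels, key=LEVELS.index)


if __name__ == "__main__":
    print(classify_sets(["complete", "complete"]))                        # complete
    print(classify_sets(["weak", "weak", "inconsistent", "inconsistent"]))  # inconsistent
```

Under this reading a single inconsistent unit in A is enough to make the two data sets inconsistent, which matches the strictness described for complete consistency.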
Further, in the foregoing embodiment of the test data consistency detection method, determining, according to the test data consistency evaluation principles and the quantized result, the degree of consistency between two data units, between a data unit in one data set and another data set, and/or between two data sets includes:
determining the degree of consistency between two data units according to the consistency evaluation principle between data units and the quantized result, where the quantized result is the consistency degree vector composed of the storage format consistency quantized value and the numerical consistency quantized value; and/or
determining the degree of consistency between a data unit in one data set and another data set according to the consistency evaluation principle between a data unit and a data set and the quantized result; and/or
determining the degree of consistency between two data sets according to the consistency evaluation principle between data sets and the quantized result.
In this embodiment, test data that does not fall into any of the above consistency categories is classed as inconsistent test data.
For a better understanding of the test data consistency detection method of this embodiment, the method is further described below with reference to a specific example:
In this embodiment, the test data used may be creep data for a metal material. Creep is the phenomenon in which the strain of a solid material keeps increasing over time while temperature and stress are held constant. Creep data are strongly affected by external factors, and many causes lead to numerical inconsistency. The data in Table 2 are creep data for T91 steel at 650 °C collected from different data sources, comprising two data sets, A and B. Data set A is stored in a SQL Server database and data set B in a MySQL database. Data set A consists of four data units {a1, a2, a3, a4}, and data set B consists of eight data units {b1, b2, b3, b4, b5, b6, b7, b8}.
Table 2. Creep data of T91 steel
Data set A
Unit   Stress (MPa)   Rupture time (h)   Temperature (°C)
a1     160            16.2               650
a2     160            65                 650
a3     140            115                650
a4     120            200                650

Data set B
Unit   Stress (MPa)   Rupture time (h)   Temperature (°C)
b1     160            21                 650
b2     160            29.9               650
b3     160            35.4               650
b4     160            80                 650
b5     150            60.3               650
b6     150            60.3               650
b7     100            686                650
b8     90             3570               650
Data set A is stored in SQL Server database format and data set B in MySQL database format. According to the quantization rule for $c_f$ in the consistency degree vector, $c_f = 9$ between data set A and data set B, so the storage format falls into the complete consistency category.
To show more clearly the numerical ($c_v$) consistency relationships between data units, between a data unit and a data set, and between data sets, the creep data in the two data sets are fitted.
As Fig. 4 shows, taking data set B as an example, the consistency between its data units is as follows:
In Fig. 4, data units b5 and b6 coincide exactly, which shows that the two data units are numerically identical. Since b5 and b6 also have the same storage format, they satisfy the complete consistency requirement under the consistency evaluation principle between data units;
In Fig. 4, there is a certain distance between data units b2 and b3. The size of this distance is the deviation between the two data units, which gives $c_v = 3$. Since b2 and b3 have the same storage format, $c_f = 9$. Under the consistency evaluation principle between data units, b2 and b3 satisfy the weak consistency requirement;
In Fig. 4, data units b7 and b8 lie on opposite sides of the fitted curve, and the numerical gap between them is large (above 10%), so they do not meet any data consistency requirement. Data units b7 and b8 are therefore inconsistent data units.
As Fig. 4 shows, the consistency between the data units in data set A and data set B is as follows:
In Fig. 4, data unit a1 in data set A is closest to b1 in data set B, and their mean relative deviation corresponds to $c_v = 1$. Since $c_f = 9$ between data set A and data set B, data unit a1 and data set B satisfy the weak consistency requirement;
In Fig. 4, data unit a2 in data set A is closest to b4 in data set B, and their mean relative deviation corresponds to $c_v = 2$. Since $c_f = 9$ between data set A and data set B, data unit a2 and data set B satisfy the weak consistency requirement;
In Fig. 4, data unit a3 in data set A is closest to b5 in data set B, and their mean relative deviation corresponds to $c_v = 0$, so data unit a3 and data set B do not meet any consistency requirement;
In Fig. 4, data unit a4 in data set A is closest to b7 in data set B, and their mean relative deviation corresponds to $c_v = 0$, so data unit a4 and data set B do not meet any consistency requirement.
By the definition of consistency between two data sets, data set A and data set B do not meet any consistency requirement, so they are inconsistent data sets.
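The pairs discussed in this example can be checked numerically with a short script over the Table 2 values. This is a sketch under stated assumptions: the relative deviation is taken as $|x - y| / |x|$ with the first unit of each pair as reference, the threshold is the 10% example value, and the pairings are copied from the text rather than recomputed. The script does not reproduce the $c_v$ values, because the mapping from average deviation to $c_v$ is defined elsewhere in the document.

```python
from typing import Dict, Sequence, Tuple

THRESHOLD = 0.10  # example threshold from the text (assumed to apply here)

# Table 2 values: (stress in MPa, rupture time in h, temperature in degrees C).
A: Dict[str, Tuple[float, float, float]] = {
    "a1": (160, 16.2, 650), "a2": (160, 65, 650),
    "a3": (140, 115, 650), "a4": (120, 200, 650),
}
B: Dict[str, Tuple[float, float, float]] = {
    "b1": (160, 21, 650), "b2": (160, 29.9, 650),
    "b3": (160, 35.4, 650), "b4": (160, 80, 650),
    "b5": (150, 60.3, 650), "b6": (150, 60.3, 650),
    "b7": (100, 686, 650), "b8": (90, 3570, 650),
}

# Pairs named in the text (pairing taken as given, not recomputed).
PAIRS = [("b5", "b6"), ("b2", "b3"), ("b7", "b8"),
         ("a1", "b1"), ("a2", "b4"), ("a3", "b5"), ("a4", "b7")]


def mean_relative_deviation(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean of |x_k - y_k| / |x_k| over corresponding data items (assumed formula)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("data units must have the same, non-zero number of items")
    return sum(abs(xk - yk) / abs(xk) for xk, yk in zip(x, y)) / len(x)


def status(deviation: float) -> str:
    """Describe a deviation relative to the example threshold."""
    if deviation == 0:
        return "identical"
    return "within threshold" if deviation <= THRESHOLD else "outside threshold"


units = {**A, **B}
results = {(u, v): mean_relative_deviation(units[u], units[v]) for u, v in PAIRS}

if __name__ == "__main__":
    for (u, v), d in results.items():
        print(f"{u} vs {v}: mean relative deviation = {d:.4f} ({status(d)})")
```

With these assumptions, each of the seven pairs falls on the same side of the 10% threshold as the conclusion stated for it in the text: b5/b6 are identical, b2/b3, a1/b1 and a2/b4 are within the threshold, and b7/b8, a3/b5 and a4/b7 are outside it.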
In summary, consistency detection on actual test case data shows that applying the test data consistency theoretical system to test data not only detects abnormal data but also accurately determines the specific degree of consistency between two data units, between a data unit in one data set and another data set, and between two data sets. This demonstrates that the test data consistency detection method of this embodiment has good practicality, can provide a reliable basis for the quality evaluation of test data, and ensures rigor and accuracy in test data processing.
The present invention provides a detection system and method for test data consistency. By analyzing the causes of inconsistency in test data, a theoretical system of test data consistency is constructed, comprising data form and consistency definitions, a quantization algorithm for the consistency degree, and consistency evaluation principles. Based on this theoretical system and the causes of test data inconsistency, a specific test data consistency detection method is proposed. Consistency detection on actual test case data verifies that the present invention has good practicality, provides a reliable basis for the quality evaluation of test data, and ensures rigor and accuracy in test data processing.
It should be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations.
The above is the preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make improvements and modifications without departing from the principle of the present invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the present invention.

Claims (10)

  1. A test data consistency detection method, characterized by comprising:
    constructing a test data consistency theoretical system according to the causes of test data inconsistency, wherein the causes of data inconsistency include inconsistent storage format and inconsistent numerical values, the test data is a data set, the data set consists of one or more data units, and the test data consistency theoretical system includes test data consistency evaluation principles;
    acquiring test data to be detected;
    quantifying the consistency degree of the acquired test data to be detected in terms of storage format and numerical value, wherein the consistency degree is a measure of the degree of consistency between two data objects, the two data objects being two data units, a data unit in one data set and another data set, or two data sets;
    determining, according to the test data consistency evaluation principles and the quantized result, the degree of consistency between two data units, between a data unit in one data set and another data set, and/or between two data sets.
  2. The test data consistency detection method according to claim 1, characterized in that the data unit uses one or more atomic data to describe, define or reflect an objective fact of the test;
    each data unit consists of one or more data items.
  3. The test data consistency detection method according to claim 1, characterized in that quantifying the degree of consistency of the storage format comprises:
    determining the consistency quantized value of the two data objects in terms of storage format according to a predetermined correspondence of consistency degrees between storage formats.
  4. The test data consistency detection method according to claim 3, characterized in that the storage format includes: structured storage, semi-structured storage and unstructured storage; wherein,
    the structured storage includes: SQL Server, Sybase, MySQL, Oracle;
    the semi-structured storage includes: XML, HTML;
    the unstructured storage includes: text, document, picture, audio, video.
  5. The test data consistency detection method according to claim 3, characterized in that quantifying the degree of consistency of the numerical values comprises:
    calculating the relative deviation $\delta$ between corresponding data items in the two data units,
    wherein, in the expression for $\delta$, $x_{ik}$ denotes a data item in the first data unit and $x_{jk}$ denotes the data item in the second data unit that corresponds to $x_{ik}$ in the first data unit;
    after the relative deviation has been calculated for all data items in the first data unit, averaging the relative deviations to obtain the average deviation $\bar{\delta}$ between the two data units, where $\bar{\delta} = (\delta_1 + \delta_2 + \cdots + \delta_n) / n$,
    wherein n denotes the number of data items in the first data unit and $\delta_n$ denotes the relative deviation corresponding to the n-th data item in the first data unit;
    obtaining the numerical consistency quantized value of the two data units according to a predetermined mapping between the average deviation between data units and the numerical consistency quantized value.
  6. The test data consistency detection method according to claim 1, characterized in that the test data consistency evaluation principles include: the consistency evaluation principle between data units, the consistency evaluation principle between a data unit and a data set, and the consistency evaluation principle between data sets.
  7. The test data consistency detection method according to claim 6, characterized in that the consistency evaluation principle between data units includes: complete consistency between data units, strong consistency between data units, and weak consistency between data units; wherein,
    complete consistency between data units means that the storage format and the numerical values of the two data units are both identical;
    strong consistency between data units means that the storage formats of the two data units differ and, after storage format conversion, the numerical values of the two data units are identical;
    weak consistency between data units means that the storage formats of the two data units differ and, after storage format conversion, the average deviation between the two data units lies within a preset first threshold range.
  8. The test data consistency detection method according to claim 6, characterized in that the consistency evaluation principle between a data unit and a data set includes: complete consistency between a data unit and a data set, strong consistency between a data unit and a data set, and weak consistency between a data unit and a data set; wherein,
    the data unit is denoted data unit a and the data set is denoted data set B, and
    complete consistency between the data unit and the data set means that data unit a and some data unit in data set B are identical in both storage format and numerical value;
    strong consistency between the data unit and the data set means that the storage formats of data unit a and data set B differ and, after storage format conversion, data unit a and some data unit in data set B are numerically identical;
    weak consistency between the data unit and the data set means that the storage formats of data unit a and data set B differ and, after storage format conversion, the average deviation between data unit a and some data unit in data set B lies within a preset second threshold range.
  9. The test data consistency detection method according to claim 6, characterized in that the consistency evaluation principle between data sets includes: complete consistency between data sets, strong consistency between data sets, and weak consistency between data sets; wherein,
    the data sets include data set A and data set B, with data set $A = \{a_i\}$, $i = 1, 2, \ldots, m$, and data set $B = \{b_j\}$, $j = 1, 2, \ldots, n$, $m < n$;
    complete consistency between the data sets means that every data unit $a_i$ in data set A has a corresponding completely consistent data unit in data set B;
    strong consistency between the data sets means that every data unit $a_i$ in data set A has a numerically identical data unit $b_j$ in data set B;
    weak consistency between the data sets means that data set A and data set B have the same storage format and every data unit $a_i$ in data set A can find, within the permitted numerical deviation, the data unit $b_j$ with the smallest deviation from it; or that data set A and data set B have different storage formats and, after storage format conversion, every data unit $a_i$ in data set A can find, within the permitted numerical deviation, the data unit $b_j$ with the smallest deviation from it.
  10. The test data consistency detection method according to claim 6, characterized in that determining, according to the test data consistency evaluation principles and the quantized result, the degree of consistency between two data units, between a data unit in one data set and another data set, and/or between two data sets includes:
    determining the degree of consistency between two data units according to the consistency evaluation principle between data units and the quantized result, wherein the quantized result is the consistency degree vector composed of the storage format consistency quantized value and the numerical consistency quantized value; and/or
    determining the degree of consistency between a data unit in one data set and another data set according to the consistency evaluation principle between a data unit and a data set and the quantized result; and/or
    determining the degree of consistency between two data sets according to the consistency evaluation principle between data sets and the quantized result.
CN201710975998.0A 2017-10-19 2017-10-19 Test data consistency detection method Active CN107807972B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710975998.0A CN107807972B (en) 2017-10-19 2017-10-19 Test data consistency detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710975998.0A CN107807972B (en) 2017-10-19 2017-10-19 Test data consistency detection method

Publications (2)

Publication Number Publication Date
CN107807972A true CN107807972A (en) 2018-03-16
CN107807972B CN107807972B (en) 2020-12-22

Family

ID=61585130

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710975998.0A Active CN107807972B (en) 2017-10-19 2017-10-19 Test data consistency detection method

Country Status (1)

Country Link
CN (1) CN107807972B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012130489A1 (en) * 2011-04-01 2012-10-04 Siemens Aktiengesellschaft Method, system, and computer program product for maintaining data consistency between two databases
CN103559330A (en) * 2013-10-10 2014-02-05 上海华为技术有限公司 Method and system for detecting data consistency
CN104252664A (en) * 2014-09-16 2014-12-31 国家海洋信息中心 Ocean environment monitoring data reporting verification realization method and device
CN106092137A (en) * 2016-06-06 2016-11-09 长安大学 The outdoor calibrator (-ter) unit of a kind of vehicle-mounted three-dimensional laser pavement detection system and method
CN106294294A (en) * 2016-08-03 2017-01-04 上海自仪泰雷兹交通自动化系统有限公司 The consistency desired result method of rail traffic signal system consolidation form data file
CN106469195A (en) * 2016-08-31 2017-03-01 国信优易数据有限公司 Based on conforming data file Valuation Method and system
CN106469395A (en) * 2016-08-31 2017-03-01 国信优易数据有限公司 A kind of data commodity dynamic comprehensive appraisal procedure and system
CN106874483A (en) * 2017-02-20 2017-06-20 山东鲁能软件技术有限公司 A kind of device and method of the patterned quality of data evaluation and test based on big data technology

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
余伟等: "Web大数据环境下的不一致跨源数据发现", 《计算机研究与发展》 *

Also Published As

Publication number Publication date
CN107807972B (en) 2020-12-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant