CN102609418B

CN102609418B - Data quality grade judging method

Info

Publication number: CN102609418B
Application number: CN201110023938.1A
Authority: CN
Inventors: 齐志英
Original assignee: BEIJINGDUXIU TECHNOLOGY Co Ltd
Current assignee: BEIJINGDUXIU TECHNOLOGY Co Ltd
Priority date: 2011-01-21
Filing date: 2011-01-21
Publication date: 2015-02-04
Anticipated expiration: 2031-01-21
Also published as: CN102609418A

Abstract

The invention provides a data quality grade judging method comprising the following steps: sending an acquired target data group to a server; extracting a feature vector from each target data item in the group and performing character conversion on it to obtain the character set data corresponding to each feature vector; checking the character set data against grade standards stored on the server to obtain a quality grade code for each feature vector; and returning each target data item together with its quality grade code to the user. Because the quality of the target data group is checked against grade standards stored on the server, the quality grade codes can be obtained and returned to the user quickly and accurately, and the user can then process each target data item according to its quality grade code.

Description

Data quality grade judging method

Technical field

The present invention relates to a method for judging data quality, and in particular to a configurable, algorithm-based method for judging data quality grades.

Background technology

With the development of information technology, a large amount of information has emerged. In practice, when users retrieve or enter data, they often find that some records have incomplete attributes or contain erroneous information, such as homophone or near-homophone errors, confusion between characters that look alike but differ in meaning, and non-standard use of special symbols. Such incomplete and erroneous information confuses users, fails to meet their information needs, and lowers the accuracy of retrieval or entry, which impairs their work. It is therefore particularly important to judge the quality of retrieval queries and entered data, that is, to check the quality of each query or entry and obtain a quality grade code for it, so that users can modify and improve the data of each quality grade in a targeted way.

In view of the above defects, the inventor arrived at the present invention after long-term research and practice.

Summary of the invention

The object of the invention is to provide a data quality grade judging method that addresses the problems arising in data retrieval and data entry.

To achieve this object, the invention adopts the following technical solution: a data quality grade judging method comprising the following steps:

Sending the acquired target data group to a server;

Extracting the feature vector of each target data item in the target data group, and performing character conversion on each feature vector to obtain the corresponding character set data;

Checking each piece of character set data against the grade standards stored on the server to obtain the quality grade code of each feature vector; and

Returning each target data item and its corresponding quality grade code to the user.
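The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each target data item is assumed to be a dict of attribute names to values, each attribute is treated as one feature, character conversion is assumed to be simple string normalization, and each grade standard is assumed to be a function returning an integer grade code (1 = best). The names `to_character_set` and `judge_quality` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumed data model: a target data item maps attribute names to values,
# and a grade standard maps character set data to an integer grade code.
Record = Dict[str, Optional[str]]
GradeStandard = Callable[[str], int]


def to_character_set(value: Optional[str]) -> str:
    """Character conversion (assumed): turn a raw attribute value into a trimmed string."""
    return "" if value is None else str(value).strip()


def judge_quality(group: List[Record],
                  standards: Dict[str, GradeStandard]) -> List[Tuple[Record, Dict[str, int]]]:
    """Return each target data item together with a grade code per checked attribute."""
    results = []
    for record in group:  # the group is assumed to have already reached the server
        codes: Dict[str, int] = {}
        for attribute, value in record.items():  # each attribute is one feature
            standard = standards.get(attribute)
            if standard is not None:  # attributes without a standard are not graded
                codes[attribute] = standard(to_character_set(value))
        results.append((record, codes))  # returned to the user with its codes
    return results
```

For example, with a hypothetical year standard `lambda s: 1 if len(s) == 4 and s.isdigit() else 2`, the record `{"year": " 2011 "}` is returned with the codes `{"year": 1}`.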

The feature vector consists of the attribute information of the target data. If the value of an attribute is empty, the target data item to which that feature vector belongs is incomplete.
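The completeness rule can be illustrated with a short sketch. It assumes that a target data item is a dict, that its feature vector is the list of its values for a fixed list of attribute names, and that a whitespace-only value counts as empty; the function names are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def extract_feature_vector(record: Record, attributes: List[str]) -> List[Optional[str]]:
    """Collect the record's attribute values in a fixed attribute order."""
    return [record.get(name) for name in attributes]


def missing_attributes(record: Record, attributes: List[str]) -> List[str]:
    """Names of attributes whose value is absent, None, or blank."""
    vector = extract_feature_vector(record, attributes)
    return [name for name, value in zip(attributes, vector)
            if value is None or str(value).strip() == ""]


def is_complete(record: Record, attributes: List[str]) -> bool:
    """A record is complete only if no attribute in its feature vector is empty."""
    return not missing_attributes(record, attributes)
```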

A grade standard is a structured character set used to judge the quality grade of the target data group. Different types of feature vector correspond to different grade standards: for example, a feature vector representing a time corresponds to a time grade standard, and a feature vector representing a title corresponds to a title grade standard. The grade standards are not limited to these examples.
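The patent does not say how a structured character set is encoded. Purely as an assumption, the sketch below represents each grade standard as an ordered list of (grade code, regular expression) pairs, with one standard for time features and one for title features, and returns the first grade whose pattern matches the whole string. `GRADE_STANDARDS`, `apply_standard`, and the patterns themselves are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical grade standards: feature type -> ordered (grade code, pattern) pairs.
# Grade 1 is the best; anything matching no pattern gets LOWEST_GRADE.
GRADE_STANDARDS: Dict[str, List[Tuple[int, str]]] = {
    "time": [
        (1, r"\d{4}-\d{2}-\d{2}"),  # full date, e.g. 2011-01-21
        (2, r"\d{4}(-\d{2})?"),     # year, or year and month, only
    ],
    "title": [
        (1, r"[\w ]+"),             # word characters and spaces only
        (2, r"[\w .,:;'\-()]+"),    # common punctuation also allowed
    ],
}
LOWEST_GRADE = 3


def apply_standard(feature_type: str, chars: str) -> int:
    """Return the first grade whose pattern matches the whole character set data."""
    for grade, pattern in GRADE_STANDARDS.get(feature_type, []):
        if re.fullmatch(pattern, chars):
            return grade
    return LOWEST_GRADE
```

With these patterns, `"2011-01-21"` gets time grade 1, `"2011"` gets grade 2, and `"21/01/2011"` falls to grade 3.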

During enforcement, the present invention is further comprising the steps of: the class criteria of specifying inspection according to the proper vector of each described target data group in advance, to improve speed and the accuracy of data quality grade inspection.

During enforcement, the present invention is further comprising the steps of: if not according to the proper vector specified level standard in advance of each described target data group, then be defaulted as default setting, the maximum class criteria of the automatic range of choice of described server is tested, to improve the versatility of character set data inspection.

In implementation, the quality grade code is set according to the degree of similarity between each piece of character set data and its grade standard.
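A minimal sketch of turning a similarity score into a grade code follows. The patent states only that the code depends on the similarity and that a complete (100%) match is the highest grade; the threshold values, the number of grades, and the function name are hypothetical.

```python
from typing import List, Tuple

# Hypothetical thresholds: (minimum similarity, grade code), checked from best to worst.
THRESHOLDS: List[Tuple[float, int]] = [(1.0, 1), (0.8, 2), (0.5, 3)]
LOWEST_GRADE = 4


def grade_from_similarity(similarity: float) -> int:
    """Map a similarity in [0, 1] to a grade code; 1.0 (a complete match) gives the best grade."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    for minimum, grade in THRESHOLDS:
        if similarity >= minimum:
            return grade
    return LOWEST_GRADE
```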

Beneficial effects of the invention: the quality of the target data group is checked against the grade standards stored on the server, and the resulting quality grade codes are returned to the user. The check is fast and accurate, and the user can process each target data item according to its quality grade code.

Accompanying drawing explanation

Fig. 1 is a flowchart of the data quality grade judging method of the invention;

Fig. 2 is a flowchart of the first embodiment of the data quality grade judging method of the invention;

Fig. 3 is a flowchart of the second embodiment of the data quality grade judging method of the invention.

Embodiment

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention.

Given a target data group, the data quality grade judging method of the invention locates the data regions of different quality grades by feature-vector matching. The data of each quality grade can then be located and updated, which provides a basis for later optimization of the data.

Referring to Fig. 1, the invention provides a data quality grade judging method comprising the following steps:

Sending the acquired target data group to a server, where the target data group is the data whose grade is to be judged;

Fig. 2 shows the first embodiment of the data quality grade judging method of the invention, which comprises the following steps:

S101: Send the acquired target data group to the server, where the target data group is the data whose grade is to be judged;

S102: Extract the feature vector of each target data item in the target data group, and perform character conversion on each feature vector to obtain the corresponding character set data. The feature vector consists of the attribute information of the target data; if the value of an attribute is empty, the target data item to which that feature vector belongs is incomplete;

S103: Specify in advance, according to the feature vector of each target data group, the grade standard to be used for checking, which improves the speed and accuracy of the data quality grade check;

S104: Check each piece of character set data against the grade standards stored on the server to obtain the quality grade code of each feature vector. A grade standard is a structured character set for judging the quality grade of the target data group, and different types of feature vector correspond to different grade standards: for example, a feature vector representing a time corresponds to a time grade standard, and one representing a title corresponds to a title grade standard, without being limited to these; and

S105: Return each target data item and its corresponding quality grade code to the user.

Preferably, step S103 can be performed before step S101 or S102.

In this embodiment, when the character set data is checked against the grade standard corresponding to its feature vector, the character set data may first be split to speed up the check. The splitting may be done character by character, by whitespace and character count, or using the words of an information dictionary as the units, but is not limited to these modes.
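The three splitting modes can be sketched as below. How the patent implements dictionary-based splitting is not stated; the sketch substitutes forward maximum matching, a common dictionary-based segmentation technique, and treats characters not covered by the dictionary as single-character pieces. Reading "character count" as a maximum piece length is also an assumption, and all function names are hypothetical.

```python
from typing import Iterable, List


def split_by_character(chars: str) -> List[str]:
    """Mode 1: one piece per non-whitespace character."""
    return [c for c in chars if not c.isspace()]


def split_by_blank_and_length(chars: str, max_len: int) -> List[str]:
    """Mode 2: split on whitespace, then cut each token into pieces of at most max_len characters."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    pieces: List[str] = []
    for token in chars.split():
        pieces.extend(token[i:i + max_len] for i in range(0, len(token), max_len))
    return pieces


def split_by_dictionary(chars: str, dictionary: Iterable[str]) -> List[str]:
    """Mode 3: forward maximum matching; unknown characters become single-character pieces."""
    words = {w for w in dictionary if w}
    longest = max((len(w) for w in words), default=1)
    pieces: List[str] = []
    i = 0
    while i < len(chars):
        # Try the longest possible dictionary word first, shrinking down to one character.
        for size in range(min(longest, len(chars) - i), 0, -1):
            candidate = chars[i:i + size]
            if size == 1 or candidate in words:
                pieces.append(candidate)
                i += size
                break
    return pieces
```

For instance, with the dictionary `["数据", "质量", "等级"]`, the string `"数据质量等级"` splits into `["数据", "质量", "等级"]`.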

In this embodiment, during the check, each piece of character set data is matched one by one against its corresponding grade standard. A complete match, that is, a similarity of 100%, indicates that the target data corresponding to that character set data has the highest quality grade. The results are then ranked in order to obtain the quality grade codes of the target data group corresponding to the character set data.
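The one-by-one matching and ranking described above can be sketched with the standard library. The patent does not define its similarity measure; using `difflib.SequenceMatcher.ratio()`, which returns 1.0 for identical strings, is an assumption, and each grade standard is simplified here to a single reference string. `similarity` and `rank_by_standard` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def similarity(chars: str, standard: str) -> float:
    """Assumed similarity measure: difflib's ratio, which is 1.0 for an exact match."""
    return SequenceMatcher(None, chars, standard).ratio()


def rank_by_standard(items: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """items: (character set data, its grade standard); the best match comes first."""
    scored = [(chars, similarity(chars, standard)) for chars, standard in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Ranking by descending similarity places complete matches (similarity 1.0) at the top, which matches the text's rule that a 100% match is the highest grade.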

Fig. 3 shows the second embodiment of the invention, which comprises the following steps:

S201: Send the acquired target data group to the server, where the target data group is the data whose grade is to be judged;

S202: Extract the feature vector of each target data item in the target data group, and perform character conversion on each feature vector to obtain the corresponding character set data. The feature vector consists of the attribute information of the target data; if the value of an attribute is empty, the target data item to which that feature vector belongs is incomplete;

S203: If no grade standard has been specified in advance according to the feature vector of each target data group, fall back to the default setting, in which the server automatically selects the grade standard with the widest scope for the check;

S204: Check each piece of character set data against the grade standards stored on the server to obtain the quality grade code of each feature vector. A grade standard is a structured character set for judging the quality grade of the target data group, and different types of feature vector correspond to different grade standards: for example, a feature vector representing a time corresponds to a time grade standard, and one representing a title corresponds to a title grade standard, without being limited to these; and

S205: Return each target data item and its corresponding quality grade code to the user.

Preferably, step S203 can be performed before step S201 or S202.

In the invention and its embodiments, during the check, each piece of character set data is matched one by one against its corresponding grade standard. A complete match, that is, a similarity of 100%, indicates that the target data corresponding to that character set data has the highest quality grade. The results are then ranked in order to obtain the quality grade codes of the target data group corresponding to the character set data.

The results of the invention are returned to the user so that the user can process the target data group according to the quality grade codes, for example by retaining the original information or by correcting target data that has a low quality grade code.

In implementation, the invention and its embodiments may generate an algorithm configuration file from the feature vectors, grade numbers, grade standards, and algorithm statements, so that the main program reads this file to control and process the data and to generate the program control data. Here, a feature vector is an attribute of the target data group; a grade number marks how good or poor the data quality of the target data group is; and a grade standard is a structured character set for judging the quality grade of the target data group, with different feature vectors corresponding to different grade standards.

The algorithm statement specifies the following: first, match the grade standard corresponding to the target data group according to its feature vector; next, check the quality of the target data group against that grade standard; finally, return the sets of target data at each grade.
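The three-stage algorithm statement can be sketched as a small function. The assumptions are labeled: grade standards are functions keyed by feature name (so matching a standard to a feature is a dict lookup), a record's grade is taken as the worst (largest) code among its graded attributes, and a record with no applicable standard is treated as top grade. `record_grade` and `run_algorithm` are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]
GradeStandard = Callable[[str], int]


def record_grade(record: Record, standards: Dict[str, GradeStandard]) -> int:
    """Grade of a record = worst (largest) code over the features that have a standard."""
    codes = [standard(str(record.get(feature) or "").strip())
             for feature, standard in standards.items()]  # stage 1 and 2: match, then check
    return max(codes, default=1)  # no applicable standard: top grade (assumption)


def run_algorithm(group: List[Record],
                  standards: Dict[str, GradeStandard]) -> Dict[int, List[Record]]:
    """Stage 3: return one set of target data per grade, best grade first."""
    clusters: Dict[int, List[Record]] = defaultdict(list)
    for record in group:
        clusters[record_grade(record, standards)].append(record)
    return dict(sorted(clusters.items()))
```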

Program control data is the data produced by processing in the program, comprising the feature vectors, the grade standards, and the algorithm handle.

For example, if the target data group is stored in a data table, its attributes can be regarded as the field names, that is, the feature vectors, and the returned sets correspond to the result sets captured for each range of quality, from good to poor:
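As an illustration only, the algorithm configuration file could be stored as JSON and loaded into a program-control-data object. The file layout, the key names, the regular-expression form of the grade standards, and the list of step names standing in for the algorithm handle are all hypothetical; the patent does not specify a file format.

```python
import json
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical configuration file content (JSON); backslashes are escaped for JSON.
CONFIG_TEXT = r"""
{
  "feature_vectors": ["title", "year"],
  "grade_numbers": [1, 2, 3],
  "grade_standards": {
    "title": {"1": "^[\\w ]+$", "2": "^[\\w .,:;()-]+$"},
    "year": {"1": "^\\d{4}$"}
  },
  "algorithm": ["match_standard", "check_quality", "return_clusters"]
}
"""


@dataclass
class ProgramControlData:
    feature_vectors: List[str]
    grade_numbers: List[int]
    grade_standards: Dict[str, Dict[int, str]]  # feature -> grade number -> pattern
    algorithm: List[str]  # ordered step names standing in for the algorithm handle


def load_control_data(text: str) -> ProgramControlData:
    """Parse the configuration text and check that every standard names a declared feature."""
    raw = json.loads(text)
    standards = {feature: {int(grade): pattern for grade, pattern in levels.items()}
                 for feature, levels in raw["grade_standards"].items()}
    unknown = set(standards) - set(raw["feature_vectors"])
    if unknown:
        raise ValueError(f"standards for undeclared features: {sorted(unknown)}")
    return ProgramControlData(raw["feature_vectors"], raw["grade_numbers"],
                              standards, raw["algorithm"])
```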

Attribute 1 | Attribute 2 | Attribute 3
Grade 1 | Grade 2 | Grade 3
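The data-table example can be sketched with Python's built-in `sqlite3` module: the table's field names play the role of the feature vectors, and the rows are collected into one result set per grade. The grading rule (each empty field lowers the grade by one, down to grade 3), the table, and the function name are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple


def graded_result_sets(conn: sqlite3.Connection, table: str) -> Dict[int, List[Tuple]]:
    """Grade every row of `table` and collect the rows into one result set per grade."""
    # The table name is interpolated directly, so it must come from trusted code.
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY rowid")
    fields = [column[0] for column in cursor.description]  # field names = feature vectors
    result_sets: Dict[int, List[Tuple]] = {1: [], 2: [], 3: []}
    for row in cursor:
        empty_fields = [name for name, value in zip(fields, row)
                        if value is None or str(value).strip() == ""]
        grade = min(1 + len(empty_fields), 3)  # hypothetical rule
        result_sets[grade].append(row)
    return result_sets
```

For example, in a table with fields `title`, `author`, and `year`, a row with all three filled lands in grade 1, a row missing only `author` in grade 2, and a row missing two fields in grade 3.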

Step 2: Read the algorithm configuration file and form the program control data.

Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the invention. Those of ordinary skill in the art may modify the technical solution of the invention or replace some of its technical features with equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solution to depart from the spirit and scope of the technical solutions of the embodiments of the invention. Accordingly, if such modifications and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them as well.

Claims

1. A data quality grade judging method, characterized by comprising the following steps:

Sending the acquired target data group to a server;

Extracting the feature vector of each target data item in the target data group, and performing character conversion on each feature vector to obtain the corresponding character set data; specifying in advance, according to the feature vector of each target data group, the grade standard to be used for checking, and, if no grade standard has been specified in advance according to the feature vector of each target data group, falling back to the default setting, in which the server automatically selects the grade standard with the widest scope for the check;

Checking each piece of character set data against the grade standards stored on the server to obtain the quality grade code of each feature vector;

And returning each target data item and its corresponding quality grade code to the user;

Wherein, during the check of the data, the character set data is first split, the splitting modes comprising splitting character by character, splitting by whitespace and character count, or splitting using the words of an information dictionary as the units.

2. The data quality grade judging method according to claim 1, characterized in that the quality grade code is set according to the degree of similarity between each piece of character set data and its grade standard.