CN102609418A

CN102609418A - Data quality grade judging method

Info

Publication number: CN102609418A
Application number: CN2011100239381A
Authority: CN
Inventors: 齐志英
Original assignee: BEIJINGDUXIU TECHNOLOGY Co Ltd
Current assignee: BEIJINGDUXIU TECHNOLOGY Co Ltd
Priority date: 2011-01-21
Filing date: 2011-01-21
Publication date: 2012-07-25
Anticipated expiration: 2031-01-21
Also published as: CN102609418B

Abstract

The invention provides a data quality grade judging method comprising the following steps: sending an obtained target data group to a server; extracting a feature vector of each target data item in the group and performing character conversion on each feature vector to obtain the character set data corresponding to it; checking the character set data against a grade standard stored in the server to obtain a quality grade code for each feature vector; and returning each target data item together with its quality grade code to the user. Because the quality of the target data group is checked against grade standards stored in the server, the corresponding quality grade codes can be obtained and returned to the user. The check is fast and accurate, and the user can then process each target data item according to its quality grade code.

Description

Data quality grade judging method

Technical field

The present invention relates to a method for judging data quality, and in particular to a configurable, algorithm-based method for judging data quality grades.

Background art

With the continuous development of information technology, information of every kind has proliferated. In practice, users who retrieve or enter data often find that some records have incomplete attributes or contain erroneous information. Examples include wrongly written characters, look-alike characters with different meanings, homophones or near-homophones, and non-standard use of special symbols. Such incomplete and erroneous information confuses users and fails to meet the needs of those seeking information. It also lowers the accuracy of retrieval or entry, which in turn affects use. It is therefore particularly important to judge the quality of the information that is input for retrieval or entry. Such a quality judgment yields a quality grade code for the data, so that the user can purposefully modify and improve data according to its quality grade code.

In view of the above shortcomings, the inventor arrived at the present invention after long-term research and practice.

Summary of the invention

The object of the invention is to provide a data quality grade judging method that addresses the problems arising in the retrieval and entry of data information.

To achieve this object, the invention adopts the following technical solution: a data quality grade judging method comprising the following steps:

Sending an obtained target data group to a server;

Extracting the feature vector of each target data item in the target data group, and performing character conversion on each feature vector to obtain the character set data corresponding to it;

Checking each item of character set data against the grade standard stored in the server to obtain a quality grade code for each feature vector; and

Returning each target data item together with its quality grade code to the user.
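The four steps above can be sketched in Python as follows. This is a minimal illustration under assumptions the patent does not state:

- records are dictionaries;
- the server is modeled as a plain function call;
- each grade standard is a hypothetical set of reference strings;
- the three-level grade codes are invented for the example.

```python
# Hypothetical grade standards: for each feature (attribute), the set of
# reference character strings that count as fully correct.
GRADE_STANDARD = {
    "title": {"Data quality grade judging method"},
    "date": {"2011-01-21"},
}


def extract_features(record, features):
    """Step 2a: the feature vector is the record's attribute values."""
    return {name: record.get(name) for name in features}


def to_charset(value):
    """Step 2b: character conversion; a missing value becomes ''."""
    return "" if value is None else str(value).strip()


def grade(chars, references):
    """Step 3 (hypothetical codes): 1 = in the standard,
    2 = present but not in the standard, 3 = empty."""
    if chars in references:
        return 1
    return 2 if chars else 3


def server_judge(target_group, standard=GRADE_STANDARD):
    """Steps 1-4: the 'server' receives the group and returns each record
    together with one grade code per feature."""
    results = []
    for record in target_group:
        vector = extract_features(record, standard)
        codes = {name: grade(to_charset(value), standard[name])
                 for name, value in vector.items()}
        results.append((record, codes))
    return results


group = [
    {"title": "Data quality grade judging method", "date": "2011-01-21"},
    {"title": "Data quality", "date": None},
]
for record, codes in server_judge(group):
    print(codes)  # {'title': 1, 'date': 1}, then {'title': 2, 'date': 3}
```

A real deployment would send the group over a network and keep the standards in server-side storage. The sketch only shows how the four steps fit together.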

Here, the feature vector is the attribute information of the target data. If the value of an attribute is empty, the target data item to which this feature vector corresponds is incomplete.
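The sketch below extracts a feature vector and applies the completeness rule just described. It assumes that each target data item is a dictionary. It also assumes that both None and a whitespace-only string count as an empty attribute value, since the patent does not define "empty" further. The function and attribute names are illustrative only.

```python
def extract_feature_vector(record, attributes):
    """Return the record's attribute values in a fixed attribute order."""
    return [record.get(name) for name in attributes]


def is_empty(value):
    """Assumption: None and whitespace-only strings both count as empty."""
    return value is None or str(value).strip() == ""


def is_complete(vector):
    """A record is incomplete if any attribute value is empty."""
    return not any(is_empty(value) for value in vector)


def empty_attributes(record, attributes):
    """Name the attributes whose values are empty, to show what is missing."""
    vector = extract_feature_vector(record, attributes)
    return [name for name, value in zip(attributes, vector) if is_empty(value)]


ATTRIBUTES = ["title", "inventor", "filing_date"]
record = {"title": "Data quality grade judging method", "inventor": "",
          "filing_date": "2011-01-21"}
print(is_complete(extract_feature_vector(record, ATTRIBUTES)))  # False
print(empty_attributes(record, ATTRIBUTES))                     # ['inventor']
```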

The grade standard is a structured character set used to judge the quality grade of the target data group. Different types of feature vectors correspond to different grade standards. For example, a feature vector expressing a time corresponds to a time grade standard, and a feature vector expressing a title corresponds to a title grade standard, but the invention is not limited to these.
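One possible way to represent per-type grade standards is as ordered lists of regular expressions. This is an assumption: the patent only says a standard is a "structured character set". The time and title patterns below are hypothetical. The first pattern that matches the whole string decides the grade code, and a string that matches no level receives the lowest grade.

```python
import re

# Hypothetical grade standards keyed by feature type: ordered (code, pattern).
GRADE_STANDARDS = {
    "time": [
        (1, re.compile(r"\d{4}-\d{2}-\d{2}")),  # full date, e.g. 2011-01-21
        (2, re.compile(r"\d{4}(-\d{2})?")),     # year or year-month only
    ],
    "title": [
        (1, re.compile(r"\w[\w ,.'-]*")),       # word characters, no odd symbols
        (2, re.compile(r".*\S.*")),             # anything that is not blank
    ],
}


def check(feature_type, chars, standards=GRADE_STANDARDS):
    """Return the grade code of `chars` under the standard for `feature_type`."""
    levels = standards[feature_type]
    for code, pattern in levels:
        if pattern.fullmatch(chars):
            return code
    return len(levels) + 1  # matched no level: lowest grade


print(check("time", "2011-01-21"))       # 1
print(check("time", "2011-01"))          # 2
print(check("title", "数据质量级别"))      # 1 (\w also matches CJK characters)
print(check("title", "Data quality!!"))  # 2
print(check("title", "   "))             # 3
```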

In implementation, the invention further comprises the step of specifying in advance, according to the feature vectors of each target data group, the grade standard used for checking. This improves the speed and accuracy of the data quality grade check.

In implementation, the invention further comprises the following step. If no grade standard has been specified in advance according to the feature vectors of each target data group, the default setting applies: the server automatically selects the grade standard with the widest range for checking. This makes the check of the character set data more general.
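The sketch below shows one way to choose between a pre-specified standard and the default. It interprets "widest range" as the standard that covers the most feature types; that reading, the `GradeStandard` class, and the standard names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradeStandard:
    name: str
    feature_types: frozenset  # feature types this standard is able to check


def select_standard(feature_type, standards, specified):
    """Pick the grade standard used to check one feature type.

    `specified` maps feature type -> standard name chosen in advance. If the
    feature type has no entry, fall back to the default: the standard with
    the widest range (assumed here to mean the most feature types covered).
    """
    name = specified.get(feature_type)
    if name is not None:
        for standard in standards:
            if standard.name == name:
                return standard
        raise ValueError(f"unknown grade standard {name!r}")
    return max(standards, key=lambda s: len(s.feature_types))


STANDARDS = [
    GradeStandard("time", frozenset({"time"})),
    GradeStandard("title", frozenset({"title"})),
    GradeStandard("generic", frozenset({"time", "title", "author"})),
]
print(select_standard("time", STANDARDS, {"time": "time"}).name)  # time
print(select_standard("author", STANDARDS, {}).name)              # generic
```

Pre-specifying a narrow standard corresponds to the faster, more accurate path. The default trades precision for coverage.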

In implementation, the quality grade codes are ordered by how similar each item of character set data is to the grade standard it is matched against.

Beneficial effects of the invention: the quality of the target data group is checked against the grade standards stored in the server, and the resulting quality grade codes are returned to the user. The check is fast and accurate, and the user can process each target data item according to its quality grade code.

Description of drawings

Fig. 1 is a flow chart of the data quality grade judging method of the invention;

Fig. 2 is a flow chart of a first embodiment of the data quality grade judging method of the invention;

Fig. 3 is a flow chart of a second embodiment of the data quality grade judging method of the invention.

Embodiment

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention.

Given a target data group, the data quality grade judging method of the invention locates the data regions of different quality grades by matching feature vectors. The data of each quality grade can then be located and updated, which provides a basis for later optimization of the data information.

Referring to Fig. 1, the invention provides a data quality grade judging method comprising the following steps:

Sending an obtained target data group to a server, where the target data group is the data whose grade is to be judged;

Referring to Fig. 2, a first embodiment of the data quality grade judging method of the invention comprises the following steps:

S101: sending an obtained target data group to a server, where the target data group is the data whose grade is to be judged;

S102: extracting the feature vector of each target data item in the target data group, and performing character conversion on each feature vector to obtain the corresponding character set data. The feature vector is the attribute information of the target data; if the value of an attribute is empty, the target data item to which this feature vector corresponds is incomplete;

S103: specifying in advance, according to the feature vectors of each target data group, the grade standard used for checking, so as to improve the speed and accuracy of the data quality grade check;

S104: checking each item of character set data against the grade standard stored in the server to obtain a quality grade code for each feature vector. The grade standard is a structured character set used to judge the quality grade of the target data group, and different types of feature vectors correspond to different grade standards. For example, a feature vector expressing a time corresponds to a time grade standard and a feature vector expressing a title corresponds to a title grade standard, without being limited to these; and

S105: returning each target data item together with its quality grade code to the user.

Preferably, step S103 may be performed before step S101 or S102.

In this embodiment, the character set data may first be split before it is checked against the grade standard corresponding to its feature vector; splitting speeds up the check. The data may be split character by character, by blank characters and character count, or according to the entries of an information dictionary. The splitting methods are not limited to these.
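The three splitting modes named above can be sketched as follows. For dictionary-based splitting this sketch swaps in forward maximum matching, a common technique for segmenting Chinese text; the patent does not say which dictionary algorithm it uses. The dictionary contents and the chunk width are hypothetical inputs.

```python
def split_by_character(chars):
    """Split into single characters."""
    return list(chars)


def split_by_blank_and_count(chars, width):
    """Split on blank characters, then cut each piece into chunks of at
    most `width` characters."""
    pieces = []
    for word in chars.split():
        pieces.extend(word[i:i + width] for i in range(0, len(word), width))
    return pieces


def split_by_dictionary(chars, dictionary):
    """Forward maximum matching: at each position take the longest
    dictionary entry that matches, falling back to a single character."""
    longest = max([1] + [len(entry) for entry in dictionary])
    pieces, i = [], 0
    while i < len(chars):
        for size in range(min(longest, len(chars) - i), 0, -1):
            candidate = chars[i:i + size]
            if size == 1 or candidate in dictionary:
                pieces.append(candidate)
                i += size
                break
    return pieces


print(split_by_character("abc"))                                   # ['a', 'b', 'c']
print(split_by_blank_and_count("data quality", 4))                 # ['data', 'qual', 'ity']
print(split_by_dictionary("数据质量级别", {"数据", "质量", "级别"}))  # ['数据', '质量', '级别']
```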

In this embodiment, each item of character set data is matched one by one against its corresponding grade standard during the check. A complete match, i.e. a similarity of 100%, indicates that the target data item corresponding to this character set data has the highest quality grade. The items are then ranked in order of similarity to obtain the quality grade codes of the corresponding target data group.
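The sketch below implements one-by-one matching followed by ranking. It assumes `difflib.SequenceMatcher.ratio()` as the similarity measure, where 1.0 corresponds to the 100% complete match; the patent does not name a similarity function. It also assumes dense ranking, so items with equal similarity share a code. The most similar items receive code 1, so a complete match always ranks first.

```python
from difflib import SequenceMatcher


def similarity(chars, standard):
    """Similarity in [0, 1]; 1.0 means a complete (100%) match."""
    return SequenceMatcher(None, chars, standard).ratio()


def grade_codes(charsets, standard):
    """Match each item one by one, then rank by similarity: the most similar
    items get code 1, the next distinct similarity gets code 2, and so on."""
    scores = [similarity(chars, standard) for chars in charsets]
    distinct = sorted(set(scores), reverse=True)
    code_of = {score: code for code, score in enumerate(distinct, start=1)}
    return [code_of[score] for score in scores]


print(grade_codes(["2011-01-21", "2011-01", "xyz"], "2011-01-21"))  # [1, 2, 3]
```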

Referring to Fig. 3, a second embodiment of the invention comprises the following steps:

S201: sending an obtained target data group to a server, where the target data group is the data whose grade is to be judged;

S202: extracting the feature vector of each target data item in the target data group, and performing character conversion on each feature vector to obtain the corresponding character set data. The feature vector is the attribute information of the target data; if the value of an attribute is empty, the target data item to which this feature vector corresponds is incomplete;

S203: if no grade standard has been specified in advance according to the feature vectors of each target data group, applying the default setting, under which the server automatically selects the grade standard with the widest range for checking;

S204: checking each item of character set data against the grade standard stored in the server to obtain a quality grade code for each feature vector. The grade standard is a structured character set used to judge the quality grade of the target data group, and different types of feature vectors correspond to different grade standards. For example, a feature vector expressing a time corresponds to a time grade standard and a feature vector expressing a title corresponds to a title grade standard, without being limited to these; and

S205: returning each target data item together with its quality grade code to the user.

Preferably, step S203 may be performed before step S201 or S202.

In the invention and its embodiments, each item of character set data is matched one by one against its corresponding grade standard during the check. A complete match, i.e. a similarity of 100%, indicates that the corresponding target data item has the highest quality grade. The items are then ranked in order to obtain the quality grade codes of the target data group.

The results of the invention are returned to the user so that the user can conveniently process the target data group according to the quality grade codes. Such processing includes retaining the original information or correcting the target data that carry a given quality grade code.

In the invention and its embodiments, the feature vectors, grade numbers, grade standards, algorithm statements and similar items can be written to an algorithm configuration file. The main program reads this file to control and process the data, generating program control data during implementation. In this file, a feature vector is an attribute of the target data group. A grade number is an identifier of how good or poor the quality of the target data group is. A grade standard is a structured character set for judging the quality grade of the target data group, with different feature vectors corresponding to different grade standards.

The algorithm statement comprises three steps. First, the grade standard corresponding to the target data group is matched according to the feature vector. Second, the quality of the target data group is checked against that grade standard. Finally, the clusters of target data of the different grades are returned.
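A hypothetical rendering of the three-part algorithm statement for a single feature vector follows. It assumes that a grade standard is a set of accepted values and that three illustrative grade codes are used; neither detail comes from the patent. The returned dictionary maps each grade code to its cluster of target data, best grade first.

```python
from collections import defaultdict


def run_algorithm(target_group, feature, standards):
    """Hypothetical rendering of the three-part algorithm statement."""
    # 1. Match the grade standard that corresponds to the feature vector.
    standard = standards[feature]
    clusters = defaultdict(list)
    for record in target_group:
        value = record.get(feature)
        # 2. Check quality (illustrative codes): 1 = value found in the
        #    standard, 2 = value present but not in it, 3 = value empty.
        if value in standard:
            code = 1
        elif value not in (None, ""):
            code = 2
        else:
            code = 3
        clusters[code].append(record)
    # 3. Return the clusters of target data, one per grade, best grade first.
    return dict(sorted(clusters.items()))


group = [{"date": "2011-01-21"}, {"date": "2011"}, {"date": None},
         {"date": "2012-07-25"}]
clusters = run_algorithm(group, "date", {"date": {"2011-01-21", "2012-07-25"}})
print({code: len(records) for code, records in clusters.items()})  # {1: 2, 2: 1, 3: 1}
```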

Here, program control data is the data constructed by the program for processing. It comprises the feature vectors, the grade standards and the algorithm handles.
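The patent does not specify a file format for the algorithm configuration file. The sketch below assumes JSON and shows a main program reading the file and forming program control data. The key names, the grade labels, the step names and the `ProgramControlData` class are all hypothetical. The loader also checks that every feature vector has a grade standard, mirroring the rule that different feature vectors correspond to different standards.

```python
import json
from dataclasses import dataclass

# Hypothetical algorithm configuration file (JSON is an assumed format).
CONFIG_TEXT = """
{
  "feature_vectors": ["title", "date"],
  "grade_numbers": {"1": "good", "2": "fair", "3": "poor"},
  "grade_standards": {
    "title": ["Data quality grade judging method"],
    "date": ["2011-01-21", "2012-07-25"]
  },
  "algorithm": ["match_standard", "check_quality", "return_clusters"]
}
"""

REQUIRED_KEYS = {"feature_vectors", "grade_numbers", "grade_standards", "algorithm"}


@dataclass
class ProgramControlData:
    feature_vectors: list
    grade_numbers: dict
    grade_standards: dict
    algorithm: list


def load_control_data(text):
    """Read the configuration text and build the program control data."""
    raw = json.loads(text)
    missing = REQUIRED_KEYS - set(raw)
    if missing:
        raise ValueError(f"configuration is missing {sorted(missing)}")
    # Each feature vector must have its own grade standard.
    for feature in raw["feature_vectors"]:
        if feature not in raw["grade_standards"]:
            raise ValueError(f"no grade standard for feature {feature!r}")
    return ProgramControlData(
        feature_vectors=raw["feature_vectors"],
        grade_numbers={int(k): v for k, v in raw["grade_numbers"].items()},
        grade_standards={k: set(v) for k, v in raw["grade_standards"].items()},
        algorithm=raw["algorithm"],
    )


control = load_control_data(CONFIG_TEXT)
print(control.grade_numbers[3])  # poor
```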

For example, if the target data group resides in a data table, the attributes can be regarded as the field names, i.e. the feature vectors. The returned sets then correspond to the result sets captured for each quality range, from good to poor:

Attribute 1 | Attribute 2 | Attribute 3
Grade 1     | Grade 2     | Grade 3

Second step: read the algorithm configuration file to form the program control data.
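The data-table example can be sketched with Python's built-in `sqlite3` module, where the field names act as feature vectors. The table, its rows, and the two-level grading rule (filled versus NULL or empty) are hypothetical. The table and field names are interpolated into the SQL, so they must be trusted identifiers.

```python
import sqlite3


def feature_vectors_from_table(conn, table):
    """The table's field names serve as the feature vectors (attributes)."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def result_sets_by_grade(conn, table, field):
    """Capture one result set per quality range for a field (hypothetical
    rule: grade 1 = value filled in, grade 2 = NULL or empty)."""
    filled = conn.execute(
        f"SELECT rowid FROM {table} WHERE {field} IS NOT NULL AND {field} <> '' "
        "ORDER BY rowid").fetchall()
    empty = conn.execute(
        f"SELECT rowid FROM {table} WHERE {field} IS NULL OR {field} = '' "
        "ORDER BY rowid").fetchall()
    return {1: [r[0] for r in filled], 2: [r[0] for r in empty]}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (title TEXT, inventor TEXT, filed TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?, ?)",
                 [("A", "x", "2011-01-21"), ("B", None, "2011-01-21"), ("C", "", None)])
print(feature_vectors_from_table(conn, "records"))        # ['title', 'inventor', 'filed']
print(result_sets_by_grade(conn, "records", "inventor"))  # {1: [1], 2: [2, 3]}
```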

Finally, it should be noted that the above embodiments merely illustrate, and do not limit, the technical solution of the invention. Those of ordinary skill in the art may modify the technical solution of the invention or replace some of its technical features with equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solution to depart from the spirit and scope of the technical solution of the embodiments of the invention. If these modifications and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them.

Claims

1. A data quality grade judging method, characterized by comprising the following steps:

Sending an obtained target data group to a server;

2. The data quality grade judging method according to claim 1, characterized by further comprising the step of specifying in advance, according to the feature vectors of each target data group, the grade standard used for checking.

3. The data quality grade judging method according to claim 1, characterized by further comprising the step of: if no grade standard has been specified in advance according to the feature vectors of each target data group, applying the default setting, under which the server automatically selects the grade standard with the widest range for checking.

4. The data quality grade judging method according to claim 2 or 3, characterized in that the quality grade codes are ordered according to the similarity of each item of character set data to the grade standard it is matched against.